# ARDRegression

In [1]:
# Importing the necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.linear_model import ARDRegression, BayesianRidge



* **ARDRegression**, or **Automatic Relevance Determination Regression**, is a regression algorithm that belongs to the family of Bayesian regression models. It is based on the idea of automatic relevance determination, which means that the algorithm automatically determines the relevance of each feature (input variable) during the learning process.

* ARDRegression is particularly useful when dealing with datasets that have a large number of features. It helps in identifying and giving more importance to the relevant features while assigning less importance to irrelevant or redundant ones. This can lead to better generalization and more interpretable models.

Here are some key points about ARDRegression:

**1. Bayesian Approach:** ARDRegression is a Bayesian regression model. It places a Gaussian prior on the regression coefficients and estimates their posterior distribution from the observed data.

**2. Relevance Determination:** Each coefficient gets its own prior precision (exposed as `lambda_` in scikit-learn), which is estimated from the data. A large precision pulls that coefficient toward zero, marking the feature as less relevant; a small precision leaves the coefficient free to fit the data.

**3. Sparse Models:** ARDRegression tends to produce sparse models. In scikit-learn, coefficients whose precision exceeds `threshold_lambda` are pruned, i.e. set to zero. This helps with feature selection and model interpretability.

**4. Regularization:** The priors act as a regularizer: they shrink coefficients toward zero, which helps prevent overfitting and improves the generalization performance of the model.

**5. Parameter Estimation:** ARDRegression estimates the regression coefficients, the per-coefficient prior precisions, and the precision (inverse of the variance) of the noise in the data (`alpha_` in scikit-learn).
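The points above can be seen on a small synthetic problem. The sketch below is only an illustration: the data, the variable names (`X_demo`, `ard_demo`), and the choice of 10 features with 2 relevant ones are assumptions, not part of the AirBnb analysis.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Synthetic data (an assumption for illustration):
# 10 features, but only the first two influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:2] = [3.0, -2.0]
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=200)

ard_demo = ARDRegression()
ard_demo.fit(X_demo, y_demo)

# coef_   : posterior mean of each weight
# lambda_ : estimated prior precision of each weight
#           (large precision -> weight held close to zero)
for i, (w, lam) in enumerate(zip(ard_demo.coef_, ard_demo.lambda_)):
    print(f"feature {i}: coef={w: .4f}  lambda={lam:.3g}")

# alpha_ is the estimated noise precision, so 1/sqrt(alpha_) is the noise std.
noise_std = 1.0 / np.sqrt(ard_demo.alpha_)
print("estimated noise std:", noise_std)
```

On data like this, the irrelevant features should end up with much larger `lambda_` values and coefficients near zero, while the two informative features keep coefficients close to the values used to generate the data.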

In [2]:
# Loading the dataset
df = pd.read_csv("C:\\Users\\User\\Desktop\\Drive D\\New folder\\ML\\Completed\\AirBnb.csv")
df.head()

,Date,Open,High,Low,Close,Adj Close,Volume
0,2020-12-10,146.0,165.0,141.25,144.710007,144.710007,70447500
1,2020-12-11,146.550003,151.5,135.100006,139.25,139.25,26980800
2,2020-12-14,135.0,135.300003,125.160004,130.0,130.0,16966100
3,2020-12-15,126.690002,127.599998,121.5,124.800003,124.800003,10914400
4,2020-12-16,125.830002,142.0,124.910004,137.990005,137.990005,20409600


In [3]:
# Use the trading date as the index
df.set_index("Date", inplace=True)
df

Date,Open,High,Low,Close,Adj Close,Volume
2020-12-10,146.000000,165.000000,141.250000,144.710007,144.710007,70447500
2020-12-11,146.550003,151.500000,135.100006,139.250000,139.250000,26980800
2020-12-14,135.000000,135.300003,125.160004,130.000000,130.000000,16966100
2020-12-15,126.690002,127.599998,121.500000,124.800003,124.800003,10914400
2020-12-16,125.830002,142.000000,124.910004,137.990005,137.990005,20409600
...,...,...,...,...,...,...
2024-01-08,137.309998,140.250000,136.610001,140.080002,140.080002,4179700
2024-01-09,138.520004,139.539993,137.789993,139.529999,139.529999,3560900
2024-01-10,139.199997,140.824997,138.699997,139.759995,139.759995,2492700
2024-01-11,140.710007,141.199997,137.550003,139.449997,139.449997,2383500


In [4]:
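# Features: drop the target ('Adj Close') and 'Volume'.
# Note: 'Close' stays in x and matches 'Adj Close' in every row displayed above,
# so the model can reproduce the target almost exactly from that one column.
x = df.drop(columns=['Adj Close', 'Volume'])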
x

Date,Open,High,Low,Close
2020-12-10,146.000000,165.000000,141.250000,144.710007
2020-12-11,146.550003,151.500000,135.100006,139.250000
2020-12-14,135.000000,135.300003,125.160004,130.000000
2020-12-15,126.690002,127.599998,121.500000,124.800003
2020-12-16,125.830002,142.000000,124.910004,137.990005
...,...,...,...,...
2024-01-08,137.309998,140.250000,136.610001,140.080002
2024-01-09,138.520004,139.539993,137.789993,139.529999
2024-01-10,139.199997,140.824997,138.699997,139.759995
2024-01-11,140.710007,141.199997,137.550003,139.449997


In [5]:
y = df['Adj Close']
y

Date
2020-12-10    144.710007
2020-12-11    139.250000
2020-12-14    130.000000
2020-12-15    124.800003
2020-12-16    137.990005
                 ...    
2024-01-08    140.080002
2024-01-09    139.529999
2024-01-10    139.759995
2024-01-11    139.449997
2024-01-12    137.139999
Name: Adj Close, Length: 777, dtype: float64

In [6]:
# Splitting the dataset into training and testing sets (80% / 20%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
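By default `train_test_split` shuffles the rows before splitting, so test dates are interleaved with training dates. For date-indexed data, one possible alternative is a chronological holdout using `shuffle=False`. The sketch below uses a small made-up frame (`demo`, with invented values) purely to show the behaviour; it is not the AirBnb data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a price table: 10 business days of made-up values.
dates = pd.date_range("2021-01-01", periods=10, freq="B")
demo = pd.DataFrame(
    {"Open": np.arange(10.0), "Close": np.arange(10.0) + 0.5},
    index=dates,
)

# shuffle=False keeps the row order, so the last 20% of dates form the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    demo[["Open"]], demo["Close"], test_size=0.2, shuffle=False
)

print("last training date:", X_tr.index.max())
print("first test date:   ", X_te.index.min())
```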

In [7]:
ard = ARDRegression()

In [8]:
ard.fit(x_train,y_train)

In [9]:
pred = ard.predict(x_test)

In [10]:
r2_score(y_test,pred)

1.0

In [11]:
mean_squared_error(y_test,pred)

3.995212349855322e-26
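An R² of exactly 1.0 with an MSE near 1e-26 is what one would expect when a feature duplicates the target. In the rows printed earlier, `Close` and `Adj Close` appear identical. The sketch below checks this on the five `df.head()` rows, copied by hand into a small frame named `head` (a name introduced here for illustration); on the full data the same check would be `np.allclose(df['Close'], df['Adj Close'])`.

```python
import numpy as np
import pandas as pd

# The five rows shown by df.head() above, copied by hand.
head = pd.DataFrame(
    {
        "Close": [144.710007, 139.25, 130.0, 124.800003, 137.990005],
        "Adj Close": [144.710007, 139.25, 130.0, 124.800003, 137.990005],
    },
    index=["2020-12-10", "2020-12-11", "2020-12-14", "2020-12-15", "2020-12-16"],
)

# True when the feature column carries the same values as the target column.
same = bool(np.allclose(head["Close"], head["Adj Close"]))
print("Close identical to Adj Close:", same)
```

If the check also holds on the full dataset, the score measures the model's ability to copy one column rather than its ability to predict prices.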

# Bayesian Ridge

* **Bayesian Ridge Regression** is a regression algorithm that incorporates Bayesian principles to estimate the parameters of a linear regression model. It is a Bayesian approach to linear regression that provides a probabilistic framework for estimating the regression coefficients and their uncertainties.

Here are some key points about Bayesian Ridge Regression:

**1. Probabilistic Framework:** Bayesian Ridge Regression treats the regression coefficients as random variables with prior distributions. It estimates the posterior distribution of the coefficients given the observed data using Bayes' theorem.

**2. Regularization:** The Gaussian prior on the coefficients acts as an L2 (ridge) penalty that discourages large coefficients, which helps prevent overfitting. Unlike ordinary ridge regression, the strength of the penalty is estimated from the data rather than fixed in advance.

**3. Flexibility:** Prior beliefs about the coefficients enter through the prior distributions. In scikit-learn's `BayesianRidge`, the coefficient prior is a zero-mean spherical Gaussian, and the gamma hyperpriors on the precisions can be adjusted through `alpha_1`, `alpha_2`, `lambda_1`, and `lambda_2`.

**4. Handling Collinearity:** Bayesian Ridge Regression copes with multicollinearity (high correlation between predictor variables) better than ordinary least squares. Shrinking all coefficients toward zero stabilizes the estimates and tends to spread weight across correlated predictors instead of letting individual coefficients grow large with opposite signs.

**5. Estimation of Uncertainty:** Bayesian Ridge Regression provides a posterior covariance for the regression coefficients (`sigma_` in scikit-learn) and a predictive standard deviation via `predict(..., return_std=True)`. This uncertainty quantification can be valuable for decision-making and for judging the reliability of predictions.

**6. Robustness:** Bayesian Ridge Regression is more robust than ordinary least squares to ill-posed problems, such as few samples or strongly correlated features. Because it assumes Gaussian noise, it is not specifically robust to outliers: large outliers can still pull the fit.

* Overall, Bayesian Ridge Regression offers a principled and flexible approach to linear regression that can lead to more reliable and interpretable models, especially in situations where there is uncertainty or prior knowledge about the relationships in the data.
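The uncertainty estimates mentioned above can be read directly from a fitted scikit-learn model. The following is a minimal sketch on synthetic data; the data and the names `X_syn`, `br_demo`, and `X_new` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic data (an assumption for illustration): 3 features, one of them irrelevant.
rng = np.random.default_rng(1)
X_syn = rng.normal(size=(100, 3))
y_syn = X_syn @ np.array([1.5, 0.0, -0.7]) + rng.normal(scale=0.3, size=100)

br_demo = BayesianRidge()
br_demo.fit(X_syn, y_syn)

# return_std=True also returns the standard deviation of the predictive
# distribution for each query point.
X_new = rng.normal(size=(5, 3))
pred_mean, pred_std = br_demo.predict(X_new, return_std=True)
for m, s in zip(pred_mean, pred_std):
    print(f"prediction: {m: .3f} +/- {s:.3f}")

# sigma_ is the posterior covariance of the coefficients;
# the square root of its diagonal gives a per-coefficient standard deviation.
coef_std = np.sqrt(np.diag(br_demo.sigma_))
print("coefficients:    ", br_demo.coef_)
print("coefficient stds:", coef_std)
```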






In [12]:
br = BayesianRidge()

In [13]:
br.fit(x_train,y_train)

In [14]:
br_pred = br.predict(x_test)

In [15]:
r2_score(y_test,br_pred)

1.0

In [16]:
mean_squared_error(y_test,br_pred)

9.402896413179834e-23
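To put the two MSE values on the same scale as the prices, one can take their square roots (RMSE). The sketch below uses only the numbers printed in the outputs above; the variable names are introduced here for illustration.

```python
import math

# MSE values copied from the ARDRegression and BayesianRidge outputs above.
mse_ard = 3.995212349855322e-26
mse_br = 9.402896413179834e-23

# RMSE is expressed in the same units as the target (price).
rmse_ard = math.sqrt(mse_ard)
rmse_br = math.sqrt(mse_br)

print(f"ARDRegression RMSE: {rmse_ard:.3e}")
print(f"BayesianRidge RMSE: {rmse_br:.3e}")
```

Both errors are many orders of magnitude smaller than the six decimal places at which the prices are recorded, so the gap between the two models most likely reflects floating-point and convergence details rather than a meaningful difference in accuracy.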