# Bayesian Linear Regression

There are two broad schools of thought when it comes to predicting an outcome. The frequentist approach treats the model parameters as fixed but unknown values and estimates them from the observed data alone. The Bayesian approach, on the other hand, treats the parameters as random variables and predicts the outcome by combining prior knowledge with the likelihood of the data.

## How do classic Linear Regression models work?

In classic linear regression, we try to fit a line through our data such that it minimizes a loss criterion.
- The most commonly used criterion is ordinary least squares, which minimizes the sum of squared differences between the predicted and observed values.
- This approach is particularly helpful when we have a large set of data points covering a wide range of inputs.
- However, this method relies entirely on the existing data. If the data we have is not representative of the data we need to predict, such models can fail because they overfit the training data.
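As a minimal sketch of the least-squares fit described above, the snippet below uses NumPy on made-up data. The data-generating line (`y = 2x + 1` plus noise), the seed, and all variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical toy data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with a column of ones so the model has an intercept.
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the coefficients that minimize the
# sum of squared residuals ||y - A @ beta||^2.
beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = beta_ols

residuals = y - A @ beta_ols
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.3f}")
```

Note that the result is a single pair of numbers for the intercept and slope; nothing in the output says how certain the fit is.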

Assuming we are using this for a critical application, the result is not the only thing we seek. We also need to know how confident the model is in a particular result.

In such applications, we tend to go with the Bayesian interpretation of linear modelling.

- In Bayesian regression, we predict a distribution over possible outcomes instead of a single value.
- To achieve this, the model is formulated using probability distributions, typically normal (Gaussian) distributions.
- Thus the response y is not estimated as a single value; instead, it is assumed to be drawn from a probability distribution.
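One common way to write this assumption down, using β for the model parameters and X for the inputs, and assuming a noise variance σ² that the text does not name explicitly, is:

$$
y \sim \mathcal{N}\left(\beta^{T} X,\ \sigma^{2} I\right)
$$

Under this reading, the mean of the response is the linear predictor β^T X, and σ² controls how widely the observations scatter around it.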

![bayesian_regression.png](attachment:bayesian_regression.png)

Here, P(β|y, X) is the posterior probability distribution of the model parameters given the inputs and outputs. This is equal to the likelihood of the data, P(y|β, X), multiplied by the prior probability of the parameters and divided by a normalization constant.

![bayesian_regression_2.png](attachment:bayesian_regression_2.png)

In the above equations, the prior term lets us incorporate domain knowledge into the model, which can be particularly useful in fields like medicine.

The posterior term can be described as follows:

The result of performing Bayesian linear regression is a distribution of possible model parameters based on the data and the prior. This allows us to quantify our uncertainty about the model: if we have fewer data points, the posterior distribution will be more spread out.
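The effect of the amount of data on the posterior can be sketched with the standard closed-form result for a Gaussian prior and Gaussian noise. Everything specific here is an assumption made for illustration: a zero-mean prior with variance `prior_var`, a known noise variance `noise_var`, the "true" parameters, and the helper names `posterior` and `make_data`.

```python
import numpy as np

def posterior(X, y, noise_var=1.0, prior_var=10.0):
    """Posterior mean and covariance of the weights, assuming a prior
    N(0, prior_var * I) and Gaussian noise with known variance noise_var."""
    n_features = X.shape[1]
    # Posterior precision = data precision + prior precision.
    precision = X.T @ X / noise_var + np.eye(n_features) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov

rng = np.random.default_rng(0)
true_beta = np.array([1.0, 2.0])  # hypothetical intercept and slope

def make_data(n):
    """Draw n noisy observations from the hypothetical linear model."""
    x = rng.uniform(-3.0, 3.0, size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ true_beta + rng.normal(scale=1.0, size=n)
    return X, y

results = {}
for n in (5, 500):
    X, y = make_data(n)
    mean, cov = posterior(X, y)
    std = np.sqrt(np.diag(cov))  # posterior standard deviation per weight
    results[n] = (mean, std)
    print(f"n = {n:3d}  posterior mean = {mean.round(2)}  posterior std = {std.round(3)}")
```

With only 5 points the posterior standard deviations are noticeably larger than with 500 points, which is the "more spread out" behaviour described above.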

## Sources:
https://towardsdatascience.com/introduction-to-bayesian-linear-regression-e66e60791ea7

## Example

In [1]:
# BayesianRidge fits a Bayesian linear regression with an L2 (ridge-style) penalty
# whose regularization strength is estimated from the data.

from sklearn import linear_model

In [2]:
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]

In [3]:
reg = linear_model.BayesianRidge()
reg.fit(X, Y)

BayesianRidge()

In [4]:
reg.predict([[1, 0.]])

array([0.50000013])

In [5]:
print(f'coefficients = {reg.coef_}')

coefficients = [0.49999993 0.49999993]
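Since the motivation for the Bayesian approach was knowing how confident the model is, a natural follow-up is to ask the fitted estimator for the spread of its prediction as well. The sketch below refits the same toy data so that it runs on its own and uses the `return_std=True` option of `BayesianRidge.predict`; the variable names `mean` and `std` are chosen here for illustration.

```python
from sklearn import linear_model

# Same toy data as in the example above.
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]

reg = linear_model.BayesianRidge()
reg.fit(X, Y)

# return_std=True additionally returns the standard deviation of the
# predictive distribution for each query point.
mean, std = reg.predict([[1, 0.]], return_std=True)

print(f'predicted mean = {mean}')
print(f'predictive std = {std}')
```

A larger standard deviation indicates that the model is less certain about that particular prediction.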
