# Bayesian Linear Regression

Because Maximum Likelihood Estimation (MLE) is prone to over-fitting, we moved to Maximum A Posteriori (MAP) estimation, which places a prior on the parameter $\theta$ and returns the value that maximises the posterior.

A further approach is to derive the full posterior distribution, rather than a single point estimate, and sample directly from it.

\begin{align*}
p(\theta|y) = \frac{p(y,\theta)}{p(y)}
\end{align*}

As shown above, this requires computing the marginal likelihood $p(y)$, which can be expressed as:

\begin{align*}
p(y) = \int p(y,\theta)d\theta
\end{align*}

If we choose a conjugate prior $p(\theta)$ on the parameters, the marginal likelihood can be computed in closed form. In general, however, this integral is intractable, so numerical approaches (such as Monte Carlo) may be required.
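To make the Monte Carlo idea concrete, here is a minimal sketch that estimates $\log p(y)$ by averaging the likelihood over draws from the prior. The one-parameter toy model $y_n = \theta x_n + \epsilon_n$, the data values, and the function names are all assumptions chosen for illustration; they are not part of the example developed in these notes.

```python
import numpy as np

def log_gaussian(y, mean, var):
    """Log density of an isotropic Gaussian N(y | mean, var * I)."""
    n = y.shape[-1]
    sq = np.sum((y - mean) ** 2, axis=-1)
    return -0.5 * n * np.log(2 * np.pi * var) - 0.5 * sq / var

def mc_log_marginal(y, x, prior_var, noise_var, n_samples=200_000, seed=0):
    """Estimate log p(y) = log E_{theta ~ prior}[p(y | theta)] by simple Monte Carlo."""
    rng = np.random.default_rng(seed)
    # Draw theta from the prior N(0, prior_var).
    theta = rng.normal(0.0, np.sqrt(prior_var), size=n_samples)
    # Log-likelihood of the data under each sampled theta, shape (n_samples,).
    log_lik = log_gaussian(y, theta[:, None] * x[None, :], noise_var)
    # Log of the sample mean, shifted by the maximum for numerical stability.
    m = log_lik.max()
    return m + np.log(np.mean(np.exp(log_lik - m)))

# Hypothetical toy data.
x = np.array([0.5, 1.0, 1.5])
y = np.array([0.4, 1.1, 1.3])
print(mc_log_marginal(y, x, prior_var=1.0, noise_var=0.25))
```

Sampling from the prior is the simplest estimator, but it tends to become inefficient when the posterior is much narrower than the prior, which is one reason closed-form solutions are preferred when they exist.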

In this example, we consider the following linear regression problem, where $\Phi$ defines the features and $\epsilon$ is Gaussian noise distributed as $\mathcal{N}(0,\beta I)$:

\begin{align*}
y = \Phi^T(x)\theta + \epsilon
\end{align*}

In addition, we choose the prior on $\theta$ to be a Gaussian $\mathcal{N}(0,\alpha I)$.

The joint distribution can now easily be expressed:

\begin{align*}
p(y, \theta) = \mathcal{N}(y|\Phi^T \theta, \beta I) \mathcal{N}(\theta|0, \alpha I)
\end{align*}
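The joint distribution can be read as a generative recipe: draw $\theta$ from the prior, then draw $y$ given $\theta$. The sketch below illustrates this under some assumptions that are not in the notes: polynomial features, a `Phi` array stored with one column per observation (shape $M \times N$, so that $\Phi^T\theta$ is an $N$-vector), and hypothetical helper names.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return Phi with shape (degree + 1, N): column n holds [1, x_n, ..., x_n**degree]."""
    return np.vander(x, degree + 1, increasing=True).T

def sample_joint(Phi, alpha, beta, rng):
    """Draw (theta, y) from p(y, theta) = N(y | Phi^T theta, beta I) N(theta | 0, alpha I)."""
    M, N = Phi.shape
    # theta ~ N(0, alpha I): alpha is a variance, so the standard deviation is sqrt(alpha).
    theta = rng.normal(0.0, np.sqrt(alpha), size=M)
    # y | theta ~ N(Phi^T theta, beta I).
    y = Phi.T @ theta + rng.normal(0.0, np.sqrt(beta), size=N)
    return theta, y

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
Phi = polynomial_features(x, degree=3)
theta, y = sample_joint(Phi, alpha=1.0, beta=0.01, rng=rng)
print(Phi.shape, theta.shape, y.shape)
```

Data generated this way is convenient for checking later steps, since the values of $\alpha$ and $\beta$ that produced it are known.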

The idea is to find the values of $\alpha$ and $\beta$ that maximise the marginal likelihood. A common way is to derive the negative log marginal likelihood and minimise it by gradient descent.

As noted above, since we have chosen a conjugate prior, the marginal likelihood has a closed-form solution. Using the fact that the product of two Gaussians is an unnormalised Gaussian, and computing both the mean and the covariance of the resulting Gaussian, it can be shown that:

\begin{align*}
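p(y) & = \int \mathcal{N}(y|\Phi^T \theta, \beta I) \mathcal{N}(\theta|0, \alpha I)\,d\theta\\
& =\frac{1}{(2\pi)^{\frac{N}{2}}|\alpha \Phi^T \Phi + \beta I|^{\frac{1}{2}}}\exp \left(-\frac{1}{2}y^T(\alpha \Phi^T \Phi + \beta I)^{-1} y \right)
\end{align*}

Here $\alpha \Phi^T \Phi + \beta I$ is the $N \times N$ covariance of $y$, which follows from $y = \Phi^T\theta + \epsilon$ with $\theta \sim \mathcal{N}(0, \alpha I)$ and $\epsilon \sim \mathcal{N}(0, \beta I)$. Taking the negative log:

\begin{align*}
-\log p(y) =\frac{N}{2} \log 2\pi + \frac{1}{2} \log|\alpha \Phi^T \Phi + \beta I| + \frac{1}{2}y^T(\alpha \Phi^T \Phi + \beta I)^{-1} y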
\end{align*}
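The negative log marginal likelihood can be evaluated numerically along the following lines. This is a sketch rather than a reference implementation: `G` stands for the $N \times N$ feature Gram matrix that multiplies $\alpha$ in the covariance, built here as `Phi.T @ Phi` on the assumption that `Phi` is stored with one column per observation. The toy data and the function name are hypothetical. A Cholesky factorisation is used in place of an explicit inverse and determinant, which is a common choice for numerical stability.

```python
import numpy as np

def neg_log_marginal_likelihood(alpha, beta, G, y):
    """-log p(y) for y ~ N(0, alpha * G + beta * I), with G the N x N feature Gram matrix."""
    N = y.shape[0]
    K = alpha * G + beta * np.eye(N)
    L = np.linalg.cholesky(K)                     # K = L L^T, valid while K is positive definite
    z = np.linalg.solve(L, y)                     # z = L^{-1} y, so z @ z = y^T K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))    # log|K| from the Cholesky diagonal
    return 0.5 * N * np.log(2 * np.pi) + 0.5 * log_det + 0.5 * z @ z

# Hypothetical toy data: quadratic features, one column of Phi per observation.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
Phi = np.vander(x, 3, increasing=True).T          # shape (M, N) = (3, 10)
y = Phi.T @ rng.normal(size=3) + rng.normal(0.0, 0.1, size=10)
G = Phi.T @ Phi                                   # N x N Gram matrix

nlml = neg_log_marginal_likelihood(alpha=1.0, beta=0.01, G=G, y=y)
print(nlml)
```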

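Write $K$ for the $N \times N$ covariance matrix of the marginal likelihood above and $G = \partial K / \partial \alpha$ for the feature Gram matrix that multiplies $\alpha$ in it, so that $\partial K / \partial \beta = I$. Using the identities $\partial \log|K| = \mathrm{tr}(K^{-1}\partial K)$ and $\partial K^{-1} = -K^{-1}(\partial K)K^{-1}$, the partial derivatives of $-\log p(y)$ with respect to $\alpha$ and $\beta$ are:

\begin{align*}
\frac{\partial \left(-\log p(y)\right)}{\partial \alpha} &= \frac{1}{2}\mathrm{tr}\left(K^{-1}G\right) - \frac{1}{2}y^T K^{-1} G K^{-1} y \\
\frac{\partial \left(-\log p(y)\right)}{\partial \beta} &= \frac{1}{2}\mathrm{tr}\left(K^{-1}\right) - \frac{1}{2}y^T K^{-1} K^{-1} y
\end{align*}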
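The gradient-based minimisation of the negative log marginal likelihood could be sketched as follows. The gradient uses the standard identities $\partial \log|K| = \mathrm{tr}(K^{-1}\partial K)$ and $\partial K^{-1} = -K^{-1}(\partial K)K^{-1}$, where $K$ is the covariance of the marginal likelihood and `G` is the $N \times N$ Gram matrix multiplying $\alpha$ in it. Several choices here are assumptions rather than part of the notes: the synthetic data, the `Phi` layout (one column per observation), the step size and iteration count, and the decision to descend on $(\log\alpha, \log\beta)$ so that both parameters stay positive.

```python
import numpy as np

def nlml_and_grad(alpha, beta, G, y):
    """Return -log p(y) and its gradient with respect to (alpha, beta)."""
    N = y.shape[0]
    K = alpha * G + beta * np.eye(N)
    K_inv = np.linalg.inv(K)                  # explicit inverse is acceptable for small N
    a = K_inv @ y                             # a = K^{-1} y
    _, log_det = np.linalg.slogdet(K)
    value = 0.5 * N * np.log(2 * np.pi) + 0.5 * log_det + 0.5 * y @ a
    d_alpha = 0.5 * np.trace(K_inv @ G) - 0.5 * a @ G @ a
    d_beta = 0.5 * np.trace(K_inv) - 0.5 * a @ a
    return value, np.array([d_alpha, d_beta])

# Synthetic data (hypothetical): cubic polynomial features, one column per observation.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
Phi = np.vander(x, 4, increasing=True).T      # shape (M, N) = (4, 20)
y = Phi.T @ rng.normal(size=4) + rng.normal(0.0, 0.1, size=20)
G = Phi.T @ Phi

# Central finite differences as a sanity check on the analytic gradient.
eps = 1e-6
_, grad = nlml_and_grad(1.0, 0.5, G, y)
fd_alpha = (nlml_and_grad(1.0 + eps, 0.5, G, y)[0]
            - nlml_and_grad(1.0 - eps, 0.5, G, y)[0]) / (2 * eps)
fd_beta = (nlml_and_grad(1.0, 0.5 + eps, G, y)[0]
           - nlml_and_grad(1.0, 0.5 - eps, G, y)[0]) / (2 * eps)
print("analytic:", grad, "finite differences:", fd_alpha, fd_beta)

# Plain gradient descent on (log alpha, log beta).
# Chain rule: d f / d log(alpha) = alpha * d f / d alpha, and likewise for beta.
log_params = np.zeros(2)                      # start at alpha = beta = 1
start_value, _ = nlml_and_grad(1.0, 1.0, G, y)
for _ in range(500):
    alpha, beta = np.exp(log_params)
    _, g = nlml_and_grad(alpha, beta, G, y)
    log_params -= 0.01 * g * np.array([alpha, beta])

alpha, beta = np.exp(log_params)
final_value, _ = nlml_and_grad(alpha, beta, G, y)
print("alpha:", alpha, "beta:", beta, "objective:", start_value, "->", final_value)
```

A fixed step size is the simplest option but offers no convergence guarantee; in practice a line search or an off-the-shelf optimiser would likely be more robust.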