*Valentina Alto has a great blog post on OLS linear regression! Read it here: https://towardsdatascience.com/understanding-the-ols-method-for-simple-linear-regression-e0a4e8f692cc*

## Ordinary Least Squares Linear Regression
Linear regression is a simple approach that models the relationship between input variables $X$ and a single output variable $Y$ as a linear function.

This can be implemented in a number of ways, most commonly through ordinary least squares (OLS).

The linear regression problem can be formulated as such:

$Y = \alpha + \Sigma_{j=1}^{p} \beta_{j}X_{j} + \epsilon$

where $\epsilon$ is the error term, $\alpha$ is the value of the dependent variable $Y$ when all the independent variables are $0$, and $\beta_{j}$ is the weight applied to the independent variable $X_{j}$.
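
As a minimal sketch of the formula above, the deterministic part of the model (everything except $\epsilon$) can be written as a small function. The name `predict` and the example coefficients are hypothetical, chosen only for illustration.

```python
def predict(alpha, betas, xs):
    """Return alpha + sum_j betas[j] * xs[j] for one observation (no error term)."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical example with p = 2 independent variables.
alpha = 1.0
betas = [2.0, -0.5]

print(predict(alpha, betas, [3.0, 4.0]))  # 1 + 2*3 - 0.5*4 = 5.0
print(predict(alpha, betas, [0.0, 0.0]))  # all inputs are 0, so the result is alpha = 1.0
```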

For now, let's consider the simple case where there is only one independent variable:

$Y = \alpha + \beta X + \epsilon$

or, for a single data point:

$y_{i} = \alpha + \beta x_{i} + \epsilon_{i}$

The goal here is to select values for $\alpha$ and $\beta$ that minimise the error term $\epsilon$. We can start by rearranging the formula:

$\epsilon_{i} = y_{i} - \alpha - \beta x_{i}$

Thus, summing over all $n$ data points:

$\Sigma_{i=1}^{n}\epsilon_{i} = \Sigma_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})$

As each error can be positive or negative, errors of opposite sign can cancel in the sum above. We therefore square each $\epsilon_{i}$ so that every error contributes positively:

$\Sigma_{i=1}^{n}\epsilon_{i}^{2} = \Sigma_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2}$
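
To see why squaring matters, here is a small sketch on made-up data. The function names and the data points are assumptions for illustration. A clearly poor line can have raw errors that sum to zero, while its sum of squared errors is still large.

```python
def residuals(alpha, beta, xs, ys):
    """Return the errors e_i = y_i - alpha - beta * x_i for each data point."""
    return [y - alpha - beta * x for x, y in zip(xs, ys)]


def sum_squared_errors(alpha, beta, xs, ys):
    """Return the sum of squared errors for the line y = alpha + beta * x."""
    return sum(e ** 2 for e in residuals(alpha, beta, xs, ys))


# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

# A flat line (alpha = 3, beta = 0) has errors -2, 0, 2, which cancel out.
print(sum(residuals(3.0, 0.0, xs, ys)))      # 0.0
print(sum_squared_errors(3.0, 0.0, xs, ys))  # 8.0

# The true line has zero squared error.
print(sum_squared_errors(1.0, 2.0, xs, ys))  # 0.0
```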

Now we can set about minimising the sum of squared errors by choosing good values for $\alpha$ and $\beta$. This is achieved by calculating the partial derivatives of the function with respect to each parameter.

Let $L(\alpha, \beta) = \Sigma_{i=1}^{n}\epsilon_{i}^{2} = \Sigma_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2}$

We set each partial derivative to $0$, as this identifies the stationary point of $L$. Because $L$ is a convex quadratic in $\alpha$ and $\beta$, that stationary point is the minimum.


$L(\alpha, \beta) = \Sigma_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2}$

$L(\alpha, \beta) = \Sigma_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})(y_{i} - \alpha - \beta x_{i})$

$L(\alpha, \beta) = \Sigma_{i=1}^{n} (y_{i}^{2} + \alpha^{2} + (\beta x_{i})^{2} - 2y_{i}\alpha - 2y_{i}\beta x_{i} + 2\alpha\beta x_{i})$
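
As a quick sanity check on the expansion, the sketch below compares the squared form with the expanded form for a single data point, where the cross term is $2\alpha\beta x_{i}$. The function names and the numbers are arbitrary, chosen only for illustration.

```python
def squared_error(y, alpha, beta, x):
    """Squared error for one point, written as (y - alpha - beta * x)^2."""
    return (y - alpha - beta * x) ** 2


def expanded_error(y, alpha, beta, x):
    """The same quantity with the square multiplied out term by term."""
    return (y ** 2 + alpha ** 2 + (beta * x) ** 2
            - 2 * y * alpha - 2 * y * beta * x + 2 * alpha * beta * x)


# Arbitrary values: (5 - 1 - 2*3)^2 = (-2)^2 = 4
y, alpha, beta, x = 5.0, 1.0, 2.0, 3.0
print(squared_error(y, alpha, beta, x))   # 4.0
print(expanded_error(y, alpha, beta, x))  # 4.0
```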


Solve for $\beta$: 

$\frac{\partial L(\alpha, \beta)}{\partial \beta} = 0$

$\Sigma_{i=1}^{n} (2\alpha x_{i} - 2y_{i}x_{i} + 2\beta x_{i}^{2}) = 0$

$\Sigma_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})x_{i} = 0$

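Substituting the expression for $\alpha$, which follows from setting the partial derivative with respect to $\alpha$ to $0$ (derived in the next part):

$\Sigma_{i=1}^{n}\alpha = \Sigma_{i=1}^{n}(y_{i}-\beta x_{i})$

$\alpha = \frac{\Sigma_{i=1}^{n}(y_{i}-\beta x_{i})}{n}$

$\therefore \alpha = \bar{y} - \beta \bar{x}$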

$\Sigma_{i=1}^{n}(y_{i} - (\bar{y} - \beta \bar{x}) - \beta x_{i})x_{i} = 0$

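$\Sigma_{i=1}^{n}(y_{i} - \bar{y})x_{i} - \beta \Sigma_{i=1}^{n}(x_{i} - \bar{x})x_{i} = 0$

$\beta \Sigma_{i=1}^{n}(x_{i} - \bar{x})x_{i} = \Sigma_{i=1}^{n}(y_{i} - \bar{y})x_{i}$

$\beta = \frac{\Sigma_{i=1}^{n}(y_{i} - \bar{y})x_{i}}{\Sigma_{i=1}^{n}(x_{i} - \bar{x})x_{i}}$

Since $\Sigma_{i=1}^{n}(y_{i} - \bar{y}) = 0$ and $\Sigma_{i=1}^{n}(x_{i} - \bar{x}) = 0$, we can subtract $\bar{x}$ inside each sum without changing its value, which gives the more familiar form:

$\beta = \frac{\Sigma_{i=1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y})}{\Sigma_{i=1}^{n}(x_{i} - \bar{x})^{2}}$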
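
A minimal sketch of the slope computation, using the standard closed form $\beta = \frac{\Sigma_{i=1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y})}{\Sigma_{i=1}^{n}(x_{i} - \bar{x})^{2}}$ that solves the condition above. The function name `ols_slope` and the data are assumptions for illustration; plain Python lists are used so the arithmetic stays visible.

```python
def ols_slope(xs, ys):
    """Return the OLS slope for simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    return numerator / denominator


# Made-up data lying exactly on y = 1 + 2x, so the slope should be 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
print(ols_slope(xs, ys))  # 2.0
```

The explicit check on the denominator covers the one case where the formula breaks down: if every $x_{i}$ is the same, no unique slope exists.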


Solve for $\alpha$:

$\frac{\partial L(\alpha, \beta)}{\partial \alpha} = 0$

$\Sigma_{i=1}^{n} (2\alpha - 2y_{i} + 2\beta x_{i}) = 0$

$\Sigma_{i=1}^{n} (y_{i} -\alpha - \beta x_{i}) = 0$

$\Sigma_{i=1}^{n} \alpha = \Sigma_{i=1}^{n} (y_{i} - \beta x_{i})$

Since $\Sigma_{i=1}^{n} \alpha = n\alpha$:

$\alpha = \frac{\Sigma_{i=1}^{n} (y_{i} - \beta x_{i})}{n}$

$\alpha = \bar{y} - \beta \bar{x}$
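
The condition used in this derivation says that the errors must sum to zero. The sketch below illustrates that choosing $\alpha = \bar{y} - \beta \bar{x}$ achieves this for any fixed slope, not only the optimal one. The function name `ols_intercept`, the data, and the trial slopes are hypothetical.

```python
def ols_intercept(xs, ys, beta):
    """Return alpha = y_bar - beta * x_bar for a given slope beta."""
    n = len(xs)
    return sum(ys) / n - beta * (sum(xs) / n)


# Made-up, slightly noisy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

for beta in (0.0, 1.9, 3.0):  # arbitrary trial slopes
    alpha = ols_intercept(xs, ys, beta)
    error_sum = sum(y - alpha - beta * x for x, y in zip(xs, ys))
    # The errors sum to zero up to floating-point rounding.
    print(beta, abs(error_sum) < 1e-9)
```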

With $\alpha$ and $\beta$ found, we now have our regression model by substituting these values back into the equation above:

$y = \alpha + \beta x$


In [13]:
import jax 
import matplotlib.pyplot as plt
from sklearn import datasets


In [14]:
"""
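Get the dataset and split it into train and test sets.
"""

# Load the features (442 samples x 10 columns) and the target values.
diabetes_X, diabetes_Y = datasets.load_diabetes(return_X_y=True)

# Keep only the feature at column index 2 (BMI) as the single independent variable.
# Indexing with None keeps the array two-dimensional, with shape (n_samples, 1).
diabetes_X = diabetes_X[:, None, 2]

# Hold out the last 20 samples for testing and train on the rest.
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

diabetes_Y_train = diabetes_Y[:-20]
diabetes_Y_test = diabetes_Y[-20:]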
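
One way the closed-form solution derived earlier could be applied to a train/test split like this is sketched below. The helper names `fit_simple_ols` and `mean_squared_error` are hypothetical (this is not scikit-learn's function of the same name), and the block uses a tiny made-up split so that it runs on its own.

```python
def fit_simple_ols(xs, ys):
    """Return (alpha, beta) minimising the sum of squared errors of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    return alpha, beta


def mean_squared_error(alpha, beta, xs, ys):
    """Return the average squared prediction error of y = alpha + beta * x."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up split: the training points lie exactly on y = 1 + 2x.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
x_test, y_test = [5.0, 6.0], [11.0, 14.0]

alpha, beta = fit_simple_ols(x_train, y_train)
print(alpha, beta)                                      # 1.0 2.0
print(mean_squared_error(alpha, beta, x_test, y_test))  # 0.5
```

With the notebook's split, the same functions could be called as `fit_simple_ols(diabetes_X_train[:, 0], diabetes_Y_train)`, where `[:, 0]` flattens the single-column feature array to one dimension, and the fit could then be scored with `mean_squared_error(alpha, beta, diabetes_X_test[:, 0], diabetes_Y_test)`.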

