# Linear Models

## Regression
When applying regression, we have a dataset $\textbf{X}$ such that

$$\textbf{X} = \left\{\textbf{x}^{(1)}, \ldots, \textbf{x}^{(n)}\right\}, \textrm{where}\; \textbf{x}^{(i)}\in\mathbb{R}^d$$ 

This dataset has corresponding labels 

$$\textbf{y}=\left\{y^{(1)},\ldots,y^{(n)}\right\}, \textrm{where}\; y^{(i)}\in\mathbb{R}$$ 

The goal is to find a hypothesis $h$ in the set of all hypotheses $H$, such that

$$ h: \mathbb{R}^d \to \mathbb{R}, \textrm{where}\; h \in H $$

This hypothesis $h$ can be written as:

$$ h(\textbf{x}) = \sum_{j=1}^d \theta_j x_j + \theta_0 x_0 \overset{*}{=} \theta^T\textbf{x}, \textrm{where}\; x_0 = 1 $$

Note that at $\overset{*}{=}$ the constant $x_0 = 1$ is added to the vector $\textbf{x}$, so the intercept $\theta_0$ becomes part of the inner product $\theta^T\textbf{x}$.
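The step of putting $x_0 = 1$ into $\textbf{x}$ can be illustrated with a small NumPy sketch. The helper names `add_intercept_column` and `hypothesis` and the example numbers are assumptions made for illustration; they are not part of the exercise code.

```python
import numpy as np


def add_intercept_column(X):
    """Prepend a column of ones (x_0 = 1) to an (n, d) data matrix."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])


def hypothesis(X, theta):
    """Evaluate h(x) = theta^T x for every row of X.

    theta has d + 1 entries: theta[0] is the intercept theta_0.
    """
    return add_intercept_column(X) @ theta


# Two samples with d = 2 features each (made-up numbers).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
theta = np.array([0.5, 1.0, -1.0])  # [theta_0, theta_1, theta_2]

predictions = hypothesis(X, theta)
print(predictions)  # one prediction per sample
```

With the column of ones in place, the intercept needs no special treatment: it is simply the first entry of $\theta$.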

Now we would like to know how far our predictions are from the true values. To do so, we need to introduce some metrics.

## Some metrics
As a first exercise, we will implement the **sum of squared errors** using simple matrix operations. Do this in `./code/utils.py`.

These exercises are based on the [scikit-learn User Guide](https://scikit-learn.org/stable/modules/linear_model.html) and more specifically [this example](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py).

The following cell checks the implementation on two small label vectors:

In [1]:
import numpy as np
import sys
sys.path.append('code/')

import utils

y1 = np.asarray([1., 1.2, 3.1, 3.7])[:,np.newaxis]
y2 = np.asarray([0.8, 1.5, 2.6, 3.1])[:,np.newaxis]

sse = utils.sum_squared_errors(y1, y2)
print(sse)

0.7400000000000001


The **sum of squared errors** (SSE) can be easily extended to the **mean squared error** (MSE) and the **root mean squared error** (RMSE). Do this in `./code/utils.py`.

In [2]:
mse = utils.mean_squared_error(y1, y2)
rmse = utils.root_mean_squared_error(y1, y2)
print(mse)
print(rmse)

0.18500000000000003
0.4301162633521314
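One possible way to write the three metrics is sketched below. The function names are taken from the calls in the cells above; the bodies are an assumption about what `./code/utils.py` could contain, not the reference solution. Both inputs are flattened first, so that a column vector of shape `(n, 1)` and a flat array of shape `(n,)` are not broadcast against each other by accident.

```python
import numpy as np


def sum_squared_errors(y_true, y_pred):
    """SSE = (y_true - y_pred)^T (y_true - y_pred)."""
    # Flatten both inputs so (n, 1) and (n,) arrays can be mixed safely.
    residuals = np.ravel(y_true).astype(float) - np.ravel(y_pred).astype(float)
    return float(residuals @ residuals)


def mean_squared_error(y_true, y_pred):
    """MSE = SSE / n."""
    return sum_squared_errors(y_true, y_pred) / np.size(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE = sqrt(MSE)."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


y1 = np.asarray([1., 1.2, 3.1, 3.7])[:, np.newaxis]
y2 = np.asarray([0.8, 1.5, 2.6, 3.1])[:, np.newaxis]

sse = sum_squared_errors(y1, y2)
mse = mean_squared_error(y1, y2)
rmse = root_mean_squared_error(y1, y2)
print(sse, mse, rmse)
```

Up to floating-point rounding, the results should agree with the values printed above (about 0.74, 0.185 and 0.43).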


## Fitting Linear Regression
To fit a linear regression model, one option is to minimise the sum of squared errors:

$$ SSE(\theta)=\sum_{i=1}^n\left(y^{(i)}-\theta^T\textbf{x}^{(i)}\right)^2 $$

A general solution can be written as:

$$ \theta = \left(\textbf{X}^T \textbf{X}\right)^{-1} \textbf{X}^T \textbf{y} $$

Try to find this general solution yourself by taking the derivative of $SSE(\theta)$ with respect to $\theta$ and applying the product rule. The first steps are given:

\begin{align}
\frac{\partial SSE(\theta)}{\partial \theta} 
    &= \frac{\partial}{\partial \theta} \sum_{i=1}^n \left(y^{(i)} - \theta^T\textbf{x}^{(i)}\right)^2 \\
    &\overset{\square}= \frac{\partial}{\partial \theta} \left(\textbf{y} - \textbf{X}\theta\right)^T \left(\textbf{y} - \textbf{X}\theta\right) \\
    &\overset{*}= -2 \textbf{X}^T (\textbf{y} - \textbf{X}\theta)
\end{align}

Here the definition of the inner product is used at $\square$ and the product rule at $*$.

Now set the derivative to zero and solve for $\theta$.
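For reference, the remaining steps can be sketched as follows. This assumes that $\textbf{X}$ is the matrix whose rows are the samples $\textbf{x}^{(i)}$ (including $x_0 = 1$) and that $\textbf{X}^T\textbf{X}$ is invertible. The constant factor in front of the derivative drops out once the expression is set to zero:

\begin{align}
\textbf{X}^T \left(\textbf{y} - \textbf{X}\theta\right) &= 0 \\
\textbf{X}^T \textbf{X}\theta &= \textbf{X}^T \textbf{y} \\
\theta &= \left(\textbf{X}^T \textbf{X}\right)^{-1} \textbf{X}^T \textbf{y}
\end{align}

The last line is the general solution stated above.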

Now we can use the general solution to fit a linear regression model to some data. For this exercise, we will use the diabetes dataset that comes with *scikit-learn*.

In [3]:
from sklearn import datasets, linear_model, model_selection

# load the dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# And inspect the shapes
print(diabetes_X.shape)
print(diabetes_y.shape)

# Split in train and test data (why?)
X_train, X_test, y_train, y_test = model_selection.train_test_split(diabetes_X, diabetes_y)

(442, 10)
(442,)


This dataset has 442 samples, each with 10 features. Thus, we need to fit a linear model with 11 parameters: one weight for each of the 10 features, plus the intercept. The next exercise is to fit a least-squares linear model to the diabetes data.

In [4]:
import sys
sys.path.append('code/')
import linear_models
import utils

# Fit a linear model using least squared error to the training data
theta = linear_models.lse_fit(X_train, y_train)

# Predict y
y_predicted = linear_models.lse_predict(X_test, theta)

# Get performance by evaluating y_predicted with respect to y_test
mse = utils.mean_squared_error(y_test, y_predicted)
print(mse)


3149.4411216258404
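A minimal sketch of what `lse_fit` and `lse_predict` might look like is given below. The two function names come from the cell above; their bodies, the helper `add_intercept_column` and the synthetic data are assumptions, not the contents of `./code/linear_models.py`. The sketch solves the normal equations $\textbf{X}^T\textbf{X}\theta = \textbf{X}^T\textbf{y}$ with `np.linalg.solve` rather than forming the inverse explicitly, and it assumes $\textbf{X}^T\textbf{X}$ is invertible.

```python
import numpy as np


def add_intercept_column(X):
    """Prepend a column of ones (x_0 = 1) to an (n, d) data matrix."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def lse_fit(X, y):
    """Return theta (length d + 1) that minimises the sum of squared errors."""
    X_design = add_intercept_column(X)
    # Solve X^T X theta = X^T y; assumes X^T X is invertible.
    return np.linalg.solve(X_design.T @ X_design, X_design.T @ y)


def lse_predict(X, theta):
    """Predict y = X theta, with the intercept stored in theta[0]."""
    return add_intercept_column(X) @ theta


# Synthetic, noise-free data (made up) so the true parameters are recoverable.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_theta = np.array([2.0, 1.0, -3.0, 0.5])  # [intercept, w1, w2, w3]
y = true_theta[0] + X @ true_theta[1:]

theta = lse_fit(X, y)
y_predicted = lse_predict(X, theta)
print(theta)
```

If $\textbf{X}^T\textbf{X}$ is singular or badly conditioned, `np.linalg.lstsq` is a more robust alternative that minimises the same objective.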


## Ideas for additional topics
### polynomial linear fit