# Linear Regression

Linear regression finds the linear relationship between variables. In ML, this is the relationship between features (the input variables) and a label (the output variable).

The general formula is:

$y' = w_0 + w_1 x_1 + \dots + w_n x_n$

Where

- $y'$: the predicted label (the output we want to know).
- $w_0$ (also written $b$): the bias of the model, calculated during training.
- $w_i$: the weight of feature $x_i$, calculated during training.
- $x_i$: a feature (an input).
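To make the formula concrete, here is a minimal sketch of a prediction for one example. The function name `predict` and all numeric values are hypothetical, chosen only for illustration.

```python
def predict(x, weights, bias):
    """Return y' = bias + sum of weights[i] * x[i] for a single example."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical values, for illustration only.
bias = 0.5             # w_0
weights = [2.0, -1.0]  # w_1, w_2
x = [3.0, 4.0]         # x_1, x_2

# 0.5 + 2.0 * 3.0 + (-1.0) * 4.0 = 2.5
y_pred = predict(x, weights, bias)
print(y_pred)
```

In a trained model the weights and bias would come from training rather than being set by hand.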

```python
import plotly.express as px
import numpy as np
import pandas as pd

# Create dummy data
x = np.linspace(0, 10, 100)
y = np.sin(x)

df = pd.DataFrame({'x': x, 'y': y})

# Create a line plot
fig = px.line(df, x='x', y='y', title='Sine Wave with Plotly')

fig.show()
```

## Loss

Loss is a measure of how wrong a model is. It quantifies the difference between the model's predictions and the actual values, and training aims to minimise it.

The most common ways to measure loss are:

| Loss type | Definition | Equation |
|-----------|------------|----------|
| $L_1$ loss | The sum of the absolute values of the differences between the predicted values and the actual values. | $L_1 = \sum_{i=1}^{N} \left\| y_i - \hat{y}_i \right\|$ |
| Mean absolute error (MAE) | The average $L_1$ loss across a set of $N$ examples. | $\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left\| y_i - \hat{y}_i \right\|$ |
| $L_2$ loss | The sum of the squared differences between the predicted values and the actual values. | $L_2 = \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$ |
| Mean squared error (MSE) | The average $L_2$ loss across a set of $N$ examples. | $\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$ |
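The four definitions in the table can be sketched directly in plain Python. The function names and the sample values below are assumptions made for illustration, not part of any library.

```python
def l1_loss(y_true, y_pred):
    """Sum of absolute differences."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred))

def mae(y_true, y_pred):
    """L1 loss averaged over the N examples."""
    return l1_loss(y_true, y_pred) / len(y_true)

def l2_loss(y_true, y_pred):
    """Sum of squared differences."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

def mse(y_true, y_pred):
    """L2 loss averaged over the N examples."""
    return l2_loss(y_true, y_pred) / len(y_true)

# Made-up actual values and predictions; the errors are 1, 0, -1, -3.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

print(l1_loss(y_true, y_pred))  # 1 + 0 + 1 + 3 = 5.0
print(mae(y_true, y_pred))      # 5 / 4 = 1.25
print(l2_loss(y_true, y_pred))  # 1 + 0 + 1 + 9 = 11.0
print(mse(y_true, y_pred))      # 11 / 4 = 2.75
```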

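To choose between these loss functions, we need to consider how we want to treat outliers. The squared losses ($L_2$ and MSE) are affected more severely by outliers, because squaring a large error makes it disproportionately larger, so the model is pulled further towards the outliers. The absolute-value losses ($L_1$ and MAE) are less sensitive to them.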
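As a small worked example (the error values are made up for illustration), take four examples with absolute errors $1, 1, 1, 10$, where the last one is an outlier:

$\text{MAE} = \frac{1 + 1 + 1 + 10}{4} = \frac{13}{4} = 3.25$

$\text{MSE} = \frac{1^2 + 1^2 + 1^2 + 10^2}{4} = \frac{103}{4} = 25.75$

If the fourth error were also $1$, both MAE and MSE would equal $1$. The single outlier therefore raises MAE by a factor of about 3 but MSE by a factor of about 26.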

# TODO: Exercise for minimising MSE.


# Gradient Descent

Gradient descent is a mathematical technique that iteratively finds the weights and bias that produce the model with the lowest loss.

The model begins training with randomized weights and biases near zero, and then repeats the following steps:

1. Calculate the loss with the current weights and bias.
2. Determine the direction in which to move the weights and bias to reduce the loss.
3. Move the weight and bias values a small amount in the direction that reduces the loss.
4. Return to step one and repeat the process until the model can't reduce the loss any further.
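The loop above can be sketched in plain Python for a model with one weight and one bias, using MSE as the loss. This is a minimal sketch under several assumptions that are not in the text: the parameters start at exactly zero rather than at random values near zero (to keep the run repeatable), and the names `learning_rate`, `max_steps`, and `tol`, along with their values, are hypothetical choices for the step size and the stopping rule.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y' = w * x + b on the data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b, xs, ys):
    """Partial derivatives of MSE with respect to w and b."""
    n = len(xs)
    dw = -2.0 / n * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
    db = -2.0 / n * sum(y - (w * x + b) for x, y in zip(xs, ys))
    return dw, db

def gradient_descent(xs, ys, learning_rate=0.05, max_steps=10_000, tol=1e-12):
    w, b = 0.0, 0.0                       # assumption: start at zero
    loss = mse(w, b, xs, ys)              # step 1: loss at current values
    for _ in range(max_steps):
        dw, db = gradients(w, b, xs, ys)  # step 2: direction of change
        w -= learning_rate * dw           # step 3: small move that
        b -= learning_rate * db           #         reduces the loss
        new_loss = mse(w, b, xs, ys)
        if loss - new_loss < tol:         # step 4: stop when no real gain
            break
        loss = new_loss
    return w, b, new_loss

# Made-up data that lies exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, final_loss = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

On this data the weight and bias should approach 2 and 1. A step size that is too large can make the loss grow instead of shrink, so `learning_rate` usually needs tuning for each dataset.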

As a concrete example, for a model with a single weight and a bias:

- Set the bias and weight to 0.
- Calculate MSE loss with these values.
- Calculate the slope of the tangent to the loss function with respect to the weight and with respect to the bias. The slope tells us in which _direction_ the loss changes, and _how steeply_, as each parameter changes.
  - A positive slope means the loss increases if you increase the parameter.
  - A negative slope means the loss decreases if you increase the parameter.
- Move each parameter a small amount in the direction that reduces the loss (that is, opposite to the sign of its slope) to get the next weight and bias.
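A single pass through these steps can be traced by hand. The sketch below uses made-up data on the line $y = 2x$ and a hypothetical step size named `learning_rate`; neither comes from the text.

```python
# Made-up data on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
n = len(xs)

# Set the bias and weight to 0.
w, b = 0.0, 0.0
learning_rate = 0.1  # hypothetical step size

# Calculate MSE loss with these values.
errors = [y - (w * x + b) for x, y in zip(xs, ys)]  # [2.0, 4.0, 6.0]
loss = sum(e ** 2 for e in errors) / n              # 56 / 3, about 18.67

# Slopes of the MSE with respect to the weight and the bias.
slope_w = -2.0 / n * sum(x * e for x, e in zip(xs, errors))  # -56 / 3
slope_b = -2.0 / n * sum(errors)                             # -8.0

# Both slopes are negative, so increasing w and b decreases the loss.
new_w = w - learning_rate * slope_w  # about 1.867
new_b = b - learning_rate * slope_b  # 0.8

new_loss = sum((y - (new_w * x + new_b)) ** 2 for x, y in zip(xs, ys)) / n
print(loss, new_loss)  # the loss drops after one step
```

Subtracting `learning_rate * slope` moves a parameter up when its slope is negative and down when its slope is positive, which is the direction that reduces the loss in both cases.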

# Things to think about

- Multi-dimensional linear regression