
# Linear Regression 

Linear regression is a parametric supervised learning model that fits to the training data the line (or, in higher dimensions, the hyperplane) whose sum of squared residual errors is smallest.

This ML model can be used for:
+ Classification: only when the data is linearly separable
+ Regression: any kind of data, although it is not always the best choice

## Regression

For regression problems, the input vectors are real-valued and the outputs are real values too, so the model is a mapping $R^d \rightarrow R$.

We have a training dataset $D = \{(x_1, y_1), ..., (x_N, y_N)\}$ of $N$ samples. Each input vector $x_i$ has dimension (number of features) $d$: $x_i = (x_{i1}, ..., x_{id})$.

The linear regression model has parameters that can be adjusted to best fit the line, plane or hyperplane to the data points. These parameters are most commonly known as **__weights__**.

The model has as many weights as there are features, plus a bias term $b$. Given the _weights_ $\alpha_j$ and a _feature vector_ with components $x_j$, the predicted output is:

$$ y = \sum_j{\alpha_j x_j} + b$$

For the following examples, let's suppose we want to know what the value of y is given that our feature vector lies in a 1D space: $x_i \in R^1$.

$$y = \alpha_1 x_1 + b$$

This equation reminds you of something, right? Basically, $\alpha_1$ is the slope of the line and $b$ is its intercept with the y-axis.
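The prediction formula can be tried out on a few numbers. The following is a minimal sketch: the weights, bias and feature values are made up purely for illustration and do not come from the notebook.

```python
import torch

# Hypothetical values, chosen only to make the arithmetic easy to follow
alpha = torch.tensor([2.0, -1.0, 0.5])   # one weight per feature (d = 3)
b = torch.tensor(1.0)                    # a single bias term
x = torch.tensor([1.0, 3.0, 4.0])        # one feature vector

# y = sum_j alpha_j * x_j + b
y = torch.dot(alpha, x) + b
print(y.item())  # 2*1 - 1*3 + 0.5*4 + 1 = 2.0

# The same formula applied to a batch of feature vectors, shape (N, d)
X = torch.tensor([[1.0, 3.0, 4.0],
                  [0.0, 0.0, 0.0]])
Y = X @ alpha + b
print(Y)  # tensor([2., 1.])
```

With an all-zero feature vector the prediction reduces to the bias, which is why $b$ plays the role of the intercept.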

It's worth mentioning that the __initial parameters__ are often set to zero; the code cells below start from random values instead.

## Learning from inputs

To make the linear regression model learn from the dataset $D$ of $N$ samples, we need:

1. A loss function
2. An error function
3. To minimize the loss with respect to the parameters $\alpha$ and $b$
4. To update the parameters

This procedure must be run inside a loop of _e_ iterations (each iteration is also called an **_epoch_**). This way, the parameters are updated step by step until the hyperplane with the least sum of residual errors is fitted.

### A Loss and an Error Function
The loss function is the sum of the error function's values over all samples $(x_i, y_i)$:

$$ L = \sum_i{l_i} $$

$l_i$ is the error function for a single sample. Although there are many error functions, linear regression most commonly uses the MSE (Mean Squared Error), written here with an extra factor of $\frac{1}{2}$ that simplifies the derivative:

$$ L = \sum_i{\frac{1}{2N}(y_i - \hat{y}_i)^2} $$
$$ L = \frac{1}{2N}\sum_i{(y_i - \hat{y}_i)^2} $$
$$ L = \frac{1}{2N}\sum_i{\left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right)^2} $$

The loss function accumulates the per-sample errors, and its value changes with the values of the weights $\alpha_j$ and the bias $b$.
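As a small worked example of the loss, the sketch below evaluates $L = \frac{1}{2N}\sum_i (y_i - \hat{y}_i)^2$ on three made-up targets and predictions. The helper name `half_mse` is hypothetical; the comparison with `torch.nn.functional.mse_loss` only illustrates that PyTorch's built-in MSE omits the factor $\frac{1}{2}$.

```python
import torch
from torch.nn.functional import mse_loss

def half_mse(y, y_hat):
    """L = 1/(2N) * sum_i (y_i - y_hat_i)^2 (hypothetical helper name)."""
    n = y.numel()
    return torch.sum((y - y_hat) ** 2) / (2 * n)

# Made-up targets and predictions
y = torch.tensor([1.0, 2.0, 3.0])
y_hat = torch.tensor([1.0, 1.0, 5.0])

# Residuals are 0, 1 and -2, so the squared residuals sum to 5
loss = half_mse(y, y_hat)
print(loss.item())  # 5 / (2*3), approximately 0.8333

# PyTorch's mse_loss averages the squared residuals without the 1/2 factor
builtin = mse_loss(y_hat, y)
print(builtin.item())  # 5 / 3, approximately 1.6667
```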

### Minimizing the Loss function
Since we want to minimize the loss with respect to the model parameters to find the best hyperplane, let's take the two partial derivatives we need:

#### With respect to alpha
We write $k$ for the index of the weight so that it does not clash with the sample index $i$:

$$ \Delta_{\alpha_k}{L} =  \frac{\partial{L}}{\partial{\alpha_k}}  $$
$$ \Delta_{\alpha_k}{L} =  \frac{1}{2N}\sum_i{\frac{\partial{}}{\partial{\alpha_k}}}\left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right)^2 $$

The derivative of the inner sum with respect to $\alpha_k$ is zero for every term except $j = k$, where it equals the associated coefficient $x_{ik}$. By the chain rule, the square contributes a factor of 2 and the residual contributes a factor of $-x_{ik}$:

$$ \Delta_{\alpha_k}{L} = -\frac{1}{N}\sum_i \left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right) x_{ik} $$

#### With respect to the bias 
$$ \Delta_{b}{L} =  \frac{\partial{L}}{\partial{b}}  $$
$$ \Delta_{b}{L} =  \frac{1}{2N}\sum_i{\frac{\partial{}}{\partial{b}}}\left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right)^2 $$

$$ \Delta_{b}{L} = -\frac{1}{N}\sum_i \left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right) $$

### Updating the parameters

Once we have the gradient of the loss function with respect to the $\alpha$'s and the bias term $b$, we are ready to update the parameters, where $\mu$ is the learning rate:

$$ \alpha_k = \alpha_k - \mu\Delta_{\alpha_k}{L} $$
$$ \alpha_k = \alpha_k + \frac{\mu}{N}\sum_i \left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right) x_{ik} $$

$$ b = b - \mu\Delta_{b}{L} $$
$$ b = b + \frac{\mu}{N}\sum_i \left(y_i - \left(\sum_j{\alpha_j x_{ij}} + b\right)\right) $$
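A quick way to sanity-check the derivatives of the loss is to compare hand-written gradient expressions against autograd. The sketch below assumes a single scalar bias and small random data created just for this check. It writes the gradients as $-\frac{1}{N}\sum_i (y_i - \hat{y}_i) x_{ik}$ and $-\frac{1}{N}\sum_i (y_i - \hat{y}_i)$, where the minus sign comes from differentiating the residual $y_i - \hat{y}_i$, and then applies one update step with a learning rate `mu`.

```python
import torch

torch.manual_seed(0)
N, d = 8, 3
X = torch.randn(N, d)                        # made-up inputs
y = torch.randn(N)                           # made-up targets
alpha = torch.randn(d, requires_grad=True)   # weights
b = torch.zeros(1, requires_grad=True)       # one bias shared by all samples

# Forward pass and loss L = 1/(2N) * sum_i (y_i - y_hat_i)^2
y_hat = X @ alpha + b
loss = torch.sum((y - y_hat) ** 2) / (2 * N)
loss.backward()                              # autograd gradients

# Hand-written gradients
residual = (y - y_hat).detach()
grad_alpha = -(X.T @ residual) / N           # -(1/N) sum_i residual_i * x_ik
grad_b = -residual.sum() / N                 # -(1/N) sum_i residual_i

print(torch.allclose(alpha.grad, grad_alpha, atol=1e-5))
print(torch.allclose(b.grad, grad_b.reshape(1), atol=1e-5))

# One gradient descent step: parameter <- parameter - mu * gradient
mu = 0.01
with torch.no_grad():
    alpha -= mu * alpha.grad
    b -= mu * b.grad

new_loss = torch.sum((y - (X @ alpha + b)) ** 2) / (2 * N)
print(loss.item(), new_loss.item())
```

If the two comparisons agree, the hand-written expressions match what `loss.backward()` computes, and the loss after the step should be slightly lower than before it.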


In [0]:
import torch 
import numpy as np
from sklearn import datasets
import plotly.graph_objects as go

In [2]:
# Let's generate a random dataset for regression
# In sklearn it is as simple as:
#   X, y = datasets.make_regression(n_samples=100, n_features=3, noise=20, random_state=1)
torch.random.manual_seed(9)
a, b = 5, -5                              # Values are drawn uniformly between b = -5 and a = 5
N = 100                                   # Number of samples 
d = 3                                     # Number of features (input dimension)
D = (a-b)*torch.rand(size=(N, d+1), dtype=torch.float32) + b   # Dataset (the 4th column is the target variable y)
X = D[:,:d]                               # Input features
Y = D[:,d]                                # Target values

# Let's generate random parameters (alphas and bias)
W = torch.rand(size=(d,), dtype=torch.float32, requires_grad=True)
# Note: this creates one bias value per sample (shape (N,)), whereas the
# derivation above uses a single scalar b shared by all samples.
b = torch.rand(size=(N,), dtype=torch.float32, requires_grad=True)

print("Dataset      x1       x2       x3       y")
print(D[:5])
print("\nInitial Weights")
print(W)
print("\nInitial Bias")
print(b[:5])

Dataset      x1       x2       x3       y
tensor([[ 1.5578, -1.9798, -0.2010,  2.7737],
        [ 4.1796,  4.3103, -2.3964,  4.5342],
        [-1.1956, -0.8957,  4.5102,  0.6863],
        [-3.6190, -2.9313,  0.1390, -0.1500],
        [ 0.1927, -1.5322, -0.5235,  1.3156]])

Initial Weights
tensor([0.1621, 0.8842, 0.7179], requires_grad=True)

Initial Bias
tensor([0.6858, 0.4205, 0.5289, 0.6379, 0.7051], grad_fn=<SliceBackward>)


In [3]:
# We'll perform the linear regression procedure step by step
def predict(X):
    # W and b are the module-level parameters defined in the previous cell
    return X @ W + b

def mse(y, y_hat):
    N = y.numel()
    return 1/(2*N) * torch.sum((y - y_hat).pow(2))

lr = 1e-03

# Compute the value of y_hat (prediction)
Y_hat = predict(X)
print(f'y_hat = {Y_hat[:3]}')

# Compute the mean squared error over the whole dataset
loss = mse(Y, Y_hat)
print(f'loss = {loss}')

# Compute the gradient of the loss w.r.t. the parameters (W and b)
loss.backward()
print('Gradients:')
print('W grad = ', W.grad[0].item())
print('b grad', b.grad[0].item())

y_hat = tensor([-0.9564,  3.1891,  2.7810], grad_fn=<SliceBackward>)
loss = 10.507801055908203
Gradients:
W grad =  2.569791555404663
b grad -0.03730182722210884


In [4]:
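# To make the linear regression model learn, run it inside a loop
# Note: W.grad and b.grad still hold the gradients computed in the previous
# cell, so the first backward() call accumulates on top of them.
epochs = 100
L = []                                    # Loss value recorded at each epoch
for e in range(epochs):
    Y_hat = predict(X)
    loss = mse(Y, Y_hat)
    L.append(loss.item())                 # Store a plain float for plotting
    loss.backward()
    with torch.no_grad():
        W -= lr*W.grad                    # Gradient descent step
        b -= lr*b.grad
        W.grad.zero_()                    # Reset gradients for the next epoch
        b.grad.zero_()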

print(f'Training final loss = {loss}')

Training final loss = 5.392975330352783


In [5]:
# Plot how the loss changes over the epochs
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        y = np.array(L), 
        name = "Loss",
        line = go.scatter.Line(color="red"),
        showlegend = True
    )
)

fig.update_layout(
    title = "Loss function over time",
    xaxis_title = "Epochs (e)",
    yaxis_title = "Loss value",
    font = dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
    )
)

fig.show()

## The easiest way to do LR in PyTorch

We can create a model by using the torch.nn package. It's easier than you think:

## Todo 
- [ ] Make a linear regression model using torch.nn
- [ ] Download a linearly separable dataset
- [ ] Run a training loop 
- [ ] Plot the loss
- [ ] Test the model accuracy with new data


In [6]:
from torch.nn import Linear
from torch.nn.functional import mse_loss

model = Linear(3, 1)

# Get the model parameters
print(model.weight)
print(f'\n{model.bias}')

# Also, you can run
print(f'\n{list(model.parameters())}')

Parameter containing:
tensor([[-0.1664,  0.5583, -0.1737]], requires_grad=True)

Parameter containing:
tensor([0.1535], requires_grad=True)

[Parameter containing:
tensor([[-0.1664,  0.5583, -0.1737]], requires_grad=True), Parameter containing:
tensor([0.1535], requires_grad=True)]


In [0]:
# Choosing a loss function and an optimizer

# loss_fn = mse_loss
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-03)
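
The pieces above (the `Linear` model, `mse_loss` and the SGD optimizer) could be combined into a training loop along the following lines. This is only a sketch under stated assumptions: it builds its own small synthetic dataset instead of using the notebook's `D`, and the ground-truth weights `true_w`, the intercept of 3.0, the noise level, the learning rate and the number of epochs are all hypothetical choices.

```python
import torch
from torch.nn import Linear
from torch.nn.functional import mse_loss

torch.manual_seed(0)

# Synthetic data (an assumption of this sketch): y is a noisy linear function of x
N, d = 100, 3
X = torch.rand(N, d) * 10 - 5                 # features drawn uniformly from [-5, 5)
true_w = torch.tensor([1.5, -2.0, 0.5])       # hypothetical ground-truth weights
Y = X @ true_w + 3.0 + 0.1 * torch.randn(N)   # hypothetical intercept and noise

model = Linear(d, 1)                          # d input features, 1 output
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

losses = []
for epoch in range(200):
    optimizer.zero_grad()                     # clear gradients from the previous step
    Y_hat = model(X).squeeze(1)               # shape (N, 1) -> (N,)
    loss = mse_loss(Y_hat, Y)                 # mean of squared residuals
    loss.backward()                           # compute gradients
    optimizer.step()                          # update weight and bias
    losses.append(loss.item())

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}")
print(model.weight.data, model.bias.data)
```

Because the targets here really are a linear function of the inputs, the learned weights should end up close to `true_w`. The `losses` list can be plotted the same way as `L` in the earlier cell.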