[![Dataflowr](https://raw.githubusercontent.com/dataflowr/website/master/_assets/dataflowr_logo.png)](https://dataflowr.github.io/website/)

# Module 2b: Playing with pytorch: linear regression

[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=960)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import torch
import numpy as np

In [None]:
torch.__version__

'2.0.1+cu118'

## Warm-up: Linear regression with numpy

Our model is:
$$
y_t = 2x^1_t-3x^2_t+1, \quad t\in\{1,\dots,30\}
$$

Given the 'observations' $(x_t,y_t)_{t\in\{1,\dots,30\}}$, our task is to recover the weights $w^1=2, w^2=-3$ and the bias $b = 1$.

In order to do so, we will solve the following optimization problem:
$$
\underset{w^1,w^2,b}{\operatorname{argmin}} \sum_{t=1}^{30} \left(w^1x^1_t+w^2x^2_t+b-y_t\right)^2
$$

[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=1080)

In [None]:
import numpy as np
from numpy.random import random
# generate random input data
x = random((30,2))

# generate the labels corresponding to the input data x (the true values of y)
y = np.dot(x, [2., -3.]) + 1.
w_source = np.array([2., -3.])
b_source  = np.array([1.])

In [None]:
x[:5]

array([[0.91088571, 0.3960003 ],
       [0.08553416, 0.63109372],
       [0.44810763, 0.70713165],
       [0.75915325, 0.92049609],
       [0.06849194, 0.14777436]])

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def plot_figs(fig_num, elev, azim, x, y, weights, bias):
    fig = plt.figure(fig_num, figsize=(4, 3))
    plt.clf()
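    # Axes3D(fig, ...) is not attached to the figure automatically in recent matplotlib
    # releases, so the 3D axes are created through the figure instead
    ax = fig.add_subplot(projection='3d')
    ax.view_init(elev=elev, azim=azim)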
    ax.scatter(x[:, 0], x[:, 1], y)
    ax.plot_surface(np.array([[0, 0], [1, 1]]),
                    np.array([[0, 1], [0, 1]]),
                    (np.dot(np.array([[0, 0, 1, 1],
                                          [0, 1, 0, 1]]).T, weights) + bias).reshape((2, 2)),
                    alpha=.5)
    ax.set_xlabel('x_1')
    ax.set_ylabel('x_2')
    ax.set_zlabel('y')
    
def plot_views(x, y, w, b):
    # Generate the figure from a given viewing angle
    elev = 43.5
    azim = -110
    plot_figs(1, elev, azim, x, y, w, b[0])

    plt.show()

In [None]:
plot_views(x, y, w_source, b_source)
# Our task is to recover the equation of the plane through the observed points by minimizing the squared error.

<Figure size 400x300 with 0 Axes>

In vector form, we define:
$$
\hat{y}_t = {\bf w}^T{\bf x}_t+b
$$
and we want to minimize the loss given by:
$$
loss = \sum_t\underbrace{\left(\hat{y}_t-y_t \right)^2}_{loss_t}.
$$

To minimize the loss we first compute the gradient of each $loss_t$:
\begin{eqnarray*}
\frac{\partial{loss_t}}{\partial w^1} &=& 2x^1_t\left({\bf w}^T{\bf x}_t+b-y_t \right)\\
\frac{\partial{loss_t}}{\partial w^2} &=& 2x^2_t\left({\bf w}^T{\bf x}_t+b-y_t \right)\\
\frac{\partial{loss_t}}{\partial b} &=& 2\left({\bf w}^T{\bf x}_t+b-y_t \right)
\end{eqnarray*}

Note that the actual gradient of the loss is given by:
$$
\frac{\partial{loss}}{\partial w^1} =\sum_t \frac{\partial{loss_t}}{\partial w^1},\quad
\frac{\partial{loss}}{\partial w^2} =\sum_t \frac{\partial{loss_t}}{\partial w^2},\quad
\frac{\partial{loss}}{\partial b} =\sum_t \frac{\partial{loss_t}}{\partial b}
$$
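
The formulas above can be checked numerically. The following is a minimal sketch, not part of the original notebook: it compares the analytic gradient of the summed loss with a central finite-difference estimate. The helper names (`total_loss`, `analytic_gradient`, `numeric_gradient`), the seed and the step size `eps` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
x = rng.random((30, 2))                 # 30 observations with 2 features
y = x @ np.array([2.0, -3.0]) + 1.0     # labels generated by the model above
w = rng.random(2)                       # arbitrary point at which the gradient is checked
b = rng.random(1)

def total_loss(w, b):
    # sum over t of (w^T x_t + b - y_t)^2
    return np.sum((x @ w + b - y) ** 2)

def analytic_gradient(w, b):
    residual = x @ w + b - y            # shape (30,)
    grad_w = 2 * x.T @ residual         # sum over t of 2 * x_t * residual_t
    grad_b = 2 * residual.sum()         # sum over t of 2 * residual_t
    return grad_w, grad_b

def numeric_gradient(w, b, eps=1e-6):
    # central finite differences, one coordinate at a time
    grad_w = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad_w[i] = (total_loss(w + step, b) - total_loss(w - step, b)) / (2 * eps)
    grad_b = (total_loss(w, b + eps) - total_loss(w, b - eps)) / (2 * eps)
    return grad_w, grad_b

grad_w, grad_b = analytic_gradient(w, b)
num_w, num_b = numeric_gradient(w, b)
print("analytic:", grad_w, grad_b)
print("numeric: ", num_w, num_b)
```

Because the loss is quadratic in the parameters, the two estimates should agree up to rounding error.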

For one epoch, **(Batch) Gradient Descent** updates the weights and bias as follows:
\begin{eqnarray*}
w^1_{new}&=&w^1_{old}-\alpha\frac{\partial{loss}}{\partial w^1} \\
w^2_{new}&=&w^2_{old}-\alpha\frac{\partial{loss}}{\partial w^2} \\
b_{new}&=&b_{old}-\alpha\frac{\partial{loss}}{\partial b},
\end{eqnarray*}

and then we run several epochs.
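
The update rule above can be sketched in vectorized numpy, computing the full gradient with one matrix product per epoch. This is an illustrative sketch under stated assumptions: the seed and the number of epochs (`n_epochs = 2000`) are choices made here, picked large enough for the estimate to get close to the true parameters.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen only for reproducibility
x = rng.random((30, 2))
y = x @ np.array([2.0, -3.0]) + 1.0

w = rng.random(2)                         # random initial weights
b = rng.random(1)                         # random initial bias
alpha = 1e-2                              # learning rate
n_epochs = 2000                           # assumption: many more epochs than a quick demo

for epoch in range(n_epochs):
    residual = x @ w + b - y              # w^T x_t + b - y_t for every t
    grad_w = 2 * x.T @ residual           # d loss / d w, summed over t
    grad_b = 2 * residual.sum()           # d loss / d b, summed over t
    w = w - alpha * grad_w
    b = b - alpha * grad_b

final_loss = np.sum((x @ w + b - y) ** 2)
print("w:", w, "b:", b, "loss:", final_loss)
```

With enough epochs the estimates should approach $w^1=2$, $w^2=-3$ and $b=1$.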

In [None]:
# randomly initialize learnable weights and bias
w_init = random(2)
b_init = random(1)

w = w_init
b = b_init
print("initial values of the parameters:", w, b )

initial values of the parameters: [0.50720218 0.25600937] [0.42059718]


In [None]:
# Here we define three main functions.
# our model forward pass: y_hat_t = w^T x_t + b
def forward(x):
    return x.dot(w)+b

# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y)**2 

print("initial loss:", np.sum([loss(x_val,y_val) for x_val, y_val in zip(x, y)]) )

# compute the gradient of the loss for one sample
def gradient(x, y):  # d_loss/d_w, d_loss/d_b
    return 2*(x.dot(w)+b - y)*x, 2 * (x.dot(w)+b - y)
# first output: derivative of the loss with respect to w; second output: derivative with respect to the bias b
learning_rate = 1e-2
# Training loop
for epoch in range(10): # for a better estimate, run more epochs (e.g. 50 instead of 10)
    grad_w = np.array([0,0])
    grad_b = np.array(0)
    l = 0
    for x_val, y_val in zip(x, y):
        grad_w = np.add(grad_w,gradient(x_val, y_val)[0])
        grad_b = np.add(grad_b,gradient(x_val, y_val)[1])
        l += loss(x_val, y_val)
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
    print("progress:", "epoch:", epoch, "loss",l[0])

# After training
print("estimation of the parameters:", w, b) # the loss decreases; more epochs are needed for a better estimate

initial loss: 29.433702438427858
progress: epoch: 0 loss 29.43370243842785
progress: epoch: 1 loss 26.33138530704353
progress: epoch: 2 loss 24.01898035972118
progress: epoch: 3 loss 21.913359101698493
progress: epoch: 4 loss 19.99572378103657
progress: epoch: 5 loss 18.249078726534336
progress: epoch: 6 loss 16.657977915249866
progress: epoch: 7 loss 15.208382870425982
progress: epoch: 8 loss 13.887533651351326
progress: epoch: 9 loss 12.68383171264598
estimation of the parameters: [ 1.14357401 -0.90209249] [0.46694245]


In [None]:
plot_views(x, y, w, b) # plane fitted so far, shown in 3D

<Figure size 400x300 with 0 Axes>

## Linear regression with tensors

[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=1650)

In [None]:
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

In [None]:
# first, convert everything from numpy arrays to tensors
x_t = torch.from_numpy(x).type(dtype)
y_t = torch.from_numpy(y).type(dtype).unsqueeze(1)

This is an implementation of **(Batch) Gradient Descent** with tensors.

Note that in the main loop, the functions loss_t and gradient_t are always called with the same inputs: they can easily be incorporated into the loop (we'll do that below).

In [None]:
w_init_t = torch.from_numpy(w_init).type(dtype)
b_init_t = torch.from_numpy(b_init).type(dtype)
# the loss and gradient functions start from the same initial estimates (w and b) as in the numpy version
w_t = w_init_t.clone()
w_t.unsqueeze_(1)
b_t = b_init_t.clone()
b_t.unsqueeze_(1)
print("initial values of the parameters:", w_t, b_t )

initial values of the parameters: tensor([[0.5072],
        [0.2560]]) tensor([[0.4206]])


In [None]:
# our model forward pass
def forward_t(x):
    return x.mm(w_t)+b_t # only change from the numpy version: matrix multiplication (mm) instead of dot

# Loss function
def loss_t(x, y):
    y_pred = forward_t(x) # predictions for the whole batch x
    return (y_pred - y).pow(2).sum() # squared differences with the true values y, summed

# compute gradient
def gradient_t(x, y):  # d_loss/d_w, d_loss/d_b
    return 2*torch.mm(torch.t(x),x.mm(w_t)+b_t - y), 2 * (x.mm(w_t)+b_t - y).sum()
# derivative of the loss with respect to the weights, and with respect to the bias
learning_rate = 1e-2
for epoch in range(10):
    l_t = loss_t(x_t,y_t)
    grad_w, grad_b = gradient_t(x_t,y_t)
    w_t = w_t-learning_rate*grad_w
    b_t = b_t-learning_rate*grad_b
    print("progress:", "epoch:", epoch, "loss",l_t)

# After training
print("estimation of the parameters:", w_t, b_t )

progress: epoch: 0 loss tensor(29.4337)
progress: epoch: 1 loss tensor(26.3314)
progress: epoch: 2 loss tensor(24.0190)
progress: epoch: 3 loss tensor(21.9134)
progress: epoch: 4 loss tensor(19.9957)
progress: epoch: 5 loss tensor(18.2491)
progress: epoch: 6 loss tensor(16.6580)
progress: epoch: 7 loss tensor(15.2084)
progress: epoch: 8 loss tensor(13.8875)
progress: epoch: 9 loss tensor(12.6838)
estimation of the parameters: tensor([[ 1.1436],
        [-0.9021]]) tensor([[0.4669]])


## Linear regression with Autograd

[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=1890)

In [None]:
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
# This cell initializes the weights and bias of the linear regression, marks them as requiring gradients,
# and prints their initial values, so that they can later be updated using backpropagation.
w_v = w_init_t.clone().unsqueeze(1) # copy of the tensor w_init_t
w_v.requires_grad_(True) # track gradients with respect to this tensor during the backward pass
b_v = b_init_t.clone().unsqueeze(1) # copy of the tensor b_init_t
b_v.requires_grad_(True)
print("initial values of the parameters:", w_v.data, b_v.data )

initial values of the parameters: tensor([[0.5072],
        [0.2560]]) tensor([[0.4206]])


An implementation of **(Batch) Gradient Descent** that uses autograd instead of computing the gradient explicitly.

In [None]:
for epoch in range(10):
    y_pred = x_t.mm(w_v)+b_v
    loss = (y_pred - y_t).pow(2).sum()
    
    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all tensors with requires_grad=True.
    # After this call w_v.grad and b_v.grad will be tensors holding the gradient
    # of the loss with respect to w_v and b_v respectively.
    loss.backward()
    
    # Update weights using gradient descent. For this step we just want to mutate
    # the values of w_v and b_v in-place; we don't want to build up a computational
    # graph for the update steps, so we use the torch.no_grad() context manager
    # to prevent PyTorch from building a computational graph for the updates
    with torch.no_grad(): # no graph is built for the weight updates, which also saves memory
        w_v -= learning_rate * w_v.grad # subtract learning_rate times the gradient of w_v from its current value
        b_v -= learning_rate * b_v.grad # same update for b_v
    
    # Manually zero the gradients after updating weights
    # otherwise gradients will be accumulated after each .backward()
    w_v.grad.zero_()
    b_v.grad.zero_()
    
    print("progress:", "epoch:", epoch, "loss",loss.data.item())

# After training
print("estimation of the parameters:", w_v.data, b_v.data.t() )

progress: epoch: 0 loss 29.433704376220703
progress: epoch: 1 loss 26.331384658813477
progress: epoch: 2 loss 24.018978118896484
progress: epoch: 3 loss 21.913360595703125
progress: epoch: 4 loss 19.995723724365234
progress: epoch: 5 loss 18.24907875061035
progress: epoch: 6 loss 16.657978057861328
progress: epoch: 7 loss 15.208383560180664
progress: epoch: 8 loss 13.887533187866211
progress: epoch: 9 loss 12.683832168579102
estimation of the parameters: tensor([[ 1.1436],
        [-0.9021]]) tensor([[0.4669]])
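
To see what autograd computes, the gradients it returns can be compared with the closed-form expressions used in the tensor version. The sketch below is an illustration under assumptions (fresh random data, an arbitrary seed, and a hypothetical helper `sum_squared_error`); it also shows why the gradients have to be zeroed, since a second call to `.backward()` adds to the stored gradient.

```python
import torch

torch.manual_seed(0)                              # seed chosen only for reproducibility
x = torch.rand(30, 2)
y = x @ torch.tensor([[2.0], [-3.0]]) + 1.0       # shape (30, 1)

w = torch.rand(2, 1, requires_grad=True)
b = torch.rand(1, 1, requires_grad=True)

def sum_squared_error():
    # same loss as in the training loop: sum of squared residuals
    return (x @ w + b - y).pow(2).sum()

# first backward pass: gradients are written into w.grad and b.grad
sum_squared_error().backward()
first_grad_w = w.grad.clone()
first_grad_b = b.grad.clone()

# closed-form gradients, computed without tracking operations
with torch.no_grad():
    residual = x @ w + b - y
    manual_grad_w = 2 * x.t() @ residual
    manual_grad_b = 2 * residual.sum().reshape(1, 1)

matches_manual = (torch.allclose(first_grad_w, manual_grad_w, atol=1e-4)
                  and torch.allclose(first_grad_b, manual_grad_b, atol=1e-4))

# second backward pass without zeroing: the new gradient is added to the old one
sum_squared_error().backward()
accumulated = torch.allclose(w.grad, 2 * first_grad_w, atol=1e-4)

print("autograd matches the manual gradient:", matches_manual)
print("gradient doubled after a second backward:", accumulated)
```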


## Linear regression with neural network

[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=2075)

An implementation of **(Batch) Gradient Descent** using the nn package. Here we have a super simple model with only one layer and no activation function!
We therefore build a simple linear model: one linear layer with two inputs (features) and one output.

In [None]:
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Parameters for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 1),
)

for m in model.children(): # loop over the child modules of the model (here a single linear layer)
    m.weight.data = w_init_t.clone().unsqueeze(0) # initialize with copies of the initial values
    m.bias.data = b_init_t.clone()

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum') # squared-error loss, summed over the samples as in the previous sections

# switch to train mode
model.train()

for epoch in range(10):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x_t) # predictions on the input data x_t
  
    # Note this operation is equivalent to: pred = model.forward(x_t)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y_t) # the goal of training is to minimize this loss

    # Zero the gradients before running the backward pass,
    # so that gradients from previous iterations do not accumulate.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its data and gradients like we did before.
    with torch.no_grad(): # the update is not recorded by autograd and does not affect later gradients
        for param in model.parameters():
            param -= learning_rate * param.grad
        
    print("progress:", "epoch:", epoch, "loss",loss.data.item())

# After training
print("estimation of the parameters:")
for param in model.parameters():
    print(param)

progress: epoch: 0 loss 29.433704376220703
progress: epoch: 1 loss 26.331384658813477
progress: epoch: 2 loss 24.018978118896484
progress: epoch: 3 loss 21.913360595703125
progress: epoch: 4 loss 19.995723724365234
progress: epoch: 5 loss 18.24907875061035
progress: epoch: 6 loss 16.657978057861328
progress: epoch: 7 loss 15.208383560180664
progress: epoch: 8 loss 13.887533187866211
progress: epoch: 9 loss 12.683832168579102
estimation of the parameters:
Parameter containing:
tensor([[ 1.1436, -0.9021]], requires_grad=True)
Parameter containing:
tensor([0.4669], requires_grad=True)


As a last step, we use the optim package directly to update the weights and bias.
The PyTorch optim package provides implementations of commonly used optimization algorithms for updating the parameters (weights and biases) of a model during training.

[Video timestamp](https://youtu.be/Z6H3zakmn6E?t=2390)

In [None]:
model = torch.nn.Sequential(
    torch.nn.Linear(2, 1),
)

for m in model.children():
    m.weight.data = w_init_t.clone().unsqueeze(0)
    m.bias.data = b_init_t.clone()

loss_fn = torch.nn.MSELoss(reduction='sum')

model.train()

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)


for epoch in range(10):
    y_pred = model(x_t)
    loss = loss_fn(y_pred, y_t)
    print("progress:", "epoch:", epoch, "loss",loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    
# After training
print("estimation of the parameters:")
for param in model.parameters():
    print(param)

progress: epoch: 0 loss 29.433704376220703
progress: epoch: 1 loss 26.331384658813477
progress: epoch: 2 loss 24.018978118896484
progress: epoch: 3 loss 21.913358688354492
progress: epoch: 4 loss 19.995725631713867
progress: epoch: 5 loss 18.24907875061035
progress: epoch: 6 loss 16.657978057861328
progress: epoch: 7 loss 15.208383560180664
progress: epoch: 8 loss 13.887533187866211
progress: epoch: 9 loss 12.683832168579102
estimation of the parameters:
Parameter containing:
tensor([[ 1.1436, -0.9021]], requires_grad=True)
Parameter containing:
tensor([0.4669], requires_grad=True)
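
The outputs match the previous sections because, with default settings (no momentum, no weight decay), `torch.optim.SGD` applies the same update `param - lr * param.grad`. The following sketch checks this on a single step. It uses fresh random data, an arbitrary seed, and the default initialization of `torch.nn.Linear`, all of which are assumptions made for the illustration.

```python
import torch

torch.manual_seed(0)                              # seed chosen only for reproducibility
x = torch.rand(30, 2)
y = x @ torch.tensor([[2.0], [-3.0]]) + 1.0
learning_rate = 1e-2

model = torch.nn.Linear(2, 1)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()

# what a hand-written gradient descent step would produce
expected = [p.detach() - learning_rate * p.grad for p in model.parameters()]

# let the optimizer perform the update in place
optimizer.step()

same_update = all(torch.allclose(p.detach(), e)
                  for p, e in zip(model.parameters(), expected))
print("optimizer.step() matches the manual update:", same_update)
```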


## Remark

This problem can be solved in 3 lines of code!

In [None]:
xb_t = torch.cat((x_t,torch.ones(30).unsqueeze(1)),1)
# for old version of pytorch
#sol, _ =torch.lstsq(y_t,xb_t)
#sol[:3]
# for pytorch 1.9 and newer
sol = torch.linalg.lstsq(xb_t,y_t)
sol.solution

tensor([[ 2.0000],
        [-3.0000],
        [ 1.0000]])
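
The direct solution works because minimizing the sum of squares is a linear least-squares problem: appending a column of ones to $x$ folds the bias into the weight vector, and the minimizer solves the normal equations $X_b^T X_b\,\theta = X_b^T y$. A minimal numpy sketch of the same idea, with fresh random data and an arbitrary seed as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)                    # seed chosen only for reproducibility
x = rng.random((30, 2))
y = x @ np.array([2.0, -3.0]) + 1.0

# append a column of ones so that the bias becomes a third weight
xb = np.hstack([x, np.ones((30, 1))])

# solve the normal equations (X_b^T X_b) theta = X_b^T y
theta_normal = np.linalg.solve(xb.T @ xb, xb.T @ y)

# the same problem through numpy's least-squares routine
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(xb, y, rcond=None)

print("normal equations:", theta_normal)
print("lstsq:           ", theta_lstsq)
```

Both vectors hold the two weights followed by the bias. In practice a least-squares routine is usually preferred over forming $X_b^T X_b$ explicitly, which can be poorly conditioned.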

## Exercise: Play with the code

Change the number of samples from 30 to 300. What happens? How can you fix it?

In [None]:
x = random((300,2))
y = np.dot(x, [2., -3.]) + 1.
x_t = torch.from_numpy(x).type(dtype)
y_t = torch.from_numpy(y).type(dtype).unsqueeze(1)

In [None]:
model = torch.nn.Sequential(
    torch.nn.Linear(2, 1),
)

for m in model.children():
    m.weight.data = w_init_t.clone().unsqueeze(0)
    m.bias.data = b_init_t.clone()

loss_fn = torch.nn.MSELoss(reduction = 'sum')

model.train()

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)


for epoch in range(10):
    y_pred = model(x_t)
    loss = loss_fn(y_pred, y_t)
    print("progress:", "epoch:", epoch, "loss",loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    
# After training
print("estimation of the parameters:")
for param in model.parameters():
    print(param)

progress: epoch: 0 loss 358.25164794921875
progress: epoch: 1 loss 4185.1220703125
progress: epoch: 2 loss 281676.59375
progress: epoch: 3 loss 19350164.0
progress: epoch: 4 loss 1329407360.0
progress: epoch: 5 loss 91333804032.0
progress: epoch: 6 loss 6274877489152.0
progress: epoch: 7 loss 431100765667328.0
progress: epoch: 8 loss 2.961778094060339e+16
progress: epoch: 9 loss 2.034820402353537e+18
estimation of the parameters:
Parameter containing:
tensor([[2.2745e+08, 2.4235e+08]], requires_grad=True)
Parameter containing:
tensor([4.3650e+08], requires_grad=True)


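Going from 30 to 300 samples does not create a shape problem: x has shape (300, 2) and y_t has shape (300, 1) after unsqueeze(1), which is what the model and the loss function expect. What changes is the scale of the gradient. With reduction='sum' the loss is a sum over all samples, so its gradient is roughly 10 times larger with 300 samples than with 30. With the same learning rate of 1e-2, each update overshoots the minimum and the loss diverges, as the output above shows.

Simply regenerating x and y, as in the next cell, does not change this, and the run below diverges in the same way:
x = np.random.random((300, 2))
y = np.dot(x, [2., -3.]) + 1.

To correct it, decrease the learning rate (for example by a factor of 10) or average the loss over the samples with reduction='mean'.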

In [None]:
x = np.random.random((300, 2))
y = np.dot(x, [2., -3.]) + 1
x_t = torch.from_numpy(x).type(dtype)
y_t = torch.from_numpy(y).type(dtype).unsqueeze(1)

In [None]:
model = torch.nn.Sequential(
    torch.nn.Linear(2, 1),
)

for m in model.children():
    m.weight.data = w_init_t.clone().unsqueeze(0)
    m.bias.data = b_init_t.clone()

loss_fn = torch.nn.MSELoss(reduction = 'sum')

model.train()

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)


for epoch in range(10):
    y_pred = model(x_t)
    loss = loss_fn(y_pred, y_t)
    print("progress:", "epoch:", epoch, "loss",loss.item())
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    
# After training
print("estimation of the parameters:")
for param in model.parameters():
    print(param)

progress: epoch: 0 loss 328.1256103515625
progress: epoch: 1 loss 2330.90478515625
progress: epoch: 2 loss 154817.109375
progress: epoch: 3 loss 10684682.0
progress: epoch: 4 loss 737528320.0
progress: epoch: 5 loss 50909188096.0
progress: epoch: 6 loss 3514096943104.0
progress: epoch: 7 loss 242566700204032.0
progress: epoch: 8 loss 1.674359714349056e+16
progress: epoch: 9 loss 1.1557560455832535e+18
estimation of the parameters:
Parameter containing:
tensor([[1.8096e+08, 1.7505e+08]], requires_grad=True)
Parameter containing:
tensor([3.2867e+08], requires_grad=True)
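
The divergence comes from the step size, not from the data. The sketch below is one possible way to check this, under assumptions: the helper `train`, the seed, the default initialization of `torch.nn.Linear` and the specific learning rates are illustrative choices. It trains on 300 samples with three settings: the original one, a learning rate divided by 10, and a mean-reduced loss with a learning rate chosen so that the effective step is the same as in the second setting.

```python
import math
import torch

def train(learning_rate, reduction, n_samples=300, epochs=10):
    """Run batch gradient descent and return the loss recorded at each epoch."""
    torch.manual_seed(0)                          # same data and initialization for every run
    x = torch.rand(n_samples, 2)
    y = x @ torch.tensor([[2.0], [-3.0]]) + 1.0
    model = torch.nn.Linear(2, 1)
    loss_fn = torch.nn.MSELoss(reduction=reduction)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    losses = []
    for _ in range(epochs):
        loss = loss_fn(model(x), y)
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return losses

# original setting: summed loss, lr = 1e-2 (step too large for 300 samples)
diverging = train(learning_rate=1e-2, reduction='sum')

# learning rate divided by 10 to compensate for 10 times more samples
smaller_lr = train(learning_rate=1e-3, reduction='sum')

# averaged loss: the gradient is divided by 300, so lr = 0.3 gives the same step as above
mean_loss = train(learning_rate=0.3, reduction='mean')

print("sum, lr=1e-2: ", diverging[0], "->", diverging[-1])
print("sum, lr=1e-3: ", smaller_lr[0], "->", smaller_lr[-1])
print("mean, lr=0.3: ", mean_loss[0], "->", mean_loss[-1])
```

With reduction='mean' the scale of the gradient no longer depends on the number of samples, so a learning rate tuned on one dataset size is more likely to carry over to another.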

