<center>
<h1>Advanced Machine Learning and Deep Learning (Master DAC)</h1>
<h2>TP 02 : Graphe de calcul, autograd et modules</h2>

<hr>
<strong>Ben Kabongo</strong>, M2 MVA <br>
ben.kabongo_buzangu@ens.paris-saclay.fr <br>
<i>Novembre 2023</i>
<hr>
</center>

In [1]:
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
## Installer datamaestro et datamaestro-ml pip install datamaestro datamaestro-ml
import datamaestro
from tqdm import tqdm

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
data = datamaestro.prepare_dataset("edu.uci.boston")
colnames, datax, datay = data.data()
datax = torch.tensor(datax,dtype=torch.float)
datay = torch.tensor(datay,dtype=torch.float).reshape(-1,1)

## Question 1

Implémenter un algorithme de descente du gradient batch pour la régression linéaire en utilisant les fonctionnalités de la différenciation automatique. Utiliser votre code du TME 1 (en utilisant ou non vos propres fonctions), supprimer le contexte et utiliser la différenciation automatique.

In [3]:
def BGD_linear_regression(X, y, eps=1e-4, max_iter=10):
    linear = nn.Linear(X.size(1), 1)
    mse = nn.MSELoss()

    for i in tqdm(range(max_iter)):
        pred = linear(X)
        loss = mse(pred, y)

        writer.add_scalar('Loss/train', loss, i)
        print(f"Epoch {i}: loss {loss}")

        loss.backward()

        with torch.no_grad():
            linear.weight -= eps * linear.weight.grad
            linear.bias -= eps * linear.bias.grad

        linear.weight.grad.zero_()
        linear.bias.grad.zero_()

    return linear

Tester votre implémentation avec les données de Boston Housing. Tracer la courbe du coût en apprentissage et celle en test. Utiliser pour cela de préférence `tensorboard`. Utilisez pour l’instant seulement la fonction `add_scalar` après avoir créé un fichier de log grâce à la commande `SummaryWriter(path)`.

In [4]:
writer = SummaryWriter("runs/", comment="batch")
_ = BGD_linear_regression(datax, datay, eps=1e-6, max_iter=40)

100%|██████████| 40/40 [00:00<00:00, 564.52it/s]

Epoch 0: loss 7240.02490234375
Epoch 1: loss 1184.814208984375
Epoch 2: loss 332.5049133300781
Epoch 3: loss 208.3081512451172
Epoch 4: loss 186.38661193847656
Epoch 5: loss 179.1766815185547
Epoch 6: loss 174.395263671875
Epoch 7: loss 170.29058837890625
Epoch 8: loss 166.5868377685547
Epoch 9: loss 163.21731567382812
Epoch 10: loss 160.14727783203125
Epoch 11: loss 157.34866333007812
Epoch 12: loss 154.7965545654297
Epoch 13: loss 152.46839904785156
Epoch 14: loss 150.3436279296875
Epoch 15: loss 148.4036865234375
Epoch 16: loss 146.63162231445312
Epoch 17: loss 145.0120849609375
Epoch 18: loss 143.53109741210938
Epoch 19: loss 142.1759796142578
Epoch 20: loss 140.93524169921875
Epoch 21: loss 139.7984161376953
Epoch 22: loss 138.7560272216797
Epoch 23: loss 137.79940795898438
Epoch 24: loss 136.9207000732422
Epoch 25: loss 136.1128387451172
Epoch 26: loss 135.36932373046875
Epoch 27: loss 134.6842803955078
Epoch 28: loss 134.0523681640625
Epoch 29: loss 133.46878051757812
Epoch 30: 




Implémenter une descente de gradient stochastique et une mini-batch. Comparer la vitesse de convergence et les résultats obtenus.

In [5]:
def SGD_linear_regression(X, y, eps=1e-4, max_iter=10):
    linear = nn.Linear(X.size(1), 1)
    mse = nn.MSELoss()

    for i in tqdm(range(max_iter)):
        total_loss = 0.0

        for j in range(X.size(0)):
            index = torch.randint(0, X.size(0), (1,))
            sample_X = X[index]
            sample_y = y[index]

            pred = linear(sample_X)
            loss = mse(pred, sample_y)

            total_loss += loss.item()

            loss.backward()

            with torch.no_grad():
                linear.weight -= eps * linear.weight.grad
                linear.bias -= eps * linear.bias.grad

            linear.weight.grad.zero_()
            linear.bias.grad.zero_()

        avg_loss = total_loss / X.size(0)

        writer.add_scalar('Loss/train', avg_loss, i)
        print(f"Epoch {i}: average loss {avg_loss}")

    return linear


In [6]:
writer = SummaryWriter("runs/", comment="stochastic")
_ = SGD_linear_regression(datax, datay, eps=1e-6, max_iter=40)

  0%|          | 0/40 [00:00<?, ?it/s]

Epoch 0: average loss 152.14925797648863
Epoch 1: average loss 89.62942103107079


 10%|█         | 4/40 [00:00<00:01, 34.62it/s]

Epoch 2: average loss 94.21605929816229
Epoch 3: average loss 99.08319909891031
Epoch 4: average loss 100.00253381442921
Epoch 5: average loss 96.16227481286043
Epoch 6: average loss 100.89375995898726


 20%|██        | 8/40 [00:00<00:00, 34.45it/s]

Epoch 7: average loss 121.02738563370087
Epoch 8: average loss 89.77554371136308


 30%|███       | 12/40 [00:00<00:00, 33.72it/s]

Epoch 9: average loss 107.62542786611071
Epoch 10: average loss 83.0144473517292
Epoch 11: average loss 89.39328234382087
Epoch 12: average loss 86.42862208271337
Epoch 13: average loss 95.69864511758308


 40%|████      | 16/40 [00:00<00:00, 34.17it/s]

Epoch 14: average loss 92.04644870265408
Epoch 15: average loss 87.5682984004897


 50%|█████     | 20/40 [00:00<00:00, 34.65it/s]

Epoch 16: average loss 97.46834600978579
Epoch 17: average loss 86.31503307857356
Epoch 18: average loss 86.8840832363781
Epoch 19: average loss 78.28739411942787
Epoch 20: average loss 97.98470614814033
Epoch 21: average loss 87.1894566345027


 60%|██████    | 24/40 [00:00<00:00, 34.89it/s]

Epoch 22: average loss 89.10090888954429
Epoch 23: average loss 95.55881452540679


 70%|███████   | 28/40 [00:00<00:00, 34.99it/s]

Epoch 24: average loss 75.01043305948248
Epoch 25: average loss 82.33257305602582
Epoch 26: average loss 87.7139450124679
Epoch 27: average loss 91.71854599604552
Epoch 28: average loss 86.42235369863268
Epoch 29: average loss 85.26041293510912


 80%|████████  | 32/40 [00:00<00:00, 34.81it/s]

Epoch 30: average loss 75.4639505542369
Epoch 31: average loss 88.61876963019158


 90%|█████████ | 36/40 [00:01<00:00, 34.92it/s]

Epoch 32: average loss 86.42623379142525
Epoch 33: average loss 71.73236917125149
Epoch 34: average loss 84.91854294615968
Epoch 35: average loss 84.88251330639376
Epoch 36: average loss 68.33994150601899
Epoch 37: average loss 70.88424099094419


100%|██████████| 40/40 [00:01<00:00, 34.78it/s]

Epoch 38: average loss 83.02520347198494
Epoch 39: average loss 80.9326351682661





In [7]:
def MiniBatch_linear_regression(X, y, eps=1e-4, batch_size=32, max_iter=10):
    linear = nn.Linear(X.size(1), 1)
    mse = nn.MSELoss()

    for i in tqdm(range(max_iter)):
        total_loss = 0.0

        for j in range(0, X.size(0), batch_size):
            batch_X = X[j:j+batch_size]
            batch_y = y[j:j+batch_size]

            pred = linear(batch_X)
            loss = mse(pred, batch_y)

            total_loss += loss.item()

            loss.backward()

            with torch.no_grad():
                linear.weight -= eps * linear.weight.grad
                linear.bias -= eps * linear.bias.grad

            linear.weight.grad.zero_()
            linear.bias.grad.zero_()

        avg_loss = total_loss / (X.size(0) / batch_size)

        writer.add_scalar('Loss/train', avg_loss, i)
        print(f"Epoch {i}: average loss {avg_loss}")

    return linear


In [8]:
writer = SummaryWriter("runs/", comment="mini-batch")
_ = MiniBatch_linear_regression(datax, datay, eps=1e-6, batch_size=32, max_iter=40)

100%|██████████| 40/40 [00:00<00:00, 1070.63it/s]

Epoch 0: average loss 760.9731768581707
Epoch 1: average loss 229.24980965716094
Epoch 2: average loss 160.1992240574049
Epoch 3: average loss 130.64296701303113
Epoch 4: average loss 118.20572152722023
Epoch 5: average loss 113.04170582888155
Epoch 6: average loss 110.86841170118731
Epoch 7: average loss 109.8572376839257
Epoch 8: average loss 109.25089671018095
Epoch 9: average loss 108.75226891559103
Epoch 10: average loss 108.25663389424561
Epoch 11: average loss 107.73615334250711
Epoch 12: average loss 107.19094016335227
Epoch 13: average loss 106.62941045346467
Epoch 14: average loss 106.06074131901556
Epoch 15: average loss 105.49255576152575
Epoch 16: average loss 104.93043536159831
Epoch 17: average loss 104.378173828125
Epoch 18: average loss 103.83824887483016
Epoch 19: average loss 103.312098325948
Epoch 20: average loss 102.80051394979002
Epoch 21: average loss 102.30378150186048
Epoch 22: average loss 101.82188783427002
Epoch 23: average loss 101.354594475667
Epoch 24: a




### Question 2

Utiliser les modules `torch.nn.Linear`, `torch.nn.Tanh` et `torch.nn.MSELoss` pour implémenter un réseau à deux couches : `lineaire → tanh → lineaire → MSE`. Implémenter la boucle de descente de gradient avec l’optimiseur.

In [9]:
def Optim_GD_linear_regression(X, y, eps=1e-4, max_iter=10, hidden_dim=10):
    linear_1 = nn.Linear(X.size(1), hidden_dim)
    linear_2 = nn.Linear(hidden_dim, 1)
    tanh = nn.Tanh()
    mse = nn.MSELoss()

    optimizer = torch.optim.SGD(params=[*linear_1.parameters(), *linear_2.parameters()], lr=eps)

    for i in tqdm(range(max_iter)):
        output = linear_1(X)
        output = tanh(output)
        pred = linear_2(output)
        loss = mse(pred, y)

        writer.add_scalar('Loss/train', loss, i)
        print(f"Epoch {i}: loss {loss}")

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    return [linear_1, linear_2]

In [10]:
writer = SummaryWriter("runs/", comment="optim")
_ = Optim_GD_linear_regression(datax, datay, eps=1e-2, max_iter=40, hidden_dim=10)

100%|██████████| 40/40 [00:00<00:00, 4253.00it/s]


Epoch 0: loss 621.6826782226562
Epoch 1: loss 452.42254638671875
Epoch 2: loss 309.60089111328125
Epoch 3: loss 221.37013244628906
Epoch 4: loss 167.0432891845703
Epoch 5: loss 134.68777465820312
Epoch 6: loss 115.00262451171875
Epoch 7: loss 103.02374267578125
Epoch 8: loss 93.6973648071289
Epoch 9: loss 132.7603759765625
Epoch 10: loss 113.83011627197266
Epoch 11: loss 102.31292724609375
Epoch 12: loss 95.3058853149414
Epoch 13: loss 91.04280090332031
Epoch 14: loss 88.44915771484375
Epoch 15: loss 86.87117004394531
Epoch 16: loss 85.9111099243164
Epoch 17: loss 85.32701110839844
Epoch 18: loss 84.9716567993164
Epoch 19: loss 84.75545501708984
Epoch 20: loss 84.62391662597656
Epoch 21: loss 84.54388427734375
Epoch 22: loss 84.49520874023438
Epoch 23: loss 84.465576171875
Epoch 24: loss 84.44755554199219
Epoch 25: loss 84.43659973144531
Epoch 26: loss 84.42992401123047
Epoch 27: loss 84.42586517333984
Epoch 28: loss 84.42340087890625
Epoch 29: loss 84.4218978881836
Epoch 30: loss 84.4

Utiliser maintenant un conteneur - par exemple le module `torch.nn.Sequential` - pour implémenter le même réseau. Parcourer la doc pour comprendre la différence entre les différents types de conteneurs. Que se passe-t-il pour les paramètres des modules mis ainsi ensemble ?

In [11]:
def Sequential_Optim_GD_linear_regression(X, y, eps=1e-4, max_iter=10, hidden_dim=10):
    model = nn.Sequential(
        nn.Linear(X.size(1), hidden_dim),
        nn.Tanh(),
        nn.Linear(hidden_dim, 1)
    )
    mse = nn.MSELoss()

    optimizer = torch.optim.SGD(params=model.parameters(), lr=eps)

    for i in tqdm(range(max_iter)):
        pred = model(X)
        loss = mse(pred, y)

        writer.add_scalar('Loss/train', loss, i)
        print(f"Epoch {i}: loss {loss}")

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    return model

In [12]:
writer = SummaryWriter("runs/", comment="sequential")
_ = Optim_GD_linear_regression(datax, datay, eps=1e-2, max_iter=40, hidden_dim=10)

100%|██████████| 40/40 [00:00<00:00, 3497.15it/s]

Epoch 0: loss 561.9471435546875
Epoch 1: loss 404.00128173828125
Epoch 2: loss 280.6111145019531
Epoch 3: loss 203.78248596191406
Epoch 4: loss 157.04000854492188
Epoch 5: loss 128.601806640625
Epoch 6: loss 111.30004119873047
Epoch 7: loss 100.77365112304688
Epoch 8: loss 94.36937713623047
Epoch 9: loss 90.47303771972656
Epoch 10: loss 88.10249328613281
Epoch 11: loss 86.66024780273438
Epoch 12: loss 85.78279113769531
Epoch 13: loss 85.24895477294922
Epoch 14: loss 84.92416381835938
Epoch 15: loss 84.72655487060547
Epoch 16: loss 84.60633087158203
Epoch 17: loss 84.53319549560547
Epoch 18: loss 84.48869323730469
Epoch 19: loss 84.46162414550781
Epoch 20: loss 84.44514465332031
Epoch 21: loss 84.43513488769531
Epoch 22: loss 84.42902374267578
Epoch 23: loss 84.4253158569336
Epoch 24: loss 84.4230728149414
Epoch 25: loss 84.42169189453125
Epoch 26: loss 84.42085266113281
Epoch 27: loss 84.42034912109375
Epoch 28: loss 84.42003631591797
Epoch 29: loss 84.41985321044922
Epoch 30: loss 84.


