# ***TP3 - Clovis Lechien***

1. [Utils](#Utils)
2. [SGD](#SGD)
3. [RMSProp](#RMSProp)
4. [Adagrad](#Adagrad)
5. [Adam](#Adam)
6. [AdamW](#AdamW)
7. [Evaluation des Optimiseurs](#evaluation-des-optimiseurs) FIXME
8. [Réseau de Neurones](#réseau-de-neurones) FIXME
9. [Scheduler de Taux d'Apprentissage](#schedulers) FIXME

In [194]:
import numpy as np
import torch
import torch.nn as nn
from torch.optim import Optimizer

The history saving thread hit an unexpected error (OperationalError('attempt to write a readonly database')).History will not be written to the database.


In [195]:
# Génération du jeu de données linéaire
np.random.seed(0)
n_samples = 100
x_linear = np.linspace(-10, 10, n_samples)
y_linear = 3 * x_linear + 5 + np.random.normal(0, 2, n_samples)

 # Génération du jeu de données non linéaire
y_nonlinear = 0.5 * x_linear **2 - 4 * x_linear + np.random.normal(0 ,5 ,n_samples)

# ***Utils<a name="Utils"></a>***

In [196]:
def optimizer_testing_loop(parameters : dict[str,]):
    model = parameters['model']

    criterion = parameters['criterion']
    optimizer = parameters['optimizer']

    x_tensor = parameters['x_tensor']
    y_tensor = parameters['y_tensor']

    epochs = parameters['epochs']
    for epoch in range(epochs):
        optimizer.zero_grad()
        predictions = model(x_tensor)
        loss = criterion(predictions, y_tensor)
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 10 == 0:
            print(f"Epoch {epoch + 1}, Loss: {loss.item()}")

    for name, param in model.named_parameters():
        print(f"{name}: {param.data}")

# ***SGD<a name="SGD"></a>***

## **Implementation de SGD**

In [197]:
class SGD(Optimizer):
    def __init__(self, params, learning_rate=0.01):
        hyperparams = {'lr': learning_rate}
        super().__init__(params=params, defaults=hyperparams)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr = group['lr']

            for theta_t in group['params']:
                if theta_t.grad is None:
                    continue

                theta_t -= lr * theta_t.grad

## **Test de SGD**

In [198]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': SGD(model.parameters(), learning_rate=0.01),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_linear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 16.806262969970703
Epoch 20, Loss: 12.560613632202148
Epoch 30, Loss: 9.726184844970703
Epoch 40, Loss: 7.833895206451416
Epoch 50, Loss: 6.570591449737549
Epoch 60, Loss: 5.727196216583252
Epoch 70, Loss: 5.164138317108154
Epoch 80, Loss: 4.788238525390625
Epoch 90, Loss: 4.537283420562744
Epoch 100, Loss: 4.369743824005127
weight: tensor([[2.9703]])
bias: tensor([4.5511])


In [199]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': SGD(model.parameters(), learning_rate=0.01),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_nonlinear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 484.959228515625
Epoch 20, Loss: 415.849609375
Epoch 30, Loss: 369.71160888671875
Epoch 40, Loss: 338.9094543457031
Epoch 50, Loss: 318.345703125
Epoch 60, Loss: 304.6171875
Epoch 70, Loss: 295.45196533203125
Epoch 80, Loss: 289.3331298828125
Epoch 90, Loss: 285.2481689453125
Epoch 100, Loss: 282.5210266113281
weight: tensor([[-4.1445]])
bias: tensor([15.1198])


# ***RMSProp<a name="RMSProp"></a>***

## **Implementation de RMSProp**

In [200]:
class RMSProp(Optimizer):
    def __init__(self, params, learning_rate=0.01, decay=0.9):
        hyperparams = {'lr': learning_rate, 'decay': decay}
        super().__init__(params=params, defaults=hyperparams)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            decay = group['decay']
            lr = group['lr']

            for theta_t in group['params']:
                if theta_t.grad is None:
                    continue

                state = self.state[theta_t]
                if 'square_avg' not in state:
                    state['square_avg'] = torch.zeros_like(theta_t)

                square_avg = state['square_avg']
                square_avg = decay * square_avg + (1 - decay) * (theta_t.grad ** 2)
                state['square_avg'] = square_avg

                theta_t -= lr * theta_t.grad / square_avg.sqrt()

## **Test de RMSProp**

In [201]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': RMSProp(model.parameters(), learning_rate=0.01, decay=0.8),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_linear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 310.5052795410156
Epoch 20, Loss: 289.93426513671875
Epoch 30, Loss: 270.5417175292969
Epoch 40, Loss: 251.8887481689453
Epoch 50, Loss: 233.93060302734375
Epoch 60, Loss: 216.66220092773438
Epoch 70, Loss: 200.08251953125
Epoch 80, Loss: 184.19105529785156
Epoch 90, Loss: 168.98728942871094
Epoch 100, Loss: 154.4706268310547
weight: tensor([[0.9513]])
bias: tensor([1.8938])


In [202]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': RMSProp(model.parameters(), learning_rate=0.01, decay=0.8),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_nonlinear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 1295.18603515625
Epoch 20, Loss: 1259.952392578125
Epoch 30, Loss: 1226.224853515625
Epoch 40, Loss: 1193.274169921875
Epoch 50, Loss: 1161.0260009765625
Epoch 60, Loss: 1129.4725341796875
Epoch 70, Loss: 1098.6126708984375
Epoch 80, Loss: 1068.4461669921875
Epoch 90, Loss: 1038.9730224609375
Epoch 100, Loss: 1010.192626953125
weight: tensor([[-0.4023]])
bias: tensor([1.4741])


# ***Adagrad<a name="Adagrad"></a>***

## **Implementation de Adagrad**

In [203]:
class Adagrad(Optimizer):
    def __init__(self, params, learning_rate=0.01):
        hyperparams = {'lr': learning_rate}
        super().__init__(params=params, defaults=hyperparams)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr = group['lr']

            for theta_t in group['params']:
                if theta_t.grad is None:
                    continue

                state = self.state[theta_t]
                if 'sum_squared_grads' not in state:
                    state['sum_squared_grads'] = torch.zeros_like(theta_t)

                sum_squared_grads = state['sum_squared_grads']
                sum_squared_grads += theta_t.grad ** 2
                state['sum_squared_grads'] = sum_squared_grads

                adjusted_lr = lr / sum_squared_grads.sqrt()

                theta_t -= adjusted_lr * theta_t.grad

## **Test de Adagrad**

In [204]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': Adagrad(model.parameters(), learning_rate=0.5),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_linear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 54.12459945678711
Epoch 20, Loss: 14.844115257263184
Epoch 30, Loss: 6.648614406585693
Epoch 40, Loss: 4.724640846252441
Epoch 50, Loss: 4.235114574432373
Epoch 60, Loss: 4.098513603210449
Epoch 70, Loss: 4.056282043457031
Epoch 80, Loss: 4.0419111251831055
Epoch 90, Loss: 4.036640644073486
Epoch 100, Loss: 4.034607410430908
weight: tensor([[2.9692]])
bias: tensor([5.0849])


In [205]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': Adagrad(model.parameters(), learning_rate=0.5),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_nonlinear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 693.0726318359375
Epoch 20, Loss: 540.7066650390625
Epoch 30, Loss: 472.582275390625
Epoch 40, Loss: 435.4715881347656
Epoch 50, Loss: 412.2949523925781
Epoch 60, Loss: 396.10198974609375
Epoch 70, Loss: 383.7505187988281
Epoch 80, Loss: 373.7216796875
Epoch 90, Loss: 365.2349548339844
Epoch 100, Loss: 357.8613586425781
weight: tensor([[-4.0844]])
bias: tensor([8.4687])


# ***Adam<a name="Adam"></a>***

## **Implementation de Adam**

In [206]:
class Adam(Optimizer):
    def __init__(self, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        hyperparams = {'lr': learning_rate, 'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon}
        super().__init__(params=params, defaults=hyperparams)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr = group['lr']
            beta1 = group['beta1']
            beta2 = group['beta2']
            epsilon = group['epsilon']

            for theta_t in group['params']:
                if theta_t.grad is None:
                    continue

                state = self.state[theta_t]
                if 'm' not in state: # Moment d'ordre 1
                    state['m'] = torch.zeros_like(theta_t)
                if 'v' not in state: # Moment d'ordre 2
                    state['v'] = torch.zeros_like(theta_t)
                if 't' not in state: # Temps
                    state['t'] = 0

                # Premier Moment
                m = state['m']
                m_t = beta1 * m + (1 - beta1) * theta_t.grad
                state['m'] = m_t

                # Second Moment
                v = state['v']
                v_t = beta2 * v + (1 - beta2) * theta_t.grad ** 2
                state['v'] = v_t

                # Temps
                t = state['t'] + 1
                state['t'] = t

                # Correction des biais
                m_hat = m_t / (1 - beta1 ** t)
                v_hat = v_t / (1 - beta2 ** t)

                theta_t -= lr * m_hat / (v_hat.sqrt() + epsilon)

## **Test de Adam**

In [207]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': Adam(model.parameters(), learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_linear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 272.73614501953125
Epoch 20, Loss: 122.67634582519531
Epoch 30, Loss: 42.47823715209961
Epoch 40, Loss: 11.343353271484375
Epoch 50, Loss: 4.416177272796631
Epoch 60, Loss: 4.3578972816467285
Epoch 70, Loss: 4.522104740142822
Epoch 80, Loss: 4.262188911437988
Epoch 90, Loss: 4.072848320007324
Epoch 100, Loss: 4.0351057052612305
weight: tensor([[2.9678]])
bias: tensor([5.1600])


In [208]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': Adam(model.parameters(), learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_nonlinear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 1007.5213012695312
Epoch 20, Loss: 769.0822143554688
Epoch 30, Loss: 606.478271484375
Epoch 40, Loss: 506.4487609863281
Epoch 50, Loss: 449.4682312011719
Epoch 60, Loss: 416.75750732421875
Epoch 70, Loss: 395.20257568359375
Epoch 80, Loss: 378.142822265625
Epoch 90, Loss: 363.25933837890625
Epoch 100, Loss: 350.08673095703125
weight: tensor([[-4.1920]])
bias: tensor([8.9434])


# ***AdamW<a name="AdamW"></a>***

In [209]:
class AdamW(Optimizer):
    def __init__(self, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, weight_decay=0.01):
        hyperparams = {'lr': learning_rate, 'beta1': beta1, 'beta2': beta2, 'epsilon': epsilon, 'weight_decay': weight_decay}
        super().__init__(params=params, defaults=hyperparams)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr = group['lr']
            beta1 = group['beta1']
            beta2 = group['beta2']
            epsilon = group['epsilon']
            weight_decay = group['weight_decay']

            for theta_t in group['params']:
                if theta_t.grad is None:
                    continue

                state = self.state[theta_t]
                if 'm' not in state: # Moment d'ordre 1
                    state['m'] = torch.zeros_like(theta_t)
                if 'v' not in state: # Moment d'ordre 2
                    state['v'] = torch.zeros_like(theta_t)
                if 't' not in state: # Temps
                    state['t'] = 0

                # Premier Moment
                m = state['m']
                m_t = beta1 * m + (1 - beta1) * theta_t.grad
                state['m'] = m_t

                # Second Moment
                v = state['v']
                v_t = beta2 * v + (1 - beta2) * theta_t.grad ** 2
                state['v'] = v_t

                # Temps
                t = state['t'] + 1
                state['t'] = t

                # Correction des biais
                m_hat = m_t / (1 - beta1 ** t)
                v_hat = v_t / (1 - beta2 ** t)

                theta_t -= lr * m_hat / (v_hat.sqrt() + epsilon) - lr * weight_decay * theta_t

## ***Test de AdamW***

In [210]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': AdamW(model.parameters(), learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, weight_decay=0.01),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_linear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 258.0908203125
Epoch 20, Loss: 112.39228820800781
Epoch 30, Loss: 36.251949310302734
Epoch 40, Loss: 8.770923614501953
Epoch 50, Loss: 4.215054512023926
Epoch 60, Loss: 4.99793815612793
Epoch 70, Loss: 5.110691070556641
Epoch 80, Loss: 4.594119071960449
Epoch 90, Loss: 4.2212371826171875
Epoch 100, Loss: 4.091061592102051
weight: tensor([[3.0012]])
bias: tensor([5.2599])


In [211]:
model = nn.Linear(1, 1)
linear_parameters = {
    'model': model,
    'criterion': nn.MSELoss(),
    'optimizer': AdamW(model.parameters(), learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, weight_decay=0.01),
    'x_tensor': torch.from_numpy(x_linear).float().view(-1, 1),
    'y_tensor': torch.from_numpy(y_linear).float().view(-1, 1),
    'epochs': 100
}

optimizer_testing_loop(linear_parameters)

Epoch 10, Loss: 169.35293579101562
Epoch 20, Loss: 61.04060363769531
Epoch 30, Loss: 16.49265480041504
Epoch 40, Loss: 7.252600193023682
Epoch 50, Loss: 6.957948684692383
Epoch 60, Loss: 5.84036922454834
Epoch 70, Loss: 4.5421953201293945
Epoch 80, Loss: 4.0788893699646
Epoch 90, Loss: 4.040548801422119
Epoch 100, Loss: 4.054375648498535
weight: tensor([[2.9785]])
bias: tensor([5.2621])


# ***Evaluation des Optimiseurs<a name="evaluation-des-optimiseurs"></a>***

In [212]:
def f(x : torch.Tensor):
    return (x - 2) ** 2


def f_nonconvexe(x : torch.Tensor):
    return 3*x ** 2 - 2*x

In [213]:
def eval_optim(x : torch.Tensor, convexe : bool = True, scheduler : bool = False):
    if convexe:
        print(f"Optimisation de la fonction convexe f(x) = (x - 2)²")
        y = f(x)
    else:
        print(f"Optimisation de la fonction non convexe f(x) = 3x² - 2x")
        y = f_nonconvexe(x)

    y.backward()
    print(f"Gradient de f en x={x.item()}: x.grad={x.grad.item()}")

    optimizers = [
        SGD,
        RMSProp,
        Adagrad,
        Adam,
        AdamW
    ]

    for optimizer in optimizers:
        x = torch.tensor([100.], requires_grad=True)
        optimizer = optimizer([x])
        if scheduler:
            scheduler = LRSchedulerOnPlateau(optimizer, initial_lr=0.01, patience=5, factor=0.5, min_lr=1e-6, mode='min', threshold=1e-4)
        for i in range(100):
            optimizer.zero_grad()
            if convexe:
                y = f(x)
            else:
                y = f_nonconvexe(x)
            y.backward()
            optimizer.step()
            if scheduler:
                scheduler.step(y)
        print(f"Optimiseur {optimizer.__class__.__name__}: x={x.item()}, f(x)={f(x).item()}")

In [214]:
x = torch.tensor([69.], requires_grad=True)
eval_optim(x, convexe=True)
print()
eval_optim(x, convexe=False)

Optimisation de la fonction convexe f(x) = (x - 2)²
Gradient de f en x=69.0: x.grad=134.0
Optimiseur SGD: x=14.996723175048828, f(x)=168.91481018066406
Optimiseur RMSProp: x=98.90937805175781, f(x)=9391.427734375
Optimiseur Adagrad: x=99.81420135498047, f(x)=9567.6181640625
Optimiseur Adam: x=99.90005493164062, f(x)=9584.4208984375
Optimiseur AdamW: x=100.0, f(x)=9604.0

Optimisation de la fonction non convexe f(x) = 3x² - 2x
Gradient de f en x=69.0: x.grad=546.0
Optimiseur SGD: x=0.5381360054016113, f(x)=2.1370463371276855
Optimiseur RMSProp: x=98.90937805175781, f(x)=9391.427734375
Optimiseur Adagrad: x=99.81420135498047, f(x)=9567.6181640625
Optimiseur Adam: x=99.90005493164062, f(x)=9584.4208984375
Optimiseur AdamW: x=100.0, f(x)=9604.0


# ***Réseau de Neurones<a name="réseau-de-neurones"></a>***

In [215]:
def func_nn(x, W1, b1, W2, b2):
    h1 = W1 * x + b1
    y = W2 * h1 + b2
    return y


def mse(y, y_hat):
    return (y - y_hat) ** 2

In [216]:
def eval_nn_optim(scheduler : bool = False):

    optimizers = [
        SGD,
        RMSProp,
        Adagrad,
        Adam,
        AdamW
    ]

    for optimizer in optimizers:
        W1 = torch.tensor([1.], requires_grad=True)
        b1 = torch.tensor([1.], requires_grad=True)
        W2 = torch.tensor([1.], requires_grad=True)
        b2 = torch.tensor([1.], requires_grad=True)

        x = torch.tensor([1.], requires_grad=True)
        y = torch.tensor([10.])

        optimizer = optimizer([W1, b1, W2, b2])

        if scheduler:
            scheduler = LRSchedulerOnPlateau(optimizer, initial_lr=0.01, patience=5, factor=0.5, min_lr=1e-6, mode='min', threshold=1e-4)

        for i in range(100):
            optimizer.zero_grad()

            y_hat = func_nn(x, W1, b1, W2, b2)
            loss = mse(y, y_hat)

            loss.backward()
            optimizer.step()

            if scheduler:
                scheduler.step(loss)

        print(f"Optimiseur {optimizer.__class__.__name__}:\nW1={W1.item()}, b1={b1.item()}, W2={W2.item()}, b2={b2.item()}")
        print(f"Optimisation du réseau de neurones:\nW1={W1.item()}, b1={b1.item()}, W2={W2.item()}, b2={b2.item()}\n")

In [217]:
eval_nn_optim()

Optimiseur SGD:
W1=1.7965515851974487, b1=1.7965515851974487, W2=2.356534481048584, b2=1.532727837562561
Optimisation du réseau de neurones:
W1=1.7965515851974487, b1=1.7965515851974487, W2=2.356534481048584, b2=1.532727837562561

Optimiseur RMSProp:
W1=1.956400752067566, b1=1.956400752067566, W2=1.956400752067566, b2=1.899566411972046
Optimisation du réseau de neurones:
W1=1.956400752067566, b1=1.956400752067566, W2=1.956400752067566, b2=1.899566411972046

Optimiseur Adagrad:
W1=1.186563491821289, b1=1.186563491821289, W2=1.186563491821289, b2=1.1806892156600952
Optimisation du réseau de neurones:
W1=1.186563491821289, b1=1.186563491821289, W2=1.186563491821289, b2=1.1806892156600952

Optimiseur Adam:
W1=1.1003371477127075, b1=1.1003371477127075, W2=1.1003371477127075, b2=1.0987093448638916
Optimisation du réseau de neurones:
W1=1.1003371477127075, b1=1.1003371477127075, W2=1.1003371477127075, b2=1.0987093448638916

Optimiseur AdamW:
W1=1.1013890504837036, b1=1.1013890504837036, W2=1.

# ***Scheduler de Taux d'apprentissage<a name="schedulers"></a>***

## **Implementation de LRScheduler**

In [218]:
class LRScheduler:
    def __init__(self, optimizer, initial_lr):
        self.optimizer = optimizer
        self.initial_lr = initial_lr

    def get_lr(self):
        return self.optimizer.param_groups[0]['lr']

    def set_lr(self, lr):
        for group in self.optimizer.param_groups:
            group['lr'] = lr

## **Implementation de LRSchedulerOnPlateau**

In [219]:
class LRSchedulerOnPlateau(LRScheduler):
    def __init__(self, optimizer, initial_lr, patience=10, factor=0.1, min_lr=1e-6, mode='min', threshold=1e-4):
        super().__init__(optimizer, initial_lr)
        self.patience = patience
        self.factor = factor
        self.min_lr = min_lr
        self.mode = mode
        self.threshold = threshold

        self.best_value = None
        self.num_bad_epochs = 0

    def step(self, current_value):
        if self.best_value is None:
            self.best_value = current_value
            return

        if self.mode == 'min':
            improvement = self.best_value - current_value
        elif self.mode == 'max':
            improvement = current_value - self.best_value
        else:
            raise ValueError("Mode must be either 'min' (minimize) or 'max' (maximize).")

        if improvement > self.threshold:
            self.best_value = current_value
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1

        if self.num_bad_epochs >= self.patience:
            self.reduce_lr()

    def reduce_lr(self):
        current_lr = self.get_lr()
        new_lr = max(current_lr * self.factor, self.min_lr)
        if new_lr < current_lr:
            print(f"Reducing learning rate: {current_lr:.6f} -> {new_lr:.6f}")
            self.set_lr(new_lr)
        self.num_bad_epochs = 0

## **Test de LRSchedulerOnPlateau**

In [220]:
x = torch.tensor([69.], requires_grad=True)
eval_optim(x, convexe=True, scheduler=True)
print()
eval_optim(x, convexe=False, scheduler=True)

Optimisation de la fonction convexe f(x) = (x - 2)²
Gradient de f en x=69.0: x.grad=134.0
Optimiseur SGD: x=14.996723175048828, f(x)=168.91481018066406
Optimiseur RMSProp: x=98.90937805175781, f(x)=9391.427734375
Optimiseur Adagrad: x=99.81420135498047, f(x)=9567.6181640625
Optimiseur Adam: x=99.90005493164062, f(x)=9584.4208984375
Reducing learning rate: 0.001000 -> 0.000500
Reducing learning rate: 0.000500 -> 0.000250
Reducing learning rate: 0.000250 -> 0.000125
Reducing learning rate: 0.000125 -> 0.000063
Reducing learning rate: 0.000063 -> 0.000031
Reducing learning rate: 0.000031 -> 0.000016
Reducing learning rate: 0.000016 -> 0.000008
Reducing learning rate: 0.000008 -> 0.000004
Reducing learning rate: 0.000004 -> 0.000002
Reducing learning rate: 0.000002 -> 0.000001
Optimiseur AdamW: x=100.0, f(x)=9604.0

Optimisation de la fonction non convexe f(x) = 3x² - 2x
Gradient de f en x=69.0: x.grad=546.0
Optimiseur SGD: x=0.5381360054016113, f(x)=2.1370463371276855
Optimiseur RMSProp: 

In [221]:
eval_nn_optim(scheduler=True)

Reducing learning rate: 0.010000 -> 0.005000
Reducing learning rate: 0.005000 -> 0.002500
Reducing learning rate: 0.002500 -> 0.001250
Reducing learning rate: 0.001250 -> 0.000625
Reducing learning rate: 0.000625 -> 0.000313
Reducing learning rate: 0.000313 -> 0.000156
Reducing learning rate: 0.000156 -> 0.000078
Reducing learning rate: 0.000078 -> 0.000039
Reducing learning rate: 0.000039 -> 0.000020
Reducing learning rate: 0.000020 -> 0.000010
Reducing learning rate: 0.000010 -> 0.000005
Reducing learning rate: 0.000005 -> 0.000002
Reducing learning rate: 0.000002 -> 0.000001
Reducing learning rate: 0.000001 -> 0.000001
Optimiseur SGD:
W1=1.796550989151001, b1=1.796550989151001, W2=2.3565328121185303, b2=1.5327274799346924
Optimisation du réseau de neurones:
W1=1.796550989151001, b1=1.796550989151001, W2=2.3565328121185303, b2=1.5327274799346924

Optimiseur RMSProp:
W1=1.956400752067566, b1=1.956400752067566, W2=1.956400752067566, b2=1.899566411972046
Optimisation du réseau de neuron