Lors de l'optimisation du modèle VGG16 pour votre ensemble de données, voici quelques paramètres qui pourraient être intéressants à optimiser :

Taux d'apprentissage (learning rate) : Le taux d'apprentissage contrôle la taille des pas effectués lors de la mise à jour des poids du modèle. Un taux d'apprentissage trop élevé peut entraîner une convergence lente ou une instabilité, tandis qu'un taux d'apprentissage trop faible peut entraîner une convergence lente ou un risque de rester coincé dans un minimum local. Vous pouvez essayer différentes valeurs de taux d'apprentissage pour trouver celui qui fonctionne le mieux pour votre ensemble de données.

Nombre d'époques (epochs) : Le nombre d'époques correspond au nombre de fois où l'ensemble de données complet est passé par le modèle lors de l'entraînement. Trop peu d'époques peuvent entraîner un sous-apprentissage, tandis que trop d'époques peuvent entraîner un surapprentissage. Vous pouvez effectuer une validation croisée avec différents nombres d'époques pour trouver le bon équilibre.

Taille du batch (batch size) : La taille du batch détermine le nombre d'échantillons utilisés pour mettre à jour les poids du modèle à chaque itération. Une taille de batch trop petite peut entraîner une convergence lente, tandis qu'une taille de batch trop grande peut nécessiter plus de mémoire et ralentir l'entraînement. Vous pouvez essayer différentes tailles de batch pour voir leur impact sur les performances du modèle.

Régularisation (regularization) : La régularisation est utilisée pour prévenir le surapprentissage en ajoutant une pénalité aux poids du modèle. Vous pouvez expérimenter avec différentes techniques de régularisation, telles que la pénalité L1 (Lasso) ou L2 (Ridge), ainsi qu'ajuster les paramètres de régularisation pour trouver le bon équilibre entre ajustement et généralisation.

Architecture du réseau (network architecture) : Bien que le modèle VGG16 ait une architecture prédéfinie, vous pouvez explorer la possibilité d'ajouter ou de modifier certaines couches pour mieux s'adapter à votre ensemble de données. Par exemple, vous pouvez ajouter des couches de régularisation supplémentaires, ajuster le nombre de filtres dans les couches de convolution, ou modifier la taille des couches entièrement connectées.

Ces paramètres peuvent être optimisés à l'aide de techniques d'optimisation telles que la recherche par grille (grid search) ou l'optimisation bayésienne. Vous pouvez également utiliser des bibliothèques telles que scikit-learn ou Optuna pour faciliter ce processus d'optimisation des hyperparamètres.

In [66]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import csv
from datetime import datetime
from os import path
import os
import sklearn
from torch.utils.data import DataLoader, TensorDataset, random_split
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold, ParameterGrid
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import optuna
from optuna import Trial
import torch.utils.data as data_utils
from random import randint
import datetime

<h2>ResNet_9</h2>

In [67]:
def conv_block(in_channels, out_channels, pool=False):
    layers = [
              nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)
    ]

    if pool:
        layers.append(nn.MaxPool2d(kernel_size=2))
    
    return nn.Sequential(*layers)

class ResNet9(nn.Module):
    def __init__(self,  num_classes=7, in_channels = 1, lr = 0.01,  dropout = 0.5, num_hidden = 4096, model_name = "ResNet9"):
        super(ResNet9, self).__init__()
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.lr = lr
        self.dropout = dropout
        self.num_hidden = num_hidden
        self.model_name = model_name
        self.conv1 = conv_block(in_channels, 16, pool=False) # 16 x 48 x 48
        self.conv2 = conv_block(16, 32, pool=True) # 32 x 24 x 24
        self.res1 = nn.Sequential( #  32 x 24 x 24
            conv_block(32, 32, pool=False), 
            conv_block(32, 32, pool=False)
        )

        self.conv3 = conv_block(32, 64, pool=True) # 64 x 12 x 12
        self.conv4 = conv_block(64, 128, pool=True) # 128 x 6 x 6

        self.res2 = nn.Sequential( # 128 x 6 x 6
             conv_block(128, 128), 
             conv_block(128, 128)
        )
        
        self.classifier = nn.Sequential(
            nn.MaxPool2d(kernel_size=2), # 128 x 3 x 3
            nn.Flatten(),
            nn.Linear(128*3*3, self.num_hidden), #512
            nn.Linear(self.num_hidden, num_classes) # 7
        )
        self.network = nn.Sequential(
            self.conv1,
            self.conv2,
            self.res1,
            self.conv3,
            self.conv4,
            self.res2,
            self.classifier,
        )

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out    

<h2> New Model test

In [68]:
class Model48(nn.Module):
    def __init__(self,  num_classes=7, in_channels = 1, lr = 0.01,  dropout = 0.5, num_hidden = 4096, model_name = "Model48"):
        super(Model48, self).__init__()
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.lr = lr
        self.dropout = dropout
        self.num_hidden = num_hidden
        self.model_name = model_name
        self.network = nn.Sequential(
            nn.Conv2d(self.in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 16 x 24 x 24

            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 64 x 12 x 12

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 128 x 6 x 6

            nn.Flatten(), 
            nn.Linear(128*6*6, 1024),
            nn.ReLU(),
            nn.Dropout(self.dropout),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(self.dropout-0.1),
            nn.Linear(512, self.num_classes))


    def forward(self, x):
        return self.network(x)

<h2> Last Model test

In [69]:
class EmotionRecognitionModel(nn.Module):
    def __init__(self,  num_classes=7, in_channels = 1, lr = 0.01,  dropout = 0.5, num_hidden = 4096, model_name = "EmotionRecognitionModel"):
        super(EmotionRecognitionModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout1 = nn.Dropout(dropout)
        self.fc1 = nn.Linear(64 * 10 * 10, 128)
        self.dropout2 = nn.Dropout(dropout)
        self.fc2 = nn.Linear(128, 7)
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.lr = lr
        self.dropout = dropout
        self.num_hidden = num_hidden
        self.model_name = model_name


    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.dropout1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.dropout1(x)
        x = x.reshape(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)
        return x

<h4> Create Folder </h4>

In [70]:
def create_folder():
    date = datetime.datetime.now().strftime("%m-%d")
    
    folder_name = "modele"
    if not os.path.exists(folder_name): os.makedirs(folder_name)

    folder_name = f"modele/hyperparameters_search_{date}"
    if not os.path.exists(folder_name): os.makedirs(folder_name)
    
    folder_name = "graphs"
    if not os.path.exists(folder_name): os.makedirs(folder_name)

    folder_name = f"graphs/hyperparameters_search_{date}"
    if not os.path.exists(folder_name): os.makedirs(folder_name)
    
    return date

<h5> Save model </h5>

In [71]:
def save_model(date, train_loss_history, train_acc_history, val_loss_history, val_acc_history, model, optimizer, epochs) :
    nom_fichier = f"_tr-acc{(train_acc_history[-1]*100):.1f}_val-acc{(val_acc_history[-1]*100):.1f}"
    file_path = f'modele/cross_val_{date}/{nom_fichier}.pth'
    
    torch.save({
        'epochs': epochs,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'lr': model.lr,
        'batch_size': model.batch_size,
        'dropout': model.dropout,
        'num_hidden': model.num_hidden,
        'num_classes': model.num_classes,
        'in_channels': model.in_channels,
        'train_loss': train_loss_history,
        'train_accuracy': train_acc_history,
        'val_loss': val_loss_history,
        'val_accuracy': val_acc_history,
    }, file_path)

In [72]:
def plot_loss_acc(train_loss_history, train_acc_history, val_loss_history, val_acc_history, model_name, date):

        plt.figure()
        plt.plot(train_loss_history, label='Train Loss')
        plt.plot(val_loss_history, label='Test Loss')
        plt.legend()
        plt.title(f"Graphe de perte pour {model_name}")	
        plt.savefig(f'graphs/hyperparameters_search_{date}/loss_{model_name}_acc_{val_acc_history[-1]}.png')

        plt.figure()
        plt.plot(train_acc_history, label='Train Accuracy')
        plt.plot(val_acc_history, label='Test Accuracy')
        plt.legend()
        plt.title(f"Graphe d'accuracy pour {model_name}")
        plt.savefig(f'graphs/hyperparameters_search_{date}/acc_{model_name}_acc_{val_acc_history[-1]}.png')

<h4>Load Data</h4>

In [73]:
def load_data():
    len_of_task = randint(3, 20)  # take some random length of time
    
    data = pd.read_csv('fer2013.csv')
    pixels = data['pixels'].tolist()
    width, height = 48, 48
    faces = []

    for pixel_sequence in pixels:
        face = [int(pixel) for pixel in pixel_sequence.split(' ')]
        face = np.asarray(face).reshape(width, height)
        faces.append(face.astype('float32'))

    faces = np.asarray(faces)
    faces = np.expand_dims(faces, -1)

    # Normalize the pixels
    faces /= 255.0

    # Emotion labels
    emotions = pd.get_dummies(data['emotion']).values

    # Convert to PyTorch tensors
    X = torch.tensor(faces, dtype=torch.float32)
    y = torch.tensor(emotions, dtype=torch.long)
    return X,y


In [74]:
def split_data (X,y, batch_size = 32) :
    #use train test split to split our data into 80% training 20% testing
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    # Créer des objets DataLoader pour les ensembles d'entraînement et de validation
    train_dataset = data_utils.TensorDataset(X_train, y_train)
    train_loader = data_utils.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    test_dataset = data_utils.TensorDataset(X_val, y_val)
    test_loader = data_utils.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
    
    return train_loader, test_loader, X_train, X_val, y_train, y_val


<h4> Train Data </h4>

In [75]:
def fit(model, train_loader, test_loader, date, optimizer, criterion, epochs=10):

    train_losses = []
    train_accuracies = []
    test_losses = []
    test_accuracies = []

    for epoch in tqdm(range(epochs), desc="Traitement en cours", bar_format="{l_bar}{bar:10}{r_bar}"):
        # Entraînement
        model.train()
        train_loss = 0
        correct_train = 0
        total_train = 0
        
        
        for inputs, targets in train_loader :
            inputs = inputs.permute(0, 3, 1, 2)  # Changez l'ordre des dimensions pour correspondre à l'entrée du modèle
            optimizer.zero_grad()
            outputs = model(inputs[:targets.size(0)])
            loss = criterion(outputs, torch.max(targets, 1)[1])
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total_train += targets.size(0) 
            correct_train += predicted.eq(torch.max(targets, 1)[1]).sum().item()

        # Calcul des métriques de performance pour l'ensemble d'entraînement
        train_loss /= len(train_loader.dataset)
        train_accuracy = correct_train / len(train_loader.dataset)

        # Évaluation
        model.eval()
        test_loss = 0
        correct = 0
        total = 0 
        correct_test = 0
        total_test = 0

        with torch.no_grad():
            for inputs, targets in test_loader:
                inputs = inputs.permute(0, 3, 1, 2)
                outputs = model(inputs[:targets.size(0)])  
                loss = criterion(outputs, torch.max(targets, 1)[1])

                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total_test += targets.size(0) 
                correct_test += predicted.eq(torch.max(targets, 1)[1]).sum().item()

        # Calcul des métriques de performance pour l'ensemble de test
        test_loss /= len(test_loader.dataset)
        test_accuracy = correct_test / len(test_loader.dataset)

        # Affichage des métriques de performance
        print(f'Epoch: {epoch}, Train Loss: {train_loss:.4f}, Train Acc: {train_accuracy:.4f}, Test Loss: {test_loss:.4f}, Test Acc: {test_accuracy:.4f}')

        # Stocker les métriques de performance pour chaque itération
        train_losses.append(train_loss)
        train_accuracies.append(train_accuracy)
        test_losses.append(test_loss)
        test_accuracies.append(test_accuracy)
    
    save_model(date, train_losses, train_accuracies, test_losses, test_accuracies, model, optimizer, epochs)
    plot_loss_acc( train_losses, train_accuracies, test_losses, test_accuracies, model.model_name, date)
        
    return train_losses, train_accuracies, test_losses, test_accuracies

def train_model(train_loader, test_loader, X_train, X_val, y_train, y_val, date, model_name = "Model48", lr=0.001, batch_size=32, dropout = 0.2, num_hidden = 4096,  epochs=10, save = False):
    if model_name == "Model48" :
        model = Model48(num_classes=7, in_channels=1 ,lr=lr, dropout=dropout, num_hidden=num_hidden)
    elif model_name == "EmotionRecognitionModel" :
        model = EmotionRecognitionModel(num_classes=7, in_channels=1 ,lr=lr, dropout=dropout, num_hidden=num_hidden)
    elif model_name == "ResNet" :
        model = ResNet9(num_classes=7, in_channels=1 ,lr=lr, dropout=dropout, num_hidden=num_hidden)
    else :
        raise NameError("Model name not found")
        
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=model.lr)

    if save == True :
        # run function fit
        train_losses, train_accuracies, test_losses, test_accuracies = fit(model, train_loader, test_loader, date, optimizer, criterion, epochs=epochs)
        return model, train_losses, train_accuracies, test_losses, test_accuracies
    else :
        return fit(model, train_loader, test_loader, date, optimizer, criterion, epochs=epochs)

In [76]:
def objective(trial, X, y, date):
    
    lr = trial.suggest_float('lr', 1e-5, 1e-1)
    epochs = trial.suggest_int('epochs', 5, 20)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64])
    dropout = trial.suggest_float('dropout', 0.2, 0.5)
    num_hidden = trial.suggest_categorical('num_hidden', [512, 1024, 2048, 4096])
    model_name = trial.suggest_categorical('model_name', ["Model48", "EmotionRecognitionModel", "ResNet"])

    print("hyperparameters: {}".format(trial.params))
    
    train_loader, test_loader, X_train, X_val, y_train, y_val = split_data(X, y, batch_size=batch_size)
    
    # Entraînement du modele avec la fonction train_model
    train_loss_history, train_acc_history, val_loss_history, val_acc_history = train_model(train_loader, test_loader, X_train, X_val, y_train, y_val, date, model_name = model_name, lr=lr, batch_size=batch_size, dropout = dropout, num_hidden = num_hidden,  epochs=epochs)
    return val_acc_history[-1]

In [77]:
X,y = load_data()
date = create_folder()

func = lambda trial: objective(trial, X, y, date)

study = optuna.create_study(direction = "maximize")
study.optimize(func, n_trials=10)

trial = study.best_trial
#print accuracy and best parameters
print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

[32m[I 2023-05-14 20:35:32,026][0m A new study created in memory with name: no-name-63dc95ca-395f-48d1-a036-5c827398f5bf[0m


hyperparameters: {'lr': 0.025761839231960088, 'epochs': 11, 'batch_size': 16, 'dropout': 0.22001646877195702, 'num_hidden': 2048, 'model_name': 'ResNet'}


Entraînement en cours: 100%|██████████| 1795/1795 [05:23<00:00,  5.54it/s]
Traitement en cours:   9%|▉         | 1/11 [05:42<57:09, 342.96s/it]

Epoch: 0, Train Loss: 0.8309, Train Acc: 0.2131, Test Loss: 0.1142, Test Acc: 0.2547


Entraînement en cours: 100%|██████████| 1795/1795 [06:19<00:00,  4.73it/s]
Traitement en cours:  18%|█▊        | 2/11 [12:22<56:25, 376.14s/it]

Epoch: 1, Train Loss: 0.1159, Train Acc: 0.2294, Test Loss: 0.1136, Test Acc: 0.2547




In [None]:
# # Tracer la courbe de perte
# plt.plot(val_loss_history_bp, label='HP Val loss')
# plt.plot(val_loss_history, label='Val Loss')
# plt.legend()
# plt.title('Loss')
# plt.xlabel('Epoch')
# plt.ylabel('Loss')
# plt.show()

# # Tracer la courbe d'accuracy
# plt.plot(val_acc_history_bp, label='HP Val Accuracy')
# plt.plot(val_acc_history, label='Val Accuracy')
# plt.legend()
# plt.title('Accuracy')
# plt.xlabel('Epoch')
# plt.ylabel('Accuracy')
# plt.show()
