# Implementation of the paper *A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification* by Zhang and Wallace [(2015)](https://arxiv.org/pdf/1510.03820.pdf)

The paper proposes heuristics for using convolutional neural networks (CNNs) to classify sentences. These networks are built around a single convolutional layer.

The input of the network is a matrix in which each row is the word embedding of one word of the sentence. The authors consider three embeddings:
- One-hot encoding
- Google's Word2Vec
- GloVe
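
As a rough illustration of the input described above, the sketch below stacks one embedding vector per word to form the sentence matrix, using a toy one-hot embedding. The vocabulary and the helper names (`one_hot`, `sentence_matrix`) are hypothetical and only serve the example.

```python
# Toy vocabulary, made up for the illustration.
vocab = ["film", "très", "bon", "mauvais"]
word_to_index = {word: index for index, word in enumerate(vocab)}


def one_hot(word):
    """Return the one-hot vector of `word` over the toy vocabulary."""
    vector = [0.0] * len(vocab)
    vector[word_to_index[word]] = 1.0
    return vector


def sentence_matrix(tokens):
    """Stack one embedding per token: len(tokens) rows, embedding_dim columns."""
    return [one_hot(token) for token in tokens]


matrix = sentence_matrix(["film", "très", "bon"])
for row in matrix:
    print(row)
```

With Word2Vec or GloVe, only the lookup changes: each row becomes a dense vector instead of a one-hot vector.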


We will try to implement the model of Zhang and Wallace (2015) on three datasets:
- allocine_review
- flue
- orange_sum

The `allocine_review` dataset is directly available through the [`datasets`](https://github.com/huggingface/datasets) library from `huggingface`.

### CNN implementation

In [None]:
!pip install datasets
!pip install torch
!pip install gensim
!pip install -U spaCy
!python -m spacy download fr_core_news_sm

In [1]:
from datasets import load_dataset

In [2]:
import torch
import spacy 
import nltk
import re
import gensim
from spacy import displacy
from math import floor
import numpy as np
import os
import pandas as pd 
import random 
import seaborn as sns
import itertools
from itertools import combinations
import torch.nn as nn
from torchtext import data    
from torchtext.vocab import Vectors

In [4]:
#Fixed seed so that results can be reproduced
SEED = 2019

BATCH_SIZE = 50 #Same as Zhang and Wallace (2015)
SENTENCE_SIZE = 67 #caution, may need changing: corresponds to 80% of the whole training set after tokenization

#Torch
torch.manual_seed(SEED)

#Cuda
torch.backends.cudnn.deterministic = True  
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  
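
The comment on `SENTENCE_SIZE` suggests that 67 was chosen so that 80% of the tokenized training reviews fit without truncation. Assuming that reading, a sketch of how such a length could be derived is given below. The function name `length_covering` and the token counts are made up for the example.

```python
import math


def length_covering(lengths, fraction=0.8):
    """Smallest length L such that at least `fraction` of the sequences have length <= L."""
    ordered = sorted(lengths)
    rank = math.ceil(fraction * len(ordered))  # 1-based rank in the sorted list
    return ordered[rank - 1]


# Hypothetical number of tokens per review after tokenization.
token_counts = [12, 45, 67, 30, 150, 8, 52, 61, 90, 25]
sentence_size = length_covering(token_counts, 0.8)
print(sentence_size)
```

Longer reviews are truncated to this length and shorter ones are padded, which is what `fix_length` does in the `torchtext` field.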

In [5]:
from google.colab import drive
drive.mount('/content/drive')
dossier_donnees = "/content/drive/My Drive/projet_nlp"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Downloading additional data and initializing spaCy

- Download the French Word2Vec model
- Download the *allocine_review* dataset
- Set up spaCy
- Create a word2vec.txt file containing the embedding so that it can be opened with `Vectors` from `torchtext`

In [5]:
model_fr = gensim.models.KeyedVectors.load_word2vec_format("https://s3.us-east-2.amazonaws.com/embeddings.net/embeddings/frWac_non_lem_no_postag_no_phrase_200_skip_cut100.bin",binary=True, unicode_errors='ignore')

In [6]:
dataset_allocine = load_dataset("allocine")

Downloading and preparing dataset allocine_dataset/allocine (download: 63.54 MiB, generated: 109.12 MiB, post-processed: Unknown size, total: 172.66 MiB) to /root/.cache/huggingface/datasets/allocine_dataset/allocine/1.0.0/bbee2ebb45a067891973b91ebdd40a93598d1e2dd5710b6714cdc2cd81d0ed65...


Dataset allocine_dataset downloaded and prepared to /root/.cache/huggingface/datasets/allocine_dataset/allocine/1.0.0/bbee2ebb45a067891973b91ebdd40a93598d1e2dd5710b6714cdc2cd81d0ed65. Subsequent calls will reuse this data.


In [7]:
#If you get OSError: [E050] Can't find model 'fr_core_news_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory on 
#Colab, just restart the notebook --> Ctrl + M
nlp = spacy.load('fr_core_news_sm', disable=["tagger", "parser","ner"])

#### Creating the word2Vec.txt file and the files to be read by `torchtext`

This step is necessary because the French Word2Vec embedding is not available in `torchtext`.

In [38]:
model_fr = gensim.models.KeyedVectors.load_word2vec_format("https://s3.us-east-2.amazonaws.com/embeddings.net/embeddings/frWac_non_lem_no_postag_no_phrase_200_skip_cut100.bin",binary=True, unicode_errors='ignore')
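#gensim >= 4 exposes the word list as `index_to_key`; older versions use `index2word`
name_embedding = list(getattr(model_fr, "index_to_key", None) or model_fr.index2word)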

In [39]:
vector_embedding = model_fr.vectors #`syn0` is the deprecated name of this attribute


In [None]:
#One line per word: the word followed by the components of its vector, separated by spaces
with open(dossier_donnees + "/table/" + "word2vec_extra.txt", "w", encoding = "utf-8") as f:
    for word, vector in zip(name_embedding, vector_embedding):
        f.write(word + ' ' + " ".join(str(value) for value in vector) + " \n")
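
To make the layout of that text file concrete, here is a small self-contained sketch that writes a toy embedding table in the same "word, then space-separated components" layout and reads it back. The words, the values and the file name are invented for the example. It only illustrates the format and does not use `torchtext`.

```python
import os
import tempfile

# Toy embedding table; the words and values are made up.
toy_vectors = {"film": [0.1, -0.2, 0.3], "bon": [0.5, 0.0, -0.4]}

path = os.path.join(tempfile.mkdtemp(), "toy_vectors.txt")

# One line per word: the token, then its components separated by single spaces.
with open(path, "w", encoding="utf-8") as handle:
    for word, vector in toy_vectors.items():
        handle.write(word + " " + " ".join(str(value) for value in vector) + "\n")

# Read the file back to check that the round trip preserves the table.
loaded = {}
with open(path, encoding="utf-8") as handle:
    for line in handle:
        token, *components = line.rstrip().split(" ")
        loaded[token] = [float(value) for value in components]

print(loaded)
```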

In [None]:
pd.DataFrame.from_dict(dataset_allocine["train"]).to_csv(dossier_donnees + "/table/" + "allocine_train.csv",header = True, index = False)
pd.DataFrame.from_dict(dataset_allocine["validation"]).to_csv(dossier_donnees + "/table/" + "allocine_validation.csv",header = True, index = False)
pd.DataFrame.from_dict(dataset_allocine["test"]).to_csv(dossier_donnees + "/table/" + "allocine_test.csv",header = True, index = False)

### Tokenizing the dataset

In [6]:
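def tokenizer(exemple,nom_col):
    #nom_col is unused; it is kept so that existing calls keep working
    #keep the lemma of alphabetic tokens that are not stop words
    return [X.lemma_ for X in nlp(exemple) if X.is_alpha and not X.is_stop]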

In [7]:
model_fr = gensim.models.KeyedVectors.load_word2vec_format("https://s3.us-east-2.amazonaws.com/embeddings.net/embeddings/frWac_non_lem_no_postag_no_phrase_200_skip_cut100.bin",binary=True, unicode_errors='ignore')
nlp = spacy.load('fr_core_news_sm', disable=["tagger", "parser","ner"])
vector_embedding = model_fr.vectors #`syn0` is the deprecated name of this attribute


In [8]:
vectors = Vectors(name= dossier_donnees + "/table/" + "word2vec_extra.txt")
text = data.Field(tokenize= lambda x : tokenizer(x,"review"), lower = True, fix_length = SENTENCE_SIZE)
label = data.LabelField(dtype = torch.float,batch_first=True)

In [9]:
fields = {'review' : ('t',text), 'label' : ('l',label)}

In [10]:
train_data, valid_data, test_data = data.TabularDataset.splits(
                                        path = dossier_donnees + "/table/",
                                        train = 'allocine_train.csv',
                                        validation = 'allocine_validation.csv',
                                        test = 'allocine_test.csv',
                                        format = 'csv',
                                        fields = fields,
                                        skip_header = False)

In [17]:
text.build_vocab(train_data,min_freq=3,vectors = vectors)  #Is min_freq a hyperparameter ?
label.build_vocab(train_data)
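
The effect of `min_freq` can be illustrated with a small stand-alone sketch: tokens seen fewer than `min_freq` times in the training data are left out of the vocabulary and will later be treated as unknown. The helper `build_vocab` below is hypothetical and only mimics the idea; it is not the `torchtext` implementation. The special tokens `<unk>` and `<pad>` are the default names used by `torchtext` fields.

```python
from collections import Counter


def build_vocab(tokenized_texts, min_freq=3, specials=("<unk>", "<pad>")):
    """Map each sufficiently frequent token to an index, special tokens first."""
    counts = Counter(token for tokens in tokenized_texts for token in tokens)
    kept = sorted(
        (token for token, count in counts.items() if count >= min_freq),
        key=lambda token: (-counts[token], token),  # most frequent first, then alphabetical
    )
    itos = list(specials) + kept
    return {token: index for index, token in enumerate(itos)}


# Toy tokenized corpus, made up for the example.
corpus = [["bon", "film"], ["bon", "acteur"], ["bon", "film"], ["film", "long"]]
stoi = build_vocab(corpus, min_freq=3)
print(stoi)
```

A larger `min_freq` shrinks the embedding table and sends more words to `<unk>`, so it does behave like a hyperparameter of the preprocessing.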

In [18]:
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    sort = False, #don't sort test/validation data
    batch_size= BATCH_SIZE,
    device = device
    )

Padding and unknown tokens still need to be handled properly here, later on!
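
As a reminder of what has to be handled, the sketch below shows the two mechanisms in isolation: a sentence is truncated or padded to a fixed length, and any token missing from the vocabulary is mapped to the unknown index. The function `numericalize` and the toy vocabulary are hypothetical.

```python
def numericalize(tokens, stoi, fix_length, unk="<unk>", pad="<pad>"):
    """Truncate or pad `tokens` to `fix_length`, then map tokens to indices."""
    clipped = tokens[:fix_length]
    padded = clipped + [pad] * (fix_length - len(clipped))
    return [stoi.get(token, stoi[unk]) for token in padded]


# Toy vocabulary, made up for the example.
stoi = {"<unk>": 0, "<pad>": 1, "bon": 2, "film": 3}

ids = numericalize(["bon", "film", "inconnu"], stoi, fix_length=5)
print(ids)
```

In the model, the rows of the embedding matrix at the `<unk>` and `<pad>` indices are the vectors that every unknown word and every padding position receive, so their values matter for the convolution and the max pooling.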

### CNN !

In [20]:
class classifier(nn.Module):
    
    #define all the layers used in model
    def __init__(self, wv, no_words, embedding_dim, nb_filter, height_filter, output_dim, dropout):
        
        #Constructor
        super().__init__()          
        
        #embedding layer
        self.embedding = nn.Embedding.from_pretrained(wv)
        
        #Do not forget to add a view!
        #Convolutional layer
        #it uses the initialization proposed by Kaiming He et al.
        self.conv1 = nn.Sequential(
                nn.Conv2d(1,nb_filter,(height_filter,embedding_dim)),
                nn.ReLU(),
                nn.MaxPool2d((no_words - height_filter + 1,1), stride = 1),
            )
        
        
        self.fc = nn.Linear(nb_filter, output_dim)
    
        self.sm = nn.Softmax(dim = 1)  

        self.dp = nn.Dropout(p = dropout)


    def forward(self,text):
        x = self.embedding(text) #[nb_words_in_sentences, nb_batch, embedding_dim]
        x = x.transpose(1,0).unsqueeze(1) #[nb_batch, nb_channel = 1, nb_words_in_sentences, embedding_dim]
        x = self.conv1(x) #[nb_batch, nb_filter, 1, 1] (conv over the whole width, then max pooling over all positions)
        x = x.flatten(start_dim = 1) #[nb_batch, nb_filter], also valid for a batch of size 1
        x = self.dp(x)
        x = self.fc(x) #[nb_batch, output_dim]
        x = self.sm(x) #class probabilities
        return x
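
The shapes in `forward` follow from a simple rule: a filter of height `h` slid over a sentence of `n` words produces `n - h + 1` activations, and the max pooling keeps only the largest one. The plain-Python sketch below reproduces this for a single filter on a toy sentence. The function name and all the numbers are made up for the illustration and do not come from the notebook.

```python
def conv_max_over_time(sentence, kernel, bias=0.0):
    """Slide a filter spanning len(kernel) words over the sentence, apply ReLU, keep the max."""
    height = len(kernel)
    activations = []
    for start in range(len(sentence) - height + 1):
        window = sentence[start:start + height]
        value = bias + sum(
            weight * component
            for kernel_row, word_vector in zip(kernel, window)
            for weight, component in zip(kernel_row, word_vector)
        )
        activations.append(max(0.0, value))  # ReLU
    return activations, max(activations)


# Toy sentence of 4 words with 2-dimensional embeddings, and a filter of height 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]

activations, pooled = conv_max_over_time(sentence, kernel)
print(len(activations), activations, pooled)
```

With `nb_filter` filters, the pooled values form a vector of length `nb_filter`, which is the input of the final linear layer.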

In [21]:
class classifier3F(nn.Module):
    #TODO : remove embedding_dim using wv.shape[1]
    #define all the layers used in model
    def __init__(self, wv, no_words, embedding_dim, nb_filter, height_filter, output_dim, dropout):
        
        #Constructor
        super().__init__()          
        
        #embedding layer
        self.embedding = nn.Embedding.from_pretrained(wv)
        
        #Do not forget to add a view!
        #Convolutional layer
        #it uses the initialization proposed by Kaiming He et al.

        self.conv = nn.ModuleList()

        for height in height_filter:
          conv_lay = nn.Sequential(
                nn.Conv2d(1,nb_filter,(height,embedding_dim)),
                nn.ReLU(),
                nn.MaxPool2d((no_words - height + 1,1), stride = 1),
            )
          self.conv.append(conv_lay)

        self.fc = nn.Linear(len(height_filter)*nb_filter, output_dim)
    
        self.sm = nn.Softmax(dim = 1)  

        self.dp = nn.Dropout(p = dropout)


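    def forward(self,text):
        x = self.embedding(text) #[nb_words_in_sentences, nb_batch, embedding_dim]
        x = x.transpose(1,0).unsqueeze(1) #[nb_batch, nb_channel = 1, nb_words_in_sentences, embedding_dim]
        x = [conv(x).flatten(start_dim = 1) for conv in self.conv] #one [nb_batch, nb_filter] tensor per filter height
        x = torch.cat(x, dim = 1) #[nb_batch, len(height_filter)*nb_filter]
        x = self.dp(x)
        x = self.fc(x) #[nb_batch, output_dim]
        x = self.sm(x) #class probabilities
        return x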

In [22]:
#define metric
def binary_accuracy(preds, y):
    #round predictions to the closest integer
    rounded_preds = torch.round(preds[:,1])    
    correct = (rounded_preds == y).float() 
    acc = correct.sum() / len(correct)
    return acc

In [23]:
def train(model, iterator, optimizer, criterion):
    
    #initialize epoch 
    epoch_loss = 0
    epoch_acc = 0
    
    #set the model in training phase
    model.train()  
    
    for batch in iterator:
        
        #reset the gradients before processing each batch
        #each batch provides an estimate of the gradient of the cost with respect to the parameters
        optimizer.zero_grad()   
        
        #retrieve the token indices of the batch
        text = batch.t

        #class probabilities, shape [nb_batch, 2]
        predictions = model(text)
        
        #compute the loss
        loss = criterion(predictions, batch.l.long())        
        
        #compute the binary accuracy
        acc = binary_accuracy(predictions, batch.l)   
        
        #backpropagate the loss and compute the gradients
        loss.backward()       
        
        #update the weights
        optimizer.step()      
        
        #loss and accuracy
        epoch_loss += loss.item()  
        epoch_acc += acc.item()    
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [24]:
def evaluate(model, iterator, criterion):
    
    #initialize every epoch
    epoch_loss = 0
    epoch_acc = 0

    #deactivating dropout layers
    model.eval()
    
    #deactivates autograd
    with torch.no_grad():
    
        for batch in iterator:
        
            #retrieve the token indices of the batch
            text = batch.t
            
            #class probabilities, shape [nb_batch, 2]
            predictions = model(text)
            
            #compute loss and accuracy
            loss = criterion(predictions, batch.l.long())
            acc = binary_accuracy(predictions, batch.l)
            
            #keep track of loss and accuracy
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator) , epoch_acc / len(iterator)

In [25]:
def train_BCE(model, iterator, optimizer, criterion):
    
    #initialize epoch 
    epoch_loss = 0
    epoch_acc = 0
    
    #set the model in training phase
    model.train()  
    
    for batch in iterator:
        
        #reset the gradients before processing each batch
        #each batch provides an estimate of the gradient of the cost with respect to the parameters
        optimizer.zero_grad()   
        
        #retrieve the token indices of the batch
        text = batch.t

        #class probabilities, shape [nb_batch, 2]
        predictions = model(text)
        
        #compute the loss on the probability of class 1
        loss = criterion(predictions[:,1], batch.l)        
        
        #compute the binary accuracy
        acc = binary_accuracy(predictions, batch.l)   
        
        #backpropagate the loss and compute the gradients
        loss.backward()       
        
        #update the weights
        optimizer.step()      
        
        #loss and accuracy
        epoch_loss += loss.item()  
        epoch_acc += acc.item()    
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [26]:
def evaluate_BCE(model, iterator, criterion):
    
    #initialize every epoch
    epoch_loss = 0
    epoch_acc = 0

    #deactivating dropout layers
    model.eval()
    
    #deactivates autograd
    with torch.no_grad():
    
        for batch in iterator:
        
            #retrieve the token indices of the batch
            text = batch.t
            
            #class probabilities, shape [nb_batch, 2]
            predictions = model(text)
            
            #compute loss and accuracy
            loss = criterion(predictions[:,1], batch.l)
            acc = binary_accuracy(predictions, batch.l)
            
            #keep track of loss and accuracy
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

### Training

#### Varying the filter widths in a wide neighborhood of the optimal size 2


In [None]:
N_EPOCHS = 50
best_valid_loss = float('inf')

list_model = list(itertools.chain.from_iterable([list(combinations(range(1,5),i)) for i in range(1,4)]))

pd.DataFrame({"filter" : "", "btl" : 0,"bta" : 0, "bvl" : 0, "bva" : 0}, index = [0]).to_csv(dossier_donnees + "/resultat_multi_filtre")

for filtre in list_model:
  model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],400,filtre,2,0.5) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.CrossEntropyLoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'saved_weights.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
  
  pd.DataFrame({"filter": str(filtre) , "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees + "/resultat_multi_filtre",mode='a', header=False)

#change the sentence size?
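
To see which configurations the `list_model` expression in the cell above enumerates, the same `itertools` construction can be run on its own. The wrapper name `filter_width_sets` is hypothetical.

```python
from itertools import chain, combinations


def filter_width_sets(widths, max_filters):
    """All sets of 1 to `max_filters` distinct widths, smallest sets first."""
    return list(chain.from_iterable(
        combinations(widths, size) for size in range(1, max_filters + 1)
    ))


list_model = filter_width_sets(range(1, 5), 3)
print(len(list_model))
print(list_model[:6])
```

Since `combinations` never repeats an element, configurations such as `(1,1)` or `(1,2,2)` cannot come out of this enumeration and have to be listed by hand.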

#### Varying the filter widths close to the optimal size 2

In [None]:
N_EPOCHS = 50
best_valid_loss = float('inf')

list_model = [(1,1),(1,1,1),(1,2),(1,1,2),(1,2,2)]

pd.DataFrame({"filter" : "", "btl" : 0,"bta" : 0, "bvl" : 0, "bva" : 0}, index = [0]).to_csv(dossier_donnees + "/resultat_multi_filtre_2")

for filtre in list_model:
  model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],400,filtre,2,0.5) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.CrossEntropyLoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'saved_weights.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
  
  pd.DataFrame({"filter": str(filtre) , "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees + "/resultat_multi_filtre_2",mode='a', header=False)

#change the sentence size?
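
The experiment cells all repeat the same pattern: train for a number of epochs and keep the metrics of the epoch with the lowest validation loss. A possible way to factor this out is sketched below. `run_experiment` is a hypothetical helper, and the two iterators replay invented (loss, accuracy) pairs in place of real calls to `train` and `evaluate`.

```python
def run_experiment(train_step, eval_step, n_epochs):
    """Run n_epochs and return the metrics of the epoch with the lowest validation loss."""
    best = {"epoch": None, "btl": None, "bta": None, "bvl": float("inf"), "bva": None}
    for epoch in range(n_epochs):
        train_loss, train_acc = train_step()
        valid_loss, valid_acc = eval_step()
        if valid_loss < best["bvl"]:
            best = {"epoch": epoch, "btl": train_loss, "bta": train_acc,
                    "bvl": valid_loss, "bva": valid_acc}
    return best


# Stand-ins for train(...) and evaluate(...): made-up (loss, accuracy) pairs.
train_history = iter([(0.9, 0.60), (0.5, 0.80), (0.3, 0.90)])
valid_history = iter([(0.8, 0.65), (0.4, 0.85), (0.45, 0.84)])

best = run_experiment(lambda: next(train_history), lambda: next(valid_history), n_epochs=3)
print(best)
```

In the notebook, the two callables would wrap `train(model, train_iterator, optimizer, criterion)` and `evaluate(model, valid_iterator, criterion)`. Starting every "best" value at `None` rather than `float('inf')` also makes it obvious when no epoch has been recorded.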

#### Varying the number of filters, starting from the filter set (1,2)

The numbers of filters tested are [10,50,100,200,400,600,1000,2000], following Zhang and Wallace (2015).

In [None]:
N_EPOCHS = 50
best_valid_loss = float('inf')

pd.DataFrame({"nb_filter" : 0, "btl" : 0,"bta" : 0, "bvl" : 0, "bva" : 0}, index = [0]).to_csv(dossier_donnees + "/resultat_taille.filtre.csv")

for nb_filtre in [10,50,100,200,400,600,1000,2000]:
  model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],nb_filtre,(1,2),2,0.5) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.CrossEntropyLoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'saved_weights.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

  pd.DataFrame({"nb_filter" : nb_filtre, "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees + "/resultat_taille.filtre.csv", mode='a', header=False)  

In [None]:
N_EPOCHS = 50
best_valid_loss = float('inf')

for nb_filtre in [1000,1500,2000]:
  model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],nb_filtre,(1,2),2,0.5) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.CrossEntropyLoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'saved_weights.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

  pd.DataFrame({"nb_filter" : nb_filtre, "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees + "/resultat_taille.filtre.csv", mode='a', header=False)  
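
The result files above are created with a first row of zeros so that later rows can be appended without a header. One alternative, sketched here with the standard `csv` module, writes the header only when the file does not exist yet, which avoids the placeholder row. The helper `append_result`, the file name and the numeric values are invented for the example.

```python
import csv
import os
import tempfile

FIELDS = ["nb_filter", "btl", "bta", "bvl", "bva"]


def append_result(path, row):
    """Append one result row, writing the header only when the file is created."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)


# Made-up results, written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "resultat_demo.csv")
append_result(path, {"nb_filter": 10, "btl": 0.5, "bta": 0.8, "bvl": 0.6, "bva": 0.75})
append_result(path, {"nb_filter": 50, "btl": 0.4, "bta": 0.85, "bvl": 0.5, "bva": 0.78})

with open(path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(rows)
```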

#### Varying the regularization

#### Dropout between 0.1 and 0.5

In [None]:
N_EPOCHS = 30
best_valid_loss = float('inf')

pd.DataFrame({"dropout" : 0, "btl" : 0,"bta" : 0, "bvl" : 0, "bva" : 0}, index = [0]).to_csv(dossier_donnees + "/resultat_dropout.csv")

for dropout in [0.1,0.2,0.3,0.4,0.5]:
  model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],200,(1,2),2,dropout) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.CrossEntropyLoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'saved_weights.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

  pd.DataFrame({"dropout" : dropout, "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees +  "/resultat_dropout.csv", mode='a', header=False)  

#### Dropout between 0.6 and 0.9

In [None]:
N_EPOCHS = 30
best_valid_loss = float('inf')

for dropout in [0.5,0.6,0.7,0.8,0.9]:
  model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],200,(1,2),2,dropout) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.CrossEntropyLoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'saved_weights.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

  pd.DataFrame({"dropout" : dropout, "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees +  "/resultat_dropout.csv", mode='a', header=False)  

## Corrected version

### Version with BCELoss
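
One plausible reason for this correction (an interpretation, since the notebook does not state it): the models end with a `Softmax` layer, whereas `nn.CrossEntropyLoss` applies a log-softmax to its input itself and therefore expects raw scores. When it receives probabilities in [0, 1], the loss cannot go below log(1 + e^-1), even for a perfect prediction. `nn.BCELoss`, in contrast, expects probabilities. The pure-Python sketch below reproduces both computations for a single example; the function names are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of scores."""
    highest = max(scores)
    log_norm = highest + math.log(sum(math.exp(score - highest) for score in scores))
    return [score - log_norm for score in scores]


def cross_entropy_from_scores(scores, target):
    """Cross-entropy of one example when log-softmax is applied to `scores` first."""
    return -log_softmax(scores)[target]


def binary_cross_entropy(prob, target):
    """Binary cross-entropy of one example, given the probability of class 1."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# A perfect prediction expressed as probabilities, wrongly fed as if it were scores.
loss_floor = cross_entropy_from_scores([0.0, 1.0], target=1)

# Binary cross-entropy on a confident and correct probability.
bce = binary_cross_entropy(0.99, target=1)

print(loss_floor, bce)
```

The first value is about 0.313 and the second is about 0.01, which shows why the loss values of the earlier runs are not comparable with the ones obtained with `BCELoss`.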

In [None]:
N_EPOCHS = 30
best_valid_loss = float('inf')

model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],1000,(1,2),2,0.5) #check the difference between syn0 and the other choice. 
#I think it deals with negative sampling
print(model)
import torch.optim as optim

criterion = nn.BCELoss()
#optimizer = optim.Adam(model.parameters())
optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

model = model.to(device)
criterion = criterion.to(device)

best_valid_loss = float('inf')
best_acc = float('inf')
best_train_loss = float('inf')
best_train_acc = float('inf')

for epoch in range(N_EPOCHS):
    
    #train the model
    train_loss, train_acc = train_BCE(model, train_iterator, optimizer, criterion)
    
    #evaluate the model
    valid_loss, valid_acc = evaluate_BCE(model, valid_iterator, criterion)
    
    #save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_acc = valid_acc
        best_train_loss = train_loss
        best_train_acc = train_acc
        torch.save(model.state_dict(), 'final_allocine.pt')
    
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')



### Optimal filter size with BCELoss and the correct embedding?

In [42]:
text.vocab.vectors.shape

torch.Size([49075, 200])
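
The shape above shows that `text.vocab.vectors` has one row per vocabulary entry. The indices produced by the `text` field are positions in that vocabulary, so the embedding matrix must follow the same order. The full Word2Vec matrix passed in the earlier runs is ordered like the Word2Vec file, which is presumably what the heading means by the correct embedding. The sketch below illustrates the alignment on toy data; `align_vectors` and all the values are hypothetical.

```python
def align_vectors(itos, pretrained, dim):
    """Build one row per vocabulary entry; tokens without a vector get zeros."""
    return [pretrained.get(token, [0.0] * dim) for token in itos]


# Toy pretrained table (in file order) and toy vocabulary (in index order).
pretrained = {"film": [0.1, 0.2], "bon": [0.3, 0.4], "chat": [0.5, 0.6]}
itos = ["<unk>", "<pad>", "bon", "film"]

aligned = align_vectors(itos, pretrained, dim=2)
full_table = list(pretrained.values())

index_of_bon = itos.index("bon")
print(aligned[index_of_bon])     # vector of "bon"
print(full_table[index_of_bon])  # row 2 of the unaligned table: vector of "chat"
```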

In [None]:
N_EPOCHS = 30
best_valid_loss = float('inf')

model = classifier3F(text.vocab.vectors,SENTENCE_SIZE,vector_embedding.shape[1],600,(1,2),2,0.5) #check the difference between syn0 and the other choice. 
#I think it deals with negative sampling
print(model)
import torch.optim as optim

criterion = nn.BCELoss()
#optimizer = optim.Adam(model.parameters())
optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

model = model.to(device)
criterion = criterion.to(device)

best_valid_loss = float('inf')
best_acc = float('inf')
best_train_loss = float('inf')
best_train_acc = float('inf')

for epoch in range(N_EPOCHS):
    
    #train the model
    train_loss, train_acc = train_BCE(model, train_iterator, optimizer, criterion)
    
    #evaluate the model
    valid_loss, valid_acc = evaluate_BCE(model, valid_iterator, criterion)
    
    #save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_acc = valid_acc
        best_train_loss = train_loss
        best_train_acc = train_acc
        torch.save(model.state_dict(), 'final_allocine.pt')
    
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')


#save the iterators

### Optimal filter width

In [None]:
N_EPOCHS = 30
best_valid_loss = float('inf')

pd.DataFrame({"filtre" : 0, "btl" : 0,"bta" : 0, "bvl" : 0, "bva" : 0}, index = [0]).to_csv(dossier_donnees + "/resultat_larg_filtre.csv")

for largeur in range(1,11):
  model = classifier3F(text.vocab.vectors,SENTENCE_SIZE,vector_embedding.shape[1],100,tuple([largeur]),2,0.5) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.BCELoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train_BCE(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate_BCE(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          torch.save(model.state_dict(), 'final_allocine.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
  
  pd.DataFrame({"filtre" : largeur, "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees +  "/resultat_larg_filtre.csv", mode='a', header=False)  
#save the iterators

### Several filters around the optimal width (2)

In [None]:
list_model = list(itertools.chain.from_iterable([list(combinations(range(1,5),i)) for i in range(1,5)]))
N_EPOCHS = 30
best_valid_loss = float('inf')

pd.DataFrame({"filtre" : "0", "btl" : 0,"bta" : 0, "bvl" : 0, "bva" : 0}, index = [0]).to_csv(dossier_donnees + "/resultat_larg_filtre_diff_type.csv")

for largeur in list_model:
  model = classifier3F(text.vocab.vectors,SENTENCE_SIZE,vector_embedding.shape[1],400,largeur,2,0.5) #check the difference between syn0 and the other choice. 
  #I think it deals with negative sampling
  print(model)
  import torch.optim as optim

  criterion = nn.BCELoss()
  #optimizer = optim.Adam(model.parameters())
  optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0) #weight_decay : L2 penalisation !

  model = model.to(device)
  criterion = criterion.to(device)

  best_valid_loss = float('inf')
  best_acc = float('inf')
  best_train_loss = float('inf')
  best_train_acc = float('inf')

  for epoch in range(N_EPOCHS):
      
      #train the model
      train_loss, train_acc = train_BCE(model, train_iterator, optimizer, criterion)
      
      #evaluate the model
      valid_loss, valid_acc = evaluate_BCE(model, valid_iterator, criterion)
      
      #save the best model
      if valid_loss < best_valid_loss:
          best_valid_loss = valid_loss
          best_acc = valid_acc
          best_train_loss = train_loss
          best_train_acc = train_acc
          #torch.save(model.state_dict(), 'final_allocine.pt')
      
      print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
      print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
  
  pd.DataFrame({"filtre" : str(largeur), "btl" : best_train_loss,"bta" : best_train_acc, "bvl" : best_valid_loss, "bva" : best_acc}, index = [0]).to_csv(dossier_donnees +  "/resultat_larg_filtre_diff_type.csv", mode='a', header=False)  
#save the iterators

classifier3F(
  (embedding): Embedding(49075, 200)
  (conv): ModuleList(
    (0): Sequential(
      (0): Conv2d(1, 400, kernel_size=(1, 200), stride=(1, 1))
      (1): ReLU()
      (2): MaxPool2d(kernel_size=(67, 1), stride=1, padding=0, dilation=1, ceil_mode=False)
    )
  )
  (fc): Linear(in_features=400, out_features=2, bias=True)
  (sm): Softmax(dim=1)
  (dp): Dropout(p=0.5, inplace=False)
)
	Train Loss: 0.373 | Train Acc: 83.31%
	 Val. Loss: 0.285 |  Val. Acc: 88.14%


## Loading a saved model

In [None]:
model = classifier3F(torch.from_numpy(vector_embedding),SENTENCE_SIZE,vector_embedding.shape[1],1000,(1,2),2,0.5) #check the difference between syn0 and the other choice. 
weights = model.load_state_dict(torch.load(dossier_donnees + "/final_allocine.pt"))

In [None]:
model = model.to(device)

In [None]:
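pred = []

#evaluation mode (dropout disabled), without building the autograd graph
model.eval()
with torch.no_grad():
    for element in valid_iterator:
        pred.append(model(element.t).cpu())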

RuntimeError: ignored

In [None]:
torch.cat([pred,torch.tensor([2,3]),torch.tensor([2,3])], dim = -1)

tensor([2., 3., 2., 3.])