--- 

# <center> Project: NLP ENSAE 
## <center> Intents Classification for Neural Text Generation

<center>Work done by : 

##### <center> Ali HAIDAR email : ali.haidar@polytechnique.edu
##### <center> François Bertholom   email : francois.bertholom@ensae.fr

---

## Import Libraries

In [None]:
import numpy as np 
import pandas as pd
from datasets import load_dataset
from keras_preprocessing.sequence import pad_sequences
from pytorch_pretrained_bert import BertTokenizer
from pytorch_pretrained_bert import BertModel
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from torch.nn.utils import clip_grad_norm_
from IPython.display import clear_output
from tabulate import tabulate


from keras.models import Model
from keras.optimizers import Adam  # shadows torch.optim.Adam imported above; the Keras optimizer is the one used below
from keras.layers import Input, Dense, Embedding, SpatialDropout1D, add, concatenate, Flatten
from keras.layers import CuDNNLSTM, LSTM, Bidirectional, GlobalMaxPooling1D, GlobalAveragePooling1D, Lambda, MaxPooling1D, AveragePooling1D, GRU, Reshape
from keras.preprocessing import text, sequence
from gensim.models import KeyedVectors
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers.schedules import PolynomialDecay
import keras.backend as K
from keras.losses import sparse_categorical_crossentropy

import Process
import Models

## Multi-label One-target classifier

In this section, we evaluate and compare multi-label one-target classifiers, each of which takes a single utterance as input and predicts one label for that utterance.

To assess the performance of these models, we use the SILICONE datasets, loaded through the Hugging Face datasets library.

We build three models based on BERT:

### BertMLP1Layer: 

The model uses the embedding layer of a pre-trained BERT model. The embedded sequence is passed through GlobalMaxPooling1D and GlobalAveragePooling1D, the two pooled vectors are concatenated, and the result is fed to a single neural network layer.

GlobalMaxPooling1D and GlobalAveragePooling1D are commonly used pooling operations in deep learning models, particularly for natural language processing (NLP) tasks.

In the case of NLP, we often have input sequences of variable lengths. The pooling operations allow us to aggregate the information from these sequences into a fixed-length vector that can be passed to subsequent layers of the neural network.

GlobalMaxPooling1D computes the maximum value from each feature dimension across the entire input sequence. This can be useful for capturing the most salient information in the input sequence.

GlobalAveragePooling1D computes the average value from each feature dimension across the entire input sequence. This can be useful for capturing the overall distribution of information in the input sequence.

Concatenating the outputs of GlobalMaxPooling1D and GlobalAveragePooling1D gives a fixed-length vector that carries both the per-feature maxima and the per-feature averages of the input sequence, which can yield a more informative representation than either pooling operation alone.
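
The concatenated pooling described above can be illustrated with NumPy. This is a minimal sketch of the operation itself, not the code in Models; the function name concat_global_pooling and the toy values are assumptions made for illustration.

```python
import numpy as np

def concat_global_pooling(token_embeddings):
    """Concatenate global max and global average pooling over the sequence axis.

    token_embeddings: array of shape (batch, seq_len, hidden)
    returns: array of shape (batch, 2 * hidden)
    """
    max_pooled = token_embeddings.max(axis=1)    # strongest activation per feature
    avg_pooled = token_embeddings.mean(axis=1)   # average activation per feature
    return np.concatenate([max_pooled, avg_pooled], axis=-1)

# Toy input: 1 utterance, 3 tokens, 2 features per token (illustrative values)
toy = np.array([[[1.0, 4.0],
                 [3.0, 0.0],
                 [2.0, 2.0]]])
pooled = concat_global_pooling(toy)
print(pooled.shape)  # (1, 4)
print(pooled)        # [[3. 4. 2. 2.]]
```

Whatever the sequence length, the output has twice the hidden size, which is what lets a fixed-size layer follow the pooling.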

The model is trained with the sparse_categorical_crossentropy loss. This is the usual cross-entropy for multi-class classification, applied to labels given as integer class indices rather than one-hot vectors.
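
To make the loss concrete, here is a minimal NumPy sketch of cross-entropy with integer labels. It assumes the model outputs probabilities (for example after a softmax); the function name sparse_categorical_crossentropy_np and the numbers are hypothetical.

```python
import numpy as np

def sparse_categorical_crossentropy_np(y_true, probs, eps=1e-7):
    """Per-example cross-entropy for integer labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    # Pick the probability assigned to the true class of each example
    return -np.log(probs[np.arange(len(y_true)), y_true])

y_true = np.array([0, 2])                 # integer class indices
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
losses = sparse_categorical_crossentropy_np(y_true, probs)
print(losses)  # approximately [0.357 0.223]
```

The loss for each example is the negative log of the probability given to its true class, so confident correct predictions give values close to 0.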

Furthermore, we evaluate the model's performance using the 0-1 accuracy metric: the number of utterances whose predicted label equals the true label, divided by the total number of utterances, expressed as a percentage. Each utterance has exactly one correct label here, so every prediction counts as either right or wrong.
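
The accuracy computation can be sketched as follows. The helper name accuracy_percent and the probabilities are hypothetical; the predicted label is taken as the class with the highest probability.

```python
import numpy as np

def accuracy_percent(probs, y_true):
    """Percentage of examples whose arg-max prediction equals the true label."""
    predicted = np.argmax(probs, axis=-1)
    return float(np.mean(predicted == y_true) * 100)

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.3, 0.4, 0.3],
                  [0.6, 0.3, 0.1]])
y_true = np.array([0, 2, 0, 0])
acc = accuracy_percent(probs, y_true)
print(acc)  # 75.0
```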

### BertMLP2Layer:

The only difference from BertMLP1Layer is an additional dense layer before the output layer.

### BertGRU:



<img src="GRUModel.png" alt="Bert GRU " />

In [None]:
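def masked_sparse_categorical_crossentropy(mask_value):
    """Build a sparse categorical cross-entropy that ignores targets equal to mask_value."""
    def loss_function(y_true, y_pred):
        # 1.0 where the target is a real label, 0.0 where it equals mask_value
        mask = K.cast(K.not_equal(y_true, mask_value), K.floatx())
        # Masked targets become class 0 so the lookup stays valid; their loss is zeroed below
        masked_true = K.cast(mask * K.cast(y_true, K.floatx()), "int32")
        loss = sparse_categorical_crossentropy(masked_true, y_pred)
        masked_loss = loss * mask
        # Average over the unmasked targets only
        return K.sum(masked_loss) / K.sum(mask)

    return loss_function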

def generate_result(dataset, model, embedding_matrix, multi_target = 0):
    
    train = pd.DataFrame(data=dataset['train'])
    val = pd.DataFrame(data=dataset['validation'])
    test = pd.DataFrame(data=dataset['test'])
    label = 'Label'
    
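    # Multi-target mode requires a Dialogue_ID column; datasets without one are skipped and reported as 0
    if multi_target == 1 and 'Dialogue_ID' not in train.columns:
        return 0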
    
    if multi_target == 1:
        train, val, test = Process.context(train.copy(), val.copy(), test.copy()) 
    
    X_train = train['Utterance']
    y_train = np.array(train[label]) 
    
    
    X_val = val['Utterance']
    y_val = np.array(val[label]) 
    
    X_test = test['Utterance']
    y_test = np.array(test[label]) 
    
    if multi_target == 1:
        y_train = np.array([[j for j in i] for i in y_train])
        y_val = np.array([[j for j in i] for i in y_val])
        y_test = np.array([[j for j in i] for i in y_test])
        # One output per target position in the context
        out = y_train.shape[1]
    else:
        out = 1
        
    n_classes = len(np.unique(y_train.reshape(-1)))
    
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
    
    train_tokens_ids = Process.tokenize(X_train, tokenizer)
    val_tokens_ids = Process.tokenize(X_val, tokenizer)
    test_tokens_ids = Process.tokenize(X_test, tokenizer)
    
    if(multi_target == 1):
        y_train_masks = Process.mask(y_train)
        y_val_masks = Process.mask(y_val)
        y_test_masks = Process.mask(y_test)
        
        y_train_masks = np.array(y_train_masks)
        y_val_masks = np.array(y_val_masks)
        y_test_masks = np.array(y_test_masks)
    
    NUM_TRAIN_STEPS = (len(train_tokens_ids) // BATCH_SIZE) * EPOCHS
    
    model = model.build_model(embedding_matrix, n_classes, out)
    
    lr_scheduler = PolynomialDecay(initial_learning_rate=5e-5, end_learning_rate=0.0, decay_steps=NUM_TRAIN_STEPS)
    opt = Adam(learning_rate=lr_scheduler, clipnorm=1)
    
    if(multi_target == 1):
        model.compile(loss=masked_sparse_categorical_crossentropy(-1), optimizer=opt)
    else:
        model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)
    
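    # Note: the best checkpoint is written to disk but not reloaded, so evaluation below uses the last-epoch weights
    earlyStopping = EarlyStopping(monitor='val_loss', patience=6, verbose=1, mode='min')
    mcp_save = ModelCheckpoint('.mdl_wts.hdf5', save_best_only=True, monitor='val_loss', mode='min')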
   
    model.fit(
        train_tokens_ids,
        y_train,
        validation_data=(val_tokens_ids, y_val),
        validation_batch_size=512,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
        verbose=1,
        callbacks=[earlyStopping, mcp_save]
    )
    
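    # Predicted class for every target position, flattened
    bert_predicted = np.argmax(model.predict(test_tokens_ids, batch_size=512), axis=-1).reshape(-1)
    y_test = y_test.reshape(-1)
    if multi_target == 1:
        # Select the positions flagged by Process.mask; the boolean cast guards against
        # a 0/1 integer mask being interpreted as indices
        test_mask = y_test_masks.reshape(-1).astype(bool)
        bert_predicted = bert_predicted[test_mask]
        y_test = y_test[test_mask]

    # 0-1 accuracy, in percent
    acc = (np.sum(bert_predicted == y_test) / len(y_test)) * 100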
    return acc
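
The learning-rate schedule configured in generate_result can be sketched in plain Python. This is a minimal sketch assuming the default power of 1 in Keras PolynomialDecay, which makes the decay linear; the function name polynomial_decay_lr and the dataset size used below are hypothetical.

```python
def polynomial_decay_lr(step, initial_lr=5e-5, end_lr=0.0, decay_steps=1000, power=1.0):
    """Learning rate after `step` optimizer updates (linear decay when power == 1)."""
    step = min(step, decay_steps)  # the rate stays at end_lr once decay_steps is reached
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr

# Hypothetical sizes, chosen only for illustration
n_train_examples = 3200
batch_size = 32
epochs = 10
num_train_steps = (n_train_examples // batch_size) * epochs  # same formula as NUM_TRAIN_STEPS

lr_start = polynomial_decay_lr(0, decay_steps=num_train_steps)
lr_half = polynomial_decay_lr(num_train_steps // 2, decay_steps=num_train_steps)
lr_end = polynomial_decay_lr(num_train_steps, decay_steps=num_train_steps)
print(lr_start, lr_half, lr_end)
```

Because the number of decay steps is derived from EPOCHS, a run that is stopped early by the EarlyStopping callback ends before the learning rate has reached zero.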


In [None]:
# Set hyperparameters
BATCH_SIZE = 32
EPOCHS = 100

# Get the embedding matrix for BERT
embedding_matrix = Models.get_bert_embed_matrix()

# Initialize a results dataframe
results = pd.DataFrame(columns=['model','dyda_da', 'dyda_e','maptask', 'meld_e', 'meld_s', 'mrda', 'oasis', 'sem'])

# Initialize a list of models to evaluate
models = [Models.BertMLP1Layer(), Models.BertMLP2Layers(), Models.BertGRU()]

# Loop through the models and datasets to generate results
for model in models:
    # Create a list to store results for this model
    res = [model.__class__.__name__]
    
    # Loop through the datasets
    for d in ['dyda_da', 'dyda_e' ,'maptask', 'meld_e', 'meld_s', 'mrda', 'oasis', 'sem']:
        # Load the dataset
        dataset = load_dataset('silicone',d)
        
        # Generate accuracy for this dataset and model
        acc = generate_result(dataset, model, embedding_matrix)
        
        # Print accuracy
        print("Accuracy on " + d + " :",acc)
        
        # Append accuracy to results list
        res.append(acc)
    
    # Append results for this model to the overall results dataframe
    results.loc[len(results)] = res
    
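# Save results to a csv file in the results/ directory, where they are read back later
import os
os.makedirs("results", exist_ok=True)
results.to_csv("results/MultiLabelOneTarget.csv", index=False)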


## Multi-label Multi-target classifier

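In this part, we predict labels from a context instead of a single utterance. For datasets that provide a Dialogue_ID column, Process.context builds the contexts, and the model outputs one label per target position. Target positions labelled -1 are treated as padding and are excluded from both the loss and the accuracy.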
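
The effect of masking on the loss and on the accuracy can be illustrated with NumPy. This is a minimal sketch under the assumption that padded targets are marked with -1, as in the loss used in this notebook; the function names and the toy values are hypothetical.

```python
import numpy as np

MASK_VALUE = -1  # label assumed to mark padded target positions

def masked_cross_entropy(y_true, probs, mask_value=MASK_VALUE, eps=1e-7):
    """Mean cross-entropy over positions whose label is not mask_value."""
    mask = y_true != mask_value
    safe_true = np.where(mask, y_true, 0)  # any valid class index works for padded slots
    picked = np.take_along_axis(probs, safe_true[..., None], axis=-1)[..., 0]
    losses = -np.log(np.clip(picked, eps, 1.0))
    return float((losses * mask).sum() / mask.sum())

def masked_accuracy(y_true, probs, mask_value=MASK_VALUE):
    """Accuracy in percent, computed on unmasked positions only."""
    mask = y_true != mask_value
    predicted = probs.argmax(axis=-1)
    return float((predicted[mask] == y_true[mask]).mean() * 100)

# One context with 3 target positions and 2 classes; the last position is padding
y_true = np.array([[1, 0, -1]])
probs = np.array([[[0.2, 0.8],
                   [0.6, 0.4],
                   [0.5, 0.5]]])
loss = masked_cross_entropy(y_true, probs)
acc = masked_accuracy(y_true, probs)
print(loss, acc)
```

Only the first two positions contribute: the loss is the mean of -log(0.8) and -log(0.6), and the padded position has no influence on either quantity.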


In [None]:
# Set hyperparameters
BATCH_SIZE = 32
EPOCHS = 100

# Get the embedding matrix for BERT
embedding_matrix = Models.get_bert_embed_matrix()

# Initialize a results dataframe
results = pd.DataFrame(columns=['model','dyda_da', 'dyda_e','maptask', 'meld_e', 'meld_s', 'mrda', 'oasis', 'sem'])

# Initialize a list of models to evaluate
models = [Models.BertMLP1Layer(), Models.BertMLP2Layers(), Models.BertGRU()]

# Loop through the models and datasets to generate results
for model in models:
    # Create a list to store results for this model
    res = [model.__class__.__name__]
    
    # Loop through the datasets
    for d in ['dyda_da', 'dyda_e' ,'maptask', 'meld_e', 'meld_s', 'mrda', 'oasis', 'sem']:
        # Load the dataset
        dataset = load_dataset('silicone',d)
        
        # Generate accuracy for this dataset and model
        acc = generate_result(dataset, model, embedding_matrix, multi_target = 1)
        
        # Print accuracy
        print("Accuracy on " + d + " :",acc)
        
        # Append accuracy to results list
        res.append(acc)
    
    # Append results for this model to the overall results dataframe
    results.loc[len(results)] = res
    
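# Save results to a csv file in the results/ directory, where they are read back later
import os
os.makedirs("results", exist_ok=True)
results.to_csv("results/MultiLabelMultiTarget.csv", index=False)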


In [25]:
multiLabelOneTarget = pd.read_csv("results/MultiLabelOneTarget.csv")
print(tabulate(multiLabelOneTarget, headers='keys', tablefmt='psql'))

+----+----------------+-----------+----------+-----------+----------+----------+---------+---------+---------+
|    | model          |   dyda_da |   dyda_e |   maptask |   meld_e |   meld_s |    mrda |   oasis |     sem |
|----+----------------+-----------+----------+-----------+----------+----------+---------+---------+---------|
|  0 | BertGRU        |   81.3178 |  84.7804 |   62.8542 |  62.1839 |  67.433  | 90.0194 | 66.7794 | 64.123  |
|  1 | BertMLP2Layers |   78.9147 |  84.5349 |   59.6406 |  60      |  65.977  | 89.8255 | 58.728  | 57.631  |
|  2 | BertMLP1Layer  |   76.3307 |  83.3204 |   53.1099 |  52.1839 |  60.6513 | 87.4337 | 50.6089 | 54.8975 |
+----+----------------+-----------+----------+-----------+----------+----------+---------+---------+---------+


In [26]:
multiLabelMultiTarget = pd.read_csv("results/MultiLabelMultiTarget.csv")
print(tabulate(multiLabelMultiTarget, headers='keys', tablefmt='psql'))

+----+----------------+-----------+----------+-----------+----------+----------+---------+---------+---------+
|    | model          |   dyda_da |   dyda_e |   maptask |   meld_e |   meld_s |    mrda |   oasis |     sem |
|----+----------------+-----------+----------+-----------+----------+----------+---------+---------+---------|
|  0 | BertGRU        |   54.031  |  76.5891 |         0 |  41.1494 |  41.2261 | 46.7292 |       0 | 26.9932 |
|  1 | BertMLP2Layers |   39.199  |  70.9173 |         0 |  30.2682 |  30.3448 | 45.223  |       0 | 28.4738 |
|  2 | BertMLP1Layer  |   34.4574 |  66.2145 |         0 |  30.2299 |  30.1533 | 46.8649 |       0 | 30.8656 |
+----+----------------+-----------+----------+-----------+----------+----------+---------+---------+---------+


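We conclude that the performance of our models degrades for context classification: for every model and every dataset evaluated in both settings, accuracy is lower in the multi-target setting than in the one-target setting. For example, BertGRU drops from 81.32% to 54.03% on dyda_da and from 90.02% to 46.73% on mrda. The 0 entries for maptask and oasis most likely reflect the early return in generate_result for datasets without a Dialogue_ID column rather than measured accuracies. BertGRU is the best model on all eight datasets in the one-target setting and remains the best on dyda_da, dyda_e, meld_e and meld_s in the multi-target setting, whereas BertMLP1Layer is ahead on mrda and sem.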