# Transformer

What is a Transformer?

A Transformer is a neural network architecture introduced by Vaswani et al. in 2017.
Without going into too much detail, the architecture combines a multi-head self-attention mechanism with an encoder-decoder structure. At publication it achieved state-of-the-art results on machine translation, outperforming models based on recurrent (RNN) or convolutional neural networks (CNN) both in evaluation score (BLEU) and in training time.
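
To make the self-attention idea more concrete, here is a minimal NumPy sketch of scaled dot-product attention for a single head. It is a simplification: the learned query/key/value projections and the multi-head split are left out, and the input `x` is random toy data, not real embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row maximum for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # similarity of every query with every key, scaled by sqrt(d_k)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # each row of weights is a probability distribution over the tokens
    weights = softmax(scores, axis=-1)
    # output is a weighted average of the value vectors
    return weights @ v, weights

# toy input: 4 tokens with 8-dimensional embeddings (assumed sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# self-attention: queries, keys and values all come from the same sequence
output, weights = scaled_dot_product_attention(x, x, x)
print(output.shape, weights.shape)
```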

The Transformer architecture has since largely replaced earlier NLP architectures such as RNNs.
The GPT model only uses the decoder of the Transformer structure (unidirectional), while **BERT** is based on the Transformer encoder (bidirectional).
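
One way to picture the difference between unidirectional and bidirectional models is the attention mask. The following sketch (with an assumed toy sequence length of 4) builds both masks: in the causal mask, token `i` may only attend to positions `0..i`, while in the bidirectional mask every token may attend to every other token.

```python
import numpy as np

seq_len = 4  # assumed toy sequence length

# bidirectional (encoder-style): every position can see every position
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# unidirectional (decoder-style): position i can only see positions <= i
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(bidirectional_mask.astype(int))
print(causal_mask.astype(int))
```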

Many Transformer-based NLP models were specifically created for transfer learning. Transfer learning describes an approach where a model is first pre-trained on large unlabeled text corpora using self-supervised learning and is then fine-tuned on a smaller labeled dataset for a specific task.

While GPT used a standard language modeling objective, which predicts the next word in a sentence, BERT was trained on Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The RoBERTa model kept the BERT architecture but changed the pre-training procedure: it used more data, trained for longer, and removed the NSP objective.
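
The masking step of MLM can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's actual preprocessing: it works on whitespace tokens instead of WordPiece tokens and always substitutes `[MASK]`, whereas BERT sometimes keeps the original token or inserts a random one. The helper `mask_tokens` and its parameters are hypothetical names.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens by mask_token.

    Returns the masked sequence and the labels: the original token at
    masked positions, None at positions that do not count for the loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model has to predict this token
        else:
            masked.append(tok)
            labels.append(None)  # position is ignored by the loss
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(labels)
```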

The model checkpoints of the pre-trained models serve as the starting point for fine-tuning. A labeled dataset for a specific downstream task is used as training data. There are several different fine-tuning approaches, including the following:

* Training the entire model on the labeled data.
* Training only higher layers and freezing the lower layers.
* Freezing the entire model and training one or more additional layers added on top.
   
No matter the approach, a task-specific output layer usually needs to be attached to the model. 
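
As a rough illustration of the second approach, the sketch below freezes the lower layers of a small PyTorch model and leaves the upper layers and the output layer trainable. The `encoder` is a hypothetical stand-in made of four linear layers, not a real pre-trained Transformer, and the layer sizes are assumptions chosen for the example.

```python
import torch.nn as nn

# hypothetical stand-in for a pre-trained encoder with 4 stacked layers
encoder = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])

# task-specific output layer, here for 6 labels
classifier = nn.Linear(16, 6)

# freeze the two lower layers: their weights get no gradient updates
for param in encoder[:2].parameters():
    param.requires_grad = False

model = nn.Sequential(encoder, classifier)

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable} of {total}")
```

Freezing everything in `encoder` instead of only `encoder[:2]` would correspond to the third approach, where only the added layer is trained.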

Source: [How to use transformer-based NLP models](https://towardsdatascience.com/how-to-use-transformer-based-nlp-models-a42adbc292e5)

## Multilabel Classification with BERT

In [None]:
#!pip install simpletransformers

In [None]:
#!pip install gin-config
!pip install tensorflow-addons

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
import torch
import wandb

from itertools import cycle
from simpletransformers.classification import MultiLabelClassificationModel
from sklearn.metrics import accuracy_score, auc, classification_report, confusion_matrix, ConfusionMatrixDisplay, roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

In [None]:
# load data
df = pd.read_csv('../data/df_cleaned.csv')

In [None]:
# Remove new lines from comments
df['comment_text'] = df.comment_text.apply(lambda x: x.replace('\n', ' '))

In [None]:
# category list for plots
categories = ['toxic', 'severe_toxic', 'obscene', 'threat',  'insult', 'identity_hate']

In [None]:
# Prepare the dataframe for the train test split. MultiLabelClassificationModel needs a text column
# and a labels column, which provides all categories as a list
new_df = pd.DataFrame()
new_df['id'] = df['id']
new_df['text'] = df['comment_text']
# select the label columns by name instead of by position
new_df['labels'] = df[categories].values.tolist()

In [None]:
def split(df):
    train_df, eval_df = train_test_split(df, test_size=0.2, random_state=0)
    return train_df, eval_df

In [None]:
# Create train and eval df for the model training and evaluation
train_df, eval_df = split(new_df)

In [None]:
# Model args
args = {
    'logging_steps': 10,
    'overwrite_output_dir': True,
    'train_batch_size': 2,
    'gradient_accumulation_steps': 16,
    'learning_rate': 3e-5,
    'num_train_epochs': 4,
    'max_seq_length': 128,
    'wandb_project': 'toxic-comment-classification',
    'wandb_kwargs': {'name': 'bert-lr3e-5'},
}
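
The small `train_batch_size` is compensated by gradient accumulation: gradients of 16 consecutive batches are summed before one optimizer update, so each update sees 2 * 16 = 32 examples. The short calculation below illustrates this. The value of `num_train_examples` is a hypothetical number chosen for the example, not the size of the dataset used here.

```python
import math

train_batch_size = 2
gradient_accumulation_steps = 16

# number of examples that contribute to a single optimizer update
effective_batch_size = train_batch_size * gradient_accumulation_steps

# hypothetical training set size, only for illustration
num_train_examples = 100_000
batches_per_epoch = math.ceil(num_train_examples / train_batch_size)
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer updates per epoch: {updates_per_epoch}")
```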

In [None]:
# load pretrained model for the multilabel classification task
model = MultiLabelClassificationModel('bert', 'bert-base-uncased', num_labels=6, args=args)

In [None]:
# train the model with the train data
model.train_model(train_df = train_df)

In [None]:
# save model
torch.save(model, 'saved_models/bert_lr3e-5')

In [None]:
# load model
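# weights_only=False is needed on recent PyTorch versions to unpickle a whole model object
model = torch.load('saved_models/bert_lr3e-5', weights_only=False)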

In [None]:
# evaluate model on eval_df
result, model_outputs, wrong_predictions = model.eval_model(eval_df=eval_df, roc_auc_score=sklearn.metrics.roc_auc_score)

In [None]:
# make predictions
# predict expects a plain list of strings, not a Series with a shuffled index
preds, outputs = model.predict(eval_df['text'].tolist())

In [None]:
# define y_true for roc_auc plot and classification report
y_true = np.array(eval_df['labels'].values.tolist())

In [None]:
# collects the AUC of every category passed to evaluate_roc
roc_aucs = []

def evaluate_roc(probs, y_true, category, color):
    """
    - Print AUC and accuracy for one category on the eval set
    - Plot the ROC curve
    @params    probs (np.array): predicted probabilities of the category with shape (len(y_true),)
    @params    y_true (np.array): true binary labels of the category with shape (len(y_true),)
    @params    category (str): category name shown in the legend
    @params    color (str): matplotlib color of the curve
    """
    fpr, tpr, threshold = roc_curve(y_true, probs)
    roc_auc = auc(fpr, tpr)
    roc_aucs.append(roc_auc)
    print(f'AUC: {roc_auc:.4f}')

    # Get accuracy over the eval set, using 0.3 as decision threshold
    y_pred = np.where(probs >= 0.3, 1, 0)
    accuracy = accuracy_score(y_true, y_pred)
    print(f'Accuracy: {accuracy*100:.2f}%')

    # Plot ROC curve (all categories are drawn into the same figure)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, color=color, label=f'{category} (area = {roc_auc:0.5f})')

    plt.legend(loc='lower right')
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.savefig('plots/roc_auc_curve.png')


In [None]:
# evaluate roc auc score and plot curves per category
colors = cycle(["aqua", "darkorange", "cornflowerblue"])

for i, color in zip(range(6), colors):
    print('-----------')
    print(categories[i])
    print('-----------')
    evaluate_roc(outputs[:, i].ravel(), y_true[:, i].ravel(), categories[i], color)
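
If a single summary number is wanted, one option is to average the per-category AUC values. The sketch below does this with `roc_auc_score` on a tiny made-up example with two categories. The arrays `y_true_toy` and `scores_toy` are invented for illustration and are not results of the model above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# made-up labels and predicted probabilities: 4 comments, 2 categories
y_true_toy = np.array([[1, 0],
                       [0, 1],
                       [1, 1],
                       [0, 0]])
scores_toy = np.array([[0.9, 0.2],
                       [0.1, 0.8],
                       [0.8, 0.3],
                       [0.3, 0.4]])

# AUC per category (one column at a time), then the unweighted mean
per_category_auc = [
    roc_auc_score(y_true_toy[:, i], scores_toy[:, i])
    for i in range(y_true_toy.shape[1])
]
mean_auc = float(np.mean(per_category_auc))

print(per_category_auc)
print(f"mean column-wise AUC: {mean_auc:.3f}")
```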

In [None]:
# Plot confusion matrix per category
y_test = np.array(eval_df['labels'].to_list())
preds = np.array(preds)

f, axes = plt.subplots(2, 3, figsize=(25, 15))
axes = axes.ravel()
for i in range(6):
    disp = ConfusionMatrixDisplay(confusion_matrix(y_test[:, i],
                                                   preds[:, i]),
                                  display_labels=[f'non {categories[i]}', categories[i]])
    disp.plot(ax=axes[i], values_format='.4g')
    disp.ax_.set_title(f'toxicity label:\n {categories[i]}', fontsize=20)
    if i<3:
        disp.ax_.set_xlabel('')
    if i%3!=0:
        disp.ax_.set_ylabel('')
    disp.im_.colorbar.remove()

plt.subplots_adjust(wspace=0.8, hspace=0.01)
f.colorbar(disp.im_, ax=axes)
plt.show()

In [None]:
# Print classification report
report = classification_report(y_test, preds, target_names=categories)
print(f"Classification Report : \n\n{report}")

In [None]:
# Create submission file
test_df = pd.read_csv('data/test.csv')

comments = test_df.comment_text.apply(lambda x: x.replace('\n', ' ')).tolist()
preds, outputs = model.predict(comments)

submission = pd.DataFrame(outputs, columns=categories)

# add the comment ids as first column, followed by one column per category
submission.insert(0, 'id', test_df['id'].values)

# write to csv and upload to Kaggle to get ROC AUC scores on Kaggle's test data
submission.to_csv('/content/drive/MyDrive/data/submission_roberta_tuning_lr2e5.csv', index=False)