# Comparing Transformers by size

In this notebook, I'll compare the base- and large-sized XLM-RoBERTa. The setup is the same as in the first experiments; the only difference is in the following hyperparameters, chosen because of GPU memory constraints in the Kaggle environment (the large-sized XLM-RoBERTa raises errors if max_seq_length or the batch size is too large):

* max_seq_length = 128 (default)
* train_batch_size = 21 (smaller batches)

Import all necessary libraries and install everything you need for training:

In [1]:
# install pytorch
# the version specifier is quoted so the shell does not treat ">" as a redirect
!conda install --yes "pytorch>=1.6" cudatoolkit=11.0 -c pytorch

In [2]:
# install simpletransformers
!pip install -q --upgrade transformers
!pip install -q simpletransformers

# check installed version
!pip freeze | grep simpletransformers

In [3]:
# install stable torch
!pip uninstall -q torch -y
!pip install -q torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

In [4]:
# import the libraries necessary for data wrangling, prediction and result analysis
from sklearn.metrics import f1_score, confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### Import the data

In [52]:
# Import the data, prepared for the experiments
train_df = pd.read_csv("/kaggle/input/gincodataframededuptraindevtest/GINCO_dataframe_dedup_train_dev.csv")
test_df = pd.read_csv("/kaggle/input/gincodataframededuptraindevtest/GINCO_dataframe_dedup_test.csv")

print("Train shape: {}, Test shape: {}.".format(train_df.shape, test_df.shape))

In [53]:
# Create a list of labels
LABELS = train_df.labels.unique().tolist()

In [54]:
# Drop the instances with no text
train_df = train_df.dropna()
test_df = test_df.dropna()
print("Train shape: {}, Test shape: {}.".format(train_df.shape, test_df.shape))
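
Because LABELS is built from the training split only, it can be worth checking that the test split does not contain labels the model never saw. The following is a minimal sketch on toy lists; the label values are hypothetical stand-ins for `train_df.labels` and `test_df.labels`, not the actual GINCO categories.

```python
# Toy stand-ins for train_df.labels and test_df.labels (hypothetical values).
train_labels = ["News", "Forum", "News", "Opinion"]
test_labels = ["News", "Recipe", "Forum"]

# Same idea as LABELS in the notebook: unique labels in order of first appearance.
labels = list(dict.fromkeys(train_labels))

# Labels that occur in the test split but never in the training split.
unseen = sorted(set(test_labels) - set(labels))

print("Labels:", labels)
print("Test-only labels:", unseen)
```

In the notebook itself, the same check would presumably use `train_df.labels.tolist()` and `test_df.labels.tolist()` in place of the toy lists.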

## Transformers

Let's start with arguments which are the same for all the models.

In [55]:
# Define the hyperparameters shared by both models
model_args = {
    "overwrite_output_dir": True,
    "num_train_epochs": 90,
    "labels_list": LABELS,
    "learning_rate": 1e-5,
    "train_batch_size": 21,  # reduced so the large model fits into GPU memory
    "no_cache": True,
    "no_save": True,
    "max_seq_length": 128,  # default value
    "save_steps": -1,
}
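
To get a feel for how long training runs with these arguments, the number of optimization steps can be estimated from the batch size and the number of epochs. This is a rough sketch: the training-set size of 800 is a hypothetical placeholder (the real value is printed by the shape check above), and it assumes no gradient accumulation.

```python
import math


def training_steps(n_examples, batch_size, epochs):
    """Return (steps per epoch, total steps), assuming one optimizer step per batch."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


n_train = 800  # hypothetical training-set size, not the actual GINCO size
per_epoch, total = training_steps(n_train, batch_size=21, epochs=90)
print(f"{per_epoch} steps per epoch, {total} steps in total")
```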

In [56]:
import gc
import torch
gc.collect()
torch.cuda.empty_cache()

### Base-sized XLM-RoBERTa

Multilingual model
https://huggingface.co/xlm-roberta-base

In [70]:
from simpletransformers.classification import ClassificationModel

roberta_base_model = ClassificationModel(
        "xlmroberta", "xlm-roberta-base",
        num_labels=21,
        use_cuda=True,
        args=model_args
    )

### Large-sized XLM-RoBERTa
Multilingual model https://huggingface.co/xlm-roberta-large

from simpletransformers.classification import ClassificationModel

roberta_large_model = ClassificationModel(
        "xlmroberta", "xlm-roberta-large",
        num_labels=21,
        use_cuda=True,
        args=model_args
    )

## Training and evaluation

### Train

In [71]:
# Base-sized XLM-RoBERTa
roberta_base_model.train_model(train_df)

# Large-sized XLM-RoBERTa
# roberta_large_model.train_model(train_df)

### Evaluate
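
The evaluation reports both micro and macro F1. As a reminder of how the two differ, here is a minimal pure-Python sketch on toy data; the function name `f1_scores` and the labels are made up for illustration, and the notebook itself relies on scikit-learn's `f1_score`. Micro F1 pools all decisions, so frequent classes dominate, while macro F1 averages the per-class scores, so a completely missed rare class pulls it down.

```python
def f1_scores(y_true, y_pred, labels):
    """Return (micro F1, macro F1) for single-label predictions."""
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            if true in tp:
                tp[true] += 1
        else:
            if pred in fp:
                fp[pred] += 1  # predicted this class wrongly
            if true in fn:
                fn[true] += 1  # missed this class

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy example: class "A" is frequent and always correct, class "B" is never predicted.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "C", "C"]
micro, macro = f1_scores(y_true, y_pred, labels=["A", "B", "C"])
print(f"Micro F1: {micro:0.3}, Macro F1: {macro:0.3}")
```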

In [72]:
def eval_model(model, plot_title=None):
    """Evaluate the model on the test data.

    Calculates the micro and macro F1 scores of the predictions on the test data
    and plots a confusion matrix, which is saved automatically.
    Uses the global test data, named "test_df", and the label list, named "LABELS".

    Args:
        model (simpletransformers.classification.ClassificationModel): the trained model.
        plot_title (string): the title of the confusion matrix, defaults to None.

    Returns:
        results (dict): dictionary with fields `Run`, `microF1`, `macroF1`, `y_true`, `y_pred`.
    """
    instance_predictions, raw_outputs = model.predict(['Danes poročamo o dogodku, ki se je zgodil 1. 1. 2020. Oseba je dejala:"To je res nenormalen dogodek"'])
    print("Instance prediction: ", instance_predictions)
    
    # Get the true labels from the dataframe
    y_true = test_df.labels

    # Calculate the model's predictions
    y_pred = model.predict(test_df.text.tolist())[0]
    
    macro = f1_score(y_true, y_pred, labels=LABELS, average="macro")
    micro = f1_score(y_true, y_pred, labels=LABELS,  average="micro")
    print(f"Macro f1: {macro:0.3}, Micro f1: {micro:0.3}")
      
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    plt.figure(figsize=(9, 9))
    plt.imshow(cm, cmap="Oranges")
    for (i, j), z in np.ndenumerate(cm):
        plt.text(j, i, '{:d}'.format(z), ha='center', va='center')
    classNames = LABELS
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    tick_marks = np.arange(len(classNames))
    plt.xticks(tick_marks, classNames, rotation=90)
    plt.yticks(tick_marks, classNames)

    metrics = f"{micro:0.3}, {macro:0.3}"
    if plot_title:
        plt.title(plot_title +";\n" + metrics)
    else:
        plt.title(metrics)
    plt.tight_layout()
   
    # Save the figure before showing it, so the saved image cannot end up blank
    fig1 = plt.gcf()
    image_title = f"{plot_title}.png"
    fig1.savefig(image_title, dpi=100)
    plt.show()
    
    return {"Run": plot_title,
            "microF1": micro,
            "macroF1": macro,
            "y_true": y_true.tolist(),
            "y_pred": y_pred}

Choose one of the following models:
roberta_base_model, roberta_large_model

In [73]:
rundict = eval_model(roberta_base_model, plot_title="XLMRobertaBase_run_1")

In [74]:
rundict["run"] = 1
rundict["model"] = "XLMRobertaBase"
print(rundict)
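
Since the comparison involves several runs and two models, it may help to collect each result dictionary in one file instead of only printing it. Below is a minimal sketch of one way to do this with JSON; the helper `append_result`, the file name and the toy values are hypothetical, and it assumes that all values in the dictionary are plain Python types (lists, strings, numbers).

```python
import json
import os
import tempfile


def append_result(path, result):
    """Append one result dictionary to a JSON list stored at `path`."""
    try:
        with open(path, encoding="utf-8") as f:
            results = json.load(f)
    except FileNotFoundError:
        results = []  # first run: start a new list
    results.append(result)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    return results


# Toy result with the same fields as rundict (values are made up).
toy_run = {"Run": "toy_run_1", "microF1": 0.5, "macroF1": 0.4,
           "y_true": ["A", "B"], "y_pred": ["A", "A"], "run": 1, "model": "toy"}

# Hypothetical location; a temporary directory keeps the demo self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "results.json")
append_result(demo_path, toy_run)
all_runs = append_result(demo_path, {**toy_run, "Run": "toy_run_2", "run": 2})
print(f"Stored {len(all_runs)} runs in {demo_path}")
```

In the notebook, the call would presumably be `append_result("results.json", rundict)` after each evaluation.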