The next cell just checks which graphics card has been assigned.

In [None]:
!nvidia-smi

If necessary, uncomment the next line to install the HuggingFace Transformers and Datasets libraries.

In [None]:
#! pip install datasets transformers

To fine-tune the models you need to authenticate with your HuggingFace account. Execute the next cell and enter your user token.

In [3]:
from huggingface_hub import notebook_login

notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store


Then you need to install Git-LFS. Uncomment the following instructions:

In [None]:
# !apt install git-lfs

Make sure your version of Transformers is at least 4.11.0 since the functionality was introduced in that version:

In [None]:
import transformers

print(transformers.__version__)
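
If you would rather have the check fail loudly than read the printed version by eye, a minimal sketch could compare version tuples instead of strings. The helper name `meets_minimum` is hypothetical, and the parsing assumes plain dotted versions, keeping only the leading digits of the first three parts (so `4.11.0.dev0` is treated as `4.11.0`).

```python
import re

def meets_minimum(version: str, minimum: str = "4.11.0") -> bool:
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    def as_tuple(text):
        parts = []
        for piece in text.split(".")[:3]:
            match = re.match(r"\d+", piece)
            parts.append(int(match.group()) if match else 0)
        # Pad short versions such as "4.11" to three components.
        return tuple(parts + [0] * (3 - len(parts)))
    return as_tuple(version) >= as_tuple(minimum)

# Usage in the notebook (assumes transformers is installed):
# import transformers
# assert meets_minimum(transformers.__version__), "Please upgrade transformers"
```

Comparing integers rather than raw strings matters here: as strings, "4.9.2" would sort after "4.11.0".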

You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/text-classification).

# Fine-tuning a model on a text classification task

## Loading the dataset

We will use the [🤗 Datasets](https://github.com/huggingface/datasets) library to create the Dataset object needed for fine-tuning.

In [5]:
from datasets import load_dataset, Dataset, DatasetDict

The next cell loads the Dataset object from the CSV files we have generated. The file names are specified in the code; just indicate the path where they are located.

To define the labels for the model, we declare the seven classes of temporal relations as a ClassLabel inside a Features object.
The load_dataset() method then pairs the values we have defined with the values in the CSVs.

As we have two datasets available, it is necessary to choose which one to use by uncommenting the corresponding load line.

In [None]:
from datasets import ClassLabel, Features, Value, load_dataset

# The seven temporal-relation classes; the list order defines the integer label ids.
class_names = ['NO', 'BEFORE', 'SIMULTANEOUS', 'CONTAINS', 'OVERLAP', 'ENDS-ON', 'BEGINS-ON']

event_features = Features({'file': Value('string'), 'sentence': Value('string'), 'type': Value('string'), 'labels': ClassLabel(num_classes=7, names=class_names)})

#Load line for dataset 1
#dataset_csv = load_dataset("csv", data_files={'train':'dataset_link_train.csv', 'test':'dataset_link_test.csv', 'dev':'dataset_link_dev.csv'}, features = event_features)

#Load line for dataset 2
#dataset_csv = load_dataset("csv", data_files={'train':'dataset_link_train_DOS.csv', 'test':'dataset_link_test_DOS.csv', 'dev':'dataset_link_dev_DOS.csv'}, features = event_features)
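
Both load lines above are commented out, so `dataset_csv` only exists once one of them is enabled. As an alternative, here is a minimal sketch that builds the `data_files` mapping from a folder and a dataset number. The helper name `build_data_files`, its parameters and the `"data"` folder are hypothetical; the file names are the ones used in the cell above.

```python
from pathlib import Path

def build_data_files(data_dir: str, dataset_id: int) -> dict:
    """Map split names to CSV paths for dataset 1 or 2 (hypothetical helper)."""
    if dataset_id not in (1, 2):
        raise ValueError("dataset_id must be 1 or 2")
    # Dataset 2 uses the same file names with a "_DOS" suffix.
    suffix = "" if dataset_id == 1 else "_DOS"
    return {
        split: str(Path(data_dir) / f"dataset_link_{split}{suffix}.csv")
        for split in ("train", "test", "dev")
    }

data_files = build_data_files("data", dataset_id=2)  # "data" is a placeholder folder
# dataset_csv = load_dataset("csv", data_files=data_files, features=event_features)
```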



To check that the dataset features were loaded correctly, run the next cell.

In [None]:
print(dataset_csv['dev'].features)

{'file': Value(dtype='string', id=None), 'sentence': Value(dtype='string', id=None), 'type': Value(dtype='string', id=None), 'labels': ClassLabel(num_classes=7, names=['NO', 'BEFORE', 'SIMULTANEOUS', 'CONTAINS', 'OVERLAP', 'ENDS-ON', 'BEGINS-ON'], id=None)}


To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [None]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    # Draw unique random indices without replacement.
    picks = random.sample(range(len(dataset)), num_examples)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(dataset_csv["train"])

Unnamed: 0,file,sentence,type,labels
0,ES100363.xml,Paciente varón de 89 años sin antecedentes patológicos de interés que consultó por anorexia de <t>mes y medio</t><t>anorexia</t> de mes y medio de evolución acompañada de edema en manos y pies.,TIMEX3TimexLinkLink,BEFORE
1,ES100030.xml,A pesar del <t>tratamiento</t> oncológico recibido el paciente <t>falleció</t> a los 4 meses del diagnóstico.,EVENTTLINKLink,CONTAINS
2,ES100412.xml,Las <t>características</t> que presentó en la <t>ecografía</t> ocular fueron compatibles con un melanoma ocular de gran tamaño y extensión difusa.,EVENTTLINKLink,BEFORE
3,ES100410.xml,"Desde el cateterismo el paciente presentó un síndrome general con astenia y anorexia evidenciándose en la <t>analítica</t><t>cateterismo</t> el paciente presentó un síndrome general con astenia y anorexia evidenciándose en la analítica realizada a las dos semanas un empeoramiento de su función renal, con una creatinina de 7,5 mg/dl y una eosinofilia del 13%, por lo que fue ingresado con sospecha de insuficiencia renal aguda por CES tras cateterismo cardíaco.",TIMEX3TimexLinkLink,NO
4,ES100417.xml,"Entre los antecedentes familiares destaca que la madre tuvo la varicela estando embarazada (entre <t>las 16 y 18 semanas</t><t>remitido</t> por su pediatra por estrabismo. Presenta endotropía (15 grados) y pequeña hipertropía en ojo derecho (OD), el ojo izquierdo (OI) es el dominante. La agudeza visual (AV) es baja en OD (se molesta al ocluir el OI) y aceptable en OI (fija la mirada y coge objetos pequeños). El reflejo luminoso sobre la cornea nos hace sospechar una fijación excéntrica en ambos ojos.\r\nEl niño había padecido convulsiones febriles. Entre los antecedentes familiares destaca que la madre tuvo la varicela estando embarazada (entre las 16 y 18 semanas), y que un tío materno padece el síndrome de Alport.",TIMEX3TimexLinkLink,NO
5,ES100417.xml,"En el OD es de mayor tamaño, muy pigmentada y con tejido glial blanquecino en el centro; las arcadas temporales están curvadas hacia la zona patológica, y existe <t>atrofia</t><t>tamaño</t>, muy pigmentada y con tejido glial blanquecino en el centro; las arcadas temporales están curvadas hacia la zona patológica, y existe atrofia parapapilar temporal.",TIMEX3TimexLinkLink,NO
6,ES100079.xml,"Para completar <t>estudio</t> realizamos urografías por resonancia magnética (Uro-RNM), y estudio angiográfico mediante tomografia computerizada (angio-TC) con cavografía para <t>valorar</t> con la máxima precisión posible el alcance del trombo y la posible infiltración vascular tumoral.",EVENTTLINKLink,CONTAINS
7,ES100410.xml,"Remitido a oftalmología a los 40 días, para estudio de fondo de ojo, presentaba la siguiente exploración oftalmológica: la agudeza visual (AV) era de 0,8 en ambos ojos, la presión intraocular (PIO) de 13 mm de Hg y en el fondo de ojo derecho (OD) se observó una <t>microhemorragia</t><t>exploración</t> oftalmológica: la agudeza visual (AV) era de 0,8 en ambos ojos, la presión intraocular (PIO) de 13 mm de Hg y en el fondo de ojo derecho (OD) se observó una microhemorragia por encima de papila y cuatro émbolos de colesterol y en el fondo del ojo izquierdo (OI) otros dos émbolos localizados en rama temporal y nasal.",EVENTTLINKLink,BEFORE
8,ES100848.xml,A lo largo del ingresó fueron necesarias varias transfusiones y se inició <t>tratamiento</t><t>inició</t> tratamiento con somatostatina pese a lo que el sangrado se mantuvo activo.,EVENTTLINKLink,OVERLAP
9,ES100042.xml,"A la paciente se le indicó medidas higiénico-dietéticas y profilaxis antibiótica mantenida ciclo largo a dosis única diaria nocturna 3 meses y posteriormente días alternos durante 6 meses con ciprofloxacino, vitamina A dosis única diaria 6 meses, prednisona 30mg durante 45 días y posteriormente en días alternos durante otros 45 días hasta su suspensión definitiva, y por último <t>protección</t><t>vitamina</t> A dosis única diaria 6 meses, prednisona 30mg durante 45 días y posteriormente en días alternos durante otros 45 días hasta su suspensión definitiva, y por último protección digestiva con omeprazol.",TIMEX3TimexLinkLink,NO


## Preprocessing the data

For preprocessing we need to tokenize the input. To do so we use the AutoTokenizer class from HuggingFace, which ensures that the correct tokenizer for the chosen checkpoint is loaded.

As we are testing three models, there are three checkpoints to choose from. Uncomment the line with the checkpoint of the model you want to train and the AutoTokenizer line.

When loading the tokenizer we pass use_fast=True to reduce tokenization time. These fast tokenizers are available for almost every model.

In [None]:
from transformers import AutoTokenizer

#ROBERTA-BIOMEDICAL
#model_checkpoint = "PlanTL-GOB-ES/roberta-base-biomedical-clinical-es"

#BETO
#model_checkpoint = 'dccuchile/bert-base-spanish-wwm-cased'

#ROBERTA BASE
#model_checkpoint = 'roberta-base'

#tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

To automate the tokenization process we define the next function so the whole dataset can be tokenized at once. The truncation argument is set to True so inputs are truncated to the maximum length the model accepts, and padding="max_length" pads shorter inputs up to that same length.

In [8]:
def preprocess_function(examples):
    return tokenizer(examples['sentence'], truncation=True, padding="max_length")
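
To make the effect of `truncation=True` together with `padding="max_length"` concrete, here is a toy sketch on plain lists of token ids. It is only an illustration, not the tokenizer's implementation: the function name is hypothetical, `max_length=8` and `pad_id=0` are made-up values, and a real tokenizer also adds special tokens and uses its own padding id.

```python
def truncate_and_pad(token_ids, max_length=8, pad_id=0):
    """Toy version of truncation=True with padding="max_length" (illustrative only)."""
    clipped = token_ids[:max_length]            # truncation: drop ids past the limit
    n_pad = max_length - len(clipped)           # padding: fill up to the limit
    return {
        "input_ids": clipped + [pad_id] * n_pad,
        "attention_mask": [1] * len(clipped) + [0] * n_pad,
    }

short = truncate_and_pad([5, 6, 7])
long = truncate_and_pad(list(range(1, 13)))
```

Every example ends up with exactly `max_length` ids, and the attention mask marks which positions hold real tokens.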

The next step is to use the map method to apply the tokenization function to every split of the dataset.

In [None]:
encoded_dataset = dataset_csv.map(preprocess_function, batched=True)

We can inspect the tokenized dataset by executing the next cell.

In [None]:
print(encoded_dataset)

## Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is sentence classification, we use the `AutoModelForSequenceClassification` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us. The only thing we have to specify is the number of labels for our problem, which is seven here (one per temporal relation class).

As we have three models, uncomment the line with the checkpoint of the model you want to use.

class_names is defined here too in case the cell where it was first defined has not been executed.

In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, RobertaForSequenceClassification
class_names = ['NO', 'BEFORE', 'SIMULTANEOUS', 'CONTAINS', 'OVERLAP', 'ENDS-ON', 'BEGINS-ON']

#ROBERTA-BIOMEDICAL
#model_checkpoint = 'PlanTL-GOB-ES/roberta-base-biomedical-clinical-es'

#BETO
#model_checkpoint = 'dccuchile/bert-base-spanish-wwm-cased'

#ROBERTA BASE
#model_checkpoint = 'roberta-base'

model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=len(class_names))
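
Optionally, the model configuration can also store readable label names, so that saved checkpoints report 'BEFORE' rather than a bare integer id. A minimal sketch, assuming the same `class_names` list as above; the `from_pretrained` call is left commented out because it downloads the checkpoint.

```python
class_names = ['NO', 'BEFORE', 'SIMULTANEOUS', 'CONTAINS', 'OVERLAP', 'ENDS-ON', 'BEGINS-ON']

# Integer ids follow the order of class_names, matching the ClassLabel feature.
id2label = dict(enumerate(class_names))
label2id = {name: idx for idx, name in id2label.items()}

# model = AutoModelForSequenceClassification.from_pretrained(
#     model_checkpoint,
#     num_labels=len(class_names),
#     id2label=id2label,
#     label2id=label2id,
# )
```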


We can check the model's number of parameters by uncommenting the line in the next cell.

In [None]:
#print(model.num_parameters())

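We load a data collator, which assembles individual examples into batches (padding them to a common length) before they are passed to the model during fine-tuning. Passing it explicitly is optional when a tokenizer is given to the Trainer, which then uses DataCollatorWithPadding by default.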

In [11]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer)
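
The idea behind this collator is dynamic padding: each batch is padded only to its own longest sequence. The toy sketch below illustrates that on plain lists; the function name and `pad_id=0` are hypothetical, and the token ids are made up. Note that because `preprocess_function` already pads everything to the maximum length, the collator has nothing left to pad in this notebook.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence to the longest one in the batch (illustrative only)."""
    longest = max(len(ids) for ids in batch)
    return [ids + [pad_id] * (longest - len(ids)) for ids in batch]

padded = pad_batch([[101, 7, 102], [101, 7, 8, 9, 102]])
```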


The next cell defines the training arguments. They include the learning rate, the batch size, the number of epochs and the weight decay. TrainingArguments has many more options that we don't need here. The push_to_hub argument is used for saving the model to your HuggingFace account.

Here we set the evaluation to be done at the end of each epoch. Because load_best_model_at_end=True, the Trainer reloads the best checkpoint when training finishes. By default "best" means the lowest evaluation loss; to select on another metric, such as the F1 returned by compute_metrics, set metric_for_best_model.

In [None]:
#metric_name = "f1"  # one of the keys returned by compute_metrics
model_name = model_checkpoint.split("/")[-1]
task = 'text_classification'
batch_size = 8
args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=8,
    weight_decay=0.01,
    load_best_model_at_end=True,
    #metric_for_best_model=metric_name,
    #push_to_hub=True,
    logging_steps = 1
)
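
With `logging_steps = 1` the Trainer writes one log entry per optimizer step, so it helps to know how many steps to expect. A rough sketch, assuming a single device and no gradient accumulation; the function name is hypothetical and the dataset size of 1000 is made up.

```python
import math

def total_training_steps(n_train, batch_size=8, num_epochs=8):
    """Optimizer steps on a single device without gradient accumulation."""
    # The last, possibly smaller, batch of each epoch still counts as a step.
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * num_epochs

steps = total_training_steps(n_train=1000)  # 1000 is a made-up dataset size
```

If the log becomes too noisy, a larger `logging_steps` value reduces the number of entries without affecting training.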

The last thing to define for our `Trainer` is how to compute the metrics from the predictions. We need to define a function for this; the only preprocessing we have to do is to take the argmax of our predicted logits.

The next cell creates a CSV file for storing the training results.

In [31]:
#import pandas as pd
#columnas = ['precision', 'recall', 'f1-score', 'support', 'batch_size', 'model']
#df__ = pd.DataFrame(columns = columnas)
#df__.to_csv('class_report_dataset_1.csv')

If you want to store the training results, uncomment the previous cell and the .to_csv() line in the compute_metrics function.

We calculate the precision, recall and F-1 with the sklearn methods. By defining the compute_metrics method we are telling the Trainer how to compute the metrics for the evaluation.

In [13]:
from sklearn import metrics
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score, log_loss
import numpy as np
import pandas as pd

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    class_report = classification_report(labels, predictions, output_dict=True)
    df_class_report = pd.DataFrame(class_report).T
    df_class_report = df_class_report.drop('accuracy')
    df_class_report = df_class_report.drop('macro avg')
    # Scalars are broadcast to every row, so this works for any number of reported classes.
    df_class_report = df_class_report.assign(batch_size=batch_size)
    df_class_report = df_class_report.assign(model='roberta')
    #df_class_report.to_csv('class_report_dataset_1.csv', mode='a', header=False)


    precision_ = precision_score(labels, predictions, average = 'weighted')
    recall_ = recall_score(labels, predictions, average = 'weighted')
    f1_ =  f1_score(labels, predictions, average = 'weighted')
    return{
        "precision": precision_,
        "recall": recall_,
        "f1": f1_,
    }
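
To show what `average = 'weighted'` means, here is a small worked sketch in plain Python: the F1 of each class is weighted by its support (the number of true examples of that class). The function name is hypothetical and the labels and predictions are made up; in the notebook the sklearn functions above do this work.

```python
from collections import Counter

def weighted_f1(labels, predictions):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(labels)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        n_pred = sum(1 for p in predictions if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1
    return total / len(labels)

labels = [0, 0, 1, 1, 1, 2]
predictions = [0, 1, 1, 1, 0, 2]
score = weighted_f1(labels, predictions)  # per-class F1: 0.5, 2/3 and 1.0
```

Here the per-class scores are 0.5 (support 2), 2/3 (support 3) and 1.0 (support 1), so the weighted F1 is (2·0.5 + 3·2/3 + 1·1) / 6 = 2/3. Frequent classes therefore dominate the weighted score.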
    

As we define the loss function manually, we need to override the Trainer class with this CustomTrainer.
The loss is torch's CrossEntropyLoss with one weight per class. The weight tensor must live on the same device as the logits, otherwise PyTorch raises a device-mismatch error. For that reason the weights are created directly on the logits' device instead of calling .cuda(), which would fail on a CPU-only machine.

In [15]:
import torch
from torch import nn
from transformers import Trainer


class CustomTrainer(Trainer):
    # **kwargs absorbs extra arguments that newer Transformers versions may pass.
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss with one weight per class, indexed by label id
        class_weights = torch.tensor([0.45, 0.73, 0.96, 0.98, 0.93, 0.98, 0.98], device=logits.device)
        loss_fct = nn.CrossEntropyLoss(weight=class_weights)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
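
To see how class weights change the loss, the sketch below reproduces the weighted cross-entropy in plain Python. With the default mean reduction, PyTorch divides the weighted sum of per-example losses by the sum of the weights of the target classes. The function name is hypothetical, and the logits, targets and weights are made up for a three-class toy case.

```python
import math

def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of -log softmax(logits)[target], normalised by the target weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(logits, targets):
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        nll = log_norm - row[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total

logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # made-up scores for 3 classes
targets = [0, 1]
uniform = weighted_cross_entropy(logits, targets, [1.0, 1.0, 1.0])
skewed = weighted_cross_entropy(logits, targets, [0.1, 1.0, 1.0])
```

Down-weighting class 0 moves the result toward the loss of the second example, which is how a lower weight on a frequent class keeps it from dominating training.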

The next cell defines our custom trainer with the custom loss function.

In [29]:
#LOSS WITH WEIGHTS
trainer = CustomTrainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset['dev'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

If we don't want to use the custom trainer, we can define the default trainer, which uses the model's default (unweighted) loss function.

In [None]:
#LOSS WITHOUT WEIGHTS
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset['dev'],
    tokenizer=tokenizer,
    #data_collator=data_collator,
    compute_metrics=compute_metrics
)

We can now finetune our model by just calling the `train` method:

In [None]:
trainer.train()

We can check with the `evaluate` method that our `Trainer` did reload the best model properly (if it was not the last one):

In [None]:
trainer.evaluate()

Finally, we can check the model's performance on the test dataset.

In [None]:
from sklearn import metrics
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score
import numpy as np
predictions, labels, _ = trainer.predict(encoded_dataset["test"])
predictions = np.argmax(predictions, axis=1)
print(classification_report(labels, predictions))
results = {
        "precision": precision_score(labels, predictions, average = 'weighted'),
        "recall": recall_score(labels, predictions, average = 'weighted'),
        "f1": f1_score(labels, predictions, average = 'weighted'),
    }
results
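
The report above lists classes by integer id. A small sketch of mapping predictions back to the relation names, assuming the same `class_names` order used to build the ClassLabel feature; the helper name is hypothetical and the logits are made up.

```python
class_names = ['NO', 'BEFORE', 'SIMULTANEOUS', 'CONTAINS', 'OVERLAP', 'ENDS-ON', 'BEGINS-ON']

def logits_to_names(logits):
    """Pick the highest-scoring class per row and return its name."""
    return [class_names[max(range(len(row)), key=row.__getitem__)] for row in logits]

example_logits = [
    [0.1, 2.3, 0.0, 0.2, 0.1, 0.0, 0.4],  # highest score at index 1
    [1.9, 0.2, 0.1, 0.3, 0.0, 0.1, 0.2],  # highest score at index 0
]
names = logits_to_names(example_logits)

# The sklearn report can show the names directly:
# print(classification_report(labels, predictions,
#                             labels=list(range(len(class_names))),
#                             target_names=class_names))
```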

You can now upload the result of the training to the Hub, just execute this instruction:

In [None]:
#trainer.push_to_hub()