# Clasificación de tweets

## Fine-tuning del modelo BETO

Autor: Alberto Ramos Sánchez

### Contenido

* [Fine-tuning del modelo BETO](#Fine-tuning-del-modelo-BETO)
* [Dataset: TASS](#Dataset:-TASS)
* [Preparar tweets](#Preparar-tweets)
* [Crear dataset de entrenamiento](#Crear-dataset-de-entrenamiento)
    * [Tokenizar contenido](#Tokenizar-contenido)
    * [Sets de entrenamiento, validación y test](#Sets-de-entrenamiento,-validación-y-test)
    * [Crear dataset](#Crear-dataset)
* [Modelo de clasificación *Bert*](#Modelo-de-clasificación-*Bert*)
* [Entrenamiento](#Entrenamiento)
    * [Evaluación](#Evaluación)
* [Resultados](#Resultados)
    * [Con 3 clases](#Con-3-clases)
        * [Entrenamiento](#Entrenamiento)
        * [Validación](#Validación)
        * [Test](#Test)
    * [Con 6 clases](#Con-6-clases)
        * [Entrenamiento](#Entrenamiento)
        * [Validation](#Validation)
        * [Test](#Test)


In [1]:
import pandas as pd
import numpy as np

import datetime

import nltk
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('spanish')

import re
import string

from pathlib import Path
import gzip

import tqdm

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset, IterableDataset, DataLoader

from transformers import BertTokenizer, BertModel, AdamW

from pytorchtools.pytorchtools import EarlyStopping

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

# seed
import random

seed = 42

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

cuda


En este *notebook* se ha entrenado el modelo *Bert* para realizar *sentiment analysis* sobre el conjunto de tweets en español de TASS. El entrenamiento se ha llevado a cabo haciendo *finetuning* sobre la versión para español de *Bert* [*BETO*](https://github.com/dccuchile/beto).

## Dataset: TASS

Se ha trabajado con los *datasets* de tweets de [*TASS SEPLN*](http://tass.sepln.org/tass_data/download.php) desde el año 2012 al 2019. 
Todos los ficheros *xml* se han unido en un único dataset contenido en los ficheros *tweets.csv.gz*, *topics.csv.gz* y *polarities.csv.gz*

In [3]:
dataset_path = Path("./TASS/conversion_result/dataset31k")

In [4]:
df_tweets = pd.read_csv(dataset_path / Path("tweets.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')
df_topics = pd.read_csv(dataset_path / Path("topics.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')
df_polarities = pd.read_csv(dataset_path / Path("polarities.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')

df_topics = df_topics.rename(columns={'tweetid_fk': 'tweetid'})
df_polarities = df_polarities.rename(columns={'tweetid_fk': 'tweetid'})

En total, en el dataset hay aproximadamente 31 mil tweets.

In [5]:
f"Número total de tweets = {len(df_tweets)}"

'Número total de tweets = 31375'

Y están etiquetados en 6 categorías:

* P+ : Positivo fuerte (strong negative)
* P : Positivo
* NONE : Sin sentimiento (no sentiment tag)
* NEU : Neutro
* N : Negativo
* N+ : Negativo fuerte (strong negative)

En la siguiente tabla se muestra la cantidad de tweets por categoría:

In [6]:
df_polarities[['value', 'tweetid']].drop_duplicates(subset='tweetid', keep="first").groupby("value").count().rename(columns={"value": "Categoría", "tweetid": "Número de tweets"})

Unnamed: 0_level_0,Número de tweets
value,Unnamed: 1_level_1
N,6219
N+,976
NEU,2755
NONE,5597
P,5442
P+,2793


## Preparar tweets

En este apartado preprocesamos el texto de los tweets para entrenar el modelo.

En la función *clean_tweet* limpiamos el texto quedándonos solamente con los caractéres alfanuméricos. Eliminamos los nombres de usuario, url, vocales seguidas repetidas más de dos veces, tabulaciones, etc.

In [7]:
def clean_tweet(text):
    res_txt = re.sub("@\w+", "", text) # drop username
    res_txt = re.sub("https?://[A-Za-z0-9\./]+", "", res_txt) # drop url
    
    for p in string.punctuation:
        res_txt = res_txt.replace(p, " ")
    #res_txt = " ".join([c for c in res_txt if c not in string.punctuation])
    
    # eliminar palabras con más de 2 vocales seguidas "largooooo -> largoo"
    res_txt = re.sub("([A-Za-z])\\1{2,}", "\\1\\1", res_txt)
    
    # eliminar espacios y tabulaciones repetidas
    res_txt = re.sub("[ \t]+", " ", res_txt.strip())
    
    # mantener solamente caracteres alfanuméricos
    res_txt = re.sub(r'[^a-zñÑA-Z0-9áéíóúÁÉÍÓÚ ]', '', res_txt)
    return res_txt

En *prepare_tweet* eliminamos las *stopwords*.

In [8]:
def prepare_tweet(text):
    stop_words = nltk.corpus.stopwords.words('spanish')
    custom_stop_words = ["d", "q"]
    
    tokens = nltk.word_tokenize(text)
    tokens = [w for w in tokens if w not in stop_words]
    tokens = [w for w in tokens if w not in custom_stop_words]
    tokens = [w.lower() for w in tokens]
    
    # stemming
    #tokens = [stemmer.stem(w) for w in tokens]
    
    return " ".join(tokens)

Aplicamos el preprocesado, obteniendo el siguiente resultado.

In [9]:
print(f"Original :\n {df_tweets['content'][8235]}")

content = df_tweets['content']

content = content.apply(lambda tweet: prepare_tweet(clean_tweet(str(tweet))))

df_tweets['content'] = content

print(f"Resultado :\n {df_tweets['content'][8235]}")

Original :
 ;-)) RT @doloresvela: El #iPhone 5 podría ser presentado en marzo http://t.co/2kjKTjfF
Resultado :
 rt el iphone 5 podría ser presentado marzo


## Crear dataset de entrenamiento

En este apartado se crea el dataset de entrenamiento.

Unimos los dataframes en uno solo, que contiene el id del tweet, el contenido del tweet y la categoría.

In [10]:
data_tweets = df_tweets[['tweetid', 'content']]
data_sentim = df_polarities[['tweetid', 'value']]

data_tweets = data_tweets.merge(data_sentim, on="tweetid").drop_duplicates(subset='tweetid', keep="first").reset_index(drop=True)
data_tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23782 entries, 0 to 23781
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   tweetid  23782 non-null  int64 
 1   content  23782 non-null  object
 2   value    23782 non-null  object
dtypes: int64(1), object(2)
memory usage: 557.5+ KB


A continuación se cambia la etiqueta por un valor numérico que indica la clase.

In [11]:
label2id = {"N": 0,
            "N+": 1,
            "NONE": 2,
            "NEU": 3,
            "P": 4,
            "P+": 5}

NUMBER_CLASSES = 6

data_tweets["value"].replace(label2id, inplace=True)

data_tweets[['value', 'tweetid']].drop_duplicates(subset='tweetid', keep="first").groupby("value").count().rename(columns={"value": "Categoría", "tweetid": "Número de tweets"})

Unnamed: 0_level_0,Número de tweets
value,Unnamed: 1_level_1
0,6219
1,976
2,5597
3,2755
4,5442
5,2793


Con la variable *apply_balance* controlamos si queremos aplicar o no balanceo de datos.

En el caso de utilizar 6 clases, no aplicamos balanceo de datos, pues perderíamos la mayoría de los tweets debido a que la clase N+ tiene muchos menos tweets.

In [12]:
apply_balance = False

if apply_balance:
    g = data_tweets.groupby('value')
    data_tweets = g.apply(lambda x: x.sample(g.size().min()).reset_index(drop=True))
    data_tweets

### Tokenizar contenido

A continuación se tokeniza el texto, convirtiendo las palabras a números enteros, y añadiendo un token de clasificación *[CLS]* al inicio de la frase. Para esto se utiliza la clase *BertTokenizer* con los pesos de *BETO*.

In [13]:
BASE_MODEL = 'dccuchile/bert-base-spanish-wwm-uncased'

In [14]:
tokenizer = BertTokenizer.from_pretrained(BASE_MODEL)

Buscamos el tweet de mayor longitud y asignamos un valor de longitud mayor, para que la longitud de las frases tokenizadas contenga todos los tweets.

In [15]:
tweets = data_tweets['content']
tok = tweets.apply(lambda tuit: tokenizer.encode(tuit, add_special_tokens=True))
tok.apply(lambda x: len(x)).max()

45

In [16]:
max_token_len = 55

def tokenizar(df):
    # tokeniza texto y genera máscara de atención
    tweets = df['content'].tolist()
    tokenize_result = tokenizer(tweets,
                                add_special_tokens=True, 
                                max_length=max_token_len,
                                return_attention_mask=True,
                                padding='max_length')
    
    data_tensor = torch.LongTensor(tokenize_result.input_ids)
    mask_tensor = torch.LongTensor(tokenize_result.attention_mask)
    
    # crea etiquetas
    polarity_id = df['value'].tolist()
    
    target_tensor = torch.tensor(polarity_id)
    
    return data_tensor, target_tensor, mask_tensor

Además de tokenizar el texto, se crea un vector adicional llamado *attention mask*. Esta máscara indica al *transformer* donde está realmente el contenido de la frase, para así evitar aplicar mecanismos de atención sobre los tokens de *padding*.

In [17]:
data, target, mask = tokenizar(data_tweets)

data[0], target[0], mask[0]

(tensor([    4,  1035, 10406, 30956,  1836,  1948,  2803,  2403,  1361, 20922,
          1200,  1544, 20578, 30958,  1002, 28253,  1250,  7684,     5,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1]),
 tensor(4),
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0]))

In [18]:
data[2], tokenizer.convert_ids_to_tokens(data[2])

(tensor([    4,  1733,  1784, 30962, 11441,  2566, 30957,  2030, 15491,  2849,
          1825, 17810, 30958, 11925,  1943, 14352,  1137,  5489,  4878, 30955,
             5,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1]),
 ['[CLS]',
  'nom',
  '##eo',
  '##l',
  '##vido',
  'aprob',
  '##o',
  'ley',
  'aborto',
  'libre',
  'todas',
  'ministra',
  '##s',
  'salta',
  '##ban',
  'alegr',
  '##ia',
  'congreso',
  'llor',
  '##e',
  '[SEP]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[P

### Sets de entrenamiento, validación y test

Dividimos el *dataset* en entrenamiento, validación y test. Con la opción *stratify* aseguramos que se mantenga la misma proporción de clases para cada conjunto que el *dataset* original. Para asegurarlo, es posible que se pierdan ciertos tweets. Esta operación no balancea el dataset.

In [19]:
train_size = 0.90
val_size = 0.05

In [20]:
data_train, data_val, target_train, target_val, mask_train, mask_val = train_test_split(data, target, mask, test_size=1-train_size, stratify=target)

data_val, data_test, target_val, target_test, mask_val, mask_test = train_test_split(data_val, target_val, mask_val, test_size=1-(val_size/(1-train_size)), stratify=target_val)

data_train.shape, data_val.shape, data_test.shape

(torch.Size([21403, 55]), torch.Size([1189, 55]), torch.Size([1190, 55]))

### Crear dataset

La clase *TweetDataset* permite iterar sobre el *dataset* devolviendo los *tokens*.

In [21]:
class TweetDataset(Dataset):
    
    def __init__(self, data, target, mask):
        self.data = data
        self.target = target
        self.mask = mask
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        return self.data[idx], self.target[idx], self.mask[idx]

train_dataset = TweetDataset(data_train, target_train, mask_train)
val_dataset   = TweetDataset(data_val, target_val, mask_val)
test_dataset  = TweetDataset(data_test, target_test, mask_test)

## Modelo de clasificación *Bert*

El siguiente modelo utiliza la clase *BertModel* del módulo *transformers* de [*HuggingFace*](https://huggingface.co/transformers/) para clasificar los textos. La salida del *transformer* para el token *[CLS]* se combina con dos capas *fully connected*, generando un vector *one-hot* que indica la clase.

El modelo tiene tres métodos para serializar: *load_from*, *save_to* y *checkpoint*.

In [22]:
class ClassifierTransformerModel(nn.Module):
    
    def __init__(self, transformer_dim, hidden_dim, n_out, initialize_transformer=True):
        super(ClassifierTransformerModel, self).__init__()
        
        self.transformer_model = None
        if initialize_transformer:
            self.transformer_model = BertModel.from_pretrained(BASE_MODEL)
        
        self.in_classifier = nn.Linear(transformer_dim, hidden_dim)
        self.out_classifier = nn.Linear(hidden_dim, n_out)
    
    def forward(self, x, mask=None):
        
        transformer_out = self.transformer_model(x, attention_mask=mask).last_hidden_state
        cls_out = transformer_out[:, 0, :]
        
        hidden_t = self.in_classifier(cls_out)
        hidden_t = self.out_classifier(hidden_t)
        
        return F.log_softmax(hidden_t, dim=-1)
    
    def save_to(self, folder):
        """
        folder/model_*datestamp*/transformer/...
        folder/model_*datestamp*/classifier/...
        """
        
        # save model
        self.transformer_model.save_pretrained(folder / Path("transformer"))
        
        (folder / Path("classifier")).mkdir(parents=True, exist_ok=True)
        
        torch.save(self.state_dict(), folder / Path("classifier/model.pth"))
    
    @classmethod
    def load_from(cls, folder, dev, transformer_dim, hidden_dim, n_out):
        """
        folder = folder/*datestamp*
        """
        model = cls(transformer_dim, hidden_dim, n_out, initialize_transformer=False)
        
        recover_dict = torch.load(folder / Path("classifier/model.pth"), map_location=dev) # all keys
        model_dict = model.state_dict() # classifier keys
        
        filtered_dict = {k: v for k, v in recover_dict.items() if k in model_dict.keys()}
        
        model_dict.update(filtered_dict)
        model.load_state_dict(model_dict)
        
        model.transformer_model = BertModel.from_pretrained(folder / Path("transformer"))
        return model.to(dev)
    
    def checkpoint(self, folder, optimizer, stats):
        status = {
            'state_dict': self.state_dict(),
            'optimizer': optimizer.state_dict()
        }
        status = {**status, **stats}
        
        ts = datetime.datetime.now()
        ts_str = f"checkpoint_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}"
        (folder / Path(ts_str)).mkdir(parents=True, exist_ok=True)
        
        torch.save(status, folder / Path(ts_str) / Path("checkpoint.pth"))
    

El tamaño de salida del *BertModel* —768— será el tamaño de entrada de la primera *FFN*. Este valor se indica con la variable *n_subwords*.

In [23]:
model = BertModel.from_pretrained(BASE_MODEL, output_hidden_states=False).to(device)
cuda_data = data_train[0:1].to(device)
cuda_mask = mask_train[0:1].to(device)

a = model(cuda_data, attention_mask=cuda_mask).last_hidden_state.shape
del model
del cuda_data
del cuda_mask

print(a[2])

768


## Entrenamiento

Las dos siguientes funciones definen el entrenamiento y validación de la red. Mediante el flag *stop_train* interrumpiendo el *kernel* con el botón de *stop* puede parase el entrenamiento en cualquier momento.

Para evitar sobrecargar la memoria de la GPU, las variables de cargan a CUDA por *batches* únicamente en el momento de utilizarlas.

El modelo final se almacena en el directorio elegido.

In [24]:
n_subwords = 768
n_hidden = 16
n_categories = NUMBER_CLASSES

model = ClassifierTransformerModel(n_subwords, n_hidden, n_categories).to(device)

In [25]:
criterion = nn.CrossEntropyLoss()
lr = 1e-6
optimizer = AdamW(model.parameters(), lr=lr)
batch_size = 16

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

In [26]:
EPOCHS = 100

patience = 20
global stop_train
stop_train = False

def evaluation_iteration(model, loader, loss_function):
    global stop_train
    
    total_acc = 0
    total_precision = 0
    total_recall = 0
    
    total_loss = 0
    
    for batch_step, (data, target, mask) in enumerate(loader):
        try:
            cuda_data = data.to(device)
            cuda_target = target.to(device)
            cuda_mask = mask.to(device)

            with torch.no_grad():
                output = model(cuda_data, cuda_mask)

                loss = loss_function(output, cuda_target)

                total_loss += loss.item()

                prediction = output.argmax(dim=1)

                acc = accuracy_score(prediction.cpu(), cuda_target.cpu())#(prediction == cuda_target).cpu().numpy().mean()
                prec = precision_score(prediction.cpu(), cuda_target.cpu(), average="macro", zero_division=0.0)
                rec = recall_score(prediction.cpu(), cuda_target.cpu(), average="macro", zero_division=0.0)

                total_acc += acc
                total_precision += prec
                total_recall += rec
        except KeyboardInterrupt:
            try: del cuda_data
            except: pass

            try: del cuda_target
            except: pass

            try: del cuda_mask
            except: pass
            
            stop_train = True
            print("*** train-data variables removed from cuda memory ***")
            break
        
        # Free variables
        del cuda_data
        del cuda_target
        del cuda_mask
    
    T = len(loader)
    return map(lambda r: r/T, [total_loss, total_acc, total_precision, total_recall])

def train_iteration(folder, model, loader, loss_function, optim, nepochs, patience):
    global stop_train
    
    loader_train = loader[0]
    loader_val = loader[1]
    
    ts = datetime.datetime.now()
    best_model_path = Path(f"./temp/beto/best_model_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}")
    early_stopping = EarlyStopping(patience=patience, verbose=True, path=best_model_path)
    
    model.train()
    
    for epoch in range(1, nepochs+1):
        
        
        total_loss = 0
        total_acc = 0
        
        progress_bar = tqdm.tqdm(train_loader, desc='Bar desc', leave=True)

        for batch_step, (data, target, mask) in enumerate(progress_bar, 1):
            
            try:
                cuda_data = data.to(device)
                cuda_target = target.to(device)
                cuda_mask = mask.to(device)

                optim.zero_grad()

                output = model(cuda_data, cuda_mask)

                loss = loss_function(output, cuda_target)

                total_loss += loss.item()
                
                acc = accuracy_score(output.detach().cpu().argmax(dim=1), target)
                total_acc += acc

                loss.backward()

                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

                optim.step()
                
                progress_bar.set_description(f"{epoch = } {batch_step = } loss = {total_loss / batch_step :0.4f} acc = {total_acc / batch_step :0.4f}")
            
            except KeyboardInterrupt:
                try: del cuda_data
                except: pass
                
                try: del cuda_target
                except: pass
                
                try: del cuda_mask
                except: pass
                print("*** train-data variables removed from cuda memory ***")
                global stop_train
                stop_train = True
                break
            
            # Free variables
            del cuda_data
            del cuda_target
            del cuda_mask
        
        if stop_train: # stop from train
            print("*** stoping training ***")
            break
        
        model.eval()
        val_loss, val_acc, val_prec, val_recall = evaluation_iteration(model, loader_val, loss_function)
        model.train()
        
        if stop_train: # stop from evaluation
            break
        
        print("*"*20)
        print(f"{epoch = }/{nepochs} loss = {total_loss / len(loader_train) :.4f} acc = {total_acc / len(loader_train) :.4f} | {val_loss = :.4f} | {val_acc = :.4f} {val_prec = :.4f} {val_recall = :.4f}")
        print("*"*20)
        
        stats = {
            'epoch': epoch,
            'total_epochs': nepochs,
            'loss': total_loss / len(loader_train),
            'val_loss': val_loss,
            'val_accuracy': val_acc,
            'val_precision': val_prec,
            'val_recall': val_recall
        }
        
        model.checkpoint(folder=folder,
                         optimizer=optimizer,
                         stats=stats)
        
        early_stopping(val_loss, model)
        if early_stopping.early_stop:
            print("Early stopping")
            break
    
    if not best_model_path.is_dir(): # there is no best model to save
        return model
    
    # load best model found
    best_model = ClassifierTransformerModel.load_from(folder=best_model_path,
                                                      dev=device,
                                                      transformer_dim=n_subwords,
                                                      hidden_dim=n_hidden,
                                                      n_out=n_categories)
    
    # save model
    ts = datetime.datetime.now()
    ts_str = f"model_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}"
    best_model.save_to(folder / Path(ts_str))
    return best_model

In [27]:
directory = Path(f"modelos_beto/train_{datetime.datetime.timestamp(datetime.datetime.now()):0.0f}")
best_model = train_iteration(directory, model, (train_loader, val_loader), criterion, optimizer, EPOCHS, patience)

epoch = 1 batch_step = 1338 loss = 1.5685 acc = 0.3704: 100%|████| 1338/1338 [09:18<00:00,  2.39it/s]


********************
epoch = 1/100 loss = 1.5685 acc = 0.3704 | val_loss = 1.4224 | val_acc = 0.4405 val_prec = 0.3544 val_recall = 0.2688
********************
Validation loss decreased (inf --> 1.422391).  Saving model ...


epoch = 2 batch_step = 1338 loss = 1.3709 acc = 0.4659: 100%|████| 1338/1338 [09:20<00:00,  2.39it/s]


********************
epoch = 2/100 loss = 1.3709 acc = 0.4659 | val_loss = 1.3329 | val_acc = 0.4630 val_prec = 0.3697 val_recall = 0.3058
********************
Validation loss decreased (1.422391 --> 1.332904).  Saving model ...


epoch = 3 batch_step = 1338 loss = 1.2965 acc = 0.4928: 100%|████| 1338/1338 [09:20<00:00,  2.39it/s]


********************
epoch = 3/100 loss = 1.2965 acc = 0.4928 | val_loss = 1.2983 | val_acc = 0.4870 val_prec = 0.3964 val_recall = 0.3535
********************
Validation loss decreased (1.332904 --> 1.298311).  Saving model ...


epoch = 4 batch_step = 1338 loss = 1.2566 acc = 0.5073: 100%|████| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 4/100 loss = 1.2566 acc = 0.5073 | val_loss = 1.2759 | val_acc = 0.4915 val_prec = 0.3995 val_recall = 0.3508
********************
Validation loss decreased (1.298311 --> 1.275895).  Saving model ...


epoch = 5 batch_step = 1338 loss = 1.2249 acc = 0.5229: 100%|████| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 5/100 loss = 1.2249 acc = 0.5229 | val_loss = 1.2737 | val_acc = 0.4903 val_prec = 0.4196 val_recall = 0.4110
********************
Validation loss decreased (1.275895 --> 1.273665).  Saving model ...


epoch = 6 batch_step = 1338 loss = 1.1922 acc = 0.5339: 100%|████| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 6/100 loss = 1.1922 acc = 0.5339 | val_loss = 1.2786 | val_acc = 0.4938 val_prec = 0.4160 val_recall = 0.4013
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 1 out of 20


epoch = 7 batch_step = 1338 loss = 1.1629 acc = 0.5493: 100%|████| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 7/100 loss = 1.1629 acc = 0.5493 | val_loss = 1.2689 | val_acc = 0.5030 val_prec = 0.4268 val_recall = 0.4101
********************
Validation loss decreased (1.273665 --> 1.268867).  Saving model ...


epoch = 8 batch_step = 1338 loss = 1.1340 acc = 0.5595: 100%|████| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 8/100 loss = 1.1340 acc = 0.5595 | val_loss = 1.2772 | val_acc = 0.4928 val_prec = 0.4225 val_recall = 0.4108
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 1 out of 20


epoch = 9 batch_step = 1338 loss = 1.1079 acc = 0.5701: 100%|████| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 9/100 loss = 1.1079 acc = 0.5701 | val_loss = 1.2937 | val_acc = 0.4937 val_prec = 0.4258 val_recall = 0.4126
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 2 out of 20


epoch = 10 batch_step = 1338 loss = 1.0828 acc = 0.5850: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 10/100 loss = 1.0828 acc = 0.5850 | val_loss = 1.2828 | val_acc = 0.4930 val_prec = 0.4267 val_recall = 0.4167
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 3 out of 20


epoch = 11 batch_step = 1338 loss = 1.0557 acc = 0.5934: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 11/100 loss = 1.0557 acc = 0.5934 | val_loss = 1.3137 | val_acc = 0.4912 val_prec = 0.4285 val_recall = 0.4106
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 4 out of 20


epoch = 12 batch_step = 1338 loss = 1.0307 acc = 0.6015: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 12/100 loss = 1.0307 acc = 0.6015 | val_loss = 1.3166 | val_acc = 0.4825 val_prec = 0.4164 val_recall = 0.4229
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 5 out of 20


epoch = 13 batch_step = 1338 loss = 1.0041 acc = 0.6154: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 13/100 loss = 1.0041 acc = 0.6154 | val_loss = 1.3081 | val_acc = 0.5057 val_prec = 0.4401 val_recall = 0.4321
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 6 out of 20


epoch = 14 batch_step = 1338 loss = 0.9748 acc = 0.6259: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 14/100 loss = 0.9748 acc = 0.6259 | val_loss = 1.3466 | val_acc = 0.4875 val_prec = 0.4247 val_recall = 0.4297
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 7 out of 20


epoch = 15 batch_step = 1338 loss = 0.9550 acc = 0.6360: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 15/100 loss = 0.9550 acc = 0.6360 | val_loss = 1.3574 | val_acc = 0.4943 val_prec = 0.4397 val_recall = 0.4191
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 8 out of 20


epoch = 16 batch_step = 1338 loss = 0.9229 acc = 0.6494: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 16/100 loss = 0.9229 acc = 0.6494 | val_loss = 1.3659 | val_acc = 0.4950 val_prec = 0.4257 val_recall = 0.4397
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 9 out of 20


epoch = 17 batch_step = 1338 loss = 0.8959 acc = 0.6627: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 17/100 loss = 0.8959 acc = 0.6627 | val_loss = 1.3982 | val_acc = 0.4923 val_prec = 0.4325 val_recall = 0.4255
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 10 out of 20


epoch = 18 batch_step = 1338 loss = 0.8641 acc = 0.6727: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 18/100 loss = 0.8641 acc = 0.6727 | val_loss = 1.4041 | val_acc = 0.5013 val_prec = 0.4386 val_recall = 0.4390
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 11 out of 20


epoch = 19 batch_step = 1338 loss = 0.8388 acc = 0.6850: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 19/100 loss = 0.8388 acc = 0.6850 | val_loss = 1.4257 | val_acc = 0.4973 val_prec = 0.4555 val_recall = 0.4179
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 12 out of 20


epoch = 20 batch_step = 1338 loss = 0.8126 acc = 0.6949: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 20/100 loss = 0.8126 acc = 0.6949 | val_loss = 1.4606 | val_acc = 0.4912 val_prec = 0.4320 val_recall = 0.4286
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 13 out of 20


epoch = 21 batch_step = 1338 loss = 0.7866 acc = 0.7058: 100%|███| 1338/1338 [09:20<00:00,  2.39it/s]


********************
epoch = 21/100 loss = 0.7866 acc = 0.7058 | val_loss = 1.4774 | val_acc = 0.4962 val_prec = 0.4330 val_recall = 0.4357
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 14 out of 20


epoch = 22 batch_step = 1338 loss = 0.7606 acc = 0.7192: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 22/100 loss = 0.7606 acc = 0.7192 | val_loss = 1.5022 | val_acc = 0.4895 val_prec = 0.4403 val_recall = 0.4274
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 15 out of 20


epoch = 23 batch_step = 1338 loss = 0.7302 acc = 0.7311: 100%|███| 1338/1338 [09:19<00:00,  2.39it/s]


********************
epoch = 23/100 loss = 0.7302 acc = 0.7311 | val_loss = 1.5302 | val_acc = 0.4810 val_prec = 0.4180 val_recall = 0.4425
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 16 out of 20


epoch = 24 batch_step = 1338 loss = 0.7051 acc = 0.7399: 100%|███| 1338/1338 [09:18<00:00,  2.39it/s]


********************
epoch = 24/100 loss = 0.7051 acc = 0.7399 | val_loss = 1.5699 | val_acc = 0.4895 val_prec = 0.4172 val_recall = 0.4172
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 17 out of 20


epoch = 25 batch_step = 1338 loss = 0.6754 acc = 0.7508: 100%|███| 1338/1338 [09:18<00:00,  2.39it/s]


********************
epoch = 25/100 loss = 0.6754 acc = 0.7508 | val_loss = 1.5809 | val_acc = 0.4920 val_prec = 0.4355 val_recall = 0.4268
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 18 out of 20


epoch = 26 batch_step = 1338 loss = 0.6569 acc = 0.7598: 100%|███| 1338/1338 [09:18<00:00,  2.39it/s]


********************
epoch = 26/100 loss = 0.6569 acc = 0.7598 | val_loss = 1.6286 | val_acc = 0.4728 val_prec = 0.4113 val_recall = 0.4163
********************


Bar desc:   0%|                                                             | 0/1338 [00:00<?, ?it/s]

EarlyStopping counter: 19 out of 20


epoch = 27 batch_step = 1338 loss = 0.6300 acc = 0.7717: 100%|███| 1338/1338 [09:18<00:00,  2.39it/s]


********************
epoch = 27/100 loss = 0.6300 acc = 0.7717 | val_loss = 1.6419 | val_acc = 0.4798 val_prec = 0.4261 val_recall = 0.4242
********************
EarlyStopping counter: 20 out of 20
Early stopping


----

Guardamos los parámetros utilizados junto al modelo:

In [28]:
f = open(directory / Path("parameters.txt"), "w")
f.write(f"{n_hidden = }\n" + \
        f"{lr = }\n" + \
        f"{NUMBER_CLASSES = }\n"
       )
f.close()

---

### Evaluación

In [29]:
_ = best_model.eval()

In [30]:
val_loss, val_acc, val_prec, val_recall = evaluation_iteration(best_model, val_loader, criterion)

f"{val_loss = :.4f} {val_acc = :.4f} {val_prec = :.4f} {val_recall = :.4f}"

'val_loss = 1.2750 val_acc = 0.5012 val_prec = 0.4311 val_recall = 0.4064'

In [31]:
test_loss, test_acc, test_prec, test_recall = evaluation_iteration(best_model, test_loader, criterion)

f"{test_loss = :.4f} {test_acc = :.4f} {test_prec = :.4f} {test_recall = :.4f}"

'test_loss = 1.2643 test_acc = 0.4908 test_prec = 0.4076 test_recall = 0.3849'

<hr style="height: 4px">

## Resultados

### Con 3 clases

Se ha entrenado el *dataset* agrupado y **balanceado en 3 categorías** (positivo con P y P+, neutro con NEU y NONE, y negativo con N y N+).
Los siguientes resultados se han obtenido recorriendo el *dataset* en ***batches* de tamaño 16**, y con **una *patience* a 5 del *early stopping***. 

El tamaño de *batches* es el máximo que permite la memoria de la GPU donde se entrenó la red. Se ha elegido un valor bajo de *patience* para reducir el tiempo de entrenamiento.

#### Entrenamiento

<table>
<thead>
  <tr>
    <th>lr</th>
    <th>Hidden dim</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="2"><b>1e-4</b></td>
    <td><b>Loss</b></td>
    <td>0.9899</td>
    <td>0.9260</td>
    <td>0.9397</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5267</td>
    <td>0.5805</td>
    <td>0.5610</td>
  </tr>
  <tr>
    <td rowspan="2"><b>1e-6</b></td>
    <td><b>Loss</b></td>
    <td>0.7660</td>
    <td>0.7022</td>
    <td>0.7532</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.6622</td>
    <td>0.6976</td>
    <td>0.6730</td>
  </tr>
</tbody>
</table>

#### Validación

<table>
<thead>
  <tr>
    <th>lr</th>
    <th>Hidden dim</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>1e-4</b></td>
    <td><b>Loss</b></td>
    <td>1.0009</td>
    <td>0.8769</td>
    <td>0.9882</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5192</td>
    <td>0.6138</td>
    <td>0.5247</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.5139</td>
    <td>0.6256</td>
    <td>0.5201</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.4010</td>
    <td>0.6396</td>
    <td>0.5253</td>
  </tr>
  <tr>
    <td rowspan="4"><b>1e-6</b></td>
    <td><b>Loss</b></td>
    <td>0.7975</td>
    <td>0.7894</td>
    <td>0.7989</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.6578</td>
    <td>0.6636</td>
    <td>0.6580</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.6513</td>
    <td>0.6584</td>
    <td>0.6581</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.6459</td>
    <td>0.6593</td>
    <td>0.6703</td>
  </tr>
</tbody>
</table>

#### Test

<table>
<thead>
  <tr>
    <th>lr</th>
    <th>Hidden dim</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>1e-4</b></td>
    <td><b>Loss</b></td>
    <td>1.0063</td>
    <td>0.8866</td>
    <td>1.0004</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5138</td>
    <td>0.6048</td>
    <td>0.5184</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.5095</td>
    <td>0.6017</td>
    <td>0.5149</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.3857</td>
    <td>0.6195</td>
    <td>0.5328</td>
  </tr>
  <tr>
    <td rowspan="4"><b>1e-6</b></td>
    <td><b>Loss</b></td>
    <td>0.7981</td>
    <td>0.8066</td>
    <td>0.7945</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.6461</td>
    <td>0.6397</td>
    <td>0.6581</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.6344</td>
    <td>0.6324</td>
    <td>0.6551</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.6402</td>
    <td>0.6402</td>
    <td>0.6525</td>
  </tr>
</tbody>
</table>

### Con 6 clases

Se ha entrenado el *dataset* agrupado y **con las 6 categorías**.
Los siguientes resultados se han obtenido recorriendo el *dataset* en ***batches* de tamaño 16**, y con **una *patience* a 5 del *early stopping***. 

El tamaño de *batches* es el máximo que permite la memoria de la GPU donde se entrenó la red. Se ha elegido un valor bajo de *patience* para reducir el tiempo de entrenamiento.

#### Entrenamiento

<table>
<thead>
  <tr>
    <th>lr</th>
    <th>Hidden dim</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="2"><b>1e-4</b></td>
    <td><b>Loss</b></td>
    <td>1.4271</td>
    <td>1.2971</td>
    <td>1.4243</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.4435</td>
    <td>0.5159</td>
    <td>0.4463</td>
  </tr>
  <tr>
    <td rowspan="2"><b>1e-6</b></td>
    <td><b>Loss</b></td>
    <td>1.2457</td>
    <td>1.1629</td>
    <td>1.1655</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5157</td>
    <td>0.5493</td>
    <td>0.5481</td>
  </tr>
</tbody>
</table>

#### Validation

<table>
<thead>
  <tr>
    <th>lr</th>
    <th>Hidden dim</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>1e-4</b></td>
    <td><b>Loss</b></td>
    <td>1.3767</td>
    <td>1.4305</td>
    <td>1.3279</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.4443</td>
    <td>0.4612</td>
    <td>0.4738</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.3664</td>
    <td>0.3639</td>
    <td>0.3746</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.3283</td>
    <td>0.2809</td>
    <td>0.3096</td>
  </tr>
  <tr>
    <td rowspan="4"><b>1e-6</b></td>
    <td><b>Loss</td>
    <td>1.2832</td>
    <td>1.2689</td>
    <td>1.2797</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.4997</td>
    <td>0.5030</td>
    <td>0.4895</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4270</td>
    <td>0.4268</td>
    <td>0.4104</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.3831</td>
    <td>0.4101</td>
    <td>0.3970</td>
  </tr>
</tbody>
</table>

#### Test

<table>
<thead>
  <tr>
    <th>lr</th>
    <th>Hidden dim</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>1e-4</b></td>
    <td><b>Loss</b></td>
    <td>1.3336</td>
    <td>1.4271</td>
    <td>1.3293</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.4900</td>
    <td>0.4733</td>
    <td>0.4828</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4108</td>
    <td>0.3779</td>
    <td>0.3926</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.3606</td>
    <td>0.2784</td>
    <td>0.3268</td>
  </tr>
  <tr>
    <td rowspan="4"><b>1e-6</b></td>
    <td><b>Loss</b></td>
    <td>1.2907</td>
    <td>1.2643</td>
    <td>1.2641</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.4975</td>
    <td>0.4908</td>
    <td>0.4933</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4079</td>
    <td>0.4076</td>
    <td>0.4185</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.3728</td>
    <td>0.3849</td>
    <td>0.4004</td>
  </tr>
</tbody>
</table>