# Clasificación de tweets

## Fine-tuning con BertForSequenceClassification

Autor: Alberto Ramos Sánchez



### Contenido
* [Dataset: TASS](#Dataset:-TASS)
* [Preparar tweets](#Preparar-tweets)
* [Crear dataset de entrenamiento](#Crear-dataset-de-entrenamiento)
    * [Tokenizar contenido](#Tokenizar-contenido)
    * [Sets de entrenamiento, validación y test](#Sets-de-entrenamiento,-validación-y-test)
    * [Crear dataset](#Crear-dataset)
* [Modelo de clasificación: *BertForSequenceClassification*](#Modelo-de-clasificación:-*BertForSequenceClassification*)
* [Entrenamiento](#Entrenamiento)
    * [Evaluación](#Evaluación)
* [Resultados](#Resultados)
    * [Con 3 clases](#Con-3-clases)
        * [Entrenamiento](#Entrenamiento)
        * [Validación](#Validación)
        * [Test](#Test)
    * [Con 6 clases](#Con-6-clases)
        * [Entrenamiento](#Entrenamiento)
        * [Validation](#Validación)
        * [Test](#Test)


In [1]:
import pandas as pd
import numpy as np

import datetime

import nltk
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('spanish')

import re
import string

from pathlib import Path
import gzip

import tqdm

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset, IterableDataset, DataLoader

from transformers import BertTokenizer, BertModel, AdamW

from pytorchtools.pytorchtools import EarlyStopping

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

# seed
import random

seed = 42

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

cuda


En este *notebook* se ha entrenado el modelo *Bert* para realizar *sentiment analysis* sobre el conjunto de tweets en español de TASS. El entrenamiento se ha llevado a cabo haciendo *finetuning* sobre la versión para español de *Bert* [*BETO*](https://github.com/dccuchile/beto).

## Dataset: TASS

Se ha trabajado con los *datasets* de tweets de [*TASS SEPLN*](http://tass.sepln.org/tass_data/download.php) desde el año 2012 al 2019. 
Todos los ficheros *xml* se han unido en un único dataset contenido en los ficheros *tweets.csv.gz*, *topics.csv.gz* y *polarities.csv.gz*

In [3]:
dataset_path = Path("./TASS/conversion_result/dataset31k")

In [4]:
df_tweets = pd.read_csv(dataset_path / Path("tweets.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')
df_topics = pd.read_csv(dataset_path / Path("topics.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')
df_polarities = pd.read_csv(dataset_path / Path("polarities.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')

df_topics = df_topics.rename(columns={'tweetid_fk': 'tweetid'})
df_polarities = df_polarities.rename(columns={'tweetid_fk': 'tweetid'})

En total, en el dataset hay aproximadamente 31 mil tweets.

In [5]:
f"Número total de tweets = {len(df_tweets)}"

'Número total de tweets = 31375'

Y están etiquetados en 6 categorías:

* P+ : Positivo fuerte (strong negative)
* P : Positivo
* NONE : Sin sentimiento (no sentiment tag)
* NEU : Neutro
* N : Negativo
* N+ : Negativo fuerte (strong negative)

En la siguiente tabla se muestra la cantidad de tweets por categoría:

In [6]:
df_polarities[['value', 'tweetid']].drop_duplicates(subset='tweetid', keep="first").groupby("value").count().rename(columns={"value": "Categoría", "tweetid": "Número de tweets"})

Unnamed: 0_level_0,Número de tweets
value,Unnamed: 1_level_1
N,6219
N+,976
NEU,2755
NONE,5597
P,5442
P+,2793


## Preparar tweets

En este apartado preprocesamos el texto de los tweets para entrenar el modelo.

En la función *clean_tweet* limpiamos el texto quedándonos solamente con los caractéres alfanuméricos. Eliminamos los nombres de usuario, url, vocales seguidas repetidas más de dos veces, tabulaciones, etc.

In [7]:
def clean_tweet(text):
    res_txt = re.sub("@\w+", "", text) # drop username
    res_txt = re.sub("https?://[A-Za-z0-9\./]+", "", res_txt) # drop url
    
    for p in string.punctuation:
        res_txt = res_txt.replace(p, " ")
    #res_txt = " ".join([c for c in res_txt if c not in string.punctuation])
    
    # eliminar palabras con más de 2 vocales seguidas "largooooo -> largoo"
    res_txt = re.sub("([A-Za-z])\\1{2,}", "\\1\\1", res_txt)
    
    # eliminar espacios y tabulaciones repetidas
    res_txt = re.sub("[ \t]+", " ", res_txt.strip())
    
    # mantener solamente caracteres alfanuméricos
    res_txt = re.sub(r'[^a-zñÑA-Z0-9áéíóúÁÉÍÓÚ ]', '', res_txt)
    return res_txt

En *prepare_tweet* eliminamos las *stopwords*.

In [8]:
def prepare_tweet(text):
    stop_words = nltk.corpus.stopwords.words('spanish')
    custom_stop_words = ["d", "q"]
    
    tokens = nltk.word_tokenize(text)
    tokens = [w for w in tokens if w not in stop_words]
    tokens = [w for w in tokens if w not in custom_stop_words]
    tokens = [w.lower() for w in tokens]
    
    # stemming
    #tokens = [stemmer.stem(w) for w in tokens]
    
    return " ".join(tokens)

Aplicamos el preprocesado, obteniendo el siguiente resultado.

In [9]:
print(f"Original :\n {df_tweets['content'][8235]}")

content = df_tweets['content']

content = content.apply(lambda tweet: prepare_tweet(clean_tweet(str(tweet))))

df_tweets['content'] = content

print(f"Resultado :\n {df_tweets['content'][8235]}")

Original :
 ;-)) RT @doloresvela: El #iPhone 5 podría ser presentado en marzo http://t.co/2kjKTjfF
Resultado :
 rt el iphone 5 podría ser presentado marzo


## Crear dataset de entrenamiento

En este apartado se crea el dataset de entrenamiento.

Unimos los dataframes en uno solo, que contiene el id del tweet, el contenido del tweet y la categoría.

In [10]:
data_tweets = df_tweets[['tweetid', 'content']]
data_sentim = df_polarities[['tweetid', 'value']]

data_tweets = data_tweets.merge(data_sentim, on="tweetid").drop_duplicates(subset='tweetid', keep="first").reset_index(drop=True)
data_tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23782 entries, 0 to 23781
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   tweetid  23782 non-null  int64 
 1   content  23782 non-null  object
 2   value    23782 non-null  object
dtypes: int64(1), object(2)
memory usage: 557.5+ KB


A continuación se cambia la etiqueta por un valor numérico que indica la clase.

In [11]:
label2id = {"N": 0,
            "N+": 1,
            "NONE": 2,
            "NEU": 3,
            "P": 4,
            "P+": 5}

NUMBER_CLASSES = 6

data_tweets["value"].replace(label2id, inplace=True)

data_tweets[['value', 'tweetid']].drop_duplicates(subset='tweetid', keep="first").groupby("value").count().rename(columns={"value": "Categoría", "tweetid": "Número de tweets"})

Unnamed: 0_level_0,Número de tweets
value,Unnamed: 1_level_1
0,6219
1,976
2,5597
3,2755
4,5442
5,2793


Con la variable *apply_balance* controlamos si queremos aplicar o no balanceo de datos.

En el caso de utilizar 6 clases, no aplicamos balanceo de datos, pues perderíamos la mayoría de los tweets debido a que la clase N+ tiene muchos menos tweets.

In [12]:
apply_balance = False

if apply_balance:
    g = data_tweets.groupby('value')
    data_tweets = g.apply(lambda x: x.sample(g.size().min()).reset_index(drop=True))
    data_tweets

### Tokenizar contenido

A continuación se tokeniza el texto, convirtiendo las palabras a números enteros, y añadiendo un token de clasificación *[CLS]* al inicio de la frase. Para esto se utiliza la clase *BertTokenizer* con los pesos de *BETO*.

In [13]:
BASE_MODEL = 'dccuchile/bert-base-spanish-wwm-uncased'

In [14]:
tokenizer = BertTokenizer.from_pretrained(BASE_MODEL)

Buscamos el tweet de mayor longitud y asignamos un valor de longitud mayor, para que la longitud de las frases tokenizadas contenga todos los tweets.

In [15]:
tweets = data_tweets['content']
tok = tweets.apply(lambda tuit: tokenizer.encode(tuit, add_special_tokens=True))
tok.apply(lambda x: len(x)).max()

45

In [16]:
max_token_len = 55

def tokenizar(df):
    # tokeniza texto y genera máscara de atención
    tweets = df['content'].tolist()
    tokenize_result = tokenizer(tweets,
                                add_special_tokens=True, 
                                max_length=max_token_len,
                                return_attention_mask=True,
                                padding='max_length')
    
    data_tensor = torch.LongTensor(tokenize_result.input_ids)
    mask_tensor = torch.LongTensor(tokenize_result.attention_mask)
    
    # crea etiquetas
    polarity_id = df['value'].tolist()
    
    target_tensor = torch.tensor(polarity_id)
    
    return data_tensor, target_tensor, mask_tensor

Además de tokenizar el texto, se crea un vector adicional llamado *attention mask*. Esta máscara indica al *transformer* donde está realmente el contenido de la frase, para así evitar aplicar mecanismos de atención sobre los tokens de *padding*.

In [17]:
data, target, mask = tokenizar(data_tweets)

data[0], target[0], mask[0]

(tensor([    4,  1035, 10406, 30956,  1836,  1948,  2803,  2403,  1361, 20922,
          1200,  1544, 20578, 30958,  1002, 28253,  1250,  7684,     5,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1]),
 tensor(4),
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0]))

In [18]:
data[2], tokenizer.convert_ids_to_tokens(data[2])

(tensor([    4,  1733,  1784, 30962, 11441,  2566, 30957,  2030, 15491,  2849,
          1825, 17810, 30958, 11925,  1943, 14352,  1137,  5489,  4878, 30955,
             5,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1]),
 ['[CLS]',
  'nom',
  '##eo',
  '##l',
  '##vido',
  'aprob',
  '##o',
  'ley',
  'aborto',
  'libre',
  'todas',
  'ministra',
  '##s',
  'salta',
  '##ban',
  'alegr',
  '##ia',
  'congreso',
  'llor',
  '##e',
  '[SEP]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[PAD]',
  '[P

### Sets de entrenamiento, validación y test

Dividimos el *dataset* en entrenamiento, validación y test. Con la opción *stratify* aseguramos que se mantenga la misma proporción de clases para cada conjunto que el *dataset* original. Para asegurarlo, es posible que se pierdan ciertos tweets. Esta operación no balancea el dataset.

In [19]:
train_size = 0.90
val_size = 0.05

In [20]:
data_train, data_val, target_train, target_val, mask_train, mask_val = train_test_split(data, target, mask, test_size=1-train_size, stratify=target)

data_val, data_test, target_val, target_test, mask_val, mask_test = train_test_split(data_val, target_val, mask_val, test_size=1-(val_size/(1-train_size)), stratify=target_val)

data_train.shape, data_val.shape, data_test.shape

(torch.Size([21403, 55]), torch.Size([1189, 55]), torch.Size([1190, 55]))

### Crear dataset

La clase *TweetDataset* permite iterar sobre el *dataset* devolviendo los *tokens*.

In [21]:
class TweetDataset(Dataset):
    
    def __init__(self, data, target, mask):
        self.data = data
        self.target = target
        self.mask = mask
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        return self.data[idx], self.target[idx], self.mask[idx]

train_dataset = TweetDataset(data_train, target_train, mask_train)
val_dataset = TweetDataset(data_val, target_val, mask_val)
test_dataset = TweetDataset(data_test, target_test, mask_test)

## Modelo de clasificación: *BertForSequenceClassification*

En este caso utilizaremos de la librería *transformers* de [*HuggingFace*](https://huggingface.co/transformers/) la clase *BertForSequenceClassification*, que viene preparada para reutilizar *Bert* para tareas de clasificación de texto.

In [22]:
from transformers import BertForSequenceClassification

In [23]:
MODEL = BertForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=NUMBER_CLASSES).to(device)
lr = 1e-6
OPTIMIZER = AdamW(MODEL.parameters(), lr=lr)
CRITERION = nn.CrossEntropyLoss()
ITERATIONS = 100
PATIENCE = 20

batchsize = 8

TRAIN_LOADER = DataLoader(train_dataset, batch_size=batchsize, shuffle=True)
VAL_LOADER = DataLoader(val_dataset, batch_size=batchsize, shuffle=True)
TEST_LOADER = DataLoader(test_dataset, batch_size=batchsize, shuffle=True)


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-uncas

Según el ***warning*** anterior, se cargó todo el modelo excepto el clasificador, que es lo que vamos a entrenar desde cero.

## Entrenamiento

Las dos siguientes funciones definen el entrenamiento y validación de la red. Mediante el flag *stop_train* interrumpiendo el *kernel* con el botón de *stop* puede parase el entrenamiento en cualquier momento.

Para evitar sobrecargar la memoria de la GPU, las variables de cargan a CUDA por *batches* únicamente en el momento de utilizarlas.

El modelo final se almacena en el directorio elegido.

In [24]:
global stop_train
stop_train = False

def evaluate(model, loss_function, val_loader):
    
    total_acc = 0
    total_precision = 0
    total_recall = 0
    
    total_loss = 0
    
    for batch_step, (data, target, mask) in enumerate(val_loader):
        try:
            cuda_data = data.to(device)
            cuda_target = target.to(device)
            cuda_mask = mask.to(device)

            output = model(cuda_data, cuda_mask).logits

            loss = loss_function(output, cuda_target)

            total_loss += loss.item()

            prediction = output.argmax(dim=1)

            acc = accuracy_score(prediction.cpu(), cuda_target.cpu())#(prediction == cuda_target).cpu().numpy().mean()
            prec = precision_score(prediction.cpu(), cuda_target.cpu(), average="macro", zero_division=0.0)
            rec = recall_score(prediction.cpu(), cuda_target.cpu(), average="macro", zero_division=0.0)

            total_acc += acc
            total_precision += prec
            total_recall += rec
        except KeyboardInterrupt:
            try: del cuda_data
            except: pass
            
            try: del cuda_target
            except: pass
            
            try: del cuda_mask
            except: pass
            
            stop_train = True
            print("*** train-data variables removed from cuda memory ***")
            break
        
        # force to free cuda memory
        del cuda_data
        del cuda_target
        del cuda_mask
    T = len(val_loader)
    return map(lambda r: r/T, [total_loss, total_acc, total_precision, total_recall])


def train(folder, model, loss_function, optim, nepochs, patience, train_loader, val_loader):
    model.train()
    
    ts = datetime.datetime.now()
    best_model_path = Path(f"./temp/best_model_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}")
    best_val_loss = np.inf
    early_counter = 0
    
    
    for epoch in range(1, nepochs+1):
        
        total_loss = 0
        total_acc = 0
        
        progress_bar = tqdm.tqdm(train_loader, desc='Bar desc', leave=True)
        
        for batch_step, (data, target, mask) in enumerate(progress_bar, 1):
            try:
                data_cuda = data.to(device)
                target_cuda = target.to(device)
                mask_cuda = mask.to(device)

                optim.zero_grad()

                out = MODEL(data_cuda, mask_cuda).logits
                loss = loss_function(out, target_cuda)

                total_loss += loss.item()

                acc = accuracy_score(out.detach().cpu().argmax(dim=1), target)
                total_acc += acc

                loss.backward()

                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

                optim.step()

                progress_bar.set_description(f"{batch_step = } / {len(train_loader)} / {epoch = } | loss = {total_loss / batch_step : 0.4f} acc = {total_acc / batch_step  : 0.4f}")

                del data_cuda
                del target_cuda
                del mask_cuda
            except KeyboardInterrupt:
                try: del cuda_data
                except: pass
                
                try: del cuda_target
                except: pass
                
                try: del cuda_mask
                except: pass
                
                print("*** train-data variables removed from cuda memory ***")
                global stop_train
                stop_train = True
                break
        
        if stop_train: # stop from train
            print("*** stoping training ***")
            break
        
        model.eval()
        val_loss, val_acc, val_prec, val_rec = evaluate(model, loss_function, val_loader)
        model.train()
        
        state_ss = f"Epoch : {epoch} / {nepochs} | loss = {total_loss / len(train_loader) : 0.4f} acc = {total_acc / len(train_loader) : 0.4f}"
        state_ss += f" | {val_loss = : 0.4f} {val_acc = : 0.4f} {val_prec = : 0.4f} {val_rec = : 0.4f}"
        
        print(state_ss)
        
        if val_loss < best_val_loss:
            print(f"Validation loss decreased ({best_val_loss:0.4f} --> {val_loss:0.4f}).  Saving model ...")
            model.save_pretrained(best_model_path)
            
            best_val_loss = val_loss
            early_counter = 0
        else:
            if early_counter < patience:
                early_counter += 1
                print(f"EarlyStopping counter: {early_counter} out of {patience}")
            else:
                break
                print("Early stopping")
    
    if not best_model_path.is_dir(): # there are no best model to save
        return model
    
    best_model = BertForSequenceClassification.from_pretrained(best_model_path)
    
    ts = datetime.datetime.now()
    ts_str = f"model_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}"
    model.save_pretrained(folder / Path(ts_str))
    
    return best_model

In [25]:
DIRECTORY = Path(f"model_beto_classification/train_{datetime.datetime.timestamp(datetime.datetime.now()):0.0f}")
best_model = train(DIRECTORY, MODEL, CRITERION, OPTIMIZER, ITERATIONS, PATIENCE, TRAIN_LOADER, VAL_LOADER)

batch_step = 2676 / 2676 / epoch = 1 | loss =  1.5528 acc =  0.3743: 100%|█| 2676/2676 [12:22<00:00, 


Epoch : 1 / 100 | loss =  1.5528 acc =  0.3743 | val_loss =  1.3859 val_acc =  0.4557 val_prec =  0.3953 val_rec =  0.3188
Validation loss decreased (inf --> 1.3859).  Saving model ...


batch_step = 2676 / 2676 / epoch = 2 | loss =  1.3461 acc =  0.4775: 100%|█| 2676/2676 [12:21<00:00, 


Epoch : 2 / 100 | loss =  1.3461 acc =  0.4775 | val_loss =  1.3111 val_acc =  0.4837 val_prec =  0.4073 val_rec =  0.3517
Validation loss decreased (1.3859 --> 1.3111).  Saving model ...


batch_step = 2676 / 2676 / epoch = 3 | loss =  1.2764 acc =  0.5015: 100%|█| 2676/2676 [12:21<00:00, 


Epoch : 3 / 100 | loss =  1.2764 acc =  0.5015 | val_loss =  1.2915 val_acc =  0.4988 val_prec =  0.4272 val_rec =  0.3734
Validation loss decreased (1.3111 --> 1.2915).  Saving model ...


batch_step = 2676 / 2676 / epoch = 4 | loss =  1.2291 acc =  0.5227: 100%|█| 2676/2676 [12:21<00:00, 


Epoch : 4 / 100 | loss =  1.2291 acc =  0.5227 | val_loss =  1.2693 val_acc =  0.5022 val_prec =  0.4247 val_rec =  0.4000
Validation loss decreased (1.2915 --> 1.2693).  Saving model ...


batch_step = 2676 / 2676 / epoch = 5 | loss =  1.1864 acc =  0.5400: 100%|█| 2676/2676 [12:20<00:00, 


Epoch : 5 / 100 | loss =  1.1864 acc =  0.5400 | val_loss =  1.2633 val_acc =  0.5101 val_prec =  0.4186 val_rec =  0.4071
Validation loss decreased (1.2693 --> 1.2633).  Saving model ...


batch_step = 2676 / 2676 / epoch = 6 | loss =  1.1502 acc =  0.5551: 100%|█| 2676/2676 [12:21<00:00, 
batch_step = 1 / 2676 / epoch = 7 | loss =  0.9280 acc =  0.6250:   0%|     | 0/2676 [00:00<?, ?it/s]

Epoch : 6 / 100 | loss =  1.1502 acc =  0.5551 | val_loss =  1.2795 val_acc =  0.5099 val_prec =  0.4311 val_rec =  0.4058
EarlyStopping counter: 1 out of 20


batch_step = 2676 / 2676 / epoch = 7 | loss =  1.1108 acc =  0.5704: 100%|█| 2676/2676 [12:21<00:00, 
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 7 / 100 | loss =  1.1108 acc =  0.5704 | val_loss =  1.2937 val_acc =  0.5037 val_prec =  0.4310 val_rec =  0.4080
EarlyStopping counter: 2 out of 20


batch_step = 2676 / 2676 / epoch = 8 | loss =  1.0736 acc =  0.5891: 100%|█| 2676/2676 [12:19<00:00, 
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 8 / 100 | loss =  1.0736 acc =  0.5891 | val_loss =  1.2999 val_acc =  0.4977 val_prec =  0.4372 val_rec =  0.4142
EarlyStopping counter: 3 out of 20


batch_step = 2676 / 2676 / epoch = 9 | loss =  1.0407 acc =  0.6015: 100%|█| 2676/2676 [12:19<00:00, 
batch_step = 1 / 2676 / epoch = 10 | loss =  1.4688 acc =  0.3750:   0%|    | 0/2676 [00:00<?, ?it/s]

Epoch : 9 / 100 | loss =  1.0407 acc =  0.6015 | val_loss =  1.3017 val_acc =  0.5052 val_prec =  0.4236 val_rec =  0.4104
EarlyStopping counter: 4 out of 20


batch_step = 2676 / 2676 / epoch = 10 | loss =  1.0054 acc =  0.6151: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 10 / 100 | loss =  1.0054 acc =  0.6151 | val_loss =  1.3195 val_acc =  0.5002 val_prec =  0.4330 val_rec =  0.4166
EarlyStopping counter: 5 out of 20


batch_step = 2676 / 2676 / epoch = 11 | loss =  0.9685 acc =  0.6331: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 11 / 100 | loss =  0.9685 acc =  0.6331 | val_loss =  1.3482 val_acc =  0.4968 val_prec =  0.4195 val_rec =  0.4069
EarlyStopping counter: 6 out of 20


batch_step = 2676 / 2676 / epoch = 12 | loss =  0.9292 acc =  0.6500: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 12 / 100 | loss =  0.9292 acc =  0.6500 | val_loss =  1.3643 val_acc =  0.5086 val_prec =  0.4382 val_rec =  0.4173
EarlyStopping counter: 7 out of 20


batch_step = 2676 / 2676 / epoch = 13 | loss =  0.8961 acc =  0.6639: 100%|█| 2676/2676 [12:19<00:00,
batch_step = 1 / 2676 / epoch = 14 | loss =  0.7907 acc =  0.8750:   0%|    | 0/2676 [00:00<?, ?it/s]

Epoch : 13 / 100 | loss =  0.8961 acc =  0.6639 | val_loss =  1.3716 val_acc =  0.4977 val_prec =  0.4334 val_rec =  0.4107
EarlyStopping counter: 8 out of 20


batch_step = 2676 / 2676 / epoch = 14 | loss =  0.8557 acc =  0.6845: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 14 / 100 | loss =  0.8557 acc =  0.6845 | val_loss =  1.4314 val_acc =  0.4883 val_prec =  0.4134 val_rec =  0.4133
EarlyStopping counter: 9 out of 20


batch_step = 2676 / 2676 / epoch = 15 | loss =  0.8229 acc =  0.6961: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 15 / 100 | loss =  0.8229 acc =  0.6961 | val_loss =  1.4588 val_acc =  0.4874 val_prec =  0.4101 val_rec =  0.4098
EarlyStopping counter: 10 out of 20


batch_step = 2676 / 2676 / epoch = 16 | loss =  0.7826 acc =  0.7112: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 16 / 100 | loss =  0.7826 acc =  0.7112 | val_loss =  1.4856 val_acc =  0.4973 val_prec =  0.4195 val_rec =  0.4103
EarlyStopping counter: 11 out of 20


batch_step = 2676 / 2676 / epoch = 17 | loss =  0.7484 acc =  0.7281: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 17 / 100 | loss =  0.7484 acc =  0.7281 | val_loss =  1.5248 val_acc =  0.4935 val_prec =  0.4168 val_rec =  0.4203
EarlyStopping counter: 12 out of 20


batch_step = 2676 / 2676 / epoch = 18 | loss =  0.7117 acc =  0.7423: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 18 / 100 | loss =  0.7117 acc =  0.7423 | val_loss =  1.5615 val_acc =  0.4867 val_prec =  0.4138 val_rec =  0.4054
EarlyStopping counter: 13 out of 20


batch_step = 2676 / 2676 / epoch = 19 | loss =  0.6713 acc =  0.7589: 100%|█| 2676/2676 [12:19<00:00,
batch_step = 1 / 2676 / epoch = 20 | loss =  1.1724 acc =  0.6250:   0%|    | 0/2676 [00:00<?, ?it/s]

Epoch : 19 / 100 | loss =  0.6713 acc =  0.7589 | val_loss =  1.5870 val_acc =  0.4938 val_prec =  0.4142 val_rec =  0.4169
EarlyStopping counter: 14 out of 20


batch_step = 2676 / 2676 / epoch = 20 | loss =  0.6390 acc =  0.7721: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 20 / 100 | loss =  0.6390 acc =  0.7721 | val_loss =  1.6645 val_acc =  0.4938 val_prec =  0.4188 val_rec =  0.4110
EarlyStopping counter: 15 out of 20


batch_step = 2676 / 2676 / epoch = 21 | loss =  0.6079 acc =  0.7822: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 21 / 100 | loss =  0.6079 acc =  0.7822 | val_loss =  1.6988 val_acc =  0.4810 val_prec =  0.4125 val_rec =  0.4031
EarlyStopping counter: 16 out of 20


batch_step = 2676 / 2676 / epoch = 22 | loss =  0.5748 acc =  0.7963: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 22 / 100 | loss =  0.5748 acc =  0.7963 | val_loss =  1.7411 val_acc =  0.4948 val_prec =  0.4209 val_rec =  0.4299
EarlyStopping counter: 17 out of 20


batch_step = 2676 / 2676 / epoch = 23 | loss =  0.5360 acc =  0.8155: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 23 / 100 | loss =  0.5360 acc =  0.8155 | val_loss =  1.8132 val_acc =  0.4928 val_prec =  0.4203 val_rec =  0.4101
EarlyStopping counter: 18 out of 20


batch_step = 2676 / 2676 / epoch = 24 | loss =  0.5116 acc =  0.8175: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 24 / 100 | loss =  0.5116 acc =  0.8175 | val_loss =  1.8234 val_acc =  0.4872 val_prec =  0.4063 val_rec =  0.4121
EarlyStopping counter: 19 out of 20


batch_step = 2676 / 2676 / epoch = 25 | loss =  0.4750 acc =  0.8348: 100%|█| 2676/2676 [12:19<00:00,
Bar desc:   0%|                                                             | 0/2676 [00:00<?, ?it/s]

Epoch : 25 / 100 | loss =  0.4750 acc =  0.8348 | val_loss =  1.9152 val_acc =  0.4971 val_prec =  0.4065 val_rec =  0.4026
EarlyStopping counter: 20 out of 20


batch_step = 2676 / 2676 / epoch = 26 | loss =  0.4501 acc =  0.8440: 100%|█| 2676/2676 [12:19<00:00,


Epoch : 26 / 100 | loss =  0.4501 acc =  0.8440 | val_loss =  2.0114 val_acc =  0.4842 val_prec =  0.4126 val_rec =  0.4088


---

Anotamos los parámetros utilizados en el entrenamiento junto al modelo almacenado.

In [26]:
f = open(DIRECTORY / Path("parameters.txt"), "w")
f.write(f"{lr = }\n" + \
        f"{NUMBER_CLASSES = }\n" + \
        f"{batchsize = }"
       )
f.close()

---

### Evaluación

In [27]:
_ = best_model.eval()

In [28]:
val_loss, val_acc, val_prec, val_rec = evaluate(best_model.to(device), CRITERION, VAL_LOADER)
f"{val_loss = :0.4f} {val_acc = :0.4f} {val_prec = :0.4f} {val_rec = :0.4f}"

'val_loss = 1.2670 val_acc = 0.5081 val_prec = 0.4342 val_rec = 0.4197'

In [29]:
test_loss, test_acc, test_prec, test_rec = evaluate(best_model.to(device), CRITERION, TEST_LOADER)
f"{test_loss = :0.4f} {test_acc = :0.4f} {test_prec = :0.4f} {test_rec = :0.4f}"

'test_loss = 1.2703 test_acc = 0.4955 test_prec = 0.4193 test_rec = 0.4026'

<hr style="height: 4px">

## Resultados

### Con 3 clases

Se ha entrenado el *dataset* agrupado y **balanceado en 3 categorías** (positivo con P y P+, neutro con NEU y NONE, y negativo con N y N+).
Los siguientes resultados se han obtenido recorriendo el *dataset* en ***batches* de tamaño 8**, y con **una *patience* a 5 del *early stopping***. 

El tamaño de *batches* es el máximo que permite la memoria de la GPU donde se entrenó la red. Se ha elegido un valor bajo de *patience* para reducir el tiempo de entrenamiento.

#### Entrenamiento

| lr | Loss | Accuracy |
|----|------|----------|
|**1e-4**|1.1212|  0.3361  |
|**1e-6**|0.7817|  0.6543  |

#### Validación

| lr | Loss | Accuracy | Precision | Recall |
|----|------|----------|-----------|--------|
|**1e-4**|1.0990|  0.3336  |   0.3210  | 0.1143 |
|**1e-6**|0.7997|  0.6608  |   0.6551  | 0.6411 |

#### Test

| lr | Loss | Accuracy | Precision | Recall |
|----|------|----------|-----------|--------|
|**1e-4**|1.0991|  0.3333  |   0.3383  | 0.1184 |
|**1e-6**|0.7985|  0.6472  |   0.6308  | 0.6226 |

### Con 6 clases

Se ha entrenado el *dataset* agrupado y **con las 6 categorías**.
Los siguientes resultados se han obtenido recorriendo el *dataset* en ***batches* de tamaño 8**, y con **una *patience* a 5 del *early stopping***. 

El tamaño de *batches* es el máximo que permite la memoria de la GPU donde se entrenó la red. Se ha elegido un valor bajo de *patience* para reducir el tiempo de entrenamiento.

#### Entrenamiento

| lr | Loss | Accuracy |
|----|------|----------|
|**1e-4**|1.6653|  0.2551  |
|**1e-6**|1.1864|  0.5400  |

#### Validación

| lr | Loss | Accuracy | Precision | Recall |
|----|------|----------|-----------|--------|
|**1e-4**|1.6619|  0.2624  |  0.2228   | 0.0651 |
|**1e-6**|1.2633|  0.5101  |  0.4186   | 0.4071 |

#### Test

| lr | Loss | Accuracy | Precision | Recall |
|----|------|----------|-----------|--------|
|**1e-4**|1.6625|  0.2609  |  0.2257   | 0.0703 |
|**1e-6**|1.2703|  0.4955  |  0.4193   | 0.4026 |