# Clasificación de tweets: LSTM

Autor: Alberto Ramos Sánchez

### Contenido

* [Dataset: TASS](#Dataset:-TASS)
* [Preparar tweets](#Preparar-tweets)
* [Crear dataset de entrenamiento](#Crear-dataset-de-entrenamiento)
    * [Eliminar frases vacias](#Eliminar-frases-vacias)
    * [Sets de entrenamiento, validación y test](#Sets-de-entrenamiento,-validación-y-test)
    * [Tokenizamos texto](#Tokenizamos-texto)
* [Modelo de clasificación: *LSTM*](#Modelo-de-clasificación:-*LSTM*)
* [Entrenamiento](#Entrenamiento)
    * [Evaluación](#Evaluación)
* [Resultados](#Resultados)
    * [Con 3 clases](#Con-3-clases)
        * [Entrenamiento](#Entrenamiento)
        * [Validación](#Validación)
        * [Test](#Test)
    * [Con 6 clases](#Con-6-clases)
        * [Entrenamiento](#Entrenamiento)
        * [Validation](#Validation)
        * [Test](#Test)


In [1]:
import pandas as pd
import numpy as np

import nltk
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('spanish')

import re
import string
import tqdm

from pathlib import Path
import datetime

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.preprocessing import LabelEncoder

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset, IterableDataset, DataLoader

from pytorchtools.pytorchtools import EarlyStopping

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

# seed
import random

seed = 42

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

cuda


En este *notebook* se ha entrenado un modelo *LSTM* para realizar *sentiment analysis* sobre el conjunto de tweets en español de TASS.

## Dataset: TASS

Se ha trabajado con los *datasets* de tweets de [*TASS SEPLN*](http://tass.sepln.org/tass_data/download.php) desde el año 2012 al 2019. 
Todos los ficheros *xml* se han unido en un único dataset contenido en los ficheros *tweets.csv.gz*, *topics.csv.gz* y *polarities.csv.gz*

In [3]:
dataset_path = Path("./TASS/conversion_result/dataset31k")

In [4]:
df_tweets = pd.read_csv(dataset_path / Path("tweets.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')
df_topics = pd.read_csv(dataset_path / Path("topics.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')
df_polarities = pd.read_csv(dataset_path / Path("polarities.csv.gz"), compression='gzip', header=0, sep=';', quotechar='"')

df_topics = df_topics.rename(columns={'tweetid_fk': 'tweetid'})
df_polarities = df_polarities.rename(columns={'tweetid_fk': 'tweetid'})

En total, en el dataset hay aproximadamente 31 mil tweets.

In [5]:
f"Número total de tweets = {len(df_tweets)}"

'Número total de tweets = 31375'

Y están etiquetados en 6 categorías:

* P+ : Positivo fuerte (strong negative)
* P : Positivo
* NONE : Sin sentimiento (no sentiment tag)
* NEU : Neutro
* N : Negativo
* N+ : Negativo fuerte (strong negative)

En la siguiente tabla se muestra la cantidad de tweets por categoría:

In [6]:
df_polarities[['value', 'tweetid']].drop_duplicates(subset='tweetid', keep="first").groupby("value").count().rename(columns={"value": "Categoría", "tweetid": "Número de tweets"})

Unnamed: 0_level_0,Número de tweets
value,Unnamed: 1_level_1
N,6219
N+,976
NEU,2755
NONE,5597
P,5442
P+,2793


## Preparar tweets

En este apartado preprocesamos el texto de los tweets para entrenar el modelo.

En la función *clean_tweet* limpiamos el texto quedándonos solamente con los caractéres alfanuméricos. Eliminamos los nombres de usuario, url, vocales seguidas repetidas más de dos veces, tabulaciones, etc.

In [7]:
def clean_tweet(text):
    res_txt = re.sub("@\w+", "", text) # drop username
    res_txt = re.sub("https?://[A-Za-z0-9\./]+", "", res_txt) # drop url
    
    for p in string.punctuation:
        res_txt = res_txt.replace(p, " ")
    #res_txt = " ".join([c for c in res_txt if c not in string.punctuation])
    
    # eliminar palabras con más de 2 vocales seguidas "largooooo -> largoo"
    res_txt = re.sub("([A-Za-z])\\1{2,}", "\\1\\1", res_txt)
    
    # eliminar espacios y tabulaciones repetidas
    res_txt = re.sub("[ \t]+", " ", res_txt.strip())
    
    # mantener solamente caracteres alfanuméricos
    res_txt = re.sub(r'[^a-zñÑA-Z0-9áéíóúÁÉÍÓÚ ]', '', res_txt)
    return res_txt

En *prepare_tweet* eliminamos las *stopwords*.

In [8]:
def prepare_tweet(text):
    stop_words = nltk.corpus.stopwords.words('spanish')
    custom_stop_words = ["d", "q"]
    
    tokens = nltk.word_tokenize(text)
    tokens = [w for w in tokens if w not in stop_words]
    tokens = [w for w in tokens if w not in custom_stop_words]
    tokens = [w.lower() for w in tokens]
    
    # stemming
    #tokens = [stemmer.stem(w) for w in tokens]
    
    return " ".join(tokens)

Aplicamos el preprocesado, obteniendo el siguiente resultado.

In [9]:
print(f"Original :\n {df_tweets['content'][8235]}")

content = df_tweets['content']

content = content.apply(lambda tweet: prepare_tweet(clean_tweet(str(tweet))))

df_tweets['content'] = content

print(f"Resultado :\n {df_tweets['content'][8235]}")

Original :
 ;-)) RT @doloresvela: El #iPhone 5 podría ser presentado en marzo http://t.co/2kjKTjfF
Resultado :
 rt el iphone 5 podría ser presentado marzo


## Crear dataset de entrenamiento

En este apartado se crea el dataset de entrenamiento.

Unimos los dataframes en uno solo, que contiene el id del tweet, el contenido del tweet y la categoría.

In [10]:
data_tweets = df_tweets[['tweetid', 'content']]
data_sentim = df_polarities[['tweetid', 'value']]

data_tweets = data_tweets.merge(data_sentim, on="tweetid").drop_duplicates(subset='tweetid', keep="first").reset_index(drop=True)
data_tweets

Unnamed: 0,tweetid,content,value
0,137228516625367040,en españa cosas pueden deben van hacer infinit...,P
1,137228522019229697,en pso corre vuela todavía caliente cadáver po...,N
2,137228533029277696,nomeolvido aprobo ley aborto libre todas minis...,N
3,137228551198998528,ccoo exige nuevo gobierno reactive mercado int...,NEU
4,137228569750405120,a inviable parecen fraudes fiscales cometen mi...,P
...,...,...,...
23777,819452909318504448,ya contando días volver vernos todavía voy,N
23778,819456529543798784,gracias comedy central mtv voy nueva temporada...,P
23779,819456610829471744,quiero necesito verte yaa,P
23780,819469945167720448,demas solo den npm install instalen paquetes g...,NONE


A continuación se cambia la etiqueta por un valor numérico que indica la clase.

In [11]:
label2id = {"N": 0,
            "N+": 1,
            "NEU": 2,
            "NONE": 3,
            "P": 4,
            "P+": 5}

NUMBER_CLASSES = 6

data_tweets["value"].replace(label2id, inplace=True)
data_tweets

Unnamed: 0,tweetid,content,value
0,137228516625367040,en españa cosas pueden deben van hacer infinit...,4
1,137228522019229697,en pso corre vuela todavía caliente cadáver po...,0
2,137228533029277696,nomeolvido aprobo ley aborto libre todas minis...,0
3,137228551198998528,ccoo exige nuevo gobierno reactive mercado int...,2
4,137228569750405120,a inviable parecen fraudes fiscales cometen mi...,4
...,...,...,...
23777,819452909318504448,ya contando días volver vernos todavía voy,0
23778,819456529543798784,gracias comedy central mtv voy nueva temporada...,4
23779,819456610829471744,quiero necesito verte yaa,4
23780,819469945167720448,demas solo den npm install instalen paquetes g...,3


### Eliminar frases vacias

Hay ciertas frases vacías que debemos eliminar pues el tokenizador utilizado no las acepta.

In [12]:
data_tweets = data_tweets[data_tweets['content'].apply(lambda x: len(x)) > 0]
data_tweets

Unnamed: 0,tweetid,content,value
0,137228516625367040,en españa cosas pueden deben van hacer infinit...,4
1,137228522019229697,en pso corre vuela todavía caliente cadáver po...,0
2,137228533029277696,nomeolvido aprobo ley aborto libre todas minis...,0
3,137228551198998528,ccoo exige nuevo gobierno reactive mercado int...,2
4,137228569750405120,a inviable parecen fraudes fiscales cometen mi...,4
...,...,...,...
23777,819452909318504448,ya contando días volver vernos todavía voy,0
23778,819456529543798784,gracias comedy central mtv voy nueva temporada...,4
23779,819456610829471744,quiero necesito verte yaa,4
23780,819469945167720448,demas solo den npm install instalen paquetes g...,3


Con la variable *apply_balance* controlamos si queremos aplicar o no balanceo de datos.

En el caso de utilizar 6 clases, no aplicamos balanceo de datos, pues perderíamos la mayoría de los tweets debido a que la clase N+ tiene muchos menos tweets.

In [13]:
apply_balance = True

if apply_balance:
    g = data_tweets.groupby('value')
    data_tweets = g.apply(lambda x: x.sample(g.size().min()).reset_index(drop=True))
    data_tweets

### Sets de entrenamiento, validación y test

Dividimos el *dataset* en entrenamiento, validación y test. Con la opción *stratify* aseguramos que se mantenga la misma proporción de clases para cada conjunto que el *dataset* original. Para asegurarlo, es posible que se pierdan ciertos tweets. Esta operación no balancea el dataset.

In [14]:
train_size = 0.90
val_size = 0.05

In [15]:
df_train, data_val = train_test_split(data_tweets, test_size=1-train_size, stratify=data_tweets[['value']])

df_val, df_test = train_test_split(data_val, test_size=1-(val_size/(1-train_size)), stratify=data_val[['value']])

df_train.shape, df_val.shape, df_test.shape

((5270, 3), (293, 3), (293, 3))

Guardamos de forma temporal el *dataset* en archivos csv.

In [16]:
Path("./temp").mkdir(parents=True, exist_ok=True)

df_train[['content', 'value']].to_csv("./temp/tweet_train.tsv", index=False, sep="\t")
df_val[['content', 'value']].to_csv("./temp/tweet_val.tsv", index=False, sep="\t")
df_test[['content', 'value']].to_csv("./temp/tweet_test.tsv", index=False, sep="\t")

### Tokenizamos texto

El tokenizador nos convierte las palabras a indices, que son utilizadas por el *embedding* para crear la representación *word2vec*. Además, con la opción *include_lengths* a *True* añade la longitud de cada una de las frases.

In [17]:
from torchtext.data import TabularDataset
from torchtext.data import Field

tokenize = lambda x: x.split()

TEXT = Field(tokenize=tokenize, include_lengths=True, batch_first=True, lower=True)
LABEL = Field(sequential=False, use_vocab=False, batch_first=True)


datafields = [("content", TEXT),
              ("value", LABEL)]

train, validation, test = TabularDataset.splits(
        path="./temp/",
        train="tweet_train.tsv",
        validation="tweet_val.tsv",
        test="tweet_test.tsv",
        format='tsv',
        skip_header=True,
        #csv_reader_params={"delimiter":";"},
        fields=datafields)

TEXT.build_vocab(train, min_freq=3)

In [18]:
from torchtext.data import Iterator, BucketIterator

TRAIN_ITER, VAL_ITER, TEST_ITER = BucketIterator.splits(
    (train, validation, test),
    batch_sizes=(10, 10, 10),
    device=device,
    sort_key=lambda x: len(x.content),
    repeat=False
)

In [19]:
for ((content, content_len), target), _ in TRAIN_ITER:
    print(content)
    print(content_len)
    break

tensor([[   6,  290,    0,    0,    0,  734, 3093,  583,    0,  979,    0,    1,
            1,    1,    1,    1,    1],
        [  15,    0,    0, 2932, 2584,  160,  474,   21,    0,    0,  123,    0,
            0,  244,    1,    1,    1],
        [1656,  154,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
            1,    1,    1,    1,    1],
        [1036,  323,    0,    0,    0,    0,   29,  226,    1,    1,    1,    1,
            1,    1,    1,    1,    1],
        [ 904,    0,  263,   16,   34,   19,  431,    0,    1,    1,    1,    1,
            1,    1,    1,    1,    1],
        [   0,    0, 1635, 1705,    0,    0, 1635, 1892,    4,  931,    0,    0,
         1513,  343,   96,   43,   47],
        [   0,    5,    4,  198,  900,    0,    0,  504,   23,    0,    1,    1,
            1,    1,    1,    1,    1],
        [   0,   55,   83, 2682,  567, 3109, 1490,   36,   31, 1398,    1,    1,
            1,    1,    1,    1,    1],
        [   3,  154,  117,    1,

## Modelo de clasificación: *LSTM*

En este caso utilizamos el modelo *LSTM* para clasificar los tweets. Este modelo se compone de una capa de *word embedding*, una red *LSTM* y una red *fully connected* para generar las etiquetas de salida. La función *pack_padded_sequence* empaqueta la entrada de la LSTM para tener en cuenta distintos tamaños de frases.

El modelo tiene tres métodos para serializar: *load_from*, *save_to* y *checkpoint*.

In [20]:
class ClassifierLSTMModel(nn.Module):
    
    def __init__(self, vocab_size, embed_dim, drop_dim, n_out):
        super(ClassifierLSTMModel, self).__init__()
        
        self.embed = nn.Embedding(vocab_size, embed_dim)
        
        self.lstm = nn.LSTM(input_size=embed_dim,
                            hidden_size=embed_dim,
                            num_layers=1,
                            batch_first=True)
        
        self.classifier = nn.Linear(embed_dim, n_out)
        
        self.drop = nn.Dropout(drop_dim)
    
    
    def forward(self, x, max_len):
        
        embeds_out = self.embed(x)
        
        packed_in = nn.utils.rnn.pack_padded_sequence(embeds_out, max_len, batch_first=True, enforce_sorted=False)
        lstm_out, (lstm_hidden, lstm_cell) = self.lstm(packed_in)
        
        tags = self.classifier(self.drop(lstm_hidden[0]))
        tag_score = F.log_softmax(self.drop(tags), dim=1)
        
        return tag_score
    
    def save_to(self, path):
        path.mkdir(parents=True, exist_ok=True)
        torch.save(self.state_dict(), path / Path("lstm_model.pt"))
    
    @classmethod
    def load_from(cls, folder, dev, vocab_size, embed_dim, drop_dim, n_out ):
        model = cls(vocab_size, embed_dim, drop_dim, n_out)
        state_dict = torch.load(folder / Path("lstm_model.pt"))
        model.load_state_dict(state_dict)
        return model.to(dev)
    
    def checkpoint(self, folder, optimizer, stats):
        status = {
            'state_dict': self.state_dict(),
            'optimizer': optimizer.state_dict()
        }
        status = {**status, **stats}
        
        ts = datetime.datetime.now()
        ts_str = f"checkpoint_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}"
        (folder / Path(ts_str)).mkdir(parents=True, exist_ok=True)
        
        torch.save(status, folder / Path(ts_str) / Path("checkpoint.pth"))


----

## Entrenamiento

Las dos siguientes funciones definen el entrenamiento y validación de la red. Para evitar sobrecargar la memoria de la GPU, las variables de cargan a CUDA por *batches* únicamente en el momento de utilizarlas.

El modelo final se almacena en el directorio elegido.

In [21]:
MAX_WORDS = len(TEXT.vocab)
EMBED_DIM = 32
DROPSIZE = 0.2
PATIENCE = 10

ITERATIONS = 100


MODEL = ClassifierLSTMModel(MAX_WORDS, EMBED_DIM, DROPSIZE, NUMBER_CLASSES).to(device)

CRITERION = nn.CrossEntropyLoss()
lr = 1e-4
OPTIMIZER = torch.optim.Adam(MODEL.parameters(), lr=lr)

In [22]:

def evaluate(model, loss_func, val_iter):
    total_loss = 0
    total_acc = 0
    total_prec = 0
    total_rec = 0
    
    for batch_step, ( ((content, content_len), target), _ ) in enumerate(val_iter, 1):
        
        with torch.no_grad():
            output = model(content, content_len)
            
            loss = loss_func(output, target)
            
            prediction = output.detach().cpu().argmax(dim=1)
            
            total_loss += loss.item()
            total_acc += accuracy_score(prediction, target.cpu())
            
            total_prec += precision_score(prediction, target.cpu(), average="macro", zero_division=0.0)
            total_rec += recall_score(prediction, target.cpu(), average="macro", zero_division=0.0)
    
    T = len(val_iter)
    return map(lambda r: r/T, [total_loss, total_acc, total_prec, total_rec])

def train(folder, model, loss_func, optimizer, nepochs, train_iter, val_iter, patience):
    model.train()
    
    ts = datetime.datetime.now()
    best_model_path = Path(f"./temp/best_model_lstm_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}")
    early_stopping = EarlyStopping(patience=patience, verbose=True, path=best_model_path)
    
    for epoch in range(1, nepochs+1):
        total_loss = 0
        total_acc = 0
        total_prec = 0
        total_rec = 0
        
        progress_bar = tqdm.tqdm(train_iter, desc="")
        
        for batch_step, ( ((content, content_len), target), _ ) in enumerate(progress_bar, 1):
            
            optimizer.zero_grad()
            
            out = model(content, content_len)

            loss = loss_func(out, target)
            
            total_loss += loss.item()
            total_acc += accuracy_score(out.detach().cpu().argmax(dim=1), target.detach().cpu())
            
            progress_bar.set_description(f"{batch_step = } / {len(train_iter)} / {epoch = } | loss = {total_loss/batch_step : 0.4f} acc = {total_acc/batch_step : 0.4f}")
            
            loss.backward()
            
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            
            optimizer.step()
        progress_bar.close()
        
        model.eval()
        val_loss, val_acc, val_prec, val_rec = evaluate(model, loss_func, val_iter)
        model.train()
        
        print("*"*10)
        print(f"Epoch {epoch} / {nepochs} | loss = {total_loss/len(train_iter) : 0.4f} acc = {total_acc/len(train_iter) : 0.4f} | {val_loss = : 0.4f} {val_acc = : 0.4f} {val_prec = :0.4f} {val_rec = :0.4f}")
        print("*"*10)
        
        stats = {
            'epoch': epoch,
            'total_epochs': nepochs,
            'loss': total_loss / len(train_iter),
            'val_loss': val_loss,
            'val_accuracy': val_acc,
            'val_prec': val_prec,
            'val_rec': val_rec
        }
        
        model.checkpoint(folder=folder,
                         optimizer=optimizer,
                         stats=stats)
        
        early_stopping(val_loss, model)
        if early_stopping.early_stop:
            print("Early stopping")
            break
    if not best_model_path.is_dir(): # there are no best model to save
        return
    
    best_model = ClassifierLSTMModel.load_from(folder=best_model_path,
                                               dev=device,
                                               vocab_size=MAX_WORDS,
                                               embed_dim=EMBED_DIM,
                                               drop_dim=DROPSIZE,
                                               n_out=NUMBER_CLASSES)
    
    # save model
    ts = datetime.datetime.now()
    ts_str = f"model_{ts.day}{ts.month}{ts.year}{ts.hour}{ts.minute}{ts.second}"
    best_model.save_to(folder / Path(ts_str))
    return best_model


In [23]:
DIRECTORY = Path(f"modelos_LSTM/train_{datetime.datetime.timestamp(datetime.datetime.now()):0.0f}")
best_model = train(DIRECTORY, MODEL, CRITERION, OPTIMIZER, ITERATIONS, TRAIN_ITER, VAL_ITER, PATIENCE)

batch_step = 527 / 527 / epoch = 1 | loss =  1.8018 acc =  0.1717: 100%|████████████| 527/527 [00:03<00:00, 171.25it/s]
batch_step = 32 / 527 / epoch = 2 | loss =  1.7910 acc =  0.1969:   3%|▍             | 18/527 [00:00<00:02, 173.84it/s]

**********
Epoch 1 / 100 | loss =  1.8018 acc =  0.1717 | val_loss =  1.7823 val_acc =  0.2233 val_prec = 0.2364 val_rec = 0.1173
**********
Validation loss decreased (inf --> 1.782267).  Saving model ...


batch_step = 527 / 527 / epoch = 2 | loss =  1.7944 acc =  0.1759: 100%|████████████| 527/527 [00:02<00:00, 181.31it/s]
batch_step = 32 / 527 / epoch = 3 | loss =  1.7956 acc =  0.1656:   3%|▍             | 17/527 [00:00<00:03, 167.07it/s]

**********
Epoch 2 / 100 | loss =  1.7944 acc =  0.1759 | val_loss =  1.7801 val_acc =  0.2133 val_prec = 0.2194 val_rec = 0.1280
**********
Validation loss decreased (1.782267 --> 1.780077).  Saving model ...


batch_step = 527 / 527 / epoch = 3 | loss =  1.7931 acc =  0.1776: 100%|████████████| 527/527 [00:02<00:00, 176.84it/s]
batch_step = 32 / 527 / epoch = 4 | loss =  1.7853 acc =  0.1875:   3%|▍             | 18/527 [00:00<00:02, 173.49it/s]

**********
Epoch 3 / 100 | loss =  1.7931 acc =  0.1776 | val_loss =  1.7776 val_acc =  0.2056 val_prec = 0.2017 val_rec = 0.1321
**********
Validation loss decreased (1.780077 --> 1.777598).  Saving model ...


batch_step = 527 / 527 / epoch = 4 | loss =  1.7872 acc =  0.1799: 100%|████████████| 527/527 [00:02<00:00, 185.92it/s]
batch_step = 33 / 527 / epoch = 5 | loss =  1.7866 acc =  0.2121:   3%|▍             | 18/527 [00:00<00:02, 178.64it/s]

**********
Epoch 4 / 100 | loss =  1.7872 acc =  0.1799 | val_loss =  1.7744 val_acc =  0.2367 val_prec = 0.2395 val_rec = 0.1747
**********
Validation loss decreased (1.777598 --> 1.774424).  Saving model ...


batch_step = 527 / 527 / epoch = 5 | loss =  1.7853 acc =  0.1913: 100%|████████████| 527/527 [00:02<00:00, 184.80it/s]
batch_step = 32 / 527 / epoch = 6 | loss =  1.7805 acc =  0.2156:   3%|▍             | 18/527 [00:00<00:02, 171.93it/s]

**********
Epoch 5 / 100 | loss =  1.7853 acc =  0.1913 | val_loss =  1.7719 val_acc =  0.2467 val_prec = 0.2409 val_rec = 0.2044
**********
Validation loss decreased (1.774424 --> 1.771896).  Saving model ...


batch_step = 527 / 527 / epoch = 6 | loss =  1.7814 acc =  0.2008: 100%|████████████| 527/527 [00:02<00:00, 176.53it/s]
batch_step = 32 / 527 / epoch = 7 | loss =  1.7804 acc =  0.1937:   3%|▍             | 17/527 [00:00<00:03, 168.72it/s]

**********
Epoch 6 / 100 | loss =  1.7814 acc =  0.2008 | val_loss =  1.7684 val_acc =  0.2600 val_prec = 0.2515 val_rec = 0.2269
**********
Validation loss decreased (1.771896 --> 1.768448).  Saving model ...


batch_step = 527 / 527 / epoch = 7 | loss =  1.7770 acc =  0.2095: 100%|████████████| 527/527 [00:02<00:00, 177.31it/s]
batch_step = 30 / 527 / epoch = 8 | loss =  1.7755 acc =  0.2000:   3%|▍             | 18/527 [00:00<00:02, 175.18it/s]

**********
Epoch 7 / 100 | loss =  1.7770 acc =  0.2095 | val_loss =  1.7644 val_acc =  0.2667 val_prec = 0.2509 val_rec = 0.2299
**********
Validation loss decreased (1.768448 --> 1.764374).  Saving model ...


batch_step = 527 / 527 / epoch = 8 | loss =  1.7744 acc =  0.2082: 100%|████████████| 527/527 [00:02<00:00, 176.53it/s]
batch_step = 31 / 527 / epoch = 9 | loss =  1.7742 acc =  0.1839:   3%|▍             | 17/527 [00:00<00:03, 167.16it/s]

**********
Epoch 8 / 100 | loss =  1.7744 acc =  0.2082 | val_loss =  1.7595 val_acc =  0.2633 val_prec = 0.2405 val_rec = 0.2277
**********
Validation loss decreased (1.764374 --> 1.759478).  Saving model ...


batch_step = 527 / 527 / epoch = 9 | loss =  1.7679 acc =  0.2209: 100%|████████████| 527/527 [00:02<00:00, 176.48it/s]
batch_step = 29 / 527 / epoch = 10 | loss =  1.7502 acc =  0.2759:   3%|▍            | 18/527 [00:00<00:02, 170.27it/s]

**********
Epoch 9 / 100 | loss =  1.7679 acc =  0.2209 | val_loss =  1.7534 val_acc =  0.2833 val_prec = 0.2619 val_rec = 0.2307
**********
Validation loss decreased (1.759478 --> 1.753412).  Saving model ...


batch_step = 527 / 527 / epoch = 10 | loss =  1.7626 acc =  0.2252: 100%|███████████| 527/527 [00:02<00:00, 175.88it/s]
batch_step = 33 / 527 / epoch = 11 | loss =  1.7420 acc =  0.2848:   3%|▍            | 18/527 [00:00<00:02, 178.69it/s]

**********
Epoch 10 / 100 | loss =  1.7626 acc =  0.2252 | val_loss =  1.7472 val_acc =  0.2744 val_prec = 0.2443 val_rec = 0.2306
**********
Validation loss decreased (1.753412 --> 1.747194).  Saving model ...


batch_step = 527 / 527 / epoch = 11 | loss =  1.7579 acc =  0.2423: 100%|███████████| 527/527 [00:02<00:00, 185.84it/s]
batch_step = 31 / 527 / epoch = 12 | loss =  1.7437 acc =  0.2581:   3%|▍            | 17/527 [00:00<00:03, 168.75it/s]

**********
Epoch 11 / 100 | loss =  1.7579 acc =  0.2423 | val_loss =  1.7378 val_acc =  0.3033 val_prec = 0.2793 val_rec = 0.2577
**********
Validation loss decreased (1.747194 --> 1.737842).  Saving model ...


batch_step = 527 / 527 / epoch = 12 | loss =  1.7478 acc =  0.2499: 100%|███████████| 527/527 [00:03<00:00, 175.16it/s]
batch_step = 30 / 527 / epoch = 13 | loss =  1.7417 acc =  0.2200:   3%|▍            | 17/527 [00:00<00:03, 167.11it/s]

**********
Epoch 12 / 100 | loss =  1.7478 acc =  0.2499 | val_loss =  1.7268 val_acc =  0.3167 val_prec = 0.2877 val_rec = 0.2708
**********
Validation loss decreased (1.737842 --> 1.726828).  Saving model ...


batch_step = 527 / 527 / epoch = 13 | loss =  1.7402 acc =  0.2632: 100%|███████████| 527/527 [00:02<00:00, 176.71it/s]
batch_step = 31 / 527 / epoch = 14 | loss =  1.7282 acc =  0.2452:   3%|▍            | 17/527 [00:00<00:03, 168.76it/s]

**********
Epoch 13 / 100 | loss =  1.7402 acc =  0.2632 | val_loss =  1.7149 val_acc =  0.3200 val_prec = 0.2892 val_rec = 0.2671
**********
Validation loss decreased (1.726828 --> 1.714911).  Saving model ...


batch_step = 527 / 527 / epoch = 14 | loss =  1.7294 acc =  0.2598: 100%|███████████| 527/527 [00:03<00:00, 175.51it/s]
batch_step = 30 / 527 / epoch = 15 | loss =  1.7192 acc =  0.2667:   3%|▍            | 17/527 [00:00<00:03, 167.11it/s]

**********
Epoch 14 / 100 | loss =  1.7294 acc =  0.2598 | val_loss =  1.6997 val_acc =  0.3100 val_prec = 0.2818 val_rec = 0.2528
**********
Validation loss decreased (1.714911 --> 1.699671).  Saving model ...


batch_step = 527 / 527 / epoch = 15 | loss =  1.7234 acc =  0.2638: 100%|███████████| 527/527 [00:03<00:00, 175.00it/s]
batch_step = 32 / 527 / epoch = 16 | loss =  1.7081 acc =  0.3063:   3%|▍            | 17/527 [00:00<00:03, 169.90it/s]

**********
Epoch 15 / 100 | loss =  1.7234 acc =  0.2638 | val_loss =  1.6909 val_acc =  0.3200 val_prec = 0.2881 val_rec = 0.2607
**********
Validation loss decreased (1.699671 --> 1.690891).  Saving model ...


batch_step = 527 / 527 / epoch = 16 | loss =  1.7120 acc =  0.2700: 100%|███████████| 527/527 [00:03<00:00, 175.30it/s]
batch_step = 31 / 527 / epoch = 17 | loss =  1.7331 acc =  0.2484:   3%|▍            | 17/527 [00:00<00:03, 167.16it/s]

**********
Epoch 16 / 100 | loss =  1.7120 acc =  0.2700 | val_loss =  1.6821 val_acc =  0.3133 val_prec = 0.2861 val_rec = 0.2510
**********
Validation loss decreased (1.690891 --> 1.682115).  Saving model ...


batch_step = 527 / 527 / epoch = 17 | loss =  1.6985 acc =  0.2937: 100%|███████████| 527/527 [00:03<00:00, 174.24it/s]
batch_step = 31 / 527 / epoch = 18 | loss =  1.6523 acc =  0.2935:   3%|▍            | 17/527 [00:00<00:03, 167.12it/s]

**********
Epoch 17 / 100 | loss =  1.6985 acc =  0.2937 | val_loss =  1.6718 val_acc =  0.3300 val_prec = 0.2964 val_rec = 0.2560
**********
Validation loss decreased (1.682115 --> 1.671751).  Saving model ...


batch_step = 527 / 527 / epoch = 18 | loss =  1.6947 acc =  0.2937: 100%|███████████| 527/527 [00:03<00:00, 172.10it/s]
batch_step = 28 / 527 / epoch = 19 | loss =  1.7015 acc =  0.2750:   3%|▍            | 17/527 [00:00<00:03, 162.34it/s]

**********
Epoch 18 / 100 | loss =  1.6947 acc =  0.2937 | val_loss =  1.6663 val_acc =  0.3233 val_prec = 0.2955 val_rec = 0.2611
**********
Validation loss decreased (1.671751 --> 1.666321).  Saving model ...


batch_step = 527 / 527 / epoch = 19 | loss =  1.6793 acc =  0.2979: 100%|███████████| 527/527 [00:03<00:00, 173.26it/s]
batch_step = 31 / 527 / epoch = 20 | loss =  1.7063 acc =  0.2516:   3%|▍            | 17/527 [00:00<00:03, 168.82it/s]

**********
Epoch 19 / 100 | loss =  1.6793 acc =  0.2979 | val_loss =  1.6615 val_acc =  0.3300 val_prec = 0.2960 val_rec = 0.2749
**********
Validation loss decreased (1.666321 --> 1.661500).  Saving model ...


batch_step = 527 / 527 / epoch = 20 | loss =  1.6701 acc =  0.3163: 100%|███████████| 527/527 [00:03<00:00, 175.11it/s]
batch_step = 30 / 527 / epoch = 21 | loss =  1.6562 acc =  0.3167:   3%|▍            | 17/527 [00:00<00:03, 163.89it/s]

**********
Epoch 20 / 100 | loss =  1.6701 acc =  0.3163 | val_loss =  1.6544 val_acc =  0.3233 val_prec = 0.2941 val_rec = 0.2659
**********
Validation loss decreased (1.661500 --> 1.654406).  Saving model ...


batch_step = 527 / 527 / epoch = 21 | loss =  1.6648 acc =  0.3074: 100%|███████████| 527/527 [00:03<00:00, 166.83it/s]
batch_step = 31 / 527 / epoch = 22 | loss =  1.6533 acc =  0.2839:   3%|▍            | 17/527 [00:00<00:03, 164.49it/s]

**********
Epoch 21 / 100 | loss =  1.6648 acc =  0.3074 | val_loss =  1.6502 val_acc =  0.3267 val_prec = 0.2906 val_rec = 0.2630
**********
Validation loss decreased (1.654406 --> 1.650195).  Saving model ...


batch_step = 527 / 527 / epoch = 22 | loss =  1.6584 acc =  0.3102: 100%|███████████| 527/527 [00:03<00:00, 173.42it/s]
batch_step = 32 / 527 / epoch = 23 | loss =  1.6906 acc =  0.2937:   3%|▍            | 17/527 [00:00<00:03, 165.79it/s]

**********
Epoch 22 / 100 | loss =  1.6584 acc =  0.3102 | val_loss =  1.6447 val_acc =  0.3167 val_prec = 0.2869 val_rec = 0.2529
**********
Validation loss decreased (1.650195 --> 1.644710).  Saving model ...


batch_step = 527 / 527 / epoch = 23 | loss =  1.6486 acc =  0.3224: 100%|███████████| 527/527 [00:03<00:00, 173.51it/s]
batch_step = 31 / 527 / epoch = 24 | loss =  1.5736 acc =  0.3806:   3%|▍            | 17/527 [00:00<00:03, 168.83it/s]

**********
Epoch 23 / 100 | loss =  1.6486 acc =  0.3224 | val_loss =  1.6405 val_acc =  0.3200 val_prec = 0.2944 val_rec = 0.2550
**********
Validation loss decreased (1.644710 --> 1.640528).  Saving model ...


batch_step = 527 / 527 / epoch = 24 | loss =  1.6433 acc =  0.3271: 100%|███████████| 527/527 [00:03<00:00, 175.06it/s]
batch_step = 31 / 527 / epoch = 25 | loss =  1.6422 acc =  0.3194:   3%|▍            | 18/527 [00:00<00:02, 172.09it/s]

**********
Epoch 24 / 100 | loss =  1.6433 acc =  0.3271 | val_loss =  1.6404 val_acc =  0.3133 val_prec = 0.2832 val_rec = 0.2518
**********
Validation loss decreased (1.640528 --> 1.640388).  Saving model ...


batch_step = 527 / 527 / epoch = 25 | loss =  1.6295 acc =  0.3304: 100%|███████████| 527/527 [00:03<00:00, 172.24it/s]
batch_step = 29 / 527 / epoch = 26 | loss =  1.6401 acc =  0.3345:   3%|▍            | 17/527 [00:00<00:03, 164.61it/s]

**********
Epoch 25 / 100 | loss =  1.6295 acc =  0.3304 | val_loss =  1.6378 val_acc =  0.3067 val_prec = 0.2728 val_rec = 0.2401
**********
Validation loss decreased (1.640388 --> 1.637767).  Saving model ...


batch_step = 527 / 527 / epoch = 26 | loss =  1.6205 acc =  0.3353: 100%|███████████| 527/527 [00:03<00:00, 173.54it/s]
batch_step = 31 / 527 / epoch = 27 | loss =  1.6491 acc =  0.3065:   3%|▍            | 18/527 [00:00<00:02, 175.94it/s]

**********
Epoch 26 / 100 | loss =  1.6205 acc =  0.3353 | val_loss =  1.6318 val_acc =  0.3133 val_prec = 0.2776 val_rec = 0.2468
**********
Validation loss decreased (1.637767 --> 1.631799).  Saving model ...


batch_step = 527 / 527 / epoch = 27 | loss =  1.6154 acc =  0.3362: 100%|███████████| 527/527 [00:03<00:00, 173.30it/s]
batch_step = 30 / 527 / epoch = 28 | loss =  1.6515 acc =  0.3267:   3%|▍            | 17/527 [00:00<00:03, 169.37it/s]

**********
Epoch 27 / 100 | loss =  1.6154 acc =  0.3362 | val_loss =  1.6335 val_acc =  0.3033 val_prec = 0.2717 val_rec = 0.2431
**********
EarlyStopping counter: 1 out of 10


batch_step = 527 / 527 / epoch = 28 | loss =  1.6033 acc =  0.3448: 100%|███████████| 527/527 [00:03<00:00, 173.65it/s]
batch_step = 30 / 527 / epoch = 29 | loss =  1.6088 acc =  0.3233:   3%|▍            | 17/527 [00:00<00:03, 165.82it/s]

**********
Epoch 28 / 100 | loss =  1.6033 acc =  0.3448 | val_loss =  1.6262 val_acc =  0.3000 val_prec = 0.2641 val_rec = 0.2413
**********
Validation loss decreased (1.631799 --> 1.626224).  Saving model ...


batch_step = 527 / 527 / epoch = 29 | loss =  1.5976 acc =  0.3454: 100%|███████████| 527/527 [00:03<00:00, 174.11it/s]
batch_step = 29 / 527 / epoch = 30 | loss =  1.5369 acc =  0.4241:   3%|▍            | 17/527 [00:00<00:03, 165.59it/s]

**********
Epoch 29 / 100 | loss =  1.5976 acc =  0.3454 | val_loss =  1.6219 val_acc =  0.3200 val_prec = 0.2913 val_rec = 0.2507
**********
Validation loss decreased (1.626224 --> 1.621914).  Saving model ...


batch_step = 527 / 527 / epoch = 30 | loss =  1.5882 acc =  0.3552: 100%|███████████| 527/527 [00:03<00:00, 174.04it/s]
batch_step = 32 / 527 / epoch = 31 | loss =  1.5564 acc =  0.3937:   3%|▍            | 17/527 [00:00<00:03, 168.81it/s]

**********
Epoch 30 / 100 | loss =  1.5882 acc =  0.3552 | val_loss =  1.6235 val_acc =  0.3100 val_prec = 0.2801 val_rec = 0.2412
**********
EarlyStopping counter: 1 out of 10


batch_step = 527 / 527 / epoch = 31 | loss =  1.5811 acc =  0.3594: 100%|███████████| 527/527 [00:03<00:00, 174.57it/s]
batch_step = 31 / 527 / epoch = 32 | loss =  1.5528 acc =  0.3677:   3%|▍            | 17/527 [00:00<00:03, 168.77it/s]

**********
Epoch 31 / 100 | loss =  1.5811 acc =  0.3594 | val_loss =  1.6196 val_acc =  0.3000 val_prec = 0.2716 val_rec = 0.2327
**********
Validation loss decreased (1.621914 --> 1.619595).  Saving model ...


batch_step = 527 / 527 / epoch = 32 | loss =  1.5709 acc =  0.3524: 100%|███████████| 527/527 [00:03<00:00, 174.82it/s]
batch_step = 31 / 527 / epoch = 33 | loss =  1.5911 acc =  0.3452:   3%|▍            | 17/527 [00:00<00:03, 167.11it/s]

**********
Epoch 32 / 100 | loss =  1.5709 acc =  0.3524 | val_loss =  1.6213 val_acc =  0.3033 val_prec = 0.2781 val_rec = 0.2422
**********
EarlyStopping counter: 1 out of 10


batch_step = 527 / 527 / epoch = 33 | loss =  1.5594 acc =  0.3657: 100%|███████████| 527/527 [00:03<00:00, 173.73it/s]
batch_step = 30 / 527 / epoch = 34 | loss =  1.5467 acc =  0.3600:   3%|▍            | 17/527 [00:00<00:03, 168.73it/s]

**********
Epoch 33 / 100 | loss =  1.5594 acc =  0.3657 | val_loss =  1.6199 val_acc =  0.3133 val_prec = 0.2827 val_rec = 0.2404
**********
EarlyStopping counter: 2 out of 10


batch_step = 527 / 527 / epoch = 34 | loss =  1.5465 acc =  0.3727: 100%|███████████| 527/527 [00:03<00:00, 174.13it/s]
batch_step = 30 / 527 / epoch = 35 | loss =  1.5374 acc =  0.4100:   3%|▍            | 18/527 [00:00<00:02, 173.59it/s]

**********
Epoch 34 / 100 | loss =  1.5465 acc =  0.3727 | val_loss =  1.6288 val_acc =  0.2967 val_prec = 0.2713 val_rec = 0.2405
**********
EarlyStopping counter: 3 out of 10


batch_step = 527 / 527 / epoch = 35 | loss =  1.5422 acc =  0.3769: 100%|███████████| 527/527 [00:03<00:00, 175.21it/s]
batch_step = 29 / 527 / epoch = 36 | loss =  1.5770 acc =  0.3690:   3%|▍            | 17/527 [00:00<00:03, 165.41it/s]

**********
Epoch 35 / 100 | loss =  1.5422 acc =  0.3769 | val_loss =  1.6150 val_acc =  0.3167 val_prec = 0.2911 val_rec = 0.2435
**********
Validation loss decreased (1.619595 --> 1.614986).  Saving model ...


batch_step = 527 / 527 / epoch = 36 | loss =  1.5477 acc =  0.3693: 100%|███████████| 527/527 [00:03<00:00, 175.13it/s]
batch_step = 30 / 527 / epoch = 37 | loss =  1.5520 acc =  0.3567:   3%|▍            | 17/527 [00:00<00:03, 168.76it/s]

**********
Epoch 36 / 100 | loss =  1.5477 acc =  0.3693 | val_loss =  1.6194 val_acc =  0.3033 val_prec = 0.2807 val_rec = 0.2387
**********
EarlyStopping counter: 1 out of 10


batch_step = 527 / 527 / epoch = 37 | loss =  1.5344 acc =  0.3810: 100%|███████████| 527/527 [00:03<00:00, 174.35it/s]
batch_step = 30 / 527 / epoch = 38 | loss =  1.5536 acc =  0.3733:   3%|▍            | 17/527 [00:00<00:03, 168.72it/s]

**********
Epoch 37 / 100 | loss =  1.5344 acc =  0.3810 | val_loss =  1.6165 val_acc =  0.2933 val_prec = 0.2717 val_rec = 0.2295
**********
EarlyStopping counter: 2 out of 10


batch_step = 527 / 527 / epoch = 38 | loss =  1.5251 acc =  0.3943: 100%|███████████| 527/527 [00:03<00:00, 175.45it/s]
batch_step = 31 / 527 / epoch = 39 | loss =  1.4767 acc =  0.4484:   3%|▍            | 17/527 [00:00<00:03, 162.24it/s]

**********
Epoch 38 / 100 | loss =  1.5251 acc =  0.3943 | val_loss =  1.6175 val_acc =  0.2933 val_prec = 0.2731 val_rec = 0.2322
**********
EarlyStopping counter: 3 out of 10


batch_step = 527 / 527 / epoch = 39 | loss =  1.5200 acc =  0.3896: 100%|███████████| 527/527 [00:03<00:00, 175.26it/s]
batch_step = 31 / 527 / epoch = 40 | loss =  1.5511 acc =  0.3645:   3%|▍            | 18/527 [00:00<00:03, 168.64it/s]

**********
Epoch 39 / 100 | loss =  1.5200 acc =  0.3896 | val_loss =  1.6211 val_acc =  0.3133 val_prec = 0.2907 val_rec = 0.2592
**********
EarlyStopping counter: 4 out of 10


batch_step = 527 / 527 / epoch = 40 | loss =  1.5097 acc =  0.3992: 100%|███████████| 527/527 [00:03<00:00, 174.85it/s]
batch_step = 29 / 527 / epoch = 41 | loss =  1.5648 acc =  0.3448:   3%|▍            | 17/527 [00:00<00:03, 165.90it/s]

**********
Epoch 40 / 100 | loss =  1.5097 acc =  0.3992 | val_loss =  1.6148 val_acc =  0.3200 val_prec = 0.2933 val_rec = 0.2524
**********
Validation loss decreased (1.614986 --> 1.614820).  Saving model ...


batch_step = 527 / 527 / epoch = 41 | loss =  1.5048 acc =  0.3926: 100%|███████████| 527/527 [00:03<00:00, 174.42it/s]
batch_step = 30 / 527 / epoch = 42 | loss =  1.4738 acc =  0.3933:   3%|▍            | 17/527 [00:00<00:03, 163.85it/s]

**********
Epoch 41 / 100 | loss =  1.5048 acc =  0.3926 | val_loss =  1.6232 val_acc =  0.3000 val_prec = 0.2696 val_rec = 0.2315
**********
EarlyStopping counter: 1 out of 10


batch_step = 527 / 527 / epoch = 42 | loss =  1.4947 acc =  0.3977: 100%|███████████| 527/527 [00:03<00:00, 174.08it/s]
batch_step = 31 / 527 / epoch = 43 | loss =  1.5265 acc =  0.3968:   3%|▍            | 17/527 [00:00<00:03, 160.81it/s]

**********
Epoch 42 / 100 | loss =  1.4947 acc =  0.3977 | val_loss =  1.6232 val_acc =  0.3033 val_prec = 0.2727 val_rec = 0.2335
**********
EarlyStopping counter: 2 out of 10


batch_step = 527 / 527 / epoch = 43 | loss =  1.4876 acc =  0.3979: 100%|███████████| 527/527 [00:03<00:00, 174.12it/s]
batch_step = 32 / 527 / epoch = 44 | loss =  1.4341 acc =  0.4719:   3%|▍            | 18/527 [00:00<00:02, 171.89it/s]

**********
Epoch 43 / 100 | loss =  1.4876 acc =  0.3979 | val_loss =  1.6238 val_acc =  0.3067 val_prec = 0.2752 val_rec = 0.2376
**********
EarlyStopping counter: 3 out of 10


batch_step = 527 / 527 / epoch = 44 | loss =  1.4755 acc =  0.4121: 100%|███████████| 527/527 [00:03<00:00, 174.89it/s]
batch_step = 30 / 527 / epoch = 45 | loss =  1.4814 acc =  0.3967:   3%|▍            | 17/527 [00:00<00:03, 167.16it/s]

**********
Epoch 44 / 100 | loss =  1.4755 acc =  0.4121 | val_loss =  1.6265 val_acc =  0.3200 val_prec = 0.2915 val_rec = 0.2542
**********
EarlyStopping counter: 4 out of 10


batch_step = 527 / 527 / epoch = 45 | loss =  1.4642 acc =  0.4120: 100%|███████████| 527/527 [00:03<00:00, 173.41it/s]
batch_step = 29 / 527 / epoch = 46 | loss =  1.4358 acc =  0.4345:   3%|▍            | 16/527 [00:00<00:03, 155.71it/s]

**********
Epoch 45 / 100 | loss =  1.4642 acc =  0.4120 | val_loss =  1.6248 val_acc =  0.3267 val_prec = 0.2978 val_rec = 0.2663
**********
EarlyStopping counter: 5 out of 10


batch_step = 527 / 527 / epoch = 46 | loss =  1.4613 acc =  0.4165: 100%|███████████| 527/527 [00:03<00:00, 173.37it/s]
batch_step = 30 / 527 / epoch = 47 | loss =  1.4410 acc =  0.4500:   3%|▍            | 17/527 [00:00<00:03, 166.86it/s]

**********
Epoch 46 / 100 | loss =  1.4613 acc =  0.4165 | val_loss =  1.6298 val_acc =  0.3133 val_prec = 0.2739 val_rec = 0.2418
**********
EarlyStopping counter: 6 out of 10


batch_step = 527 / 527 / epoch = 47 | loss =  1.4653 acc =  0.4205: 100%|███████████| 527/527 [00:03<00:00, 174.90it/s]
batch_step = 32 / 527 / epoch = 48 | loss =  1.3685 acc =  0.4563:   3%|▍            | 17/527 [00:00<00:03, 168.77it/s]

**********
Epoch 47 / 100 | loss =  1.4653 acc =  0.4205 | val_loss =  1.6248 val_acc =  0.3200 val_prec = 0.2926 val_rec = 0.2650
**********
EarlyStopping counter: 7 out of 10


batch_step = 527 / 527 / epoch = 48 | loss =  1.4483 acc =  0.4245: 100%|███████████| 527/527 [00:02<00:00, 175.74it/s]
batch_step = 31 / 527 / epoch = 49 | loss =  1.4350 acc =  0.4194:   3%|▍            | 18/527 [00:00<00:02, 170.98it/s]

**********
Epoch 48 / 100 | loss =  1.4483 acc =  0.4245 | val_loss =  1.6296 val_acc =  0.3267 val_prec = 0.2984 val_rec = 0.2670
**********
EarlyStopping counter: 8 out of 10


batch_step = 527 / 527 / epoch = 49 | loss =  1.4378 acc =  0.4304: 100%|███████████| 527/527 [00:03<00:00, 174.96it/s]
batch_step = 30 / 527 / epoch = 50 | loss =  1.4296 acc =  0.4400:   3%|▍            | 17/527 [00:00<00:03, 167.39it/s]

**********
Epoch 49 / 100 | loss =  1.4378 acc =  0.4304 | val_loss =  1.6385 val_acc =  0.3300 val_prec = 0.3093 val_rec = 0.2781
**********
EarlyStopping counter: 9 out of 10


batch_step = 527 / 527 / epoch = 50 | loss =  1.4270 acc =  0.4406: 100%|███████████| 527/527 [00:03<00:00, 175.24it/s]


**********
Epoch 50 / 100 | loss =  1.4270 acc =  0.4406 | val_loss =  1.6307 val_acc =  0.3400 val_prec = 0.3049 val_rec = 0.2725
**********
EarlyStopping counter: 10 out of 10
Early stopping


---

In [24]:
f = open(DIRECTORY / Path("parameters.txt"), "w")
f.write(f"{DROPSIZE = }\n" + \
        f"{EMBED_DIM = }\n" + \
        f"{NUMBER_CLASSES = }\n" + \
        f"{lr = }\n"
       )
f.close()

---

### Evaluación

In [25]:
val_loss, val_acc, val_prec, val_rec = evaluate(best_model, CRITERION, VAL_ITER)
f"{val_loss = :0.4f} {val_acc = :0.4f} {val_prec = :0.4f} {val_rec = :0.4f}"

'val_loss = 1.6977 val_acc = 0.2956 val_prec = 0.2330 val_rec = 0.2236'

In [26]:
val_loss, val_acc, val_prec, val_rec = evaluate(best_model, CRITERION, TEST_ITER)
f"{val_loss = :0.4f} {val_acc = :0.4f} {val_prec = :0.4f} {val_rec = :0.4f}"

'val_loss = 1.7128 val_acc = 0.2589 val_prec = 0.2046 val_rec = 0.2204'

<hr style="height: 4px">

## Resultados

<style>
table {float:left}
</style>

### Con 3 clases

Se ha entrenado el *dataset* agrupado y **balanceado en 3 categorías** (positivo con P y P+, neutro con NEU y NONE, y negativo con N y N+).
Los siguientes resultados se han obtenido recorriendo el *dataset* en ***batches* de tamaño 10**, y con **una *patience* a 10 del *early stopping***. 

El tamaño de *batches* es el máximo que permite la memoria de la GPU donde se entrenó la red.

#### Entrenamiento

<table>
<thead>
  <tr>
    <th>Dropout</th>
    <th>Embedding</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="2"><b>0.1</b></td>
    <td><b>Loss</b></td>
    <td>0.8802</td>
    <td>0.8702</td>
    <td>0.9284</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5995</td>
    <td>0.6027</td>
    <td>0.5596</td>
  </tr>
  <tr>
    <td rowspan="2"><b>0.2</b></td>
    <td><b>Loss</b></td>
    <td>0.8665</td>
    <td>0.8787</td>
    <td>0.8703</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.6111</td>
    <td>0.6004</td>
    <td>0.6006</td>
  </tr>
</tbody>
</table>

#### Validación

<table>
<thead>
  <tr>
    <th>Dropout</th>
    <th>Embedding</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>0.1</b></td>
    <td><b>Loss</b></td>
    <td>0.9884</td>
    <td>0.9818</td>
    <td>1.0013</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5174</td>
    <td>0.5302</td>
    <td>0.5211</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4855</td>
    <td>0.5046</td>
    <td>0.4964</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.5045</td>
    <td>0.5110</td>
    <td>0.5044</td>
  </tr>
  <tr>
    <td rowspan="4"><b>0.2</b></td>
    <td><b>Loss</b></td>
    <td>0.9812</td>
    <td>0.9684</td>
    <td>0.9940</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5174</td>
    <td>0.5340</td>
    <td>0.5384</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4892</td>
    <td>0.5193</td>
    <td>0.5148</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.5102</td>
    <td>0.5153</td>
    <td>0.5243</td>
  </tr>
</tbody>
</table>

#### Test

<table>
<thead>
  <tr>
    <th>Dropout</th>
    <th>Embedding</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>0.1</b></td>
    <td><b>Loss</b></td>
    <td>1.0188</td>
    <td>1.0103</td>
    <td>1.0125</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5130</td>
    <td>0.4963</td>
    <td>0.5009</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4911</td>
    <td>0.4607</td>
    <td>0.4874</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.4979</td>
    <td>0.4684</td>
    <td>0.4853</td>
  </tr>
  <tr>
    <td rowspan="4"><b>0.2</b></td>
    <td><b>Loss</b></td>
    <td>1.0250</td>
    <td>1.0193</td>
    <td>1.0210</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.5093</td>
    <td>0.5037</td>
    <td>0.4815</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.4941</td>
    <td>0.4697</td>
    <td>0.4637</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.5014</td>
    <td>0.4758</td>
    <td>0.4644</td>
  </tr>
</tbody>
</table>

### Con 6 clases

Se ha entrenado el *dataset* agrupado y **con las 6 categorías**.
Los siguientes resultados se han obtenido recorriendo el *dataset* en ***batches* de tamaño 10**, y con **una *patience* a 10 del *early stopping***. 

El tamaño de *batches* es el máximo que permite la memoria de la GPU donde se entrenó la red.

#### Entrenamiento

<table>
<thead>
  <tr>
    <th>Dropout</th>
    <th>Embedding</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="2"><b>0.1</b></td>
    <td><b>Loss</b></td>
    <td>1.5451</td>
    <td>1.4947</td>
    <td>1.5269</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.3767</td>
    <td>0.4080</td>
    <td>0.3915</td>
  </tr>
  <tr>
    <td rowspan="2"><b>0.2</b></td>
    <td><b>Loss</b></td>
    <td>1.6179</td>
    <td>1.5680</td>
    <td>1.5097</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.3353</td>
    <td>0.3600</td>
    <td>0.3992</td>
  </tr>
</tbody>
</table>

#### Validation

<table>
<thead>
  <tr>
    <th>Dropout</th>
    <th>Embedding</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>0.1</b></td>
    <td><b>Loss</b></td>
    <td>1.7536</td>
    <td>1.6718</td>
    <td>1.6179</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.2833</td>
    <td>0.3322</td>
    <td>0.3067</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.2507</td>
    <td>0.2780</td>
    <td>0.2792</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.2338</td>
    <td>0.2582</td>
    <td>0.2367</td>
  </tr>
  <tr>
    <td rowspan="4"><b>0.2</b></td>
    <td><b>Loss</b></td>
    <td>1.7469</td>
    <td>1.6689</td>
    <td>1.6148</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.2522</td>
    <td>0.3389</td>
    <td>0.3200</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.2075</td>
    <td>0.2825</td>
    <td>0.2933</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.1968</td>
    <td>0.2664</td>
    <td>0.2524</td>
  </tr>
</tbody>
</table>

#### Test

<table>
<thead>
  <tr>
    <th>Dropout</th>
    <th>Embedding</th>
    <th>8</th>
    <th>16</th>
    <th>32</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td rowspan="4"><b>0.1</b></td>
    <td><b>Loss</b></td>
    <td>1.7554</td>
    <td>1.7520</td>
    <td>1.7185</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.2489</td>
    <td>0.2967</td>
    <td>0.2678</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.2007</td>
    <td>0.2492</td>
    <td>0.2152</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.2048</td>
    <td>0.2537</td>
    <td>0.2120</td>
  </tr>
  <tr>
    <td rowspan="4"><b>0.2</b></td>
    <td><b>Loss</b></td>
    <td>1.7487</td>
    <td>1.7579</td>
    <td>1.7128</td>
  </tr>
  <tr>
    <td><b>Accuracy</b></td>
    <td>0.2489</td>
    <td>0.2800</td>
    <td>0.2589</td>
  </tr>
  <tr>
    <td><b>Precision</b></td>
    <td>0.2015</td>
    <td>0.2348</td>
    <td>0.2046</td>
  </tr>
  <tr>
    <td><b>Recall</b></td>
    <td>0.1851</td>
    <td>0.2444</td>
    <td>0.2204</td>
  </tr>
</tbody>
</table>