# Data Loading

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EDTWlyRw-RnVIaHQu31Tom4UZWvJcbwZ?usp=sharing)

In [3]:
!wget --no-check-certificate 'https://drive.google.com/u/1/uc?id=13TdNgvAcccAFIW0V1xFIxR4WjUeAzmqu&export=download' -O for_bert.zip

--2020-09-22 11:29:20--  https://drive.google.com/u/1/uc?id=13TdNgvAcccAFIW0V1xFIxR4WjUeAzmqu&export=download
Resolving drive.google.com (drive.google.com)... 173.194.76.102, 173.194.76.101, 173.194.76.139, ...
Connecting to drive.google.com (drive.google.com)|173.194.76.102|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0g-9c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/5n688rpc1sr9o3190lhef1n909siu895/1600774125000/06490012421496691278/*/13TdNgvAcccAFIW0V1xFIxR4WjUeAzmqu?e=download [following]
--2020-09-22 11:29:21--  https://doc-0g-9c-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/5n688rpc1sr9o3190lhef1n909siu895/1600774125000/06490012421496691278/*/13TdNgvAcccAFIW0V1xFIxR4WjUeAzmqu?e=download
Resolving doc-0g-9c-docs.googleusercontent.com (doc-0g-9c-docs.googleusercontent.com)... 74.125.133.132, 2a00:1450:400c:c07::84
Connecting to doc-0g-9c-docs.googleusercontent.com (do

In [4]:
!unzip for_bert.zip

Archive:  for_bert.zip
  inflating: data/data.csv           
  inflating: data/held-out-data.csv  


# Imports and Preprocessing

In [6]:
!pip install -qq transformers


In [7]:
from transformers import BertTokenizer, AdamW, get_linear_schedule_with_warmup, BertForSequenceClassification
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
import pandas as pd
from collections import defaultdict
import numpy as np
from sklearn.metrics import classification_report, f1_score
from tqdm import trange, notebook

In [8]:
required_columns = ['object_a', 'object_b', 'sentence', 'most_frequent_label']
df_train = pd.read_csv('data/data.csv', usecols=required_columns)
df_test = pd.read_csv('data/held-out-data.csv', usecols=required_columns)
df_train.head()

| | object_a | object_b | sentence | most_frequent_label |
|---|---|---|---|---|
| 0 | Dell | Intel | Frankly, like a procession of formerly-consist... | NONE |
| 1 | mit | harvard | mit, harvard | NONE |
| 2 | Microsoft | Sony | It also didn't hurt my faith in Microsoft when... | BETTER |
| 3 | Ethernet | Bluetooth | You can probably find the Onkyo TX-NR717 for m... | NONE |
| 4 | Toyota | Ford | I liked the Toyota over the Ford and it looks ... | BETTER |


In [3]:
label_list = ["BETTER", "WORSE", "NONE"]
num_labels = len(label_list)
label2idx = {}
idx2label = {}
for i, label in enumerate(label_list):
  label2idx[label] = i
  idx2label[i] = label

In [4]:
df_train_bert = pd.DataFrame()
df_train_bert['sequences_1'] = df_train['sentence']
df_train_bert['sequences_2'] = df_train['object_a'] + " " + df_train['object_b']
df_train_bert['label'] = df_train['most_frequent_label'].replace(label2idx)

df_test_bert = pd.DataFrame()
df_test_bert['sequences_1'] = df_test['sentence']
df_test_bert['sequences_2'] = df_test['object_a'] + " " + df_test['object_b']
df_test_bert['label'] = df_test['most_frequent_label'].replace(label2idx)


df_train_bert.head()

| | sequences_1 | sequences_2 | label |
|---|---|---|---|
| 0 | Frankly, like a procession of formerly-consist... | Dell Intel | 2 |
| 1 | mit, harvard | mit harvard | 2 |
| 2 | It also didn't hurt my faith in Microsoft when... | Microsoft Sony | 0 |
| 3 | You can probably find the Onkyo TX-NR717 for m... | Ethernet Bluetooth | 2 |
| 4 | I liked the Toyota over the Ford and it looks ... | Toyota Ford | 0 |


We use cased BERT because capitalization can signal the strength of a comparison (for example, "This laptop is MUCH BETTER than another").

In [5]:
PRE_TRAINED_MODEL_NAME = 'bert-base-cased'
tokenizer = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)

In [6]:
lengths = []

for row in df_train.iterrows():
    example = row[1]
    tokens_a = tokenizer.tokenize(example["sentence"])
    tokens_b = tokenizer.tokenize(example["object_a"] + " " + example["object_b"])
    total_length = len(tokens_a) + len(tokens_b) + 3
    lengths.append(total_length)

len_ser = pd.Series(lengths)
len_ser.quantile([.9, .95, .96, .97, .98, .99, 1.0])

0.90     61.20
0.95     78.00
0.96     85.00
0.97     91.00
0.98    101.84
0.99    117.00
1.00    226.00
dtype: float64

The quantiles show that 95% of the training examples fit within 78 BERT tokens (counting the three special tokens), so we use 78 as the maximum sequence length. Longer examples are truncated.
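The trade-off behind this choice can be checked directly: for any candidate maximum length, compute the share of examples that fit without truncation. The sketch below is a minimal illustration; `coverage_at` and the `example_lengths` values are hypothetical and are not taken from the dataset.

```python
def coverage_at(lengths, max_len):
    """Return the fraction of examples whose token count is <= max_len."""
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


# Hypothetical token counts, for illustration only (not the real dataset).
example_lengths = [12, 30, 45, 61, 78, 79, 101, 226]

for candidate in (61, 78, 117):
    print(candidate, coverage_at(example_lengths, candidate))
```

In the notebook, the same function could be applied to the `lengths` list computed above to see how many examples a given `MAX_SEQ_LEN` would truncate.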

In [7]:
MAX_SEQ_LEN=78

For this task we use BERT in sequence-pair classification mode. This requires BERT-specific tokenization that produces three aligned sequences: `input_ids` (token ids), `token_type_ids` (segment ids, which mark which of the two sequences each token belongs to), and an attention mask, which distinguishes real tokens from padding.
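To make the layout concrete, here is a minimal pure-Python sketch of how a sentence pair is arranged. It works on already-tokenized strings and does not call the tokenizer; `build_pair_features` is a hypothetical helper written for illustration, and it assumes the pair fits in `max_len` (no truncation).

```python
def build_pair_features(tokens_a, tokens_b, max_len):
    """Arrange a tokenized sentence pair as [CLS] A [SEP] B [SEP] plus padding.

    Returns (tokens, segment_ids, attention_mask), each of length max_len.
    Assumes the pair plus three special tokens fits in max_len.
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("pair is longer than max_len; truncate first")
    # Segment 0: [CLS], first sequence, first [SEP]. Segment 1: second sequence, final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # The mask is 1 for real tokens and 0 for padding.
    attention_mask = [1] * len(tokens)
    pad = max_len - len(tokens)
    tokens = tokens + ["[PAD]"] * pad
    segment_ids = segment_ids + [0] * pad  # padding positions get segment id 0
    attention_mask = attention_mask + [0] * pad
    return tokens, segment_ids, attention_mask


tokens, segment_ids, attention_mask = build_pair_features(
    ["I", "liked", "the", "Toyota"], ["Toyota", "Ford"], max_len=10
)
print(tokens)
print(segment_ids)
print(attention_mask)
```

The tokenizer additionally maps each token to its vocabulary id and splits words into word pieces, which this sketch leaves out.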

In [8]:
encoding = tokenizer.encode_plus(
  text=df_train['sentence'][0],
  text_pair="Dell Intel",
  truncation=True,  # Truncate explicitly to avoid the tokenizer warning
  max_length=MAX_SEQ_LEN,
  add_special_tokens=True,  # Add '[CLS]' and '[SEP]'
  return_token_type_ids=True,
  padding='max_length',  # Pad to MAX_SEQ_LEN
  return_attention_mask=True,
  return_tensors='pt',  # Return PyTorch tensors
)
print(tokenizer.convert_ids_to_tokens(encoding['input_ids'][0]))
encoding


['[CLS]', 'Frank', '##ly', ',', 'like', 'a', 'procession', 'of', 'formerly', '-', 'consistently', 'superior', 'performing', 'technology', 'icons', 'before', 'it', ',', 'including', 'J', '##DS', 'Un', '##ip', '##has', '##e', ',', 'Co', '##gni', '##zan', '##t', 'Technologies', ',', 'Dell', ',', 'Microsoft', ',', 'and', 'Intel', ',', 'C', '##isco', 'has', 'been', 'a', 'total', 'return', 'has', '-', 'been', 'for', 'quite', 'some', 'time', '.', '[SEP]', 'Dell', 'Intel', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']




{'input_ids': tensor([[  101,  2748,  1193,   117,  1176,   170, 16018,  1104,  3147,   118,
         10887,  7298,  4072,  2815, 22493,  1196,  1122,   117,  1259,   147,
         13675, 12118,  9717, 16481,  1162,   117,  3291, 22152, 14883,  1204,
         14164,   117, 18451,   117,  6998,   117,  1105, 15397,   117,   140,
         21097,  1144,  1151,   170,  1703,  1862,  1144,   118,  1151,  1111,
          2385,  1199,  1159,   119,   102, 18451, 15397,   102,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

In [9]:
class ComparativeDataset(Dataset):
  def __init__(self, sequences_1, sequences_2, labels, tokenizer, max_len):
    self.sequences_1 = sequences_1
    self.sequences_2 = sequences_2
    self.labels = labels
    self.tokenizer = tokenizer
    self.max_len = max_len
  def __len__(self):
    return len(self.sequences_1)
  def __getitem__(self, item):
    sequence_1 = str(self.sequences_1[item])
    sequence_2 = str(self.sequences_2[item])
    label = self.labels[item]
    encoding = self.tokenizer.encode_plus(
      text=sequence_1,
      text_pair=sequence_2,
      truncation=True,
      max_length=self.max_len,  # Use the configured length instead of a hard-coded 78
      add_special_tokens=True,  # Add '[CLS]' and '[SEP]'
      return_token_type_ids=True,  # Segment ids
      padding='max_length',  # Pad to max_len
      return_attention_mask=True,
      return_tensors='pt',  # Return PyTorch tensors
    )
    return {
      'review_text': sequence_1,
      'input_ids': encoding['input_ids'].flatten(),
      'segment_ids': encoding['token_type_ids'].flatten(),
      'attention_mask': encoding['attention_mask'].flatten(),
      'label': torch.tensor(label, dtype=torch.long)
    }

In [10]:
def create_data_loader(df, tokenizer, max_len, batch_size):
  dataset = ComparativeDataset(
    sequences_1=df.sequences_1.to_numpy(),
    sequences_2=df.sequences_2.to_numpy(),
    labels=df.label.to_numpy(),
    tokenizer=tokenizer,
    max_len=max_len
  )
  return DataLoader(
    dataset,
    batch_size=batch_size,
    num_workers=4
  )
BATCH_SIZE = 16
train_data_loader = create_data_loader(df_train_bert, tokenizer, MAX_SEQ_LEN, BATCH_SIZE)
test_data_loader = create_data_loader(df_test_bert, tokenizer, MAX_SEQ_LEN, BATCH_SIZE)

In [11]:
data = next(iter(train_data_loader))

print(data['input_ids'].shape)
print(data['attention_mask'].shape)
print(data['label'].shape)

torch.Size([16, 78])
torch.Size([16, 78])
torch.Size([16])


# BERT

Hyperparameters were chosen following the paper "On the Stability of Fine-tuning BERT" (https://www.lsv.uni-saarland.de/wp-content/publications/2020/On_the_Stability_of_Fine-tuning_BERT_preprint.pdf).

In [12]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertForSequenceClassification.from_pretrained(PRE_TRAINED_MODEL_NAME, num_labels=len(label_list), return_dict=True)
model.to(device)
print("Done")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

Done


In [13]:
EPOCHS = 20
optimizer = AdamW(model.parameters(), lr=1e-5, correct_bias=True)
total_steps = len(train_data_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
  optimizer,
  num_warmup_steps=0,
  num_training_steps=total_steps
)
loss_fn = nn.CrossEntropyLoss().to(device)
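A linear schedule with warmup multiplies the base learning rate by a factor that rises linearly during warmup and then decays linearly to zero. The sketch below reimplements that factor in plain Python as an illustration of the idea. `linear_schedule_factor` is a hypothetical name, not the library function, and the figure of 360 batches per epoch is an assumption made for the example.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


BASE_LR = 1e-5
TOTAL_STEPS = 360 * 20  # assumed batches per epoch * EPOCHS

for step in (0, 3600, 7200):
    print(step, BASE_LR * linear_schedule_factor(step, 0, TOTAL_STEPS))
```

With `num_warmup_steps=0`, as configured in this notebook, training starts at the full learning rate and the rate falls to half by the midpoint of training.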

In [19]:
def train_epoch(
  model,
  data_loader,
  loss_fn,
  optimizer,
  device,
  scheduler,
  n_examples
):
  model = model.train()
  losses = []
  predictions = []
  correct_predictions = 0
  for batch in notebook.tqdm(data_loader, desc='Iteration'):
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    token_type_ids = batch['segment_ids'].to(device)
    labels = batch["label"].to(device)
    outputs = model(
      input_ids=input_ids,
      token_type_ids = token_type_ids,
      attention_mask=attention_mask
    )
    _, preds = torch.max(outputs.logits, dim=1)
    if len(predictions) == 0:
      predictions.append(preds.detach().cpu().numpy())
    else:
      predictions[0] = np.append(
        predictions[0], preds.detach().cpu().numpy(), axis=0)
    loss = loss_fn(outputs.logits, labels)
    correct_predictions += torch.sum(preds == labels)
    losses.append(loss.item())
    loss.backward()
#    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
  predictions = predictions[0]
  return correct_predictions.double() / n_examples, np.mean(losses), predictions

In [20]:
def eval_model(model, data_loader, loss_fn, device, n_examples):
  model = model.eval()
  losses = []
  predictions = []
  correct_predictions = 0
  with torch.no_grad():
    for batch in notebook.tqdm(data_loader, desc="Iteration"):
      input_ids = batch["input_ids"].to(device)
      attention_mask = batch["attention_mask"].to(device)
      labels = batch["label"].to(device)
      token_type_ids = batch['segment_ids'].to(device)
      outputs = model(
        input_ids=input_ids,
        token_type_ids = token_type_ids,
        attention_mask=attention_mask
      )
      _, preds = torch.max(outputs.logits, dim=1)
      if len(predictions) == 0:
        predictions.append(preds.detach().cpu().numpy())
      else:
        predictions[0] = np.append(
          predictions[0], preds.detach().cpu().numpy(), axis=0)
      loss = loss_fn(outputs.logits, labels)
      correct_predictions += torch.sum(preds == labels)
      losses.append(loss.item())
  predictions = predictions[0]
  return correct_predictions.double() / n_examples, np.mean(losses), predictions

In [23]:
history = defaultdict(list)
best_accuracy = 0
best_macro_f = 0  # Best macro-averaged F1 seen so far on the held-out set
for epoch in trange(EPOCHS, desc="Epoch"):
  train_acc, train_loss, predictions = train_epoch(
    model,
    train_data_loader,
    loss_fn,
    optimizer,
    device,
    scheduler,
    len(df_train)
  )
  train_macro_f_score = f1_score(df_train_bert['label'].to_numpy(), predictions, average='macro')
  print(f'Train loss {train_loss} accuracy {train_acc} macro_f1_score {train_macro_f_score}')
  val_acc, val_loss, predictions = eval_model(
    model,
    test_data_loader,
    loss_fn,
    device,
    len(df_test)
  )
  test_macro_f_score = f1_score(df_test_bert['label'].to_numpy(), predictions, average='macro')
  print(f'Val   loss {val_loss} accuracy {val_acc} val_macro_f1_score {test_macro_f_score}')
  print()
  history['train_acc'].append(train_acc)
  history['train_loss'].append(train_loss)
  history['val_acc'].append(val_acc)
  history['val_loss'].append(val_loss)
  history['train_macro_f'].append(train_macro_f_score)
  history['test_macro_f'].append(test_macro_f_score)
  # Keep the checkpoint with the best macro F1 on the held-out set
  if test_macro_f_score > best_macro_f:
    print(classification_report(df_test_bert['label'].to_numpy(), predictions, target_names=label_list, digits=3))
    torch.save(model.state_dict(), 'best_model_state.bin')
    best_macro_f = test_macro_f_score
    best_accuracy = val_acc




Train loss 0.04209577786483957 accuracy 0.9878451119986108 macro_f1_score 0.9754032448778934

Val   loss 0.5090154979830711 accuracy 0.8777777777777778 val_macro_f1_score 0.7816881258941345

              precision    recall  f1-score   support

      BETTER      0.794     0.806     0.800       273
       WORSE      0.632     0.605     0.618       119
        NONE      0.927     0.927     0.927      1048

    accuracy                          0.878      1440
   macro avg      0.784     0.779     0.782      1440
weighted avg      0.877     0.878     0.877      1440

Train loss 0.03667911846795404 accuracy 0.9902760895988886 macro_f1_score 0.9804800578622467

Val   loss 0.5453328105808598 accuracy 0.8694444444444445 val_macro_f1_score 0.7721560366256085

Train loss 0.023543923547873016 accuracy 0.9934016322278173 macro_f1_score 0.9864554024792597

Val   loss 0.5403783120480108 accuracy 0.8777777777777778 val_macro_f1_score 0.7806145263049019

Train loss 0.016658253906320572 accuracy 0.995311686056607 macro_f1_score 0.9908957135382749

Val   loss 0.5651591517723217 accuracy 0.8805555555555555 val_macro_f1_score 0.7872171327783354

              precision    recall  f1-score   support

      BETTER      0.795     0.810     0.802       273
       WORSE      0.611     0.647     0.629       119
        NONE      0.936     0.926     0.931      1048

    accuracy                          0.881      1440
   macro avg      0.781     0.794     0.787      1440
weighted avg      0.883     0.881     0.882      1440

Train loss 0.013663254598618045 accuracy 0.995311686056607 macro_f1_score 0.9899220882580133

Val   loss 0.6081294461324837 accuracy 0.875 val_macro_f1_score 0.776397974021607

Train loss 0.008600892878813384 accuracy 0.9979163049140475 macro_f1_score 0.9954501113670772

Val   loss 0.6351205138622188 accuracy 0.8729166666666667 val_macro_f1_score 0.7733215378900534

Train loss 0.0113198569008091 accuracy 0.9965271748567459 macro_f1_score 0.9945272270958196

Val   loss 0.6221615261918891 accuracy 0.8756944444444444 val_macro_f1_score 0.7678606101136625

Train loss 0.0074679019324119305 accuracy 0.9980899461712102 macro_f1_score 0.9959474480634278

Val   loss 0.6281113394455234 accuracy 0.8798611111111111 val_macro_f1_score 0.7814462662877969

Train loss 0.006509325276824206 accuracy 0.9986108699426983 macro_f1_score 0.9968139448175585

Val   loss 0.6514436412098197 accuracy 0.8819444444444445 val_macro_f1_score 0.7907247163686133

              precision    recall  f1-score   support

      BETTER      0.771     0.839     0.804       273
       WORSE      0.626     0.647     0.636       119
        NONE      0.945     0.920     0.932      1048

    accuracy                          0.882      1440
   macro avg      0.781     0.802     0.791      1440
weighted avg      0.886     0.882     0.883      1440

Train loss 0.003971954518783604 accuracy 0.9994790762285118 macro_f1_score 0.9987974585881458

Val   loss 0.6539375089689403 accuracy 0.8833333333333334 val_macro_f1_score 0.7891413654896944

Train loss 0.003337763841611579 accuracy 0.9993054349713492 macro_f1_score 0.9989182785081527

Val   loss 0.6713420505077617 accuracy 0.8840277777777779 val_macro_f1_score 0.7900173705875542

Train loss 0.0032877545554785884 accuracy 0.9994790762285118 macro_f1_score 0.9992239908337099

Val   loss 0.6742817142761649 accuracy 0.8798611111111111 val_macro_f1_score 0.7866633842203191

Train loss 0.0033762031264308865 accuracy 0.998784511199861 macro_f1_score 0.9976602403649234

Val   loss 0.67078204359358 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.003670696967835991 accuracy 0.9989581524570237 macro_f1_score 0.9980246411938406

Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.0026215957703243477 accuracy 0.9993054349713492 macro_f1_score 0.9984068703117045

Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.0026919474663979295 accuracy 0.9994790762285118 macro_f1_score 0.999026178378112

Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.0024911736364123904 accuracy 0.9994790762285118 macro_f1_score 0.9992230255592096

Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.0013432809381710184 accuracy 0.9999999999999999 macro_f1_score 1.0

Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.003551163971537284 accuracy 0.9993054349713492 macro_f1_score 0.9990313502091016

Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435

Train loss 0.0020949217016701977 accuracy 0.9996527174856745 macro_f1_score 0.9996147299282311




Epoch: 100%|██████████| 20/20 [19:36<00:00, 58.82s/it]


Val   loss 0.6708672442725704 accuracy 0.8819444444444445 val_macro_f1_score 0.787196558150435
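The macro-averaged F1 used for model selection is the unweighted mean of the per-class F1 scores, so the small WORSE class counts as much as the large NONE class. As a rough check, the sketch below recomputes it from the rounded precision and recall values in the last classification report printed above. The helper `f1_from` is hypothetical, and because the inputs are rounded to three digits the result matches the reported score only approximately.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the last classification report.
per_class = {
    "BETTER": (0.771, 0.839),
    "WORSE": (0.626, 0.647),
    "NONE": (0.945, 0.920),
}

f1_scores = {label: f1_from(p, r) for label, (p, r) in per_class.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for label, score in f1_scores.items():
    print(f"{label}: {score:.3f}")
print(f"macro F1: {macro_f1:.3f}")
```

Accuracy on the same predictions is about 0.88 because NONE dominates the held-out set, which is why macro F1 is the more informative selection criterion here.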




