<a href="https://colab.research.google.com/github/andrewdge/CSE354-Final-Project/blob/doc_agg/CSE354_Final_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Document Level Predictions Using Paragraph Level Sentimient

Using the PerSent dataset, we will be training our model based on paragraph-level sentiments. These will be aggregated to produce document-level sentiments.

In [2]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.18.0-py3-none-any.whl (4.0 MB)
[K     |████████████████████████████████| 4.0 MB 27.1 MB/s 
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
[K     |████████████████████████████████| 6.6 MB 54.7 MB/s 
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
[K     |████████████████████████████████| 596 kB 61.1 MB/s 
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.6.0-py3-none-any.whl (84 kB)
[K     |████████████████████████████████| 84 kB 4.5 MB/s 
[?25hCollecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
[K     |████████████████████████████████| 880 kB 72.4 MB/s 
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... [?25l[?25hdone
  Created wheel for sacr

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
import torch
import math
import pandas as pd
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, TensorDataset, DataLoader, random_split
from torch.nn.utils.rnn import pad_sequence
from tqdm import tqdm
import numpy as np
import os
from sklearn.metrics import precision_score, recall_score, f1_score
from transformers import DistilBertModel, DistilBertConfig, DistilBertTokenizer, AdamW, DistilBertForSequenceClassification
from collections import Counter
torch.manual_seed(42)
np.random.seed(42)

# Constants

Constants we will use in our experiments. These may be subjected to change as hyperparameters

In [5]:
DISTILBERT_DROPOUT = 0.2
DISTILBERT_ATT_DROPOUT = 0.2
BATCH_SIZE = 16
EPOCHS = 30

# Andrew PATH
TEST_PATH = '/content/drive/MyDrive/CSE354/random_test.csv'
TEST_PATH_2 = '/content/drive/MyDrive/CSE354/fixed_test.csv'
TRAIN_PATH = '/content/drive/MyDrive/CSE354/train.csv'
VAL_PATH = '/content/drive/MyDrive/CSE354/dev.csv'
SAVE_PATH = '/content/drive/MyDrive/CSE354/models'

test_data = pd.read_csv(TEST_PATH)
test_data_2 = pd.read_csv(TEST_PATH_2)
train_data = pd.read_csv(TRAIN_PATH)
# fixed_test will be used for validation
val_data = pd.read_csv(VAL_PATH)



# Initializing Our Model

Here is where we set up our DistilBERT model.

In [6]:
class DistillBERT():
  def __init__(self, model_name='distilbert-base-uncased'):
    # TODO(students): start
    self.tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    config = DistilBertConfig(dropout=DISTILBERT_DROPOUT, 
                          attention_dropout=DISTILBERT_ATT_DROPOUT, 
                          output_hidden_states=True)
    self.model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=3)
    # TODO(students): end
  def get_tokenizer_and_model(self):
    return self.model, self.tokenizer

# DataLoader

This class handles loading, preprocessing, and tokenizing the data.

Each row in the dataframe contains text with some number of paragraphs, as well as a number as labels per paragraph. We add another column in the dataframe, paragraphs per document. This will be used later to test our predictions as compare paragraph-level predictions to paragraph labels, as well as document-level predictions to document labels. We also remove data without paragraph-level labels.

For the labels create new columns for each.

This format is largely takes inspiration from Assignment 3.

In [14]:
class DatasetLoader(Dataset):
  def __init__(self, data, tokenizer):
    # Data is the uncleaned data, as a dataframe.
    self.data = data
    self.tokenizer = tokenizer

  def preprocess_data(self):
    # Combine labels into list.
    df = self.data
    df = df[df['Paragraph0'].notna()]
    cols = df.iloc[:, 6:].columns
    df['Paragraph Labels'] = [[x for x in ary if x==x] for ary in df.iloc[:,6:].to_records(index=False)]
    
    self.data = df

  def tokenize_data(self, mode="paragraph"):
    # Tokenizing
    tokens = []
    labels = []
    label_dict = {'Negative': 0,
                  'Neutral': 1,
                  'Positive': 2}
    document_list = self.data['DOCUMENT']

    # Tokenizes documents by paragraphs. Tokens = paragraph, labels = paragraph sentiment
    if mode == "paragraph":
      label_list = self.data['Paragraph Labels']
      for (document, doc_labels) in tqdm(zip(document_list, label_list), total=len(document_list)):
        paragraphs = document.split('\n')
        for paragraph, label in zip(paragraphs, doc_labels):
          encoding = self.tokenizer(text=paragraph, truncation='longest_first', max_length=512, return_tensors='pt')
          labels.append(label_dict[label])
          tokens.append(encoding.input_ids[0]) # Might need to CUDA

    # Tokenizes documents by document. Tokens = document, labels = document sentiment
    if mode == "document":
      label_list = self.data['TRUE_SENTIMENT']
      for (document, true_label) in tqdm(zip(document_list, label_list), total=len(document_list)):
        encoding = self.tokenizer(text=document, truncation='longest_first', max_length=512, return_tensors='pt')
        labels.append(label_dict[true_label])
        tokens.append(encoding.input_ids[0]) # Might need to CUDA

    tokens = pad_sequence(tokens, batch_first=True)
    labels = torch.tensor(labels)
    labels.to("cuda:0" if torch.cuda.is_available() else "cpu")
    tokens.to("cuda:0" if torch.cuda.is_available() else "cpu")
    dataset = TensorDataset(tokens, labels)
    return dataset

  def get_data_loaders(self, mode, shuffle=True):
    self.preprocess_data()
    processed_dataset = self.tokenize_data(mode=mode)
    data_loader = DataLoader(
        processed_dataset,
        shuffle=shuffle,
        batch_size=BATCH_SIZE
    )
    return data_loader

In [18]:
class Trainer():

  def __init__(self, args):
    self.train_data = args['train_data']
    self.val_data = args['val_data']
    self.batch_size = args['batch_size']
    self.epochs = args['epochs']
    self.save_path = args['save_path']
    self.training_type = args['training_type']
    self.device = args['device']
    self.model_type = args['model']
    transformer = DistillBERT()
    self.model, self.tokenizer = transformer.get_tokenizer_and_model()
    self.model.to(self.device)
    self.val_preds = []
    self.train_preds = []

  def get_performance_metrics(self, preds, labels, mode=None):
    pred_flat = preds
    # First portion from training preds, second from val preds
    if mode == "training":
      pred_flat = np.argmax(preds, axis=1).flatten()
      self.train_preds.extend(pred_flat)
    if mode == "val":
      pred_flat = np.argmax(preds, axis=1).flatten()
      self.val_preds.extend(pred_flat)
    labels_flat = labels.flatten()
    precision = precision_score(labels_flat, pred_flat, zero_division=0, average='macro')
    recall = recall_score(labels_flat, pred_flat, zero_division=0, average='macro')
    f1 = f1_score(labels_flat, pred_flat, zero_division=0, average='macro')
    return precision, recall, f1

  def set_training_parameters(self):
    
    if self.training_type == 'frozen_layers':
      for param in self.model.base_model.parameters():
        param.requires_grad = False
    elif self.training_type == 'all_training':
      pass
    elif self.training_type == "top_2_training":
      print("2")
      for name, param in self.model.named_parameters():
        if "transformer" or "embeddings" in name:
          print(name)
          param.requires_grad = False
          if "layer.4" or "layer.5" in name:
            param.requires_grad = True
        else:
          param.requires_grad = True

  def train(self, data_loader, optimizer):
    self.model.train()
    total_recall = 0
    total_precision = 0
    total_f1 = 0
    total_loss = 0

    for batch_idx, (reviews, labels) in enumerate(tqdm(data_loader)):
      self.model.zero_grad()
      # TODO(students): start
      output = self.model(reviews.to(self.device), labels=labels.to(self.device)) 
      loss = output.loss
      logits = output.logits
      with torch.no_grad():
        precision, recall, f1 = self.get_performance_metrics(logits.cpu(), labels.cpu(), mode='training')
      loss.backward()
      optimizer.step()
      total_loss += loss
      total_recall += recall
      total_precision += precision
      total_f1 += f1
      # TODO(students): end
    precision = total_precision/len(data_loader)
    recall = total_recall/len(data_loader)
    f1 = total_f1/len(data_loader)
    loss = total_loss/len(data_loader)

    return precision, recall, f1, loss

  def eval(self, data_loader):
    self.model.eval()
    total_recall = 0
    total_precision = 0
    total_f1 = 0
    total_loss = 0

    with torch.no_grad():
      for (reviews, labels) in tqdm(data_loader):
        # TODO(students): start
        output = self.model(reviews.to(self.device), labels=labels.to(self.device)) 
        prec, rec, f1 = self.get_performance_metrics(output.logits.cpu(), labels.cpu(), mode='val')
        total_recall += rec
        total_precision += prec
        total_f1 += f1
        total_loss += output.loss
        # TODO(students): end
    
    precision = total_precision/len(data_loader)
    recall = total_recall/len(data_loader)
    f1 = total_f1/len(data_loader)
    loss = total_loss/len(data_loader)

    return precision, recall, f1, loss
  
  def eval_doc(self, dataset, mode):
    # Predictions per paragraph
    preds = []
    index = 0
    if mode == "training":
      preds = self.train_preds
    if mode == "val":
      preds = self.val_preds
    df = dataset.data
    doc_preds = [
    label_dict = {'Negative': 0,
                  'Neutral': 1,
                  'Positive': 2}
    df = df.replace({'TRUE_SENTIMENT': label_dict})
    doc_labels = df['TRUE_SENTIMENT'].values
    # For each document, check the number of predicted labels corresponding with its paragraphs.
    for idx, doc in df.iterrows():
      true_sentiment = doc['TRUE_SENTIMENT']
      p_labels = doc['Paragraph Labels']
      doc_paragraph_preds = []
      # For the number of paragraphs (denoted by p_labels) per document, append the prediction list for that document
      # based on the entire list of paragraph predictions, preds.
      for i in range(len(p_labels)):
        doc_paragraph_preds.append(preds[index])
        index += 1
      # Find the most common and append that to the document predictions.
      c = Counter(doc_paragraph_preds)
      doc_preds.append(c.most_common(1)[0][0])
    self.train_preds = []
    self.val_preds = []
    return self.get_performance_metrics(doc_preds, doc_labels)
    

  def save_transformer(self):
    self.model.save_pretrained(self.save_path)
    self.tokenizer.save_pretrained(self.save_path)

  def execute(self):
    last_best = 0
    train_dataset = DatasetLoader(self.train_data, self.tokenizer)
    train_data_loader = train_dataset.get_data_loaders(mode=self.model_type)
    val_dataset = DatasetLoader(self.val_data, self.tokenizer)
    val_data_loader = val_dataset.get_data_loaders(mode=self.model_type)
    optimizer = AdamW(self.model.parameters(), lr = 3e-5, eps = 1e-8)
    self.set_training_parameters()
    for epoch_i in range(0, self.epochs):
      train_precision, train_recall, train_f1, train_loss = self.train(train_data_loader, optimizer)
      print(f'Epoch {epoch_i + 1}: train_loss: {train_loss:.4f} train_precision: {train_precision:.4f} train_recall: {train_recall:.4f} train_f1: {train_f1:.4f}')
      if (self.model_type == 'paragraph'):
        doc_precision, doc_recall, doc_f1 = self.eval_doc(train_dataset, mode='training')
        print(f'doc_precision: {doc_precision:.4f} doc_recall: {doc_recall:.4f}, doc_f1: {doc_f1:.4f}')
      val_precision, val_recall, val_f1, val_loss = self.eval(val_data_loader)
      print(f'Epoch {epoch_i + 1}: val_loss: {val_loss:.4f} val_precision: {val_precision:.4f} val_recall: {val_recall:.4f} val_f1: {val_f1:.4f}')
      if val_f1 > last_best:
        print("Saving model..")
        self.save_transformer()
        last_best = val_f1
        print("Model saved.")

In [None]:
import os
import gc
gc.collect()
torch.cuda.empty_cache()
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
args = {}
args['batch_size'] = BATCH_SIZE
args['device'] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
args['train_data'] = train_data
args['val_data'] = val_data
args['save_path'] = SAVE_PATH + '_top_2_paragraphs'
args['epochs'] = EPOCHS
args['training_type'] = 'top_2_training'
args['model'] = "paragraph"
print(args['device'])
CUDA_LAUNCH_BLOCKING=1
trainer = Trainer(args)

trainer.execute()

cuda:0


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'pre_classifi

2
distilbert.embeddings.word_embeddings.weight
distilbert.embeddings.position_embeddings.weight
distilbert.embeddings.LayerNorm.weight
distilbert.embeddings.LayerNorm.bias
distilbert.transformer.layer.0.attention.q_lin.weight
distilbert.transformer.layer.0.attention.q_lin.bias
distilbert.transformer.layer.0.attention.k_lin.weight
distilbert.transformer.layer.0.attention.k_lin.bias
distilbert.transformer.layer.0.attention.v_lin.weight
distilbert.transformer.layer.0.attention.v_lin.bias
distilbert.transformer.layer.0.attention.out_lin.weight
distilbert.transformer.layer.0.attention.out_lin.bias
distilbert.transformer.layer.0.sa_layer_norm.weight
distilbert.transformer.layer.0.sa_layer_norm.bias
distilbert.transformer.layer.0.ffn.lin1.weight
distilbert.transformer.layer.0.ffn.lin1.bias
distilbert.transformer.layer.0.ffn.lin2.weight
distilbert.transformer.layer.0.ffn.lin2.bias
distilbert.transformer.layer.0.output_layer_norm.weight
distilbert.transformer.layer.0.output_layer_norm.bias
dist

100%|██████████| 1498/1498 [20:47<00:00,  1.20it/s]


Epoch 1: train_loss: 0.9541 train_precision: 0.2779 train_recall: 0.3666 train_f1: 0.2786
doc_precision: 0.3053 doc_recall: 0.3394, doc_f1: 0.3127


100%|██████████| 261/261 [01:16<00:00,  3.43it/s]


Epoch 1: val_loss: 0.9411 val_precision: 0.3458 val_recall: 0.3762 val_f1: 0.3120
Saving model..
Model saved.


100%|██████████| 1498/1498 [20:54<00:00,  1.19it/s]


Epoch 2: train_loss: 0.9283 train_precision: 0.3556 train_recall: 0.3984 train_f1: 0.3427
doc_precision: 0.2975 doc_recall: 0.3288, doc_f1: 0.3035


100%|██████████| 261/261 [01:16<00:00,  3.42it/s]


Epoch 2: val_loss: 0.8923 val_precision: 0.4035 val_recall: 0.4136 val_f1: 0.3604
Saving model..
Model saved.


 75%|███████▌  | 1127/1498 [15:44<05:10,  1.19it/s]

In [25]:
class Tester():

  def __init__(self, args):
    self.batch_size = args['batch_size']
    self.save_path = args['save_path']
    self.test_data = args['test_data']
    self.device = args['device']
    self.model_type = args['model']
    transformer = DistillBERT(self.save_path)
    self.model, self.tokenizer = transformer.get_tokenizer_and_model()
    self.model.to(self.device)
    self.val_preds = []
    self.train_preds = []

  def get_performance_metrics(self, preds, labels, mode=None):
    pred_flat = preds
    # First portion from training preds, second from val preds
    if mode == "training":
      pred_flat = np.argmax(preds, axis=1).flatten()
      self.train_preds.extend(pred_flat)
    if mode == "val":
      pred_flat = np.argmax(preds, axis=1).flatten()
      self.val_preds.extend(pred_flat)
    labels_flat = labels.flatten()
    precision = precision_score(labels_flat, pred_flat, zero_division=0, average='macro')
    recall = recall_score(labels_flat, pred_flat, zero_division=0, average='macro')
    f1 = f1_score(labels_flat, pred_flat, zero_division=0, average='macro')
    return precision, recall, f1

  def test(self, data_loader):
    self.model.eval()
    total_recall = 0
    total_precision = 0
    total_f1 = 0
    total_loss = 0

    with torch.no_grad():
      for (reviews, labels) in tqdm(data_loader):
        # TODO(students): start
        output = self.model(reviews.to(self.device), labels=labels.to(self.device)) 
        prec, rec, f1 = self.get_performance_metrics(output.logits.cpu(), labels.cpu(), mode='val')
        total_recall += rec
        total_precision += prec
        total_f1 += f1
        total_loss += output.loss
        # TODO(students): end
    
    precision = total_precision/len(data_loader)
    recall = total_recall/len(data_loader)
    f1 = total_f1/len(data_loader)
    loss = total_loss/len(data_loader)

    return precision, recall, f1, loss
  def eval_doc(self, dataset, mode):
    # Predictions per paragraph
    preds = []
    index = 0
    if mode == "training":
      preds = self.train_preds
    if mode == "val":
      preds = self.val_preds
    df = dataset.data
    doc_preds = []
    label_dict = {'Negative': 0,
                  'Neutral': 1,
                  'Positive': 2}
    df = df.replace({'TRUE_SENTIMENT': label_dict})
    doc_labels = df['TRUE_SENTIMENT'].values
    # For each document, check the number of predicted labels corresponding with its paragraphs.
    for idx, doc in df.iterrows():
      true_sentiment = doc['TRUE_SENTIMENT']
      p_labels = doc['Paragraph Labels']
      doc_paragraph_preds = []
      # For the number of paragraphs (denoted by p_labels) per document, append the prediction list for that document
      # based on the entire list of paragraph predictions, preds.
      for i in range(len(p_labels)):
        doc_paragraph_preds.append(preds[index])
        index += 1
      # Find the most common and append that to the document predictions.
      c = Counter(doc_paragraph_preds)
      doc_preds.append(c.most_common(1)[0][0])
    self.train_preds = []
    self.val_preds = []
    return self.get_performance_metrics(doc_preds, doc_labels)
  
  def execute(self):
    test_dataset = DatasetLoader(self.test_data, self.tokenizer)
    test_data_loader = test_dataset.get_data_loaders(mode=self.model_type)

    test_precision, test_recall, test_f1, test_loss = self.test(test_data_loader)
    if (self.model_type == 'paragraph'):
        doc_precision, doc_recall, doc_f1 = self.eval_doc(test_dataset, mode='val')
        print(f'doc_precision: {doc_precision:.4f} doc_recall: {doc_recall:.4f}, doc_f1: {doc_f1:.4f}')
    print()
    print(f'test_loss: {test_loss:.4f} test_precision: {test_precision:.4f} test_recall: {test_recall:.4f} test_f1: {test_f1:.4f}')

In [26]:
args = {}
args['batch_size'] = BATCH_SIZE
args['device'] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
args['test_data'] = test_data
args['save_path'] = SAVE_PATH + '_frozen_paragraphs'
args['model'] = "paragraph"
tester = Tester(args)
tester.execute()

args = {}
args['batch_size'] = BATCH_SIZE
args['device'] = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
args['test_data'] = test_data_2
args['save_path'] = SAVE_PATH + '_frozen_paragraphs'
args['model'] = 'paragraph'
tester = Tester(args)
tester.execute()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if sys.path[0] == '':
100%|██████████| 527/527 [00:05<00:00, 92.21it/s]
100%|██████████| 267/267 [01:11<00:00,  3.71it/s]


doc_precision: 0.2899 doc_recall: 0.3268, doc_f1: 0.3004

test_loss: 0.9759 test_precision: 0.3550 test_recall: 0.4019 test_f1: 0.3648


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if sys.path[0] == '':
100%|██████████| 713/713 [00:08<00:00, 88.62it/s]
100%|██████████| 374/374 [01:46<00:00,  3.52it/s]

doc_precision: 0.2829 doc_recall: 0.3376, doc_f1: 0.2950

test_loss: 1.0302 test_precision: 0.2914 test_recall: 0.3461 test_f1: 0.3051





# DO NOT DELETE
##Training all layer results:
Epoch 3: train_loss: 0.9514 train_precision: 0.2719 train_recall: 0.3635 train_f1: 0.2743
100%|██████████| 261/261 [01:09<00:00,  3.77it/s]Epoch 3: val_loss: 0.9390 val_precision: 0.1770 val_recall: 0.3678 val_f1: 0.2352

##Freeze all distilbert layer results:
Epoch 3: train_loss: 0.9435 train_precision: 0.3501 train_recall: 0.3875 train_f1: 0.3319
100%|██████████| 261/261 [01:09<00:00,  3.75it/s]Epoch 3: val_loss: 0.9261 val_precision: 0.3844 val_recall: 0.3959 val_f1: 0.3225

#Freeze all, paragraph-level

Epoch 1: train_loss: 0.9513 train_precision: 0.3261 train_recall: 0.3733 train_f1: 0.3087 doc_precision: 0.6308 doc_recall: 0.3310, doc_f1: 0.3075 Epoch 1: val_loss: 0.9316 val_precision: 0.3729 val_recall: 0.3908 val_f1: 0.3188

Epoch 2: train_loss: 0.9456 train_precision: 0.3440 train_recall: 0.3835 train_f1: 0.3220 doc_precision: 0.3053 doc_recall: 0.3372, doc_f1: 0.3078 Epoch 2: val_loss: 0.9278 val_precision: 0.3724 val_recall: 0.4010 val_f1: 0.3212

Epoch 3: train_loss: 0.9432 train_precision: 0.3471 train_recall: 0.3865 train_f1: 0.3273 doc_precision: 0.3018 doc_recall: 0.3336, doc_f1: 0.3045 Epoch 3: val_loss: 0.9265 val_precision: 0.3145 val_recall: 0.3879 val_f1: 0.2858

Epoch 4: train_loss: 0.9399 train_precision: 0.3622 train_recall: 0.3959 train_f1: 0.3464 doc_precision: 0.3029 doc_recall: 0.3356, doc_f1: 0.3081 Epoch 4: val_loss: 0.9235 val_precision: 0.3910 val_recall: 0.4288 val_f1: 0.3953

Epoch 5: train_loss: 0.9370 train_precision: 0.3673 train_recall: 0.3980 train_f1: 0.3474 doc_precision: 0.3069 doc_recall: 0.3393, doc_f1: 0.3140 Epoch 5: val_loss: 0.9217 val_precision: 0.3146 val_recall: 0.3831 val_f1: 0.2809

Epoch 6: train_loss: 0.9347 train_precision: 0.3644 train_recall: 0.3979 train_f1: 0.3500 doc_precision: 0.2899 doc_recall: 0.3205, doc_f1: 0.2948 Epoch 6: val_loss: 0.9172 val_precision: 0.4016 val_recall: 0.4262 val_f1: 0.3956

Epoch 7: train_loss: 0.9322 train_precision: 0.3700 train_recall: 0.4008 train_f1: 0.3540 doc_precision: 0.2979 doc_recall: 0.3290, doc_f1: 0.3045 Epoch 7: val_loss: 0.9140 val_precision: 0.4108 val_recall: 0.4226 val_f1: 0.3786

Epoch 8: train_loss: 0.9312 train_precision: 0.3744 train_recall: 0.4081 train_f1: 0.3634 doc_precision: 0.2976 doc_recall: 0.3284, doc_f1: 0.3051 Epoch 8: val_loss: 0.9131 val_precision: 0.3956 val_recall: 0.4297 val_f1: 0.3977

Epoch 9: train_loss: 0.9290 train_precision: 0.3731 train_recall: 0.4035 train_f1: 0.3634 doc_precision: 0.3120 doc_recall: 0.3440, doc_f1: 0.3203 Epoch 9: val_loss: 0.9129 val_precision: 0.3957 val_recall: 0.4343 val_f1: 0.4003

Epoch 10: train_loss: 0.9277 train_precision: 0.3765 train_recall: 0.4070 train_f1: 0.3639 doc_precision: 0.3053 doc_recall: 0.3372, doc_f1: 0.3125 Epoch 10: val_loss: 0.9108 val_precision: 0.3964 val_recall: 0.4372 val_f1: 0.4031

Underfitting when getting paragraph-level sentiments and aggregating results for the document-level task. 

Try different method: aggregating paragraph-level sentimeents and training against those for the document-level task

Random Test:

doc_precision: 0.2899 doc_recall: 0.3268, doc_f1: 0.3004

test_loss: 0.9759 test_precision: 0.3550 test_recall: 0.4019 test_f1: 0.3648

Fixed Test:

doc_precision: 0.2829 doc_recall: 0.3376, doc_f1: 0.2950

test_loss: 1.0302 test_precision: 0.2914 test_recall: 0.3461 test_f1: 0.3051

##Freezing with document level

Epoch 1: train_loss: 0.9397 train_precision: 0.2432 train_recall: 0.3585 train_f1: 0.2682

Epoch 1: val_loss: 0.9041 val_precision: 0.3165 val_recall: 0.3818 val_f1: 0.3010

Epoch 2: train_loss: 0.9011 train_precision: 0.3457 train_recall: 0.4023 train_f1: 0.3294

Epoch 2: val_loss: 0.8884 val_precision: 0.2490 val_recall: 0.3741 val_f1: 0.2726

Epoch 3: train_loss: 0.8882 train_precision: 0.3873 train_recall: 0.4049 train_f1: 0.3453

Epoch 3: val_loss: 0.8920 val_precision: 0.3954 val_recall: 0.4066 val_f1: 0.3617

Epoch 4: train_loss: 0.8823 train_precision: 0.3894 train_recall: 0.4104 train_f1: 0.3580

Epoch 4: val_loss: 0.8755 val_precision: 0.3933 val_recall: 0.4320 val_f1: 0.3748

Epoch 5: train_loss: 0.8703 train_precision: 0.4281 train_recall: 0.4438 train_f1: 0.4032

Epoch 5: val_loss: 0.8597 val_precision: 0.3814 val_recall: 0.4048 val_f1: 0.3457

Epoch 6: train_loss: 0.8669 train_precision: 0.4122 train_recall: 0.4303 train_f1: 0.3879

Epoch 6: val_loss: 0.8577 val_precision: 0.3879 val_recall: 0.4256 val_f1: 0.3742

Epoch 7: train_loss: 0.8597 train_precision: 0.4294 train_recall: 0.4579 train_f1: 0.4160

Epoch 7: val_loss: 0.8629 val_precision: 0.3744 val_recall: 0.4101 val_f1: 0.3461

Epoch 8: train_loss: 0.8552 train_precision: 0.4134 train_recall: 0.4379 train_f1: 0.3972

Epoch 8: val_loss: 0.8445 val_precision: 0.3662 val_recall: 0.3955 val_f1: 0.3510

Epoch 9: train_loss: 0.8531 train_precision: 0.4145 train_recall: 0.4414 train_f1: 0.4021

Epoch 9: val_loss: 0.8600 val_precision: 0.3498 val_recall: 0.3944 val_f1: 0.3293

Epoch 10: train_loss: 0.8503 train_precision: 0.4165 train_recall: 0.4454 train_f1: 0.4083

Epoch 10: val_loss: 0.8485 val_precision: 0.4114 val_recall: 0.4411 val_f1: 0.4082

Epoch 11: train_loss: 0.8471 train_precision: 0.4205 train_recall: 0.4560 train_f1: 0.4155

Epoch 11: val_loss: 0.8479 val_precision: 0.3819 val_recall: 0.4182 val_f1: 0.3858

Epoch 12: train_loss: 0.8432 train_precision: 0.4247 train_recall: 0.4597 train_f1: 0.4209

Epoch 12: val_loss: 0.8417 val_precision: 0.4143 val_recall: 0.4398 val_f1: 0.3997

Epoch 13: train_loss: 0.8400 train_precision: 0.4215 train_recall: 0.4530 train_f1: 0.4187

Epoch 13: val_loss: 0.8441 val_precision: 0.4032 val_recall: 0.4193 val_f1: 0.3717

Epoch 14: train_loss: 0.8357 train_precision: 0.4377 train_recall: 0.4671 train_f1: 0.4332

Epoch 14: val_loss: 0.8413 val_precision: 0.4155 val_recall: 0.4288 val_f1: 0.3897

Epoch 15: train_loss: 0.8344 train_precision: 0.4242 train_recall: 0.4576 train_f1: 0.4205

Epoch 15: val_loss: 0.8425 val_precision: 0.3830 val_recall: 0.4198 val_f1: 0.3875

Epoch 16: train_loss: 0.8322 train_precision: 0.4267 train_recall: 0.4566 train_f1: 0.4202

Epoch 16: val_loss: 0.8495 val_precision: 0.3905 val_recall: 0.4222 val_f1: 0.3869

Epoch 17: train_loss: 0.8331 train_precision: 0.4345 train_recall: 0.4654 train_f1: 0.4301

Epoch 17: val_loss: 0.8310 val_precision: 0.4280 val_recall: 0.4512 val_f1: 0.4188

Epoch 18: train_loss: 0.8318 train_precision: 0.4294 train_recall: 0.4607 train_f1: 0.4269

Epoch 18: val_loss: 0.8331 val_precision: 0.3857 val_recall: 0.4157 val_f1: 0.3730

Epoch 19: train_loss: 0.8271 train_precision: 0.4495 train_recall: 0.4756 train_f1: 0.4418

Epoch 19: val_loss: 0.8408 val_precision: 0.3931 val_recall: 0.4184 val_f1: 0.3771

Epoch 20: train_loss: 0.8273 train_precision: 0.4337 train_recall: 0.4653 train_f1: 0.4313

Epoch 20: val_loss: 0.8325 val_precision: 0.4216 val_recall: 0.4443 val_f1: 0.4200

Epoch 21: train_loss: 0.8237 train_precision: 0.4243 train_recall: 0.4590 train_f1: 0.4233

Epoch 21: val_loss: 0.8394 val_precision: 0.4127 val_recall: 0.4473 val_f1: 0.4106

Epoch 22: train_loss: 0.8229 train_precision: 0.4390 train_recall: 0.4720 train_f1: 0.4350

Epoch 22: val_loss: 0.8322 val_precision: 0.4225 val_recall: 0.4555 val_f1: 0.4256

Epoch 23: train_loss: 0.8245 train_precision: 0.4396 train_recall: 0.4698 train_f1: 0.4360

Epoch 23: val_loss: 0.8372 val_precision: 0.4300 val_recall: 0.4553 val_f1: 0.4163

Epoch 24: train_loss: 0.8224 train_precision: 0.4477 train_recall: 0.4789 train_f1: 0.4439

Epoch 24: val_loss: 0.8433 val_precision: 0.4178 val_recall: 0.4265 val_f1: 0.3902

Epoch 25: train_loss: 0.8187 train_precision: 0.4292 train_recall: 0.4617 train_f1: 0.4259

Epoch 25: val_loss: 0.8488 val_precision: 0.4251 val_recall: 0.4510 val_f1: 0.4186


Best Model from document-level on random:

test_loss: 0.9830 test_precision: 0.3650 test_recall: 0.3998 test_f1: 0.3536

Best model from document-elvel on fixed:

test_loss: 1.0553 test_precision: 0.3034 test_recall: 0.3533 test_f1: 0.2956

#All training with document level

Epoch 1: train_loss: 0.9093 train_precision: 0.3208 train_recall: 0.4083 train_f1: 0.3311

Epoch 1: val_loss: 0.8463 val_precision: 0.4232 val_recall: 0.4522 val_f1: 0.4182

Epoch 2: train_loss: 0.8306 train_precision: 0.4340 train_recall: 0.4654 train_f1: 0.4250

Epoch 2: val_loss: 0.8293 val_precision: 0.4791 val_recall: 0.5138 val_f1: 0.4682

Epoch 3: train_loss: 0.7256 train_precision: 0.4960 train_recall: 0.5365 train_f1: 0.4994

Epoch 3: val_loss: 0.8873 val_precision: 0.4648 val_recall: 0.4954 val_f1: 0.4565

Epoch 4: train_loss: 0.5640 train_precision: 0.6594 train_recall: 0.6608 train_f1: 0.6387

Epoch 4: val_loss: 0.9463 val_precision: 0.4286 val_recall: 0.4519 val_f1: 0.4242

Epoch 5: train_loss: 0.3753 train_precision: 0.7902 train_recall: 0.7897 train_f1: 0.7693

Epoch 5: val_loss: 1.1243 val_precision: 0.4412 val_recall: 0.4527 val_f1: 0.4301

Epoch 6: train_loss: 0.2132 train_precision: 0.8868 train_recall: 0.8858 train_f1: 0.8760

Epoch 6: val_loss: 1.4082 val_precision: 0.4985 val_recall: 0.5038 val_f1: 0.4756

Epoch 7: train_loss: 0.1477 train_precision: 0.9352 train_recall: 0.9290 train_f1: 0.9244

Epoch 7: val_loss: 1.6487 val_precision: 0.4457 val_recall: 0.4684 val_f1: 0.4340

On Random:
test_loss: 0.9749 test_precision: 0.3049 test_recall: 0.3679 test_f1: 0.2658

On fixed: 
test_loss: 1.0114 test_precision: 0.1981 test_recall: 0.3415 test_f1: 0.2266

#Top 2 document level

Epoch 1: train_loss: 0.9279 train_precision: 0.2575 train_recall: 0.3741 train_f1: 0.2849

Epoch 1: val_loss: 0.9115 val_precision: 0.2032 val_recall: 0.3775 val_f1: 0.2596

Epoch 2: train_loss: 0.8768 train_precision: 0.3812 

Epoch 2: val_loss: 0.8314 val_precision: 0.4610 val_recall: 0.4863 val_f1: 0.4463

Epoch 3: train_loss: 0.8121 train_precision: 0.4380 train_recall: 0.4699 train_f1: 0.4291

Epoch 3: val_loss: 0.8254 val_precision: 0.4504 val_recall: 0.4885 val_f1: 0.4520

Epoch 4: train_loss: 0.6867 train_precision: 0.5226 train_recall: 0.5542 train_f1: 0.5212

Epoch 4: val_loss: 0.9862 val_precision: 0.4077 val_recall: 0.4003

Random?
test_loss: 0.9431 test_precision: 0.3592 test_recall: 0.4100 test_f1: 0.3695

Fixed
test_loss: 1.0227 test_precision: 0.3248 test_recall: 0.3837 test_f1: 0.3398