# Sentiment Analysis of Financial Statements

The goal for our project was be to take the finance phrasebank data put together by Malo, Pekka, et al. (2013). Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts. and develop a deep learning/neural network pipeline to classify the sentiment of the sentences in the dataset. The goal was to benchmark our results against the results of Malo et al., particularly the LPS method they developed.

The NLP task we used in this project was sentiment analysis in which we attempting to classified different sentences as positive, negative, or neutral.

To do so we will use the GloVe premade word vectors to embed the words. We also implemented 2 neural networks for this project; a sequence-to-sequence encoder-decoder GRU based model and a basic GRU model.

Below we link to the arxiv page for Malo et al's paper

https://arxiv.org/pdf/1307.5336

##Team Members

- Group member 1
    - Name: David Blankenkship
    - Email: dwb65@drexel.edu
- Group member 2
    - Name: Christian Ekwomadu
    - Email: cce49@drexel.edu
- Group member 3
    - Name: Jai Vaidya
    - Email: jv625@drexel.edu
- Group member 4
    - Name: Nana Afua Martinson
    - Email: nsm86@drexel.edu

## Preprocessing/Building the LM

The goal here is to build the datasets and embedding matrix using the Glove pretrained model and financial phrasebank data.

In [None]:
# Example of our data
test = 'According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .@neutral'
x, y = test.split('@')
print(x)
print(y)

According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .
neutral


### Import Statements

In [None]:
# Import statements
import json
import re
import numpy as np
import pandas as pd
import random as ra
from tqdm import tqdm
from collections import Counter

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

from sklearn.model_selection import train_test_split


### Load GLOVE Pretrained Vectors

In [None]:
#Load Glove Embeddings
def load_glove_embeddings(glove_file_path):
    embeddings_index = {}
    with open(glove_file_path, encoding="utf-8") as filename:
        for line in filename:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs
    return embeddings_index

# Run as below
# We'll use 6B 50d file at first in order to save on time.
# glove_file_path = './glove.6B.50d.txt'
# embeddings_index = load_glove_embeddings(glove_file_path)

### Preprocessing Function
Outputs X and y for test, train, and validation as well as embedding matrix.

In [None]:
# exec(open("01-utilities.py").read()) # Will need to either put in here as files or call as code
# Just putting tokenize() in here instead

def tokenize(text, space = True):
    tokens = []
    for token in re.split("([0-9a-zA-Z'-]+)", text):
        if not space:
            token = re.sub("[ ]+", "", token)
        if not token:
            continue
        if re.search("[0-9a-zA-Z'-]", token):
            tokens.append(token)
        else:
            tokens.extend(token)
    return tokens

#just using torchtext vocab instead
#exec(open("05-utilities.py").read()) # Will need to either put in here as files or call as code

In [None]:
class Vocab:

  """
  Class that handles mapping to and from
  words to vocab indices. Built off of class utilities file 5.
  """

  def __init__(self):
    self._word2idx = {'<unk>': 0, '<pad>': 1}
    self._idx2word = {}


  def train(self, sentence_list):
    # generate vocab list
    for sentence in sentence_list:
      for token in sentence:
        if token not in self._word2idx:
          self._word2idx[token] = len(self._word2idx)

    # rebuild reverse lookup
    self._idx2word = {v: k for k, v in self._word2idx.items()}

  def encode(self, word):
      return self._word2idx.get(word, self._word2idx['<unk>'])

  # def decode(self, idx):
  #   if self._target:
  #     return self._idx2word[idx]
  #   else:
  #     return self._idx2word.get(idx, '<unk>')

In [None]:
def finance_preprocessing(filename, embeddings_index, embed_dim=50):
  with open(filename, mode='r', encoding='iso-8859-1') as file: # the phrasebank data uses 'iso-8859-1' or latin-1 encoding.
    rows = [row.rstrip() for row in file]

  # Create data dictionary and relabel sentiment as numbers
  data={}
  sent_list = []
  label_list = []
  for row in rows:
    sentence, label = row.rsplit('@')
    # Turns sentiment targets into numbers.
    # I shifted to 0, 1, 2 as cross entropy loss would otherise require one hot encodings.
    if label == 'neutral':
      label_num = 1
    elif label == 'positive':
      label_num = 2
    elif label == 'negative':
      label_num = 0
    sent_list.append(sentence)
    label_list.append(label_num)
  data['sentences'] = sent_list
  data['label'] = label_list

  # Tokenize sentences
  max_sent_len = 0
  tok_sent_list = []
  for sent in data['sentences']:
    s = tokenize(sent.lower(), space=False)
    if len(s) > max_sent_len:
      max_sent_len = len(s)
    tok_sent_list.append(s)
  data['sentences'] = tok_sent_list
  print('Max Sentence length: ', max_sent_len)

  # Develop Vocab using custom class
  vocab = Vocab()
  vocab.train(data['sentences'])


  # Creates sequences by encoding the tokenized sentences
  sequences = []
  for sentence in data['sentences']:
    sequences.append([vocab.encode(tok) for tok in sentence])


  # pads out sentences to max length using the <pad token>
  padded_sequences = [seq + [vocab.encode('<pad>')] * (max_sent_len - len(seq)) for seq in sequences]

  # Create tensors of full data sets
  X = torch.tensor(padded_sequences, dtype=torch.long)
  y = torch.tensor(data['label'], dtype=torch.long)

  # Split the data
  X_tv, X_test, y_tv, y_test = train_test_split(X, y, test_size=0.15, stratify=y, random_state=691)
  X_train, X_val, y_train, y_val = train_test_split(X_tv, y_tv, test_size=0.20, stratify=y_tv, random_state=691)

  # Creates embedding matrix
  embedding_matrix = np.zeros((len(vocab._word2idx), embed_dim))
  for word, idx in vocab._word2idx.items():
      if word in embeddings_index:
          embedding_matrix[idx] = embeddings_index[word]
      else:
        embedding_matrix[idx] = np.random.normal(scale=0.5, size=(embed_dim, )) # Randomized vector for not in glove

  embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float32)


  return X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix



### Create Dataset Class
Needed to use DataLoader

In [None]:
class SentimentDataset(Dataset):
    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = labels

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        # Return the data in the correct format
        return {'sequences': self.sequences[idx], 'labels': self.labels[idx]}



## Define Neural Network

This is where we create the architecture of our Neural Network. The output for this entire step will be the sentiment of the sentence. We use the following models:

1. Sequence-to-Sequence GRU-based Encoder-Decoder Model
2. Basic GRU Model

Both will include dropout, embedding matrices based off of the earlier GloVe premade word vectors, and variable hidden size and GRU layers.

### GRU Encoder/Decoder

In [None]:
class EncoderGRU(nn.Module):
  def __init__(self, hidden_size, embedding_matrix, gru_layers = 1, dropout=0.1):
    super(EncoderGRU, self).__init__()
    self.hidden_size = hidden_size # hyperparam to tweak
    self.embedding_size = embedding_matrix.shape[1]
    self.num_layers = gru_layers # hyperparam to tweak
    self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
    self.gru = nn.GRU(self.embedding_size, self.hidden_size,
                      num_layers = self.num_layers, bidirectional = True,
                      batch_first=True, dropout=dropout)


  def forward(self, input):
    embedded = self.embedding(input)
    output, hidden = self.gru(embedded)
    return output, hidden

In [None]:
class DecoderGRU(nn.Module):
  def __init__(self, hidden_size, gru_layers=1, dropout=0.1):
    super(DecoderGRU, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = gru_layers
    self.gru = nn.GRU(self.hidden_size, self.hidden_size,
                      num_layers = self.num_layers, batch_first=True)
    self.fc = nn.Linear(self.hidden_size, 3) # hardcode ouput size to 3. There's no ambiguity about output size
    self.dropout = nn.Dropout(dropout)

  def forward(self, hidden):
    output, _ = self.gru(hidden)
    output = self.dropout(output)
    output = self.fc(output.squeeze(1))
    return output


In [None]:
# this will be what we input as the model
class GRU2GRU(nn.Module):
    def __init__(self, encoder, decoder):
        super(GRU2GRU, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, input):
        encoder_output, hidden = self.encoder(input)
        output = self.decoder(hidden[-1])
        return output

# Called like so
# encoder = EncoderGRU(embedding_matrix, hidden_size, gru_layers)
# decoder = DecoderGRU(hidden_size, gru_layers)
# model = GRU2GRU(encoder, decoder)

### Basic GRU Classifier

Simple model to compare against our encoder-decoder GRUs

In [None]:
class SimpleClassifier(nn.Module):
    def __init__(self, hidden_size, embedding_matrix, gru_layers=1, dropout=0.1):
        super(SimpleClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.embedding_size = embedding_matrix.shape[1]
        self.num_layers = gru_layers
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
        self.gru = nn.GRU(self.embedding_size, self.hidden_size,
                      num_layers = self.num_layers, bidirectional = True,
                      batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size * 2, 3)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input):
        embedded = self.embedding(input)
        gru_output, hidden = self.gru(embedded)
        hidden = self.dropout(hidden)
        hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        output = self.fc(hidden)
        return output

## Training the Model/Optimization/Backpropagation

This is where we set up our training function to do optimization and backpropagation.

Hyperparameters include batch size, learning rate, epochs, and clip.

In [None]:
# Define the training function
def train_model(model, train_data, val_data, batch_size=64, lr=0.01, epochs=10, clip=0.25, model_run=0):
    # Setup optimizer and loss function
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss(ignore_index=1) #should ignore pad token

    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)

    best_val_loss = float('inf') # is THIS JACKING US UP?

    for epoch in range(epochs):
        model.train()  # Set the model to training mode
        train_loss = 0

        # Training loop
        for batch in tqdm(train_loader, desc=f"Training Epoch {epoch+1}/{epochs}"):
            sequences, labels = batch['sequences'], batch['labels']
            optimizer.zero_grad()  # Clear the gradients

            # Forward pass
            outputs = model(sequences)
            if torch.isnan(outputs).sum() > 0:
                print("NaN detected in model output.")
                continue


            # Compute loss
            loss = criterion(outputs, labels)
            if torch.isnan(loss).sum() > 0:
                print("NaN detected in loss computation.")
                continue


            # Backward pass and optimization
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)  # Gradient clipping
            optimizer.step()

            train_loss += loss.item()


        # Validation loop
        model.eval()  # Set the model to evaluation mode
        val_loss = 0
        with torch.no_grad():
            for batch in tqdm(val_loader, desc=f"Validation Epoch {epoch+1}/{epochs}"):
                sequences, labels = batch['sequences'], batch['labels']

                # Forward pass
                outputs = model(sequences)
                if torch.isnan(outputs).sum() > 0:
                    print("NaN detected in model output during validation.")
                    continue

                # Compute loss
                loss = criterion(outputs, labels)
                if torch.isnan(loss).sum() > 0:
                    print("NaN detected in validation loss computation.")
                    continue

                val_loss += loss.item()

        print(f"Epoch {epoch+1}/{epochs} - Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

        # Check for improvement
        if val_loss < best_val_loss: # consider more chances
            best_val_loss = val_loss
            # name format epoch, loss, model run
            # Model run is a variable for distinguishing which set of hyper parameters was used
            # 0 is reserved for testing, defaults will be at 1.
            torch.save(model.state_dict(), f'best_model_run{model_run}.pt')
        else:
            print("No improvement! Early stopping.")
            break
    return model


## Prediction/Evaluation


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def evaluate_model(model, data_loader, device, class_names):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in data_loader:
            sequences, labels = batch['sequences'].to(device), batch['labels'].to(device)
            outputs = model(sequences)
            preds = torch.argmax(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    accuracy = accuracy_score(all_labels, all_preds)
    precision = precision_score(all_labels, all_preds, average='weighted', zero_division=0.0)
    recall = recall_score(all_labels, all_preds, average='weighted')
    f1 = f1_score(all_labels, all_preds, average='weighted')

    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')

    print('\nClassification Report:\n', classification_report(all_labels, all_preds, target_names=class_names, zero_division=0.0))

    return accuracy, precision, recall, f1

## Run Models

This is where we run the models side-by-side. We tweak hyperparameters and compare performance to determine the best values for the full run.

In [None]:
glove_file_path = './glove.6B.50d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

In [None]:
filename = './Sentences_AllAgree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=50)

Max Sentence length:  81


In [None]:
train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)

In [None]:
# Initialize model run number
model_run = 0

### Baseline Run

In [None]:
# Initialize model run number
model_run = 0

hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=10
clip=0.25


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

# print("Test Set Evaluation")
# evaluate_model(trained_model, test_loader, device, class_names)


print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

# print("Test Set Evaluation")
# evaluate_model(simple_model, test_loader, device, class_names)


Training Epoch 1/10: 100%|██████████| 49/49 [00:06<00:00,  7.09it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 25.24it/s]


NaN detected in validation loss computation.
Epoch 1/10 - Train Loss: 35.9594, Val Loss: 7.6293


Training Epoch 2/10: 100%|██████████| 49/49 [00:09<00:00,  5.15it/s]
Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 23.41it/s]


Epoch 2/10 - Train Loss: 29.8402, Val Loss: 5.9394


Training Epoch 3/10: 100%|██████████| 49/49 [00:09<00:00,  5.32it/s]


NaN detected in loss computation.


Validation Epoch 3/10: 100%|██████████| 13/13 [00:00<00:00, 42.10it/s]


Epoch 3/10 - Train Loss: 13.2822, Val Loss: 4.9356


Training Epoch 4/10: 100%|██████████| 49/49 [00:05<00:00,  8.77it/s]
Validation Epoch 4/10: 100%|██████████| 13/13 [00:00<00:00, 38.41it/s]


NaN detected in validation loss computation.
Epoch 4/10 - Train Loss: 8.1245, Val Loss: 4.5425


Training Epoch 5/10: 100%|██████████| 49/49 [00:06<00:00,  7.23it/s]
Validation Epoch 5/10: 100%|██████████| 13/13 [00:00<00:00, 26.20it/s]


NaN detected in validation loss computation.
Epoch 5/10 - Train Loss: 3.0273, Val Loss: 6.3904
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3827
Precision: 0.1478
Recall: 0.3827
F1 Score: 0.2131

Classification Report:
               precision    recall  f1-score   support

    negative       0.41      1.00      0.58       206
     neutral       0.00      0.00      0.00       946
    positive       0.37      0.99      0.54       387

    accuracy                           0.38      1539
   macro avg       0.26      0.66      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation




Accuracy: 0.3377
Precision: 0.1313
Recall: 0.3377
F1 Score: 0.1888

Classification Report:
               precision    recall  f1-score   support

    negative       0.37      0.85      0.51        52
     neutral       0.00      0.00      0.00       236
    positive       0.32      0.89      0.48        97

    accuracy                           0.34       385
   macro avg       0.23      0.58      0.33       385
weighted avg       0.13      0.34      0.19       385


Basic Model:



Training Epoch 1/10: 100%|██████████| 49/49 [00:08<00:00,  5.99it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 25.34it/s]


Epoch 1/10 - Train Loss: 13.1761, Val Loss: 4.6793


Training Epoch 2/10: 100%|██████████| 49/49 [00:07<00:00,  6.72it/s]


NaN detected in loss computation.


Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 42.00it/s]


NaN detected in validation loss computation.
Epoch 2/10 - Train Loss: 3.5978, Val Loss: 9.8923
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3795
Precision: 0.1477
Recall: 0.3795
F1 Score: 0.2122

Classification Report:
               precision    recall  f1-score   support

    negative       0.33      1.00      0.50       206
     neutral       0.00      0.00      0.00       946
    positive       0.41      0.98      0.58       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.36      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3299
Precision: 0.1302
Recall: 0.3299
F1 Score: 0.1859

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      0.88      0.43        52
     neutral       0.00      0.00      0.00       236
    positive       0.37      0.84      0.51        97

    accuracy                           0.33

(0.32987012987012987,
 0.1302270183188404,
 0.32987012987012987,
 0.18587855191628777)

Seeing gradient explosion, tweaking clip.

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=10
clip=0.75


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)




Training Epoch 1/10: 100%|██████████| 49/49 [00:10<00:00,  4.84it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:01<00:00,  6.87it/s]


NaN detected in validation loss computation.
Epoch 1/10 - Train Loss: 19.2105, Val Loss: 6.4947


Training Epoch 2/10: 100%|██████████| 49/49 [00:07<00:00,  6.83it/s]
Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 42.11it/s]


Epoch 2/10 - Train Loss: 3.1463, Val Loss: 7.0463
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1502
Recall: 0.3847
F1 Score: 0.2154

Classification Report:
               precision    recall  f1-score   support

    negative       0.45      1.00      0.62       206
     neutral       0.00      0.00      0.00       946
    positive       0.36      1.00      0.52       387

    accuracy                           0.38      1539
   macro avg       0.27      0.67      0.38      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3403
Precision: 0.1331
Recall: 0.3403
F1 Score: 0.1902

Classification Report:
               precision    recall  f1-score   support

    negative       0.38      0.75      0.50        52
     neutral       0.00      0.00      0.00       236
    positive       0.33      0.95      0.49        97

    accuracy                           0.34       385
   macro avg       0.23      0.57      0.33       385
weighted avg       0.13      0.34      0.19       385


Basic Model:



Training Epoch 1/10: 100%|██████████| 49/49 [00:06<00:00,  7.36it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 26.12it/s]


Epoch 1/10 - Train Loss: 13.0947, Val Loss: 7.9441


Training Epoch 2/10: 100%|██████████| 49/49 [00:07<00:00,  6.32it/s]
Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 27.46it/s]


Epoch 2/10 - Train Loss: 1.0060, Val Loss: 8.1909
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3821
Precision: 0.1504
Recall: 0.3821
F1 Score: 0.2149

Classification Report:
               precision    recall  f1-score   support

    negative       0.32      1.00      0.48       206
     neutral       0.00      0.00      0.00       946
    positive       0.43      0.99      0.60       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.36      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3377
Precision: 0.1337
Recall: 0.3377
F1 Score: 0.1909

Classification Report:
               precision    recall  f1-score   support

    negative       0.27      0.83      0.41        52
     neutral       0.00      0.00      0.00       236
    positive       0.39      0.90      0.54        97

    accuracy                           0.34       385
   macro avg       0.22      0.57 

(0.33766233766233766,
 0.13371861471861474,
 0.33766233766233766,
 0.1909363342622454)

Even more clip

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=10
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)




Training Epoch 1/10: 100%|██████████| 49/49 [00:08<00:00,  5.77it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 25.62it/s]


Epoch 1/10 - Train Loss: 13.2127, Val Loss: 8.3327


Training Epoch 2/10: 100%|██████████| 49/49 [00:08<00:00,  5.79it/s]


NaN detected in loss computation.


Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 24.66it/s]


Epoch 2/10 - Train Loss: 2.3649, Val Loss: 8.9371
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.7675
Recall: 0.3847
F1 Score: 0.2184

Classification Report:
               precision    recall  f1-score   support

    negative       0.50      1.00      0.66       206
     neutral       1.00      0.00      0.00       946
    positive       0.34      1.00      0.51       387

    accuracy                           0.38      1539
   macro avg       0.61      0.66      0.39      1539
weighted avg       0.77      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3403
Precision: 0.7495
Recall: 0.3403
F1 Score: 0.1966

Classification Report:
               precision    recall  f1-score   support

    negative       0.43      0.75      0.55        52
     neutral       1.00      0.00      0.01       236
    positive       0.31      0.94      0.47        97

    accuracy                           0.34       385
   macro avg       0.58      0.56      0.34       385
weighted avg       0.75      0.34      0.20       385


Basic Model:



Training Epoch 1/10: 100%|██████████| 49/49 [00:04<00:00, 10.11it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 44.00it/s]


NaN detected in validation loss computation.
Epoch 1/10 - Train Loss: 7.8733, Val Loss: 7.9728


Training Epoch 2/10: 100%|██████████| 49/49 [00:05<00:00,  8.74it/s]


NaN detected in loss computation.


Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 28.54it/s]


Epoch 2/10 - Train Loss: 0.5901, Val Loss: 9.1853
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3853
Precision: 0.1490
Recall: 0.3853
F1 Score: 0.2147

Classification Report:
               precision    recall  f1-score   support

    negative       0.36      1.00      0.52       206
     neutral       0.00      0.00      0.00       946
    positive       0.40      1.00      0.57       387

    accuracy                           0.39      1539
   macro avg       0.25      0.67      0.37      1539
weighted avg       0.15      0.39      0.21      1539

Validation Set Evaluation
Accuracy: 0.3455
Precision: 0.1340
Recall: 0.3455
F1 Score: 0.1931

Classification Report:
               precision    recall  f1-score   support

    negative       0.31      0.83      0.45        52
     neutral       0.00      0.00      0.00       236
    positive       0.37      0.93      0.53        97

    accuracy                           0.35       385
   macro avg       0.22      0.58 

(0.34545454545454546,
 0.13403657566922875,
 0.34545454545454546,
 0.19310207336523127)

Scaling back clip to .75 and decreasing lr

In [None]:
model_run = 5

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.001
epochs=10
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)




Training Epoch 1/10: 100%|██████████| 49/49 [00:08<00:00,  5.76it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 18.96it/s]


NaN detected in validation loss computation.
Epoch 1/10 - Train Loss: 21.5043, Val Loss: 6.8370


Training Epoch 2/10: 100%|██████████| 49/49 [00:17<00:00,  2.82it/s]
Validation Epoch 2/10: 100%|██████████| 13/13 [00:00<00:00, 13.38it/s]


NaN detected in validation loss computation.
Epoch 2/10 - Train Loss: 1.0933, Val Loss: 7.7039
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3853
Precision: 0.1527
Recall: 0.3853
F1 Score: 0.2174

Classification Report:
               precision    recall  f1-score   support

    negative       0.49      1.00      0.66       206
     neutral       0.00      0.00      0.00       946
    positive       0.35      1.00      0.51       387

    accuracy                           0.39      1539
   macro avg       0.28      0.67      0.39      1539
weighted avg       0.15      0.39      0.22      1539

Validation Set Evaluation




Accuracy: 0.3299
Precision: 0.1297
Recall: 0.3299
F1 Score: 0.1844

Classification Report:
               precision    recall  f1-score   support

    negative       0.37      0.69      0.48        52
     neutral       0.00      0.00      0.00       236
    positive       0.32      0.94      0.47        97

    accuracy                           0.33       385
   macro avg       0.23      0.54      0.32       385
weighted avg       0.13      0.33      0.18       385


Basic Model:



Training Epoch 1/10: 100%|██████████| 49/49 [00:07<00:00,  6.33it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 27.78it/s]


NaN detected in validation loss computation.
Epoch 1/10 - Train Loss: 17.9517, Val Loss: 6.7332


Training Epoch 2/10: 100%|██████████| 49/49 [00:14<00:00,  3.37it/s]
Validation Epoch 2/10: 100%|██████████| 13/13 [00:01<00:00, 12.59it/s]


Epoch 2/10 - Train Loss: 1.5155, Val Loss: 6.4435


Training Epoch 3/10: 100%|██████████| 49/49 [00:10<00:00,  4.49it/s]
Validation Epoch 3/10: 100%|██████████| 13/13 [00:00<00:00, 25.76it/s]


NaN detected in validation loss computation.
Epoch 3/10 - Train Loss: 0.2453, Val Loss: 7.6664
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1482
Recall: 0.3847
F1 Score: 0.2140

Classification Report:
               precision    recall  f1-score   support

    negative       0.38      1.00      0.55       206
     neutral       0.00      0.00      0.00       946
    positive       0.39      1.00      0.56       387

    accuracy                           0.38      1539
   macro avg       0.26      0.67      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3299
Precision: 0.1274
Recall: 0.3299
F1 Score: 0.1838

Classification Report:
               precision    recall  f1-score   support

    negative       0.27      0.69      0.39        52
     neutral       0.00      0.00      0.00       236
    positive       0.36      0.94      0.52        97

    accuracy                           0.33

(0.32987012987012987,
 0.1273819346488698,
 0.32987012987012987,
 0.18378003296036083)

Further decreasing lr

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.0001
epochs=10
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)




Training Epoch 1/10: 100%|██████████| 49/49 [00:09<00:00,  5.35it/s]
Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 26.42it/s]


NaN detected in validation loss computation.
Epoch 1/10 - Train Loss: 48.3927, Val Loss: 10.5005


Training Epoch 2/10: 100%|██████████| 49/49 [00:18<00:00,  2.64it/s]
Validation Epoch 2/10: 100%|██████████| 13/13 [00:01<00:00,  8.70it/s]


NaN detected in validation loss computation.
Epoch 2/10 - Train Loss: 37.7305, Val Loss: 8.6879


Training Epoch 3/10: 100%|██████████| 49/49 [00:16<00:00,  2.94it/s]
Validation Epoch 3/10: 100%|██████████| 13/13 [00:01<00:00, 12.22it/s]


NaN detected in validation loss computation.
Epoch 3/10 - Train Loss: 30.0296, Val Loss: 7.6393


Training Epoch 4/10: 100%|██████████| 49/49 [00:16<00:00,  3.06it/s]
Validation Epoch 4/10: 100%|██████████| 13/13 [00:00<00:00, 26.64it/s]


Epoch 4/10 - Train Loss: 23.6309, Val Loss: 7.2441


Training Epoch 5/10: 100%|██████████| 49/49 [00:08<00:00,  5.84it/s]
Validation Epoch 5/10: 100%|██████████| 13/13 [00:00<00:00, 18.89it/s]


Epoch 5/10 - Train Loss: 14.2078, Val Loss: 5.7282


Training Epoch 6/10: 100%|██████████| 49/49 [00:16<00:00,  2.92it/s]
Validation Epoch 6/10: 100%|██████████| 13/13 [00:01<00:00, 12.82it/s]


Epoch 6/10 - Train Loss: 5.3793, Val Loss: 6.7253
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1530
Recall: 0.3847
F1 Score: 0.2174

Classification Report:
               precision    recall  f1-score   support

    negative       0.50      1.00      0.67       206
     neutral       0.00      0.00      0.00       946
    positive       0.34      1.00      0.51       387

    accuracy                           0.38      1539
   macro avg       0.28      0.67      0.39      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3195
Precision: 0.1267
Recall: 0.3195
F1 Score: 0.1789

Classification Report:
               precision    recall  f1-score   support

    negative       0.37      0.65      0.48        52
     neutral       0.00      0.00      0.00       236
    positive       0.30      0.92      0.46        97

    accuracy                           0.32       385
   macro avg       0.23      0.52      0.31       385
weighted avg       0.13      0.32      0.18       385


Basic Model:



Training Epoch 1/10: 100%|██████████| 49/49 [00:07<00:00,  6.35it/s]


NaN detected in loss computation.


Validation Epoch 1/10: 100%|██████████| 13/13 [00:00<00:00, 27.16it/s]


Epoch 1/10 - Train Loss: 42.3478, Val Loss: 10.2530


Training Epoch 2/10: 100%|██████████| 49/49 [00:13<00:00,  3.54it/s]


NaN detected in loss computation.


Validation Epoch 2/10: 100%|██████████| 13/13 [00:01<00:00, 12.36it/s]


NaN detected in validation loss computation.
Epoch 2/10 - Train Loss: 31.6065, Val Loss: 8.1686


Training Epoch 3/10: 100%|██████████| 49/49 [00:11<00:00,  4.31it/s]
Validation Epoch 3/10: 100%|██████████| 13/13 [00:00<00:00, 27.14it/s]


NaN detected in validation loss computation.
Epoch 3/10 - Train Loss: 25.0412, Val Loss: 7.1127


Training Epoch 4/10: 100%|██████████| 49/49 [00:09<00:00,  5.09it/s]
Validation Epoch 4/10: 100%|██████████| 13/13 [00:00<00:00, 14.56it/s]


NaN detected in validation loss computation.
Epoch 4/10 - Train Loss: 17.3616, Val Loss: 5.9120


Training Epoch 5/10: 100%|██████████| 49/49 [00:16<00:00,  2.91it/s]
Validation Epoch 5/10: 100%|██████████| 13/13 [00:00<00:00, 13.67it/s]


NaN detected in validation loss computation.
Epoch 5/10 - Train Loss: 8.2553, Val Loss: 4.9267


Training Epoch 6/10: 100%|██████████| 49/49 [00:16<00:00,  3.00it/s]
Validation Epoch 6/10: 100%|██████████| 13/13 [00:00<00:00, 14.72it/s]


Epoch 6/10 - Train Loss: 3.4185, Val Loss: 6.5730
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3840
Precision: 0.1480
Recall: 0.3840
F1 Score: 0.2136

Classification Report:
               precision    recall  f1-score   support

    negative       0.39      1.00      0.56       206
     neutral       0.00      0.00      0.00       946
    positive       0.38      0.99      0.55       387

    accuracy                           0.38      1539
   macro avg       0.26      0.66      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3247
Precision: 0.1248
Recall: 0.3247
F1 Score: 0.1801

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      0.65      0.39        52
     neutral       0.00      0.00      0.00       236
    positive       0.34      0.94      0.50        97

    accuracy                           0.32       385
   macro avg       0.21      0.53 

(0.3246753246753247,
 0.1247978605416622,
 0.3246753246753247,
 0.18011000896050458)

Trying different hidden sizes, 64 and 128

In [None]:
model_run += 1

hidden_size = 64 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20 # Increasing epochs just in case
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)


model_run += 1

hidden_size = 128 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20 # Increasing epochs just in case
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)




Training Epoch 1/20: 100%|██████████| 49/49 [00:16<00:00,  2.94it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00,  9.14it/s]


Epoch 1/20 - Train Loss: 8.2694, Val Loss: 10.8903


Training Epoch 2/20: 100%|██████████| 49/49 [00:19<00:00,  2.51it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 21.72it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 0.0911, Val Loss: 11.7784
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1546
Recall: 0.3847
F1 Score: 0.2185

Classification Report:
               precision    recall  f1-score   support

    negative       0.52      1.00      0.68       206
     neutral       0.00      0.00      0.00       946
    positive       0.34      1.00      0.51       387

    accuracy                           0.38      1539
   macro avg       0.29      0.67      0.40      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3299
Precision: 0.1319
Recall: 0.3299
F1 Score: 0.1862

Classification Report:
               precision    recall  f1-score   support

    negative       0.41      0.75      0.53        52
     neutral       0.00      0.00      0.00       236
    positive       0.30      0.91      0.45        97

    accuracy                           0.33       385
   macro avg       0.24      0.55      0.33       385
weighted avg       0.13      0.33      0.19       385


Basic Model:



Training Epoch 1/20: 100%|██████████| 49/49 [00:09<00:00,  5.21it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 23.71it/s]


Epoch 1/20 - Train Loss: 7.6358, Val Loss: 10.3715


Training Epoch 2/20: 100%|██████████| 49/49 [00:12<00:00,  3.92it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 15.75it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 1.2271, Val Loss: 10.9748
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3827
Precision: 0.1565
Recall: 0.3827
F1 Score: 0.2198

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      1.00      0.44       206
     neutral       0.00      0.00      0.00       946
    positive       0.47      0.99      0.64       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.36      1539
weighted avg       0.16      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3221
Precision: 0.1319
Recall: 0.3221
F1 Score: 0.1852

Classification Report:
               precision    recall  f1-score   support

    negative       0.24      0.85      0.38        52
     neutral       0.00      0.00      0.00       236
    positive       0.39      0.82      0.53        97

    accuracy                           0.32       385
   macro avg       0.21      0.56      0.30       385
weighted avg       0.13      0.32      0.19       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:11<00:00,  4.11it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00,  9.81it/s]


Epoch 1/20 - Train Loss: 6.5909, Val Loss: 10.2373


Training Epoch 2/20: 100%|██████████| 49/49 [00:19<00:00,  2.46it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 17.14it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 2.4339, Val Loss: 7.5890


Training Epoch 3/20: 100%|██████████| 49/49 [00:15<00:00,  3.26it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00,  9.49it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 0.2911, Val Loss: 13.6894
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3834
Precision: 0.1495
Recall: 0.3834
F1 Score: 0.2146

Classification Report:
               precision    recall  f1-score   support

    negative       0.33      1.00      0.50       206
     neutral       0.00      0.00      0.00       946
    positive       0.42      0.99      0.59       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.36      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation




Accuracy: 0.3351
Precision: 0.1311
Recall: 0.3351
F1 Score: 0.1881

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      0.83      0.42        52
     neutral       0.00      0.00      0.00       236
    positive       0.37      0.89      0.52        97

    accuracy                           0.34       385
   macro avg       0.22      0.57      0.31       385
weighted avg       0.13      0.34      0.19       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:12<00:00,  3.92it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 17.85it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 5.5019, Val Loss: 11.4935


Training Epoch 2/20: 100%|██████████| 49/49 [00:17<00:00,  2.78it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  9.19it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 1.6774, Val Loss: 12.2560
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1482
Recall: 0.3847
F1 Score: 0.2140

Classification Report:
               precision    recall  f1-score   support

    negative       0.38      1.00      0.56       206
     neutral       0.00      0.00      0.00       946
    positive       0.38      1.00      0.56       387

    accuracy                           0.38      1539
   macro avg       0.26      0.67      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3377
Precision: 0.1316
Recall: 0.3377
F1 Score: 0.1893

Classification Report:
               precision    recall  f1-score   support

    negative       0.29      0.81      0.42        52
     neutral       0.00      0.00      0.00       236
    positive       0.37      0.91      0.52        97

    accuracy               

(0.33766233766233766,
 0.13162177995100563,
 0.33766233766233766,
 0.18927306459773993)

81 remains the best but continuing to see if this is improved by a larger hidden size

In [None]:
model_run += 1

hidden_size = 256 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.0001
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



print('\nBasic Model:\n')

model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)


model_run += 1

hidden_size = 512 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.0001
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)




Training Epoch 1/20: 100%|██████████| 49/49 [00:41<00:00,  1.19it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00,  7.46it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 36.0615, Val Loss: 7.5875


Training Epoch 2/20: 100%|██████████| 49/49 [00:33<00:00,  1.47it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  7.51it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 16.7871, Val Loss: 5.6825


Training Epoch 3/20: 100%|██████████| 49/49 [00:33<00:00,  1.47it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00,  7.40it/s]


Epoch 3/20 - Train Loss: 2.1425, Val Loss: 9.3761
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3827
Precision: 0.1520
Recall: 0.3827
F1 Score: 0.2161

Classification Report:
               precision    recall  f1-score   support

    negative       0.49      0.98      0.65       206
     neutral       0.00      0.00      0.00       946
    positive       0.34      1.00      0.51       387

    accuracy                           0.38      1539
   macro avg       0.28      0.66      0.39      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3195
Precision: 0.1266
Recall: 0.3195
F1 Score: 0.1783

Classification Report:
               precision    recall  f1-score   support

    negative       0.37      0.62      0.46        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.94      0.46        97

    accuracy                           0.32       385
   macro avg       0.22      0.52      0.31       385
weighted avg       0.13      0.32      0.18       385


Basic Model:



Training Epoch 1/20: 100%|██████████| 49/49 [00:43<00:00,  1.13it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00,  7.54it/s]


Epoch 1/20 - Train Loss: 38.5844, Val Loss: 7.9565


Training Epoch 2/20: 100%|██████████| 49/49 [00:15<00:00,  3.18it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  8.03it/s]


Epoch 2/20 - Train Loss: 15.7929, Val Loss: 6.8537


Training Epoch 3/20: 100%|██████████| 49/49 [00:22<00:00,  2.23it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00, 12.94it/s]


Epoch 3/20 - Train Loss: 2.9820, Val Loss: 6.7135


Training Epoch 4/20: 100%|██████████| 49/49 [00:17<00:00,  2.82it/s]
Validation Epoch 4/20: 100%|██████████| 13/13 [00:01<00:00,  7.70it/s]


Epoch 4/20 - Train Loss: 1.5175, Val Loss: 10.3093
No improvement! Early stopping.
Train Set Evaluation
Accuracy: 0.3840
Precision: 0.1490
Recall: 0.3840
F1 Score: 0.2144

Classification Report:
               precision    recall  f1-score   support

    negative       0.34      1.00      0.51       206
     neutral       0.00      0.00      0.00       946
    positive       0.41      0.99      0.58       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.36      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation




Accuracy: 0.3065
Precision: 0.1214
Recall: 0.3065
F1 Score: 0.1735

Classification Report:
               precision    recall  f1-score   support

    negative       0.23      0.69      0.35        52
     neutral       0.00      0.00      0.00       236
    positive       0.36      0.85      0.50        97

    accuracy                           0.31       385
   macro avg       0.20      0.51      0.28       385
weighted avg       0.12      0.31      0.17       385



Training Epoch 1/20: 100%|██████████| 49/49 [01:00<00:00,  1.24s/it]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:05<00:00,  2.20it/s]


Epoch 1/20 - Train Loss: 29.7472, Val Loss: 6.7034


Training Epoch 2/20: 100%|██████████| 49/49 [01:05<00:00,  1.34s/it]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:07<00:00,  1.76it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 5.3746, Val Loss: 6.9416
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3834
Precision: 0.1479
Recall: 0.3834
F1 Score: 0.2134

Classification Report:
               precision    recall  f1-score   support

    negative       0.40      1.00      0.57       206
     neutral       0.00      0.00      0.00       946
    positive       0.37      0.99      0.54       387

    accuracy                           0.38      1539
   macro avg       0.26      0.66      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation




Accuracy: 0.3039
Precision: 0.1172
Recall: 0.3039
F1 Score: 0.1689

Classification Report:
               precision    recall  f1-score   support

    negative       0.29      0.65      0.40        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.86      0.46        97

    accuracy                           0.30       385
   macro avg       0.20      0.50      0.29       385
weighted avg       0.12      0.30      0.17       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:55<00:00,  1.12s/it]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:05<00:00,  2.34it/s]


Epoch 1/20 - Train Loss: 31.1715, Val Loss: 6.3967


Training Epoch 2/20: 100%|██████████| 49/49 [01:08<00:00,  1.40s/it]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:05<00:00,  2.31it/s]


Epoch 2/20 - Train Loss: 4.8159, Val Loss: 12.9506
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3827
Precision: 0.1476
Recall: 0.3827
F1 Score: 0.2130

Classification Report:
               precision    recall  f1-score   support

    negative       0.37      0.99      0.54       206
     neutral       0.00      0.00      0.00       946
    positive       0.39      0.99      0.56       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3065
Precision: 0.1187
Recall: 0.3065
F1 Score: 0.1712

Classification Report:
               precision    recall  f1-score   support

    negative       0.26      0.67      0.37        52
     neutral       0.00      0.00      0.00       236
    positive       0.33      0.86      0.48        97

    accuracy                           0.31       385
   macro avg      

(0.3064935064935065,
 0.11874204227145402,
 0.3064935064935065,
 0.17116694644673983)

trying smaller

In [None]:
model_run += 1

hidden_size = 32 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.0001
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

Training Epoch 1/20: 100%|██████████| 49/49 [00:05<00:00,  8.58it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 69.02it/s]


Epoch 1/20 - Train Loss: 47.7896, Val Loss: 11.9531


Training Epoch 2/20: 100%|██████████| 49/49 [00:07<00:00,  6.30it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 25.87it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 41.0623, Val Loss: 9.8857


Training Epoch 3/20: 100%|██████████| 49/49 [00:12<00:00,  3.90it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 22.69it/s]


Epoch 3/20 - Train Loss: 36.5482, Val Loss: 9.7064


Training Epoch 4/20: 100%|██████████| 49/49 [00:11<00:00,  4.11it/s]
Validation Epoch 4/20: 100%|██████████| 13/13 [00:00<00:00, 23.12it/s]


Epoch 4/20 - Train Loss: 33.5614, Val Loss: 9.2576


Training Epoch 5/20: 100%|██████████| 49/49 [00:12<00:00,  3.99it/s]
Validation Epoch 5/20: 100%|██████████| 13/13 [00:00<00:00, 40.12it/s]


Epoch 5/20 - Train Loss: 30.9195, Val Loss: 8.8047


Training Epoch 6/20: 100%|██████████| 49/49 [00:06<00:00,  7.37it/s]
Validation Epoch 6/20: 100%|██████████| 13/13 [00:00<00:00, 20.07it/s]


NaN detected in validation loss computation.
Epoch 6/20 - Train Loss: 28.5755, Val Loss: 7.8417


Training Epoch 7/20: 100%|██████████| 49/49 [00:11<00:00,  4.35it/s]


NaN detected in loss computation.


Validation Epoch 7/20: 100%|██████████| 13/13 [00:00<00:00, 26.54it/s]


NaN detected in validation loss computation.
Epoch 7/20 - Train Loss: 24.8712, Val Loss: 7.2356


Training Epoch 8/20: 100%|██████████| 49/49 [00:07<00:00,  6.15it/s]
Validation Epoch 8/20: 100%|██████████| 13/13 [00:00<00:00, 66.56it/s]


NaN detected in validation loss computation.
Epoch 8/20 - Train Loss: 21.4773, Val Loss: 6.8237


Training Epoch 9/20: 100%|██████████| 49/49 [00:03<00:00, 13.15it/s]
Validation Epoch 9/20: 100%|██████████| 13/13 [00:00<00:00, 58.40it/s]


Epoch 9/20 - Train Loss: 17.9423, Val Loss: 7.1280
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation




Accuracy: 0.3710
Precision: 0.1761
Recall: 0.3710
F1 Score: 0.2253

Classification Report:
               precision    recall  f1-score   support

    negative       0.76      0.90      0.82       206
     neutral       0.00      0.00      0.00       946
    positive       0.30      1.00      0.46       387

    accuracy                           0.37      1539
   macro avg       0.35      0.63      0.43      1539
weighted avg       0.18      0.37      0.23      1539

Validation Set Evaluation
Accuracy: 0.3039
Precision: 0.1497
Recall: 0.3039
F1 Score: 0.1777

Classification Report:
               precision    recall  f1-score   support

    negative       0.61      0.48      0.54        52
     neutral       0.00      0.00      0.00       236
    positive       0.27      0.95      0.42        97

    accuracy                           0.30       385
   macro avg       0.29      0.48      0.32       385
weighted avg       0.15      0.30      0.18       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:03<00:00, 14.79it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 71.79it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 46.4158, Val Loss: 10.9268


Training Epoch 2/20: 100%|██████████| 49/49 [00:04<00:00, 11.09it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 46.73it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 40.0825, Val Loss: 9.4747


Training Epoch 3/20: 100%|██████████| 49/49 [00:05<00:00,  9.78it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 49.72it/s]


Epoch 3/20 - Train Loss: 33.9909, Val Loss: 9.3166


Training Epoch 4/20: 100%|██████████| 49/49 [00:05<00:00,  9.40it/s]


NaN detected in loss computation.


Validation Epoch 4/20: 100%|██████████| 13/13 [00:00<00:00, 45.76it/s]


NaN detected in validation loss computation.
Epoch 4/20 - Train Loss: 30.2000, Val Loss: 8.0936


Training Epoch 5/20: 100%|██████████| 49/49 [00:05<00:00,  9.21it/s]
Validation Epoch 5/20: 100%|██████████| 13/13 [00:00<00:00, 47.97it/s]


Epoch 5/20 - Train Loss: 26.5938, Val Loss: 8.4997
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.2930
Precision: 0.1778
Recall: 0.2930
F1 Score: 0.1658

Classification Report:
               precision    recall  f1-score   support

    negative       0.83      0.31      0.45       206
     neutral       0.00      0.00      0.00       946
    positive       0.26      1.00      0.42       387

    accuracy                           0.29      1539
   macro avg       0.37      0.44      0.29      1539
weighted avg       0.18      0.29      0.17      1539

Validation Set Evaluation
Accuracy: 0.2623
Precision: 0.1486
Recall: 0.2623
F1 Score: 0.1246

Classification Report:
               precision    recall  f1-score   support

    negative       0.62      0.10      0.17        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      0.99      0.41        97

    accuracy                           0.26       385
   macro avg      

(0.2623376623376623,
 0.14857211753763477,
 0.2623376623376623,
 0.12456572962902077)

for giggles 1024 and 16

In [None]:
model_run += 1

hidden_size = 1024  # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.0001
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

model_run += 1

hidden_size = 16 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.0001
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

Training Epoch 1/20: 100%|██████████| 49/49 [04:52<00:00,  5.97s/it]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:20<00:00,  1.58s/it]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 22.1783, Val Loss: 11.8317


Training Epoch 2/20: 100%|██████████| 49/49 [04:26<00:00,  5.44s/it]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:16<00:00,  1.27s/it]

NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 2.7747, Val Loss: 14.2394
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation





Accuracy: 0.3814
Precision: 0.1526
Recall: 0.3814
F1 Score: 0.2161

Classification Report:
               precision    recall  f1-score   support

    negative       0.51      0.98      0.67       206
     neutral       0.00      0.00      0.00       946
    positive       0.34      0.99      0.50       387

    accuracy                           0.38      1539
   macro avg       0.28      0.66      0.39      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation




Accuracy: 0.3247
Precision: 0.1275
Recall: 0.3247
F1 Score: 0.1804

Classification Report:
               precision    recall  f1-score   support

    negative       0.36      0.62      0.45        52
     neutral       0.00      0.00      0.00       236
    positive       0.32      0.96      0.47        97

    accuracy                           0.32       385
   macro avg       0.22      0.52      0.31       385
weighted avg       0.13      0.32      0.18       385



Training Epoch 1/20: 100%|██████████| 49/49 [03:58<00:00,  4.87s/it]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:21<00:00,  1.64s/it]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 24.2684, Val Loss: 11.8751


Training Epoch 2/20: 100%|██████████| 49/49 [03:40<00:00,  4.51s/it]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it]


Epoch 2/20 - Train Loss: 5.1148, Val Loss: 14.0110
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3730
Precision: 0.1495
Recall: 0.3730
F1 Score: 0.2111

Classification Report:
               precision    recall  f1-score   support

    negative       0.49      0.91      0.64       206
     neutral       0.00      0.00      0.00       946
    positive       0.33      1.00      0.50       387

    accuracy                           0.37      1539
   macro avg       0.27      0.64      0.38      1539
weighted avg       0.15      0.37      0.21      1539

Validation Set Evaluation




Accuracy: 0.3169
Precision: 0.1251
Recall: 0.3169
F1 Score: 0.1756

Classification Report:
               precision    recall  f1-score   support

    negative       0.35      0.56      0.43        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.96      0.47        97

    accuracy                           0.32       385
   macro avg       0.22      0.51      0.30       385
weighted avg       0.13      0.32      0.18       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:05<00:00,  9.19it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 47.27it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 54.0508, Val Loss: 12.9670


Training Epoch 2/20: 100%|██████████| 49/49 [00:05<00:00,  9.06it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 47.95it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 50.3348, Val Loss: 12.3683


Training Epoch 3/20: 100%|██████████| 49/49 [00:05<00:00,  8.37it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 38.93it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 47.8911, Val Loss: 11.8474


Training Epoch 4/20: 100%|██████████| 49/49 [00:05<00:00,  8.19it/s]
Validation Epoch 4/20: 100%|██████████| 13/13 [00:00<00:00, 52.84it/s]


Epoch 4/20 - Train Loss: 46.1826, Val Loss: 12.2163
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2684
Precision: 0.6020
Recall: 0.2684
F1 Score: 0.1338

Classification Report:
               precision    recall  f1-score   support

    negative       0.57      0.12      0.20       206
     neutral       0.75      0.00      0.01       946
    positive       0.26      0.99      0.41       387

    accuracy                           0.27      1539
   macro avg       0.53      0.37      0.21      1539
weighted avg       0.60      0.27      0.13      1539

Validation Set Evaluation




Accuracy: 0.2649
Precision: 0.7672
Recall: 0.2649
F1 Score: 0.1310

Classification Report:
               precision    recall  f1-score   support

    negative       0.67      0.08      0.14        52
     neutral       1.00      0.01      0.02       236
    positive       0.25      0.99      0.41        97

    accuracy                           0.26       385
   macro avg       0.64      0.36      0.19       385
weighted avg       0.77      0.26      0.13       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:05<00:00,  9.43it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 96.58it/s]


Epoch 1/20 - Train Loss: 45.7577, Val Loss: 12.0677


Training Epoch 2/20: 100%|██████████| 49/49 [00:03<00:00, 16.33it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 95.10it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 41.7105, Val Loss: 9.8955


Training Epoch 3/20: 100%|██████████| 49/49 [00:03<00:00, 16.18it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 89.91it/s]


Epoch 3/20 - Train Loss: 36.6168, Val Loss: 9.9874
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3112
Precision: 0.1251
Recall: 0.3112
F1 Score: 0.1718

Classification Report:
               precision    recall  f1-score   support

    negative       0.38      0.50      0.43       206
     neutral       0.00      0.00      0.00       946
    positive       0.30      0.97      0.46       387

    accuracy                           0.31      1539
   macro avg       0.22      0.49      0.29      1539
weighted avg       0.13      0.31      0.17      1539

Validation Set Evaluation
Accuracy: 0.2571
Precision: 0.0951
Recall: 0.2571
F1 Score: 0.1304

Classification Report:
               precision    recall  f1-score   support

    negative       0.21      0.19      0.20        52
     neutral       0.00      0.00      0.00       236
    positive       0.26      0.92      0.41        97

    accuracy                           0.26       385
   macro avg      

(0.2571428571428571,
 0.0950785711833175,
 0.2571428571428571,
 0.13038182994295847)

Trying the earlier best params. Tweaking gru layers. This also introduces dropout.

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

Training Epoch 1/20: 100%|██████████| 49/49 [00:31<00:00,  1.56it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 12.09it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 17.8893, Val Loss: 8.4266


Training Epoch 2/20: 100%|██████████| 49/49 [00:29<00:00,  1.68it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00, 12.19it/s]


Epoch 2/20 - Train Loss: 2.1139, Val Loss: 9.7842
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1588
Recall: 0.3847
F1 Score: 0.2213

Classification Report:
               precision    recall  f1-score   support

    negative       0.57      1.00      0.73       206
     neutral       0.00      0.00      0.00       946
    positive       0.33      1.00      0.49       387

    accuracy                           0.38      1539
   macro avg       0.30      0.67      0.41      1539
weighted avg       0.16      0.38      0.22      1539

Validation Set Evaluation
Accuracy: 0.3273
Precision: 0.1355
Recall: 0.3273
F1 Score: 0.1870

Classification Report:
               precision    recall  f1-score   support

    negative       0.46      0.71      0.56        52
     neutral       0.00      0.00      0.00       236
    positive       0.29      0.92      0.44        97

    accuracy                           0.33       385
   macro avg     

Training Epoch 1/20: 100%|██████████| 49/49 [00:26<00:00,  1.87it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 12.44it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 6.9517, Val Loss: 11.1758


Training Epoch 2/20: 100%|██████████| 49/49 [00:21<00:00,  2.23it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00, 12.47it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 1.9291, Val Loss: 17.0211
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1523
Recall: 0.3847
F1 Score: 0.2169

Classification Report:
               precision    recall  f1-score   support

    negative       0.49      1.00      0.65       206
     neutral       0.00      0.00      0.00       946
    positive       0.35      1.00      0.51       387

    accuracy                           0.38      1539
   macro avg       0.28      0.67      0.39      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation
Accuracy: 0.3247
Precision: 0.1266
Recall: 0.3247
F1 Score: 0.1805

Classification Report:
               precision    recall  f1-score   support

    negative       0.34      0.65      0.45        52
     neutral       0.00      0.00      0.00       236
    positive       0.32      0.94      0.48        97

    accuracy               

(0.3246753246753247,
 0.12655122655122655,
 0.3246753246753247,
 0.18054860602886816)

Trying 2 layers with variations on dropout

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.25
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)



model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)





model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

Training Epoch 1/20: 100%|██████████| 49/49 [00:28<00:00,  1.72it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:02<00:00,  6.45it/s]


Epoch 1/20 - Train Loss: 9.0741, Val Loss: 12.3080


Training Epoch 2/20: 100%|██████████| 49/49 [00:27<00:00,  1.79it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:02<00:00,  5.68it/s]


Epoch 2/20 - Train Loss: 0.3536, Val Loss: 13.8619
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3814
Precision: 0.1682
Recall: 0.3814
F1 Score: 0.2280

Classification Report:
               precision    recall  f1-score   support

    negative       0.25      1.00      0.40       206
     neutral       0.00      0.00      0.00       946
    positive       0.54      0.98      0.69       387

    accuracy                           0.38      1539
   macro avg       0.26      0.66      0.36      1539
weighted avg       0.17      0.38      0.23      1539

Validation Set Evaluation
Accuracy: 0.3247
Precision: 0.1537
Recall: 0.3247
F1 Score: 0.2005

Classification Report:
               precision    recall  f1-score   support

    negative       0.21      0.90      0.33        52
     neutral       0.00      0.00      0.00       236
    positive       0.50      0.80      0.62        97

    accuracy                           0.32       385
   macro avg    

Training Epoch 1/20: 100%|██████████| 49/49 [00:17<00:00,  2.87it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 11.40it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 9.3918, Val Loss: 9.6339


Training Epoch 2/20: 100%|██████████| 49/49 [00:18<00:00,  2.69it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00, 10.83it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 1.5090, Val Loss: 18.8376
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3840
Precision: 0.1654
Recall: 0.3840
F1 Score: 0.2251

Classification Report:
               precision    recall  f1-score   support

    negative       0.64      0.99      0.78       206
     neutral       0.00      0.00      0.00       946
    positive       0.32      1.00      0.48       387

    accuracy                           0.38      1539
   macro avg       0.32      0.66      0.42      1539
weighted avg       0.17      0.38      0.23      1539

Validation Set Evaluation
Accuracy: 0.3143
Precision: 0.1340
Recall: 0.3143
F1 Score: 0.1792

Classification Report:
               precision    recall  f1-score   support

    negative       0.46      0.58      0.51        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.94      0.44        97

    accuracy               

Training Epoch 1/20: 100%|██████████| 49/49 [00:13<00:00,  3.54it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 19.02it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 15.8742, Val Loss: 14.4705


Training Epoch 2/20: 100%|██████████| 49/49 [00:18<00:00,  2.71it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00, 11.76it/s]


Epoch 2/20 - Train Loss: 4.3737, Val Loss: 19.8359
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3814
Precision: 0.1695
Recall: 0.3814
F1 Score: 0.2265

Classification Report:
               precision    recall  f1-score   support

    negative       0.68      0.97      0.80       206
     neutral       0.00      0.00      0.00       946
    positive       0.31      1.00      0.47       387

    accuracy                           0.38      1539
   macro avg       0.33      0.66      0.43      1539
weighted avg       0.17      0.38      0.23      1539

Validation Set Evaluation
Accuracy: 0.3273
Precision: 0.1561
Recall: 0.3273
F1 Score: 0.1947

Classification Report:
               precision    recall  f1-score   support

    negative       0.63      0.63      0.63        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.96      0.43        97

    accuracy                           0.33       385
   macro avg    

Training Epoch 1/20: 100%|██████████| 49/49 [00:12<00:00,  4.07it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 12.12it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 6.6092, Val Loss: 10.4607


Training Epoch 2/20: 100%|██████████| 49/49 [00:16<00:00,  2.97it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 20.04it/s]


Epoch 2/20 - Train Loss: 1.6112, Val Loss: 10.9185
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1606
Recall: 0.3847
F1 Score: 0.2224

Classification Report:
               precision    recall  f1-score   support

    negative       0.59      1.00      0.74       206
     neutral       0.00      0.00      0.00       946
    positive       0.32      1.00      0.49       387

    accuracy                           0.38      1539
   macro avg       0.30      0.67      0.41      1539
weighted avg       0.16      0.38      0.22      1539

Validation Set Evaluation
Accuracy: 0.3377
Precision: 0.1494
Recall: 0.3377
F1 Score: 0.1992

Classification Report:
               precision    recall  f1-score   support

    negative       0.58      0.79      0.67        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.92      0.43        97

    accuracy                           0.34       385
   macro avg      

Training Epoch 1/20: 100%|██████████| 49/49 [00:14<00:00,  3.37it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 11.57it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 16.4302, Val Loss: 14.1702


Training Epoch 2/20: 100%|██████████| 49/49 [00:15<00:00,  3.11it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 18.49it/s]


Epoch 2/20 - Train Loss: 1.6925, Val Loss: 14.8668
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3853
Precision: 0.1631
Recall: 0.3853
F1 Score: 0.2243

Classification Report:
               precision    recall  f1-score   support

    negative       0.61      1.00      0.76       206
     neutral       0.00      0.00      0.00       946
    positive       0.32      1.00      0.49       387

    accuracy                           0.39      1539
   macro avg       0.31      0.67      0.42      1539
weighted avg       0.16      0.39      0.22      1539

Validation Set Evaluation
Accuracy: 0.3325
Precision: 0.1421
Recall: 0.3325
F1 Score: 0.1932

Classification Report:
               precision    recall  f1-score   support

    negative       0.52      0.77      0.62        52
     neutral       0.00      0.00      0.00       236
    positive       0.29      0.91      0.43        97

    accuracy                           0.33       385
   macro avg    

Training Epoch 1/20: 100%|██████████| 49/49 [00:15<00:00,  3.09it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 11.84it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 6.4566, Val Loss: 18.0799


Training Epoch 2/20: 100%|██████████| 49/49 [00:13<00:00,  3.75it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 20.02it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 2.7438, Val Loss: 15.1160


Training Epoch 3/20: 100%|██████████| 49/49 [00:15<00:00,  3.17it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00, 12.25it/s]


Epoch 3/20 - Train Loss: 0.5194, Val Loss: 18.6197
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3853
Precision: 0.1503
Recall: 0.3853
F1 Score: 0.2157

Classification Report:
               precision    recall  f1-score   support

    negative       0.45      1.00      0.62       206
     neutral       0.00      0.00      0.00       946
    positive       0.36      1.00      0.53       387

    accuracy                           0.39      1539
   macro avg       0.27      0.67      0.38      1539
weighted avg       0.15      0.39      0.22      1539

Validation Set Evaluation
Accuracy: 0.3221
Precision: 0.1259
Recall: 0.3221
F1 Score: 0.1801

Classification Report:
               precision    recall  f1-score   support

    negative       0.36      0.73      0.48        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.89      0.46        97

    accuracy                           0.32       385
   macro avg      

(0.3220779220779221,
 0.12590775055644282,
 0.3220779220779221,
 0.1801195131912113)

Trying different values for encoder decoder length ie 1 or 2 for encoder or decoder.

In [None]:
model_run += 1


hidden_size = 81 # max seq length.
#gru_layers = 2 # 1 to start
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers=2, dropout=dropout)
decoder = DecoderGRU(hidden_size, gru_layers=1, dropout=dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

model_run += 1

hidden_size = 81 # max seq length.
#gru_layers = 2 # 1 to start
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers=1, dropout=dropout)
decoder = DecoderGRU(hidden_size, gru_layers=2, dropout=dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

Training Epoch 1/20: 100%|██████████| 49/49 [00:36<00:00,  1.34it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:02<00:00,  6.04it/s]


Epoch 1/20 - Train Loss: 6.3307, Val Loss: 13.0189


Training Epoch 2/20: 100%|██████████| 49/49 [00:24<00:00,  2.02it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:02<00:00,  5.81it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 0.9260, Val Loss: 14.4461
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3853
Precision: 0.1512
Recall: 0.3853
F1 Score: 0.2163

Classification Report:
               precision    recall  f1-score   support

    negative       0.47      1.00      0.64       206
     neutral       0.00      0.00      0.00       946
    positive       0.35      1.00      0.52       387

    accuracy                           0.39      1539
   macro avg       0.27      0.67      0.39      1539
weighted avg       0.15      0.39      0.22      1539

Validation Set Evaluation




Accuracy: 0.3273
Precision: 0.1284
Recall: 0.3273
F1 Score: 0.1828

Classification Report:
               precision    recall  f1-score   support

    negative       0.36      0.69      0.48        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.93      0.47        97

    accuracy                           0.33       385
   macro avg       0.23      0.54      0.32       385
weighted avg       0.13      0.33      0.18       385



Training Epoch 1/20: 100%|██████████| 49/49 [00:08<00:00,  6.07it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 37.80it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 15.6699, Val Loss: 13.6562


Training Epoch 2/20: 100%|██████████| 49/49 [00:06<00:00,  7.88it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 37.80it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 2.5242, Val Loss: 17.2715
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1485
Recall: 0.3847
F1 Score: 0.2142

Classification Report:
               precision    recall  f1-score   support

    negative       0.41      1.00      0.58       206
     neutral       0.00      0.00      0.00       946
    positive       0.37      1.00      0.54       387

    accuracy                           0.38      1539
   macro avg       0.26      0.67      0.37      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3169
Precision: 0.1235
Recall: 0.3169
F1 Score: 0.1771

Classification Report:
               precision    recall  f1-score   support

    negative       0.34      0.73      0.47        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.87      0.45        97

    accuracy             

(0.3168831168831169,
 0.12347799501084174,
 0.3168831168831169,
 0.17706474564907254)

Still no real improvement. Try more layers 3, 4, and 5.

In [None]:
model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 3
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)


model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)


model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 4
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)


model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)


model_run += 1

hidden_size = 81 # max seq length.
gru_layers = 5
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)


model_run += 1

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

Training Epoch 1/20: 100%|██████████| 49/49 [00:55<00:00,  1.13s/it]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:03<00:00,  4.00it/s]


Epoch 1/20 - Train Loss: 36.0956, Val Loss: 8.6523


Training Epoch 2/20: 100%|██████████| 49/49 [00:36<00:00,  1.36it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  7.43it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 36.7786, Val Loss: 7.7489


Training Epoch 3/20: 100%|██████████| 49/49 [00:22<00:00,  2.18it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00,  7.82it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 35.8293, Val Loss: 7.7280


Training Epoch 4/20: 100%|██████████| 49/49 [00:24<00:00,  2.02it/s]


NaN detected in loss computation.


Validation Epoch 4/20: 100%|██████████| 13/13 [00:01<00:00, 12.25it/s]


NaN detected in validation loss computation.
Epoch 4/20 - Train Loss: 34.5132, Val Loss: 8.5699
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.1371
Precision: 0.0694
Recall: 0.1371
F1 Score: 0.0433

Classification Report:
               precision    recall  f1-score   support

    negative       0.13      0.98      0.24       206
     neutral       0.00      0.00      0.00       946
    positive       0.20      0.03      0.05       387

    accuracy                           0.14      1539
   macro avg       0.11      0.33      0.09      1539
weighted avg       0.07      0.14      0.04      1539

Validation Set Evaluation
Accuracy: 0.1377
Precision: 0.0763
Recall: 0.1377
F1 Score: 0.0456

Classification Report:
               precision    recall  f1-score   support

    negative       0.13      0.96      0.24        52
     neutral       0.00      0.00      0.00       236
    positive       0.23      0.03      0.05        97

    accuracy             

Training Epoch 1/20: 100%|██████████| 49/49 [00:21<00:00,  2.24it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 13.10it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 7.0667, Val Loss: 11.8637


Training Epoch 2/20: 100%|██████████| 49/49 [00:23<00:00,  2.05it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 13.26it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 0.1750, Val Loss: 20.6192
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3853
Precision: 0.1614
Recall: 0.3853
F1 Score: 0.2232

Classification Report:
               precision    recall  f1-score   support

    negative       0.60      1.00      0.75       206
     neutral       0.00      0.00      0.00       946
    positive       0.32      1.00      0.49       387

    accuracy                           0.39      1539
   macro avg       0.31      0.67      0.41      1539
weighted avg       0.16      0.39      0.22      1539

Validation Set Evaluation
Accuracy: 0.3169
Precision: 0.1322
Recall: 0.3169
F1 Score: 0.1810

Classification Report:
               precision    recall  f1-score   support

    negative       0.45      0.65      0.53        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.91      0.43        97

    accuracy               

Training Epoch 1/20: 100%|██████████| 49/49 [00:31<00:00,  1.54it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00,  9.12it/s]


Epoch 1/20 - Train Loss: 36.5346, Val Loss: 8.8260


Training Epoch 2/20: 100%|██████████| 49/49 [00:59<00:00,  1.22s/it]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:02<00:00,  5.91it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 33.2827, Val Loss: 7.9032


Training Epoch 3/20: 100%|██████████| 49/49 [00:34<00:00,  1.43it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:02<00:00,  5.08it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 35.1875, Val Loss: 7.8970


Training Epoch 4/20: 100%|██████████| 49/49 [00:32<00:00,  1.50it/s]


NaN detected in loss computation.


Validation Epoch 4/20: 100%|██████████| 13/13 [00:01<00:00,  8.89it/s]


NaN detected in validation loss computation.
Epoch 4/20 - Train Loss: 34.9865, Val Loss: 7.7771


Training Epoch 5/20: 100%|██████████| 49/49 [00:32<00:00,  1.49it/s]
Validation Epoch 5/20: 100%|██████████| 13/13 [00:02<00:00,  5.79it/s]


Epoch 5/20 - Train Loss: 33.6536, Val Loss: 9.2886
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2515
Precision: 0.0632
Recall: 0.2515
F1 Score: 0.1011

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       206
     neutral       0.00      0.00      0.00       946
    positive       0.25      1.00      0.40       387

    accuracy                           0.25      1539
   macro avg       0.08      0.33      0.13      1539
weighted avg       0.06      0.25      0.10      1539

Validation Set Evaluation
Accuracy: 0.2519
Precision: 0.0635
Recall: 0.2519
F1 Score: 0.1014

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      1.00      0.40        97

    accuracy                           0.25       385
   macro avg    

Training Epoch 1/20: 100%|██████████| 49/49 [00:32<00:00,  1.50it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:04<00:00,  2.97it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 7.3407, Val Loss: 14.9110


Training Epoch 2/20: 100%|██████████| 49/49 [00:51<00:00,  1.04s/it]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:04<00:00,  2.82it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 2.6022, Val Loss: 20.4313
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3827
Precision: 0.1712
Recall: 0.3827
F1 Score: 0.2280

Classification Report:
               precision    recall  f1-score   support

    negative       0.70      0.98      0.81       206
     neutral       0.00      0.00      0.00       946
    positive       0.31      1.00      0.47       387

    accuracy                           0.38      1539
   macro avg       0.34      0.66      0.43      1539
weighted avg       0.17      0.38      0.23      1539

Validation Set Evaluation
Accuracy: 0.3195
Precision: 0.1441
Recall: 0.3195
F1 Score: 0.1871

Classification Report:
               precision    recall  f1-score   support

    negative       0.55      0.63      0.59        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.93      0.43        97

    accuracy               

Training Epoch 1/20: 100%|██████████| 49/49 [01:05<00:00,  1.33s/it]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:02<00:00,  4.68it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 37.8154, Val Loss: 7.7955


Training Epoch 2/20: 100%|██████████| 49/49 [00:47<00:00,  1.03it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:02<00:00,  4.52it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 36.3596, Val Loss: 7.7323


Training Epoch 3/20: 100%|██████████| 49/49 [00:47<00:00,  1.03it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00,  7.45it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 35.9107, Val Loss: 8.5541
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2515
Precision: 0.0632
Recall: 0.2515
F1 Score: 0.1011

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       206
     neutral       0.00      0.00      0.00       946
    positive       0.25      1.00      0.40       387

    accuracy                           0.25      1539
   macro avg       0.08      0.33      0.13      1539
weighted avg       0.06      0.25      0.10      1539

Validation Set Evaluation
Accuracy: 0.2519
Precision: 0.0635
Recall: 0.2519
F1 Score: 0.1014

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      1.00      0.40        97

    accuracy             

Training Epoch 1/20: 100%|██████████| 49/49 [00:38<00:00,  1.28it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:02<00:00,  4.63it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 8.1548, Val Loss: 28.9997


Training Epoch 2/20: 100%|██████████| 49/49 [00:37<00:00,  1.31it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  7.69it/s]


Epoch 2/20 - Train Loss: 3.3715, Val Loss: 15.3384


Training Epoch 3/20: 100%|██████████| 49/49 [00:35<00:00,  1.39it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00,  7.74it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 4.9358, Val Loss: 15.0878


Training Epoch 4/20: 100%|██████████| 49/49 [00:42<00:00,  1.16it/s]
Validation Epoch 4/20: 100%|██████████| 13/13 [00:02<00:00,  4.73it/s]


NaN detected in validation loss computation.
Epoch 4/20 - Train Loss: 1.0929, Val Loss: 21.2762
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3847
Precision: 0.1573
Recall: 0.3847
F1 Score: 0.2203

Classification Report:
               precision    recall  f1-score   support

    negative       0.55      1.00      0.71       206
     neutral       0.00      0.00      0.00       946
    positive       0.33      1.00      0.50       387

    accuracy                           0.38      1539
   macro avg       0.29      0.67      0.40      1539
weighted avg       0.16      0.38      0.22      1539

Validation Set Evaluation
Accuracy: 0.3091
Precision: 0.1285
Recall: 0.3091
F1 Score: 0.1750

Classification Report:
               precision    recall  f1-score   support

    negative       0.42      0.58      0.49        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.92      0.43        97

    accuracy               

(0.3090909090909091,
 0.12848171833549066,
 0.3090909090909091,
 0.17500152212632183)

We see clearly that more layers are not helping and sometimes strongly hurting. The GRU2GRU model becomes worthless as it starts predicting everything as either positive or negative. Unfortunately it seems our model is simply unable to predict neutral values outside of the occasional random few that aren't sorted into negative or positive. We also run into the question of what thebest model is when the performance is so off.

We will be loading the best GRU2GRU and Basic model and evaluating the test set.

From there we will use the best parameters and try different word dim vectors and different agreements

In [None]:
class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model Run 28:\n')
hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
best_g2g_model = GRU2GRU(encoder, decoder)
best_g2g_model.load_state_dict(torch.load('./best_model_run28.pt'))

print("Test Set Evaluation")
evaluate_model(best_g2g_model, test_loader, device, class_names)


print('\nGRU2GRU Model Run 30:\n')
hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
best_g2g_model = GRU2GRU(encoder, decoder)
best_g2g_model.load_state_dict(torch.load('./best_model_run30.pt'))

print("Test Set Evaluation")
evaluate_model(best_g2g_model, test_loader, device, class_names)


print('\nBasic Model Run 5:\n')
hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=10
clip=0.9

best_simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)
best_simple_model.load_state_dict(torch.load('./best_model_run5.pt'))

print("Test Set Evaluation")
evaluate_model(best_simple_model, test_loader, device, class_names)


print('\nBasic Model Run 29:\n')

hidden_size = 81 # max seq length.
gru_layers = 2 # 1 to start
dropout = 0.5
batch_size = 32
lr=0.01
epochs=20
clip=0.9

best_simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)
best_simple_model.load_state_dict(torch.load('./best_model_run29.pt'))

print("Test Set Evaluation")
evaluate_model(best_simple_model, test_loader, device, class_names)


GRU2GRU Model Run 28:

Test Set Evaluation
Accuracy: 0.3059
Precision: 0.1498
Recall: 0.3059
F1 Score: 0.1924

Classification Report:
               precision    recall  f1-score   support

    negative       0.18      0.84      0.30        45
     neutral       0.00      0.00      0.00       209
    positive       0.50      0.77      0.60        86

    accuracy                           0.31       340
   macro avg       0.23      0.54      0.30       340
weighted avg       0.15      0.31      0.19       340


GRU2GRU Model Run 30:

Test Set Evaluation
Accuracy: 0.3088
Precision: 0.1477
Recall: 0.3088
F1 Score: 0.1809

Classification Report:
               precision    recall  f1-score   support

    negative       0.60      0.53      0.56        45
     neutral       0.00      0.00      0.00       209
    positive       0.27      0.94      0.42        86

    accuracy                           0.31       340
   macro avg       0.29      0.49      0.33       340
weighted avg       0.



Accuracy: 0.3471
Precision: 0.1365
Recall: 0.3471
F1 Score: 0.1944

Classification Report:
               precision    recall  f1-score   support

    negative       0.41      0.80      0.54        45
     neutral       0.00      0.00      0.00       209
    positive       0.33      0.95      0.49        86

    accuracy                           0.35       340
   macro avg       0.24      0.58      0.34       340
weighted avg       0.14      0.35      0.19       340


Basic Model Run 29:

Test Set Evaluation
Accuracy: 0.3147
Precision: 0.1221
Recall: 0.3147
F1 Score: 0.1750

Classification Report:
               precision    recall  f1-score   support

    negative       0.34      0.69      0.45        45
     neutral       0.00      0.00      0.00       209
    positive       0.31      0.88      0.46        86

    accuracy                           0.31       340
   macro avg       0.21      0.52      0.30       340
weighted avg       0.12      0.31      0.18       340



(0.31470588235294117,
 0.12211141819981851,
 0.31470588235294117,
 0.17500790604278776)

In [None]:
print('\nGRU2GRU Model Run 0:\n')
hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=10
clip=0.25

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
best_g2g_model = GRU2GRU(encoder, decoder)
best_g2g_model.load_state_dict(torch.load('./best_model_run0.pt'))

print("Test Set Evaluation")
evaluate_model(best_g2g_model, test_loader, device, class_names)


print('\nGRU2GRU Model Run 4:\n')

hidden_size = 81 # max seq length.
gru_layers = 1 # 1 to start
dropout = 0.1
batch_size = 32
lr=0.01
epochs=10
clip=0.9

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
best_g2g_model = GRU2GRU(encoder, decoder)
best_g2g_model.load_state_dict(torch.load('./best_model_run4.pt'))

print("Test Set Evaluation")
evaluate_model(best_g2g_model, test_loader, device, class_names)




GRU2GRU Model Run 0:

Test Set Evaluation




Accuracy: 0.3324
Precision: 0.1279
Recall: 0.3324
F1 Score: 0.1845

Classification Report:
               precision    recall  f1-score   support

    negative       0.33      0.76      0.46        45
     neutral       0.00      0.00      0.00       209
    positive       0.33      0.92      0.49        86

    accuracy                           0.33       340
   macro avg       0.22      0.56      0.32       340
weighted avg       0.13      0.33      0.18       340


GRU2GRU Model Run 4:

Test Set Evaluation




Accuracy: 0.3412
Precision: 0.1337
Recall: 0.3412
F1 Score: 0.1916

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      0.84      0.42        45
     neutral       0.00      0.00      0.00       209
    positive       0.38      0.91      0.54        86

    accuracy                           0.34       340
   macro avg       0.22      0.58      0.32       340
weighted avg       0.13      0.34      0.19       340



(0.3411764705882353,
 0.13369377162629756,
 0.3411764705882353,
 0.19163851938184304)

We elect to use model 30 and model 5 for the GRU2GRU and basic model respectively as the "best" models for the full run through.

While running the below models we observed a constant issue where every GRU2GRU model would put all the data points as positive. We suspected that was because we had a defined random seed in the test train splits, but even after removing the random_state variable we consistently saw this behavior. We are baffled by it and have no explanation for it.

50 Dim Word Vectors all Agreements

In [None]:
glove_file_path = './glove.6B.50d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_AllAgree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=50)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '50dim_AllAgree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '50dim_AllAgree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 49/49 [00:37<00:00,  1.31it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:02<00:00,  5.54it/s]


Epoch 1/20 - Train Loss: 39.9046, Val Loss: 8.2508


Training Epoch 2/20: 100%|██████████| 49/49 [00:20<00:00,  2.42it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00, 11.38it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 38.0406, Val Loss: 8.8048
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2515
Precision: 0.0632
Recall: 0.2515
F1 Score: 0.1011

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       206
     neutral       0.00      0.00      0.00       946
    positive       0.25      1.00      0.40       387

    accuracy                           0.25      1539
   macro avg       0.08      0.33      0.13      1539
weighted avg       0.06      0.25      0.10      1539

Validation Set Evaluation
Accuracy: 0.2519
Precision: 0.0635
Recall: 0.2519
F1 Score: 0.1014

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      1.00      0.40        97

    accuracy             



Accuracy: 0.2529
Precision: 0.0640
Recall: 0.2529
F1 Score: 0.1021

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        45
     neutral       0.00      0.00      0.00       209
    positive       0.25      1.00      0.40        86

    accuracy                           0.25       340
   macro avg       0.08      0.33      0.13       340
weighted avg       0.06      0.25      0.10       340



Training Epoch 1/20: 100%|██████████| 49/49 [00:05<00:00,  9.34it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 37.98it/s]


Epoch 1/20 - Train Loss: 38.4088, Val Loss: 8.7978


Training Epoch 2/20: 100%|██████████| 49/49 [00:06<00:00,  7.84it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 24.36it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 21.1557, Val Loss: 4.0683


Training Epoch 3/20: 100%|██████████| 49/49 [00:07<00:00,  6.15it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 25.72it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 7.7430, Val Loss: 4.7309
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3717
Precision: 0.1911
Recall: 0.3717
F1 Score: 0.2325

Classification Report:
               precision    recall  f1-score   support

    negative       0.88      0.90      0.89       206
     neutral       0.00      0.00      0.00       946
    positive       0.29      1.00      0.45       387

    accuracy                           0.37      1539
   macro avg       0.39      0.63      0.45      1539
weighted avg       0.19      0.37      0.23      1539

Validation Set Evaluation
Accuracy: 0.3455
Precision: 0.1800
Recall: 0.3455
F1 Score: 0.2129

Classification Report:
               precision    recall  f1-score   support

    negative       0.80      0.71      0.76        52
     neutral       0.00      0.00      0.00       236
    positive       0.28      0.99      0.44        97

    accuracy                

(0.35, 0.18696768703996064, 0.35, 0.21737307020684313)

In [None]:
glove_file_path = './glove.6B.50d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_75Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=50)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '50dim_75Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '50dim_75Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 74/74 [00:24<00:00,  3.04it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:01<00:00, 10.64it/s]


Epoch 1/20 - Train Loss: 56.6719, Val Loss: 12.9030


Training Epoch 2/20: 100%|██████████| 74/74 [00:37<00:00,  1.98it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:01<00:00, 17.65it/s]


Epoch 2/20 - Train Loss: 52.5867, Val Loss: 12.8824


Training Epoch 3/20: 100%|██████████| 74/74 [00:25<00:00,  2.91it/s]
Validation Epoch 3/20: 100%|██████████| 19/19 [00:01<00:00, 15.68it/s]


Epoch 3/20 - Train Loss: 53.3273, Val Loss: 11.9674


Training Epoch 4/20: 100%|██████████| 74/74 [00:22<00:00,  3.23it/s]
Validation Epoch 4/20: 100%|██████████| 19/19 [00:01<00:00, 10.91it/s]


Epoch 4/20 - Train Loss: 51.6564, Val Loss: 11.9001


Training Epoch 5/20: 100%|██████████| 74/74 [00:29<00:00,  2.55it/s]
Validation Epoch 5/20: 100%|██████████| 19/19 [00:01<00:00, 10.71it/s]


Epoch 5/20 - Train Loss: 51.2360, Val Loss: 12.1180
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2568
Precision: 0.0660
Recall: 0.2568
F1 Score: 0.1050

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       286
     neutral       0.00      0.00      0.00      1459
    positive       0.26      1.00      0.41       603

    accuracy                           0.26      2348
   macro avg       0.09      0.33      0.14      2348
weighted avg       0.07      0.26      0.10      2348

Validation Set Evaluation
Accuracy: 0.2572
Precision: 0.0662
Recall: 0.2572
F1 Score: 0.1053

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        71
     neutral       0.00      0.00      0.00       365
    positive       0.26      1.00      0.41       151

    accuracy                           0.26       587
   macro avg   



Accuracy: 0.2568
Precision: 0.0659
Recall: 0.2568
F1 Score: 0.1049

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        63
     neutral       0.00      0.00      0.00       322
    positive       0.26      1.00      0.41       133

    accuracy                           0.26       518
   macro avg       0.09      0.33      0.14       518
weighted avg       0.07      0.26      0.10       518



Training Epoch 1/20: 100%|██████████| 74/74 [00:10<00:00,  7.34it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:00<00:00, 40.52it/s]


Epoch 1/20 - Train Loss: 50.4074, Val Loss: 10.2995


Training Epoch 2/20: 100%|██████████| 74/74 [00:08<00:00,  8.26it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:00<00:00, 24.21it/s]


Epoch 2/20 - Train Loss: 22.1135, Val Loss: 6.3130


Training Epoch 3/20: 100%|██████████| 74/74 [00:12<00:00,  5.95it/s]
Validation Epoch 3/20: 100%|██████████| 19/19 [00:00<00:00, 24.54it/s]


Epoch 3/20 - Train Loss: 9.7614, Val Loss: 5.6009


Training Epoch 4/20: 100%|██████████| 74/74 [00:09<00:00,  7.77it/s]
Validation Epoch 4/20: 100%|██████████| 19/19 [00:00<00:00, 39.42it/s]


Epoch 4/20 - Train Loss: 3.6610, Val Loss: 9.6715
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3722
Precision: 0.1732
Recall: 0.3722
F1 Score: 0.2236

Classification Report:
               precision    recall  f1-score   support

    negative       0.79      0.95      0.86       286
     neutral       0.00      0.00      0.00      1459
    positive       0.30      1.00      0.46       603

    accuracy                           0.37      2348
   macro avg       0.36      0.65      0.44      2348
weighted avg       0.17      0.37      0.22      2348

Validation Set Evaluation
Accuracy: 0.3305
Precision: 0.1569
Recall: 0.3305
F1 Score: 0.1933

Classification Report:
               precision    recall  f1-score   support

    negative       0.69      0.62      0.65        71
     neutral       0.00      0.00      0.00       365
    positive       0.29      0.99      0.45       151

    accuracy                           0.33       587
   macro avg       

(0.3204633204633205,
 0.14961944703324015,
 0.3204633204633205,
 0.1842267470408174)

In [None]:
glove_file_path = './glove.6B.50d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_66Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=50)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '50dim_66Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9



model_run = model_run = '50dim_66Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 90/90 [00:33<00:00,  2.65it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:02<00:00,  9.61it/s]


Epoch 1/20 - Train Loss: 66.5124, Val Loss: 14.7736


Training Epoch 2/20: 100%|██████████| 90/90 [00:33<00:00,  2.67it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:02<00:00,  9.57it/s]


Epoch 2/20 - Train Loss: 65.4279, Val Loss: 14.5354


Training Epoch 3/20: 100%|██████████| 90/90 [00:33<00:00,  2.68it/s]
Validation Epoch 3/20: 100%|██████████| 23/23 [00:01<00:00, 15.40it/s]


Epoch 3/20 - Train Loss: 62.9939, Val Loss: 14.2621


Training Epoch 4/20: 100%|██████████| 90/90 [00:39<00:00,  2.28it/s]
Validation Epoch 4/20: 100%|██████████| 23/23 [00:02<00:00,  9.22it/s]


Epoch 4/20 - Train Loss: 62.8865, Val Loss: 14.1341


Training Epoch 5/20: 100%|██████████| 90/90 [00:32<00:00,  2.73it/s]
Validation Epoch 5/20: 100%|██████████| 23/23 [00:01<00:00, 15.79it/s]


Epoch 5/20 - Train Loss: 62.7566, Val Loss: 15.0996
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2769
Precision: 0.0767
Recall: 0.2769
F1 Score: 0.1201

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       350
     neutral       0.00      0.00      0.00      1723
    positive       0.28      1.00      0.43       794

    accuracy                           0.28      2867
   macro avg       0.09      0.33      0.14      2867
weighted avg       0.08      0.28      0.12      2867

Validation Set Evaluation
Accuracy: 0.2775
Precision: 0.0770
Recall: 0.2775
F1 Score: 0.1206

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        87
     neutral       0.00      0.00      0.00       431
    positive       0.28      1.00      0.43       199

    accuracy                           0.28       717
   macro avg   



Accuracy: 0.2765
Precision: 0.0764
Recall: 0.2765
F1 Score: 0.1198

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        77
     neutral       0.00      0.00      0.00       381
    positive       0.28      1.00      0.43       175

    accuracy                           0.28       633
   macro avg       0.09      0.33      0.14       633
weighted avg       0.08      0.28      0.12       633



Training Epoch 1/20: 100%|██████████| 90/90 [00:17<00:00,  5.18it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:01<00:00, 21.77it/s]


Epoch 1/20 - Train Loss: 55.0059, Val Loss: 10.1934


Training Epoch 2/20: 100%|██████████| 90/90 [00:12<00:00,  7.47it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:00<00:00, 29.64it/s]


Epoch 2/20 - Train Loss: 21.2167, Val Loss: 8.4151


Training Epoch 3/20: 100%|██████████| 90/90 [00:17<00:00,  5.20it/s]
Validation Epoch 3/20: 100%|██████████| 23/23 [00:01<00:00, 21.10it/s]


Epoch 3/20 - Train Loss: 6.2597, Val Loss: 12.5562
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3907
Precision: 0.1712
Recall: 0.3907
F1 Score: 0.2321

Classification Report:
               precision    recall  f1-score   support

    negative       0.64      0.94      0.76       350
     neutral       0.00      0.00      0.00      1723
    positive       0.34      1.00      0.50       794

    accuracy                           0.39      2867
   macro avg       0.33      0.64      0.42      2867
weighted avg       0.17      0.39      0.23      2867

Validation Set Evaluation
Accuracy: 0.3529
Precision: 0.1506
Recall: 0.3529
F1 Score: 0.2052

Classification Report:
               precision    recall  f1-score   support

    negative       0.50      0.70      0.59        87
     neutral       0.00      0.00      0.00       431
    positive       0.32      0.96      0.48       199

    accuracy                           0.35       717
   macro avg      

(0.36018957345971564,
 0.15342491821699455,
 0.36018957345971564,
 0.20961479011680995)

In [None]:
glove_file_path = './glove.6B.50d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_50Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=50)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '50dim_50Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)


hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9



model_run = model_run = '50dim_50Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 103/103 [00:38<00:00,  2.70it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:02<00:00,  9.14it/s]


Epoch 1/20 - Train Loss: 79.4954, Val Loss: 16.7469


Training Epoch 2/20: 100%|██████████| 103/103 [00:41<00:00,  2.51it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:02<00:00,  8.94it/s]


Epoch 2/20 - Train Loss: 71.2206, Val Loss: 16.1373


Training Epoch 3/20: 100%|██████████| 103/103 [00:39<00:00,  2.58it/s]
Validation Epoch 3/20: 100%|██████████| 26/26 [00:02<00:00,  9.17it/s]


Epoch 3/20 - Train Loss: 72.1461, Val Loss: 15.7710


Training Epoch 4/20: 100%|██████████| 103/103 [00:39<00:00,  2.61it/s]
Validation Epoch 4/20: 100%|██████████| 26/26 [00:01<00:00, 15.33it/s]


Epoch 4/20 - Train Loss: 73.9360, Val Loss: 16.4521
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2813
Precision: 0.0791
Recall: 0.2813
F1 Score: 0.1235

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       410
     neutral       0.00      0.00      0.00      1958
    positive       0.28      1.00      0.44       927

    accuracy                           0.28      3295
   macro avg       0.09      0.33      0.15      3295
weighted avg       0.08      0.28      0.12      3295

Validation Set Evaluation
Accuracy: 0.2816
Precision: 0.0793
Recall: 0.2816
F1 Score: 0.1237

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       103
     neutral       0.00      0.00      0.00       489
    positive       0.28      1.00      0.44       232

    accuracy                           0.28       824
   macro avg   



Accuracy: 0.2806
Precision: 0.0787
Recall: 0.2806
F1 Score: 0.1230

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        91
     neutral       0.00      0.00      0.00       432
    positive       0.28      1.00      0.44       204

    accuracy                           0.28       727
   macro avg       0.09      0.33      0.15       727
weighted avg       0.08      0.28      0.12       727



Training Epoch 1/20: 100%|██████████| 103/103 [00:19<00:00,  5.34it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:00<00:00, 35.02it/s]


Epoch 1/20 - Train Loss: 60.1497, Val Loss: 8.2039


Training Epoch 2/20: 100%|██████████| 103/103 [00:15<00:00,  6.79it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:01<00:00, 21.85it/s]


Epoch 2/20 - Train Loss: 19.3747, Val Loss: 5.2617


Training Epoch 3/20: 100%|██████████| 103/103 [00:17<00:00,  5.84it/s]
Validation Epoch 3/20: 100%|██████████| 26/26 [00:00<00:00, 33.56it/s]


Epoch 3/20 - Train Loss: 7.0571, Val Loss: 7.3468
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.4021
Precision: 0.1715
Recall: 0.4021
F1 Score: 0.2376

Classification Report:
               precision    recall  f1-score   support

    negative       0.57      0.98      0.72       410
     neutral       0.00      0.00      0.00      1958
    positive       0.36      1.00      0.52       927

    accuracy                           0.40      3295
   macro avg       0.31      0.66      0.42      3295
weighted avg       0.17      0.40      0.24      3295

Validation Set Evaluation
Accuracy: 0.3653
Precision: 0.1535
Recall: 0.3653
F1 Score: 0.2138

Classification Report:
               precision    recall  f1-score   support

    negative       0.47      0.81      0.59       103
     neutral       0.00      0.00      0.00       489
    positive       0.34      0.94      0.50       232

    accuracy                           0.37       824
   macro avg       

(0.3576341127922971,
 0.1483851244464703,
 0.3576341127922971,
 0.20766919200917963)

### 100 Dim, All Agreements

In [None]:
glove_file_path = './glove.6B.100d.txt'
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_AllAgree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=100)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '100dim_AllAgree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '100dim_AllAgree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 49/49 [00:13<00:00,  3.51it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 13.93it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 40.7989, Val Loss: 7.9966


Training Epoch 2/20: 100%|██████████| 49/49 [00:20<00:00,  2.40it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  9.22it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 36.6721, Val Loss: 7.8807


Training Epoch 3/20: 100%|██████████| 49/49 [00:19<00:00,  2.55it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 17.87it/s]


Epoch 3/20 - Train Loss: 35.3222, Val Loss: 8.7761
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2515
Precision: 0.0632
Recall: 0.2515
F1 Score: 0.1011

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       206
     neutral       0.00      0.00      0.00       946
    positive       0.25      1.00      0.40       387

    accuracy                           0.25      1539
   macro avg       0.08      0.33      0.13      1539
weighted avg       0.06      0.25      0.10      1539

Validation Set Evaluation
Accuracy: 0.2519
Precision: 0.0635
Recall: 0.2519
F1 Score: 0.1014

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      1.00      0.40        97

    accuracy                           0.25       385
   macro avg    



Accuracy: 0.2529
Precision: 0.0640
Recall: 0.2529
F1 Score: 0.1021

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        45
     neutral       0.00      0.00      0.00       209
    positive       0.25      1.00      0.40        86

    accuracy                           0.25       340
   macro avg       0.08      0.33      0.13       340
weighted avg       0.06      0.25      0.10       340



Training Epoch 1/20: 100%|██████████| 49/49 [00:06<00:00,  7.18it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 23.32it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 35.4895, Val Loss: 9.6175


Training Epoch 2/20: 100%|██████████| 49/49 [00:08<00:00,  5.57it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 23.20it/s]


Epoch 2/20 - Train Loss: 16.4370, Val Loss: 3.8873


Training Epoch 3/20: 100%|██████████| 49/49 [00:08<00:00,  5.66it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 39.28it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 3.9956, Val Loss: 5.5042
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3808
Precision: 0.1569
Recall: 0.3808
F1 Score: 0.2194

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      1.00      0.44       206
     neutral       0.00      0.00      0.00       946
    positive       0.48      0.98      0.64       387

    accuracy                           0.38      1539
   macro avg       0.25      0.66      0.36      1539
weighted avg       0.16      0.38      0.22      1539

Validation Set Evaluation
Accuracy: 0.3143
Precision: 0.1325
Recall: 0.3143
F1 Score: 0.1826

Classification Report:
               precision    recall  f1-score   support

    negative       0.23      0.90      0.37        52
     neutral       0.00      0.00      0.00       236
    positive       0.40      0.76      0.52        97

    accuracy                

(0.3205882352941177,
 0.12923224192453317,
 0.3205882352941177,
 0.1816743156753781)

In [None]:
glove_file_path = './glove.6B.100d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_75Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=100)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '100dim_75Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)


hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9



model_run = model_run = '100dim_75Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 74/74 [00:36<00:00,  2.04it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:01<00:00, 10.40it/s]


Epoch 1/20 - Train Loss: 55.7424, Val Loss: 11.9977


Training Epoch 2/20: 100%|██████████| 74/74 [00:23<00:00,  3.10it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:01<00:00, 10.23it/s]


Epoch 2/20 - Train Loss: 57.4218, Val Loss: 12.0805
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2568
Precision: 0.0660
Recall: 0.2568
F1 Score: 0.1050

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       286
     neutral       0.00      0.00      0.00      1459
    positive       0.26      1.00      0.41       603

    accuracy                           0.26      2348
   macro avg       0.09      0.33      0.14      2348
weighted avg       0.07      0.26      0.10      2348

Validation Set Evaluation
Accuracy: 0.2572
Precision: 0.0662
Recall: 0.2572
F1 Score: 0.1053

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        71
     neutral       0.00      0.00      0.00       365
    positive       0.26      1.00      0.41       151

    accuracy                           0.26       587
   macro avg   



Accuracy: 0.2568
Precision: 0.0659
Recall: 0.2568
F1 Score: 0.1049

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        63
     neutral       0.00      0.00      0.00       322
    positive       0.26      1.00      0.41       133

    accuracy                           0.26       518
   macro avg       0.09      0.33      0.14       518
weighted avg       0.07      0.26      0.10       518



Training Epoch 1/20: 100%|██████████| 74/74 [00:08<00:00,  8.24it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:00<00:00, 25.82it/s]


Epoch 1/20 - Train Loss: 46.0692, Val Loss: 6.2695


Training Epoch 2/20: 100%|██████████| 74/74 [00:13<00:00,  5.36it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:00<00:00, 22.09it/s]


Epoch 2/20 - Train Loss: 14.0445, Val Loss: 6.5682
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3612
Precision: 0.8049
Recall: 0.3612
F1 Score: 0.2230

Classification Report:
               precision    recall  f1-score   support

    negative       0.89      0.86      0.88       286
     neutral       1.00      0.00      0.00      1459
    positive       0.29      1.00      0.45       603

    accuracy                           0.36      2348
   macro avg       0.73      0.62      0.44      2348
weighted avg       0.80      0.36      0.22      2348

Validation Set Evaluation
Accuracy: 0.3305
Precision: 0.1745
Recall: 0.3305
F1 Score: 0.1997

Classification Report:
               precision    recall  f1-score   support

    negative       0.85      0.63      0.73        71
     neutral       0.00      0.00      0.00       365
    positive       0.28      0.99      0.44       151

    accuracy                           0.33       587
   macro avg      

(0.333976833976834,
 0.16862335263622816,
 0.333976833976834,
 0.19988307971177974)

In [None]:
glove_file_path = './glove.6B.100d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_66Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=100)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '100dim_66Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '100dim_66Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 90/90 [00:41<00:00,  2.17it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:02<00:00, 10.02it/s]


Epoch 1/20 - Train Loss: 64.9892, Val Loss: 15.2590


Training Epoch 2/20: 100%|██████████| 90/90 [00:36<00:00,  2.48it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:02<00:00,  9.09it/s]


Epoch 2/20 - Train Loss: 63.9135, Val Loss: 13.9013


Training Epoch 3/20: 100%|██████████| 90/90 [00:35<00:00,  2.54it/s]
Validation Epoch 3/20: 100%|██████████| 23/23 [00:02<00:00,  8.81it/s]


Epoch 3/20 - Train Loss: 62.4420, Val Loss: 14.1601
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2769
Precision: 0.0767
Recall: 0.2769
F1 Score: 0.1201

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       350
     neutral       0.00      0.00      0.00      1723
    positive       0.28      1.00      0.43       794

    accuracy                           0.28      2867
   macro avg       0.09      0.33      0.14      2867
weighted avg       0.08      0.28      0.12      2867

Validation Set Evaluation
Accuracy: 0.2775
Precision: 0.0770
Recall: 0.2775
F1 Score: 0.1206

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        87
     neutral       0.00      0.00      0.00       431
    positive       0.28      1.00      0.43       199

    accuracy                           0.28       717
   macro avg   



Accuracy: 0.2765
Precision: 0.0764
Recall: 0.2765
F1 Score: 0.1198

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        77
     neutral       0.00      0.00      0.00       381
    positive       0.28      1.00      0.43       175

    accuracy                           0.28       633
   macro avg       0.09      0.33      0.14       633
weighted avg       0.08      0.28      0.12       633



Training Epoch 1/20: 100%|██████████| 90/90 [00:18<00:00,  4.75it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:01<00:00, 18.67it/s]


Epoch 1/20 - Train Loss: 62.4985, Val Loss: 8.1138


Training Epoch 2/20: 100%|██████████| 90/90 [00:14<00:00,  6.29it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:00<00:00, 31.20it/s]


Epoch 2/20 - Train Loss: 19.3041, Val Loss: 6.4685


Training Epoch 3/20: 100%|██████████| 90/90 [00:18<00:00,  4.77it/s]
Validation Epoch 3/20: 100%|██████████| 23/23 [00:01<00:00, 19.49it/s]


Epoch 3/20 - Train Loss: 5.9754, Val Loss: 7.6700
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3931
Precision: 0.1579
Recall: 0.3931
F1 Score: 0.2250

Classification Report:
               precision    recall  f1-score   support

    negative       0.45      1.00      0.62       350
     neutral       0.00      0.00      0.00      1723
    positive       0.37      0.98      0.54       794

    accuracy                           0.39      2867
   macro avg       0.27      0.66      0.39      2867
weighted avg       0.16      0.39      0.22      2867

Validation Set Evaluation
Accuracy: 0.3584
Precision: 0.1433
Recall: 0.3584
F1 Score: 0.2044

Classification Report:
               precision    recall  f1-score   support

    negative       0.37      0.80      0.51        87
     neutral       0.00      0.00      0.00       431
    positive       0.35      0.94      0.51       199

    accuracy                           0.36       717
   macro avg       

(0.37282780410742494,
 0.14898616631470038,
 0.37282780410742494,
 0.2126636066226319)

In [None]:
glove_file_path = './glove.6B.100d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_50Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=100)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '100dim_50Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '100dim_50Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 103/103 [00:48<00:00,  2.11it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:02<00:00,  8.81it/s]


Epoch 1/20 - Train Loss: 75.5496, Val Loss: 15.8580


Training Epoch 2/20: 100%|██████████| 103/103 [00:39<00:00,  2.63it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:01<00:00, 14.75it/s]


Epoch 2/20 - Train Loss: 73.5543, Val Loss: 15.9520
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2813
Precision: 0.0791
Recall: 0.2813
F1 Score: 0.1235

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       410
     neutral       0.00      0.00      0.00      1958
    positive       0.28      1.00      0.44       927

    accuracy                           0.28      3295
   macro avg       0.09      0.33      0.15      3295
weighted avg       0.08      0.28      0.12      3295

Validation Set Evaluation
Accuracy: 0.2816
Precision: 0.0793
Recall: 0.2816
F1 Score: 0.1237

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       103
     neutral       0.00      0.00      0.00       489
    positive       0.28      1.00      0.44       232

    accuracy                           0.28       824
   macro avg   



Accuracy: 0.2806
Precision: 0.0787
Recall: 0.2806
F1 Score: 0.1230

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        91
     neutral       0.00      0.00      0.00       432
    positive       0.28      1.00      0.44       204

    accuracy                           0.28       727
   macro avg       0.09      0.33      0.15       727
weighted avg       0.08      0.28      0.12       727



Training Epoch 1/20: 100%|██████████| 103/103 [00:18<00:00,  5.59it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:00<00:00, 31.69it/s]


Epoch 1/20 - Train Loss: 57.2880, Val Loss: 6.9907


Training Epoch 2/20: 100%|██████████| 103/103 [00:20<00:00,  4.95it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:01<00:00, 18.62it/s]


Epoch 2/20 - Train Loss: 15.7329, Val Loss: 6.7354


Training Epoch 3/20: 100%|██████████| 103/103 [00:16<00:00,  6.27it/s]
Validation Epoch 3/20: 100%|██████████| 26/26 [00:01<00:00, 19.36it/s]


Epoch 3/20 - Train Loss: 6.1111, Val Loss: 8.8013
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.4033
Precision: 0.1637
Recall: 0.4033
F1 Score: 0.2329

Classification Report:
               precision    recall  f1-score   support

    negative       0.39      0.99      0.56       410
     neutral       0.00      0.00      0.00      1958
    positive       0.41      0.99      0.58       927

    accuracy                           0.40      3295
   macro avg       0.27      0.66      0.38      3295
weighted avg       0.16      0.40      0.23      3295

Validation Set Evaluation
Accuracy: 0.3665
Precision: 0.1490
Recall: 0.3665
F1 Score: 0.2118

Classification Report:
               precision    recall  f1-score   support

    negative       0.35      0.85      0.50       103
     neutral       0.00      0.00      0.00       489
    positive       0.37      0.92      0.53       232

    accuracy                           0.37       824
   macro avg       

(0.35900962861072905,
 0.14543255773967564,
 0.35900962861072905,
 0.20697658288159593)

### 200 Dim

In [None]:
glove_file_path = './glove.6B.200d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_AllAgree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=200)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '200dim_AllAgree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '200dim_AllAgree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 49/49 [00:17<00:00,  2.81it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 15.70it/s]


NaN detected in validation loss computation.
Epoch 1/20 - Train Loss: 40.4519, Val Loss: 7.7781


Training Epoch 2/20: 100%|██████████| 49/49 [00:19<00:00,  2.47it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:01<00:00,  8.76it/s]


Epoch 2/20 - Train Loss: 35.3517, Val Loss: 8.2022
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2515
Precision: 0.0632
Recall: 0.2515
F1 Score: 0.1011

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       206
     neutral       0.00      0.00      0.00       946
    positive       0.25      1.00      0.40       387

    accuracy                           0.25      1539
   macro avg       0.08      0.33      0.13      1539
weighted avg       0.06      0.25      0.10      1539

Validation Set Evaluation
Accuracy: 0.2519
Precision: 0.0635
Recall: 0.2519
F1 Score: 0.1014

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      1.00      0.40        97

    accuracy                           0.25       385
   macro avg    



Accuracy: 0.2529
Precision: 0.0640
Recall: 0.2529
F1 Score: 0.1021

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        45
     neutral       0.00      0.00      0.00       209
    positive       0.25      1.00      0.40        86

    accuracy                           0.25       340
   macro avg       0.08      0.33      0.13       340
weighted avg       0.06      0.25      0.10       340



Training Epoch 1/20: 100%|██████████| 49/49 [00:10<00:00,  4.53it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 16.19it/s]


Epoch 1/20 - Train Loss: 33.5647, Val Loss: 7.2050


Training Epoch 2/20: 100%|██████████| 49/49 [00:07<00:00,  6.22it/s]


NaN detected in loss computation.


Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 30.35it/s]


Epoch 2/20 - Train Loss: 12.2292, Val Loss: 2.8452


Training Epoch 3/20: 100%|██████████| 49/49 [00:07<00:00,  6.18it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 18.71it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 3.4770, Val Loss: 3.5583
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3840
Precision: 0.1511
Recall: 0.3840
F1 Score: 0.2159

Classification Report:
               precision    recall  f1-score   support

    negative       0.47      1.00      0.64       206
     neutral       0.00      0.00      0.00       946
    positive       0.35      0.99      0.52       387

    accuracy                           0.38      1539
   macro avg       0.27      0.66      0.39      1539
weighted avg       0.15      0.38      0.22      1539

Validation Set Evaluation
Accuracy: 0.3455
Precision: 0.1363
Recall: 0.3455
F1 Score: 0.1947

Classification Report:
               precision    recall  f1-score   support

    negative       0.42      0.90      0.58        52
     neutral       0.00      0.00      0.00       236
    positive       0.31      0.89      0.46        97

    accuracy                

(0.36470588235294116,
 0.1473819705505348,
 0.36470588235294116,
 0.20793998784965265)

In [None]:
glove_file_path = './glove.6B.200d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_75Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=200)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '200dim_75Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '200dim_75Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 74/74 [00:27<00:00,  2.74it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:01<00:00, 10.89it/s]


Epoch 1/20 - Train Loss: 56.7511, Val Loss: 12.4644


Training Epoch 2/20: 100%|██████████| 74/74 [00:27<00:00,  2.68it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:01<00:00, 15.67it/s]


Epoch 2/20 - Train Loss: 54.3254, Val Loss: 12.5088
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2568
Precision: 0.0660
Recall: 0.2568
F1 Score: 0.1050

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       286
     neutral       0.00      0.00      0.00      1459
    positive       0.26      1.00      0.41       603

    accuracy                           0.26      2348
   macro avg       0.09      0.33      0.14      2348
weighted avg       0.07      0.26      0.10      2348

Validation Set Evaluation
Accuracy: 0.2572
Precision: 0.0662
Recall: 0.2572
F1 Score: 0.1053

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        71
     neutral       0.00      0.00      0.00       365
    positive       0.26      1.00      0.41       151

    accuracy                           0.26       587
   macro avg   



Accuracy: 0.2568
Precision: 0.0659
Recall: 0.2568
F1 Score: 0.1049

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        63
     neutral       0.00      0.00      0.00       322
    positive       0.26      1.00      0.41       133

    accuracy                           0.26       518
   macro avg       0.09      0.33      0.14       518
weighted avg       0.07      0.26      0.10       518



Training Epoch 1/20: 100%|██████████| 74/74 [00:14<00:00,  5.00it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:00<00:00, 29.54it/s]


Epoch 1/20 - Train Loss: 57.7308, Val Loss: 9.4429


Training Epoch 2/20: 100%|██████████| 74/74 [00:13<00:00,  5.54it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:00<00:00, 19.30it/s]


Epoch 2/20 - Train Loss: 16.9761, Val Loss: 8.7012


Training Epoch 3/20: 100%|██████████| 74/74 [00:15<00:00,  4.82it/s]
Validation Epoch 3/20: 100%|██████████| 19/19 [00:00<00:00, 28.49it/s]


Epoch 3/20 - Train Loss: 5.4604, Val Loss: 7.4675


Training Epoch 4/20: 100%|██████████| 74/74 [00:12<00:00,  5.86it/s]
Validation Epoch 4/20: 100%|██████████| 19/19 [00:01<00:00, 18.36it/s]


Epoch 4/20 - Train Loss: 1.5095, Val Loss: 11.7387
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3778
Precision: 0.1540
Recall: 0.3778
F1 Score: 0.2151

Classification Report:
               precision    recall  f1-score   support

    negative       0.58      1.00      0.74       286
     neutral       0.00      0.00      0.00      1459
    positive       0.32      1.00      0.49       603

    accuracy                           0.38      2348
   macro avg       0.30      0.67      0.41      2348
weighted avg       0.15      0.38      0.22      2348

Validation Set Evaluation
Accuracy: 0.3288
Precision: 0.1320
Recall: 0.3288
F1 Score: 0.1835

Classification Report:
               precision    recall  f1-score   support

    negative       0.45      0.68      0.54        71
     neutral       0.00      0.00      0.00       365
    positive       0.30      0.96      0.46       151

    accuracy                           0.33       587
   macro avg      

(0.3436293436293436,
 0.13710786467834074,
 0.3436293436293436,
 0.19215429482776245)

In [None]:
glove_file_path = './glove.6B.200d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_66Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=200)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '200dim_66Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '200dim_66Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 90/90 [00:37<00:00,  2.38it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:01<00:00, 13.70it/s]


Epoch 1/20 - Train Loss: 66.4920, Val Loss: 14.1887


Training Epoch 2/20: 100%|██████████| 90/90 [00:36<00:00,  2.49it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:02<00:00,  8.85it/s]


Epoch 2/20 - Train Loss: 62.8651, Val Loss: 14.2707
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2769
Precision: 0.0767
Recall: 0.2769
F1 Score: 0.1201

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       350
     neutral       0.00      0.00      0.00      1723
    positive       0.28      1.00      0.43       794

    accuracy                           0.28      2867
   macro avg       0.09      0.33      0.14      2867
weighted avg       0.08      0.28      0.12      2867

Validation Set Evaluation
Accuracy: 0.2775
Precision: 0.0770
Recall: 0.2775
F1 Score: 0.1206

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        87
     neutral       0.00      0.00      0.00       431
    positive       0.28      1.00      0.43       199

    accuracy                           0.28       717
   macro avg   



Accuracy: 0.2765
Precision: 0.0764
Recall: 0.2765
F1 Score: 0.1198

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        77
     neutral       0.00      0.00      0.00       381
    positive       0.28      1.00      0.43       175

    accuracy                           0.28       633
   macro avg       0.09      0.33      0.14       633
weighted avg       0.08      0.28      0.12       633



Training Epoch 1/20: 100%|██████████| 90/90 [00:17<00:00,  5.15it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:01<00:00, 16.73it/s]


Epoch 1/20 - Train Loss: 58.5073, Val Loss: 7.3988


Training Epoch 2/20: 100%|██████████| 90/90 [00:20<00:00,  4.41it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:00<00:00, 26.05it/s]


Epoch 2/20 - Train Loss: 17.1197, Val Loss: 7.5660
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3931
Precision: 0.1601
Recall: 0.3931
F1 Score: 0.2265

Classification Report:
               precision    recall  f1-score   support

    negative       0.49      0.99      0.66       350
     neutral       0.00      0.00      0.00      1723
    positive       0.36      0.98      0.53       794

    accuracy                           0.39      2867
   macro avg       0.28      0.66      0.40      2867
weighted avg       0.16      0.39      0.23      2867

Validation Set Evaluation
Accuracy: 0.3487
Precision: 0.1414
Recall: 0.3487
F1 Score: 0.1995

Classification Report:
               precision    recall  f1-score   support

    negative       0.40      0.74      0.52        87
     neutral       0.00      0.00      0.00       431
    positive       0.33      0.93      0.49       199

    accuracy                           0.35       717
   macro avg      

(0.36018957345971564,
 0.14577191656936145,
 0.36018957345971564,
 0.20640212441680042)

In [None]:
glove_file_path = './glove.6B.200d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_50Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=200)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '200dim_50Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '200dim_50Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 103/103 [00:50<00:00,  2.05it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:03<00:00,  8.22it/s]


Epoch 1/20 - Train Loss: 78.3121, Val Loss: 16.3678


Training Epoch 2/20: 100%|██████████| 103/103 [00:46<00:00,  2.24it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:01<00:00, 13.30it/s]


Epoch 2/20 - Train Loss: 75.7813, Val Loss: 16.3306


Training Epoch 3/20: 100%|██████████| 103/103 [00:44<00:00,  2.33it/s]
Validation Epoch 3/20: 100%|██████████| 26/26 [00:03<00:00,  8.27it/s]


Epoch 3/20 - Train Loss: 72.3829, Val Loss: 16.2527


Training Epoch 4/20: 100%|██████████| 103/103 [00:44<00:00,  2.29it/s]
Validation Epoch 4/20: 100%|██████████| 26/26 [00:01<00:00, 13.38it/s]


Epoch 4/20 - Train Loss: 72.2477, Val Loss: 16.7799
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2813
Precision: 0.0791
Recall: 0.2813
F1 Score: 0.1235

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       410
     neutral       0.00      0.00      0.00      1958
    positive       0.28      1.00      0.44       927

    accuracy                           0.28      3295
   macro avg       0.09      0.33      0.15      3295
weighted avg       0.08      0.28      0.12      3295

Validation Set Evaluation
Accuracy: 0.2816
Precision: 0.0793
Recall: 0.2816
F1 Score: 0.1237

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       103
     neutral       0.00      0.00      0.00       489
    positive       0.28      1.00      0.44       232

    accuracy                           0.28       824
   macro avg   



Accuracy: 0.2806
Precision: 0.0787
Recall: 0.2806
F1 Score: 0.1230

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        91
     neutral       0.00      0.00      0.00       432
    positive       0.28      1.00      0.44       204

    accuracy                           0.28       727
   macro avg       0.09      0.33      0.15       727
weighted avg       0.08      0.28      0.12       727



Training Epoch 1/20: 100%|██████████| 103/103 [00:27<00:00,  3.69it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:01<00:00, 15.87it/s]


Epoch 1/20 - Train Loss: 67.8905, Val Loss: 8.6803


Training Epoch 2/20: 100%|██████████| 103/103 [00:27<00:00,  3.73it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:00<00:00, 26.69it/s]


Epoch 2/20 - Train Loss: 19.5626, Val Loss: 9.4210
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3997
Precision: 0.1700
Recall: 0.3997
F1 Score: 0.2359

Classification Report:
               precision    recall  f1-score   support

    negative       0.57      0.98      0.72       410
     neutral       0.00      0.00      0.00      1958
    positive       0.35      0.99      0.52       927

    accuracy                           0.40      3295
   macro avg       0.31      0.66      0.41      3295
weighted avg       0.17      0.40      0.24      3295

Validation Set Evaluation
Accuracy: 0.3653
Precision: 0.1537
Recall: 0.3653
F1 Score: 0.2142

Classification Report:
               precision    recall  f1-score   support

    negative       0.48      0.83      0.61       103
     neutral       0.00      0.00      0.00       489
    positive       0.33      0.93      0.49       232

    accuracy                           0.37       824
   macro avg      

(0.3631361760660248,
 0.15263338240588742,
 0.3631361760660248,
 0.21191764846600084)

### 300 Dim

In [None]:
glove_file_path = './glove.6B.300d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_AllAgree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=300)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '300dim_AllAgree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '300dim_AllAgree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 49/49 [00:20<00:00,  2.43it/s]


NaN detected in loss computation.


Validation Epoch 1/20: 100%|██████████| 13/13 [00:01<00:00, 10.69it/s]


Epoch 1/20 - Train Loss: 39.3277, Val Loss: 8.9601


Training Epoch 2/20: 100%|██████████| 49/49 [00:28<00:00,  1.73it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 14.63it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 38.2387, Val Loss: 7.8583


Training Epoch 3/20: 100%|██████████| 49/49 [00:20<00:00,  2.41it/s]


NaN detected in loss computation.


Validation Epoch 3/20: 100%|██████████| 13/13 [00:01<00:00,  8.90it/s]


NaN detected in validation loss computation.
Epoch 3/20 - Train Loss: 34.3322, Val Loss: 7.8305


Training Epoch 4/20: 100%|██████████| 49/49 [00:17<00:00,  2.85it/s]
Validation Epoch 4/20: 100%|██████████| 13/13 [00:01<00:00,  8.96it/s]


NaN detected in validation loss computation.
Epoch 4/20 - Train Loss: 37.0207, Val Loss: 7.7995


Training Epoch 5/20: 100%|██████████| 49/49 [00:20<00:00,  2.39it/s]
Validation Epoch 5/20: 100%|██████████| 13/13 [00:00<00:00, 15.11it/s]


NaN detected in validation loss computation.
Epoch 5/20 - Train Loss: 34.1907, Val Loss: 7.6823


Training Epoch 6/20: 100%|██████████| 49/49 [00:20<00:00,  2.42it/s]
Validation Epoch 6/20: 100%|██████████| 13/13 [00:01<00:00,  8.75it/s]


Epoch 6/20 - Train Loss: 35.8010, Val Loss: 8.2383
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2515
Precision: 0.0632
Recall: 0.2515
F1 Score: 0.1011

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       206
     neutral       0.00      0.00      0.00       946
    positive       0.25      1.00      0.40       387

    accuracy                           0.25      1539
   macro avg       0.08      0.33      0.13      1539
weighted avg       0.06      0.25      0.10      1539

Validation Set Evaluation
Accuracy: 0.2519
Precision: 0.0635
Recall: 0.2519
F1 Score: 0.1014

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        52
     neutral       0.00      0.00      0.00       236
    positive       0.25      1.00      0.40        97

    accuracy                           0.25       385
   macro avg    



Accuracy: 0.2529
Precision: 0.0640
Recall: 0.2529
F1 Score: 0.1021

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        45
     neutral       0.00      0.00      0.00       209
    positive       0.25      1.00      0.40        86

    accuracy                           0.25       340
   macro avg       0.08      0.33      0.13       340
weighted avg       0.06      0.25      0.10       340



Training Epoch 1/20: 100%|██████████| 49/49 [00:12<00:00,  3.80it/s]
Validation Epoch 1/20: 100%|██████████| 13/13 [00:00<00:00, 16.96it/s]


Epoch 1/20 - Train Loss: 35.8157, Val Loss: 6.4138


Training Epoch 2/20: 100%|██████████| 49/49 [00:12<00:00,  3.99it/s]
Validation Epoch 2/20: 100%|██████████| 13/13 [00:00<00:00, 16.54it/s]


NaN detected in validation loss computation.
Epoch 2/20 - Train Loss: 11.2453, Val Loss: 4.3128


Training Epoch 3/20: 100%|██████████| 49/49 [00:12<00:00,  4.07it/s]
Validation Epoch 3/20: 100%|██████████| 13/13 [00:00<00:00, 26.11it/s]


Epoch 3/20 - Train Loss: 2.6765, Val Loss: 5.8157
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3795
Precision: 0.1477
Recall: 0.3795
F1 Score: 0.2122

Classification Report:
               precision    recall  f1-score   support

    negative       0.44      0.99      0.61       206
     neutral       0.00      0.00      0.00       946
    positive       0.35      0.98      0.52       387

    accuracy                           0.38      1539
   macro avg       0.26      0.66      0.38      1539
weighted avg       0.15      0.38      0.21      1539

Validation Set Evaluation
Accuracy: 0.3351
Precision: 0.1298
Recall: 0.3351
F1 Score: 0.1870

Classification Report:
               precision    recall  f1-score   support

    negative       0.32      0.87      0.47        52
     neutral       0.00      0.00      0.00       236
    positive       0.34      0.87      0.49        97

    accuracy                           0.34       385
   macro avg       

(0.3382352941176471,
 0.13133001974837466,
 0.3382352941176471,
 0.18868389098758773)

In [None]:
glove_file_path = './glove.6B.300d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_75Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=300)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '300dim_75Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '300dim_75Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  81


Training Epoch 1/20: 100%|██████████| 74/74 [00:30<00:00,  2.40it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:02<00:00,  8.55it/s]


Epoch 1/20 - Train Loss: 57.8871, Val Loss: 12.0162


Training Epoch 2/20: 100%|██████████| 74/74 [00:30<00:00,  2.45it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:02<00:00,  8.78it/s]


Epoch 2/20 - Train Loss: 53.3431, Val Loss: 11.7326


Training Epoch 3/20: 100%|██████████| 74/74 [00:29<00:00,  2.47it/s]
Validation Epoch 3/20: 100%|██████████| 19/19 [00:02<00:00,  9.15it/s]


Epoch 3/20 - Train Loss: 57.4253, Val Loss: 11.9124
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2568
Precision: 0.0660
Recall: 0.2568
F1 Score: 0.1050

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       286
     neutral       0.00      0.00      0.00      1459
    positive       0.26      1.00      0.41       603

    accuracy                           0.26      2348
   macro avg       0.09      0.33      0.14      2348
weighted avg       0.07      0.26      0.10      2348

Validation Set Evaluation
Accuracy: 0.2572
Precision: 0.0662
Recall: 0.2572
F1 Score: 0.1053

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        71
     neutral       0.00      0.00      0.00       365
    positive       0.26      1.00      0.41       151

    accuracy                           0.26       587
   macro avg   



Accuracy: 0.2568
Precision: 0.0659
Recall: 0.2568
F1 Score: 0.1049

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        63
     neutral       0.00      0.00      0.00       322
    positive       0.26      1.00      0.41       133

    accuracy                           0.26       518
   macro avg       0.09      0.33      0.14       518
weighted avg       0.07      0.26      0.10       518



Training Epoch 1/20: 100%|██████████| 74/74 [00:16<00:00,  4.44it/s]
Validation Epoch 1/20: 100%|██████████| 19/19 [00:01<00:00, 16.44it/s]


Epoch 1/20 - Train Loss: 43.5276, Val Loss: 9.1867


Training Epoch 2/20: 100%|██████████| 74/74 [00:20<00:00,  3.54it/s]
Validation Epoch 2/20: 100%|██████████| 19/19 [00:01<00:00, 15.72it/s]


Epoch 2/20 - Train Loss: 12.2783, Val Loss: 6.7748


Training Epoch 3/20: 100%|██████████| 74/74 [00:16<00:00,  4.45it/s]
Validation Epoch 3/20: 100%|██████████| 19/19 [00:00<00:00, 24.52it/s]


Epoch 3/20 - Train Loss: 4.0935, Val Loss: 8.7246
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3765
Precision: 0.1623
Recall: 0.3765
F1 Score: 0.2197

Classification Report:
               precision    recall  f1-score   support

    negative       0.68      0.99      0.80       286
     neutral       0.00      0.00      0.00      1459
    positive       0.31      1.00      0.47       603

    accuracy                           0.38      2348
   macro avg       0.33      0.66      0.43      2348
weighted avg       0.16      0.38      0.22      2348

Validation Set Evaluation
Accuracy: 0.3271
Precision: 0.1451
Recall: 0.3271
F1 Score: 0.1885

Classification Report:
               precision    recall  f1-score   support

    negative       0.59      0.65      0.62        71
     neutral       0.00      0.00      0.00       365
    positive       0.29      0.97      0.44       151

    accuracy                           0.33       587
   macro avg       

(0.34555984555984554,
 0.1527216155207126,
 0.34555984555984554,
 0.20052389345867608)

In [None]:
glove_file_path = './glove.6B.300d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_66Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=300)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '300dim_66Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '300dim_66Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 90/90 [00:45<00:00,  1.96it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:01<00:00, 12.49it/s]


Epoch 1/20 - Train Loss: 66.4759, Val Loss: 14.3403


Training Epoch 2/20: 100%|██████████| 90/90 [00:43<00:00,  2.08it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:03<00:00,  7.63it/s]


Epoch 2/20 - Train Loss: 62.5776, Val Loss: 14.4238
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2769
Precision: 0.0767
Recall: 0.2769
F1 Score: 0.1202

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       350
     neutral       0.00      0.00      0.00      1723
    positive       0.28      1.00      0.43       794

    accuracy                           0.28      2867
   macro avg       0.09      0.33      0.14      2867
weighted avg       0.08      0.28      0.12      2867

Validation Set Evaluation
Accuracy: 0.2775
Precision: 0.0770
Recall: 0.2775
F1 Score: 0.1206

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        87
     neutral       0.00      0.00      0.00       431
    positive       0.28      1.00      0.43       199

    accuracy                           0.28       717
   macro avg   



Accuracy: 0.2765
Precision: 0.0764
Recall: 0.2765
F1 Score: 0.1198

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        77
     neutral       0.00      0.00      0.00       381
    positive       0.28      1.00      0.43       175

    accuracy                           0.28       633
   macro avg       0.09      0.33      0.14       633
weighted avg       0.08      0.28      0.12       633



Training Epoch 1/20: 100%|██████████| 90/90 [00:25<00:00,  3.51it/s]
Validation Epoch 1/20: 100%|██████████| 23/23 [00:01<00:00, 14.67it/s]


Epoch 1/20 - Train Loss: 51.3810, Val Loss: 8.4859


Training Epoch 2/20: 100%|██████████| 90/90 [00:25<00:00,  3.46it/s]
Validation Epoch 2/20: 100%|██████████| 23/23 [00:01<00:00, 13.89it/s]


Epoch 2/20 - Train Loss: 13.8018, Val Loss: 8.1186


Training Epoch 3/20: 100%|██████████| 90/90 [00:23<00:00,  3.79it/s]
Validation Epoch 3/20: 100%|██████████| 23/23 [00:01<00:00, 22.42it/s]


Epoch 3/20 - Train Loss: 4.6731, Val Loss: 11.9302
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.3976
Precision: 0.1589
Recall: 0.3976
F1 Score: 0.2270

Classification Report:
               precision    recall  f1-score   support

    negative       0.38      0.99      0.55       350
     neutral       0.00      0.00      0.00      1723
    positive       0.41      1.00      0.58       794

    accuracy                           0.40      2867
   macro avg       0.26      0.66      0.37      2867
weighted avg       0.16      0.40      0.23      2867

Validation Set Evaluation
Accuracy: 0.3445
Precision: 0.1385
Recall: 0.3445
F1 Score: 0.1975

Classification Report:
               precision    recall  f1-score   support

    negative       0.28      0.77      0.41        87
     neutral       0.00      0.00      0.00       431
    positive       0.38      0.90      0.53       199

    accuracy                           0.34       717
   macro avg      

(0.3617693522906793,
 0.1447658026663404,
 0.3617693522906793,
 0.20657608985817316)

In [None]:
glove_file_path = './glove.6B.300d.txt' # Testing with 50d for speed
embeddings_index = load_glove_embeddings(glove_file_path)

filename = './Sentences_50Agree.txt'
X_train, y_train, X_val, y_val, X_test, y_test, embedding_matrix = finance_preprocessing(filename, embeddings_index, embed_dim=300)

train_data = SentimentDataset(X_train, y_train)
val_data = SentimentDataset(X_val, y_val)
test_data = SentimentDataset(X_test, y_test)


model_run = '300dim_50Agree_G2G'

hidden_size = 81
gru_layers = 2
dropout = 0.75
batch_size = 32
lr=0.01
epochs=20
clip=0.9


train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

encoder = EncoderGRU(hidden_size, embedding_matrix, gru_layers, dropout)
decoder = DecoderGRU(hidden_size, gru_layers, dropout)
g2g_model = GRU2GRU(encoder, decoder)

trained_model = train_model(g2g_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

class_names = ['negative', 'neutral', 'positive']

print('\nGRU2GRU Model:\n')

print("Train Set Evaluation")
evaluate_model(trained_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(trained_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(trained_model, test_loader, device, class_names)

hidden_size = 81
gru_layers = 1
dropout = 0.1
batch_size = 32
lr=0.01
epochs=20
clip=0.9


model_run = model_run = '300dim_50Agree_Basic'

simple_model = SimpleClassifier(hidden_size, embedding_matrix, gru_layers=gru_layers, dropout=dropout)

simple_model = train_model(simple_model, train_data, val_data, batch_size=batch_size, lr=lr, epochs=epochs, clip=clip, model_run=model_run)

print('\nBasic Model:\n')

print("Train Set Evaluation")
evaluate_model(simple_model, train_loader, device, class_names)

print("Validation Set Evaluation")
evaluate_model(simple_model, val_loader, device, class_names)

print("Test Set Evaluation")
evaluate_model(simple_model, test_loader, device, class_names)

Max Sentence length:  94


Training Epoch 1/20: 100%|██████████| 103/103 [00:47<00:00,  2.18it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:03<00:00,  7.68it/s]


Epoch 1/20 - Train Loss: 76.0778, Val Loss: 16.7431


Training Epoch 2/20: 100%|██████████| 103/103 [00:49<00:00,  2.08it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:02<00:00,  9.74it/s]


Epoch 2/20 - Train Loss: 70.7466, Val Loss: 16.2213


Training Epoch 3/20: 100%|██████████| 103/103 [00:52<00:00,  1.95it/s]
Validation Epoch 3/20: 100%|██████████| 26/26 [00:03<00:00,  7.61it/s]


Epoch 3/20 - Train Loss: 71.2302, Val Loss: 15.9998


Training Epoch 4/20: 100%|██████████| 103/103 [00:49<00:00,  2.07it/s]
Validation Epoch 4/20: 100%|██████████| 26/26 [00:02<00:00, 11.96it/s]


Epoch 4/20 - Train Loss: 71.0042, Val Loss: 16.3600
No improvement! Early stopping.

GRU2GRU Model:

Train Set Evaluation
Accuracy: 0.2813
Precision: 0.0791
Recall: 0.2813
F1 Score: 0.1235

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       410
     neutral       0.00      0.00      0.00      1958
    positive       0.28      1.00      0.44       927

    accuracy                           0.28      3295
   macro avg       0.09      0.33      0.15      3295
weighted avg       0.08      0.28      0.12      3295

Validation Set Evaluation
Accuracy: 0.2816
Precision: 0.0793
Recall: 0.2816
F1 Score: 0.1237

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00       103
     neutral       0.00      0.00      0.00       489
    positive       0.28      1.00      0.44       232

    accuracy                           0.28       824
   macro avg   



Accuracy: 0.2806
Precision: 0.0787
Recall: 0.2806
F1 Score: 0.1230

Classification Report:
               precision    recall  f1-score   support

    negative       0.00      0.00      0.00        91
     neutral       0.00      0.00      0.00       432
    positive       0.28      1.00      0.44       204

    accuracy                           0.28       727
   macro avg       0.09      0.33      0.15       727
weighted avg       0.08      0.28      0.12       727



Training Epoch 1/20: 100%|██████████| 103/103 [00:27<00:00,  3.77it/s]
Validation Epoch 1/20: 100%|██████████| 26/26 [00:01<00:00, 14.00it/s]


Epoch 1/20 - Train Loss: 51.2715, Val Loss: 8.9363


Training Epoch 2/20: 100%|██████████| 103/103 [00:26<00:00,  3.89it/s]
Validation Epoch 2/20: 100%|██████████| 26/26 [00:01<00:00, 13.96it/s]


Epoch 2/20 - Train Loss: 16.1854, Val Loss: 9.2881
No improvement! Early stopping.

Basic Model:

Train Set Evaluation
Accuracy: 0.4012
Precision: 0.1686
Recall: 0.4012
F1 Score: 0.2354

Classification Report:
               precision    recall  f1-score   support

    negative       0.54      0.97      0.69       410
     neutral       0.00      0.00      0.00      1958
    positive       0.36      1.00      0.53       927

    accuracy                           0.40      3295
   macro avg       0.30      0.66      0.41      3295
weighted avg       0.17      0.40      0.24      3295

Validation Set Evaluation
Accuracy: 0.3629
Precision: 0.1516
Recall: 0.3629
F1 Score: 0.2114

Classification Report:
               precision    recall  f1-score   support

    negative       0.45      0.76      0.56       103
     neutral       0.00      0.00      0.00       489
    positive       0.34      0.95      0.50       232

    accuracy                           0.36       824
   macro avg      

(0.3452544704264099,
 0.14389813988464853,
 0.3452544704264099,
 0.19986252599942123)

## Conclusion

We see here there are very significant problems with our encoder-decoder model.

Ultimately, we were unable to replicate or improve upon the results of Malo et al. We suspect that in order to improve upon the model that we developed a serious rework would be required adding additional information just as was done in the original paper.
