## Part 2. Model Training & Evaluation - RNN   
Now with the pretrained word embeddings acquired from Part 1 and the dataset acquired from
Part 0, you need to train a deep learning model for sentiment classification using the training set,
conforming to these requirements:


• Use the pretrained word embeddings from Part 1 as inputs; do not update them during training
(they are “frozen”).   

• Design a simple recurrent neural network (RNN), taking the input word embeddings, and
predicting a sentiment label for each sentence. To do that, you need to consider how to
aggregate the word representations to represent a sentence.   

• Use the validation set to gauge the performance of the model for each epoch during training.
You are required to use accuracy as the performance metric during validation and evaluation. 
   
• Use the mini-batch strategy during training. You may choose any preferred optimizer (e.g.,
SGD, Adagrad, Adam, RMSprop). Be careful when you choose your initial learning rate and
mini-batch size. (You should use the validation set to determine the optimal configuration.)
Train the model until the accuracy score on the validation set is not increasing for a few
epochs.
   
• Evaluate your trained model on the test dataset, observing the accuracy score.
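Choosing the learning rate and mini-batch size on the validation set can be sketched as a small grid search. This is a minimal sketch, not the configuration used in this notebook: `select_best_config` and `train_and_validate` are hypothetical names, and `train_and_validate(lr, batch_size)` is assumed to train a fresh model and return its best validation accuracy. The toy stand-in below only shows the call pattern.

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple


def select_best_config(
    learning_rates: Iterable[float],
    batch_sizes: Iterable[int],
    train_and_validate: Callable[[float, int], float],
) -> Tuple[Optional[dict], float]:
    """Try every (learning rate, batch size) pair and keep the one with the
    highest validation accuracy.

    `train_and_validate` is a hypothetical callable: it should train a fresh
    model with the given settings and return its best validation accuracy.
    """
    best_config, best_acc = None, float("-inf")
    for lr, batch_size in itertools.product(learning_rates, batch_sizes):
        val_acc = train_and_validate(lr, batch_size)
        print(f"lr={lr:g}, batch_size={batch_size}: val_acc={val_acc:.4f}")
        if val_acc > best_acc:
            best_config, best_acc = {"lr": lr, "batch_size": batch_size}, val_acc
    return best_config, best_acc


# Toy stand-in for real training, used only to demonstrate the selection loop.
def fake_train_and_validate(lr: float, batch_size: int) -> float:
    return 0.7 - abs(lr - 1e-3) * 10 - abs(batch_size - 64) / 1000


config, acc = select_best_config([1e-2, 1e-3, 1e-4], [32, 64, 128], fake_train_and_validate)
print(config, acc)  # {'lr': 0.001, 'batch_size': 64} 0.7
```

Each configuration should be trained with the same early-stopping rule so that the validation accuracies are comparable.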

In [1]:
import os
import json
import collections
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import time
import tqdm
from datasets import load_dataset



In [3]:
from common_utils import UNK_TOKEN,EMBEDDING_DIM,SAVE_DIR,VOCAB_PATH,EMBEDDING_MATRIX_PATH,WORD2IDX_PATH,IDX2WORD_PATH,tokenize

## Part 2a
Use the pretrained word embeddings from Part 1 as inputs; do not update them during training (they are “frozen”).  

In [4]:
# Load the embedding
embedding_matrix:np.ndarray = np.load(EMBEDDING_MATRIX_PATH)

# transform the embedding matrix to tensor
embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float32)
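One way to keep these vectors frozen is `nn.Embedding.from_pretrained(..., freeze=True)`, which sets `requires_grad=False` on the embedding weight so the optimizer never updates it. The sketch below uses a small random matrix as a stand-in for `embedding_matrix`; the sizes are illustrative only.

```python
import torch
import torch.nn as nn

# Stand-in for the (vocab_size, EMBEDDING_DIM) matrix loaded from EMBEDDING_MATRIX_PATH.
vocab_size, embedding_dim = 10, 4
pretrained = torch.randn(vocab_size, embedding_dim)

# freeze=True sets requires_grad=False on the weight, so training never updates it.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

ids = torch.tensor([[1, 2, 3], [4, 5, 0]])  # (batch, seq_len) of word indices
vectors = embedding(ids)                    # (batch, seq_len, embedding_dim)

print(embedding.weight.requires_grad)       # False
print(vectors.shape)                        # torch.Size([2, 3, 4])

# Only trainable parameters need to go to the optimizer; the frozen weight is skipped.
trainable = [p for p in embedding.parameters() if p.requires_grad]
print(len(trainable))                       # 0
```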

In [5]:
# load supporting dictionaries for encoding
# Load the 'word2idx' mapping
with open(WORD2IDX_PATH, 'r', encoding='utf-8') as f:
    word2idx:dict = json.load(f)
print(f"Mapping 'word2idx' loaded from '{WORD2IDX_PATH}'.")

# Load the 'idx2word' mapping
with open(IDX2WORD_PATH, 'r', encoding='utf-8') as f:
    idx2word:dict = json.load(f)
print(f"Mapping 'idx2word' loaded from '{IDX2WORD_PATH}'.")

Mapping 'word2idx' loaded from './result/word2idx.json'.
Mapping 'idx2word' loaded from './result/idx2word.json'.


## Part 2b
Design a simple recurrent neural network (RNN), taking the input word embeddings, and
predicting a sentiment label for each sentence. To do that, you need to consider how to
aggregate the word representations to represent a sentence.   
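A minimal sketch of such a model, under stated assumptions: the class name `SentimentRNN`, `hidden_dim=128`, and the padding index `pad_idx=0` are hypothetical, and inputs are assumed to be right-padded (real tokens first, padding after). It aggregates the sentence as the RNN hidden state at the last real token; because the RNN is unidirectional, that state does not depend on the trailing padding.

```python
import torch
import torch.nn as nn


class SentimentRNN(nn.Module):
    """Simple RNN sentence classifier over frozen pretrained embeddings.

    Sentence representation: the hidden state at the last real (non-padding)
    token of each right-padded sequence.
    """

    def __init__(self, embedding_matrix: torch.Tensor, hidden_dim: int = 128,
                 num_classes: int = 2, pad_idx: int = 0):
        super().__init__()
        self.pad_idx = pad_idx  # assumption: index reserved for padding
        # Frozen embeddings: the weight has requires_grad=False.
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.rnn = nn.RNN(embedding_matrix.size(1), hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch, seq_len), real tokens followed by pad_idx padding
        embedded = self.embedding(ids)                    # (batch, seq_len, emb_dim)
        outputs, _ = self.rnn(embedded)                   # (batch, seq_len, hidden_dim)
        lengths = (ids != self.pad_idx).sum(dim=1).clamp(min=1)
        # Gather the output at position length - 1 for every sentence.
        last_idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, outputs.size(2))
        sentence = outputs.gather(1, last_idx).squeeze(1)  # (batch, hidden_dim)
        return self.fc(sentence)                          # logits: (batch, num_classes)


torch.manual_seed(0)
emb = torch.randn(20, 8)  # toy stand-in for the pretrained embedding matrix
model = SentimentRNN(emb, hidden_dim=16)
ids = torch.tensor([[3, 4, 5, 0, 0], [6, 7, 8, 9, 10]])
logits = model(ids)
print(logits.shape)  # torch.Size([2, 2])
```

Since the embedding is frozen, the optimizer can be built from the trainable parameters only, e.g. `torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)`, with `nn.CrossEntropyLoss()` on the logits.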

In [None]:
# load the Rotten Tomatoes dataset and its train/validation/test splits
start_time = time.time()

dataset = load_dataset("rotten_tomatoes")
train_dataset = dataset['train']
validation_dataset = dataset['validation']
test_dataset = dataset['test']

end_time = time.time()
print(f"Elapsed time to load dataset: {end_time - start_time:.4f} seconds")

# obtain labels for each sentence
y_train = train_dataset["label"]
y_validation = validation_dataset["label"]
y_test = test_dataset["label"]

# split data into tokens
train_tokenized = tokenize(train_dataset)
validation_tokenized = tokenize(validation_dataset)
test_tokenized = tokenize(test_dataset)

Vocabulary Size: 18030
Vocabulary saved to ./result/vocab.json
Vocabulary Size: 5418
Vocabulary saved to ./result/vocab.json
Vocabulary Size: 5456
Vocabulary saved to ./result/vocab.json


In [11]:
# encode the tokens into indices with the word2idx dictionary
def encode_tokens(tokens:list, word2idx:dict) -> list:
    """Encode the tokens into indices with the word2idx dictionary

    :param tokens: List of sentences, where each sentence is a list of tokens
    :type tokens: list
    :param word2idx: Dictionary mapping words to indices
    :type word2idx: dict
    :return: List of sentences, where each sentence is a list of indices
    :rtype: list
    """
    return [[word2idx.get(token, word2idx[UNK_TOKEN]) for token in sentence] for sentence in tokens]

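train_encoded = encode_tokens(train_tokenized, word2idx)
validation_encoded = encode_tokens(validation_tokenized, word2idx)
test_encoded = encode_tokens(test_tokenized, word2idx)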

In [14]:
print("Each sentence is expected to be encoded as a sequence of integer indices.")
train_encoded[:1]

Each sentence is expected to be encoded as a sequence of integer indices.


[[3994, 7913, 5842, 5842, 5062, 12650, 5062, 10541, 15750]]

In [None]:
# padding: sentences have different lengths, so pad (or truncate) each one to a
# fixed length to give the model a consistent input dimension
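A minimal sketch of that padding step. Assumptions: index 0 is reserved for padding (no padding token is defined in `common_utils`, so check `idx2word` before relying on it), and `pad_features` and `seq_length` are hypothetical names. Shorter sentences are right-padded and longer ones truncated, producing the `int64` NumPy arrays that `torch.from_numpy` in the `TensorDataset` cell expects.

```python
import numpy as np


def pad_features(encoded: list, seq_length: int, pad_idx: int = 0) -> np.ndarray:
    """Right-pad (or truncate) each encoded sentence to exactly seq_length.

    pad_idx=0 is an assumption; use whichever index is reserved for padding.
    """
    features = np.full((len(encoded), seq_length), pad_idx, dtype=np.int64)
    for row, sentence in enumerate(encoded):
        truncated = sentence[:seq_length]
        features[row, :len(truncated)] = truncated
    return features


demo = pad_features([[5, 6, 7], [8, 9, 10, 11, 12, 13], []], seq_length=4)
print(demo)
# [[ 5  6  7  0]
#  [ 8  9 10 11]
#  [ 0  0  0  0]]
```

Applied here, this would give `train_x = pad_features(train_encoded, seq_length)`, and likewise `val_x` and `test_x` from the encoded validation and test sentences. Picking `seq_length` so that it covers most training sentences keeps truncation rare without wasting computation on padding.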

In [None]:
print("\t\t\tFeatures Shapes:")
print("Train set: \t\t{}".format(train_x.shape),
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets; the labels are the y_* lists obtained with the dataset
train_data = TensorDataset(torch.from_numpy(train_x), torch.tensor(y_train))
valid_data = TensorDataset(torch.from_numpy(val_x), torch.tensor(y_validation))
test_data = TensorDataset(torch.from_numpy(test_x), torch.tensor(y_test))

# dataloaders
batch_size = 50

# shuffle only the training data; evaluation order does not change accuracy
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=False, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=False, batch_size=batch_size)

## Part 2e

In [None]:
def train(dataloader, model, criterion, optimizer, device):
    model.train()
    epoch_losses = []
    epoch_accs = []
    # TensorDataset batches are (ids, label) tuples
    for ids, label in tqdm.tqdm(dataloader, desc="training..."):
        ids = ids.to(device)
        label = label.to(device)
        prediction = model(ids)
        loss = criterion(prediction, label)
        accuracy = get_accuracy(prediction, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_losses.append(loss.item())
        epoch_accs.append(accuracy.item())
    return np.mean(epoch_losses), np.mean(epoch_accs)

In [None]:
def evaluate(dataloader, model, criterion, device):
    model.eval()
    epoch_losses = []
    epoch_accs = []
    with torch.no_grad():
        # TensorDataset batches are (ids, label) tuples
        for ids, label in tqdm.tqdm(dataloader, desc="evaluating..."):
            ids = ids.to(device)
            label = label.to(device)
            prediction = model(ids)
            loss = criterion(prediction, label)
            accuracy = get_accuracy(prediction, label)
            epoch_losses.append(loss.item())
            epoch_accs.append(accuracy.item())
    return np.mean(epoch_losses), np.mean(epoch_accs)

In [None]:
def get_accuracy(prediction, label):
    batch_size, _ = prediction.shape
    predicted_classes = prediction.argmax(dim=-1)
    correct_predictions = predicted_classes.eq(label).sum()
    accuracy = correct_predictions / batch_size
    return accuracy

In [None]:
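# assumes `model`, `criterion`, `optimizer` and `device` were set up in an earlier cell
n_epochs = 10   # maximum number of epochs; raise it if early stopping has not triggered
patience = 3    # stop after this many epochs without a validation-accuracy gain
best_valid_acc = 0.0
epochs_without_improvement = 0

metrics = collections.defaultdict(list)

for epoch in range(n_epochs):
    train_loss, train_acc = train(
        train_loader, model, criterion, optimizer, device
    )
    valid_loss, valid_acc = evaluate(valid_loader, model, criterion, device)
    metrics["train_losses"].append(train_loss)
    metrics["train_accs"].append(train_acc)
    metrics["valid_losses"].append(valid_loss)
    metrics["valid_accs"].append(valid_acc)
    print(f"epoch: {epoch}")
    print(f"train_loss: {train_loss:.3f}, train_acc: {train_acc:.3f}")
    print(f"valid_loss: {valid_loss:.3f}, valid_acc: {valid_acc:.3f}")
    # checkpoint on validation accuracy, the metric the task requires
    if valid_acc > best_valid_acc:
        best_valid_acc = valid_acc
        epochs_without_improvement = 0
        torch.save(model.state_dict(), "rnn.pt")
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"early stopping: no validation-accuracy gain for {patience} epochs")
            break

# evaluate the best checkpoint on the test set
model.load_state_dict(torch.load("rnn.pt"))
test_loss, test_acc = evaluate(test_loader, model, criterion, device)
print(f"test_loss: {test_loss:.3f}, test_acc: {test_acc:.3f}")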

### Question 2. RNN
(a) Report the final configuration of your best model, namely the number of training epochs,
learning rate, optimizer, batch size.   

(b) Report the accuracy score on the test set, as well as the accuracy score on the validation
set for each epoch during training.   

(c) RNNs produce a hidden vector for each word, instead of the entire sentence. Which methods
have you tried in deriving the final sentence representation to perform sentiment classification?
Describe all the strategies you have implemented, together with their accuracy scores on the
test set.
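Besides the hidden state at the last real token, two common ways to turn per-word RNN outputs into a sentence vector are masked mean pooling and masked max pooling, both of which must ignore padding positions. A minimal sketch, assuming right-padded inputs and a boolean `mask` that is `True` at real tokens; `pool_hidden_states` is a hypothetical helper, and the accuracy of each strategy still has to be measured on the test set.

```python
import torch


def pool_hidden_states(outputs: torch.Tensor, mask: torch.Tensor, method: str) -> torch.Tensor:
    """Aggregate per-token RNN outputs into one vector per sentence.

    outputs: (batch, seq_len, hidden_dim); mask: (batch, seq_len), True at real tokens.
    Assumes right padding and at least one real token per sentence.
    """
    if method == "last":
        last_idx = mask.sum(dim=1) - 1  # position of the last real token
        return outputs[torch.arange(outputs.size(0)), last_idx]
    if method == "mean":
        m = mask.unsqueeze(-1).to(outputs.dtype)
        return (outputs * m).sum(dim=1) / m.sum(dim=1)
    if method == "max":
        # Padding positions become -inf so they never win the max.
        masked = outputs.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        return masked.max(dim=1).values
    raise ValueError(f"unknown pooling method: {method}")


outputs = torch.tensor([[[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]])  # one sentence, 3 steps
mask = torch.tensor([[True, True, False]])                      # last step is padding
print(pool_hidden_states(outputs, mask, "last"))  # tensor([[3., 2.]])
print(pool_hidden_states(outputs, mask, "mean"))  # tensor([[2., 3.]])
print(pool_hidden_states(outputs, mask, "max"))   # tensor([[3., 4.]])
```

The padding step at position 3 has large values on purpose: all three strategies ignore it, which is the point of masking.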