# CNN model for Sentiment Classification

## Introduction

In this notebook, we approach a text classification problem with a CNN model. The dataset used for this project can be found [here](https://github.com/andreasceid/sentiment_classification/blob/main/dataset/MoviesDataset.csv). On this dataset, the model reaches an accuracy of approximately 77% (validation accuracy in the run recorded in this notebook). The model was also tested on the IMDB dataset, which can be found [here](https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews), where the accuracy improved to approximately 91%. The variable to be classified is the sentiment of a given review. Reviews are labeled either *good* or *bad*, so the classification is binary. Let's explore the proposed model!

### Import the Required Libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchtext import data

from nltk import word_tokenize
from collections import Counter
from tqdm.notebook import tnrange, tqdm

import re
import time
import math
import spacy
import numpy
import pandas
import random
import warnings
import matplotlib.pyplot as plt

## Define the Model

The model is composed of:
* An Embedding Layer.
* (Four) 2D Convolutional Layers.
* A Dropout Layer.
* A Fully Connected Layer.

The Embedding Layer is used as a lookup table that maps each token to a word vector. Because each review consists of many words, stacking these word vectors yields a matrix. This matrix is our *image*. On that image, we apply the filters of the four *Convolutional* layers. The filtered results are then used as input to a *Fully Connected* layer that performs the (binary) sentiment classification. To regularize the model, we also apply a *Dropout* layer.

![Embedding Demo](https://miro.medium.com/max/501/1*A094Vuq3OiLFVD2ogxUS7Q.gif)

In [2]:
class CNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, dropout, pad_idx):
        # extends the functionality of this method
        super(CNN, self).__init__()
        # defines an embedding layer
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
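        # NOTE: this sets an attribute on the module, not on its weight tensor, so the
        # embedding stays trainable; self.embedding.weight.requires_grad = False would freeze it
        self.embedding.requires_grad = False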
        # applies convolution over the input signal
        self.convs_1d = nn.ModuleList([nn.Conv2d(1, n_filters, (k, embedding_dim), padding=(k - 2, 0)) for k in filter_sizes])
        # applies linear transformation to the convolved data
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        # regularizes and prevents the co-adaptation of neurons
        self.dropout = nn.Dropout(dropout)

    @staticmethod
    def conv_and_pool(x, conv):
        x = F.relu(conv(x)).squeeze(3)
        x_max = F.max_pool1d(x, x.size(2)).squeeze(2)
        return x_max

    def forward(self, x):
        # embedded vectors of: (batch_size, seq_length, embedding_dim)
        embeds = self.embedding(x)
        # creates a fourth dimension for the convolutional module list
        embeds = embeds.unsqueeze(1)
        # gets output of each convolutional layer
        conv_results = [self.conv_and_pool(embeds, conv) for conv in self.convs_1d]
        # concatenates results
        x = torch.cat(conv_results, 1)
        # add dropout
        x = self.dropout(x)
        # fully connected layer that yields logits of shape (batch_size, output_dim)
        logit = self.fc(x)
        return logit
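
The effect of the `padding=(k - 2, 0)` rule on the token axis can be checked with plain arithmetic. The sketch below is only an illustration: `conv_output_length` is a hypothetical helper (not part of the notebook), the review length of 10 tokens is made up, and the formula is the standard convolution output-size formula for dilation 1.

```python
def conv_output_length(seq_len, kernel_size, padding, stride=1):
    """Length of the token axis after a convolution with dilation 1."""
    return (seq_len + 2 * padding - kernel_size) // stride + 1


seq_len = 10  # hypothetical review length in tokens
for k in [2, 3, 4, 5]:
    padding = k - 2  # same padding rule as in the CNN class
    out_len = conv_output_length(seq_len, k, padding)
    print(f"kernel={k} padding={padding} -> {out_len} positions per filter")
```

Whatever the number of positions, `conv_and_pool` max-pools each filter down to a single value, so the concatenated feature vector always has `len(filter_sizes) * n_filters` entries, which is the input size of `self.fc`.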

## Preprocess

In this project, the dataset provided for *Sentiment Classification* is already preprocessed: it contains no capital letters, and punctuation marks have already been removed. Therefore, the methods defined below do not increase the model's accuracy on it. However, if one uses the IMDB dataset for sentiment classification, these methods might actually boost the model's accuracy.

In [3]:
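def nlp_preprocessor(text):
    # strip HTML tags such as <br />
    text = re.sub(r'<[^>]*>', '', text)
    # collect emoticons such as :-) or ;D before punctuation is removed
    emoticons = re.findall(r'(?::|;|=)(?:-)?[)(DP]', text)
    # lowercase, collapse non-word characters into spaces, then re-append the emoticons without the '-'
    text = re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')
    return text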
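
To see what the preprocessor does to raw text, here is a small self-contained sketch. `clean_text` is a hypothetical stand-in that repeats the same three steps, and the sample string is made up in the style of an IMDB review; it is not taken from either dataset.

```python
import re


def clean_text(text):
    # same three steps as nlp_preprocessor: drop HTML tags, save emoticons, normalize
    text = re.sub(r'<[^>]*>', '', text)
    emoticons = re.findall(r'(?::|;|=)(?:-)?[)(DP]', text)
    return re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')


sample = "<br />This movie was GREAT :-) !!"  # made-up example input
print(clean_text(sample))  # this movie was great :)
```

The HTML tag and the exclamation marks disappear, the text is lowercased, and the emoticon survives at the end of the string as `:)`.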

In [4]:
def dataset_preprocessor(df, column, filepath):
    # apply the preprocessor to the dataframe
    df[column] = df[column].apply(nlp_preprocessor)
    # save data
    df.to_csv(filepath, index=False)

## Split the Dataset

The train-validate-test split is a technique for evaluating the performance of a machine learning model. It can be used for any supervised learning algorithm. The procedure involves taking a dataset and dividing it into three subsets:
* Training Dataset: The sample of data used to fit the model.
* Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
* Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.

This technique helps us guard against look-ahead bias, overfitting and underfitting:
* Look-ahead bias: Building a model based on data that is not supposed to be known.
* Overfitting: This is the process of designing a model that adapts so closely to historical data that it becomes ineffective in the future.
* Underfitting: This is the process of designing a model that adapts so loosely to historical data that it becomes ineffective in the future.

The figure below illustrates the train-validate-test split technique:

![train-validate-test Demo](https://miro.medium.com/max/1200/1*HEe_oHZHToY8oD1RoShHGg.png)

In [5]:
def train_validate_test_split(df, seed, train_percent=.7, validate_percent=.1):
    # shuffle the given dataframe indexes
    shuffled = numpy.random.RandomState(seed).permutation(df.index)
    # get the number of rows inside the dataframe
    data_length = len(df.index)
    # compute the number of rows for the training dataset
    train_end = int(train_percent * data_length)
    # round the training size up to a multiple of the batch size
    train_end = int(train_end/BATCH_SIZE) * BATCH_SIZE + BATCH_SIZE
    # compute the end row of the validation dataset
    validate_end = int(validate_percent * data_length) + train_end
    # round the validation end row up to a multiple of the batch size
    validate_end = int(validate_end / BATCH_SIZE) * BATCH_SIZE + BATCH_SIZE
    # round the test end row down to a multiple of the batch size (leftover rows are dropped)
    test_end = int(data_length / BATCH_SIZE) * BATCH_SIZE
    # set the training dataset
    train_df = df.iloc[shuffled[:train_end]]
    # set the validation dataset
    valid_df = df.iloc[shuffled[train_end:validate_end]]
    # set the test dataset
    test_df = df.iloc[shuffled[validate_end:test_end]]
    # save the training dataset
    train_df.to_csv('train_df.csv', index=False)
    # save the validation dataset
    valid_df.to_csv('valid_df.csv', index=False)
    # save the test dataset
    test_df.to_csv('test_df.csv', index=False)
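
The boundary arithmetic of the split can be checked in isolation. The sketch below is a simplified illustration: `split_boundaries` is a hypothetical helper that mirrors the rounding steps of `train_validate_test_split`, and the 1,000-row dataset is an assumed size, not the size of the actual dataset.

```python
def split_boundaries(n_rows, batch_size, train_percent=0.7, validate_percent=0.1):
    # training end row, rounded up to a multiple of the batch size
    train_end = int(train_percent * n_rows) // batch_size * batch_size + batch_size
    # validation end row, rounded up to a multiple of the batch size
    validate_end = int(validate_percent * n_rows) + train_end
    validate_end = validate_end // batch_size * batch_size + batch_size
    # test end row, rounded down so that the last batch is full
    test_end = n_rows // batch_size * batch_size
    return train_end, validate_end, test_end


train_end, validate_end, test_end = split_boundaries(n_rows=1000, batch_size=32)
sizes = (train_end, validate_end - train_end, test_end - validate_end)
print(sizes)  # (704, 128, 160)
```

Every subset size is a multiple of 32, and the 8 rows beyond `test_end` are left out of all three subsets.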

## Vocabulary inspection

In the given dataset, some tokens appear only a few times. Those tokens have to be excluded from the classification task, because the classifier cannot find a pattern when there is insufficient data on a given token. There are two vocabulary preprocessing techniques for this:
* Subsampling frequent words
* Deleting rare words

We combine the two methods and come up with a filtering formula:

$$
p \; = \; 1 \; - \; \sqrt{\frac{t}{f}}
$$
where:
* $p$ is a score that determines whether a token is included in the dictionary. If $p$ is greater than $0.5$, there is enough data on that token, so it is included in the vocabulary.
* $t$ is a threshold that, based on [research](https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00134), is set to $10^{-5}$.
* $f$ is the relative frequency of the token, equal to $\frac{\text{number of token appearances}}{\text{total number of tokens}}$.

With this method, the effective *context window* size increases. The context window is the area around a word that is used to encode its embedding. This usually provides slightly better results regarding the preprocessed word embeddings.
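
A quick worked example makes the cutoff concrete. Rearranging $p > 0.5$ gives $\sqrt{t/f} < 0.5$, that is $f > 4t = 4 \times 10^{-5}$. The sketch below assumes a hypothetical corpus of 200,000 tokens and uses the same frequency definition as the `inspect_vocab` code (occurrences divided by the total token count); `keep_score` is an illustrative name, not a function from the notebook.

```python
import math


def keep_score(count, total_tokens, threshold=1e-5):
    # relative frequency of the token in the corpus
    f = count / total_tokens
    # filtering score p = 1 - sqrt(t / f)
    return 1 - math.sqrt(threshold / f)


total = 200_000  # hypothetical corpus size in tokens
for count in (5, 20, 2000):
    p = keep_score(count, total)
    print(f"count={count:5d}  p={p:.3f}  keep={p > 0.5}")
```

In a corpus of that size, a token seen 5 times falls below the cutoff, while tokens seen 20 or 2,000 times are kept.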

In [6]:
def inspect_vocab(df):
    # initialize vocabulary size register
    unique_count = 0
    # get the text column loaded in a pandas Series
    texts = df.Summary.str.lower()
    # get a dictionary with the count of each token in that pandas Series
    word_counts = Counter(word_tokenize('\n'.join(texts)))
    # get the total token sum
    total_token_count = sum(word_counts.values())
    # get the unique token sum
    final_count = len(word_counts)
    # initialize threshold constant
    threshold = 1e-5
    # use the subsampling formula to estimate the vocabulary size
    for token_freq in word_counts.values():
        if 1 - math.sqrt(threshold / (token_freq / total_token_count)) > 0.5:
            unique_count += 1
    return unique_count, final_count

In [7]:
def compute_vocab_size():
    if subsampling:
        return vocab_subsampled
    else:
        return token_count

## Utilities

Throughout the notebook, we use some helper methods:
* We define a method that counts all the trainable parameters in the model. This gives a rough measure of the model's size: more trainable parameters generally mean higher memory use and longer training, so we prefer lightweight models.
* We define a method that estimates the model's accuracy in a binary classification task.
* We define a method that finds the maximum sentence length in the given dataset. This is useful when the *text preprocessor* adds *padding* to sentences. However, for this dataset (and for the IMDB dataset), there is no such need.

In [8]:
def count_parameters():
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [9]:
def binary_accuracy(preds, y):
    # apply the sigmoid to the logits and round to the nearest class (0 or 1)
    rounded_preds = torch.round(torch.sigmoid(preds))
    # count the correct predictions by comparing them to the ground truth tensor
    correct = (rounded_preds == y).float()
    # compute the accuracy of the model
    acc = correct.sum() / len(correct)
    # return the accuracy
    return acc
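
The same computation can be followed by hand on a tiny batch. This is a plain-Python sketch rather than the notebook's tensor code: `binary_accuracy_plain` is a hypothetical name, and the four logits and labels are made up for illustration.

```python
import math


def binary_accuracy_plain(logits, labels):
    # sigmoid maps each logit to a probability in (0, 1)
    probs = [1 / (1 + math.exp(-z)) for z in logits]
    # probabilities above 0.5 (logits above 0) are predicted as class 1
    preds = [1.0 if p > 0.5 else 0.0 for p in probs]
    # fraction of predictions that match the ground truth
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


logits = [2.0, -1.0, 0.3, -0.2]  # made-up model outputs
labels = [1.0, 0.0, 0.0, 0.0]    # made-up ground truth
print(binary_accuracy_plain(logits, labels))  # 0.75
```

Three of the four predictions match the labels; the third logit is slightly positive and is therefore classified as 1 although its label is 0.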

In [10]:
def get_max_length(df):
    # initializes maximum sentence length register
    max_len = 0
    # iterates the Summary column of the given dataframe
    for text in df.Summary:
        # checks length of the "running" sentence
        if len(text.split()) > max_len:
            # update maximum sentence length register
            max_len = len(text.split())
    return max_len

## Fit the model

To fit the model, two methods are defined. The first method trains the model and the second evaluates it. There are also additional methods that:
* Estimate each epoch's duration
* Plot the model's training and validation curves

In [11]:
def train(iterator):
    # initializes epoch loss accumulator
    epoch_loss = 0
    # initializes epoch accuracy accumulator
    epoch_acc = 0
    # sets the module in training mode
    model.train()
    for batch in tqdm(iterator, desc="Train"):
        # set the gradients to zero
        optimizer.zero_grad()
        # make predictions
        predictions = model(batch.Summary).squeeze(1)
        # compute loss
        loss = criterion(predictions, batch.Sentiment.squeeze(0))
        # compute accuracy
        acc = binary_accuracy(predictions, batch.Sentiment.squeeze(0))
        # backpropagate to compute the gradients
        loss.backward()
        # parameter update based on the current gradients
        optimizer.step()
        # update epoch loss accumulator
        epoch_loss += loss.item()
        # update epoch accuracy accumulator
        epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [12]:
def evaluate(iterator):
    # initializes epoch loss accumulator
    epoch_loss = 0
    # initializes epoch accuracy accumulator
    epoch_acc = 0
    # sets the module in evaluation mode
    model.eval()
    # disables gradient calculation
    with torch.no_grad():
        for batch in tqdm(iterator, desc="Validate"):
            # make predictions
            predictions = model(batch.Summary).squeeze(1)
            # compute loss
            loss = criterion(predictions, batch.Sentiment.squeeze(0))
            # compute accuracy
            acc = binary_accuracy(predictions, batch.Sentiment.squeeze(0))
            # update epoch loss accumulator
            epoch_loss += loss.item()
            # update epoch accuracy accumulator
            epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [13]:
def epoch_time():
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [14]:
def plot_loss_and_accuracy():
    plt.plot(train_losses, label="Training loss")
    plt.plot(val_losses, label="Validation loss")
    plt.legend()
    plt.title("Losses")
    plt.savefig("model-train_valid_losses.png", dpi=300, bbox_inches='tight', pad_inches=0.1)

## Test the model with custom reviews

We also define methods that classify custom reviews. One method runs the model on a single review and returns its prediction, a second prints the result to the user, and a third defines the sample reviews and calls the other two.

In [15]:
def predict_sentiment(sentence, min_len=5):
    # load natural language processor
    nlp = spacy.load('en')
    # set the module in evaluation mode
    model.eval()
    # tokenize given text using the defined processor
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    # pad the sentence if it has fewer tokens than required
    if len(tokenized) < min_len:
        tokenized += ['<pad>'] * (min_len - len(tokenized))
    # convert tokens to vocabulary indices using the fitted torchtext data field
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    # convert the index list to a torch tensor and load it to the available device
    tensor = torch.LongTensor(indexed).to(device)
    # unsqueeze tensor to make it 2D (a batch of one sentence)
    tensor = tensor.unsqueeze(0)
    # map the logit to a probability using the sigmoid
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()

In [16]:
def filter_prediction(prediction, critic):
    message = "negative"
    if prediction < 0.5:
        message = "positive"
        prediction = 1 - prediction
    print('Label for critic {:25s}: {:7s}\t-\tPrediction validity probability: {:10f}'.format('\"'+critic+'\"', message, prediction))

In [17]:
def manual_testing():
    x_critic = "This film is terrible"
    y_pred = predict_sentiment(x_critic)
    filter_prediction(y_pred, x_critic)
    x_critic = "This film is great"
    y_pred = predict_sentiment(x_critic)
    filter_prediction(y_pred, x_critic)
    x_critic = "I loved this film"
    y_pred = predict_sentiment(x_critic)
    filter_prediction(y_pred, x_critic)

## Module Binding

We now bind all the methods defined above to perform (binary) sentiment classification.

### Deactivate any user Deprecation warnings

In [18]:
# disable warnings
warnings.filterwarnings("ignore")

### Seed random generators

In [19]:
# define a seed for the randomizers
SEED = 42
# seed random package
random.seed(SEED)
# seed numpy
numpy.random.seed(SEED)
# seed pytorch
torch.manual_seed(SEED)
# make cuDNN operations deterministic for reproducibility
torch.backends.cudnn.deterministic = True

### Load Natural Language Processor

In [20]:
# load spaCy's English language pipeline
spacy.load('en')

<spacy.lang.en.English object at 0x000002D10B0D4AC0>

### Search for a GPU for faster model training

In [21]:
# check for any CUDA device available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### Define path to save trained model

In [22]:
# define filepath to save the model
model_filepath = 'cnn-model.pt'

### Load and preprocess the dataset

In [23]:
# defines the input's filepath
dataset_filepath = "C:/Users/andreas/PycharmProjects/sentiment_classification/dataset/MoviesDataset.csv"
# load the dataset
dataset = pandas.read_csv(dataset_filepath)

In [24]:
# preprocess the dataset and save the result
dataset_preprocessor(dataset, 'Summary', "C:/Users/andreas/PycharmProjects/sentiment_classification/dataset/MoviesDatasetPreprocessed.csv")
# reload dataset after preprocessing
dataset = pandas.read_csv("C:/Users/andreas/PycharmProjects/sentiment_classification/dataset/MoviesDatasetPreprocessed.csv")

In [25]:
# inspect vocabulary
vocab_subsampled, token_count = inspect_vocab(dataset)
# set subsampling flag
subsampling = True
# set vocabulary size
vocab_size = compute_vocab_size()

### Split the dataset

In [26]:
# define batch size
BATCH_SIZE = 32
# split the given dataset
train_validate_test_split(dataset, SEED)

### Declare the pretrained word embeddings

In [27]:
# define torchtext data text field
TEXT = data.Field(tokenize='spacy', batch_first=True)
# define torchtext data label field
LABEL = data.Field(dtype=torch.float, unk_token=None, pad_token=None)
# associate defined fields with DataFrame columns
fields = [('Summary', TEXT), ('Sentiment', LABEL)]
# define a dataset of columns stored in CSV
train_data, valid_data, test_data = data.TabularDataset.splits(
    path='./',
    train='train_df.csv',
    validation='valid_df.csv',
    test='test_df.csv',
    format='csv',
    fields=fields,
    skip_header=True
)

In [28]:
# construct the Vocab object for the TEXT field
TEXT.build_vocab(train_data, valid_data, test_data,
                 max_size=vocab_size,
                 vectors="glove.6B.100d",
                 unk_init=torch.Tensor.normal_)
# construct the Vocab object for the LABEL field
LABEL.build_vocab(train_data)

In [29]:
# define an iterator that batches the training dataset object
train_iterator = data.BucketIterator(
    train_data,
    batch_size=BATCH_SIZE,
    device=device,
)
# define an iterator that batches the validation dataset object
valid_iterator = data.BucketIterator(
    valid_data,
    batch_size=BATCH_SIZE,
    device=device,
)
# define an iterator that batches the test dataset object
test_iterator = data.BucketIterator(
    test_data,
    batch_size=BATCH_SIZE,
    device=device,
)

### Initialize the model

In [30]:
# define the size of the dictionary of embeddings
INPUT_DIM = len(TEXT.vocab)
# define the size of each embedding vector
EMBEDDING_DIM = 100
# define the number of channels produced by each convolution
N_FILTERS = 64
# define the size of the first dimension of the kernel of each convolutional layer
FILTER_SIZES = [2, 3, 4, 5]
# define the number of neurons in the output layer of the model
OUTPUT_DIM = 1
# define the probability of an element to be zeroed
DROPOUT = 0.3
# return the index of the string token used as padding
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
# return the index of the string token used to represent Out-Of-Vocabulary words
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

In [31]:
# define a CNN model
model = CNN(INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX)

In [32]:
# print model summary
print(model)

CNN(
  (embedding): Embedding(4250, 100, padding_idx=1)
  (convs_1d): ModuleList(
    (0): Conv2d(1, 64, kernel_size=(2, 100), stride=(1, 1))
    (1): Conv2d(1, 64, kernel_size=(3, 100), stride=(1, 1), padding=(1, 0))
    (2): Conv2d(1, 64, kernel_size=(4, 100), stride=(1, 1), padding=(2, 0))
    (3): Conv2d(1, 64, kernel_size=(5, 100), stride=(1, 1), padding=(3, 0))
  )
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)


In [33]:
# print model trainable parameters
print(f'The model has {count_parameters():,} trainable parameters')

The model has 515,113 trainable parameters
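
The reported figure can be reproduced by hand from the layer shapes printed in the model summary. The sketch below uses only plain arithmetic; the variable names are illustrative.

```python
vocab_size, embedding_dim = 4250, 100
n_filters, filter_sizes, output_dim = 64, [2, 3, 4, 5], 1

# embedding table: one vector per vocabulary entry
embedding_params = vocab_size * embedding_dim
# each Conv2d(1, n_filters, (k, embedding_dim)) has k * embedding_dim weights per filter plus one bias
conv_params = sum(k * embedding_dim * n_filters + n_filters for k in filter_sizes)
# fully connected layer: weights plus bias
fc_params = len(filter_sizes) * n_filters * output_dim + output_dim

total = embedding_params + conv_params + fc_params
print(f"{embedding_params:,} + {conv_params:,} + {fc_params:,} = {total:,}")
# 425,000 + 89,856 + 257 = 515,113
```

The total only matches the printed count when the 425,000 embedding weights are included, which means the embedding layer is counted among the trainable parameters.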


In [34]:
# extract pretrained vectors
pretrained_embeddings = TEXT.vocab.vectors
# copy pretrained vectors to the embedding layer of the defined model
model.embedding.weight.data.copy_(pretrained_embeddings)
# set the weight of the <pad> token
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)
# set the weight of the <unk> token
model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)

### Initialize model's hyperparameters

In [35]:
# define optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-4)
# define cost function (sigmoid followed by binary cross-entropy, computed on raw logits)
criterion = nn.BCEWithLogitsLoss()
# load model to the available device
model = model.to(device)
# load cost function to the available device
criterion = criterion.to(device)
# define number of epochs for the model's training
N_EPOCHS = 20

### Fit the model

In [36]:
# initialize a register that holds the best validation cost returned during an epoch
best_valid_loss = float('inf')
# declare the train and validation loss lists
train_losses, val_losses = [], []

In [37]:
# fit the model
for epoch in tnrange(N_EPOCHS, desc='Fit'):
    # initialize an epoch starting time-point
    start_time = time.time()
    # train the model
    train_loss, train_acc = train(train_iterator)
    # update the train loss list
    train_losses.append(train_loss)
    # validate the model
    valid_loss, valid_acc = evaluate(valid_iterator)
    # update the validation loss list
    val_losses.append(valid_loss)
    # initialize an epoch ending time-point
    end_time = time.time()
    # compute epoch duration in minutes and seconds
    epoch_mins, epoch_secs = epoch_time()
    # save the model if validation loss was better than past validation losses
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), model_filepath)
    # print epoch's progress results
    print(f'\nEpoch: {epoch + 1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc * 100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc * 100:.2f}%\n')

Epoch: 01 | Epoch Time: 0m 3s
	Train Loss: 0.675 | Train Acc: 58.43%
	 Val. Loss: 0.649 |  Val. Acc: 69.39%

Epoch: 02 | Epoch Time: 0m 2s
	Train Loss: 0.628 | Train Acc: 68.70%
	 Val. Loss: 0.610 |  Val. Acc: 70.96%

Epoch: 03 | Epoch Time: 0m 2s
	Train Loss: 0.584 | Train Acc: 71.89%
	 Val. Loss: 0.582 |  Val. Acc: 70.59%

Epoch: 04 | Epoch Time: 0m 2s
	Train Loss: 0.545 | Train Acc: 74.47%
	 Val. Loss: 0.553 |  Val. Acc: 73.16%

Epoch: 05 | Epoch Time: 0m 2s
	Train Loss: 0.512 | Train Acc: 76.31%
	 Val. Loss: 0.536 |  Val. Acc: 74.54%

Epoch: 06 | Epoch Time: 0m 2s
	Train Loss: 0.481 | Train Acc: 78.04%
	 Val. Loss: 0.525 |  Val. Acc: 75.00%

Epoch: 07 | Epoch Time: 0m 2s
	Train Loss: 0.458 | Train Acc: 79.46%
	 Val. Loss: 0.514 |  Val. Acc: 75.55%

Epoch: 08 | Epoch Time: 0m 2s
	Train Loss: 0.437 | Train Acc: 80.45%
	 Val. Loss: 0.509 |  Val. Acc: 75.46%

Epoch: 09 | Epoch Time: 0m 2s
	Train Loss: 0.416 | Train Acc: 81.82%
	 Val. Loss: 0.503 |  Val. Acc: 76.19%

Epoch: 10 | Epoch Time: 0m 2s
	Train Loss: 0.396 | Train Acc: 83.04%
	 Val. Loss: 0.501 |  Val. Acc: 75.64%

Epoch: 11 | Epoch Time: 0m 2s
	Train Loss: 0.379 | Train Acc: 84.07%
	 Val. Loss: 0.499 |  Val. Acc: 76.29%

Epoch: 12 | Epoch Time: 0m 2s
	Train Loss: 0.361 | Train Acc: 84.87%
	 Val. Loss: 0.496 |  Val. Acc: 76.38%

Epoch: 13 | Epoch Time: 0m 2s
	Train Loss: 0.345 | Train Acc: 85.72%
	 Val. Loss: 0.496 |  Val. Acc: 76.47%

Epoch: 14 | Epoch Time: 0m 2s
	Train Loss: 0.327 | Train Acc: 86.82%
	 Val. Loss: 0.495 |  Val. Acc: 77.02%

Epoch: 15 | Epoch Time: 0m 2s
	Train Loss: 0.315 | Train Acc: 87.55%
	 Val. Loss: 0.495 |  Val. Acc: 76.47%

Epoch: 16 | Epoch Time: 0m 2s
	Train Loss: 0.298 | Train Acc: 88.78%
	 Val. Loss: 0.497 |  Val. Acc: 77.67%

Epoch: 17 | Epoch Time: 0m 2s
	Train Loss: 0.285 | Train Acc: 89.28%
	 Val. Loss: 0.500 |  Val. Acc: 77.11%

Epoch: 18 | Epoch Time: 0m 2s
	Train Loss: 0.266 | Train Acc: 89.65%
	 Val. Loss: 0.500 |  Val. Acc: 77.11%

Epoch: 19 | Epoch Time: 0m 2s
	Train Loss: 0.258 | Train Acc: 90.56%
	 Val. Loss: 0.505 |  Val. Acc: 76.29%

Epoch: 20 | Epoch Time: 0m 2s
	Train Loss: 0.240 | Train Acc: 90.96%
	 Val. Loss: 0.504 |  Val. Acc: 76.56%

In [38]:
# plot model's fitting data
plot_loss_and_accuracy()

### Reload the model

In [39]:
# load the best evaluated model
model.load_state_dict(torch.load(model_filepath, map_location=device))

<All keys matched successfully>

### Test the model

In [40]:
# test the model
test_loss, test_acc = evaluate(test_iterator)
# print test results
print(f'\nTest Loss: {test_loss:.3f} | Test Acc: {test_acc * 100:.2f}%')

Test Loss: 0.519 | Test Acc: 74.57%


In [41]:
# test the model over custom critics
manual_testing()

Label for critic "This film is terrible"  : negative	-	Prediction validity probability:   0.826478
Label for critic "This film is great"     : positive	-	Prediction validity probability:   0.796404
Label for critic "I loved this film"      : positive	-	Prediction validity probability:   0.837794
