# LELA32051 Computational Linguistics Week 4

This week we are going to take a close look at 1-layer neural networks, also known as perceptrons. These were introduced in the abstract in the lecture; in this seminar we are going to look at how they work in practice.

Perceptrons are commonly used as binary classifiers - applying one of two possible labels to input. The example that we are going to look at today is sentiment classification, where we classify a text as having either a "negative" or "positive" perspective on whatever it is discussing, e.g. a product it is reviewing.

Note: the code is heavily based on examples in Chapter 3 of Rao, D., & McMahan, B. (2019). Natural language processing with PyTorch: build intelligent language applications using deep learning. O'Reilly Media, Inc.

In [None]:
!wget https://raw.githubusercontent.com/cbannard/compling23/main/CL_Week_4_Materials/model.pth
!wget https://raw.githubusercontent.com/cbannard/compling23/main/CL_Week_4_Materials/nn_tools.py
!wget https://raw.githubusercontent.com/cbannard/compling23/main/CL_Week_4_Materials/nn_tools2.py
!wget https://raw.githubusercontent.com/cbannard/compling23/main/CL_Week_4_Materials/reviews_with_splits_lite.csv
!wget https://raw.githubusercontent.com/cbannard/compling23/main/CL_Week_4_Materials/vectorizer.json

### Importing modules

The most important thing we are importing here is PyTorch (https://pytorch.org/). This is one of the two widely used neural network/deep learning packages, the other being TensorFlow (https://www.tensorflow.org/). Both are great! We use PyTorch here because we have to choose and it is slightly more intuitive when first encountered.

In [None]:
from argparse import Namespace
from collections import Counter
import json
import os
import re
import string

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated
from nn_tools import Vocabulary, ReviewVectorizer, ReviewDataset, ReviewClassifier
from nn_tools2 import *

import matplotlib.pyplot as plt
%matplotlib inline

### Organizing code in Python (a very brief intro)



### Functions
The first thing you will notice is that we are starting to define our own functions (https://www.w3schools.com/python/python_functions.asp):


In [None]:
def tell_my_name(name):
    return "My name is " + name

In [None]:
print(tell_my_name("Colin"))

### Objects

A second thing you will see is that we start to define our own classes, i.e. new types of object (https://www.w3schools.com/python/python_classes.asp):

In [None]:
class Agent:
    @staticmethod  # the method needs no object (self), only the name passed in
    def tell_my_name(name):
        return "My name is " + name

In [None]:
print(Agent.tell_my_name("Colin"))

As well as being a way to organize functions, a class can be used to store and group variables (called attributes), which its functions (called methods) can then access through self:

In [None]:
class Agent:
    def __init__(self, name="", workplace=""):
        self.name = name
        self.workplace = workplace

    def introduce_myself(self):
        return "My name is " + self.name + " and I work in " + self.workplace

In [None]:
agent_colin = Agent("Colin","Manchester")

In [None]:
print(agent_colin.introduce_myself())
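
Classes can also be built on top of one another through inheritance: a subclass automatically gets the attributes and methods of its parent class and can add new ones. The minimal sketch below uses a hypothetical Lecturer subclass and teach method. Neither is part of the seminar code. The Agent class is repeated so that the cell runs on its own.

```python
class Agent:
    def __init__(self, name="", workplace=""):
        self.name = name
        self.workplace = workplace

    def introduce_myself(self):
        return "My name is " + self.name + " and I work in " + self.workplace


# Lecturer is a hypothetical subclass: it inherits __init__ and
# introduce_myself from Agent and adds one new method of its own.
class Lecturer(Agent):
    def teach(self, course):
        return self.name + " teaches " + course


lecturer_colin = Lecturer("Colin", "Manchester")
print(lecturer_colin.introduce_myself())  # inherited from Agent
print(lecturer_colin.teach("LELA32051"))  # defined only in Lecturer
```

The Perceptron class in the next section works in the same way. It inherits from nn.Module and only adds what is specific to it.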

## Single Layer Networks in PyTorch


### Defining our Single layer network (aka Perceptron)

We need to define our object type Perceptron. In doing so we make use of perhaps the most powerful property of objects, which is inheritance: we define Perceptron as a subclass of the PyTorch class nn.Module and thereby inherit all of the attributes and methods of that class (for example, the machinery for registering the model's weights, moving them between CPU and GPU, and saving them).

In [None]:
class Perceptron(nn.Module):
    """ A Perceptron is one Linear layer """

    def __init__(self, input_dim):
        """
        Args:
            input_dim (int): size of the input features
        """
        super(Perceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        """The forward pass of the perceptron

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
        Returns:
            the resulting tensor. tensor.shape should be (batch, 1)
        """
        return torch.sigmoid(self.fc1(x_in))
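
For a single input, forward computes a weighted sum of the input features plus a bias term. It then squashes that sum into the range (0, 1) with the sigmoid function, σ(z) = 1 / (1 + e^(−z)). The plain-Python sketch below spells out this arithmetic with made-up weights. It illustrates what nn.Linear followed by torch.sigmoid computes; it is not how PyTorch implements it.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron_forward(x, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid.

    Mirrors torch.sigmoid(self.fc1(x_in)) for one example,
    where fc1 = nn.Linear(input_dim, 1).
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


# Toy example with input_dim = 3 (weights and bias are invented)
x = [1.0, 0.0, 1.0]
weights = [0.5, -2.0, 1.5]
bias = -1.0
print(perceptron_forward(x, weights, bias))  # sigmoid(1.0), roughly 0.731
```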

## Classifying Yelp Reviews

We have 56,000 Yelp reviews, each classified as negative (1 or 2 stars) or positive (3 or 4 stars). We are going to train a classifier on part of this data and test its performance on another part.

### Settings and some prep work

We first load our data and set some parameters for use in training. We are also going to load a pretrained model rather than training one ourselves, to save time. The training code is included at the end of the notebook if you want to try it out.

In [None]:
args = Namespace(
    # Data and Path information
    frequency_cutoff=25,
    model_state_file='model.pth',
    review_csv='reviews_with_splits_lite.csv',
    save_dir='.',
    vectorizer_file='vectorizer.json',
    # No Model hyper parameters
    # Training hyper parameters
    batch_size=128,
    early_stopping_criteria=5,
    learning_rate=0.001,
    num_epochs=100,
    seed=1337,
    # Runtime options
    catch_keyboard_interrupt=True,
    cuda=True,
    expand_filepaths_to_save_dir=True,
    reload_from_files=False,
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

print("Using CUDA: {}".format(args.cuda))

args.device = torch.device("cuda" if args.cuda else "cpu")

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)
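
set_seed_everywhere is imported from nn_tools2, so its code is not shown here. It presumably seeds the random number generators used by NumPy and PyTorch, and CUDA when enabled. The principle can be sketched with Python's built-in random module. Seeding a generator with the same value makes it produce the same "random" sequence, so things like weight initialization and data shuffling come out identical from run to run. The helper sample_numbers is hypothetical.

```python
import random


def sample_numbers(seed, n=5):
    """Seed Python's random number generator, then draw n integers."""
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(n)]


first_run = sample_numbers(1337)
second_run = sample_numbers(1337)

print(first_run == second_run)  # True: same seed, same "random" numbers
print(first_run)
```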

In [None]:
reviews = pd.read_csv(args.review_csv)
print(reviews)

### Initializations

In [None]:
if args.reload_from_files:
    # training from a checkpoint
    print("Loading dataset and vectorizer")
    dataset = ReviewDataset.load_dataset_and_load_vectorizer(args.review_csv,
                                                            args.vectorizer_file)
else:
    print("Loading dataset and creating vectorizer")
    # create dataset and vectorizer
    dataset = ReviewDataset.load_dataset_and_make_vectorizer(args.review_csv)
    dataset.save_vectorizer(args.vectorizer_file)
vectorizer = dataset.get_vectorizer()

classifier = ReviewClassifier(num_features=len(vectorizer.review_vocab))

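We are going to use what is known as one-hot coding for words (closely related to what statisticians call dummy coding). For a vocabulary of size N, each word is represented by an N-dimensional vector with the value 1 in the dimension for that word and 0 in all others. A whole review is then represented by a single vector with a 1 in the dimension of every vocabulary word it contains. For example, the first 50 dimensions for the word "can" look like this: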

In [None]:
vectorizer.vectorize('can')[0:50]
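
The same idea works on a toy scale. Each vocabulary word gets its own dimension. A review combines the one-hot vectors of its words into a single "collapsed" one-hot vector. This vector has a 1 for every vocabulary word that occurs in the review, however often, and 0 elsewhere. The sketch below assumes a hypothetical five-word vocabulary and is not the ReviewVectorizer implementation itself.

```python
def one_hot(word, vocab):
    """Vector of len(vocab) zeros with a 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


def collapsed_one_hot(tokens, vocab):
    """Binary bag of words: 1 for every vocabulary word present in the text.

    Words missing from the vocabulary are simply skipped here; a real
    vectorizer may instead map them to an unknown-word token.
    """
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] = 1
    return vector


# Hypothetical vocabulary mapping each word to a dimension
vocab = {"this": 0, "is": 1, "a": 2, "great": 3, "book": 4}

print(one_hot("great", vocab))                                # [0, 0, 0, 1, 0]
print(collapsed_one_hot("a great great book".split(), vocab))  # [0, 0, 1, 1, 1]
```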


The perceptron takes data in this format as input and learns one weight for each dimension, i.e. for each word in the vocabulary:

In [None]:
classifier.fc1.weight.shape

In [None]:
classifier.fc1.weight
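
Each input dimension is either 0 or 1. So the weighted sum w·x + b simply adds up the weights of the words that occur in a review, plus the bias. Each word's weight therefore directly measures how strongly that word pushes a review towards one class or the other. The minimal sketch below uses invented weights for a hypothetical six-word vocabulary and computes the score both ways.

```python
import math

# Hypothetical six-word vocabulary and invented weights, one per word
vocab = {"this": 0, "is": 1, "a": 2, "great": 3, "terrible": 4, "book": 5}
weights = [0.0, 0.1, 0.0, 2.0, -2.5, 0.2]
bias = -0.1


def linear_score(vector, weights, bias):
    """w . x + b: what a Linear layer with one output computes."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


def score_from_words(words, vocab, weights, bias):
    """The same score, found by adding up the weights of the words present."""
    present = {vocab[word] for word in words if word in vocab}
    return bias + sum(weights[i] for i in present)


review = "a great book".split()
# collapsed one-hot vector for the review: [0, 0, 1, 1, 0, 1]
vector = [1 if word in review else 0 for word in vocab]

print(linear_score(vector, weights, bias))
print(math.isclose(linear_score(vector, weights, bias),
                   score_from_words(review, vocab, weights, bias)))  # True
```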

### Classifying instances

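We now define a function predict_rating which will allow us to assign labels to previously unseen reviews. Note that predict_rating applies the sigmoid function itself. The ReviewClassifier we are using returns raw scores (logits) rather than probabilities, unlike the Perceptron class above, which applies the sigmoid inside forward.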

In [None]:
def predict_rating(review, classifier, vectorizer, decision_threshold=0.5):
    """Predict the rating of a review

    Args:
        review (str): the text of the review
        classifier (ReviewClassifier): the trained model
        vectorizer (ReviewVectorizer): the corresponding vectorizer
        decision_threshold (float): The numerical boundary which separates the rating classes
    """
    review = preprocess_text(review)

    vectorized_review = torch.tensor(vectorizer.vectorize(review))
    result = classifier(vectorized_review.view(1, -1))

    probability_value = torch.sigmoid(result).item()
    index = 1
    if probability_value < decision_threshold:
        index = 0

    return vectorizer.rating_vocab.lookup_index(index)
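
predict_rating turns the classifier's raw output (a logit) into a probability with the sigmoid and compares it with decision_threshold. Since σ(0) = 0.5 and σ is increasing, the default threshold of 0.5 means "predict index 1 whenever the logit is zero or positive". Raising the threshold makes the classifier more reluctant to choose index 1. The plain-Python sketch below reproduces that decision rule. The function label_from_logit is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def label_from_logit(logit, decision_threshold=0.5):
    """Same rule as predict_rating: index 1 unless the probability is below the threshold."""
    probability = sigmoid(logit)
    return 0 if probability < decision_threshold else 1


for logit in [-3.0, -0.1, 0.0, 0.1, 3.0]:
    print(logit, round(sigmoid(logit), 3), label_from_logit(logit))

# A stricter threshold flips a weakly positive logit to index 0
print(label_from_logit(0.5), label_from_logit(0.5, decision_threshold=0.7))  # 1 0
```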

In [None]:
test_review = "this is a pretty awesome book"

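# load the pretrained weights; without this the classifier still has its random initial weights
classifier.load_state_dict(torch.load(args.model_state_file, map_location="cpu"))
classifier = classifier.cpu()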
prediction = predict_rating(test_review, classifier, vectorizer, decision_threshold=0.5)
print("{} -> {}".format(test_review, prediction))

In [None]:
test_review = "this is a pretty terrible book"

classifier = classifier.cpu()
prediction = predict_rating(test_review, classifier, vectorizer, decision_threshold=0.5)
print("{} -> {}".format(test_review, prediction))

### Run on Test Data

To evaluate overall performance we can run the classifier on our test data and compare its predicted ratings with the gold-standard labels in order to calculate an accuracy score.

In [None]:
# compute the loss & accuracy on the test set using the best available model
loss_func = nn.BCEWithLogitsLoss()
train_state = make_train_state(args)
classifier.load_state_dict(torch.load(train_state['model_filename'], map_location=args.device))
classifier = classifier.to(args.device)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(x_in=batch_dict['x_data'].float())

    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_target'].float())
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)

    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc
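
Two things happen in each iteration of the loop above. First, nn.BCEWithLogitsLoss applies the sigmoid to the raw output z and computes the binary cross-entropy −[y log σ(z) + (1 − y) log(1 − σ(z))]. This loss is small when the predicted probability is close to the gold label y. Second, running_loss += (loss_t - running_loss) / (batch_index + 1) keeps an incremental mean, so after the last batch it equals the average of the per-batch losses. running_acc uses the same trick. Below is a plain-Python sketch of both with invented values. The loss function is a naive version for illustration, not PyTorch's implementation.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy for one example, from a raw score and a 0/1 target.

    Naive version for moderate logits: nn.BCEWithLogitsLoss computes the same
    quantity in a numerically safer way (and averages over a batch).
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into a probability
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


# Confident and right gives a small loss; confident and wrong gives a large one
print(round(bce_with_logits(3.0, 1), 3), round(bce_with_logits(-3.0, 1), 3))

# Invented per-batch losses, averaged incrementally as in the loop above
batch_losses = [0.75, 0.5, 0.25, 0.5]
running_loss = 0.0
for batch_index, loss_t in enumerate(batch_losses):
    running_loss += (loss_t - running_loss) / (batch_index + 1)

print(running_loss)                           # 0.5
print(sum(batch_losses) / len(batch_losses))  # 0.5, the ordinary mean
```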

In [None]:
print("Test loss: {:.3f}".format(train_state['test_loss']))
print("Test Accuracy: {:.2f}".format(train_state['test_acc']))
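
compute_accuracy comes from the helper files, so its code is not shown here. We assume it follows the version in Rao & McMahan. That version turns each raw output into a 0/1 prediction by thresholding the sigmoid at 0.5, counts matches with the gold labels, and returns a percentage. The loop above then averages these per-batch values. The sketch below is a hypothetical stand-in, compute_accuracy_sketch, applied to invented values.

```python
import math


def compute_accuracy_sketch(logits, targets, threshold=0.5):
    """Percentage of examples whose thresholded prediction matches the 0/1 target.

    Hypothetical stand-in for the helper compute_accuracy.
    """
    predictions = [1 if 1.0 / (1.0 + math.exp(-z)) > threshold else 0 for z in logits]
    n_correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return n_correct / len(targets) * 100


logits = [2.3, -1.2, 0.4, -0.7]  # raw classifier outputs (invented)
targets = [1, 0, 0, 0]           # gold labels
print(compute_accuracy_sketch(logits, targets))  # 75.0 (3 of 4 correct)
```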

### Interpretability

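The simplicity of the perceptron (it has only one layer) means that it is straightforward to interpret by looking at its weights. Each input dimension corresponds to a single word, so each weight shows how strongly that word pushes a review towards the positive class (large positive weights) or the negative class (large negative weights). When models have more layers (or even once we start using representations other than one-hot encoding) this becomes much more difficult!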

In [None]:
# Sort weights
fc1_weights = classifier.fc1.weight.detach()[0]
_, indices = torch.sort(fc1_weights, dim=0, descending=True)
indices = indices.cpu().numpy().tolist()

# Top 20 words
print("Influential words in Positive Reviews:")
print("--------------------------------------")
for i in range(20):
    print(vectorizer.review_vocab.lookup_index(indices[i]))

print("====\n\n\n")

# Top 20 negative words
print("Influential words in Negative Reviews:")
print("--------------------------------------")
indices.reverse()
for i in range(20):
    print(vectorizer.review_vocab.lookup_index(indices[i]))
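
The cell above sorts the weights in descending order. The first indices therefore point to the words that push hardest towards the positive class. Once the list is reversed, the first indices point to the most negative words. Here is the same logic in plain Python, with a hypothetical six-word vocabulary and invented weights.

```python
# Hypothetical vocabulary and invented weights (one weight per word)
vocab = ["the", "great", "awful", "delicious", "rude", "food"]
weights = [0.01, 1.8, -2.2, 1.5, -1.9, 0.3]

# Word indices ordered by weight, largest first (like torch.sort(..., descending=True))
indices = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

top_k = 2
most_positive = [vocab[i] for i in indices[:top_k]]
most_negative = [vocab[i] for i in reversed(indices[-top_k:])]
print("Positive:", most_positive)  # ['great', 'delicious']
print("Negative:", most_negative)  # ['awful', 'rude']
```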

### Training Loop

Here, for completeness, is the loop that was used to train the model. You can run it, but it will take a while.

If you do, it will run faster on a type of processor known as a GPU. In Colab you can switch to one as follows:

1. Navigate to Edit → Notebook settings.
2. Select GPU from the Hardware accelerator drop-down.

In [None]:
classifier = classifier.to(args.device)

loss_func = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(x_in=batch_dict['x_data'].float())

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_target'].float())
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss,
                                  acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):

            # compute the output
            y_pred = classifier(x_in=batch_dict['x_data'].float())

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_target'].float())
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            val_bar.set_postfix(loss=running_loss,
                                acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        scheduler.step(train_state['val_loss'][-1])

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()

        if train_state['stop_early']:
            break
except KeyboardInterrupt:
    print("Exiting loop")
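
The five numbered steps inside the training loop are where learning happens. The minimal sketch below makes them concrete without PyTorch by training a perceptron by hand on four invented examples. It uses plain full-batch gradient descent instead of the Adam optimizer used above. It relies on the fact that, for binary cross-entropy on logits, the derivative of the loss with respect to the logit z is σ(z) − y. "Zeroing the gradients" corresponds to starting each update from fresh accumulators. The gradients are written out by hand here rather than computed automatically as loss.backward() does.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: collapsed one-hot vectors over a hypothetical vocabulary
# ["great", "awful", "book"], with 1 = positive and 0 = negative
data = [
    ([1, 0, 1], 1),  # "great book"
    ([0, 1, 1], 0),  # "awful book"
    ([1, 0, 0], 1),  # "great"
    ([0, 1, 0], 0),  # "awful"
]
weights = [0.0, 0.0, 0.0]
bias = 0.0
learning_rate = 0.5


def mean_loss():
    """Average binary cross-entropy over the toy data with the current weights."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)


initial_loss = mean_loss()
for _ in range(100):
    # step 1: start from zeroed gradients
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in data:
        # step 2: forward pass, producing a logit
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        # steps 3-4: gradient of the BCE loss w.r.t. z is sigmoid(z) - y
        error = sigmoid(z) - y
        for j, xj in enumerate(x):
            grad_w[j] += error * xj / len(data)
        grad_b += error / len(data)
    # step 5: move each parameter a small step against its gradient
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

final_loss = mean_loss()
print(round(initial_loss, 3), round(final_loss, 3))
print([round(w, 2) for w in weights])  # "great" ends up positive, "awful" negative
```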