# 1. Data loading and preparation

As we mentioned, the very first step is to prepare the data in a way that (1) is understood by our predictive model (a neural network, in our case) and (2) lets us evaluate our progress.

For the purpose of the workshop, we have created an easy way to download, process, and split the data for training.

Let's go!

In [178]:
# First we need to import torchtext, the library we use for loading and
# processing the IMDB dataset for sentiment classification
from torchtext import data
from torchtext import datasets

## Data structure

Our raw data contains examples made of two parts: (1) an input text sequence and (2) a label, which is 0 for negative reviews and 1 for positive reviews.

Let's define one field for each of these two parts:

In [179]:
# Our first field contains raw text, which we tokenize using
# spaCy, a widely used Python library for NLP.
# fix_length pads (or truncates) every sequence to exactly SEQUENCE_LENGTH tokens
SEQUENCE_LENGTH = 200
TEXT = data.Field(tokenize='spacy', fix_length=SEQUENCE_LENGTH)

In [180]:
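# Our second field contains the expected label for each piece of text.
# Labels are not tokenized (sequential=False) and are used directly as
# integer class ids rather than looked up in a vocabulary (use_vocab=False)
LABEL = data.Field(sequential=False,
                  use_vocab=False,
                  preprocessing=int)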
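To make fix_length concrete, here is a minimal plain-Python sketch of what padding and truncating to a fixed length means. It is not torchtext's implementation: whitespace splitting stands in for spaCy, the helper name pad_or_truncate is hypothetical, and '<pad>' is torchtext's default padding token.

```python
PAD_TOKEN = "<pad>"  # torchtext's default pad_token


def pad_or_truncate(tokens, length, pad_token=PAD_TOKEN):
    """Return exactly `length` tokens: keep the first ones, pad the end if short."""
    tokens = tokens[:length]
    return tokens + [pad_token] * (length - len(tokens))


# Hypothetical example; whitespace splitting stands in for spaCy tokenization
review = "A terrific example of absurd comedy"
print(pad_or_truncate(review.split(), 8))
# ['A', 'terrific', 'example', 'of', 'absurd', 'comedy', '<pad>', '<pad>']
print(pad_or_truncate(review.split(), 3))
# ['A', 'terrific', 'example']
```

With SEQUENCE_LENGTH = 200, every review therefore becomes a sequence of exactly 200 tokens, which is what the model later relies on.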

## Loading the dataset

Using these two fields, we can now read the dataset and attach each data point to our fields. The .splits() method creates the dataset splits for us; here we only work with two of them, training and test (which we call dev below).


In [181]:
train, dev = datasets.IMDB.splits(TEXT, LABEL, root='.')


Creating dataset splits


In [182]:
# Now let's see some examples in our train split
train[0].text

['Story',
 'of',
 'a',
 'man',
 'who',
 'has',
 'unnatural',
 'feelings',
 'for',
 'a',
 'pig',
 '.',
 'Starts',
 'out',
 'with',
 'a',
 'opening',
 'scene',
 'that',
 'is',
 'a',
 'terrific',
 'example',
 'of',
 'absurd',
 'comedy',
 '.',
 'A',
 'formal',
 'orchestra',
 'audience',
 'is',
 'turned',
 'into',
 'an',
 'insane',
 ',',
 'violent',
 'mob',
 'by',
 'the',
 'crazy',
 'chantings',
 'of',
 'it',
 "'s",
 'singers',
 '.',
 'Unfortunately',
 'it',
 'stays',
 'absurd',
 'the',
 'WHOLE',
 'time',
 'with',
 'no',
 'general',
 'narrative',
 'eventually',
 'making',
 'it',
 'just',
 'too',
 'off',
 'putting',
 '.',
 'Even',
 'those',
 'from',
 'the',
 'era',
 'should',
 'be',
 'turned',
 'off',
 '.',
 'The',
 'cryptic',
 'dialogue',
 'would',
 'make',
 'Shakespeare',
 'seem',
 'easy',
 'to',
 'a',
 'third',
 'grader',
 '.',
 'On',
 'a',
 'technical',
 'level',
 'it',
 "'s",
 'better',
 'than',
 'you',
 'might',
 'think',
 'with',
 'some',
 'good',
 'cinematography',
 'by',
 'future',


In [183]:
len(dev)

25000

In [184]:
#train[0].label # First example has negative sentiment
dev[0].text

['Once',
 'again',
 'Mr.',
 'Costner',
 'has',
 'dragged',
 'out',
 'a',
 'movie',
 'for',
 'far',
 'longer',
 'than',
 'necessary',
 '.',
 'Aside',
 'from',
 'the',
 'terrific',
 'sea',
 'rescue',
 'sequences',
 ',',
 'of',
 'which',
 'there',
 'are',
 'very',
 'few',
 'I',
 'just',
 'did',
 'not',
 'care',
 'about',
 'any',
 'of',
 'the',
 'characters',
 '.',
 'Most',
 'of',
 'us',
 'have',
 'ghosts',
 'in',
 'the',
 'closet',
 ',',
 'and',
 'Costner',
 "'s",
 'character',
 'are',
 'realized',
 'early',
 'on',
 ',',
 'and',
 'then',
 'forgotten',
 'until',
 'much',
 'later',
 ',',
 'by',
 'which',
 'time',
 'I',
 'did',
 'not',
 'care',
 '.',
 'The',
 'character',
 'we',
 'should',
 'really',
 'care',
 'about',
 'is',
 'a',
 'very',
 'cocky',
 ',',
 'overconfident',
 'Ashton',
 'Kutcher',
 '.',
 'The',
 'problem',
 'is',
 'he',
 'comes',
 'off',
 'as',
 'kid',
 'who',
 'thinks',
 'he',
 "'s",
 'better',
 'than',
 'anyone',
 'else',
 'around',
 'him',
 'and',
 'shows',
 'no',
 'signs'

## Preparing the data: build a vocabulary

When we work with text, we usually transform words into ids and keep a vocabulary that maps each word to its id. With torchtext this is easy. We also want to start with a relatively small vocabulary; a classic way to achieve this is to keep only the words that appear at least a minimum number of times (min_freq) in the training data. Let's do this:



In [185]:
MIN_FREQ = 10
TEXT.build_vocab(train, min_freq=MIN_FREQ)

## Preparing the data: create mini-batches

Modern neural networks are usually trained on minibatches, that is, small subsets of the training examples. For each minibatch, backpropagation computes the gradients of the loss, which the optimizer then uses to update the weights.

Torchtext also makes it easy to create an iterator over these minibatches. Typical batch sizes for NLP are 16, 32, or 64, depending on the size of the data and the task.


In [186]:
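BATCH_SIZE = 32
# BucketIterator groups examples of similar length into the same minibatch to reduce padding.
# device=-1 keeps the batches on the CPU (older torchtext API); repeat=False stops after one pass per epoch.
train_iter, dev_iter = data.BucketIterator.splits((train, dev),
                                                  batch_size=BATCH_SIZE,
                                                  sort_key=lambda x: len(x.text),
                                                  device=-1,
                                                  repeat=False)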
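BucketIterator's main idea is to put examples of similar length in the same minibatch so that little padding is needed. The plain-Python sketch below (with a hypothetical bucket_batches helper) shows a simplified version of that idea; torchtext's actual algorithm is more elaborate (roughly, it sorts within large random chunks of the data rather than the whole dataset). Note that in this notebook fix_length already pads every review to 200 tokens, so bucketing has little effect here.

```python
import random


def bucket_batches(examples, batch_size, sort_key, seed=0):
    """Group examples of similar length into minibatches, then shuffle the batches.

    Simplified sketch: sort everything by length, chunk into batches, and shuffle
    the batch order so training does not always see the short examples first.
    """
    ordered = sorted(examples, key=sort_key)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches


# Hypothetical tokenized reviews of different lengths
reviews = [["w"] * n for n in (5, 1, 9, 2, 8, 3)]
for batch in bucket_batches(reviews, batch_size=2, sort_key=len):
    print([len(r) for r in batch])
```

Each printed batch contains reviews of neighbouring lengths (1 and 2, 3 and 5, 8 and 9), while the order of the batches themselves is shuffled.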

# 2. Model definition

Now that our data is ready, let's define the structure of our neural network.

The core neural network components in PyTorch live in the nn module. This module includes typical layers (linear, recurrent, convolutional, etc.) that can be combined into a multilayer neural network.

Every model we create extends the base class nn.Module and implements at least two methods:

- **`__init__`**: defines the core components (layers and parameters) of our network.
- **`forward`**: defines the "forward" pass, that is, the computation our network performs to transform the input data into a prediction.

Let's create our first model, a simple sentiment classifier.


In [187]:
# Now we can start defining our predictive model. The first step is to define the 'architecture' of the model
# and its main operations with the data that goes through the network.

# The core neural network components of PyTorch belong to the nn module
import torch
import torch.nn as nn
import torch.nn.functional as F

# Let's start with a very simple baseline model
class BaseSentimentClassifier(nn.Module):
    
    def __init__(self, input_dim, hidden_size, output_dim, batch_size=32, debug=None):
        super(BaseSentimentClassifier, self).__init__()
        self.embed = nn.Embedding(input_dim, hidden_size)
        self.fc1 = nn.Linear(hidden_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_dim)
        
        self.debug = debug
    
    def forward(self, input):
        # The forward pass defines how the input data is processed by the network
        # to make a prediction
        if (self.debug):
            print(input)
        embed = self.embed(input)
        if (self.debug):
            print(embed)
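        # Max-pool over time: embed is (seq_len x batch x hidden), e.g. 200x32x200.
        # Transposing to (hidden x batch x seq_len) and pooling with a kernel that
        # spans the whole sequence keeps the maximum of each feature, giving a
        # (batch x hidden) 32x200 matrix. squeeze(2) only drops the pooled axis,
        # so this also works for a batch containing a single example.
        out = F.max_pool1d(embed.transpose(0, 2), input.size(0)).squeeze(2).transpose(0, 1)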
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        return out
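The max-pooling line in forward() is the least obvious part of the model. The plain-Python sketch below (hypothetical max_over_time helper, nested lists instead of tensors) computes the same reduction on a tiny input: for each review and each embedding feature, it keeps the maximum value over all positions in the sequence. In PyTorch, the same result could also be written as embed.max(dim=0)[0].

```python
def max_over_time(embedded):
    """Max-pool a (seq_len x batch x hidden) nested list into (batch x hidden).

    For every example and every feature, keep the largest value across the
    sequence positions, like the F.max_pool1d call in forward().
    """
    seq_len, batch, hidden = len(embedded), len(embedded[0]), len(embedded[0][0])
    return [[max(embedded[t][b][h] for t in range(seq_len)) for h in range(hidden)]
            for b in range(batch)]


# seq_len = 3 time steps, batch = 2 examples, hidden = 2 features
embedded = [
    [[0.1, 0.9], [0.5, -1.0]],   # t = 0
    [[0.7, 0.2], [0.4, 0.0]],    # t = 1
    [[0.3, 0.4], [0.6, -0.5]],   # t = 2
]
print(max_over_time(embedded))  # [[0.7, 0.9], [0.6, 0.0]]
```

The output has one row per review, whatever the sequence length, which is why fc1 can take hidden_size inputs.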

# 3. Training process

Now that we have defined the architecture of our network, we can define the training process. Ideally, the training process should be independent of the model architecture. A simple approach is to define a function that receives a model instance. But first, let's define an auxiliary function for showing progress (do not worry much about it for now).


In [188]:
def log(time, epoch, iterations, batch_idx, train_iter, loss, train_acc, dev_loss=None, dev_acc=None):
    header = '  Time Epoch Iteration Progress    (%Epoch)   Loss   Dev/Loss     Accuracy  Dev/Accuracy'
    dev_log_template = ' '.join('{:>6.0f},{:>5.0f},{:>9.0f},{:>5.0f}/{:<5.0f} {:>7.0f}%,{:>8.6f},{:8.6f},{:12.4f},{:12.4f}'.split(','))
    log_template =     ' '.join('{:>6.0f},{:>5.0f},{:>9.0f},{:>5.0f}/{:<5.0f} {:>7.0f}%,{:>8.6f},{},{:12.4f},{}'.split(','))
    print(header)
    if dev_loss is not None:
        print(dev_log_template.format(time,
                    epoch, iterations, 1+batch_idx, len(train_iter),
                    100. * (1+batch_idx) / len(train_iter), loss.data[0], dev_loss, train_acc, dev_acc))
    else:
        print(log_template.format(time,
                    epoch, iterations, 1+batch_idx, len(train_iter),
                    100. * (1+batch_idx) / len(train_iter), loss.data[0], ' '*8, train_acc, ' '*12))
    print()

In [None]:
def train(model, batches, num_epochs=2, learning_rate=0.001):
    import time
    train_iter, dev_iter = batches

    # First we need to define our loss/objective function.
    # nn.CrossEntropyLoss applies log-softmax internally, so the model should output raw scores
    criterion = nn.CrossEntropyLoss()
    # And the optimizer (stochastic gradient descent)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    # Now the code for training our network
    iterations = 0
    log_every = 300
    dev_every = 700
    start = time.time()
    for epoch in range(num_epochs):
        train_iter.init_epoch()
        n_correct, n_total = 0, 0
        for batch_idx, batch in enumerate(train_iter):
            optimizer.zero_grad()
            output = model(batch.text)
            iterations += 1
            # Running training accuracy for the current epoch
            n_correct += (torch.max(output, 1)[1].view(batch.label.size()).data == batch.label.data).sum()
            n_total += batch.batch_size
            train_acc = 100. * n_correct / n_total
            loss = criterion(output, batch.label)
            loss.backward()
            optimizer.step()
            if iterations % log_every == 0:
                log(time.time() - start, epoch, iterations, batch_idx, train_iter, loss, train_acc)
            if iterations % dev_every == 0:
                # Evaluate on the whole dev split
                model.eval()
                dev_iter.init_epoch()
                n_dev_correct, dev_loss_sum, n_dev_batches = 0, 0., 0
                for dev_batch_idx, dev_batch in enumerate(dev_iter):
                    answer = model(dev_batch.text)
                    n_dev_correct += (torch.max(answer, 1)[1].view(dev_batch.label.size()).data == dev_batch.label.data).sum()
                    # Accumulate the loss of every dev batch instead of keeping only the last one
                    dev_loss_sum += criterion(answer, dev_batch.label).data[0]
                    n_dev_batches += 1
                dev_acc = 100. * n_dev_correct / len(dev_iter.dataset)
                log(time.time() - start, epoch, iterations, batch_idx, train_iter,
                    loss, train_acc, dev_loss_sum / n_dev_batches, dev_acc)
                # Switch back to training mode before continuing
                model.train()
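The comment in train() notes that nn.CrossEntropyLoss already applies softmax; more precisely, it combines log-softmax with the negative log-likelihood loss, which is why the model returns raw scores. The plain-Python sketch below (hypothetical cross_entropy helper) computes the same mean loss over a batch of scores.

```python
import math


def cross_entropy(logits, labels):
    """Mean cross-entropy of raw scores: log-softmax followed by negative log-likelihood."""
    total = 0.0
    for scores, label in zip(logits, labels):
        # log-sum-exp, shifted by the max score for numerical stability
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]  # equals -log(softmax(scores)[label])
    return total / len(labels)


# Two reviews, two classes (0 = negative, 1 = positive)
print(cross_entropy([[0.0, 0.0], [2.0, 0.0]], [1, 0]))  # ≈ 0.41
```

A model that gives both classes the same score gets a loss of log(2) ≈ 0.693 on every example, so training losses close to that value mean the model cannot yet tell the two classes apart.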
                

# 4. Finally, training

We can finally use our method for training and try out different models. Let's start with our simple sentiment classifier.


In [None]:
# Input dimensions are defined by the size of the input vocabulary
input_dim = len(TEXT.vocab)
hidden_size = 200
# Output dimensions are two: 0 = negative, 1 = positive
output_dim = 2

model = BaseSentimentClassifier(input_dim,
                              hidden_size,
                              output_dim)

# Let's call the training process
train(model, (train_iter, dev_iter), num_epochs=4, learning_rate=0.02)

  Time Epoch Iteration Progress    (%Epoch)   Loss   Dev/Loss     Accuracy  Dev/Accuracy
    12     0       300   300/782        38% 0.681308               52.3646             

  Time Epoch Iteration Progress    (%Epoch)   Loss   Dev/Loss     Accuracy  Dev/Accuracy
    23     0       600   600/782        77% 0.686829               53.5156             



# 5. Saving your model and vocabulary for serving predictions

Imagine we are satisfied with our last model and want to serve predictions in production. All we need to do is (1) save our model and (2) save our vocabulary, which turns words into the ids our model understands.


In [None]:

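# Save the whole model object (architecture and weights) with pickle.
# Loading it later requires the BaseSentimentClassifier class definition to be available.
torch.save(model, 'model.pt')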

In [None]:
# Save the vocabulary: the strings-to-ids dictionary.
# dill can serialize some objects (such as lambdas) that the standard pickle module cannot.
import dill as pickle

vocab = TEXT.vocab.stoi
with open('input_vocab.pickle', 'wb') as f:
    pickle.dump(vocab, f)
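If you would rather not depend on dill at serving time, one alternative (our suggestion, not part of the workshop setup) is to store the vocabulary as JSON, which only needs the standard library. The sketch below uses hypothetical save_vocab/load_vocab helpers and a toy vocabulary; since JSON keeps only the string-to-id pairs, the code that loads it must map unknown words to the '<unk>' id itself.

```python
import json


def save_vocab(stoi, path):
    # Convert to a plain dict: JSON stores only the string -> id pairs,
    # not any default-value behaviour of the original mapping
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dict(stoi), f)


def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical tiny vocabulary with the same shape as TEXT.vocab.stoi
toy_stoi = {"<unk>": 0, "<pad>": 1, "the": 2, "movie": 3}
save_vocab(toy_stoi, "example_vocab.json")
loaded = load_vocab("example_vocab.json")
# Unknown words must now be handled explicitly with .get()
print(loaded.get("terrible", loaded["<unk>"]))  # 0
```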

# 6. Predict!

In production, predictions typically run in a different environment or service than the one you trained your model in. Luckily, we stored all the data we need. Let's do this!

First, load the vocab and the model:


In [None]:
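import dill as pickle
import torch

with open('input_vocab.pickle', 'rb') as file_:
    vocab = pickle.load(file_)

# torch.load needs the BaseSentimentClassifier class to be defined (or importable) here
production_model = torch.load('model.pt')
production_model.eval()  # inference mode
production_model.debug = True  # print intermediate tensors in forward()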

Now we are ready to predict the sentiment of unseen text:

In [None]:
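import torch
import torch.autograd as autograd

# Use the same sequence length and padding id as during training:
# the TEXT field padded every example with '<pad>' up to SEQUENCE_LENGTH tokens
SEQUENCE_LENGTH = 200
PAD_ID = vocab['<pad>']
UNK_ID = vocab['<unk>']

def text2ids(texts):
    def padding(text):
        # Simple whitespace tokenization; the training data was tokenized with spaCy,
        # so punctuation handling may differ slightly
        tokens = text.split(" ")
        # The TEXT field is case-sensitive, so tokens are looked up as-is;
        # unknown words map to the <unk> id, and long texts are truncated
        wids = [vocab.get(s, UNK_ID) for s in tokens][:SEQUENCE_LENGTH]
        return wids + [PAD_ID] * (SEQUENCE_LENGTH - len(wids))

    batch = [padding(text) for text in texts]
    # The model expects (seq_len x batch), so we transpose the (batch x seq_len) matrix
    tensor = torch.LongTensor(batch).transpose(0, 1)
    return autograd.Variable(tensor)

text = text2ids(["The movie was very terribly bad", "The movie was very terribly bad"])

# One row of raw scores (logits) per text: column 0 = negative, column 1 = positive
prediction = production_model(text)

print(prediction)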
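The printed prediction contains raw scores (logits), one row per input text. To read it as a sentiment, you can apply softmax to each row and take the most likely class. The plain-Python sketch below illustrates this with hypothetical helper names (softmax, describe) and made-up scores; with tensors, F.softmax(prediction, dim=1) and prediction.max(1) would do the same.

```python
import math


def softmax(scores):
    """Turn one row of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["negative", "positive"]  # index 0 = negative, 1 = positive


def describe(scores):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical scores for one review, shaped like one row of `prediction`
label, confidence = describe([1.2, -0.3])
print(label, round(confidence, 3))  # negative 0.818
```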

# 7. Exercise: Define an RNN model

Recurrent neural networks are designed for processing sequences, which makes them well suited to NLP tasks: sentences are sequences of words, texts are sequences of sentences, and so on.

Try creating a new model using one of the more advanced recurrent layers provided by PyTorch: an LSTM (optionally bidirectional) or the simpler but effective GRU.

In [None]:
# Skeleton of an RNN-based sentiment classifier: fill in the missing parts
class RNNSentimentClassifier(nn.Module):

    def __init__(self, input_dim, hidden_size, output_dim, batch_size=32):
        super(RNNSentimentClassifier, self).__init__()
        self.embed = nn.Embedding(input_dim, hidden_size)
        # Here you should define an RNN layer: be careful with the embedding, hidden, and output shapes
        # self.rnn = ...
        self.fc2 = nn.Linear(hidden_size, output_dim)

    def forward(self, input):
        # The forward pass defines how the input data is processed by the network
        # to make a prediction
        embed = self.embed(input)
        # Here you should pass the batches through the RNN layer: you might need to use .view
        # out = self.rnn(...)
        out = self.fc2(out)
        return out
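Before filling in the skeleton, it may help to see the recurrence these layers build on. The plain-Python sketch below (hypothetical simple_rnn helper, one hidden unit, scalar inputs, hand-picked weights) implements a basic Elman RNN; nn.RNN, nn.LSTM, and nn.GRU apply the same idea with vectors and learned weight matrices, and LSTM and GRU add gates on top of it. It deliberately leaves the exercise itself open.

```python
import math


def simple_rnn(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit Elman RNN over a sequence of scalar inputs.

    At every step the new hidden state depends on the current input and on the
    previous hidden state: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    Returns the hidden state after each step.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The same inputs in a different order give a different final state,
# because the hidden state carries information from earlier steps
print(simple_rnn([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)[-1])
print(simple_rnn([0.0, 0.0, 1.0], w_x=1.0, w_h=0.5, b=0.0)[-1])
```

Unlike max-pooling over time, the result depends on word order; one natural choice for the classifier is to feed the final hidden state to self.fc2.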