# Implement a Deep Averaging Network

<h3>Python Initialization</h3>

In [1]:
# import libraries
import torch
import torch.nn as nn
from torch.nn import functional as F
import torch.utils.data
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
# PyTorch random seed
torch.manual_seed(1)

<torch._C.Generator at 0x1f593be7ad0>

In [3]:
# If gpu is supported, then seed the gpu random number generator as well
gpu_available = torch.cuda.is_available()
if gpu_available:
    torch.cuda.manual_seed(1)
    
print("GPU is available:", gpu_available)

GPU is available: False


## Deep Averaging Network

### Part One: Load Dataset

We use the <a href=https://www.cs.jhu.edu/~mdredze/datasets/sentiment/>multi-domain sentiment</a> dataset created by Professor <a href=https://www.cs.jhu.edu/~mdredze/>Mark Dredze</a> for our project. This dataset contains Amazon.com product reviews from many product types, and each review is labeled positive or negative. For this project we consider only the reviews for books. To make things easier for you, we also created a vocabulary dictionary: only the words in this dictionary are considered when constructing the word embeddings for our deep averaging network. Run the following two cells to load the data and see a positive and a negative review:

In [5]:
def load_data():
    review_train = pd.read_csv('../data/review_train.csv')
    review_test = pd.read_csv('../data/review_test.csv')
    vocabulary = np.load('../data/vocabulary.npy', allow_pickle=True).item()
    
    return review_train, review_test, vocabulary

In [7]:
# vocabulary is a dictionary with key-value pairs (word, index): vocabulary[word] = index
# We will use this vocabulary to construct bag-of-words (bow) features
review_train, review_test, vocab = load_data()

# label 0 == Negative reviews
# label 1 == Positive reviews
label_meaning = ['Negative', 'Positive']

print('Number of Training Reviews:', review_train.shape[0])
print('Number of Test Reviews:', review_test.shape[0])
print('Number of Words in the Vocabulary:', len(vocab))

Number of Training Reviews: 1787
Number of Test Reviews: 200
Number of Words in the Vocabulary: 4380


In [8]:
# print some training reviews
print('A Positive Training Review:', review_train.iloc[0]['review'])
print('A Negative Training Review:', review_train.iloc[-1]['review'])


A Negative Training Review: I got to page 26 and gave up.  Lockes writings lack focus and are void of humour.  I read as much as I could with patience until it became clear this book was simply someone rambling on about nothing.  Save your money for something worth reading



We also create a function <code>generate_featurizer</code>, which takes a vocabulary and returns a bag-of-words (bow) featurizer based on that vocabulary. Using the returned featurizer, you can convert a sentence into a bag-of-words feature vector. See the following cells for an example:

In [9]:
def generate_featurizer(vocabulary):
    return CountVectorizer(vocabulary=vocabulary)

In [10]:
# Create a simple vocabulary
simple_vocab = {'learn': 0, 'machine': 1, 'learning': 2, 'teach': 3}

# Create a simple sentence that will be converted into bag of words features
simple_sentence = 'I learn machine learning to teach computers how to learn.'

# Create a featurizer by passing in the vocabulary
simple_featurizer = generate_featurizer(simple_vocab)

# Call simple_featurizer.transform to transform the sentence into its bag-of-words features
simple_featurizer.transform([simple_sentence]).toarray()

# You should get array([[2, 1, 1, 1]]) as output.
# This means that the sentence has:
#     2 occurrences of 'learn'
#     1 occurrence of 'machine'
#     1 occurrence of 'learning'
#     1 occurrence of 'teach'

array([[2, 1, 1, 1]], dtype=int64)
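To make the counting explicit, here is a minimal pure-Python sketch of what the featurizer does for this example. It assumes the default `CountVectorizer` behavior of lowercasing the text and keeping only tokens of two or more word characters; `simple_bow` is a hypothetical helper written for illustration and is not part of the notebook.

```python
import re
from collections import Counter

def simple_bow(sentence, vocabulary):
    """Hypothetical helper: count how often each vocabulary word occurs."""
    # Lowercase, then keep tokens made of two or more word characters
    tokens = re.findall(r"\b\w\w+\b", sentence.lower())
    counts = Counter(tokens)
    # Place each count at the index the vocabulary assigns to that word;
    # words outside the vocabulary are simply ignored
    features = [0] * len(vocabulary)
    for word, index in vocabulary.items():
        features[index] = counts[word]
    return features

simple_vocab = {'learn': 0, 'machine': 1, 'learning': 2, 'teach': 3}
simple_sentence = 'I learn machine learning to teach computers how to learn.'

bow = simple_bow(simple_sentence, simple_vocab)
print(bow)  # [2, 1, 1, 1]
```

Words such as "computers" and "how" do not appear in the vocabulary, so they contribute nothing to the feature vector.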

Now we will use <code>generate_featurizer</code> to generate a featurizer based on the vocabulary we provided.

In [11]:
bow_featurizer = generate_featurizer(vocab)

Using the featurizer, we convert the training and test reviews into their bag-of-words representations and store them as PyTorch tensors.

In [12]:
# convert the reviews to bow representation and torch Tensor
X_train = torch.Tensor(bow_featurizer.transform(review_train['review'].values).toarray())
y_train = torch.LongTensor(review_train['label'].values.flatten())

X_test = torch.Tensor(bow_featurizer.transform(review_test['review'].values).toarray())
y_test = torch.LongTensor(review_test['label'].values.flatten())

In [13]:
# Generate PyTorch Datasets
trainset = torch.utils.data.TensorDataset(X_train, y_train)
testset = torch.utils.data.TensorDataset(X_test, y_test)

# Generate PyTorch Dataloaders
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, drop_last=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=True, drop_last=False)
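The `drop_last` setting determines how many reviews the model sees per epoch. The arithmetic below is a small sketch using the dataset sizes printed earlier (1787 training reviews, 200 test reviews) and the batch size of 128. It relies on the fact that a `DataLoader` yields only full batches when `drop_last=True` and also yields the final, smaller batch when `drop_last=False`.

```python
import math

n_train, n_test, batch_size = 1787, 200, 128

# drop_last=True: only full batches are kept
train_batches = n_train // batch_size
train_samples_per_epoch = train_batches * batch_size
dropped = n_train - train_samples_per_epoch

# drop_last=False: the final, smaller batch is kept as well
test_batches = math.ceil(n_test / batch_size)
last_test_batch = n_test - (test_batches - 1) * batch_size

print(train_batches, train_samples_per_epoch, dropped)  # 13 1664 123
print(test_batches, last_test_batch)                    # 2 72
```

Because the training loader shuffles, a different set of 123 reviews is left out in each epoch, so every review is still used over the course of training.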

### Part Two: Implement a Deep Averaging Network

We define a PyTorch network class. We first implement <code>average</code>, which averages the embeddings of the words in a review, and then <code>forward</code>, which passes the "averaged" review through a linear layer to produce the model's belief.

- For `average`, recall that multiplying the bag-of-words matrix by the word embeddings gives, for each review, the sum of the embeddings of its words, with each word weighted by its count. Dividing by the total word count of the review then yields an "average" embedding, so what we compute is a weighted average. Formally, let $\mathbf{E}\in\mathbb{R}^{r\times v}$ be the embedding matrix (with embedding dimensionality $r$ and vocabulary size $v$). The $i^{th}$ column of $\mathbf{E}$ is the embedding of word $i$ in the vocabulary (we denote it as $\mathbf{E}[:,i]$). Further, let $\mathbf{x}\in\mathbb{R}^{1\times v}$ be a bag-of-words input row vector. We compute the average embedding as follows:
$$\mathbf{a}=\frac{1}{\sum_i \mathbf{x}[i]}\sum_{j}\mathbf{E}[:,j]\mathbf{x}[j]$$
In the function you need to compute this average for each input vector (the input <code>x</code> is a batch of input vectors, one per row). Note that in the code, `self.embeds` has shape vocab_size x embedding_size, so it stores $\mathbf{E}^\top$ and each of its rows is a word embedding.
- For `forward`, pass the output of `average` through the linear layer stored in `self.fc`.
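The weighted average can be checked on a toy example. The sketch below assumes a vocabulary of 3 words and embeddings of size 2; the tensor `E_rows` is a made-up embedding table stored with one word per row (that is, $\mathbf{E}^\top$), and the numbers are chosen only for illustration.

```python
import torch

# Toy setup: vocabulary size v = 3, embedding size r = 2.
# Each row of E_rows is the embedding of one word (E_rows is E transposed).
E_rows = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [2.0, 2.0]])

# Two bag-of-words vectors (n = 2 reviews), one per row
x = torch.tensor([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0]])

# Matrix form: count-weighted sum of embeddings divided by the word count
avg = (x @ E_rows) / x.sum(dim=1, keepdim=True)

# Explicit form of the formula for the first review
a0 = sum(x[0, j] * E_rows[j] for j in range(3)) / x[0].sum()

print(avg)  # rows are approximately [1.3333, 0.6667] and [0.5000, 1.2500]
print(a0)   # matches the first row of avg
```

The first review contains word 0 twice and word 2 once, so its average is (2·[1, 0] + 1·[2, 2]) / 3 = [4/3, 2/3].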

In [14]:
class DAN(nn.Module):
    
    def __init__(self, vocab_size, embedding_size=32):
        super().__init__()
        
        # Create a word embedding of dimension embedding_size.
        # self.embeds stores the transpose of the matrix E (shape: vocab_size x embedding_size),
        # so each row corresponds to the embedding of a word.
        # nn.Parameter already sets requires_grad=True.
        self.embeds = torch.nn.Parameter(torch.randn(vocab_size, embedding_size))
        
        # add a final linear layer that computes the 2d output from the averaged word embedding
        self.fc = nn.Linear(embedding_size, 2)
        
    def average(self, x):
        """
        This method takes in multiple inputs, stored in one tensor x.
        Each input is the bag-of-words representation of a review.
        For each review, it retrieves the word embedding of each word in the review and averages them 
        (weighted by the corresponding entry in x).
        
        Inputs:
        ------
        x : nxd torch Tensor where each row corresponds to the bag-of-words representation of a review
        
        Outputs:
        -------
        emb : n x (embedding_size) torch Tensor for the averaged review
        """
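        # Total word count of each review, kept as an n x 1 column for broadcasting
        den = torch.sum(x, dim=1, keepdim=True)
        # Count-weighted sum of word embeddings, divided by each review's word count
        emb = (x @ self.embeds) / den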
        
        return emb
    
    def forward(self, x):
        """
        This method takes in the bag-of-words representation of reviews.
        It calls self.average to get the averaged reviews and passes them
        through the linear layer to produce the model's belief.
        
        Inputs:
        ------
        x : nxd torch Tensor where each row corresponds to the bag-of-words representation of a review
        
        Outputs:
        -------
        out : nx2 torch Tensor that corresponds to the model's belief about the input
              For instance, out[0][0] is the model's belief that the 1st review is negative.
        """
        review_averaged = self.average(x)
        out = self.fc(review_averaged)
        
        return out
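One edge case worth keeping in mind: if a review contained no vocabulary words at all, its bag-of-words row would sum to zero and the division in `average` would produce NaN. The sketch below shows one possible guard, assuming it is acceptable to map such a review to the zero vector; `safe_average` is a hypothetical helper and not part of the assignment.

```python
import torch

def safe_average(x, embeds, min_count=1.0):
    """Hypothetical variant of `average` that tolerates empty reviews."""
    den = x.sum(dim=1, keepdim=True)
    # Clamp the word count so an all-zero row divides by min_count instead of 0.
    # Non-empty reviews have a count of at least 1, so they are unaffected.
    den = den.clamp(min=min_count)
    return (x @ embeds) / den

embeds = torch.randn(4, 3)  # toy embedding table: 4 words, embedding size 3
x = torch.tensor([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])  # second review has no vocabulary words

naive = (x @ embeds) / x.sum(dim=1, keepdim=True)
safe = safe_average(x, embeds)

print(torch.isnan(naive[1]).all())  # tensor(True): 0 / 0 gives NaN
print(safe[1])                      # a zero vector instead of NaN
```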

In [15]:
# Create a model
model = DAN(len(vocab), embedding_size=32)

if gpu_available:
    model = model.cuda()

### Part Three: Define the loss function and optimizer

In [16]:
# Create optimizer and loss function
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=5)
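Note that `nn.CrossEntropyLoss` expects raw logits, which is why the network's `forward` does not apply a softmax itself. The sketch below, using made-up logits and labels, illustrates that the loss equals the mean negative log-probability of the true class after a log-softmax.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)           # raw model outputs for 4 reviews
labels = torch.tensor([0, 1, 1, 0])  # 0 = negative, 1 = positive

loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Same quantity by hand: mean negative log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(4), labels].mean()

print(torch.allclose(loss_ce, loss_manual))  # True
```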

### Part Four: Train the network

Run the following cell to train our network.

In [17]:
# Start Training
num_epochs = 1000

model.train()
for epoch in range(num_epochs):
    running_loss = 0.0
    running_acc = 0.0
    count = 0
    
    for i, (X, y) in enumerate(trainloader):
        # use gpu if possible
        if gpu_available:
            X = X.cuda()
            y = y.cuda()
        
        # clear the gradient buffer
        optimizer.zero_grad()
        
        # Do forward propagation to get the model's belief
        logits = model(X)
        
        # Compute the loss
        loss = loss_fn(logits, y)
        
        # Run a backward propagation to get the gradient
        loss.backward()
        
        # Update the model's parameters
        optimizer.step()
        
        # Get the model's prediction
        pred = torch.argmax(logits, dim=1)
        
        # Update the running statistics
        running_acc += torch.sum((pred == y).float()).item()
        running_loss += loss.item()
        count += X.size(0)
        
    # print the running statistics every 100 epochs
    if (epoch + 1) % 100 == 0:
        print('Epoch [{} / {}] Average Training Accuracy: {:4f}'.format(epoch + 1, num_epochs, running_acc / count))
        print('Epoch [{} / {}] Average Training loss: {:4f}'.format(epoch + 1, num_epochs, running_loss / len(trainloader)))

Epoch [100 / 1000] Average Training Accuracy: 0.919471
Epoch [100 / 1000] Average Training loss: 0.198661
Epoch [200 / 1000] Average Training Accuracy: 0.995793
Epoch [200 / 1000] Average Training loss: 0.036976
Epoch [300 / 1000] Average Training Accuracy: 1.000000
Epoch [300 / 1000] Average Training loss: 0.007970
Epoch [400 / 1000] Average Training Accuracy: 1.000000
Epoch [400 / 1000] Average Training loss: 0.003750
Epoch [500 / 1000] Average Training Accuracy: 1.000000
Epoch [500 / 1000] Average Training loss: 0.002358
Epoch [600 / 1000] Average Training Accuracy: 1.000000
Epoch [600 / 1000] Average Training loss: 0.001582
Epoch [700 / 1000] Average Training Accuracy: 1.000000
Epoch [700 / 1000] Average Training loss: 0.001234
Epoch [800 / 1000] Average Training Accuracy: 1.000000
Epoch [800 / 1000] Average Training loss: 0.000989
Epoch [900 / 1000] Average Training Accuracy: 1.000000
Epoch [900 / 1000] Average Training loss: 0.000854
Epoch [1000 / 1000] Average Training Accuracy:

### Part Five: Evaluate your model on the test data

In [18]:
# Evaluate the model
model.eval()

running_acc = 0.0
count = 0.0

for (X, y) in testloader:
    # Use gpu if available
    if gpu_available:
        X = X.cuda()
        y = y.cuda()
        
    # Do a forward pass with no gradient
    with torch.no_grad():
        logits = model(X)
    
    # Calculate the prediction
    pred = torch.argmax(logits, dim=1)
    
    # Update the running stats
    running_acc += torch.sum((pred == y).float()).item()
    count += X.size(0)

print('Your Test Accuracy is {:.4f}'.format(running_acc / count))

Your Test Accuracy is 0.8950


Run the following cells to see a random test review and the model prediction.
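(You may observe that neural networks achieve high accuracy but tend to be over-confident. One reason is that they reach 100% training accuracy early in training. As the training log above shows, the loss keeps shrinking after that point, which means the network keeps growing more certain of its predictions, as if it had learned that it is always right.)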
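The certainty reported for a prediction is the softmax of the two logits. The sketch below, with made-up logits, shows that the hand-written formula and `F.softmax` agree for moderate values, and that the hand-written version can break down once logits become very large, which is one reason to prefer the built-in function when a model is highly confident.

```python
import torch
from torch.nn import functional as F

# Moderate logits: the hand-written softmax and F.softmax agree
logits = torch.tensor([2.0, -1.0])
manual = torch.exp(logits) / torch.sum(torch.exp(logits))
stable = F.softmax(logits, dim=0)
print(manual, stable)

# Very large logits: exp overflows to inf, and inf / inf gives NaN
big = torch.tensor([1000.0, 0.0])
manual_big = torch.exp(big) / torch.sum(torch.exp(big))
stable_big = F.softmax(big, dim=0)
print(manual_big)  # first entry is nan
print(stable_big)  # tensor([1., 0.])
```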

In [19]:
target = torch.randint(high=len(testset), size=(1,)).item()
review_target, label_target = review_test.iloc[target]

if gpu_available:
    bow_target = testset[target][0].unsqueeze(0).cuda()
else:
    bow_target = testset[target][0].unsqueeze(0)

model.eval()
with torch.no_grad():
    logits_target = model(bow_target)

pred = torch.argmax(logits_target, dim=1)
probability = F.softmax(logits_target.squeeze(), dim=0)

print('Review:', review_target)
print('Ground Truth:', label_meaning[int(label_target)])
print('Prediction: %s (Certainty %2.2f%%)' % (label_meaning[pred.item()], 100.0 * probability[pred.item()]))

Review: An amazing resource to the odd world of Chick collecting.  Mr. Fowler has crafted an exhausting, almost overwhelming guide to all of Chick's works and articles about him.  Everything is cross-referenced to the point of where the reader is nearly overwhelmed with information.  Minute details of publishing histories and changes to individual tracts are documented down to the smallest letter.  The book (presented in a distinctive computer printout style, possibly due to its origins as a self-published work) is packed with charts of pricing info and catalog numbers.  But even casual Chick fans will enjoy the &quot;History of the World&quot; segment, which ties all of Jack's wild theories into a cohesive narrative, and a segment devoted entirely to fun trivia (there's a list of every instance of the term &quot;haw&quot; for instance).  Each tract and comic is described, and info is given on various parodies and rip-offs.  This is a must for all Chick fans (both &quot;saved&quot; and