# IS 784 Neural Network Example

In the last tutorial we covered the basics of PyTorch and neural networks. In this tutorial we combine that knowledge to build a basic neural network that performs sentiment analysis on the IMDB dataset.
 
Please note that this tutorial uses the legacy version of torchtext.

In [None]:
import torch
print(torch.__version__)  # version of PyTorch
import torch.nn.functional as F
import torchtext.legacy as torchtext

1.8.0+cu101


## Finding Max Length 


We first need to find the maximum length of the texts in the dataset because our input size is constant: in MLP models, the linear layer requires fixed-size input. In more complex models (like RNNs) we don't need this step, because they can take variable-length input.
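As a minimal sketch of what a fixed input length means in practice, the snippet below pads short token lists and cuts long ones. The helper name `pad_or_truncate` is hypothetical (it is not part of torchtext), and `<pad>` is assumed to be the padding token, as it is later in this tutorial.

```python
def pad_or_truncate(tokens, fix_length, pad_token="<pad>"):
    """Return a copy of `tokens` with exactly `fix_length` items."""
    if len(tokens) >= fix_length:
        # Too long: keep only the first `fix_length` tokens.
        return tokens[:fix_length]
    # Too short: append padding tokens until the length matches.
    return tokens + [pad_token] * (fix_length - len(tokens))


short = pad_or_truncate(["great", "movie"], fix_length=5)
long = pad_or_truncate(["a", "b", "c", "d", "e", "f"], fix_length=5)
print(short)
print(long)
```

Every review ends up with the same number of tokens, which is what allows the flattened embeddings to be fed to a linear layer of fixed width.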

In [None]:
TEXT = torchtext.data.Field(tokenize='spacy', batch_first=True) # the preprocessing parameter can be used to add additional preprocessing steps
LABEL = torchtext.data.LabelField(dtype = torch.float)

In [None]:
train_data, test_data = torchtext.datasets.IMDB.splits(TEXT, LABEL)

In [None]:
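max_size = 0
count = 0
total = 0  # named `total` to avoid shadowing the built-in sum()
for i in range(len(train_data)):
  length = len(train_data[i].text)
  if max_size < length:
    max_size = length
    print(max_size)
  count += 1
  # Add the length of the i-th review; indexing train_data[0] here
  # would add the first review's length (259) on every iteration.
  total += length
print("average: ", total / count)

259
295
943
964
986
2789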


## Training Embeddings

Now that we have found the maximum review length, let's create our train and test datasets.

In [None]:
TEXT = torchtext.data.Field(tokenize='spacy', batch_first=True, fix_length=2789) # the preprocessing parameter can be used to add additional preprocessing steps
LABEL = torchtext.data.LabelField(dtype = torch.float)
train_data, test_data = torchtext.datasets.IMDB.splits(TEXT, LABEL)
print("train length is: ",len(train_data))
print("test length is: ",len(test_data))
print(vars(train_data[0]))

train length is:  25000
test length is:  25000
{'text': ['Anyone', 'new', 'to', 'the', 'incredibly', 'prolific', 'Takashi', 'Miike', "'s", 'work', 'might', 'want', 'to', 'think', 'twice', 'about', 'making', 'this', 'startling', 'film', 'their', 'first', 'experience', 'of', 'this', 'truly', 'maverick', 'director', '.', 'In', 'keeping', 'with', 'Miike', "'s", 'working', 'practice', 'of', 'taking', 'any', 'work', 'that', 'comes', 'his', 'way', 'and', 'then', 'grafting', 'his', 'own', 'sensibilities', 'onto', 'the', 'script', ',', 'this', 'is', 'at', 'heart', 'a', 'fairly', 'basic', 'yakuza', 'thriller', ',', 'with', 'a', 'morally', 'ambiguous', 'cop', 'chasing', 'a', 'gang', 'which', 'his', 'lawyer', 'brother', 'has', 'fallen', 'in', 'with', '.', 'What', 'takes', 'the', 'movie', 'out', 'of', 'the', 'realms', 'of', 'the', 'same', '-', 'old', 'same', '-', 'old', 'however', ',', 'is', 'the', 'utterly', 'unflinching', 'attitude', 'so', 'some', 'of', 'the', 'most', 'sudden', 'and', 'horrific',

The IMDB dataset contains more than 100,000 words. Using all of them would make our computation slower, while removing the rarest ones reduces our network's effectiveness only slightly. The words we remove are marked with the `<unk>` token. We can use the `max_size` or `min_freq` parameters of the `build_vocab` function to limit the size of our vocabulary.
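The idea behind `max_size` and `min_freq` can be sketched in plain Python. This is an illustrative approximation, not torchtext's implementation: the function `build_vocab` below and its tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def build_vocab(token_lists, max_size=None, min_freq=1,
                specials=("<unk>", "<pad>")):
    """Build index<->token tables, keeping only the most frequent tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Most frequent first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [tok for tok, freq in ordered if freq >= min_freq]
    if max_size is not None:
        kept = kept[:max_size]
    itos = list(specials) + kept           # index -> token
    stoi = {tok: idx for idx, tok in enumerate(itos)}  # token -> index
    return itos, stoi


corpus = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
itos, stoi = build_vocab(corpus, max_size=2)

# Tokens that were dropped from the vocabulary fall back to <unk>.
unk = stoi["<unk>"]
encoded = [stoi.get(tok, unk) for tok in ["good", "plot", "movie"]]
print(itos)
print(encoded)
```

Note that the special tokens are added on top of `max_size`, which is why a limit of 30000 can yield a vocabulary of 30002 entries.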

In [None]:
TEXT.build_vocab(train_data,
                 max_size = 30000)
LABEL.build_vocab(train_data)

In [None]:
print("Unique tokens in TEXT vocabulary:",len(TEXT.vocab))
print("Unique tokens in LABEL vocabulary:",len(LABEL.vocab))
print(TEXT.vocab.freqs.most_common(20))
print(LABEL.vocab.freqs)
print(TEXT.unk_token)
print(TEXT.pad_token)

Unique tokens in TEXT vocabulary: 23402
Unique tokens in LABEL vocabulary: 2
[('the', 289838), (',', 275296), ('.', 236843), ('and', 156483), ('a', 156282), ('of', 144055), ('to', 133886), ('is', 109095), ('in', 87676), ('I', 77546), ('it', 76545), ('that', 70355), ('"', 63329), ("'s", 61928), ('this', 60483), ('-', 52863), ('/><br', 50935), ('was', 50013), ('as', 43508), ('with', 42807)]
Counter({'pos': 12500, 'neg': 12500})
<unk>
<pad>


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, test_iterator = torchtext.data.BucketIterator.splits(
    (train_data, test_data), batch_size=32, device=device)

In [None]:
for batch in train_iterator:
  print(batch)
  break


[torchtext.legacy.data.batch.Batch of size 32 from IMDB]
	[.text]:[torch.cuda.LongTensor of size 32x2789 (GPU 0)]
	[.label]:[torch.cuda.FloatTensor of size 32 (GPU 0)]


In [None]:
print(TEXT.vocab.itos[3000]) # itos -> int to string  | stoi -> str to int
print(TEXT.vocab.itos[3001])

cell
disbelief


Before creating our network, we need to decide how to convert our words into a format that the network can work with. Although `torchtext` has converted our words to numerical indices, feeding these indices in directly would be misleading: numbers are ordered, so neighbouring indices would look similar to the network. For example, the word at index 3000, 'cell', and the word at index 3001, 'disbelief', follow each other in our vocabulary although they have no semantic connection.

Our second option is converting our vocabulary to one-hot vectors. Although this is a good alternative for small vocabularies, in our example each word would need a vector with 30,000 entries. Combined with the 2,789 tokens in each review (with padding), we would have roughly 84 million input values per review.
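The size comparison can be checked with a few lines of arithmetic. The numbers are the ones used in this tutorial (a 30,000-word vocabulary, reviews padded to 2,789 tokens, and the 100-dimensional word vectors used below).

```python
vocab_size = 30000     # entries in each one-hot vector
seq_len = 2789         # tokens per padded review
embedding_dim = 100    # size of each word vector used later

one_hot_values = vocab_size * seq_len      # inputs per review with one-hot
embedded_values = embedding_dim * seq_len  # inputs per review with embeddings

print(one_hot_values)                      # 83670000
print(embedded_values)                     # 278900
print(one_hot_values // embedded_values)   # 300
```

With word vectors the input to the first linear layer is 300 times smaller than with one-hot encoding.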

Our most feasible option is creating word vectors. Essentially, we create a lookup table that converts each word into a vector of predetermined size. For this purpose, PyTorch provides the [nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html) module, which creates a lookup table with trainable weights.
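A small sketch of the lookup, using toy sizes chosen only for illustration (10 tokens, 4-dimensional vectors, and index 1 assumed to be the padding token):

```python
import torch

torch.manual_seed(0)

# Lookup table with 10 rows (one per token index) and 4 columns.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=1)

# A batch of 2 "reviews", each padded to 4 token indices.
batch = torch.tensor([[5, 2, 1, 1],
                      [7, 7, 3, 1]])

vectors = embedding(batch)              # shape: (batch, tokens, embedding_dim)
flat = vectors.view(batch.size(0), -1)  # flatten for a linear layer

print(vectors.shape)
print(flat.shape)
print(vectors[0, 2])  # the padding index maps to an all-zero vector
```

The same index always returns the same row of the table, and the row for `padding_idx` is initialised to zeros and receives no gradient updates.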



Let's create our network:

In [None]:
class Network(torch.nn.Module):
    def __init__(self,pad_idx):
        super().__init__()
        self.embedding = torch.nn.Embedding(num_embeddings = 30002, embedding_dim =100,padding_idx = pad_idx)
        self.layer1 = torch.nn.Linear(2789*100, 1000)
        self.layer2 = torch.nn.Linear(1000, 1)


    def forward(self, x):
        x = self.embedding(x).view(x.size(0),-1)
        x = self.layer1(x)
        x = F.relu(x)
        x = self.layer2(x)
        return x       

In [None]:
model = Network(pad_idx = TEXT.vocab.stoi[TEXT.pad_token])
print(model)

Network(
  (embedding): Embedding(30002, 100, padding_idx=1)
  (layer1): Linear(in_features=278900, out_features=1000, bias=True)
  (layer2): Linear(in_features=1000, out_features=1, bias=True)
)
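To get a feel for the size of this model, the parameter count can be worked out from the layer shapes printed above. This is plain arithmetic on those shapes, not output from the original run.

```python
# Shapes taken from the printed model summary.
embedding_params = 30002 * 100        # one 100-d vector per vocabulary entry
layer1_params = 278900 * 1000 + 1000  # weights + biases
layer2_params = 1000 * 1 + 1          # weights + bias

total_params = embedding_params + layer1_params + layer2_params
print(embedding_params)  # 3000200
print(layer1_params)     # 278901000
print(layer2_params)     # 1001
print(total_params)      # 281902201
```

Almost all of the parameters sit in the first linear layer, which is a direct consequence of flattening 2,789 padded tokens into one long input vector.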


We will use the Adam optimizer together with the BCEWithLogitsLoss function, which works well when there is a single output class.

Please note that BCEWithLogitsLoss combines a sigmoid layer and BCELoss (binary cross-entropy) in one single class, which gives more numerically stable results than applying them separately. Because the sigmoid is part of the loss, our network outputs raw logits; we need to apply the sigmoid function to the output ourselves when we want probabilities.
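The relationship between the two losses can be checked on a few made-up logits (the values below are arbitrary and chosen only for illustration):

```python
import torch

logits = torch.tensor([2.0, -1.0, 0.5])   # raw network outputs
labels = torch.tensor([1.0, 0.0, 1.0])    # binary targets

# Fused version: sigmoid + binary cross-entropy in one call.
fused = torch.nn.BCEWithLogitsLoss()(logits, labels)

# Separate version: apply the sigmoid first, then BCELoss.
probs = torch.sigmoid(logits)
separate = torch.nn.BCELoss()(probs, labels)

# Rounding the probabilities gives the predicted class (0 or 1).
preds = torch.round(probs)

print(fused.item(), separate.item())
print(preds)
```

Both losses agree on well-behaved inputs like these; the fused version is preferred because it stays stable for very large or very small logits.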

In [None]:
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(params= model.parameters(),lr= 0.0005) #default lr is 0.001

In [None]:
model = model.to(device)
loss_fn = loss_fn.to(device)

Now that we are ready, let's train our network:

In [None]:
import time
# Training loop
N_EPOCHS = 2

tr_loss = []
model.train()

for epoch in range(N_EPOCHS):
    
    # Calculate training time
    start_time = time.time()

    epoch_loss = 0
    epoch_acc = 0

    
    batch_no = 0
    for batch in train_iterator:
        
        # Reset the gradients so they do not accumulate across batches
        optimizer.zero_grad()
        
        predictions = model(batch.text).squeeze(1)
        loss = loss_fn(predictions, batch.label.squeeze(0))

        # Backprop
        loss.backward()
        
        # Optimize the weights
        optimizer.step()
        
        # Record accuracy and loss
        epoch_loss += loss.item()
        
        correct = (torch.round(torch.sigmoid(predictions)) == batch.label.squeeze(0)).float() 
        acc = correct.sum() / len(correct)
        epoch_acc +=acc.item()

        batch_no = batch_no +1
        
        if batch_no%60 == 0:
          print(f'Epoch:  {epoch+1:2} | Batch No: {batch_no} | Loss: {loss.item():.3f} | Accuracy: {acc.item()*100:.2f}%')

    
    train_loss = epoch_loss / len(train_iterator)

    end_time = time.time()

    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    
    print('\n')    
    print(f'Epoch: {epoch+1:2} | Epoch Time: {elapsed_mins}m {elapsed_secs}s')
    print(f'\tAverage Train Loss: {train_loss:.3f} ')
    print('\n') 

Epoch:   1 | Batch No: 60 | Loss: 0.732 | Accuracy: 40.62%
Epoch:   1 | Batch No: 120 | Loss: 0.656 | Accuracy: 59.38%
Epoch:   1 | Batch No: 180 | Loss: 0.716 | Accuracy: 56.25%
Epoch:   1 | Batch No: 240 | Loss: 0.714 | Accuracy: 53.12%
Epoch:   1 | Batch No: 300 | Loss: 0.674 | Accuracy: 59.38%
Epoch:   1 | Batch No: 360 | Loss: 0.737 | Accuracy: 43.75%
Epoch:   1 | Batch No: 420 | Loss: 0.762 | Accuracy: 40.62%
Epoch:   1 | Batch No: 480 | Loss: 0.654 | Accuracy: 59.38%
Epoch:   1 | Batch No: 540 | Loss: 0.616 | Accuracy: 68.75%
Epoch:   1 | Batch No: 600 | Loss: 0.609 | Accuracy: 65.62%
Epoch:   1 | Batch No: 660 | Loss: 0.571 | Accuracy: 62.50%
Epoch:   1 | Batch No: 720 | Loss: 0.554 | Accuracy: 62.50%
Epoch:   1 | Batch No: 780 | Loss: 0.657 | Accuracy: 59.38%


Epoch:  1 | Epoch Time: 1m 52s
	Average Train Loss: 0.669 


Epoch:   2 | Batch No: 60 | Loss: 0.109 | Accuracy: 96.88%
Epoch:   2 | Batch No: 120 | Loss: 0.110 | Accuracy: 96.88%
Epoch:   2 | Batch No: 180 | Loss: 0.20

Now that our training is complete, we can test our network using our `test_iterator`. During testing we do not need to calculate gradients, so we can use the `torch.no_grad()` context manager to disable gradient tracking. Similarly, the `.eval()` method switches our network to evaluation mode, while the `.train()` method re-enables training mode.
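A small sketch of what these calls change, using a toy network that is not the one from this tutorial. The `Dropout` layer is added here only as an assumption to make the effect of evaluation mode visible; the tutorial's `Network` has no such layer.

```python
import torch

# Toy network: a linear layer followed by dropout.
net = torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.Dropout(p=0.5))
x = torch.ones(2, 4)

net.eval()                 # evaluation mode: dropout is disabled
with torch.no_grad():      # no computation graph is recorded
    out_a = net(x)
    out_b = net(x)

eval_flag = net.training               # False while in evaluation mode
grad_tracked = out_a.requires_grad     # False inside no_grad()
same_output = torch.equal(out_a, out_b)  # deterministic without dropout

net.train()                # back to training mode
train_flag = net.training  # True again

print(eval_flag, grad_tracked, same_output, train_flag)
```

`torch.no_grad()` saves memory and time by skipping gradient bookkeeping, while `.eval()` changes the behaviour of layers such as dropout and batch normalisation; the two are independent and are usually used together during testing.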

In [None]:
test_epoch_loss = 0
test_epoch_acc = 0

# Turn on evaluation mode
model.eval()

# No need to backprop in eval
with torch.no_grad():

    for batch in test_iterator:

        test_predictions = model(batch.text).squeeze(1)
        
        test_loss = loss_fn(test_predictions, batch.label)

        test_epoch_loss += test_loss.item()
        
        correct = (torch.round(torch.sigmoid(test_predictions)) == batch.label.squeeze(0)).float() 
        acc = correct.sum() / len(correct)
        
        test_epoch_acc +=acc.item()

test_loss = test_epoch_loss/len(test_iterator)
test_acc = test_epoch_acc  / len(test_iterator)
print(f'Test Loss: {test_loss:.3f} | | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.759 | | Test Acc: 64.39%


Finally, let's see our network's effectiveness on actual examples.

In [None]:
review1 = 'This is the best movie I have ever watched!'
review2 = 'This is an okay movie'
review3 = 'This was a waste of time! I hated this movie.'

import spacy
nlp = spacy.load('en')  # with spaCy 3+, the 'en' shortcut no longer exists; use 'en_core_web_sm'

model.eval()
tokenized = [tok.text for tok in nlp.tokenizer(review1)]
if len(tokenized) < 2789:
    tokenized += ['<pad>'] * (2789 - len(tokenized))

# Map tokens to their vocabulary indices
indexed = [TEXT.vocab.stoi[t] for t in tokenized]
tensor = torch.LongTensor(indexed).to(device)
tensor = tensor.unsqueeze(0)  # add a batch dimension
# Get predictions
prediction = torch.sigmoid(model(tensor))

print(prediction.item())

0.9385111927986145


## Using Pretrained Embeddings

In the previous example we trained our word embeddings from scratch. In this example we will use pretrained embeddings instead. These embeddings can provide better network accuracy because they are trained on large datasets and already capture semantic relationships between words.


In [None]:
TEXT2 = torchtext.data.Field(tokenize='spacy', batch_first=True, fix_length=2789)  # fix_length pads or truncates every review to 2789 tokens
LABEL2 = torchtext.data.LabelField(dtype = torch.float)

In [None]:
train_data2, test_data2 = torchtext.datasets.IMDB.splits(TEXT2, LABEL2)

Torchtext's vocabulary module provides easy access to popular word embeddings. In this example we will use GloVe word vectors trained on the Wikipedia and Gigaword 5 datasets. This embedding set was trained on six billion tokens and has a 400-thousand-word vocabulary. It is available in several vector sizes, from 50 to 300 dimensions. GloVe will be discussed in detail in the upcoming word embedding lecture.
This process can take some time because Colab needs to download the word embeddings.

In [None]:
TEXT2.build_vocab(train_data2,
                 max_size = 30000,
                  vectors = "glove.6B.100d", 
                 # Set unknown vectors
                  unk_init = torch.Tensor.normal_)
LABEL2.build_vocab(train_data2)
print("Unique tokens in TEXT vocabulary:",len(TEXT2.vocab))
print("Unique tokens in LABEL vocabulary:",len(LABEL2.vocab))

Unique tokens in TEXT vocabulary: 30002
Unique tokens in LABEL vocabulary: 2


Now that our vocabulary is ready, we can create the iterators and the network.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator2, test_iterator2 = torchtext.data.BucketIterator.splits(
    (train_data2, test_data2), batch_size=64, device=device)

In [None]:
model2 = Network(pad_idx = TEXT2.vocab.stoi[TEXT2.pad_token])
print(model2)

Network(
  (embedding): Embedding(30002, 100, padding_idx=1)
  (layer1): Linear(in_features=278900, out_features=1000, bias=True)
  (layer2): Linear(in_features=1000, out_features=1, bias=True)
)


We need to load our word embeddings into our network and set the vectors of the unknown and padding tokens to zero. After that we have a choice: we can either freeze the embedding layer to prevent further training, or continue training the embeddings to fine-tune the weights.
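The two choices can be illustrated with `torch.nn.Embedding.from_pretrained`, which is an alternative to copying the weights by hand. The tiny `pretrained` matrix below is made up for the example; rows 0 and 1 stand in for the zeroed unknown and padding vectors.

```python
import torch

# Made-up "pretrained" vectors: 4 tokens, 3 dimensions each.
pretrained = torch.tensor([[0.0, 0.0, 0.0],   # <unk>
                           [0.0, 0.0, 0.0],   # <pad>
                           [0.1, 0.2, 0.3],
                           [0.4, 0.5, 0.6]])

# Option 1: frozen embeddings, weights are not updated during training.
frozen = torch.nn.Embedding.from_pretrained(pretrained, freeze=True, padding_idx=1)

# Option 2: trainable embeddings, weights are fine-tuned during training.
tunable = torch.nn.Embedding.from_pretrained(pretrained, freeze=False, padding_idx=1)

print(frozen.weight.requires_grad)    # False
print(tunable.weight.requires_grad)   # True
print(frozen(torch.tensor([2])))      # looks up the third row
```

Freezing keeps the pretrained vectors intact and reduces the number of trainable parameters; fine-tuning lets the vectors adapt to the sentiment task at the cost of more training work.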

In [None]:
model2.embedding.weight.data.copy_(TEXT2.vocab.vectors)
model2.embedding.weight.data[TEXT2.vocab.stoi[TEXT2.unk_token]] = torch.zeros(100)
model2.embedding.weight.data[TEXT2.vocab.stoi[TEXT2.pad_token]] = torch.zeros(100)

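Here we freeze the embedding layer, then continue with our training and test steps.

In [None]:
# Freeze the embedding weights. The flag must be set on the parameter itself;
# setting requires_grad on the module only creates an unused attribute.
model2.embedding.weight.requires_grad = False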

As before, we use the Adam optimizer and the BCEWithLogitsLoss function.

In [None]:
loss_fn2 = torch.nn.BCEWithLogitsLoss()
optimizer2 = torch.optim.Adam(model2.parameters(), lr=0.0005)

In [None]:
model2 = model2.to(device)
loss_fn2 = loss_fn2.to(device)

In [None]:
import time
# Training loop
N_EPOCHS = 2

tr_loss2 = []
model2.train()

for epoch in range(N_EPOCHS):
    
    # Calculate training time
    start_time = time.time()

    epoch_loss = 0
    epoch_acc = 0
    batch_no = 0

    
    for batch in train_iterator2:
        
        # Reset the gradients so they do not accumulate across batches
        optimizer2.zero_grad()
        
        predictions = model2(batch.text).squeeze(1)
        loss = loss_fn2(predictions, batch.label.squeeze(0))
        
        # Backprop
        loss.backward()
        
        # Optimize the weights
        optimizer2.step()
        
        # Record accuracy and loss
        epoch_loss += loss.item()

        correct = (torch.round(torch.sigmoid(predictions)) == batch.label.squeeze(0)).float() 
        acc = correct.sum() / len(correct)
        epoch_acc +=acc.item()

        batch_no = batch_no +1
        
        if batch_no%60 == 0:
          print(f'Epoch:  {epoch+1:2} | Batch No: {batch_no} | Loss: {loss.item():.3f} | Accuracy: {acc.item()*100:.2f}%')
    
    train_loss = epoch_loss / len(train_iterator2)
        
    
    end_time = time.time()

    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    
    # Save training metrics
    tr_loss2.append(train_loss)
        
    print(f'Epoch: {epoch+1:2} | Epoch Time: {elapsed_mins}m {elapsed_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} ')

Epoch:   1 | Batch No: 60 | Loss: 0.743 | Accuracy: 57.81%
Epoch:   1 | Batch No: 120 | Loss: 0.641 | Accuracy: 64.06%
Epoch:   1 | Batch No: 180 | Loss: 0.606 | Accuracy: 60.94%
Epoch:   1 | Batch No: 240 | Loss: 0.582 | Accuracy: 75.00%
Epoch:   1 | Batch No: 300 | Loss: 0.565 | Accuracy: 68.75%
Epoch:   1 | Batch No: 360 | Loss: 0.531 | Accuracy: 75.00%
Epoch:  1 | Epoch Time: 1m 4s
	Train Loss: 0.613 
Epoch:   2 | Batch No: 60 | Loss: 0.230 | Accuracy: 90.62%
Epoch:   2 | Batch No: 120 | Loss: 0.212 | Accuracy: 89.06%
Epoch:   2 | Batch No: 180 | Loss: 0.263 | Accuracy: 92.19%
Epoch:   2 | Batch No: 240 | Loss: 0.297 | Accuracy: 87.50%
Epoch:   2 | Batch No: 300 | Loss: 0.216 | Accuracy: 93.75%
Epoch:   2 | Batch No: 360 | Loss: 0.230 | Accuracy: 92.19%
Epoch:  2 | Epoch Time: 1m 3s
	Train Loss: 0.282 


In [None]:
test_epoch_loss = 0
test_epoch_acc = 0

# Turn on evaluation mode
model2.eval()

# No need to backprop in eval
with torch.no_grad():

    for batch in test_iterator2:

        test_predictions = model2(batch.text).squeeze(1)
        
        test_loss = loss_fn2(test_predictions, batch.label)

        test_epoch_loss += test_loss.item()
        
        correct = (torch.round(torch.sigmoid(test_predictions)) == batch.label.squeeze(0)).float() 
        acc = correct.sum() / len(correct)
        
        test_epoch_acc +=acc.item()

test_loss = test_epoch_loss/len(test_iterator2)
test_acc = test_epoch_acc  / len(test_iterator2)
print(f'Test Loss: {test_loss:.3f} | | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.491 | | Test Acc: 79.06%


As we can see, the network with pretrained embeddings reached better test accuracy (79.06% versus 64.39%).

In [None]:
review1 = 'This is the best movie I have ever watched!'
review2 = 'This is an okay movie'
review3 = 'This was a waste of time! I hated this movie.'


import spacy
nlp = spacy.load('en')  # with spaCy 3+, the 'en' shortcut no longer exists; use 'en_core_web_sm'

model2.eval()
tokenized = [tok.text for tok in nlp.tokenizer(review1)]
if len(tokenized) < 2789:
    tokenized += ['<pad>'] * (2789 - len(tokenized))

# Map tokens to their vocabulary indices
indexed = [TEXT2.vocab.stoi[t] for t in tokenized]
tensor = torch.LongTensor(indexed).to(device)
tensor = tensor.unsqueeze(0)  # add a batch dimension
# Get predictions
prediction = torch.sigmoid(model2(tensor))

print(prediction.item())

0.7838658690452576


## Recommended Readings

*   For more information about networks, optimizers and data utilities, you can refer to the [PyTorch Learn the Basics tutorial](https://pytorch.org/tutorials/beginner/basics/intro.html).


