## Assignment 2.4: Text classification via CNN (20 points)

In this assignment you should perform sentiment analysis of the IMDB reviews based on CNN architecture. Read carefully [Convolutional Neural Networks for Sentence Classification](https://arxiv.org/pdf/1408.5882.pdf) by Yoon Kim.

In [1]:
import numpy as np
import torch

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torchtext import datasets
from torchtext.data import Field, LabelField
from torchtext.data import Iterator

### Preparing Data

In [2]:
TEXT = Field(sequential=True, lower=True, batch_first=True)
LABEL = LabelField(batch_first=True)

In [3]:
train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split()

In [4]:
# %%time
TEXT.build_vocab(trn)

In [5]:
LABEL.build_vocab(trn)

### Creating the Iterator (2 points)

Define an iterator here

In [6]:
if torch.cuda.is_available():
    device = 'cuda'
    torch.cuda.set_device(2)
else:
    device = 'cpu'


print("device: ", device)

device:  cuda


In [7]:
train_iter, val_iter, test_iter = Iterator.splits((trn, vld, tst), 
                                                  batch_size = 64, 
                                                  device = device)

### Define CNN-based text classification model (8 points)

In [8]:
class CNN(nn.Module):
    def __init__(self, V, D, kernel_sizes, dropout=0.5):
        super(CNN, self).__init__()
        self.n_filters = 100
        self.embedding = nn.Embedding(V, D)

        self.convs = nn.ModuleList([nn.Conv2d(in_channels = 1, 
                                              out_channels = self.n_filters, 
                                              kernel_size = (k, D)) 
                                    for k in kernel_sizes])
        
        self.linear = nn.Linear(len(kernel_sizes) * self.n_filters, 1)
        self.dropout = nn.Dropout(dropout)
        self.act = nn.Sigmoid()
        
    def forward(self, x):
        emb = self.embedding(x).unsqueeze(1)
        cnv = [F.relu(conv(emb)).squeeze(3) for conv in self.convs]  
        pool = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in cnv]
        cat = self.dropout(torch.cat(pool, dim = 1))
        outputs = self.linear(cat)
        logit = self.act(outputs)
        return logit

In [9]:
kernel_sizes = [3,4,5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

model = CNN(vocab_size, dim, kernel_sizes, dropout)

In [10]:
model.cuda()

CNN(
  (embedding): Embedding(201881, 300)
  (convs): ModuleList(
    (0): Conv2d(1, 100, kernel_size=(3, 300), stride=(1, 1))
    (1): Conv2d(1, 100, kernel_size=(4, 300), stride=(1, 1))
    (2): Conv2d(1, 100, kernel_size=(5, 300), stride=(1, 1))
  )
  (linear): Linear(in_features=300, out_features=1, bias=True)
  (dropout): Dropout(p=0.5)
  (act): Sigmoid()
)

### The training loop (3 points)

Define the optimization function and the loss functions.

In [11]:
opt = optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

Think carefully about the stopping criteria. 

In [12]:
epochs = 15

In [13]:
%%time
for epoch in range(1, epochs + 1):
    running_loss = 0.0
    running_corrects = 0
    model.train() 
    for batch in train_iter:         
        
        x = batch.text
        y = batch.label
        
        opt.zero_grad()
        preds = model(x).squeeze()
        loss = loss_func(preds, y.float())
        loss.backward()
        opt.step()
        running_loss += loss.item()
        
    epoch_loss = running_loss / len(trn)
    
    val_loss = 0.0
    model.eval()
    correct = 0
    total = 0 
    for batch in val_iter:
        
        x = batch.text
        y = batch.label
        
        preds = model(x).squeeze()
        loss = loss_func(preds, y.float())
        val_loss += loss.item()
        
    val_loss /= len(vld)
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, epoch_loss, val_loss))

Epoch: 1, Training Loss: 0.010757887881142753, Validation Loss: 0.010054031268755596
Epoch: 2, Training Loss: 0.009938647314480372, Validation Loss: 0.009596681984265646
Epoch: 3, Training Loss: 0.009675254477773393, Validation Loss: 0.009394608600934346
Epoch: 4, Training Loss: 0.009470309250695365, Validation Loss: 0.009289481178919473
Epoch: 5, Training Loss: 0.009340208881241934, Validation Loss: 0.009387280861536661
Epoch: 6, Training Loss: 0.00919532527582986, Validation Loss: 0.009173213561375937
Epoch: 7, Training Loss: 0.009034168810503824, Validation Loss: 0.009045720668633779
Epoch: 8, Training Loss: 0.008898489214692797, Validation Loss: 0.009008890684445698
Epoch: 9, Training Loss: 0.008820920118263789, Validation Loss: 0.008965045686562857
Epoch: 10, Training Loss: 0.008701072846140181, Validation Loss: 0.008954120310147603
Epoch: 11, Training Loss: 0.008623659658432007, Validation Loss: 0.00891890817085902
Epoch: 12, Training Loss: 0.00853805672952107, Validation Loss: 0

### Calculate performance of the trained model (2 points)

In [14]:
for batch in test_iter:
    x = batch.text
    y = batch.label

In [15]:
predictions = model(x).squeeze()
print("predictions: ", predictions)
print("y: ", y)

predictions:  tensor([2.4164e-04, 5.3173e-02, 1.0000e+00, 4.3139e-04, 1.9722e-09, 2.3583e-14,
        1.1757e-09, 1.0000e+00, 1.0000e+00, 3.4317e-02, 9.9148e-01, 2.4658e-11,
        1.1935e-14, 2.4479e-16, 4.3233e-13, 1.0000e+00, 1.8072e-12, 3.1692e-13,
        5.8017e-01, 5.8399e-19, 6.4782e-04, 1.0000e+00, 1.0000e+00, 4.2441e-04,
        1.0000e+00, 1.0000e+00, 8.0071e-01, 9.5748e-01, 1.2338e-06, 8.4248e-10,
        4.6205e-05, 9.9993e-01, 9.4306e-08, 1.0000e+00, 8.9101e-12, 6.0629e-16,
        8.6360e-13, 1.0000e+00, 1.0000e+00, 9.9998e-01], device='cuda:2',
       grad_fn=<SqueezeBackward0>)
y:  tensor([0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
        1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1], device='cuda:2')


In [16]:
rounded_preds = torch.round(predictions)
correct = (rounded_preds == y.float()).float() 
accuracy = correct.sum()/len(correct)
print(accuracy)

tensor(0.9000, device='cuda:2')


In [17]:
correct

tensor([1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1.,
        1., 1., 1., 1.], device='cuda:2')

In [18]:
correct_true_amount =  (correct * y.float()).sum()
recall = correct_true_amount / y.float().sum()
print(recall)

tensor(0.8421, device='cuda:2')


In [19]:
precision = correct_true_amount / predictions.sum()
print(precision)

tensor(0.9745, device='cuda:2', grad_fn=<DivBackward0>)


In [20]:
f1 = 2*precision*recall / (precision + recall)
print(f1)

tensor(0.9035, device='cuda:2', grad_fn=<DivBackward0>)


Write down the calculated performance

### Accuracy: 0.900
### Precision: 0.9745
### Recall: 0.8421
### F1: 0.9035

### Experiments (5 points)

Experiment with the model and achieve better results. Implement and describe your experiments in details, mention what was helpful.

### 1. ?
### 2. ?
### 3. ?