## Assignment 2.4: Text classification via CNN (20 points)

In this assignment you should perform sentiment analysis of the IMDB reviews based on CNN architecture. Read carefully [Convolutional Neural Networks for Sentence Classification](https://arxiv.org/pdf/1408.5882.pdf) by Yoon Kim.

In [1]:
import numpy as np
import torch

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torchtext import datasets
from torchtext.data import Field, LabelField
from torchtext.data import Iterator

### Preparing Data

In [2]:
TEXT = Field(sequential=True, lower=True, batch_first=True)
LABEL = LabelField(batch_first=True)

In [3]:
train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split()

In [4]:
# %%time
TEXT.build_vocab(trn)

In [5]:
LABEL.build_vocab(trn)

### Creating the Iterator (2 points)

Define an iterator here

In [6]:
if torch.cuda.is_available():
    device = 'cuda'
    torch.cuda.set_device(2)
else:
    device = 'cpu'


print("device: ", device)

device:  cuda


In [7]:
train_iter, val_iter, test_iter = Iterator.splits((trn, vld, tst), 
                                                  batch_size = 64, 
                                                  device = device)

### Define CNN-based text classification model (8 points)

In [8]:
class CNN(nn.Module):
    def __init__(self, V, D, kernel_sizes, dropout=0.5):
        super(CNN, self).__init__()
        self.n_filters = 100
        self.embedding = nn.Embedding(V, D)

        self.convs = nn.ModuleList([nn.Conv2d(in_channels = 1, 
                                              out_channels = self.n_filters, 
                                              kernel_size = (k, D)) 
                                    for k in kernel_sizes])
        
        self.linear = nn.Linear(len(kernel_sizes) * self.n_filters, 1)
        self.dropout = nn.Dropout(dropout)
        self.act = nn.Sigmoid()
        
    def forward(self, x):
        emb = self.embedding(x).unsqueeze(1)
        cnv = [F.relu(conv(emb)).squeeze(3) for conv in self.convs]  
        pool = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in cnv]
        cat = self.dropout(torch.cat(pool, dim = 1))
        outputs = self.linear(cat)
        logit = self.act(outputs)
        return logit

In [9]:
kernel_sizes = [3,4,5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

model = CNN(vocab_size, dim, kernel_sizes, dropout)

In [10]:
model.cuda()

CNN(
  (embedding): Embedding(202250, 300)
  (convs): ModuleList(
    (0): Conv2d(1, 100, kernel_size=(3, 300), stride=(1, 1))
    (1): Conv2d(1, 100, kernel_size=(4, 300), stride=(1, 1))
    (2): Conv2d(1, 100, kernel_size=(5, 300), stride=(1, 1))
  )
  (linear): Linear(in_features=300, out_features=1, bias=True)
  (dropout): Dropout(p=0.5)
  (act): Sigmoid()
)

### The training loop (3 points)

Define the optimization function and the loss functions.

In [11]:
opt = optim.Adam(model.parameters())
loss_func = nn.BCEWithLogitsLoss()

Think carefully about the stopping criteria. 

In [12]:
epochs = 15

In [13]:
%%time
for epoch in range(1, epochs + 1):
    running_loss = 0.0
    running_corrects = 0
    model.train() 
    for batch in train_iter:         
        
        x = batch.text
        y = batch.label
        
        opt.zero_grad()
        preds = model(x).squeeze()
        loss = loss_func(preds, y.float())
        loss.backward()
        opt.step()
        running_loss += loss.item()
        
    epoch_loss = running_loss / len(trn)
    
    val_loss = 0.0
    model.eval()
    correct = 0
    total = 0 
    for batch in val_iter:
        
        x = batch.text
        y = batch.label
        
        preds = model(x).squeeze()
        loss = loss_func(preds, y.float())
        val_loss += loss.item()
        
    val_loss /= len(vld)
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, epoch_loss, val_loss))

Epoch: 1, Training Loss: 0.010685754438808986, Validation Loss: 0.009884297116597494
Epoch: 2, Training Loss: 0.009922284913063049, Validation Loss: 0.00964530390103658
Epoch: 3, Training Loss: 0.009657589020047869, Validation Loss: 0.009518324406941732
Epoch: 4, Training Loss: 0.009418815977232797, Validation Loss: 0.009379640690485636
Epoch: 5, Training Loss: 0.009241371279103416, Validation Loss: 0.009336049095789592
Epoch: 6, Training Loss: 0.009109584728309087, Validation Loss: 0.009204308764139812
Epoch: 7, Training Loss: 0.008991304503168379, Validation Loss: 0.009094850095113119
Epoch: 8, Training Loss: 0.008819631966522763, Validation Loss: 0.009011515665054322
Epoch: 9, Training Loss: 0.008744627933842796, Validation Loss: 0.009007434225082397
Epoch: 10, Training Loss: 0.00865777643578393, Validation Loss: 0.008981348419189452
Epoch: 11, Training Loss: 0.008561036724703652, Validation Loss: 0.009032762837409973
Epoch: 12, Training Loss: 0.008492571476527622, Validation Loss: 

### Calculate performance of the trained model (2 points)

In [14]:
tp = 0
fp = 0
fn = 0
tn = 0
for batch in test_iter:
    x = batch.text
    y = batch.label
    predictions = model(x).squeeze()
    rounded_preds = torch.round(predictions)
    confusion_vector = rounded_preds / y.float()
    
    
    
    tp += torch.sum(confusion_vector == 1).item()
    fp += torch.sum(confusion_vector == float('inf')).item()
    tn += torch.sum(torch.isnan(confusion_vector)).item()
    fn += torch.sum(confusion_vector == 0).item()

In [15]:
accuracy = (tp+tn)/(tp+tn+fn+fp)
print("accuracy: ", accuracy)
precision = tp/(tp+fp)
print("precision: ", precision)
recall = tp/(tp+fn)
print("recall: ", recall)
f1 = 2*precision*recall/(precision+recall)
print("f1: ", f1)

accuracy:  0.86496
precision:  0.8709546267685803
recall:  0.85688
f1:  0.8638599887087669


Write down the calculated performance

### Accuracy: 0.8650
### Precision: 0.8710
### Recall: 0.8569
### F1: 0.8639

### Experiments (5 points)

Experiment with the model and achieve better results. Implement and describe your experiments in details, mention what was helpful.

## 1. 
### dropout = 0.5, change Adam optimizer to SGD momentum

In [71]:
class CNN(nn.Module):
    def __init__(self, V, D, kernel_sizes, dropout=0.3):
        super(CNN, self).__init__()
        self.n_filters = 100
        self.embedding = nn.Embedding(V, D)

        self.convs = nn.ModuleList([nn.Conv2d(in_channels = 1, 
                                              out_channels = self.n_filters, 
                                              kernel_size = (k, D)) 
                                    for k in kernel_sizes])
        
        self.linear = nn.Linear(len(kernel_sizes) * self.n_filters, 1)
        self.dropout = nn.Dropout(dropout)
        self.act = nn.Sigmoid()
        
    def forward(self, x):
        emb = self.embedding(x).unsqueeze(1)
        cnv = [F.relu(conv(emb)).squeeze(3) for conv in self.convs]  
        pool = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in cnv]
        cat = self.dropout(torch.cat(pool, dim = 1))
        outputs = self.linear(cat)
        logit = self.act(outputs)
        return logit

In [27]:
kernel_sizes = [3,4,5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300
epochs = 10

model = CNN(vocab_size, dim, kernel_sizes, dropout)

In [28]:
model.cuda()

CNN(
  (embedding): Embedding(201784, 300)
  (convs): ModuleList(
    (0): Conv2d(1, 100, kernel_size=(3, 300), stride=(1, 1))
    (1): Conv2d(1, 100, kernel_size=(4, 300), stride=(1, 1))
    (2): Conv2d(1, 100, kernel_size=(5, 300), stride=(1, 1))
  )
  (linear): Linear(in_features=300, out_features=1, bias=True)
  (dropout): Dropout(p=0.5)
  (act): Sigmoid()
)

In [29]:
opt = opt = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_func = nn.BCEWithLogitsLoss()

In [30]:
%%time
for epoch in range(1, epochs + 1):
    running_loss = 0.0
    running_corrects = 0
    model.train() 
    for batch in train_iter:         
        
        x = batch.text
        y = batch.label
        
        opt.zero_grad()
        preds = model(x).squeeze()
        loss = loss_func(preds, y.float())
        loss.backward()
        opt.step()
        running_loss += loss.item()
        
    epoch_loss = running_loss / len(trn)
    
    val_loss = 0.0
    model.eval()
    correct = 0
    total = 0 
    for batch in val_iter:
        
        x = batch.text
        y = batch.label
        
        preds = model(x).squeeze()
        loss = loss_func(preds, y.float())
        val_loss += loss.item()
        
    val_loss /= len(vld)
    
    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, epoch_loss, val_loss))

Epoch: 1, Training Loss: 0.010855203005245754, Validation Loss: 0.010905149841308593
Epoch: 2, Training Loss: 0.010852082306998117, Validation Loss: 0.01090480485757192
Epoch: 3, Training Loss: 0.010850114720208304, Validation Loss: 0.01090240683555603
Epoch: 4, Training Loss: 0.010846595283917019, Validation Loss: 0.01088865509033203
Epoch: 5, Training Loss: 0.010823195392744882, Validation Loss: 0.010823581592241923
Epoch: 6, Training Loss: 0.010669763101850237, Validation Loss: 0.010673945093154908
Epoch: 7, Training Loss: 0.010468876664979117, Validation Loss: 0.01014719382127126
Epoch: 8, Training Loss: 0.010272592316355024, Validation Loss: 0.009920679012934367
Epoch: 9, Training Loss: 0.01008944388798305, Validation Loss: 0.009886492236455281
Epoch: 10, Training Loss: 0.009969590612820217, Validation Loss: 0.009703847845395406
CPU times: user 2min 17s, sys: 50.7 s, total: 3min 8s
Wall time: 3min 7s


### It works worse than previos model.

## I have tried to change some other params but the result was terrible.