**Convolutional Neural Networks (CNN)** were originally invented for computer vision and now are the building blocks of state-of-the-art CV models. One of the earliest applications of CNN in Natural Language Processing was introduced in the paper ***Convolutional Neural Networks for Sentence Classification*** (Kim, 2014). With the same idea as in computer vision, CNN model is used as an feature extractor that encodes semantic features of sentences before these features are fed to a classifier.

With only a simple one-layer CNN trained on top of pretrained word vectors and little hyperparameter tuning, the model achieves excellent results on multiple sentence-level classification tasks. CNN models are now used widely in other NLP tasks such as translation and question answering as a part of a more complex architecture.

Source: [A Complete Guide to CNN for Sentence Classification with PyTorch](https://chriskhanhtran.github.io/posts/cnn-sentence-classification/)

Here in this notebook, we will implementing **Convolutional Neural Networks for Sentiment Analysis** on a custom dataset -- **Twitter Dataset**.

## 1: Import Libraries

In [1]:
import os
import re
from tqdm import tqdm
import numpy as np
import pandas as pd
import nltk
import matplotlib.pyplot as plt
import torch

%matplotlib inline
nltk.download("all")

[nltk_data] Downloading collection 'all'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to /usr/share/nltk_data...
[nltk_data]    |   Package abc is already up-to-date!
[nltk_data]    | Downloading package alpino to /usr/share/nltk_data...
[nltk_data]    |   Package alpino is already up-to-date!
[nltk_data]    | Downloading package averaged_perceptron_tagger to
[nltk_data]    |     /usr/share/nltk_data...
[nltk_data]    |   Package averaged_perceptron_tagger is already up-
[nltk_data]    |       to-date!
[nltk_data]    | Downloading package averaged_perceptron_tagger_ru to
[nltk_data]    |     /usr/share/nltk_data...
[nltk_data]    |   Unzipping
[nltk_data]    |       taggers/averaged_perceptron_tagger_ru.zip.
[nltk_data]    | Downloading package basque_grammars to
[nltk_data]    |     /usr/share/nltk_data...
[nltk_data]    |   Package basque_grammars is already up-to-date!
[nltk_data]    | Downloading package bcp47 to /usr/share/nltk_data...
[nltk_data]    | Downloading pa

True

### 2: Data Inspection

In [2]:
data_dir = "../input/tweets-cleaned/tweets_train_tokens.csv"\

tweets_df = pd.read_csv(data_dir)
tweets_df.drop_duplicates(inplace=True)
tweets_df.dropna(inplace=True)

tweets_df

Unnamed: 0,label,message
0,neutral,arirang simply kpop kim hyung jun cross ha yeo...
1,neutral,read politico article donald trump running mat...
2,neutral,type bazura project google image image photo d...
3,neutral,fast lerner subpoena tech guy work hillary pri...
4,negative,sony reward app like lot female singer non ret...
...,...,...
49670,negative,sleep think fuck jordan answer phone tomorrow ...
49671,neutral,yoga shannon tomorrow morning work day start u...
49672,neutral,bring dunkin iced coffee tomorrow hero
49673,neutral,currently holiday portugal come home tomorrow ...


#### 2.1 Encode Labels

In [3]:
class_dict = {'negative': 0, 'neutral': 1, 'positive': 2}
tweets_df.rename(columns={'label': 'category'}, inplace=True)
tweets_df['label'] = tweets_df['category'].map(class_dict)

tweets_df

Unnamed: 0,category,message,label
0,neutral,arirang simply kpop kim hyung jun cross ha yeo...,1
1,neutral,read politico article donald trump running mat...,1
2,neutral,type bazura project google image image photo d...,1
3,neutral,fast lerner subpoena tech guy work hillary pri...,1
4,negative,sony reward app like lot female singer non ret...,0
...,...,...,...
49670,negative,sleep think fuck jordan answer phone tomorrow ...,0
49671,neutral,yoga shannon tomorrow morning work day start u...,1
49672,neutral,bring dunkin iced coffee tomorrow hero,1
49673,neutral,currently holiday portugal come home tomorrow ...,1


#### 2.2 Download fastText Word Vectors

The code below will download fastText pretrained vectors.

In [4]:
%%time
URL = "https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip"
FILE = "fastText"

if os.path.isdir(FILE):
    print("fastText exists.")
else:
    !wget -P $FILE $URL
    !unzip $FILE/crawl-300d-2m.vec.zip -d $FILE

--2023-01-20 13:17:16--  https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 172.67.9.4, 104.22.74.142, 104.22.75.142, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|172.67.9.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1523785255 (1.4G) [application/zip]
Saving to: ‘fastText/crawl-300d-2M.vec.zip’


2023-01-20 13:17:44 (52.0 MB/s) - ‘fastText/crawl-300d-2M.vec.zip’ saved [1523785255/1523785255]

unzip:  cannot find or open fastText/crawl-300d-2m.vec.zip, fastText/crawl-300d-2m.vec.zip.zip or fastText/crawl-300d-2m.vec.zip.ZIP.
CPU times: user 473 ms, sys: 143 ms, total: 616 ms
Wall time: 30.8 s


In [5]:
import zipfile

zip_file = "/kaggle/working/fastText/crawl-300d-2M.vec.zip"
with zipfile.ZipFile(zip_file, 'r') as zip_ref:
    zip_ref.extractall('fastText')

#### 2.3 Set `torch.devic` for Training

For faster training, `torch.device` must be set to **GPU**.

In [6]:
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"There are {torch.cuda.device_count()} GPU(s) available.")
    print("Device name: ", torch.cuda.get_device_name(0))
else:
    print("No GPU available, using the CPU instead.")
    device = torch.device('cpu')

There are 2 GPU(s) available.
Device name:  Tesla T4


### 3: Data Preparation

To prepare our text data for training, first we need to tokenize our sentences and build a vocabulary dictionary `word2idx`, which will later be used to convert our tokens into indexes and build an embedding layer.

#### 3.1 Tokenize

The function `tokenize` will tokenize our sentences, build a vocabulary and find the maximum sentence length. The function `encode` will take outputs of `tokenize` as inputs, perform sentence padding and return `input_ids` as a numpy array.

In [7]:
from nltk.tokenize import word_tokenize
from collections import defaultdict

def tokenize(texts):
    """
        Tokenize texts, build vocabulary and find maximum sentence length.
        
        Args:
            texts (List[str]): List of text data
            
        Returns:
            tokenized_texts (List[List[str]]): List of list of tokens
            word2idx (Dict): Vocabular built from the corpus
            max_len (int): Maximum setence length
    """
    
    max_len = 0
    tokenized_texts = []
    word2idx = {}
    
    # Add <pad> and <unk> tokens to the vocabulary
    word2idx["<pad>"] = 0
    word2idx["<unk>"] = 1
    
    # Building our vocab from the corpus starting from index 2
    idx = 0
    for sent in texts:
        tokenized_sent = word_tokenize(sent)
        
        # Add `tokenized_sent` to `tokenized_texts`
        tokenized_texts.append(tokenized_sent)
        
        # Add new token to `word2idx`
        for token in tokenized_sent:
            if token not in word2idx:
                word2idx[token] = idx
                idx += 1
                
        # Update `max_len`
        max_len = max(max_len, len(tokenized_sent))
        
    return tokenized_texts, word2idx, max_len
        
def encode(tokenized_tweets, word2idx, max_len):
    """
        Pad each sentence to the maximum sentence length and encode tokens
        to their index in the vocabulary.
        
        Returns:
            input_idx (np.array): Array of token indexes in the vocabulary with 
            shape (N, max_len). It will be the input of our CNN model.
    """
    
    input_ids = []
    for tokenized_sent in tokenized_tweets:
        # pad sentences to max_len
        tokenized_sent += ["<pad>"] * (max_len - len(tokenized_sent))
        
        # Encode tokens to input_ids
        input_id = [word2idx.get(token) for token in tokenized_sent]
        input_ids.append(input_id)
        
    return np.array(input_ids)

#### 3.2 Load Pretrained Vectors

We will load the pretrained vectors for each token in our vocabulary. For tokens with no pretraiend vectors, we will initialize random word vectors with the same dimension and variance.

In [8]:
from tqdm import tqdm_notebook

def load_pretrained_vectors(word2idx, fname):
    """
        Load pretrained vectors and create embedding layers.
        
        Args:
            word2idx (Dict): Vocabulary built form the corpus
            fname (str): Path to pretrained vector file
            
        Returns:
            embeddings (np.array): Embedding matrix with shape (N, d) 
                where N is the size of word2idx and d is embedding dimension.
    """
    
    print("Loading pretrained vectors...")
    fin = open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    
    # Initialize random embeddings
    embeddings = np.random.uniform(-0.25, 0.25, (len(word2idx), d))
    embeddings[word2idx['<pad>']] = np.zeros((d,))
    
    # Load pretrained vectors
    count = 0
    for line in tqdm_notebook(fin):
        tokens = line.rstrip().split(' ')
        word = tokens[0]
        if word in word2idx:
            count += 1
            embeddings[word2idx[word]] = np.array(tokens[1:], dtype=np.float32)
            
    print(f"There are {count} / {len(word2idx)} pretrained vectors found.")
    
    return embeddings

Finally time to perform the above functions defined.

In [9]:
np.array(tweets_df['message'])

array(['arirang simply kpop kim hyung jun cross ha yeong playback',
       'read politico article donald trump running mate tom brady list likely choice',
       'type bazura project google image image photo dad glenn moustache whatthe',
       ..., 'bring dunkin iced coffee tomorrow hero',
       'currently holiday portugal come home tomorrow poland tuesday holocaust memorial trip',
       'ladykiller saturday aternoon'], dtype=object)

In [10]:
# Tokenize, build vocabulary, encode tokens
print("Tokenizing...\n")
tokenized_tweets, word2idx, max_len = tokenize(np.array(tweets_df['message']))
input_ids = encode(tokenized_tweets, word2idx, max_len)

# Load pretrained vectors
embeddings = load_pretrained_vectors(word2idx, "fastText/crawl-300d-2M.vec")
embeddings = torch.tensor(embeddings)

Tokenizing...

Loading pretrained vectors...


Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`


0it [00:00, ?it/s]

There are 26422 / 35778 pretrained vectors found.


In [11]:
os.listdir("/kaggle/working/fastText")

['crawl-300d-2M.vec', 'crawl-300d-2M.vec.zip']

#### 3.3 Create Pytorch DataLoader
We will create an iterator for our dataset using the torch DataLoader class. This will help save on memory during training and boost the training speed. 

In [12]:
from torch.utils.data import (TensorDataset, DataLoader, 
                              RandomSampler, SequentialSampler)

def data_loader(train_inputs, test_inputs, val_inputs, train_labels, test_labels, val_labels,
               batch_size=64):
    """
        Convert train, test, and validation sets to torch.Tensors
        and load them to DataLoader
    """
    
    # Convert data type to torch.Tensor
    train_inputs, test_inputs, val_inputs, \
    train_labels, test_labels, val_labels = tuple(torch.tensor(data) for data in 
                                                  [train_inputs, test_inputs, val_inputs, train_labels, test_labels, val_labels])
    
    # Specify batch_size
    batch_size = 64
    
    # Create DataLoader for training data
    train_data = TensorDataset(train_inputs, train_labels)
    train_sampler = RandomSampler(train_data)
    train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
    
    # Create DataLoader for testing data
    test_data = TensorDataset(test_inputs, test_labels)
    test_sampler = RandomSampler(test_data)
    test_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=batch_size)
    
    valid_data = TensorDataset(val_inputs, val_labels)
    valid_sampler = RandomSampler(valid_data)
    valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=batch_size)
    
    return train_dataloader, test_dataloader, valid_dataloader

The dataset will be split into 70% train, 20% test, 10% validation.

In [13]:
from sklearn.model_selection import train_test_split

# Train Test Split
X_train, X_test, y_train, y_test = train_test_split(input_ids, np.array(tweets_df['label'].values), 
                                                   test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

y_train = torch.from_numpy(y_train).long()
y_test = torch.from_numpy(y_test).long()
y_valid = torch.from_numpy(y_valid).long()

# Load data to Pytorch DataLoader
train_dataloader, test_dataloder, val_dataloader = \
data_loader(X_train, X_test, X_valid, y_train, y_test, y_valid, batch_size=64)

  del sys.path[0]


In [14]:
for (X, y) in train_dataloader:
    print(X.shape, y.shape)
    break

torch.Size([64, 27]) torch.Size([64])


### 4: Model Development

#### 4.1 Build CNN Model 

In [15]:
import torch.nn as nn
import torch.nn.functional as F

class CNNSentimentClassifier(nn.Module):
    """A 1D Convonlutional Neural Network for Text Classification"""
    def __init__(self, pretrained_embedding=None, freeze_embedding=False,
                vocab_size=None, embed_dim=300, filter_sizes=[3,4,5],
                num_filters=[100, 100, 100], num_classes=2, dropout=0.5):

        """
            The constructor for CNNSentimentClassifier class.
            
            Args:
                pretrained_embedding (torch.Tensor): Pretrained embeddings with 
                            shape (vocab_size, embed_dim)
                freeze_embedding (bool): Set to False to fine-tune pretrained 
                            vectors. Default: False
                vocab_size (int): Need to be specified when not pretrained word 
                            embeddings are not used.
                embed_dim (int): Dimension of word vectors. Need to be specified
                            when pretrained word embeddings are not used. Default: 300
                filter_sizes (List[int]): List of number of filter sizes. Default: [3,4,5]
                num_filters (List[int]): List of number of filters, has the same length as 
                            `filter_sizes`. Default: [100, 100, 100]
                n_classes (int): Number of classes. Default: 2
                dropout (float): Dropout rate. Default: 0.5
        """
        
        super(CNNSentimentClassifier, self).__init__()
        
        # Embedding layer
        if pretrained_embedding is not None:
            self.vocab_size, self.embed_dim = pretrained_embedding.shape
            self.embedding = nn.Embedding.from_pretrained(pretrained_embedding, freeze=freeze_embedding)
        
        else:
            self.embed_dim = embed_dim
            self.embedding = nn.Embedding(num_embeddings = vocab_size,
                                         embedding_dim = self.embed_dim,
                                         padding_idx = 0,
                                         max_norm = 5.0)
            
        # Conv Network
        self.conv1d_list = nn.ModuleList([
            nn.Conv1d(in_channels=self.embed_dim,
                      out_channels=num_filters[i],
                      kernel_size=filter_sizes[i])
            for i in range(len(filter_sizes))
        ])
        
        # Fully-connected layer and Dropout
        self.fc = nn.Linear(np.sum(num_filters), num_classes)
        self.dropout = nn.Dropout(p=dropout)
        
    def forward(self, input_ids):
        """
            Perform a forward pass through the network.
        
            Args:
                input_ids (torch.Tensor): A tensor of tokens ids with shape (batch_size, max_sentence_length)
            
            Returns:
                logits (torch.Tensor): Output logits with shape (batch_size, n_classes)
        """
        
        # Get embeddings from `input_ids`. Output shape: (b, max_len, embed_dim)
        x_embed = self.embedding(input_ids).float()
        
        # Permute `x_embed` to match input shape requiremnet of `nn.Conv1d`
        # Output shape: (b, embed_dim, max_len)
        x_reshaped = x_embed.permute(0, 2, 1)
        
        # Apply CNN and ReLU. Output shape: (b, num_filters[i], L_out)
        x_conv_list = [F.relu(conv1d(x_reshaped)) for conv1d in self.conv1d_list]
        
        # Max Pooling. Output shape: (b, num_filters[i], 1)
        x_pool_list = [F.max_pool1d(x_conv, kernel_size=x_conv.shape[2])
                       for x_conv in x_conv_list]
        
        # Concatenate x_pool_list to feed the fully connected layer.
        # Output shape: (b, sum(num_filters))
        x_fc = torch.cat([x_pool.squeeze(dim=2) for x_pool in x_pool_list],
                         dim=1)
        
        # Compute logits. Output shape: (b, n_classes)
        logits = self.fc(self.dropout(x_fc))
        
        return logits


#### 4.2 Optimizer
To train Deep Learning models, we need to define a loss function and minimize this loss. We’ll use back-propagation to compute gradients and use an optimization algorithm (ie. Gradient Descent) to minimize the loss.

In [16]:
import torch.optim as optim

def initialize_model(pretrained_embedding=None,
                     freeze_embedding=False,
                     vocab_size=None,
                     embed_dim=300,
                     filter_sizes=[3, 4, 5],
                     num_filters=[100, 100, 100],
                     num_classes=2,
                     dropout=0.5,
                     learning_rate=0.01):
    """Instantiate a CNN model and an optimizer."""
    
    assert (len(filter_sizes) == len(num_filters)), "filter_sizes \
    and num_filters need to be of the same length"
    
    # Instantiate CNN model
    cnn_model = CNNSentimentClassifier(pretrained_embedding=pretrained_embedding,
                                       freeze_embedding=freeze_embedding,
                                       vocab_size=vocab_size,
                                       embed_dim=embed_dim,
                                       filter_sizes=filter_sizes,
                                       num_filters=num_filters,
                                       num_classes=3,
                                       dropout=0.5)
    
    # Send model to `device` (GPU/CPU)
    cnn_model.to(device)
    
    # Instantiate Adadelta optimizer
    optimizer = optim.Adadelta(cnn_model.parameters(),
                               lr=learning_rate,
                               rho=0.95)
    
    return cnn_model, optimizer

#### 4.4 Training Loop

For each epoch, the code below will perform a forward step to compute the Cross Entropy loss, a backward step to compute gradients and use the optimizer to update weights/parameters. At the end of each epoch, the loss on training data and the accuracy over the validation data will be printed to help us keep track of the model’s performance. The code is heavily annotated with detailed explanations.

In [17]:
import random
import time

# Specify loss function
loss_fn = nn.CrossEntropyLoss()

def set_seed(seed_value=42):
    """Set seed for reproducibility."""
    
    random.seed(seed_value)
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)
    
def train(model, optimizer, train_dataloader, val_dataloader=None, epochs=10):
    """Train the CNN Model."""
    
    # Tracking best validation accuracy
    best_accuracy = 0
    
    # Start training loop
    print("Start training...\n")
    print(f"{'Epoch':^7} | {'Train Loss': ^12} | {'Val Loss': ^10} | {'Val Acc': ^9} | {'Elapsed': ^9}")
    print("-"*60)
    
    for epoch_i in range(epochs):
        # =======================================
        #               Training
        # =======================================
        
        # Tracking time and loss
        t0_epoch = time.time()
        total_loss = 0
        
        # Put the model into training mode
        model.train()
        
        for step, batch in enumerate(train_dataloader):
            # Load batch to GPU
            b_input_ids, b_labels = tuple(t.to(device) for t in batch)
#             print(b_labels, b_input_ids)
#             print(b_labels.shape, b_input_ids)
#             print(type(b_labels), type(b_input_ids))
            
            # Zero out any previously calculated gradients
            optimizer.zero_grad()
            
            # Perform a forward pass. This will return logits.
            logits = model(b_input_ids)
#             print(logits.shape, b_labels.shape)
            # Compute loss and accumulate the loss values
            loss = loss_fn(logits, b_labels)
            total_loss += loss.item()
            
            # Perform a backward pass to calculate gradients
            loss.backward()
            
            # Update parameters
            optimizer.step()
            
        # Calculate the average loss over the entire training data
        avg_train_loss = total_loss / len(train_dataloader)
        
        # =======================================
        #               Evaluation
        # =======================================
        
        if val_dataloader is not None:
            # After the completion of each training epoch, measure the model's
            # performance on our validation set.
            val_loss, val_accuracy = evaluate(model, val_dataloader)
            
            # Train the best accuracy
            if val_accuracy > best_accuracy:
                best_accuracy = val_accuracy
                
            # Print performance over the entire training data
            time_elapsed = time.time() - t0_epoch
            print(f"{epoch_i + 1: ^7} | {avg_train_loss: ^12.6f} | {val_loss:^10.6f} | {val_accuracy: ^9.2f} | {time_elapsed: ^9.2f}")
            
    print("\n")
    print(f"Training complete! Best Accuracy: {best_accuracy:.2f}%.")
    
def evaluate(model, val_dataloader):
    """After the completion of each training epoch, measure the model's
    performance on our validation set."""
    
    # Put the model into evaluation mode. The dropout layers are diabled
    # during test time
    model.eval()
    
    # Tracking variables
    val_accuracy = []
    val_loss = []
    
    # For each batch in our validation set...
    for batch in val_dataloader:
        # Load batch to GPU
        b_input_ids, b_labels = tuple(t.to(device) for t in batch)
       
        # Compute logits
        with torch.no_grad():
            logits = model(b_input_ids)
            
        # Compute loss
        loss = loss_fn(logits, b_labels)
        val_loss.append(loss.item())
        
        # Get the predictions
        preds = torch.argmax(logits, dim=1).flatten()
        
        # Calculate the accuracy rate
        accuracy = (preds == b_labels).cpu().numpy().mean() * 100
        val_accuracy.append(accuracy)
        
    # Compute the average accuracy and loss over the validation set.
    val_loss = np.mean(val_loss)
    val_accuracy = np.mean(val_accuracy)
    
    return val_loss, val_accuracy
    

### 5. Evaluation

We will experiment with all 3 variations of CNN and compare their performance. 

* **CNN-rand**: The baseline model where the embedding layer is randomly initialized and then updated during training.
* **CNN-static**: A model with pretrained vectors. However, the embedding layer is freezed during training.
* **CNN-non-static**: Same as above but the embedding layers is fine-tuned during training.

**CNN-rand**: Word vectors are randomly initialized.

In [20]:
set_seed(42)
cnn_rand, optimizer = initialize_model(vocab_size=len(word2idx),
                                      embed_dim=300,
                                      learning_rate=0.25,
                                      dropout=0.5)

train(cnn_rand, optimizer, train_dataloader, val_dataloader, epochs=50)

Start training...

 Epoch  |  Train Loss  |  Val Loss  |  Val Acc  |  Elapsed 
------------------------------------------------------------
   1    |   0.957330   |  0.890498  |   57.59   |   5.35   
   3    |   0.773872   |  0.830403  |   60.63   |   5.30   
   4    |   0.695884   |  0.829802  |   61.30   |   5.19   
   5    |   0.621207   |  0.845834  |   61.07   |   5.46   
   6    |   0.553963   |  0.849505  |   60.87   |   5.25   
   7    |   0.495847   |  0.866690  |   60.71   |   5.20   
   8    |   0.438563   |  0.890770  |   61.25   |   5.23   
   9    |   0.390458   |  0.938950  |   60.37   |   5.18   
  10    |   0.348128   |  0.975072  |   60.45   |   5.26   
  11    |   0.311350   |  0.983773  |   60.81   |   5.33   
  12    |   0.286373   |  1.037823  |   60.66   |   5.28   
  13    |   0.258487   |  1.062248  |   59.98   |   5.16   
  14    |   0.240484   |  1.104778  |   60.81   |   5.29   
  15    |   0.220610   |  1.126926  |   59.99   |   5.15   
  16    |   0.201376

In [19]:
torch.__version__

'1.11.0'

**CNN-static**: fastText pretrained word vectors are used and freezed during training

In [22]:
set_seed(42)
cnn_static, optimizer = initialize_model(pretrained_embedding=embeddings,
                                        freeze_embedding=True,
                                        learning_rate=0.25,
                                        dropout=0.5)

train(cnn_static, optimizer, train_dataloader, val_dataloader, epochs=50)

Start training...

 Epoch  |  Train Loss  |  Val Loss  |  Val Acc  |  Elapsed 
------------------------------------------------------------
   1    |   0.876601   |  0.799927  |   61.81   |   2.36   
   2    |   0.766044   |  0.754865  |   65.87   |   2.11   
   3    |   0.721130   |  0.755264  |   64.88   |   2.16   
   4    |   0.677579   |  0.791402  |   63.81   |   2.54   
   5    |   0.632396   |  0.760268  |   65.04   |   2.11   
   6    |   0.581718   |  0.781103  |   65.22   |   2.29   
   7    |   0.529278   |  0.771659  |   65.29   |   2.09   
   8    |   0.485691   |  0.812627  |   64.98   |   2.07   
   9    |   0.433641   |  0.915200  |   62.75   |   2.06   
  10    |   0.393723   |  0.874937  |   63.28   |   2.09   
  11    |   0.358975   |  0.861795  |   64.20   |   2.41   
  12    |   0.332404   |  0.914028  |   64.18   |   2.22   
  13    |   0.308152   |  0.947374  |   64.39   |   2.18   
  14    |   0.277040   |  0.970528  |   64.59   |   2.10   
  15    |   0.260062

**CNN-non-static**: fastText pretrained word vectors are fine-tuned druing training

In [23]:
set_seed(42)
cnn_non_static, optimizer = initialize_model(pretrained_embedding=embeddings,
                                            freeze_embedding=False,
                                            learning_rate=0.25,
                                            dropout=0.5)
train(cnn_non_static, optimizer, train_dataloader, val_dataloader, epochs=50)

Start training...

 Epoch  |  Train Loss  |  Val Loss  |  Val Acc  |  Elapsed 
------------------------------------------------------------
   1    |   0.873470   |  0.794436  |   62.33   |   9.41   
   2    |   0.758100   |  0.748162  |   65.91   |   9.40   
   3    |   0.710624   |  0.749168  |   65.90   |   9.56   
   4    |   0.662624   |  0.775676  |   64.59   |   9.40   
   5    |   0.614046   |  0.756444  |   65.69   |   9.40   
   6    |   0.558201   |  0.779927  |   65.53   |   9.45   
   7    |   0.500743   |  0.774588  |   65.66   |   9.42   
   8    |   0.450207   |  0.847499  |   63.75   |   9.40   
   9    |   0.397661   |  0.938427  |   62.82   |   9.40   
  10    |   0.351946   |  0.902959  |   63.71   |   9.48   
  11    |   0.314390   |  0.893416  |   65.57   |   9.41   
  12    |   0.280492   |  0.948975  |   64.60   |   9.39   
  13    |   0.254926   |  1.002899  |   64.29   |   9.46   
  14    |   0.227664   |  1.055433  |   64.77   |   9.40   
  15    |   0.206875

### 6. Test Model

In [24]:
def predict(text, model=cnn_non_static.to('cpu'), max_len=62):
    """Predict probability that a review is positive"""
    
    # Tokenize, pad, and encode text
    tokens = word_tokenize(text.lower())
    padded_tokens = tokens + ["<pad>"] * (max_len - len(tokens))
    input_id = [word2idx.get(token, word2idx['<unk>']) for token in padded_tokens]
    
    # Convert to PyTorch tensors
    input_id = torch.tensor(input_id).unsqueeze(dim=0)
    
    # Compute logits
    logits = model.forward(input_id)
    
    # Compute probability
    probs = F.softmax(logits, dim=1).squeeze(dim=0)
    
    print(f"This review is {probs[1] * 100:.2f}% positive.")

In [25]:
predict("All of friends slept while watching this movie. But I really enjoyed it.")
predict("I have waited so long for this movie. I am now so satisfied and happy.")
predict("This movie is long and boring.")
predict("I don't like the ending.")

This review is 0.94% positive.
This review is 0.00% positive.
This review is 1.38% positive.
This review is 9.67% positive.
