### CNN in text classification.
``Convolutional Neural Networks`` (``CNNs``) are mainly meant for processing images, so they are not the usual choice for sequential data. In this notebook, however, we are going to use a ``CNN`` for text classification.

In the previous notebook we did text classification using `FastText`. This time we are not going to calculate `n-grams` as we did in the `FastText` sentiment analysis. Instead, we need the batch dimension to come first, since a `CNN` expects its input as ``[batch_size, ...]``. We can achieve this by setting `batch_first = True` in the `TEXT` field.

### Loading the data

In [2]:
import torch
from torchtext.legacy import data, datasets
import numpy as np
import random

In [4]:
SEED = 42
random.seed(SEED)
torch.manual_seed(SEED)
np.random.seed(SEED)
torch.backends.cudnn.deterministic = True

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [5]:
TEXT = data.Field(tokenize="spacy", tokenizer_language="en_core_web_sm", batch_first=True)
LABEL = data.LabelField(dtype=torch.float32)

In [6]:
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)


In [7]:
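# random.seed() returns None, so seed first and pass the generator state explicitly.
# split_ratio defaults to 0.7, so the first split returned (validation) gets 70% of the examples.
random.seed(SEED)
validation_data, test_data = test_data.split(random_state=random.getstate())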

### Checking how many examples are in each set.

In [8]:
print(f"TRAINING: \t {len(train_data)}")
print(f"TESTING: \t {len(test_data)}")
print(f"VALIDATION: \t {len(validation_data)}")

TRAINING: 	 25000
TESTING: 	 7500
VALIDATION: 	 17500


### Building a Vocabulary Using pretrained Word Vectors.
* We are going to use `glove.6B.100d` vectors.

In [9]:
MAX_VOCAB_SIZE = 25_000
TEXT.build_vocab(
    train_data,
    max_size = MAX_VOCAB_SIZE,
    vectors= "glove.6B.100d",
    unk_init = torch.Tensor.normal_
)
LABEL.build_vocab(train_data)


### Creating Iterators.
We are going to use the `BucketIterator` to create our iterators.

In [11]:
BATCH_SIZE = 64

train_iterators, validation_iterators, test_iterators = data.BucketIterator.splits(
    (train_data, validation_data, test_data),
    batch_size = BATCH_SIZE,
    device = device
)

### Building a `CNN` text classifier.

#### How `CNN` are used for text.
Images are typically 2 dimensional (we'll ignore the fact that there is a third "colour" dimension for now) whereas text is 1 dimensional. However, we know that the first step in almost all of our previous tutorials (and pretty much all NLP pipelines) is converting the words into word embeddings. This is how we can visualize our words in 2 dimensions: each word along one axis and the elements of its embedding vector along the other. Consider the 2 dimensional representation of the embedded sentence below:

<p align="center">
  <img src="https://github.com/bentrevett/pytorch-sentiment-analysis/raw/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment9.png"/>
</p>

We can then use a filter that is **[``n`` ``x`` ``emb_dim``]**. This will cover **$n$** sequential words entirely, as its width is ``emb_dim`` dimensions. Consider the image below, with our word vectors represented in green. Here we have 4 words with 5 dimensional embeddings, creating a **``[4x5]``** "image" tensor. A filter that covers two words at a time (i.e. bi-grams) will be a **``[2x5]``** filter, shown in yellow, and each element of the filter will have a weight associated with it. The output of this filter (shown in red) will be a single real number that is the weighted sum of all elements covered by the filter.

<p align="center">
  <img src="https://github.com/bentrevett/pytorch-sentiment-analysis/raw/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment12.png"/>
</p>

The filter then moves ``down`` the image (or across the sentence) to cover the next ``bi-gram`` and another output (weighted sum) is calculated and so on.

<p align="center">
  <img src="https://github.com/bentrevett/pytorch-sentiment-analysis/raw/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment13.png"/>
  <img src="https://github.com/bentrevett/pytorch-sentiment-analysis/raw/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment14.png"/>
</p>

In our case (and in the general case where the width of the filter equals the width of the "image"), our output will be a vector with a number of elements equal to the height of the image (or length of the sentence) minus the height of the filter plus one, $4-2+1=3$ in this case.
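The sliding-window computation described above can be sketched in plain Python. This is only an illustration, not the model's implementation: the embedding values and filter weights are made up, and `apply_filter` is a hypothetical helper name.

```python
# Toy example: 4 words with 5-dimensional embeddings (values are made up).
sentence = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [1.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
]

# A [2 x 5] filter: one (made-up) weight per covered element.
bigram_filter = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [-1.0, -1.0, -1.0, -1.0, -1.0],
]


def apply_filter(embedded, filt):
    """Slide `filt` down `embedded`, returning one weighted sum per position."""
    height = len(filt)
    n_positions = len(embedded) - height + 1  # sentence length - filter height + 1
    outputs = []
    for start in range(n_positions):
        window = embedded[start:start + height]
        total = sum(
            w * x
            for f_row, e_row in zip(filt, window)
            for w, x in zip(f_row, e_row)
        )
        outputs.append(total)
    return outputs


outputs = apply_filter(sentence, bigram_filter)
print(len(outputs))  # 3, i.e. 4 - 2 + 1
```

Each entry of `outputs` corresponds to one bi-gram position in the sentence.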

This example showed how to calculate the output of one filter. Our model (and pretty much all CNNs) will have lots of these filters. The idea is that each filter will learn a different feature to extract. In the above example, we are hoping each of the [2 x emb_dim] filters will be looking for the occurrence of different bi-grams.

In our model, we will also have different sizes of filters, heights of 3, 4 and 5, with 100 of each of them. The intuition is that we will be looking for the occurrence of different tri-grams, 4-grams and 5-grams that are relevant for analysing sentiment of movie reviews.

The next step in our model is to use pooling (specifically max pooling) on the output of the convolutional layers. This is similar to the FastText model, where we averaged over the word vectors using the ``F.avg_pool2d`` function; however, instead of taking the average over a dimension, we take the maximum value over a dimension. Below is an example of taking the maximum value (0.9) from the output of the convolutional layer on the example sentence (not shown is the activation function applied to the output of the convolutions).

<p align="center">
  <img src="https://github.com/bentrevett/pytorch-sentiment-analysis/raw/2b666b3cba7d629a2f192c7d9c66fadcc9f0c363/assets/sentiment15.png"/>
</p>

The idea here is that the maximum value is the "most important" feature for determining the sentiment of the review, which corresponds to the "most important" ``n-gram`` within the review. 

### How do we know what the "most important" ``n-gram`` is? 
Luckily, we don't have to! Through backpropagation, the weights of the filters are changed so that whenever certain n-grams that are highly indicative of the sentiment are seen, the output of the filter is a "high" value. This "high" value then passes through the max pooling layer if it is the maximum value in the output.

As our model has 100 filters of 3 different sizes, that means we have 300 different n-grams the model thinks are important. We concatenate these together into a single vector and pass them through a linear layer to predict the sentiment. We can think of the weights of this linear layer as "weighting up the evidence" from each of the 300 ``n-grams`` and making a final decision.
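As a rough sketch of the pooling and concatenation step, the snippet below uses random tensors as stand-ins for the convolution outputs (they are not real feature maps) and a hypothetical helper `pool_and_concat`. It shows that the concatenated vector has 300 elements per example regardless of sentence length.

```python
import torch
import torch.nn.functional as F

n_filters = 100
filter_sizes = [3, 4, 5]


def pool_and_concat(sent_len, batch_size=2):
    """Max-pool stand-in feature maps over the length dimension and concatenate."""
    pooled = []
    for fs in filter_sizes:
        # Stand-in for a convolution output: [batch, n_filters, sent_len - fs + 1]
        conved = torch.randn(batch_size, n_filters, sent_len - fs + 1)
        # Pool over the whole length dimension, leaving one value per filter.
        pooled.append(F.max_pool1d(conved, conved.shape[2]).squeeze(2))
    # [batch, n_filters * len(filter_sizes)]
    return torch.cat(pooled, dim=1)


print(pool_and_concat(sent_len=10).shape)  # torch.Size([2, 300])
print(pool_and_concat(sent_len=50).shape)  # torch.Size([2, 300])
```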

### Model Implementation.
We implement the convolutional layers with ``nn.Conv2d``. The ``in_channels`` argument is the number of "channels" in your image going into the convolutional layer. In actual images this is usually ``3`` (one channel for each of the red, blue and green channels), however when using text we only have a **single** channel, the text itself. The ``out_channels`` is the number of filters and the ``kernel_size`` is the size of the filters. Each of our ``kernel_sizes`` is going to be ``[n x emb_dim]`` where $n$ is the size of the n-grams.

In PyTorch, RNNs want the input with the batch dimension second, whereas ``CNNs`` want the batch dimension first - we do not have to permute the data here as we have already set ``batch_first = True`` in our ``TEXT`` field. We then pass the sentence through an embedding layer to get our embeddings. The second dimension of the input into a ``nn.Conv2d`` layer must be the channel dimension. As text technically does not have a channel dimension, we ``unsqueeze`` our tensor to create one. This matches with our ``in_channels=1`` in the initialization of our convolutional layers.
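The shape changes described above can be traced with a small standalone sketch. The sizes here (batch of 2, 7 tokens, vocabulary of 50, a single filter height of 3) are illustrative assumptions, not the values used by the model in this notebook.

```python
import torch
import torch.nn as nn

batch_size, sent_len, emb_dim, vocab_size = 2, 7, 100, 50  # illustrative sizes

embedding = nn.Embedding(vocab_size, emb_dim)
conv = nn.Conv2d(in_channels=1, out_channels=100, kernel_size=(3, emb_dim))

text = torch.randint(0, vocab_size, (batch_size, sent_len))  # [batch, sent len]
embedded = embedding(text)        # [batch, sent len, emb dim]
embedded = embedded.unsqueeze(1)  # [batch, 1, sent len, emb dim]
conved = conv(embedded)           # [batch, n_filters, sent len - 3 + 1, 1]
conved = conved.squeeze(3)        # [batch, n_filters, sent len - 3 + 1]

print(embedded.shape)  # torch.Size([2, 1, 7, 100])
print(conved.shape)    # torch.Size([2, 100, 5])
```

Because the filter spans the full embedding width, the last dimension of the convolution output has size 1 and can be squeezed away.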

We then pass the tensors through the convolutional and pooling layers, using the ReLU activation function after the convolutional layers. Another nice feature of the pooling layers is that they handle sentences of different lengths. The size of the output of the convolutional layer depends on the size of its input, and different batches contain sentences of different lengths. Without the max pooling layer, the input to our linear layer would depend on the length of the input sentence (not what we want). One option to rectify this would be to trim/pad all sentences to the same length; however, with the max pooling layer we always know the input to the linear layer will be the total number of filters.

Note: **There is an exception to this if your sentence(s) are shorter than the largest filter used. You will then have to pad your sentences to the length of the largest filter. In the IMDb data no review is short enough for this to matter, so we don't have to worry about it, but you will if you are using your own data.**
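To make the short-sentence exception concrete, here is a small pure-Python sketch. The helper names `conv_output_lengths` and `pad_to_min_len` are hypothetical, and the filter sizes are the ones used later in this notebook. An output length of zero or less means the filter does not fit inside the sentence.

```python
FILTER_SIZES = [3, 4, 5]
PAD_TOKEN = "<pad>"


def conv_output_lengths(n_tokens, filter_sizes=FILTER_SIZES):
    """Output length of each filter: sentence length - filter height + 1."""
    return [n_tokens - fs + 1 for fs in filter_sizes]


def pad_to_min_len(tokens, min_len=max(FILTER_SIZES), pad_token=PAD_TOKEN):
    """Append pad tokens until the sentence has at least `min_len` tokens."""
    return tokens + [pad_token] * max(0, min_len - len(tokens))


tokens = ["This", "film", "is", "terrible"]
print(conv_output_lengths(len(tokens)))  # [2, 1, 0]: the height-5 filter does not fit

padded = pad_to_min_len(tokens)
print(padded)                            # ['This', 'film', 'is', 'terrible', '<pad>']
print(conv_output_lengths(len(padded)))  # [3, 2, 1]
```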

Finally, we perform dropout on the concatenated filter outputs and then pass them through a linear layer to make our predictions.

In [19]:
import torch.nn as nn
from torch.nn import  functional as F

### CNN with different filters.

In [22]:
class CNN(nn.Module):
  def __init__(self, vocab_size, embedding_size, n_filters, filter_sizes, output_size, 
                 dropout, pad_idx):
    super(CNN, self).__init__()

    self.embedding = nn.Embedding(vocab_size, embedding_dim=embedding_size, padding_idx=pad_idx)
    self.conv_0 = nn.Conv2d(in_channels = 1, 
                        out_channels = n_filters, 
                        kernel_size = (filter_sizes[0], embedding_size))
    self.conv_1 = nn.Conv2d(in_channels = 1, 
                        out_channels = n_filters, 
                        kernel_size = (filter_sizes[1], embedding_size))
    self.conv_2 = nn.Conv2d(in_channels = 1, 
                        out_channels = n_filters, 
                        kernel_size = (filter_sizes[2], embedding_size))
    self.fc = nn.Linear(len(filter_sizes) * n_filters, output_size)
    self.dropout = nn.Dropout(dropout)
  
  def forward(self, text):
    #text = [batch size, sent len]
    embedded = self.embedding(text)  
    #embedded = [batch size, sent len, emb dim]
    embedded = embedded.unsqueeze(1)

    #embedded = [batch size, 1, sent len, emb dim]
    conved_0 = F.relu(self.conv_0(embedded).squeeze(3))
    conved_1 = F.relu(self.conv_1(embedded).squeeze(3))
    conved_2 = F.relu(self.conv_2(embedded).squeeze(3))
        
    # conved_n = [batch size, n_filters, sent len - filter_sizes[n] + 1]
    pooled_0 = F.max_pool1d(conved_0, conved_0.shape[2]).squeeze(2)
    pooled_1 = F.max_pool1d(conved_1, conved_1.shape[2]).squeeze(2)
    pooled_2 = F.max_pool1d(conved_2, conved_2.shape[2]).squeeze(2)

    #pooled_n = [batch size, n_filters]
    cat = self.dropout(torch.cat((pooled_0, pooled_1, pooled_2), dim = 1))
    #cat = [batch size, n_filters * len(filter_sizes)]
    return self.fc(cat)


### A generic `CNN` that takes any number of filters.

We do this by placing all of our convolutional layers in a ``nn.ModuleList``.

### What is ``nn.ModuleList``?
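``nn.ModuleList`` is a container class that holds a list of PyTorch ``nn.Module`` objects. If we simply used a standard Python list, the layers in it would not be registered as submodules of our model, so their parameters would be missing from ``model.parameters()`` (and therefore ignored by the optimizer and by ``model.to(device)``), which would cause us some errors.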

We can now pass an arbitrary sized list of filter sizes and the list comprehension will create a convolutional layer for each of them. Then, in the forward method we iterate through the list applying each convolutional layer to get a list of convolutional outputs, which we also feed through the max pooling in a list comprehension before concatenating together and passing through the dropout and linear layers.

In [23]:
class CNN(nn.Module):
  def __init__(self, vocab_size, embedding_size, n_filters, filter_sizes, output_size, 
            dropout, pad_idx):
    super().__init__()
    self.embedding = nn.Embedding(vocab_size, embedding_size, padding_idx = pad_idx)
    self.convs = nn.ModuleList([
                                nn.Conv2d(in_channels = 1, 
                                          out_channels = n_filters, 
                                          kernel_size = (fs, embedding_size)) 
                                for fs in filter_sizes
                                ])
    self.fc = nn.Linear(len(filter_sizes) * n_filters, output_size)
    self.dropout = nn.Dropout(dropout)

  def forward(self, text):  
    #text = [batch size, sent len]
    embedded = self.embedding(text)    
    #embedded = [batch size, sent len, emb dim]
    embedded = embedded.unsqueeze(1)
    #embedded = [batch size, 1, sent len, emb dim]

    conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
    #conved_n = [batch size, n_filters, sent len - filter_sizes[n] + 1]

    pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
    #pooled_n = [batch size, n_filters]
    cat = self.dropout(torch.cat(pooled, dim = 1))
    #cat = [batch size, n_filters * len(filter_sizes)]  
    return self.fc(cat)
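
To see why ``nn.ModuleList`` matters, the following minimal sketch compares two toy modules. The class names `WithPythonList` and `WithModuleList` are hypothetical and the layer sizes are arbitrary.

```python
import torch.nn as nn


class WithPythonList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the layer is NOT registered as a submodule.
        self.convs = [nn.Conv1d(4, 2, kernel_size=3)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the layer is registered, so its parameters are visible.
        self.convs = nn.ModuleList([nn.Conv1d(4, 2, kernel_size=3)])


print(len(list(WithPythonList().parameters())))  # 0
print(len(list(WithModuleList().parameters())))  # 2 (weight and bias)
```

An optimizer built from `WithPythonList().parameters()` would have nothing to update, which is the failure mode the container avoids.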

### Conv1d
We can also implement the above model using ``1-dimensional`` convolutional layers, where the embedding dimension is the "depth" of the filter and the number of tokens in the sentence is the width.

In [24]:
class CNN1d(nn.Module):
  def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
            dropout, pad_idx):
    super().__init__()
    self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
    self.convs = nn.ModuleList([
                                nn.Conv1d(in_channels = embedding_dim, 
                                          out_channels = n_filters, 
                                          kernel_size = fs)
                                for fs in filter_sizes
                                ])
    self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
    self.dropout = nn.Dropout(dropout)
        
  def forward(self, text):
    #text = [batch size, sent len]
    embedded = self.embedding(text)  
    #embedded = [batch size, sent len, emb dim]
    embedded = embedded.permute(0, 2, 1)
    #embedded = [batch size, emb dim, sent len]
    conved = [F.relu(conv(embedded)) for conv in self.convs]
    #conved_n = [batch size, n_filters, sent len - filter_sizes[n] + 1]
    pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
    #pooled_n = [batch size, n_filters]
    cat = self.dropout(torch.cat(pooled, dim = 1))
    #cat = [batch size, n_filters * len(filter_sizes)]
    return self.fc(cat)
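
As a sanity check on the claim that the two formulations compute the same thing, this sketch copies the weights of a small ``nn.Conv2d`` layer into an ``nn.Conv1d`` layer and compares their outputs on random input. The sizes are illustrative assumptions and are much smaller than those used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, sent_len, emb_dim, n_filters, fs = 2, 9, 6, 4, 3  # illustrative sizes

conv2d = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(fs, emb_dim))
conv1d = nn.Conv1d(in_channels=emb_dim, out_channels=n_filters, kernel_size=fs)

# Copy the 2D weights into the 1D layer:
# [n_filters, 1, fs, emb_dim] -> [n_filters, emb_dim, fs]
with torch.no_grad():
    conv1d.weight.copy_(conv2d.weight.squeeze(1).permute(0, 2, 1))
    conv1d.bias.copy_(conv2d.bias)

embedded = torch.randn(batch_size, sent_len, emb_dim)  # stand-in for embeddings
out_2d = conv2d(embedded.unsqueeze(1)).squeeze(3)      # [batch, n_filters, sent_len - fs + 1]
out_1d = conv1d(embedded.permute(0, 2, 1))             # [batch, n_filters, sent_len - fs + 1]

print(out_2d.shape, out_1d.shape)
print(torch.allclose(out_2d, out_1d, atol=1e-5))
```

With matching weights the two layers should agree up to floating-point error, which is consistent with both models reporting the same number of trainable parameters.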


### Hyper Parameters

In [25]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
N_FILTERS = 100
FILTER_SIZES = [3, 4, 5]
OUTPUT_DIM = 1
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

### CNN `2D` instance.

In [26]:
conv_2d_model = CNN(INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX)

### CNN `1D` instance

In [30]:
conv_1d_model = CNN1d(INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX)

### Checking Trainable Parameters.


In [31]:
def count_trainable_params(model):
  return sum([p.numel() for p in model.parameters() if p.requires_grad])

print(f'The model (CNN2D) has  {count_trainable_params(conv_2d_model):,} trainable parameters')
print(f'The model (CNN1D) has  {count_trainable_params(conv_1d_model):,} trainable parameters')

The model (CNN2D) has  2,620,801 trainable parameters
The model (CNN1D) has  2,620,801 trainable parameters
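
The reported total can be reproduced by hand. The sketch below assumes the vocabulary has 25,002 entries (the 25,000 most frequent words plus the `<unk>` and `<pad>` tokens), which is consistent with the count printed above.

```python
vocab_size = 25_002  # assumption: 25,000 words plus <unk> and <pad>
embedding_dim = 100
n_filters = 100
filter_sizes = [3, 4, 5]
output_dim = 1

# Embedding table: one vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim
# Each filter has fs * embedding_dim weights plus one bias.
conv_params = sum(n_filters * (fs * embedding_dim) + n_filters for fs in filter_sizes)
# Linear layer: one weight per pooled feature plus one bias per output.
fc_params = len(filter_sizes) * n_filters * output_dim + output_dim

total = embedding_params + conv_params + fc_params
print(f"{total:,}")  # 2,620,801
```

Almost all of the parameters sit in the embedding layer; the convolutional and linear layers together contribute only about 120,000.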


### Training the Model.
Let's train the `Conv2d` model first and then move on to the `Conv1d` model.

### Loading pretrained embeddings

In [32]:
pretrained_embeddings = TEXT.vocab.vectors

In [94]:
conv_2d_model.embedding.weight.data.copy_(pretrained_embeddings)

tensor([[ 1.9269,  1.4873,  0.9007,  ...,  0.1233,  0.3499,  0.6173],
        [ 0.7262,  0.0912, -0.3891,  ...,  0.0821,  0.4440, -0.7240],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 0.6612,  0.4606,  0.7589,  ...,  1.0253, -0.7948,  0.7347],
        [ 0.3369, -1.2521, -0.7555,  ..., -1.7327, -0.9087,  0.3905],
        [ 1.0300,  0.0859,  1.1354,  ..., -0.4458,  2.0626, -3.2186]],
       device='cuda:0')

### Zero the initial weights of the ``unknown`` and ``padding`` tokens.



In [95]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
# <unk> and <pad> are the first two entries of the vocabulary (indices 0 and 1).
for i in range(2):
  conv_2d_model.embedding.weight.data[i] = torch.zeros(EMBEDDING_DIM)

# Same as doing it this way, using the token indices explicitly:
# conv_2d_model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
# conv_2d_model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)

In [96]:
print(conv_2d_model.embedding.weight.data)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 0.6612,  0.4606,  0.7589,  ...,  1.0253, -0.7948,  0.7347],
        [ 0.3369, -1.2521, -0.7555,  ..., -1.7327, -0.9087,  0.3905],
        [ 1.0300,  0.0859,  1.1354,  ..., -0.4458,  2.0626, -3.2186]],
       device='cuda:0')


### Training the `Conv2D` model.

In [97]:
optimizer = torch.optim.Adam(conv_2d_model.parameters())
criterion = nn.BCEWithLogitsLoss()

### Pushing loss function and model to the device.

In [98]:

conv_2d_model = conv_2d_model.to(device)
criterion = criterion.to(device)

### Accuracy function.

In [69]:
def accuracy(y_preds, y_true):
  #round predictions to the closest integer
  rounded_preds = torch.round(torch.sigmoid(y_preds))
  correct = (rounded_preds == y_true).float() #convert into float for division 
  acc = correct.sum() / len(correct)
  return acc
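
The logic of the accuracy function can be checked with a plain-Python sketch. The logits and labels below are made up, and the local `sigmoid` helper stands in for `torch.sigmoid`. Since the model outputs raw logits (the loss is `BCEWithLogitsLoss`), rounding the sigmoid is the same as thresholding the logit at zero for these values.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


logits = [2.3, -0.7, 0.1, -4.0]  # made-up raw model outputs
labels = [1.0, 0.0, 0.0, 0.0]    # made-up targets

# Round each probability to 0 or 1, as the accuracy function does.
rounded = [float(round(sigmoid(z))) for z in logits]
# Equivalent view: a positive logit means a probability above 0.5.
thresholded = [1.0 if z > 0 else 0.0 for z in logits]

acc = sum(r == y for r, y in zip(rounded, labels)) / len(labels)

print(rounded)  # [1.0, 0.0, 1.0, 0.0]
print(acc)      # 0.75
```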

### Training and Evaluation function

In [70]:
def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        optimizer.zero_grad()
        text = batch.text
        predictions = model(text).squeeze(1)
        loss = criterion(predictions, batch.label)
        acc = accuracy(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()
    with torch.no_grad():
        for batch in iterator:
            text = batch.text
            predictions = model(text).squeeze(1)
            loss = criterion(predictions, batch.label)
            acc = accuracy(predictions, batch.label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)


We'll also create a function to tell us how long an epoch takes to compare training times between models.

In [71]:
import time
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [99]:
N_EPOCHS = 5
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(conv_2d_model, train_iterators, optimizer, criterion)
    valid_loss, valid_acc = evaluate(conv_2d_model, validation_iterators, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(conv_2d_model.state_dict(), 'best-conv-2d-model.pt')
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 31s
	Train Loss: 0.440 | Train Acc: 79.64%
	 Val. Loss: 0.368 |  Val. Acc: 83.22%
Epoch: 02 | Epoch Time: 0m 30s
	Train Loss: 0.263 | Train Acc: 89.08%
	 Val. Loss: 0.351 |  Val. Acc: 84.52%
Epoch: 03 | Epoch Time: 0m 30s
	Train Loss: 0.195 | Train Acc: 92.40%
	 Val. Loss: 0.357 |  Val. Acc: 84.89%
Epoch: 04 | Epoch Time: 0m 30s
	Train Loss: 0.148 | Train Acc: 94.36%
	 Val. Loss: 0.359 |  Val. Acc: 85.26%
Epoch: 05 | Epoch Time: 0m 30s
	Train Loss: 0.114 | Train Acc: 95.77%
	 Val. Loss: 0.376 |  Val. Acc: 85.62%


### Evaluating the `Conv2d` model

In [100]:
conv_2d_model.load_state_dict(torch.load('best-conv-2d-model.pt'))

test_loss, test_acc = evaluate(conv_2d_model, test_iterators, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.355 | Test Acc: 84.59%


### Training the `conv1D` model.

In [101]:
conv_1d_model.embedding.weight.data.copy_(pretrained_embeddings)
for i in range(2):
  conv_1d_model.embedding.weight.data[i] = torch.zeros(EMBEDDING_DIM)

print(conv_1d_model.embedding.weight.data)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 0.6612,  0.4606,  0.7589,  ...,  1.0253, -0.7948,  0.7347],
        [ 0.3369, -1.2521, -0.7555,  ..., -1.7327, -0.9087,  0.3905],
        [ 1.0300,  0.0859,  1.1354,  ..., -0.4458,  2.0626, -3.2186]],
       device='cuda:0')


In [102]:
optimizer = torch.optim.Adam(conv_1d_model.parameters())
criterion = nn.BCEWithLogitsLoss()

In [103]:
conv_1d_model = conv_1d_model.to(device)
criterion = criterion.to(device)

In [104]:
N_EPOCHS = 5
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(conv_1d_model, train_iterators, optimizer, criterion)
    valid_loss, valid_acc = evaluate(conv_1d_model, validation_iterators, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(conv_1d_model.state_dict(), 'best-conv-1d-model.pt')
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Epoch: 01 | Epoch Time: 0m 14s
	Train Loss: 0.256 | Train Acc: 89.44%
	 Val. Loss: 0.374 |  Val. Acc: 83.80%
Epoch: 02 | Epoch Time: 0m 14s
	Train Loss: 0.172 | Train Acc: 93.51%
	 Val. Loss: 0.371 |  Val. Acc: 84.54%
Epoch: 03 | Epoch Time: 0m 14s
	Train Loss: 0.134 | Train Acc: 95.09%
	 Val. Loss: 0.384 |  Val. Acc: 84.83%
Epoch: 04 | Epoch Time: 0m 14s
	Train Loss: 0.107 | Train Acc: 96.13%
	 Val. Loss: 0.413 |  Val. Acc: 84.23%
Epoch: 05 | Epoch Time: 0m 14s
	Train Loss: 0.085 | Train Acc: 97.03%
	 Val. Loss: 0.405 |  Val. Acc: 85.43%


### Evaluating the `conv1d` model

In [106]:
conv_1d_model.load_state_dict(torch.load('best-conv-1d-model.pt'))

test_loss, test_acc = evaluate(conv_1d_model, test_iterators, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')

Test Loss: 0.370 | Test Acc: 84.96%


### User Input.

**Note:** As mentioned in the implementation details, the input sentence has to be at least as long as the largest filter height used. We modify our ``predict_sentiment`` function to also accept a minimum length argument. If the tokenized input sentence is less than ``min_len`` tokens, we append padding tokens ``(<pad>)`` to make it ``min_len`` tokens.


In [107]:
import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()

def predict_sentiment(model, sentence, min_len = 5):
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    if len(tokenized) < min_len:
        tokenized += ['<pad>'] * (min_len - len(tokenized))
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()

### Negative Review.

In [108]:
predict_sentiment(conv_2d_model, "This film is terrible")

0.0939841940999031

In [109]:
predict_sentiment(conv_1d_model, "This film is terrible")

0.25023776292800903

### Positive Review.

In [110]:
predict_sentiment(conv_2d_model, "This film is great")

0.9003241062164307

In [111]:
predict_sentiment(conv_1d_model, "This film is great")

0.9770062565803528

In [112]:
print(LABEL.vocab.stoi)

defaultdict(None, {'neg': 0, 'pos': 1})


### Credits
* [bentrevett](https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/4%20-%20Convolutional%20Sentiment%20Analysis.ipynb)