# NLP - Multi-Class Text Classification using CNN+RNN

By [Akshaj Verma](https://akshajverma.com)  

This notebook takes you through the implementation of binary text classification in the form of sentiment analysis on yelp reviews using CNN+RNN in PyTorch.

In [1]:
import re
import numpy as np
import pandas as pd
from pprint import pprint
from collections import Counter

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

%matplotlib inline

torch.manual_seed(1)

<torch._C.Generator at 0x7f9d806f43d0>

## Prepare Data

In [2]:
df = pd.read_csv("../../../data/nlp/text_classification/bbc-text.csv")
df = df.rename(columns = {'category':'tag'})
df.head()

Unnamed: 0,tag,text
0,tech,tv future in the hands of viewers with home th...
1,business,worldcom boss left books alone former worldc...
2,sport,tigers wary of farrell gamble leicester say ...
3,sport,yeading face newcastle in fa cup premiership s...
4,entertainment,ocean s twelve raids box office ocean s twelve...


### Convert from dataframe to list

In [3]:
sentence_list = [t for t in df['text'].to_list()]
tag_list = [t for t in df['tag'].to_list()]

#### The input sentences.

In [4]:
sentence_list[:2]

['tv future in the hands of viewers with home theatre systems  plasma high-definition tvs  and digital video recorders moving into the living room  the way people watch tv will be radically different in five years  time.  that is according to an expert panel which gathered at the annual consumer electronics show in las vegas to discuss how these new technologies will impact one of our favourite pastimes. with the us leading the trend  programmes and other content will be delivered to viewers via home networks  through cable  satellite  telecoms companies  and broadband service providers to front rooms and portable devices.  one of the most talked-about technologies of ces has been digital and personal video recorders (dvr and pvr). these set-top boxes  like the us s tivo and the uk s sky+ system  allow people to record  store  play  pause and forward wind tv programmes when they want.  essentially  the technology allows for much more personalised tv. they are also being built-in to hig

#### The output tags.

In [5]:
tag_list[:2]

['tech', 'business']

### Clean the input data.

In [6]:
# Convert to lowercase
sentence_list = [s.lower() for s in sentence_list]

# Remove non alphavets
regex_remove_nonalphabets = re.compile('[^a-zA-Z]')
sentence_list = [regex_remove_nonalphabets.sub(' ', s) for s in sentence_list]

# Remove words with less than 2 letters
regex_remove_shortwords = re.compile(r'\b\w{1,2}\b')
sentence_list = [regex_remove_shortwords.sub("", s) for s in sentence_list]

# Remove words that appear only once
c = Counter(w for s in sentence_list for w in s.split())
sentence_list = [' '.join(y for y in x.split() if c[y] > 1) for x in sentence_list]

# Strip extra whitespaces
sentence_list = [" ".join(s.split()) for s in sentence_list]

In [7]:
sentence_list[:2]

['future the hands viewers with home theatre systems plasma high definition tvs and digital video recorders moving into the living room the way people watch will radically different five years time that according expert panel which gathered the annual consumer electronics show las vegas discuss how these new technologies will impact one our favourite with the leading the trend programmes and other content will delivered viewers via home networks through cable satellite telecoms companies and broadband service providers front rooms and portable devices one the most talked about technologies ces has been digital and personal video recorders dvr and pvr these set top boxes like the tivo and the sky system allow people record store play pause and forward wind programmes when they want essentially the technology allows for much more personalised they are also being built high definition sets which are big business japan and the but slower take off europe because the lack high definition pro

### Create a vocab and dictionary for input.

#### Vocab for input.

In [8]:
words = []
for sentence in sentence_list:
    for w in sentence.split():
        words.append(w)
    
words = list(set(words))
print(f"Size of word-vocablury: {len(words)}\n")

Size of word-vocablury: 18430



#### Input <=> ID.

In [9]:
word2idx = {word: i for i, word in enumerate(words)}

### Create a vocab and dictionary for output.

#### Vocab for output.

In [10]:
tags = []
for tag in tag_list:
    tags.append(tag)
tags = list(set(tags))
print(f"Size of tag-vocab: {len(tags)}\n")
print(tags)

Size of tag-vocab: 5

['tech', 'business', 'sport', 'entertainment', 'politics']


#### Output <=> ID.

In [11]:
tag2idx = {word: i for i, word in enumerate(tags)}
print(tag2idx)

{'tech': 0, 'business': 1, 'sport': 2, 'entertainment': 3, 'politics': 4}


### Encode the input and output to numbers.

#### Input

In [12]:
X = [[word2idx[w] for w in s.split()] for s in sentence_list]

#### Output

In [13]:
y = [tag2idx[t] for t in tag_list]
y[:3]

[0, 1, 2]

### Train-Test Split

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [15]:
print("X_train size: ", len(X_train))
print("X_test size: ", len(X_test))

X_train size:  1557
X_test size:  668


## Sample Neural Network

### Sample Parameters.

In [16]:
BATCH_SIZE_SAMPLE = 2
EMBEDDING_SIZE_SAMPLE = 5
VOCAB_SIZE = len(word2idx)
TARGET_SIZE = len(tag2idx)
HIDDEN_SIZE_SAMPLE = 3
STACKED_LAYERS_SAMPLE = 4

### Sample Dataloader.

In [17]:
class SampleData(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

In [18]:
sample_data = SampleData(X_train, y_train)
sample_loader = DataLoader(sample_data, batch_size=BATCH_SIZE_SAMPLE, collate_fn=lambda x:x)

In [19]:
tl = iter(sample_loader)

i,j = map(list, zip(*next(tl)))

print(i,"\n\n", j, "\n")

[[12508, 16942, 10655, 13370, 8064, 16942, 10655, 499, 4314, 7250, 7752, 5892, 7093, 12508, 9554, 2674, 10075, 499, 2965, 10734, 4237, 12615, 16565, 12508, 12, 17086, 6941, 16673, 16423, 17822, 99, 17261, 16565, 3915, 17261, 9707, 1484, 731, 4043, 11953, 4237, 3873, 11242, 16841, 10528, 4016, 1484, 1181, 460, 11642, 4237, 16565, 12508, 17914, 2372, 10655, 5192, 17261, 16565, 4237, 16846, 9610, 3873, 3734, 9515, 6381, 6941, 17261, 3873, 3915, 17914, 7233, 17261, 7726, 8748, 15295, 16366, 4059, 4237, 17086, 2372, 12813, 9003, 3120, 4237, 17261, 1177, 9707, 17897, 3915, 11528, 8795, 4237, 17261, 1177, 1749, 11242, 1484, 5460, 14984, 4237, 16565, 10655, 499, 2965, 4094, 4680, 17261, 16565, 12508, 9554, 15095, 8195, 3915, 6761, 10644, 3915, 499, 4094, 18297, 1484, 5786, 13043, 16862, 1749, 1576, 12183, 7233, 4237, 17086, 17261, 1749, 1484, 4237, 14016, 5333, 17465, 9876, 15451, 2674, 13008, 16273, 7424, 10185, 499, 7233, 1484, 3196, 16624, 4237, 1698, 4023, 16565, 7977, 15253, 4585, 8980, 6

### Sample CNN+RNN class.

In [20]:
class ModelCnnRnnSample(nn.Module):
    
    def __init__(self, embedding_size, vocab_size, hidden_size, target_size):
        super(ModelCnnRnnSample, self).__init__()
        
        self.word_embeddings = nn.Embedding(num_embeddings = vocab_size, embedding_dim = embedding_size)
        self.conv1 = nn.Conv1d(in_channels=embedding_size, out_channels=100, kernel_size=3, stride=1)
        self.conv2 = nn.Conv1d(in_channels=100, out_channels=10, kernel_size=3, stride=1)
        self.gru = nn.GRU(input_size=10, hidden_size = hidden_size, batch_first=True)
        self.maxpool = nn.MaxPool1d(kernel_size=3)
        self.linear = nn.Linear(in_features = hidden_size, out_features=target_size)
        
        
    def forward(self, x_batch):        
        padded_batch = pad_sequence(x_batch, batch_first=True)
        print("\nPadded X_batch: ", padded_batch.size(), "\n")

        
        embeds = self.word_embeddings(padded_batch)
        print("\nEmbeddings: ", embeds.size(), "\n", embeds, "\n")
    
        embeds_t = embeds.transpose(1, 2)
        print("\nEmbeddings transposed for CNN: ", embeds_t.size(), "\n", embeds_t, "\n")

        cnn1 = torch.relu(self.conv1(embeds_t))
        cnn2 = torch.relu(self.conv2(cnn1))
        print("\nCNN output: ", cnn2.size(), "\n", cnn2)
        
        maxpool1 = self.maxpool(cnn2)
        print("\nMaxpool output: ", maxpool1.size(), "\n", maxpool1)
        
        gru_input = maxpool1.transpose(1, 2)
        print("\nRNN Input: ", gru_input.size(), "\n", gru_input)
        
        _, gru_hidden = self.gru(gru_input)
        print("\nRNN Last Hidden: ", gru_hidden.size(), "\n", gru_hidden)       
        
#         linear_in, _ = torch.max(maxpool1, dim = 2)
#         print("\nLinear input: ", linear_in.size(), "\n", linear_in)


        linear_out = self.linear(gru_hidden.squeeze())
        print("\nLinear Output:\n", linear_out)
        
        y_out = torch.log_softmax(linear_out, dim = 1)
        print("\nLog Softmax:\n", y_out)

        
        return y_out

In [21]:
cnn_rnn_model_sample = ModelCnnRnnSample(embedding_size=EMBEDDING_SIZE_SAMPLE, vocab_size=len(word2idx), hidden_size = HIDDEN_SIZE_SAMPLE, target_size=len(tag2idx))
print(cnn_rnn_model_sample)

ModelCnnRnnSample(
  (word_embeddings): Embedding(18430, 5)
  (conv1): Conv1d(5, 100, kernel_size=(3,), stride=(1,))
  (conv2): Conv1d(100, 10, kernel_size=(3,), stride=(1,))
  (gru): GRU(10, 3, batch_first=True)
  (maxpool): MaxPool1d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (linear): Linear(in_features=3, out_features=5, bias=True)
)


### Sample Output.

output = [batch size, sent len, hid dim]  
hidden = [batch size, 1, hid dim]

In [22]:
with torch.no_grad():
    for batch in sample_loader:
        x_batch, y_batch = map(list, zip(*batch))
        x_batch = [torch.tensor(i) for i in x_batch]
        y_batch = [torch.tensor(i) for i in y_batch]
        
        
#         print("X batch: ", x_batch.size())
#         print("\ny batch: ", y_batch.size())
        
        y_out = cnn_rnn_model_sample(x_batch)
                        
        _, y_out_tag = torch.max(y_out, dim = 1)
        print("\nY Output Tag: \n", y_out_tag)
        
        print("\nActual Output: ")
        print(y_batch)

        break


Padded X_batch:  torch.Size([2, 528]) 


Embeddings:  torch.Size([2, 528, 5]) 
 tensor([[[-1.2969,  1.2651,  0.3382,  1.5103, -0.4060],
         [-0.5082, -0.8698,  2.2355, -0.1555, -0.2195],
         [-2.1836,  0.2777,  0.6615,  0.4673, -0.5454],
         ...,
         [-0.6540, -1.6095, -0.1002, -0.6092, -0.9798],
         [-0.6540, -1.6095, -0.1002, -0.6092, -0.9798],
         [-0.6540, -1.6095, -0.1002, -0.6092, -0.9798]],

        [[-0.4253,  0.2918, -0.1760,  0.1699, -0.4444],
         [ 0.7468, -0.5874, -0.0963, -0.7011, -1.2802],
         [-0.9968, -0.4778, -0.7409, -0.3505,  0.0995],
         ...,
         [-0.2624, -1.3074, -0.8999,  1.9583, -0.9732],
         [-1.2004,  1.2386, -0.2015,  0.9880,  1.7878],
         [ 1.8535,  0.1446, -0.2912, -0.2779,  1.5642]]]) 


Embeddings transposed for CNN:  torch.Size([2, 5, 528]) 
 tensor([[[-1.2969, -0.5082, -2.1836,  ..., -0.6540, -0.6540, -0.6540],
         [ 1.2651, -0.8698,  0.2777,  ..., -1.6095, -1.6095, -1.6095],
         [ 0

## Acutal Neural Network.

### Model parameters.

In [23]:
EPOCHS = 50
BATCH_SIZE = 32
EMBEDDING_SIZE = 512
VOCAB_SIZE = len(word2idx)
TARGET_SIZE = len(tag2idx)
HIDDEN_SIZE = 64
LEARNING_RATE = 0.001
STACKED_LAYERS = 2

### Data Loader.

#### Train Loader.

In [24]:
class TrainData(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

In [25]:
train_data = TrainData(X_train, y_train)
train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, collate_fn=lambda x:x)

#### Test Loader

In [26]:
class TestData(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__ (self):
        return len(self.X_data)

In [27]:
test_data = TestData(X_test, y_test)
test_loader = DataLoader(test_data, batch_size=1, collate_fn=lambda x:x)

### GRU Model Class.

In [28]:
class ModelCnnRnn(nn.Module):
    
    def __init__(self, embedding_size, vocab_size, hidden_size, target_size):
        super(ModelCnnRnn, self).__init__()
        
        self.word_embeddings = nn.Embedding(num_embeddings = vocab_size, embedding_dim = embedding_size)
        self.conv1 = nn.Conv1d(in_channels=embedding_size, out_channels=512, kernel_size=3, stride=1, padding = 1)
        self.conv2 = nn.Conv1d(in_channels=512, out_channels=256, kernel_size=3, stride=1, padding = 1)
        self.conv3 = nn.Conv1d(in_channels=256, out_channels=128, kernel_size=3, stride=1, padding=1)
#         self.gru = nn.GRU(input_size=128, hidden_size = hidden_size, batch_first=True)
        self.lstm = nn.LSTM(input_size=128, hidden_size = hidden_size, batch_first=True)
        self.batchnorm1 = nn.BatchNorm1d(num_features = 512)
        self.batchnorm2 = nn.BatchNorm1d(num_features = 256)
        self.batchnorm3 = nn.BatchNorm1d(num_features = 128)
        self.linear = nn.Linear(in_features = hidden_size, out_features=target_size)
        self.relu = nn.ReLU()
        
    def forward(self, x_batch):        
        padded_batch = pad_sequence(x_batch, batch_first=True)        
        embeds = self.word_embeddings(padded_batch)
        embeds_t = embeds.transpose(1, 2)
        
        cnn1 = self.relu(self.conv1(embeds_t))
        cnn1 = self.batchnorm1(cnn1)
        
        cnn2 = self.relu(self.conv2(cnn1))
        cnn2 = self.batchnorm2(cnn2)
        
        cnn3 = self.relu(self.conv3(cnn2))
        cnn3 = self.batchnorm3(cnn3)
        
        rnn_input = cnn3.transpose(1, 2)
        
        _, (lstm_h, _) = self.lstm(rnn_input)
#         _, gru_h = self.gru(rnn_input)
        
        linear_out = self.linear(lstm_h.squeeze())
        
        return linear_out

In [29]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


In [30]:
cnn_rnn_model = ModelCnnRnn(embedding_size=EMBEDDING_SIZE, vocab_size=len(word2idx), hidden_size=HIDDEN_SIZE, target_size=len(tag2idx))

cnn_rnn_model.to(device)
print(cnn_rnn_model)

criterion = nn.CrossEntropyLoss()

optimizer =  optim.RMSprop(cnn_rnn_model.parameters())

ModelCnnRnn(
  (word_embeddings): Embedding(18430, 512)
  (conv1): Conv1d(512, 512, kernel_size=(3,), stride=(1,), padding=(1,))
  (conv2): Conv1d(512, 256, kernel_size=(3,), stride=(1,), padding=(1,))
  (conv3): Conv1d(256, 128, kernel_size=(3,), stride=(1,), padding=(1,))
  (lstm): LSTM(128, 64, batch_first=True)
  (batchnorm1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm3): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (linear): Linear(in_features=64, out_features=5, bias=True)
  (relu): ReLU()
)


## Train model.

In [31]:
def multi_acc(y_pred, y_test):
    y_pred_softmax = torch.log_softmax(y_pred, dim = 1)
    _, y_pred_tags = torch.max(y_pred_softmax, dim = 1)    
    
    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)
    
    acc = torch.round(acc) * 100
    
    return acc

In [32]:
cnn_rnn_model.train()
for e in range(1, EPOCHS+1):
    epoch_loss = 0
    epoch_acc = 0
    for batch in train_loader:
        x_batch, y_batch = map(list, zip(*batch))
        x_batch = [torch.tensor(i).to(device) for i in x_batch]
        y_batch = torch.tensor(y_batch).long().to(device)
                
        optimizer.zero_grad()
        
        y_pred = cnn_rnn_model(x_batch)
        
        loss = criterion(y_pred.squeeze(0), y_batch)
        acc = multi_acc(y_pred.squeeze(0), y_batch)
        
        
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    print(f'Epoch {e+0:03}: | Loss: {epoch_loss/len(train_loader):.5f} | Acc: {acc}')

Epoch 001: | Loss: 1.72100 | Acc: 0.0
Epoch 002: | Loss: 1.61047 | Acc: 0.0
Epoch 003: | Loss: 1.60679 | Acc: 0.0
Epoch 004: | Loss: 1.59421 | Acc: 0.0
Epoch 005: | Loss: 1.57659 | Acc: 0.0
Epoch 006: | Loss: 1.55578 | Acc: 0.0
Epoch 007: | Loss: 1.50802 | Acc: 0.0
Epoch 008: | Loss: 1.46886 | Acc: 0.0
Epoch 009: | Loss: 1.48100 | Acc: 0.0
Epoch 010: | Loss: 1.40322 | Acc: 0.0
Epoch 011: | Loss: 1.38041 | Acc: 0.0
Epoch 012: | Loss: 1.32096 | Acc: 0.0
Epoch 013: | Loss: 1.26236 | Acc: 0.0
Epoch 014: | Loss: 1.51816 | Acc: 0.0
Epoch 015: | Loss: 1.22656 | Acc: 0.0
Epoch 016: | Loss: 1.13486 | Acc: 0.0
Epoch 017: | Loss: 1.09870 | Acc: 100.0
Epoch 018: | Loss: 1.06625 | Acc: 0.0
Epoch 019: | Loss: 1.03135 | Acc: 100.0
Epoch 020: | Loss: 1.00036 | Acc: 0.0
Epoch 021: | Loss: 0.98340 | Acc: 0.0
Epoch 022: | Loss: 1.00223 | Acc: 0.0
Epoch 023: | Loss: 1.03807 | Acc: 0.0
Epoch 024: | Loss: 0.99341 | Acc: 0.0
Epoch 025: | Loss: 0.94285 | Acc: 0.0
Epoch 026: | Loss: 0.94952 | Acc: 0.0
Epoch 02

## Test Model.

In [33]:
y_out_tags_list = []
with torch.no_grad():
    for batch in test_loader:
        x_batch, y_batch = map(list, zip(*batch))
        x_batch = [torch.tensor(i).to(device) for i in x_batch]
        y_batch = torch.tensor(y_batch).long().to(device)
        
        y_pred = cnn_rnn_model(x_batch)
        y_pred = torch.log_softmax(y_pred, dim = 0)
        _, y_pred_tag = torch.max(y_pred, dim = 0)
        y_out_tags_list.append(y_pred_tag.squeeze(0).cpu().numpy())

## Confusion Matrix.

In [34]:
print(confusion_matrix(y_test, y_out_tags_list))

[[67 38  2 18 14]
 [43 33  3 23 27]
 [35 44 13 41 26]
 [29 30  3 19 28]
 [28 39  6 25 34]]


## Classification Report.

In [35]:
y_out_tags_list = [a.squeeze().tolist() for a in y_out_tags_list]

In [36]:
print(classification_report(y_test, y_out_tags_list))

              precision    recall  f1-score   support

           0       0.33      0.48      0.39       139
           1       0.18      0.26      0.21       129
           2       0.48      0.08      0.14       159
           3       0.15      0.17      0.16       109
           4       0.26      0.26      0.26       132

    accuracy                           0.25       668
   macro avg       0.28      0.25      0.23       668
weighted avg       0.29      0.25      0.23       668



## View model output.

In [37]:
idx2word = {v: k for k, v in word2idx.items()}
idx2tag = {v: k for k, v in tag2idx.items()}

In [38]:
print('{:80}: {:15}\n'.format("Word", "Class"))
for sentence, tag in zip(X_test[:10], y_out_tags_list[:10]):
    s = " ".join([idx2word[w] for w in sentence])
    print('{:80}: {:5}\n'.format(s, tag))


Word                                                                            : Class          

windows worm travels with tetris users are being warned about windows virus that poses the hugely popular tetris game the cellery worm installs playable version the classic falling blocks game pcs that has infected while users play the game the worm spends its time using the machine search for new victims infect nearby networks the risk infection cellery thought very low few copies the worm have been found the wild the cellery worm does not spread via mail like many other viruses instead computer networks for pcs that have not shut off all the insecure ways they connect other machines when infects machine cellery installs version tetris that users can play the game starts the worm also starts music file accompany the same time the virus starts networks for other vulnerable machines the virus does damage machines but heavily infected networks could slow down scanning traffic builds product