# Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNNs) are another deep learning architecture employed for hierarchical document classification. Although originally built for image processing, with an architecture loosely modeled on the visual cortex, CNNs have also been used effectively for text classification. In a basic CNN for image processing, an image tensor is convolved with a set of kernels of size d by d. The outputs of these convolution layers are called feature maps, and the layers can be stacked to apply multiple filters to the input. To reduce computational complexity, CNNs use pooling, which shrinks the output passed from one layer to the next in the network. Different pooling techniques reduce the output size while preserving important features.
The most common pooling method is max pooling, where the maximum element in each pooling window is selected. To feed the pooled output from the stacked feature maps to the next layer, the maps are flattened into one column. The final layers in a CNN are typically fully connected dense layers. During the back-propagation step, a CNN adjusts not only the weights of the dense layers but also the feature detector filters. A potential problem when using a CNN for text is the number of 'channels', Σ (the size of the feature space). For text this can be very large (e.g. 50K), whereas for images it is much less of a problem (e.g. only the 3 RGB channels). As a result, the dimensionality of a CNN for text is very high.
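
As a concrete illustration of the convolution, max-pooling, and flattening steps described above, here is a minimal NumPy sketch on a one-dimensional toy sequence. The helper names `conv1d_valid` and `max_pool1d`, the input values, and the two filters are invented for illustration; they are not part of the model trained later in this notebook.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide `kernel` along `x` without padding and return the feature map."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, window):
    """Keep the maximum of each non-overlapping window; drop any remainder."""
    n = len(x) // window
    return x[:n * window].reshape(n, window).max(axis=1)

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 1.0])        # toy input sequence
kernels = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]  # two toy filters

# one feature map per filter, each of length len(x) - kernel_size + 1
feature_maps = [conv1d_valid(x, k) for k in kernels]

# max pooling halves the length of every feature map
pooled = [max_pool1d(fm, 2) for fm in feature_maps]

# flatten the stacked, pooled maps into a single vector for a dense layer
flat = np.concatenate(pooled)

print(feature_maps[0])
print(pooled[0])
print(flat)
```

With a sequence of length 7 and kernels of size 2, each feature map has 6 entries, pooling with a window of 2 leaves 3 per filter, and the flattened vector therefore has 6 entries.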

<img src="./images/CNN.png">



### Import Packages

In [3]:
# import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import metrics
import pandas as pd
import torch
import torch.nn.functional as F

import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from torch.utils.data import DataLoader
import pytorch_lightning as pl
import torch.utils.data as data_utils
from pytorch_lightning import Trainer

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

Using TensorFlow backend.


### Tokenize Text Using GloVe

In [4]:
# build a data tokenizer
def loadData_Tokenizer(X_train, X_test,MAX_NB_WORDS=179210,\
                       MAX_SEQUENCE_LENGTH=500):
    '''
    The function takes Train and Test datasets with text,
    converts them into padded token sequences, and returns the
    tokenized version of both sets and the embedding matrix.
    
    Parameters
    ----------
    X_train : list with each item having a set of words that will be used
    for training the model 
    X_test : list with each item having a set of words that will be used
    for testing the model
    MAX_NB_WORDS : Maximum number of words kept in the tokenizer 
    vocabulary
    MAX_SEQUENCE_LENGTH : Maximum number of tokens per document; longer
    documents are truncated and shorter ones are padded
    '''
    # set a random seed for reproducibility
    np.random.seed(7)
    
    # concatenate train and test to build a combined vocabulary
    text = np.concatenate((X_train, X_test), axis=0)
    text = np.array(text)
    
    # initiate tokenizer
    tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
    
    # fit tokenizer on texts
    tokenizer.fit_on_texts(text)
    
    # build sequences
    sequences = tokenizer.texts_to_sequences(text)
    
    # dictionary for total vocabulary
    word_index = tokenizer.word_index
    
    # pad sequences from left to make them of equal lengths
    text = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    
    # total unique words in vocab
    print('Found %s unique tokens.' % len(word_index))
    
    # split tokenized text back into train and test sets
    n_train = len(X_train)
    X_train = text[:n_train]
    X_test = text[n_train:]
    
    # create embedding using GloVe
    embeddings_index = {}
    with open("glove.6B.50d.txt", encoding="utf8") as f:
        for line in f:
            values = line.split()
            word = values[0]
            try:
                coefs = np.asarray(values[1:], dtype='float32')
            except ValueError:
                # skip malformed lines instead of reusing stale coefficients
                continue
            embeddings_index[word] = coefs
    
    # print total words in embedding
    print('Total %s word vectors.' % len(embeddings_index))
    
    # create embedding matrix
    embedding_matrix = np.zeros((len(word_index) + 1, 50))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
    
    # return tokenized train and test sets and the embedding matrix
    return (X_train, X_test, embedding_matrix)
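
To make the tokenize, pad, and look-up steps of the function above easier to follow, here is a small sketch that mimics them on two toy sentences without Keras or the GloVe file. The helpers `build_word_index` and `encode_and_pad` are hypothetical stand-ins, not the Keras API, and the 2-dimensional `toy_vectors` are made-up values standing in for the 50-dimensional GloVe vectors.

```python
import numpy as np

def build_word_index(texts):
    """Assign indices 1..V by descending frequency; 0 is reserved for padding."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # ties are broken alphabetically so the toy result is deterministic
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}

def encode_and_pad(texts, word_index, max_len):
    """Map words to indices, keep the last `max_len` tokens, left-pad with 0."""
    rows = np.zeros((len(texts), max_len), dtype=np.int64)
    for r, text in enumerate(texts):
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        ids = ids[-max_len:]
        if ids:
            rows[r, -len(ids):] = ids
    return rows

texts = ["the cat sat", "the dog"]
word_index = build_word_index(texts)
padded = encode_and_pad(texts, word_index, max_len=4)

# made-up 2-d vectors; "dog" and "sat" are deliberately missing
toy_vectors = {"the": np.array([0.1, 0.2]), "cat": np.array([0.3, 0.4])}

# row i holds the vector of the word with index i; unknown words stay all-zero
embedding_matrix = np.zeros((len(word_index) + 1, 2))
for word, i in word_index.items():
    vec = toy_vectors.get(word)
    if vec is not None:
        embedding_matrix[i] = vec

print(word_index)
print(padded)
print(embedding_matrix)
```

Row 0 of the matrix is never assigned, so the padding index maps to a zero vector, and words without a pretrained vector are also left as zeros, as in the function above.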

### Define model class using PyTorch Lightning

PyTorch Lightning provides a standard wrapper for loading data and for defining and training deep learning models.

In this codeblock we define:
1. Model
2. Training/Validation/Test Steps
3. Optimizer settings
4. Train/Validation/Test Data Loader

In [5]:
criterion = nn.CrossEntropyLoss()
class CoolSystem(pl.LightningModule):

    def __init__(self, embedding_matrix, nclasses):
        '''
        Convolutional neural network architecture.
        
        Parameters
        ----------
        embedding_matrix: pretrained word vectors, one row per vocabulary
        index, used to initialize the frozen embedding layer
        nclasses: the dimension of the output layer
        '''
        super(CoolSystem, self).__init__()
        
        self.nclasses = nclasses
        
        # Embedding layer initialized from the pretrained matrix and frozen
        self.embedding = nn.Embedding(embedding_matrix.shape[0], \
                                      embedding_matrix.shape[1]) 
        et = torch.tensor(embedding_matrix, dtype=torch.float32)
        self.embedding.weight = nn.Parameter(et)
        self.embedding.weight.requires_grad = False
        
        self.feature1 = nn.Sequential(
            nn.Conv1d(50, 128, kernel_size = 2),
            nn.ReLU(),
            nn.MaxPool1d(5)
        )
        self.feature2 = nn.Sequential(
            nn.Conv1d(50, 128, kernel_size = 3),
            nn.ReLU(),
            nn.MaxPool1d(5)
        )
        self.feature3 = nn.Sequential(
            nn.Conv1d(50, 128, kernel_size = 4),
            nn.ReLU(),
            nn.MaxPool1d(5)
        )
        self.feature4 = nn.Sequential(
            nn.Conv1d(50, 128, kernel_size = 5),
            nn.ReLU(),
            nn.MaxPool1d(5)
        )
        self.feature5 = nn.Sequential(
            nn.Conv1d(50, 128, kernel_size = 6),
            nn.ReLU(),
            nn.MaxPool1d(5)
        )
        self.feature6 = nn.Sequential(
            nn.Conv1d(128, 128, 5),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.MaxPool1d(5),
            nn.Conv1d(128,128,5),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.MaxPool1d(30),
            nn.Flatten(),
            nn.Linear(384, 1024),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, nclasses))
        
    def forward(self, x):
        '''
        Passes the input through the network defined above: embedding,
        five parallel convolution branches, then the shared layers.
        
        Parameters
        ----------
        x: batch of padded token-index sequences
        '''
        x = self.embedding(x)
        x = x.permute(0,2,1)
        f1 = self.feature1(x)
        f2 = self.feature2(x)
        f3 = self.feature3(x)
        f4 = self.feature4(x)
        f5 = self.feature5(x)
        x = torch.cat([f1,f2,f3,f4,f5], dim=2)
        x = self.feature6(x)
                
        return x

    def training_step(self, batch, batch_nb):
        '''
        Training step, takes the training batch and pass it forward 
        through network
        
        Parameters
        ----------
        batch: input
        batch_nb: batch number
        '''
        x, y = batch
        y_hat = self.forward(x)
        return {'loss': criterion(y_hat, y)}

    def validation_step(self, batch, batch_nb):
        '''
        Validation step, takes the validation batch and passes it forward
        through the trained network
        
        Parameters
        ----------
        batch: input
        batch_nb: batch number
        '''
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': criterion(y_hat, y)}

    def validation_end(self, outputs):
        '''
        Takes and stacks validation loss.
        Early stop can also be defined here
        
        Parameters
        ----------
        Outputs: Output of validation step
        '''
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'avg_val_loss': avg_loss}

    def configure_optimizers(self):
        '''
        Optimizer for the network

        '''
        return torch.optim.Adam(self.parameters())

    @pl.data_loader
    def tng_dataloader(self):
        '''
        Training data loader, takes input directly from the global
        environment. Preprocessing can also be defined here.
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.LongTensor(X_train_Glove),
                                     torch.LongTensor(y_train)),
            batch_size=128)

    @pl.data_loader
    def val_dataloader(self):
        '''
        Validation data loader, takes input directly from the global
        environment (here the test split is reused for validation).
        Preprocessing can also be defined here.
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.LongTensor(X_test_Glove),
                                     torch.LongTensor(y_test)),
            batch_size=128)

    @pl.data_loader
    def test_dataloader(self):
        '''
        Test data loader, takes input directly from the global
        environment. Preprocessing can also be defined here.
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.LongTensor(X_test_Glove),
                                     torch.LongTensor(y_test)),
            batch_size=128)
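
The input size of 384 in the first `nn.Linear` layer follows from the sequence length of 500 and the kernel and pooling sizes used in the class above. The sketch below traces those lengths by hand, assuming the PyTorch defaults of no padding, stride 1 for the convolutions, and a pooling stride equal to the pooling window. The helper names `conv_out` and `pool_out` are illustrative only.

```python
def conv_out(length, kernel_size):
    """Output length of an unpadded, stride-1 1-D convolution."""
    return length - kernel_size + 1

def pool_out(length, window):
    """Output length of non-overlapping max pooling (remainder is dropped)."""
    return length // window

seq_len = 500  # MAX_SEQUENCE_LENGTH used by the tokenizer

# five parallel branches: Conv1d with kernel sizes 2..6, each followed by MaxPool1d(5)
branch_lengths = [pool_out(conv_out(seq_len, k), 5) for k in (2, 3, 4, 5, 6)]

# the branches are concatenated along the sequence axis (dim=2)
total = sum(branch_lengths)

# feature6: Conv1d(k=5) -> MaxPool1d(5) -> Conv1d(k=5) -> MaxPool1d(30) -> Flatten
length = pool_out(conv_out(total, 5), 5)
length = pool_out(conv_out(length, 5), 30)
flat_features = 128 * length  # 128 output channels per position

print(branch_lengths, total, length, flat_features)
```

Changing the sequence length, a kernel size, or a pooling window changes `flat_features`, so the first dense layer has to be resized to match.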

### Load text dataset (20newsgroups)

In [72]:
# Load train data
newsgroups_train = fetch_20newsgroups(subset='train')

# Load test data
newsgroups_test = fetch_20newsgroups(subset='test')

# make x and y
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# tokenize text and obtain embedding matrix
X_train_Glove,X_test_Glove, embedding_matrix = loadData_Tokenizer(X_train,\
                                                                  X_test)

Found 179209 unique tokens.
Total 400001 word vectors.


### Train Model using Pytorch Lightning

In [73]:
# model
model = CoolSystem(embedding_matrix, 20)

# most basic trainer, uses good defaults
trainer = Trainer(max_nb_epochs=15)  
trainer.fit(model)

gpu available: False, used: False
           Name        Type   Params
0     embedding   Embedding  8960500
1      feature1  Sequential    12928
2    feature1.0      Conv1d    12928
3    feature1.1        ReLU        0
4    feature1.2   MaxPool1d        0
5      feature2  Sequential    19328
6    feature2.0      Conv1d    19328
7    feature2.1        ReLU        0
8    feature2.2   MaxPool1d        0
9      feature3  Sequential    25728
10   feature3.0      Conv1d    25728
11   feature3.1        ReLU        0
12   feature3.2   MaxPool1d        0
13     feature4  Sequential    32128
14   feature4.0      Conv1d    32128
15   feature4.1        ReLU        0
16   feature4.2   MaxPool1d        0
17     feature5  Sequential    38528
18   feature5.0      Conv1d    38528
19   feature5.1        ReLU        0
20   feature5.2   MaxPool1d        0
21     feature6  Sequential  1093396
22   feature6.0      Conv1d    82048
23   feature6.1        ReLU        0
24   feature6.2     Dropout        0
25  

100%|██████████| 148/148 [03:25<00:00,  1.50it/s, avg_val_loss=1.1, batch_nb=88, epoch=14, loss=0.396] 

1
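
The parameter counts in the model summary above can be reproduced by hand. The sketch below assumes bias terms are enabled for every convolution and dense layer (the PyTorch default) and uses an embedding table of 179210 rows, i.e. the 179209 tokens found by the tokenizer plus the padding index. The helper names `conv1d_params` and `linear_params` are illustrative.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    return in_channels * out_channels * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features

# frozen embedding table: vocabulary rows x 50-dimensional vectors
embedding = 179210 * 50

# feature1..feature5: Conv1d(50, 128, k) for k = 2..6
branches = {k: conv1d_params(50, 128, k) for k in (2, 3, 4, 5, 6)}

# feature6: two Conv1d(128, 128, 5) layers followed by three dense layers
feature6 = (2 * conv1d_params(128, 128, 5)
            + linear_params(384, 1024)
            + linear_params(1024, 512)
            + linear_params(512, 20))

print(embedding, branches, feature6)
```

ReLU, dropout, pooling, and flatten layers have no trainable parameters, which is why they appear with a count of 0 in the summary.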

### Evaluate Results on Test Set

In [74]:
# get prediction
model.eval()
with torch.no_grad():
    predicted = model(torch.LongTensor(X_test_Glove))

# get classification report
predicted = predicted.detach().numpy()
print(metrics.classification_report(y_test, np.argmax(predicted, axis=1)))

              precision    recall  f1-score   support

           0       0.62      0.53      0.57       319
           1       0.63      0.56      0.59       389
           2       0.78      0.33      0.46       394
           3       0.37      0.19      0.25       392
           4       0.36      0.59      0.45       385
           5       0.85      0.50      0.63       395
           6       0.87      0.69      0.77       390
           7       0.76      0.77      0.77       396
           8       0.82      0.82      0.82       398
           9       0.94      0.84      0.89       397
          10       0.96      0.93      0.94       399
          11       0.89      0.63      0.74       396
          12       0.34      0.79      0.48       393
          13       0.78      0.84      0.81       396
          14       0.68      0.87      0.76       394
          15       0.84      0.67      0.74       398
          16       0.55      0.72      0.62       364
          17       0.97    