# Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNNs) are another neural network architecture that researchers have applied to text mining and classification. An RNN processes a sequence one element at a time and carries a hidden state forward, so the output at each step depends on the preceding data points. This makes the architecture well suited to text, string, and other sequential data classification, because the network can use earlier context when analyzing the semantic structure of the dataset.
<img src="./images/RNN.png">
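
As a rough illustration of the recurrence described above, the following sketch runs a plain (Elman-style) RNN over a short sequence in NumPy. The weight names (`W_xh`, `W_hh`, `b_h`), the sizes, and the random initialization are assumptions made for illustration; they are not part of the model trained later in this notebook.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a plain RNN over a sequence and return every hidden state."""
    hidden = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        # the new state mixes the current input with the previous state
        hidden = np.tanh(W_xh @ x_t + W_hh @ hidden + b_h)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 5, 3, 4  # hypothetical sizes
inputs = rng.normal(size=(seq_len, input_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4)
```

Each row of `states` depends on all earlier inputs through `hidden`, which is how the network carries context along the sequence.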

## Gated Recurrent Unit (GRU)
Gated Recurrent Unit (GRU) is a gating mechanism for RNNs introduced by J. Chung et al. and K. Cho et al. GRU is a simplified variant of the LSTM architecture, with the following differences: a GRU contains two gates, it does not possess an internal memory separate from the hidden state (as shown in the figure), and a second non-linearity (the tanh in the figure) is not applied.
<img src="./images/GRU.png">
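
To make the two gates concrete, here is a minimal NumPy sketch of a single GRU step. It follows one common formulation; gate conventions differ slightly between papers and libraries, and the parameter names in the dictionary `p` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step; `p` holds hypothetical weight matrices and biases."""
    # update gate: how much of the previous state to keep
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
    # reset gate: how much of the previous state feeds the candidate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    # candidate state built from the input and the reset-scaled previous state
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # the hidden state is the only memory; there is no separate cell state
    return z * h_prev + (1.0 - z) * h_tilde

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4  # hypothetical sizes
p = {}
for gate in ("z", "r", "h"):
    p["W_" + gate] = rng.normal(scale=0.1, size=(hidden_size, input_size))
    p["U_" + gate] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    p["b_" + gate] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = gru_step(x_t, h, p)
print(h.shape)  # (4,)
```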

## Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) was introduced by S. Hochreiter and J. Schmidhuber and has since been developed further by many researchers.
LSTM is a special type of RNN that preserves long-term dependencies more effectively than basic RNNs. It is particularly useful for mitigating the vanishing gradient problem, because an LSTM uses multiple gates to regulate how much information is allowed into each node state. The figure shows the basic cell of an LSTM model.
<img src="./images/LSTM.png">
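
The gating described above can be sketched as a single LSTM step in NumPy. This is a simplified illustration under assumed names: the stacked weight matrices `W`, `U` and bias `b` (holding the input, forget, and output gates plus the candidate) and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step with gates stacked as [input, forget, output, candidate]."""
    gates = W @ x_t + U @ h_prev + b
    i, f, o, g = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    # the cell state keeps part of the old memory and admits part of the candidate
    c = f * c_prev + i * g
    # the output gate decides how much of the cell state is exposed
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4  # hypothetical sizes
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`), gradients can flow through it over many steps more easily than through the repeated squashing of a plain RNN.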


### Import Packages

In [1]:
# import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import metrics
import pandas as pd
import torch
import torch.nn.functional as F

import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from torch.utils.data import DataLoader
import pytorch_lightning as pl
import torch.utils.data as data_utils
from pytorch_lightning import Trainer

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences



### Build Tokenizer

In [2]:
# build a data tokenizer
def loadData_Tokenizer(X_train, X_test, MAX_NB_WORDS=179210, MAX_SEQUENCE_LENGTH=500):
    '''
    Tokenizes the train and test texts with a shared vocabulary and
    returns the padded integer sequences for both sets together with
    a GloVe embedding matrix.
    
    Parameters
    ----------
    X_train : list of str
        Documents used for training the model.
    X_test : list of str
        Documents used for testing the model.
    MAX_NB_WORDS : int
        Maximum number of words kept in the tokenizer vocabulary.
    MAX_SEQUENCE_LENGTH : int
        Length to which every sequence is padded or truncated.
    '''
    # set a random seed for reproducibility
    np.random.seed(7)
    
    # concatenate train and test to build a combined vocabulary
    text = np.concatenate((X_train, X_test), axis=0)
    
    # initiate tokenizer
    tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
    
    # fit tokenizer on texts
    tokenizer.fit_on_texts(text)
    
    # build sequences
    sequences = tokenizer.texts_to_sequences(text)
    
    # dictionary for total vocabulary
    word_index = tokenizer.word_index
    
    # pad sequences from left to make them of equal lengths
    text = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    
    # total unique words in vocab
    print('Found %s unique tokens.' % len(word_index))
    
    # split tokenized text back into train and test sets
    n_train = len(X_train)
    X_train = text[:n_train]
    X_test = text[n_train:]
    
    # load pre-trained GloVe vectors (50-dimensional)
    embeddings_index = {}
    with open("glove.6B.50d.txt", encoding="utf8") as f:
        for line in f:
            values = line.split()
            word = values[0]
            try:
                coefs = np.asarray(values[1:], dtype='float32')
            except ValueError:
                # skip malformed lines instead of reusing a stale vector
                continue
            embeddings_index[word] = coefs
    
    # print total words in embedding
    print('Total %s word vectors.' % len(embeddings_index))
    
    # create embedding matrix
    embedding_matrix = np.zeros((len(word_index) + 1, 50))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
    
    # return padded train and test sequences and the embedding matrix
    return (X_train, X_test, embedding_matrix)
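
The padding step in the function above relies on the Keras defaults for `pad_sequences`, which pad with zeros on the left and, for sequences that are too long, keep the last `maxlen` tokens. The helper below is a hypothetical pure-Python stand-in (`left_pad` is not a Keras function) that sketches the same behavior on toy input.

```python
def left_pad(sequences, maxlen):
    """Hypothetical stand-in for pad_sequences with its default 'pre' settings."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        # keep only the last `maxlen` tokens of long sequences
        seq = seq[max(len(seq) - maxlen, 0):]
        # fill short sequences with zeros on the left
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

print(left_pad([[5, 2], [7, 1, 4, 9, 3]], maxlen=4))
# [[0, 0, 5, 2], [1, 4, 9, 3]]
```

The Keras tokenizer numbers words starting at 1, so index 0 is free to serve as the padding value. That is why the embedding matrix has `len(word_index) + 1` rows, with row 0 left as zeros.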

### Define model class using PyTorch Lightning

PyTorch Lightning provides a standard wrapper for loading data and for defining and training deep learning models.

In this codeblock we define:
1. Model
2. Training/Validation/Test Steps
3. Optimizer settings
4. Train/Validation/Test Data Loader

In [3]:
criterion = nn.CrossEntropyLoss()
class CoolSystem(pl.LightningModule):

    def __init__(self, embedding_matrix, nclasses):
        '''
        Recurrent neural network (stacked GRU) text classifier.
        
        Parameters
        ----------
        embedding_matrix: pre-trained embedding weights, one row per
        vocabulary index
        nclasses: the dimension of the output layer (number of classes)
        '''
        super(CoolSystem, self).__init__()
        
        self.nclasses = nclasses
        
        # embedding layer initialized from the pre-trained matrix and frozen
        self.embedding = nn.Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1])
        et = torch.tensor(embedding_matrix, dtype=torch.float32)
        self.embedding.weight = nn.Parameter(et)
        self.embedding.weight.requires_grad = False
        
        # four stacked single-layer GRUs with dropout applied between them;
        # nn.GRU's own dropout argument has no effect when num_layers=1, so it is omitted
        self.gru1 = nn.GRU(50, 256, num_layers=1)
        self.dp1 = nn.Dropout(p=0.5)
        self.gru2 = nn.GRU(256, 256, num_layers=1)
        self.dp2 = nn.Dropout(p=0.5)
        self.gru3 = nn.GRU(256, 256, num_layers=1)
        self.dp3 = nn.Dropout(p=0.5)
        self.gru4 = nn.GRU(256, 256, num_layers=1)
        self.l1 = nn.Linear(256, nclasses)
        
    def forward(self, x):
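        '''
        Passes a batch of token indices through the network and returns
        the class logits.
        
        Parameters
        ----------
        x: LongTensor of token indices with shape (batch, seq_len)
        '''
        x = self.embedding(x)    # (batch, seq_len, emb_dim)
        x = x.permute(1, 0, 2)   # (seq_len, batch, emb_dim), the layout nn.GRU expects by default
        x, _ = self.gru1(x)
        x = self.dp1(x)
        x, _ = self.gru2(x)
        x = self.dp2(x)
        x, _ = self.gru3(x)
        x = self.dp3(x)
        x, _ = self.gru4(x)
        x, _ = torch.max(x, 0)   # max over the time axis: (batch, hidden)
        x = self.l1(x)           # (batch, nclasses)

        return x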

    def training_step(self, batch, batch_nb):
        '''
        Training step: takes a training batch and passes it forward through the network
        
        Parameters
        ----------
        batch: input
        batch_nb: batch number
        '''
        x, y = batch
        y_hat = self.forward(x)
        return {'loss': criterion(y_hat, y)}

    def validation_step(self, batch, batch_nb):
        '''
        Validation step: takes a validation batch and passes it forward through the network
        
        Parameters
        ----------
        batch: input
        batch_nb: batch number
        '''
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': criterion(y_hat, y)}

    def validation_end(self, outputs):
        '''
        Averages the validation loss over all validation batches.
        Early stopping can also be defined here.
        
        Parameters
        ----------
        outputs: list of outputs returned by validation_step
        '''
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'avg_val_loss': avg_loss}

    def configure_optimizers(self):
        '''
        Optimizer for the network

        '''
        return torch.optim.Adam(self.parameters())

    @pl.data_loader
    def tng_dataloader(self):
        '''
        Training data loader, takes input directly from global environment
        Preprocessing can also be defined here.
        
        '''
        
        return DataLoader(
            data_utils.TensorDataset(torch.LongTensor(X_train_Glove), torch.LongTensor(y_train)),
            batch_size=128)

    @pl.data_loader
    def val_dataloader(self):
        '''
        Validation data loader, takes input directly from global environment
        Preprocessing can also be defined here.
        
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.LongTensor(X_test_Glove), torch.LongTensor(y_test)),
            batch_size=128)

    @pl.data_loader
    def test_dataloader(self):
        '''
        Test data loader, takes input directly from global environment
        Preprocessing can also be defined here.
        
        '''
        return DataLoader(
            data_utils.TensorDataset(torch.LongTensor(X_test_Glove), torch.LongTensor(y_test)),
            batch_size=128)
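
The `forward` method of the model class reorders the tensor axes before the GRU layers and takes a maximum over the time axis before the linear layer. The sketch below traces those shapes with NumPy arrays as stand-ins for tensors; the small sizes and random values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, emb_dim, hidden, nclasses = 2, 6, 5, 8, 3  # hypothetical sizes

# stand-in for the embedding output: (batch, seq_len, emb_dim)
embedded = rng.normal(size=(batch, seq_len, emb_dim))

# same axis reordering as x.permute(1, 0, 2): (seq_len, batch, emb_dim)
time_major = embedded.transpose(1, 0, 2)

# stand-in for the last GRU's output: (seq_len, batch, hidden)
gru_out = rng.normal(size=(seq_len, batch, hidden))

# max over the time axis keeps, per feature, the strongest activation at any step
pooled = gru_out.max(axis=0)          # (batch, hidden)

# stand-in for the final linear layer
W = rng.normal(size=(nclasses, hidden))
b = np.zeros(nclasses)
logits = pooled @ W.T + b             # (batch, nclasses)

print(time_major.shape, pooled.shape, logits.shape)
# (6, 2, 5) (2, 8) (2, 3)
```

Pooling over time gives a fixed-size vector per document regardless of where in the padded sequence the informative tokens appear.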

### Load text dataset (20newsgroups)

In [4]:
# Load train data
newsgroups_train = fetch_20newsgroups(subset='train')

# Load test data
newsgroups_test = fetch_20newsgroups(subset='test')

# make x and y
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# tokenize text and obtain embedding matrix
X_train_Glove, X_test_Glove, embedding_matrix = loadData_Tokenizer(X_train, X_test)

Found 179209 unique tokens.
Total 400001 word vectors.


### Train Model using PyTorch Lightning

In [5]:
# model
model = CoolSystem(embedding_matrix, 20)

# most basic trainer, uses good defaults
trainer = Trainer(max_nb_epochs=20)  
trainer.fit(model)

gpu available: False, used: False
        Name       Type   Params
0  embedding  Embedding  8960500
1       gru1        GRU   236544
2        dp1    Dropout        0
3       gru2        GRU   394752
4        dp2    Dropout        0
5       gru3        GRU   394752
6        dp3    Dropout        0
7       gru4        GRU   394752
8         l1     Linear     5140


100%|██████████| 148/148 [20:56<00:00,  3.22s/it, avg_val_loss=1.19, batch_nb=88, epoch=19, loss=0.206]

1
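
The parameter counts in the summary table above can be checked by hand. The sketch below assumes the PyTorch layout, in which a single-layer GRU stores three blocks of input weights, recurrent weights, and two bias vectors; the helper function names are hypothetical.

```python
def embedding_params(num_embeddings, dim):
    # one vector of size `dim` per vocabulary index
    return num_embeddings * dim

def gru_params(input_size, hidden_size):
    # three blocks (reset, update, candidate), each with input weights,
    # recurrent weights, and two bias vectors
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

def linear_params(in_features, out_features):
    # weight matrix plus bias
    return in_features * out_features + out_features

print(embedding_params(179210, 50))  # 8960500
print(gru_params(50, 256))           # 236544
print(gru_params(256, 256))          # 394752
print(linear_params(256, 20))        # 5140
```

The embedding row count of 179210 is the 179209 unique tokens reported by the tokenizer plus the padding index 0.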

### Evaluate Results on Test Set

In [6]:
# get predictions (no gradients are needed at inference time)
model.eval()
with torch.no_grad():
    predicted = model(torch.LongTensor(X_test_Glove))

# get classification report
predicted = predicted.detach().numpy()
print(metrics.classification_report(y_test, np.argmax(predicted, axis=1)))

              precision    recall  f1-score   support

           0       0.61      0.61      0.61       319
           1       0.51      0.77      0.61       389
           2       0.67      0.54      0.60       394
           3       0.53      0.57      0.55       392
           4       0.54      0.63      0.58       385
           5       0.70      0.53      0.60       395
           6       0.79      0.70      0.74       390
           7       0.79      0.84      0.82       396
           8       0.91      0.64      0.76       398
           9       0.77      0.92      0.84       397
          10       0.97      0.78      0.87       399
          11       0.83      0.79      0.81       396
          12       0.56      0.59      0.57       393
          13       0.81      0.84      0.82       396
          14       0.90      0.82      0.86       394
          15       0.81      0.82      0.82       398
          16       0.69      0.75      0.72       364
          17       0.88    