# Recurrent Neural Network

## Basic Overview

What is an RNN?

A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. In a traditional neural network, all inputs and outputs are independent of each other, but in tasks such as predicting the next word of a sentence, the previous words are needed, so the network has to remember them. RNNs were introduced to solve this problem: they use a hidden layer whose state is carried over from one step to the next.
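To make the idea of feeding the previous step into the current step concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. The sizes, the random weights, and the names `W_x`, `W_h`, `b`, and `rnn_forward` are illustrative assumptions, not part of this notebook's code.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5   # toy sizes chosen for illustration

# Hypothetical parameters; a real layer would learn these during training.
W_x = rng.normal(scale=0.1, size=(input_dim, hidden_dim))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
b = np.zeros(hidden_dim)

def rnn_forward(inputs):
    """Run a vanilla RNN over inputs of shape (seq_len, input_dim)."""
    h = np.zeros(hidden_dim)                     # initial hidden state
    states = []
    for x_t in inputs:                           # one time step per row
        h = np.tanh(x_t @ W_x + h @ W_h + b)     # new state depends on the previous state
        states.append(h)
    return np.stack(states)                      # shape (seq_len, hidden_dim)

sequence = rng.normal(size=(seq_len, input_dim))
hidden_states = rnn_forward(sequence)
print(hidden_states.shape)
```

Because `h` is reused inside the loop, the state at each step summarizes everything the network has seen so far, which is the "memory" described above.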

Why RNNs?

https://www.quora.com/Why-do-we-use-an-RNN-instead-of-a-simple-neural-network

## In-Depth Understanding

* https://medium.com/mindorks/understanding-the-recurrent-neural-network-44d593f112a2
* https://www.youtube.com/watch?v=2E65LDnM2cA&list=PL1F3ABbhcqa3BBWo170U4Ev2wfsF7FN8l
* https://www.d2l.ai/chapter_recurrent-neural-networks/rnn.html

In [1]:
from src.imports import *

In [2]:
xtrain, xvalid, ytrain, yvalid = prepare_data()

## Tokenization

So, if you have watched the videos and read the links above, you know that an RNN takes a sentence as input word by word. We represent every word as a one-hot vector of dimension: number of words in the vocabulary + 1.

What the Keras Tokenizer does:

- It takes all the unique words in the corpus and forms a dictionary with words as keys and their number of occurrences as values, then sorts the dictionary in descending order of counts.
- It then assigns index 1 to the first word, index 2 to the second, and so on.

So, suppose the word 'the' occurred most often in the corpus: it is assigned index 1, and the vector representing 'the' is a one-hot vector with value 1 at position 1 and zeros everywhere else.
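The indexing scheme described above can be sketched in plain Python. This is a simplified illustration on a toy corpus, not the Keras implementation: it only lower-cases and splits on whitespace, and the names `toy_word_index` and `one_hot` are made up for the example.

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat"]   # toy corpus, not the real data

# Count how often every word occurs.
counts = Counter(word for text in corpus for word in text.lower().split())

# Sort by descending count; the most frequent word gets index 1.
# Index 0 is left unused so that it can serve as the padding value.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
toy_word_index = {word: i for i, (word, _) in enumerate(ordered, start=1)}

def one_hot(word):
    """Return a one-hot list of length vocabulary size + 1."""
    vec = [0] * (len(toy_word_index) + 1)
    vec[toy_word_index[word]] = 1
    return vec

print(toy_word_index["the"])   # 1
print(one_hot("the"))          # [0, 1, 0, 0, 0, 0, 0]
```

The extra position at index 0 is the reason for the "+ 1" in the vector dimension.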

In [3]:
# using keras tokenizer here
token = text.Tokenizer(num_words=None)
max_len = 1500

In [4]:
token.fit_on_texts(list(xtrain) + list(xvalid))

In [5]:
xtrain[:2]

array(["Congratulations\nfor the Third of May 1808, reaching FA and it's still April, thanks for your input.....",
       'Jimbo.....the crybaby..... \n\nSits in front of his computer everyday acting as the arbiter of the grand Wikipedia. The all knowing know it all who is the expert on everything. The savior of the western world. And of course, when questioned about anything, cries to the main office about ill treatment.....By the way......a lousy editor who uses only one source!  A joke!'],
      dtype=object)

In [6]:
xtrain_seq = token.texts_to_sequences(xtrain)
xvalid_seq = token.texts_to_sequences(xvalid)

In [7]:
xtrain_seq[:1]

[[2858,
  12,
  1,
  686,
  3,
  85,
  6877,
  6878,
  2859,
  4,
  72,
  153,
  1107,
  92,
  12,
  20,
  1376]]

In [8]:
# zero-pad the sequences so that every row has length max_len
xtrain_pad = pad_sequences(xtrain_seq, maxlen=max_len)
xvalid_pad = pad_sequences(xvalid_seq, maxlen=max_len)

In [9]:
# The code token.word_index simply gives the dictionary of vocab that keras created for us
word_index = token.word_index

Now you might be wondering: what is padding, and why is it done?

Here are some answers:
* https://www.quora.com/Which-effect-does-sequence-padding-have-on-the-training-of-a-neural-network
* https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
* https://www.coursera.org/lecture/natural-language-processing-tensorflow/padding-2Cyzs
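As a rough illustration of what zero-padding does to the integer sequences, the sketch below pads short sequences at the front and keeps only the last `maxlen` tokens of long ones, which mirrors the default `padding='pre'` and `truncating='pre'` behaviour of Keras `pad_sequences`. The helper `pad_pre` and the toy data are hypothetical.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value`; keep the last `maxlen` tokens of longer sequences."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # drop tokens from the front
        padded.append([value] * (maxlen - len(seq)) + seq)   # fill the front with zeros
    return padded

toy_sequences = [[5, 2], [7, 1, 4, 9, 3, 8]]
print(pad_pre(toy_sequences, maxlen=4))
# [[0, 0, 5, 2], [4, 9, 3, 8]]
```

After padding, every row has the same length, so the batch can be stored as a single rectangular array.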

Also, people sometimes add special tokens while tokenizing, such as EOS (end of string) and BOS (beginning of string). Here is the reason why it's done:
* https://stackoverflow.com/questions/44579161/why-do-we-do-padding-in-nlp-tasks


# Building the neural network

To understand the dimensions of the input and output of an RNN in Keras, here is a beautiful article: https://medium.com/@shivajbd/understanding-input-and-output-shape-in-lstm-keras-c501ee95c65e

In [10]:
%%time

# A SimpleRNN without any pretrained embeddings, followed by one dense layer
model = Sequential()
model.add(Embedding(len(word_index) + 1,
                 300,
                 input_length=max_len))
model.add(SimpleRNN(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 1500, 300)         4521300   
                                                                 
 simple_rnn (SimpleRNN)      (None, 100)               40100     
                                                                 
 dense (Dense)               (None, 1)                 101       
                                                                 
Total params: 4,561,501
Trainable params: 4,561,501
Non-trainable params: 0
_________________________________________________________________
Wall time: 412 ms
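The parameter counts in the summary can be reproduced by hand. In the sketch below, the vocabulary size of 15,071 is inferred from the summary (4,521,300 / 300) and corresponds to `len(word_index) + 1`; the variable names are only for illustration.

```python
vocab_size = 15_071              # inferred from the summary: 4,521,300 / 300
embed_dim, rnn_units = 300, 100

# Embedding: one 300-dimensional row per vocabulary entry.
embedding_params = vocab_size * embed_dim

# SimpleRNN: input weights + recurrent weights + one bias per unit.
rnn_params = rnn_units * (embed_dim + rnn_units + 1)

# Dense: one weight per RNN unit plus one bias.
dense_params = rnn_units * 1 + 1

print(embedding_params, rnn_params, dense_params)      # 4521300 40100 101
print(embedding_params + rnn_params + dense_params)    # 4561501
```

Almost all of the parameters sit in the embedding matrix; the recurrent layer itself is comparatively small.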


The first line, model = Sequential(), tells Keras that we will be building our network sequentially. We then add the Embedding layer first.
The Embedding layer is also a layer of neurons. Conceptually it takes the n-dimensional one-hot vector of every word as input (in practice Keras passes the word's integer index) and converts it into a 300-dimensional vector, giving us word embeddings similar to word2vec. We could have used word2vec, but the Embedding layer is learned during training, which refines the embeddings for this task.
Next we add a SimpleRNN layer with 100 units, without any dropout or regularization.
Finally we add a single neuron with a sigmoid activation, which takes the output of the 100 SimpleRNN units (please note that we have 100 units, not 100 layers) to predict the result, and then we compile the model using the Adam optimizer.
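The link between one-hot vectors and the Embedding layer can be checked with a small NumPy sketch: multiplying a one-hot vector by the embedding matrix selects exactly one row of that matrix, so an integer lookup gives the same result. The matrix `E` and the toy sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 4                     # toy sizes, not the notebook's 300
E = rng.normal(size=(vocab_size, embed_dim))     # hypothetical embedding matrix

word_id = 3
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

via_matmul = one_hot @ E     # one-hot vector times the embedding matrix
via_lookup = E[word_id]      # direct row lookup by integer index

print(np.allclose(via_matmul, via_lookup))   # True
```

This is why the model can be fed integer sequences directly instead of materializing the large one-hot vectors.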

In [11]:
model.fit(xtrain_pad, ytrain, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1d524bdc610>

In [12]:
scores = model.predict(xvalid_pad)
# AUC is a value between 0 and 1, not a percentage
print("AUC: %.2f" % (roc_auc(scores, yvalid)))

AUC: 0.74
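The `roc_auc` helper comes from `src.imports` and its implementation is not shown here. As a hedged illustration of what an ROC AUC score measures, the sketch below computes the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function `auc_pairwise` and the toy data are hypothetical and are not the notebook's helper.

```python
import numpy as np

def auc_pairwise(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel()
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]            # every positive vs every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

toy_scores = [0.9, 0.8, 0.3, 0.6]
toy_labels = [1, 0, 0, 1]
print(auc_pairwise(toy_scores, toy_labels))   # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the value is a proportion between 0 and 1.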


# Bi-directional RNN

## In-Depth Explanation

* https://www.coursera.org/learn/nlp-sequence-models/lecture/fyXnn/bidirectional-rnn
* https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
* https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
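As a minimal sketch of the idea, a bidirectional RNN can be seen as two independent RNNs, one reading the sequence left to right and one reading it right to left, whose final states are concatenated. The NumPy code below uses toy sizes and hypothetical helpers (`make_params`, `last_state`); it illustrates the concept and is not how Keras implements the layer internally.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5   # toy sizes chosen for illustration

def make_params():
    """Create one hypothetical set of RNN weights."""
    return (rng.normal(scale=0.1, size=(input_dim, hidden_dim)),
            rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))

def last_state(inputs, params):
    """Run a vanilla RNN and return only the final hidden state."""
    W_x, W_h, b = params
    h = np.zeros(hidden_dim)
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

forward_params, backward_params = make_params(), make_params()
sequence = rng.normal(size=(seq_len, input_dim))

h_forward = last_state(sequence, forward_params)          # reads left to right
h_backward = last_state(sequence[::-1], backward_params)  # reads right to left
combined = np.concatenate([h_forward, h_backward])

print(combined.shape)   # (6,)
```

With 100 units per direction, the same concatenation accounts for the (None, 200) output shape and the 80,200 parameters (twice 40,100) reported in the model summary in this section.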

In [13]:
model = Sequential()

model.add(Embedding(len(word_index) + 1,
                 300,
                 input_length=max_len))
model.add(Bidirectional(SimpleRNN(100)))

model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
    
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 1500, 300)         4521300   
                                                                 
 bidirectional (Bidirectiona  (None, 200)              80200     
 l)                                                              
                                                                 
 dense_1 (Dense)             (None, 1)                 201       
                                                                 
Total params: 4,601,701
Trainable params: 4,601,701
Non-trainable params: 0
_________________________________________________________________


In [15]:
model.fit(xtrain_pad, ytrain, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1d525498640>

In [16]:
scores = model.predict(xvalid_pad)
# AUC is a value between 0 and 1, not a percentage
print("AUC: %.2f" % (roc_auc(scores, yvalid)))

AUC: 0.62
