# RNN Tutorial
 

### Contents
1. <big>What is an RNN?</big>
2. <big> What can they do?</big>
3. <big> Vanishing Gradient </big>
4. <big> LSTMs </big>
5. <big>Text Generation via LSTM RNN</big>
6. <big>Resources</big>

## What is an RNN?

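<big>A recurrent neural network (RNN) processes a sequence one step at a time. At each step it receives the current input together with its own hidden state (or output) from the previous step, so information from earlier in the sequence can influence later predictions.</big>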
<img src="images/rnn.png">
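To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla (Elman-style) RNN forward pass. It assumes the common formulation h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h); the names `rnn_forward`, `W_xh`, `W_hh`, `b_h` and the toy sizes are illustrative choices, not something defined by this tutorial.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence.

    xs: array of shape (T, input_dim), one input vector per time step.
    Returns the hidden state at every step, shape (T, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)  # initial hidden state h_0
    states = []
    for x_t in xs:
        # The same weights are reused at every step; h carries information forward.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
xs = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)  # (5, 4)
```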


## Why RNNs?
<big>Because not every problem can be converted into one with fixed-length inputs and outputs; sentences, audio clips, and time series, for example, vary in length. <br></big>

## What can they do?
1. <big> Image Captioning </big>
2. <big> Machine Translation </big>
3. <big> Sentiment Classification </big>
4. <big> Time-Series Prediction </big>

## Types of Inputs to an RNN
<img src="images/whyrnns.png">

<big>Even though RNNs are quite powerful, they suffer from the **vanishing gradient** problem, which keeps them from using long-term information. In practice, a plain RNN makes good use of only the last few (roughly 3-4) time steps; dependencies that reach further back are learned poorly, so we usually do not use plain RNNs on their own.</big>

### Vanishing Gradient in RNNs


<img src="images/vangrad.png">
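A small numeric experiment shows where the vanishing gradient comes from. The sketch below assumes a hypothetical RNN with a single scalar hidden unit, h_t = tanh(w·h_{t-1} + u·x_t), and an illustrative recurrent weight w = 0.9. Backpropagating from h_T to h_0 multiplies one factor w·(1 - h_t²) per step; since the tanh derivative is at most 1, each factor here has magnitude at most 0.9, so the gradient shrinks at least geometrically with the number of steps.

```python
import numpy as np

def grad_wrt_initial_state(w, xs, u=1.0, h0=0.0):
    """d h_T / d h_0 for a scalar RNN h_t = tanh(w * h_{t-1} + u * x_t).

    By the chain rule this is the product over t of w * (1 - h_t**2).
    """
    h = h0
    grad = 1.0
    for x in xs:
        h = np.tanh(w * h + u * x)
        grad *= w * (1.0 - h ** 2)  # local derivative dh_t / dh_{t-1}
    return grad

rng = np.random.default_rng(0)
xs = rng.normal(size=50)
for steps in (1, 5, 10, 20, 50):
    g = grad_wrt_initial_state(0.9, xs[:steps])
    print(f"{steps:2d} steps back: |dh_T/dh_0| = {abs(g):.2e}")
```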

<big>Instead, we use an improved variant of the RNN: **Long Short-Term Memory networks (LSTMs).**</big>

## 4. Long Short-Term Memory (LSTM)

Long short-term memory (LSTM) units (or blocks) are a building unit for layers of a recurrent neural network (RNN). 




<img src="images/lstm_chain.png">



<big>The name "long short-term" refers to the fact that an LSTM models short-term memory that can last for a long period of time.</big>


## Components of LSTMs
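The LSTM cell contains the following components. Standard descriptions use the input, forget, and output gates; some tutorials describe the same cell with learn, remember, and use gates instead, so both sets of names appear below: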
* <big>Input Gate </big>
* <big> Learn Gate</big> 
* <big> Remember Gate</big>
* <big> Forget Gate </big>
* <big> Output Gate</big>
* <big> Use Gate</big>
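For readers who prefer equations in code, here is a sketch of one step of a standard LSTM cell in NumPy (forget, input and output gates plus a candidate cell value). The function name `lstm_step`, the stacked weight matrix `W`, and the order of the gate blocks inside it are assumptions made for this sketch; Keras stores its weights in its own layout. The comments loosely map the alternative learn/remember/use names from the list above onto the standard gates; treat that mapping as an approximation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4 * hidden, hidden + input) and b has shape (4 * hidden,).
    In this sketch the four row blocks hold the forget, input, candidate
    and output weights, in that order.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to drop from c_prev
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: how much new info to let in
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values (roughly the "learn" step)
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate (roughly the "use" step)
    c_t = f * c_prev + i * g               # new cell state (roughly the "remember" step)
    h_t = o * np.tanh(c_t)                 # new hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + input_dim))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state is updated additively (`f * c_prev + i * g`), so when the forget gate stays close to 1, information and gradients can pass through many steps with little attenuation. This is the usual explanation of why LSTMs cope with long-term dependencies better than plain RNNs.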

## 5. Text Generation via LSTM RNN

```python
# Importing the dependencies
import sys
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
```


```python
# Load the ASCII text file and convert it to lowercase
filename = "data.txt"
with open(filename, encoding="utf-8") as f:
    raw_text = f.read()
raw_text = raw_text.lower()
```


```python
# Extract all unique characters from the text and map each one to an integer
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
```

```python
# Let's look at the first ten characters in our vocabulary
chars[:10]
```

```python
# Summary of the dataset
n_chars = len(raw_text)
n_vocab = len(chars)
print(f'Total characters: {n_chars}')
print(f'Total Vocab: {n_vocab}')
```

<big>
1. Next, we define how the training data is presented to the network. <br>
2. There are many ways to split the text into fixed-size chunks that can be fed to the network.<br>
3. The target for each chunk is the character that immediately follows it.<br>
4. We will split the book text into subsequences with a fixed length of 100 characters (an arbitrary choice). <br>
5. We could just as easily split the data by sentences, padding the shorter sequences and truncating the longer ones.<br>
6. Each training pattern consists of 100 time steps of one character each (X), followed by one output character (y).<br>
7. When creating these sequences, we slide this window along the whole book one character at a time, giving each character a chance to be learned from the 100 characters that precede it. </big>
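The windowing scheme above can be illustrated on a toy string. This sketch uses a hypothetical helper `make_windows`, the string "hello world", and a window length of 4 instead of 100 so the output fits on screen; the dataset-preparation code below does the same thing on the encoded book text.

```python
def make_windows(text, seq_length):
    """Slide a window of seq_length characters over text, one character at a time.

    Returns (inputs, targets): each input is seq_length characters long and
    its target is the single character that follows it.
    """
    inputs, targets = [], []
    for i in range(len(text) - seq_length):
        inputs.append(text[i:i + seq_length])
        targets.append(text[i + seq_length])
    return inputs, targets

toy_inputs, toy_targets = make_windows("hello world", 4)
for window, target in zip(toy_inputs, toy_targets):
    print(repr(window), "->", repr(target))
```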

### Prepare the dataset


```python
# Encode the dataset of input-to-output pairs as integers
seq_length = 100
dataX = []
dataY = []

for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print(f'Total patterns: {n_patterns}')
```

```python
# Reshape the list of input sequences into the form [samples, time steps, features] expected by an LSTM
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# Normalize the integer codes to the range [0, 1) so they are easier for the LSTM to learn from
X = X / float(n_vocab)
# One-hot encode the output variable, with one column per character in the vocabulary
y = np_utils.to_categorical(dataY, num_classes=n_vocab)
```


```python
# Print out the shape of X
X.shape
```

```python
# Print out the shape of y
y.shape
```
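The reshaping, normalization, and one-hot steps can be checked on a tiny hand-made dataset. This NumPy-only sketch uses made-up toy patterns (`toy_dataX`, `toy_dataY`, a vocabulary of 5 symbols); `np.eye(n)[labels]` stands in for Keras's `to_categorical` and yields the same one-hot layout when the number of classes is fixed. Note that the network sees each character as a single scaled number in [0, 1), which is why the last dimension of `X` is 1.

```python
import numpy as np

# Toy encoded data: 3 patterns of 4 time steps each, over a vocabulary of 5 symbols.
toy_dataX = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_dataY = [4, 0, 1]
toy_vocab = 5

# [samples, time steps, features]: one feature (the scaled character code) per time step.
X_toy = np.reshape(toy_dataX, (len(toy_dataX), 4, 1)) / float(toy_vocab)

# One-hot targets: row k of the identity matrix encodes class k.
y_toy = np.eye(toy_vocab)[toy_dataY]

print(X_toy.shape)  # (3, 4, 1)
print(y_toy.shape)  # (3, 5)
```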

### Define the LSTM Model using Keras

```python
# Specify the model: one LSTM layer with 256 units, dropout, and a softmax output over the vocabulary
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.3))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```
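As a sanity check on model size, the trainable parameters can be counted by hand. This sketch assumes the standard Keras LSTM, in which each of the four gate blocks has an input kernel, a recurrent kernel, and one bias vector, and it uses a hypothetical vocabulary of 50 characters; substitute the `Total Vocab` value printed earlier. With your real vocabulary size, the counts should agree with the "Param #" column of `model.summary()` (the Dropout layer has no parameters).

```python
def lstm_params(units, input_dim):
    # 4 gate blocks, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * input_dim + units * units + units)

def dense_params(inputs, outputs):
    # weight matrix plus one bias per output unit
    return inputs * outputs + outputs

n_vocab_example = 50  # hypothetical vocabulary size, for illustration only
lstm_count = lstm_params(256, 1)  # input_dim is 1 because X.shape[2] == 1
dense_count = dense_params(256, n_vocab_example)
print(lstm_count, dense_count, lstm_count + dense_count)
```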

```python
# Training takes a while, so use Keras's ModelCheckpoint callback to save a checkpoint
# every time the training loss improves
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
```


```python
# Fit the model
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)
```

### Generating the Text


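```python
# Load the best checkpoint saved during training.
# Replace this placeholder with the name of a file actually written by ModelCheckpoint
# (each name contains the epoch number and loss, so pick the one with the lowest loss).
filename = "weights-improvement-epoch-loss.hdf5"
model.load_weights(filename)
model.compile(loss="categorical_crossentropy", optimizer='adam')
```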


```python
# Build the reverse mapping to decode integers back into characters
int_to_char = dict((i, c) for i, c in enumerate(chars))
```
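A quick round trip on a toy string confirms that `int_to_char` inverts `char_to_int`. The names prefixed with `toy_` are illustrative and separate from the real mappings built from `data.txt`.

```python
toy_text = "hello"
toy_chars = sorted(set(toy_text))  # ['e', 'h', 'l', 'o']
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

encoded = [toy_char_to_int[c] for c in toy_text]
decoded = "".join(toy_int_to_char[i] for i in encoded)
print(encoded)  # [1, 0, 2, 2, 3]
print(decoded)  # hello
```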


```python
# Pick a random seed sequence from the training patterns
start = numpy.random.randint(0, len(dataX))
pattern = list(dataX[start])  # copy, so that generating text does not modify dataX
print(f"Seed is \n {''.join([int_to_char[value] for value in pattern])}")
```

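```python
# Generate 400 characters, one at a time
for i in range(400):
    # Shape the current window as a single sample: [1, time steps, 1 feature]
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # Greedy decoding: take the most likely next character
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # Slide the window: append the prediction and drop the oldest character
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")
```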
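The loop above always picks the single most likely character (greedy decoding), which can make the generated text fall into repetitive loops. A common alternative, not part of the original tutorial, is to sample the next character from the predicted distribution after rescaling it with a temperature. The helper `sample_index` below is a hypothetical sketch of that idea.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Draw a character index from the model's softmax output.

    temperature < 1 sharpens the distribution (closer to argmax);
    temperature > 1 flattens it (more varied text, more mistakes).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64).ravel()  # accepts shape (1, n_vocab)
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    weights = np.exp(logits)
    weights /= weights.sum()
    return int(rng.choice(len(weights), p=weights))

rng = np.random.default_rng(42)
print(sample_index([0.1, 0.7, 0.2], temperature=0.5, rng=rng))
```

To try it, replace `index = numpy.argmax(prediction)` in the generation loop with `index = sample_index(prediction, temperature=0.5)`; the value 0.5 is only a starting point to experiment with.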

## 6. Resources


1. [Understanding LSTM Networks (Colah's blog)](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
2. [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
3. [Bi-LSTM with PyTorch](https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm)
4. [A friendly introduction to Recurrent Neural Networks](https://www.youtube.com/watch?v=UNmqTiOnRfg&t=3s)