<strong> Recurrent Neural Networks </strong> <br/>
A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. 

![](img/RNN.png)


All recurrent neural networks have the form of a chain of repeating neural network modules. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.

![](img/RNN-unrolled.png)

$h(t) = \tanh(W\,[x(t), h(t-1)] + b)$<br/><br/>
However, in practice, standard RNNs have been shown to struggle with learning long-term dependencies.
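
The recurrence h(t) = tanh(W[x(t), h(t-1)] + b) can be sketched in a few lines of NumPy. This is only an illustration: the sizes, the random weights, and the helper name `rnn_step` are assumptions made for the example and are not part of the TensorFlow model built later.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, b):
    """One vanilla RNN step: h(t) = tanh(W [x(t), h(t-1)] + b)."""
    concat = np.concatenate([x_t, h_prev])  # [x(t), h(t-1)]
    return np.tanh(W @ concat + b)

# Illustrative sizes (assumed for this sketch only)
input_size, hidden_size, seq_len = 3, 4, 5

rng = np.random.default_rng(0)
W = rng.normal(size=(hidden_size, input_size + hidden_size))
b = np.zeros(hidden_size)

# The same W and b are reused at every time step; only h changes.
h = np.zeros(hidden_size)
sequence = rng.normal(size=(seq_len, input_size))
for x_t in sequence:
    h = rnn_step(x_t, h, W, b)

print(h.shape)
```

Because the same weights are applied at every step, unrolling the loop gives the chain of identical modules shown in the figure.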



<strong> LSTM </strong> <br/>
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. 

LSTMs have the same chain-like structure, but the repeating module is different. Instead of a single neural network layer, there are four, interacting in a very special way.
![](img/LSTM-chain.png)

The key to LSTMs is the cell state. It is like a conveyor belt running straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.
![](img/C.png)

![](img/LSTM1.png)
![](img/LSTM2.png)
![](img/LSTM3.png)
![](img/LSTM4.png)
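
As a rough sketch of the four interacting layers, the following NumPy function implements one step of the standard LSTM formulation (forget gate, input gate, candidate values, output gate). The helper names, the way the four weight blocks are stacked into a single matrix `W`, and the sizes are assumptions chosen for illustration; TensorFlow's internal layout may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4 * hidden, input + hidden) and b has shape (4 * hidden,).
    The four row blocks of W are, in this sketch's (assumed) order:
    forget gate, input gate, candidate values, output gate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate: what to drop from c(t-1)
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate: what to write
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell values
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate: what to expose
    c_t = f * c_prev + i * g                # cell state: the "conveyor belt"
    h_t = o * np.tanh(c_t)                  # hidden state / output
    return h_t, c_t

# Illustrative sizes and random weights (assumed)
input_size, hidden_size = 2, 3
rng = np.random.default_rng(0)
W_demo = rng.normal(size=(4 * hidden_size, input_size + hidden_size))
b_demo = np.zeros(4 * hidden_size)

h_t, c_t = lstm_step(rng.normal(size=input_size),
                     np.zeros(hidden_size), np.zeros(hidden_size),
                     W_demo, b_demo)
print(h_t.shape, c_t.shape)
```

Note that the cell state is only scaled by `f` and shifted by `i * g`, which is why information can pass along it nearly unchanged when the forget gate is close to 1.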


<b>Reference</b>: Olah, C., 2015. Understanding LSTM Networks. GitHub blog, posted August 27, 2015.

<strong> Model implementation in TensorFlow </strong> <br/>

We create a network that has only one LSTM cell. We pass two elements to the LSTM, h(t-1) and c(t-1), which together are called the <b>state</b>. Here, the state is a tuple with two elements, each of size [batch_size x num_lstm]: one passes the previous output to the next time step, and the other passes the previous cell state to the next time step.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn
sess = tf.Session()

---
### Let's Understand the parameters, inputs and outputs

We will treat each MNIST image $\in \mathbb{R}^{28 \times 28}$ as a sequence of $28$ vectors $\mathbf{x} \in \mathbb{R}^{28}$, one per image row.

![](img/mnist.png)

<b>Reference</b>: Jasdeep Singh Chhabra, 2017. Understanding LSTM in Tensorflow (MNIST dataset). GitHub blog.
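
To make the "image as a sequence" idea concrete, here is a small NumPy sketch. The arrays are stand-ins filled with dummy values (not real MNIST data), and the variable names are chosen for illustration only.

```python
import numpy as np

# Stand-in for one flattened MNIST image (784 values); dummy data, not MNIST.
flat_image = np.arange(28 * 28, dtype=np.float32)

# 28 time steps, each a 28-dimensional vector (one image row per step).
sequence = flat_image.reshape(28, 28)
print(sequence.shape)

# A batch of flattened images is reshaped the same way;
# -1 lets NumPy infer the batch dimension.
flat_batch = np.zeros((5, 784), dtype=np.float32)
batch_sequences = flat_batch.reshape(-1, 28, 28)
print(batch_sequences.shape)
```

Each row of `sequence` is what the LSTM receives at one time step, so the network reads the image top to bottom, one row at a time.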

In [None]:
# Import data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
trainX = mnist.train.images
trainY = mnist.train.labels
valX = mnist.validation.images
valY = mnist.validation.labels
testX = mnist.test.images
testY = mnist.test.labels

trainX = trainX.reshape(-1, 28, 28)
testX = testX.reshape(-1, 28, 28)

num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_lstm = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)
batch_size = 128

Placeholders

In [None]:
# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

In [None]:
# Define weights and biases
weights = {'out': tf.Variable(tf.random_normal([num_lstm, num_classes]))}
biases = {'out': tf.Variable(tf.random_normal([num_classes]))}

Constructing a basic <a href="https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell">LSTM</a> cell with TensorFlow

In [None]:
# Define the lstm cells
lstm_cell = rnn.BasicLSTMCell(num_lstm, forget_bias=1.0)

The simplest way to build an RNN in TensorFlow from lstm_cell is rnn.static_rnn(cell, inputs).
<br/> The inputs argument must be a sequence (a list of tensors) whose length is the number of time steps.  
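
The list-of-tensors layout can be illustrated with plain NumPy. The sketch below mimics, under the assumption of a small dummy batch, what unstacking a [batch_size, timesteps, num_input] array along the time axis produces; it is an analogy for the shapes involved, not TensorFlow code.

```python
import numpy as np

# Small dummy batch (sizes assumed for illustration)
n_batch, n_steps, n_input = 4, 28, 28
X_demo = np.random.rand(n_batch, n_steps, n_input)

# One array per time step, each of shape (n_batch, n_input)
steps = [X_demo[:, t, :] for t in range(n_steps)]

print(len(steps))      # number of time steps
print(steps[0].shape)  # shape of the input at a single time step
```

The RNN then consumes `steps[0]`, `steps[1]`, and so on in order, carrying its state from one step to the next.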

In [None]:
# Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
x = tf.unstack(X, timesteps, 1)
# Get lstm cell output
outputs, states = rnn.static_rnn(lstm_cell, x,dtype=tf.float32)

The output generated by static_rnn is a list of tensors, one per time step, each of shape [batch_size, num_units]. <br/>
Here, only the output of the final time step is used for classification.
<br/>
The state is a tuple whose first element is the cell state and whose second element is the hidden state.  

In [None]:
# Linear activation, using rnn inner loop last output
logits = tf.matmul(outputs[-1], weights['out']) + biases['out']

In [None]:
# Training Parameters
learning_rate = 0.001
numEpochs = 1000
batch_size = 128
display_step = 200
avg_cost_val=[]
train_accuracy=[]
val_accuracy = []

In [None]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_step = optimizer.minimize(cost)

In [None]:
# Evaluate model: compare the predicted class with the true class for each example
output_layer = tf.nn.softmax(logits)
prediction = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
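
For intuition, the cost and accuracy defined above can be reproduced by hand in NumPy. This is a sketch on three made-up examples with three classes; the function names and the numbers are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(logits, labels):
    # Mean over the batch of -sum(labels * log(softmax(logits)))
    probs = softmax(logits)
    return float(np.mean(-np.sum(labels * np.log(probs), axis=1)))

def batch_accuracy(logits, labels):
    # Fraction of examples whose arg-max prediction matches the one-hot label
    return float(np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1)))

# Made-up logits and one-hot labels (3 examples, 3 classes)
demo_logits = np.array([[2.0, 0.5, 0.1],
                        [0.2, 0.1, 3.0],
                        [1.0, 2.0, 0.0]])
demo_labels = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0]])

print(mean_cross_entropy(demo_logits, demo_labels))
print(batch_accuracy(demo_logits, demo_labels))
```

Since softmax preserves the ordering of the logits, taking the arg-max of the logits directly gives the same predictions as taking it after the softmax.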

In [None]:
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

In [None]:
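# Start training. Reuse the session created earlier so that it stays open
# (and the trained variables stay available) for the test-set evaluation.
with sess.as_default():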

    # Run the initializer
    sess.run(init)
    for epoch in range(numEpochs):

        print(epoch)
        avg_cost = 0.
        tr_avg_acc=0.
        total_batch = int(mnist.train.num_examples/batch_size)
        
        _current_cell_state = np.zeros((batch_size, num_lstm))
        _current_hidden_state = np.zeros((batch_size, num_lstm))
        
        # Loop over all batches
        for _ in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Reshape data to get 28 seq of 28 elements
            batch_x = batch_x.reshape((batch_size, timesteps, num_input))
            
            # Run optimization op (backprop)
            
            _, c,tr_acc,current_states = sess.run([train_step,cost,accuracy,states], 
                                                  feed_dict={X: batch_x, Y: batch_y})
            avg_cost += c / total_batch
            tr_avg_acc += tr_acc/total_batch
            
            _current_cell_state,_current_hidden_state = current_states
            
        # Display logs per epoch step
        avg_cost_val.append(avg_cost)
        train_accuracy.append(tr_avg_acc)
        # accuracy on validation set
        valX = valX.reshape((len(valX), timesteps, num_input))
        val_accuracy.append(sess.run(accuracy, feed_dict={X:valX,Y:valY}))
        print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
            
    
    print("Optimization Finished :=) !")

In [None]:
# Plot the cost, and the training and validation accuracy
import matplotlib.pyplot as plt
plt.figure()
plt.plot(avg_cost_val,'r-',label = 'cost')
plt.figure()
plt.plot(train_accuracy,'k*-')            
plt.plot(val_accuracy,'bo-')
plt.legend(('training accuracy', 'validation accuracy'), shadow=True, fancybox=True)
plt.show()

In [None]:
# Accuracy on the test set (feed the placeholders X and Y, not the unstacked list x)
testX = testX.reshape((len(testX), timesteps, num_input))
print(sess.run(accuracy, feed_dict={X: testX, Y: testY}))