# Recurrent Neural Networks

When the order of the data carries significant meaning, as in time series data, sound waves, or natural language, Recurrent Neural Networks (RNNs) are often the most useful choice.

![Recurrent Neural Network Structure](res/rnn.png)

In [2]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM # Use CuDNNLSTM if GPU available!

### Loading data
RNNs are best suited to some sort of time-series data, but for now we will learn on MNIST image data, treating each 28x28 image as a sequence of 28 rows, then practice on a more realistic use case in later lessons.

In [3]:
mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [4]:
print(X_train.shape)
print(y_train.shape)

(60000, 28, 28)
(60000,)
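
As a rough sketch of how these shapes can be read for a recurrent layer: Keras recurrent layers expect input shaped `(samples, timesteps, features)`, so each 28x28 image can be treated as 28 timesteps (rows) of 28 features (pixels). The array below is a random stand-in for `X_train`, an assumption made only to avoid downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for a small batch of MNIST images (random pixels, not real data)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Read the three axes the way a recurrent layer would
samples, timesteps, features = fake_images.shape

# The first image, read row by row: each row is one timestep of 28 features
first_sequence = fake_images[0]
first_timestep = first_sequence[0]

print(samples, timesteps, features)
print(first_timestep.shape)
```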


### Normalizing data
The first time I trained the model, I forgot to scale the features. Adding this small step made a huge impact on training speed and accuracy. Without scaling, the accuracy barely rose above 0.11 and the training was taking forever!

In [5]:
X_train = X_train/255.0
X_test = X_test/255.0
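
To illustrate what this division does, here is a minimal sketch on a tiny hand-made array (a hypothetical stand-in for raw pixel values, not real MNIST data): the 8-bit integers in the range 0 to 255 become floats in the range 0 to 1.

```python
import numpy as np

# Hypothetical stand-in for raw MNIST pixels: unsigned 8-bit integers in [0, 255]
raw = np.array([[0, 64, 128, 255]], dtype=np.uint8)

# Dividing by a float promotes the array to floating point and maps values into [0, 1]
scaled = raw / 255.0

print(raw.dtype, raw.min(), raw.max())
print(scaled.dtype, scaled.min(), scaled.max())
```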

### Building model

In [6]:
model = Sequential()

# Input layer
model.add(LSTM(128, activation='relu', input_shape=(X_train.shape[1:]), return_sequences=True))
model.add(Dropout(0.2))

# Hidden layer 1
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.2))

# Hidden layer 2
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

# Output layer
model.add(Dense(10, activation='softmax'))


opt = tf.keras.optimizers.Adam(learning_rate=1e-3, decay=1e-5)  # Optimizer (`lr` is a deprecated alias)
model.compile(loss='sparse_categorical_crossentropy',  # Loss
              optimizer=opt,
              metrics=['accuracy'])
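
The first LSTM layer sets `return_sequences=True` so that the second LSTM receives a full sequence rather than a single vector. The sketch below shows the difference in output shape using a toy tanh RNN in plain NumPy. It is not an LSTM (no gates) and not Keras code; the `simple_rnn` function and its random weights are hypothetical, for illustration only.

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Toy tanh RNN over x of shape (batch, timesteps, features).

    Illustrative only: random weights, no gates, not Keras' LSTM.
    """
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    W = rng.normal(scale=0.1, size=(features, units))  # input-to-hidden weights
    U = rng.normal(scale=0.1, size=(units, units))     # hidden-to-hidden weights
    b = np.zeros(units)

    h = np.zeros((batch, units))  # initial hidden state
    outputs = []
    for t in range(timesteps):
        # The new state depends on the current input and the previous state
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)

    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h                              # (batch, units)

# Hypothetical batch of 4 scaled "images" with values in [0, 1)
x = np.random.default_rng(1).random((4, 28, 28))
all_states = simple_rnn(x, units=128, return_sequences=True)
last_state = simple_rnn(x, units=128, return_sequences=False)
print(all_states.shape)
print(last_state.shape)
```

A stacked recurrent layer needs the three-dimensional output, while the `Dense` layers that follow the second LSTM only need the final state.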

### Training model

In [7]:
model.fit(X_train, y_train, epochs=3, validation_data=(X_test, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x218a0b1d188>

##### Observation:
It seems that our recurrent network is much more accurate (in fewer epochs) than the convolutional network we applied to this same image dataset. Our CNN took 10 epochs to exceed 90% accuracy, which our RNN did in only 3 epochs.

This is probably because our CNN was not very deep. To extract complex patterns in images, state-of-the-art image networks often use 100+ layers. Our CNN model, being only 3 hidden layers deep, was insufficiently complex for the task.