## Learning Objectives (Competencies)
By the end of this lesson, students will be able to:
- Explain what an RNN is and what an LSTM is
- Identify the recurrent cells available in Keras
- Compute the number of parameters in an LSTM layer

## Recurrent Neural Network (RNN)

- The idea behind RNNs is to make use of sequential information


- In a traditional neural network we assume that all inputs (and outputs) are independent of each other, but for many tasks that is a poor assumption


- If you want to predict the next word in a sentence, you need to know which words came before it


- Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/simple_rnn.png' width='600' height='600'>
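
To make the "memory" idea concrete, here is a minimal NumPy sketch of a simple tanh recurrence. The function name `rnn_forward`, the weight names `W`, `U`, `b`, and all sizes are assumptions chosen for illustration; this is not how Keras implements its layers.

```python
import numpy as np

def rnn_forward(xs, W, U, b):
    """Run a simple tanh recurrence over a sequence and return every hidden state."""
    h = np.zeros(U.shape[0])            # the memory starts empty
    states = []
    for x in xs:                        # one update per time step
        # The new memory mixes the current input with the previous memory
        h = np.tanh(W @ x + U @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
d, hidden, T = 3, 4, 5                  # illustrative sizes
W = rng.normal(size=(hidden, d))        # input-to-hidden weights
U = rng.normal(size=(hidden, hidden))   # hidden-to-hidden weights
b = np.zeros(hidden)
xs = rng.normal(size=(T, d))            # a sequence of T inputs

states = rnn_forward(xs, W, U, b)
print(states.shape)                     # (5, 4)
```

Because `h` is fed back into every update, the state at a given step depends on all earlier inputs, not only on the current one.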

So far we've learned about:
- MLP
- CNN

### Recurrent Cells in Keras: all of these are neural network layers with memory

- SimpleRNN


- LSTM


- GRU

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/LSTM.png' width='600' height='600'>

### The number of time steps defines how many times the LSTM cell state is updated for one sample (for example, one MNIST digit)

- To use an LSTM for image classification, we must prepare the data so that it has a sequential meaning

- Let's prepare the data (images here) for sequential MNIST classification

- We use a sequence of 28 time steps, each with 28 features
    - the image is broken down into 28 row strips (black strip, white strip, etc.), one per time step

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/mnist_lstm.png' width = '500' height='500' >
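
The slicing described above can be sketched in NumPy. The array `image` below is a stand-in for one MNIST digit, not real data, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for one 28x28 MNIST digit (real pixel values would come from mnist.load_data())
image = np.arange(28 * 28, dtype=float).reshape(28, 28)

# Treat each row of pixels as the input for one time step
sequence = [image[t] for t in range(image.shape[0])]
print(len(sequence), sequence[0].shape)   # 28 (28,)

# A batch of images therefore has shape (samples, time steps, features)
batch = np.stack([image, image])
print(batch.shape)                        # (2, 28, 28)
```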

## Activity: Train an LSTM model with Keras for MNIST Classification

In [4]:
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, LSTM

# Load MNIST: 60,000 training and 10,000 test images of 28x28 pixels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train / np.max(x_train)
x_test = x_test / np.max(x_test)
# One-hot encode the 10 digit classes
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Shape (samples, time steps, features): each image is 28 rows of 28 pixels
x_train = x_train.reshape(x_train.shape[0], 28, 28)
x_test = x_test.reshape(x_test.shape[0], 28, 28)
print(x_train[0])
nb_units = 50

model = Sequential()
model.add(LSTM(nb_units, input_shape=(28, 28)))     # 28 time steps, 28 features each
model.add(Dense(units=10, activation='softmax'))    # one output neuron per digit class
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epochs = 3

# Hold out 20% of the training data for validation
history = model.fit(x_train,
                    y_train,
                    epochs=epochs,
                    batch_size=128,
                    verbose=1,
                    validation_split=0.2)

# Evaluate on the test set and report accuracy as a percentage
scores = model.evaluate(x_test, y_test, verbose=2)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

[[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.    

Train on 48000 samples, validate on 12000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
accuracy: 95.50%
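
The two preprocessing steps in the cell above, scaling the pixels and one-hot encoding the labels, can be sketched on tiny made-up arrays. The `np.eye` indexing is a NumPy stand-in for the one-hot step, not the Keras implementation, and the sample values are invented for illustration.

```python
import numpy as np

# Made-up pixel values spanning the 8-bit range
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = pixels / 255.0                 # now in [0, 1]
print(scaled.min(), scaled.max())       # 0.0 1.0

# Made-up digit labels
labels = np.array([3, 0, 9])
one_hot = np.eye(10)[labels]            # row k of the identity matrix encodes class k
print(one_hot.shape)                    # (3, 10)
print(one_hot[0])                       # a 1.0 at index 3, zeros elsewhere
```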


### Return Sequences in LSTM

- `model.add(LSTM(nb_units, input_shape=(28, 28), return_sequences = False))`

- the layer produces a single output, after the last strip (time step 28)

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/return_seq_F.png' width='500' height='500'>



- `model.add(LSTM(nb_units, input_shape=(28, 28), return_sequences = True))`

- after each strip, the layer produces an output that uses the memory from the previous strips

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/return_seq_T.png' width='500' height='500'>
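
The difference between the two settings is easiest to see in the output shapes. The sketch below uses a toy tanh recurrence, not a real LSTM, and the function `toy_recurrent_layer` is hypothetical; it only mimics how the flag selects between the last output and all outputs.

```python
import numpy as np

def toy_recurrent_layer(x, units, return_sequences=False, seed=0):
    """Toy tanh recurrence (not a real LSTM) that mimics the two output modes."""
    rng = np.random.default_rng(seed)
    timesteps, features = x.shape
    W = rng.normal(scale=0.1, size=(units, features))
    U = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros(units)
    outputs = []
    for t in range(timesteps):
        h = np.tanh(W @ x[t] + U @ h)   # update the memory with strip t
        outputs.append(h)
    # Either every per-step output, or only the one after the last strip
    return np.stack(outputs) if return_sequences else h

x = np.ones((28, 28))                   # one sample: 28 time steps, 28 features
last_only = toy_recurrent_layer(x, units=50, return_sequences=False)
all_steps = toy_recurrent_layer(x, units=50, return_sequences=True)
print(last_only.shape)                  # (50,)
print(all_steps.shape)                  # (28, 50)
```

In Keras the same choice shows up, with the batch dimension added, as an output shape of `(None, 50)` versus `(None, 28, 50)`.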


### What does the LSTM model for MNIST look like?


<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/mnist_lstm_nn.png' width='500' height='500'>


### How many parameters does an LSTM have?

- Assume the subscript *t* indexes the time step

<img src = 'https://github.com/Make-School-Courses/DS-2.2-Deep-Learning/raw/master/Notebooks/Images/lstm_math.png' >

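- We have four weight matrices W (applied to the input), four weight matrices U (applied to the previous hidden state), and four bias vectors

- The number of parameters for an LSTM is 4dh + 4h² + 4h, where d is the input dimension and h is the number of hidden units. The three terms count the W matrices, the U matrices, and the four biases, respectively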
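
As a quick check of the formula, the count can be written as a small helper. The function name `lstm_param_count` is made up for this sketch; `d` is the input dimension and `h` is the number of hidden units.

```python
def lstm_param_count(d, h):
    """Parameter count of one LSTM layer: 4dh + 4h^2 + 4h."""
    input_weights = 4 * d * h       # four W matrices, each of shape (h, d)
    recurrent_weights = 4 * h * h   # four U matrices, each of shape (h, h)
    biases = 4 * h                  # four bias vectors, each of length h
    return input_weights + recurrent_weights + biases

print(lstm_param_count(1, 10))      # 480
print(lstm_param_count(28, 50))     # 15800
```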

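In [None]:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_outputs(xt, h_last, c_last,
                    Wf, Wi, Wo, Wc,
                    Uf, Ui, Uo, Uc,
                    bf, bi, bo, bc):
    # Forget, input, and output gates: each has its own W, U, and bias
    ft = sigmoid(np.dot(Wf, xt) + np.dot(Uf, h_last) + bf)
    it = sigmoid(np.dot(Wi, xt) + np.dot(Ui, h_last) + bi)
    ot = sigmoid(np.dot(Wo, xt) + np.dot(Uo, h_last) + bo)
    # Candidate cell state, then the updated cell state and hidden state
    c_tilde = np.tanh(np.dot(Wc, xt) + np.dot(Uc, h_last) + bc)
    ct = ft * c_last + it * c_tilde
    ht = ot * np.tanh(ct)
    return ht, ct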

## Activity: Verify the number of parameters for LSTM in Keras

In [7]:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# 2 samples with 5 time steps each; each time step has 1 feature
input_array = np.array([[[0], [1], [2], [3], [4]], [[5], [1], [2], [3], [6]]])
print(input_array.shape)
model = Sequential()
# input_shape=(5, 1) --> 5 time steps with 1 feature each
model.add(LSTM(10, input_shape=(5, 1), return_sequences=False))
model.summary()
print(input_array)
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array)
# The number of parameters of an LSTM layer in Keras equals
# params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2)
n_params = 4 * ((1 + 1) * 10 + 10**2)
print(n_params)    # 480 -> m=1, n=10 --> 4nm + 4n^2 + 4n
model.summary()    # summary() prints the table itself, so no print() is needed

(2, 5, 1)
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_4 (LSTM)                (None, 10)                480       
Total params: 480
Trainable params: 480
Non-trainable params: 0
_________________________________________________________________
[[[0]
  [1]
  [2]
  [3]
  [4]]

 [[5]
  [1]
  [2]
  [3]
  [6]]]
[[ 0.10064673  0.02009404  0.03832989 -0.15794794 -0.08423097 -0.31807515
   0.19400118 -0.1709121   0.10603718 -0.17598154]
 [ 0.15246587  0.01373161  0.02675976 -0.1352916  -0.11107537 -0.49042052
   0.34700504 -0.1528391   0.06857175 -0.13648853]]
480
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_4 (LSTM)                (None, 10)                480       
Total params: 480
Trainable params: 480
Non-trainable params: 0
_____________________________________

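### Does the number of parameters in an LSTM depend on the number of time steps in each sample?
- No. The same weights are reused at every time step, so the count depends only on the number of input features and the number of units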
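
This can be checked with a small NumPy sketch: one set of weights is created once and then applied to sequences of different lengths. The helper names `init_lstm_params` and `run_lstm`, the stacked weight layout, and the random initialization are assumptions made for illustration, not the Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm_params(m, n, seed=0):
    """Create stacked weights for the forget, input, output, and candidate paths."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(scale=0.1, size=(4, n, m)),   # applied to the input
        "U": rng.normal(scale=0.1, size=(4, n, n)),   # applied to the previous hidden state
        "b": np.zeros((4, n)),                        # biases
    }

def run_lstm(xs, p):
    """Apply the same weights at every time step and return the final hidden state."""
    n = p["b"].shape[1]
    h, c = np.zeros(n), np.zeros(n)
    for x in xs:
        z = p["W"] @ x + p["U"] @ h + p["b"]          # shape (4, n)
        f, i, o = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[2])
        c = f * c + i * np.tanh(z[3])                 # updated cell state
        h = o * np.tanh(c)                            # updated hidden state
    return h

params = init_lstm_params(m=28, n=50)
n_params = sum(a.size for a in params.values())
print(n_params)                                       # 15800

# The same parameters handle a 5-step and a 28-step sequence
short = run_lstm(np.ones((5, 28)), params)
long = run_lstm(np.ones((28, 28)), params)
print(short.shape, long.shape)                        # (50,) (50,)
```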

In [9]:
m = 28
n = 50

(4*n*m)+(4*n**2)+(4*n)

15800

For the MNIST activity above:

- n is the number of LSTM units (50 in the MNIST model)
- m is the number of input features per time step (28 in the MNIST model)