# HW4 Q6 RNN & LSTM

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
np.random.seed(42)

from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import SimpleRNN, LSTM
from keras.callbacks import EarlyStopping

from sklearn.metrics import accuracy_score

Using TensorFlow backend.


In [2]:
# load the data and split into training/testing sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [3]:
n_classes = 10

print("Targets before: \n{}".format(y_train[:10]))
ybm_train = to_categorical(y_train, n_classes)
ybm_test = to_categorical(y_test, n_classes)
print("Targets after: \n{}".format(ybm_train[:10]))

Targets before: 
[5 0 4 1 9 2 1 3 1 4]
Targets after: 
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
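
The one-hot conversion shown above can be reproduced with plain NumPy. The sketch below is only an illustration of the idea, not the Keras implementation; the helper name `one_hot` and the sample labels are assumptions made for this example.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return a (len(labels), n_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, n_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# First five training targets printed above.
sample_labels = [5, 0, 4, 1, 9]
sample_encoded = one_hot(sample_labels, 10)
print(sample_encoded)

# argmax along the class axis recovers the original integer labels.
print(sample_encoded.argmax(axis=1))
```

Taking `argmax` over the class axis is also how a softmax prediction is turned back into a digit label.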


In [4]:
image_size = x_train.shape[1]
xrnn_train = np.reshape(x_train,[-1, image_size, image_size])
xrnn_test = np.reshape(x_test,[-1, image_size, image_size])
xrnn_train = xrnn_train.astype('float32') / 255
xrnn_test = xrnn_test.astype('float32') / 255
print(xrnn_train.shape)

(60000, 28, 28)


### Q1 : Explain why we reshape data as [-1, image_size, image_size]?

**Answer:**  
RNN and LSTM layers expect input of shape $[Number~of~Samples,~Time~Steps,~Features]$.  
Here each $28 \times 28$ image is treated as a sequence of 28 time steps, where each time step is one row of 28 pixel values (features), so the rows are fed to the network sequentially.  
In $[-1,~image\_size,~image\_size]$, the $-1$ tells NumPy to infer that dimension from the size of the array, so it becomes 60000 for the training set.
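
A small sketch of the reshape described in this answer, using a synthetic array instead of MNIST. The sample count `n_samples` and the array `flat` are stand-ins invented for the example.

```python
import numpy as np

image_size = 28
n_samples = 6  # small stand-in for the 60000 training images

# Synthetic flat pixel data: n_samples images of image_size * image_size values each.
flat = np.arange(n_samples * image_size * image_size, dtype="float32")

# -1 lets NumPy infer the first dimension from the total number of elements.
batch = np.reshape(flat, [-1, image_size, image_size])
print(batch.shape)

# Each image is a sequence of 28 time steps; each step is one row of 28 features.
first_image = batch[0]
for t, row in enumerate(first_image[:3]):
    print("time step", t, "features", row.shape)
```

The inferred dimension is the total element count divided by `image_size * image_size`, which is why the same call works unchanged for both the training and the test arrays.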

### Q2 : Write code to set parameters for RNN.
#### 1. You need to set input shape, layers (2), units (256), dropout_rate (0.4), activation function ('relu')
#### 2. You may use Sequential, SimpleRNN, Dropout
#### 3. You need to add one dense layer at the end of your network. (You may use: Dense, activation function is 'softmax')
#### 4. You need to summarize the parameters (You may use summary())

In [5]:
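model1 = Sequential()

# First recurrent layer: reads each image as 28 time steps of 28 features and
# returns the full output sequence so the next recurrent layer can consume it.
# No activation is passed, so SimpleRNN uses its default ('tanh').
model1.add(SimpleRNN(units=256, input_shape=(image_size, image_size), return_sequences=True))
model1.add(Dropout(rate=0.4))

# Second recurrent layer: returns only the final output; its input shape is inferred.
model1.add(SimpleRNN(units=256))
model1.add(Dropout(rate=0.4))

# Softmax output layer, one unit per digit class.
model1.add(Dense(units=n_classes, activation='softmax'))

print(model1.summary())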

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_1 (SimpleRNN)     (None, 28, 256)           72960     
_________________________________________________________________
dropout_1 (Dropout)          (None, 28, 256)           0         
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 256)               131328    
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570      
=================================================================
Total params: 206,858
Trainable params: 206,858
Non-trainable params: 0
_________________________________________________________________
None
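
The parameter counts in the summary above can be checked by hand. The sketch below assumes the standard layout of a simple recurrent layer (one input kernel, one recurrent kernel, one bias vector) and of a dense layer (kernel plus bias); the helper names are made up for this example.

```python
def simple_rnn_params(input_dim, units):
    # input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # kernel (input_dim x units) + bias (units)
    return units * (input_dim + 1)

layer_params = [
    simple_rnn_params(28, 256),   # simple_rnn_1: 28 features per time step
    simple_rnn_params(256, 256),  # simple_rnn_2: fed by the 256 outputs of the first layer
    dense_params(256, 10),        # dense_1: 10 digit classes
]
print(layer_params)       # [72960, 131328, 2570]
print(sum(layer_params))  # 206858
```

Dropout layers add no parameters, so the three values sum to the reported total of 206,858.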


### Q3 : Write code to implement RNN.
#### 1. Compile the model (you may use compile, optimizer as 'nadam', loss as 'categorical_crossentropy', metrics as ['accuracy'])
#### 2. Set early stop, monitor as 'val_loss', patience as 3, mode as 'auto', min_delta as 0.  (you may use EarlyStopping)
#### 3. Fit x_data and remember to set callback (Set batch_size as 1000, epochs as 10, validation_split as 0.1)
#### 4. Print out the accuracy of train set and test set of each epoch (you may use evaluate)

In [6]:
model1.compile(loss='categorical_crossentropy', optimizer='nadam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', patience=3, mode='auto', min_delta=0, verbose=1, restore_best_weights=True)

history1 = model1.fit(xrnn_train, ybm_train, batch_size=1000, epochs=10, validation_split=0.1, verbose=1, callbacks=[es])

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [7]:
history1.history.keys()

dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])

In [8]:
print('Train_loss: ', history1.history['loss'])
print('Train_accuracy: ', history1.history['acc'])
print('Validation_loss: ', history1.history['val_loss'])
print('Validation_accuracy: ', history1.history['val_acc'])

Train_loss:  [0.9722806132502027, 1.8063754152368616, 0.8246483482696392, 0.756297657335246, 0.3478739556890947, 0.28468208070154544, 0.18048053897089428, 0.14785883451501527, 0.27709640755697534, 2.173657907379998]
Train_accuracy:  [0.6862962960645005, 0.46503703944661, 0.7506851851940155, 0.7674074084670456, 0.8971481466734851, 0.9183888854803862, 0.9489999998498846, 0.9575555589463975, 0.9337962984486863, 0.2796111122049667]
Validation_loss:  [0.3141018922130267, 0.5393432229757309, 2.4229359229405723, 0.2355282505353292, 0.15360453724861145, 0.1461922898888588, 0.10643813262383144, 0.09897683560848236, 4.612960497538249, 1.0063961446285248]
Validation_accuracy:  [0.9034999907016754, 0.8441666762034098, 0.37950000166893005, 0.9283333222071329, 0.9553333322207133, 0.957833339770635, 0.9684999883174896, 0.971833328406016, 0.24916666001081467, 0.6666666666666666]
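
To see how the `patience=3` rule interacts with the validation losses printed above, the following sketch replays them through a simplified version of the patience logic. It is an illustration written for this note, not the Keras source; the function name `simulate_early_stopping` is hypothetical and the losses are the values above rounded to four decimals.

```python
def simulate_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stopped_epoch) with 1-based epochs.

    stopped_epoch is None when the patience limit is never reached.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Validation losses of the RNN run, rounded to four decimals.
rnn_val_loss = [0.3141, 0.5393, 2.4229, 0.2355, 0.1536,
                0.1462, 0.1064, 0.0990, 4.6130, 1.0064]

best_epoch, stopped_epoch = simulate_early_stopping(rnn_val_loss)
print(best_epoch, stopped_epoch)  # 8 None
```

Under this rule the run never sees three consecutive epochs without improvement, which is consistent with the log showing all 10 epochs. The lowest validation loss occurs at epoch 8, while the last epoch ends with a much higher loss.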


In [9]:
test_result1 = model1.evaluate(xrnn_test, ybm_test)



In [10]:
print('Test_loss :', test_result1[0])
print('Test_accuracy :', test_result1[1])

Test_loss : 1.1295365725517272
Test_accuracy : 0.6157


### Q4 : Write code to set parameters for LSTM
#### 1. You need to set input shape, layers (2), units (256), dropout_rate (0.4), activation function ('relu')
#### 2. You may use Sequential, LSTM, Dropout
#### 3. You need to add one dense layer at the end of your network. (You may use: Dense, activation function is 'softmax')
#### 4. You need to summarize the parameters (You may use summary())

In [11]:
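model2 = Sequential()

# First recurrent layer: an LSTM that returns the full output sequence.
# No activation is passed, so the layer uses its default ('tanh').
model2.add(LSTM(units=256, input_shape=(image_size, image_size), return_sequences=True))
model2.add(Dropout(rate=0.4))

# NOTE: the second recurrent layer is a SimpleRNN, not an LSTM, which is what the
# summary below reports (simple_rnn_3). Use LSTM(units=256) here for a pure
# two-layer LSTM stack; the parameter counts and results would then differ.
model2.add(SimpleRNN(units=256))
model2.add(Dropout(rate=0.4))

# Softmax output layer, one unit per digit class.
model2.add(Dense(units=n_classes, activation='softmax'))

print(model2.summary())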

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 28, 256)           291840    
_________________________________________________________________
dropout_3 (Dropout)          (None, 28, 256)           0         
_________________________________________________________________
simple_rnn_3 (SimpleRNN)     (None, 256)               131328    
_________________________________________________________________
dropout_4 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
=================================================================
Total params: 425,738
Trainable params: 425,738
Non-trainable params: 0
_________________________________________________________________
None
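
The counts in this summary can also be verified by hand. The sketch assumes the usual LSTM layout of four gate blocks (input, forget, cell candidate, output), each with its own input kernel, recurrent kernel and bias; the helper names are invented for the example. The last lines compute what a second LSTM layer would cost if it replaced the `simple_rnn_3` layer shown in the summary, which is a hypothetical variant and not a model that was trained here.

```python
def lstm_params(input_dim, units):
    # four gate blocks, each with input kernel + recurrent kernel + bias
    return 4 * units * (input_dim + units + 1)

def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # kernel + bias
    return units * (input_dim + 1)

lstm_1 = lstm_params(28, 256)               # 291840
simple_rnn_3 = simple_rnn_params(256, 256)  # 131328
dense_2 = dense_params(256, 10)             # 2570
total_as_built = lstm_1 + simple_rnn_3 + dense_2
print(total_as_built)  # 425738

# Hypothetical variant: second recurrent layer is an LSTM instead of a SimpleRNN.
second_lstm = lstm_params(256, 256)         # 525312
print(lstm_1 + second_lstm + dense_2)       # 819722
```

An LSTM layer therefore carries four times the parameters of a SimpleRNN layer with the same input size and number of units.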


### Q5 : Write code to implement LSTM.
#### 1. Compile the model (you may use compile, optimizer as 'nadam', loss as 'categorical_crossentropy', metrics as ['accuracy'])
#### 2. Set early stop, monitor as 'val_loss', patience as 3, mode as 'auto', min_delta as 0.  (you may use EarlyStopping)
#### 3. Fit x_data and remember to set callback (Set batch_size as 1000, epochs as 10, validation_split as 0.1)
#### 4. Print out the accuracy of train set and test set of each epoch (you may use evaluate)

In [12]:
model2.compile(loss='categorical_crossentropy', optimizer='nadam', metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', patience=3, mode='auto', min_delta=0, verbose=1, restore_best_weights=True)

history2 = model2.fit(xrnn_train, ybm_train, batch_size=1000, epochs=10, validation_split=0.1, verbose=1, callbacks=[es])

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [13]:
print('Train_loss: ', history2.history['loss'])
print('Train_accuracy: ', history2.history['acc'])
print('Validation_loss: ', history2.history['val_loss'])
print('Validation_accuracy: ', history2.history['val_acc'])

Train_loss:  [1.107492118521973, 0.32123406921271924, 0.5178311298842784, 0.942856385200112, 0.19224478194007166, 0.11981843246353997, 0.09565970798333485, 0.07147593589292632, 0.060326035858856306, 0.055254240727259055]
Train_accuracy:  [0.6272037029266357, 0.898888885974884, 0.8712037031849226, 0.7148518531962678, 0.9432962912100332, 0.9650370368251094, 0.9724444459985804, 0.979499997916045, 0.9822222215157969, 0.9841296286494644]
Validation_loss:  [0.28299641609191895, 0.1335458941757679, 3.4714513222376504, 0.16863589733839035, 0.11287385101119678, 0.07306283339858055, 0.06777001669009526, 0.05983287406464418, 0.08887973117331664, 0.051382586980859436]
Validation_accuracy:  [0.9131666620572408, 0.9586666623751322, 0.21849999825159708, 0.9456666807333628, 0.9643333355585734, 0.9783333440621694, 0.9805000027020773, 0.9830000102519989, 0.975000003973643, 0.9848333299160004]


In [14]:
test_result2 = model2.evaluate(xrnn_test, ybm_test)



In [15]:
print('Test_loss :', test_result2[0])
print('Test_accuracy :', test_result2[1])

Test_loss : 0.05856194389555603
Test_accuracy : 0.9818
