# Instructions

1. Go to https://colab.research.google.com and choose the "Upload" option to upload this notebook file.
1. In the Edit menu, choose "Notebook Settings" and then set the "Hardware Accelerator" dropdown to GPU.
1. Read through the code in the following sections:
  * [IMDB Dataset](#scrollTo=mXcb24B6a03_)
  * [Define model](#scrollTo=kAz68ipVa05_)
  * [Train model](#scrollTo=kIynp1v_a06Y)
  * [Assess model](#scrollTo=ALyNCqx4a06r)
1. Complete at least one of these exercises. Remember to keep notes about what you do!
  * [Exercise Option #1 - Standard Difficulty](#scrollTo=_9dsjJwya06_)
  * [Exercise Option #2 - Advanced Difficulty](#scrollTo=nyZbljLAa09z)

## Documentation/Sources
* [Class Notes](https://jennselby.github.io/MachineLearningCourseNotes/#recurrent-neural-networks)
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

In [2]:
# # upgrade tensorflow to tensorflow 2
# %tensorflow_version 2.x
# display matplotlib plots
%matplotlib inline
from tensorflow import test
from tensorflow import device

# IMDB Dataset
The [IMDB dataset](https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification) consists of movie reviews (x_train) that have been marked as positive or negative (y_train). See the [Word Vectors Tutorial](https://github.com/jennselby/MachineLearningTutorials/blob/master/WordVectors.ipynb) for more details on the IMDB dataset.

In [25]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

In [26]:
(imdb_x_train, imdb_y_train), (imdb_x_test, imdb_y_test) = imdb.load_data()



For a standard Keras model, every input has to be the same length, so we need to choose a length beyond which we cut off the rest of the review. (We also need to pad the shorter reviews with zeros so that they reach the same length.)
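As a rough illustration of what padding and truncation do to a single review, here is a small NumPy sketch. The helper `pad_or_truncate` is hypothetical and not part of Keras; it assumes the default behaviour of `pad_sequences`, which pads at the front and also truncates from the front.

```python
import numpy as np

def pad_or_truncate(seq, maxlen, value=0):
    """Keep only the last `maxlen` items, then left-pad with `value`.

    Hypothetical helper that imitates the defaults of pad_sequences
    (padding='pre', truncating='pre') for one sequence; maxlen must be > 0.
    """
    seq = list(seq)[-maxlen:]                      # drop words from the front
    padding = [value] * (maxlen - len(seq))        # zeros go at the front
    return np.array(padding + seq)

short_review = [12, 7, 45]
long_review = [3, 9, 27, 81, 5, 6, 8]

print(pad_or_truncate(short_review, maxlen=5).tolist())  # [0, 0, 12, 7, 45]
print(pad_or_truncate(long_review, maxlen=5).tolist())   # [27, 81, 5, 6, 8]
```

Under these defaults the end of a long review is kept, not its beginning.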

In [27]:
cutoff = 500
imdb_x_train_padded = sequence.pad_sequences(imdb_x_train, maxlen=cutoff)
imdb_x_test_padded = sequence.pad_sequences(imdb_x_test, maxlen=cutoff)

# Word indices in the loaded reviews are shifted by this offset relative to imdb.get_word_index();
# see https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset
imdb_index_offset = 3
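To see why the offset matters, here is a sketch of turning an encoded review back into words. It uses a tiny made-up vocabulary in place of `imdb.get_word_index()`, and the token names `<PAD>`, `<START>` and `<UNK>` are labels chosen for this example; the assumption is that indices 0, 1 and 2 are reserved for padding, start-of-review and unknown words, as in the default `imdb.load_data()` settings.

```python
# Toy stand-in for imdb.get_word_index(): word -> frequency rank (1 = most common).
toy_word_index = {"the": 1, "movie": 2, "great": 3}
index_offset = 3  # plays the same role as imdb_index_offset

# Shift every rank by the offset, then fill in the reserved indices.
index_to_word = {rank + index_offset: word for word, rank in toy_word_index.items()}
index_to_word.update({0: "<PAD>", 1: "<START>", 2: "<UNK>"})

def decode_review(encoded):
    """Map a list of integer indices back to a space-separated string."""
    return " ".join(index_to_word.get(i, "<UNK>") for i in encoded)

print(decode_review([0, 0, 1, 4, 5, 2, 6]))
# <PAD> <PAD> <START> the movie <UNK> great
```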

In [5]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Define model

Unlike last time, when we used convolutional layers, we're going to use an LSTM, a special type of recurrent network.

Using recurrent networks means that rather than seeing these reviews as one input happening all at once, with the convolutional layers taking into account which words are next to each other, we are going to see them as a sequence of inputs, with one word occurring at each timestep.
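The "one word per timestep" idea can be sketched in plain NumPy. This is a simplified LSTM forward pass with random weights, not the Keras implementation; the function name `lstm_forward`, the weight layout, and the gate ordering are choices made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(inputs, W, U, b):
    """Run one LSTM layer over a sequence.

    inputs: (timesteps, input_dim), one word vector per timestep
    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    Returns the hidden state at every timestep, shape (timesteps, units).
    """
    units = U.shape[0]
    h = np.zeros(units)  # hidden state, carried from word to word
    c = np.zeros(units)  # cell state, the LSTM's longer-term memory
    outputs = []
    for x_t in inputs:                       # one word per timestep
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)          # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                    # forget some memory, add some new
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
timesteps, input_dim, units = 6, 4, 3
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
review = rng.normal(size=(timesteps, input_dim))  # stand-in for embedded words

all_states = lstm_forward(review, W, U, b)
print(all_states.shape)      # (6, 3): what return_sequences=True would pass on
print(all_states[-1].shape)  # (3,): only the last state, the default behaviour
```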

In [42]:
imdb_lstm_model = Sequential()
imdb_lstm_model.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
# return_sequences=True tells the LSTM to output its state at every timestep, for use by the next
# LSTM layer. The final LSTM layer should return only its last output, for use in the Dense output layer
imdb_lstm_model.add(LSTM(units=32, return_sequences=True))
imdb_lstm_model.add(LSTM(units=32))
imdb_lstm_model.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
imdb_lstm_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])
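For intuition about the output layer and the loss, here is a small NumPy sketch of a sigmoid output, the binary cross-entropy loss, and accuracy at a 0.5 threshold. The logits and labels are made up, and the clipping constant `eps` is an assumption added for numerical safety.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

logits = np.array([2.0, -1.0, 0.0])   # made-up pre-activation outputs
labels = np.array([1.0, 0.0, 1.0])    # 1 = positive review, 0 = negative

probs = sigmoid(logits)                       # probability of "positive"
predictions = (probs > 0.5).astype(int)       # yes/no answer per review
accuracy = float(np.mean(predictions == labels))

print(probs.round(3))
print(binary_crossentropy(labels, probs))
print(accuracy)
```

The third example sits exactly at 0.5, so it is counted as a negative prediction and the accuracy here is 2 out of 3.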

# Train model

In [43]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_lstm_model.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



# Assess model

In [44]:
with device('/device:GPU:0'):
  imdb_lstm_scores = imdb_lstm_model.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_lstm_scores))

loss: 0.2957181930541992 accuracy: 0.878000020980835


# Exercise Option #1 - Standard Difficulty

Experiment with model configurations that differ from the one above. Try other recurrent layers, use a different number of layers, or change some of the defaults. See [Keras Recurrent Layers](https://keras.io/layers/recurrent/).

__Keep notes on what you try and what results you get.__

## Tested changes
- Add a dropout layer after the second LSTM layer with drop rate 0.5 to prevent overfitting
    - Decreases accuracy as overfitting doesn't seem to be a problem with the current network design.
- Add a **time-distributed** dense layer of 100 neurons in the feed-forward section of the network
    - Ran out of memory when allocating a \[32000, 88587\] matrix in GPU memory.
- Changing the dimensionality of the word vector embeddings from 100.
    - Decreasing dimensionality seems to have a negligible impact on accuracy. At what point does the word vector become too low-dimensional to encode information accurately?
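A quick back-of-the-envelope check makes the out-of-memory result plausible. The sketch below assumes the matrix holds 32-bit floats, which is an assumption and not something the error message states; the 32000 rows happen to equal `batch_size * cutoff`.

```python
rows, cols = 32000, 88587   # shape reported in the out-of-memory error
bytes_per_value = 4         # assumption: float32 values

total_bytes = rows * cols * bytes_per_value
print(f"{total_bytes / 2**30:.2f} GiB")  # 10.56 GiB

# The row count matches one batch of padded reviews flattened over time.
batch_size, cutoff = 64, 500
print(rows == batch_size * cutoff)       # True
```

A single tensor of that size leaves little or no room for the rest of the model on a typical Colab GPU.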

In [39]:
from tensorflow.keras.layers import Dropout, TimeDistributed

model_1 = Sequential()
model_1.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=32, # Decreasing word vector dimensionality.
                              input_length=cutoff))
# return_sequences=True tells the LSTM to output its state at every timestep, for use by the next
# LSTM layer. The final LSTM layer should return only its last output, for use in the Dense output layer
model_1.add(LSTM(units=32, return_sequences=True))
model_1.add(LSTM(units=32))
model_1.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
model_1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

with device('/device:GPU:0'):
  model_1.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



In [40]:
with device('/device:GPU:0'):
  model_1_scores = model_1.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*model_1_scores))

loss: 0.3312091827392578 accuracy: 0.8529999852180481


## Model 2

- Decreasing the word vector dimensionality to 8 produces a test accuracy of about 87.7%, similar to the 87.8% observed with a 100-dimensional word vector.

In [45]:
model_2 = Sequential()
model_2.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=8,
                              input_length=cutoff))
# return_sequences=True tells the LSTM to output its state at every timestep, for use by the next
# LSTM layer. The final LSTM layer should return only its last output, for use in the Dense output layer
model_2.add(LSTM(units=32, return_sequences=True))
model_2.add(LSTM(units=32))
model_2.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
model_2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

with device('/device:GPU:0'):
  model_2.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



In [46]:
with device('/device:GPU:0'):
  model_2_scores = model_2.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*model_2_scores))

loss: 0.3003838360309601 accuracy: 0.8770400285720825


## Model 3
- Removing an LSTM layer from the model has a negligible impact on test accuracy (0.8776 versus 0.8770 for Model 2), although this model was trained for 2 epochs instead of 1.

In [49]:
model_3 = Sequential()
model_3.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=8,
                              input_length=cutoff))
# With a single LSTM layer, return_sequences is left at its default (False), so the layer
# returns only its last output, for use in the Dense output layer
# model_3.add(LSTM(units=32, return_sequences=True))
model_3.add(LSTM(units=32))
model_3.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
model_3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

with device('/device:GPU:0'):
  model_3.fit(imdb_x_train_padded, imdb_y_train, epochs=2, batch_size=64)

Epoch 1/2
Epoch 2/2


In [50]:
with device('/device:GPU:0'):
  model_3_scores = model_3.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*model_3_scores))

loss: 0.29620546102523804 accuracy: 0.8776000142097473


## Model 4

- Reduced the number of LSTM units to 8, with a negligible impact on training accuracy.

In [53]:
model_4 = Sequential()
model_4.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=8,
                              input_length=cutoff))
# With a single LSTM layer, return_sequences is left at its default (False), so the layer
# returns only its last output, for use in the Dense output layer
# model_4.add(LSTM(units=32, return_sequences=True))
model_4.add(LSTM(units=4))
model_4.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
model_4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

with device('/device:GPU:0'):
  model_4.fit(imdb_x_train_padded, imdb_y_train, epochs=2, batch_size=64)

Epoch 1/2
Epoch 2/2


In [52]:
with device('/device:GPU:0'):
  model_4_scores = model_4.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*model_4_scores))

loss: 0.3458023965358734 accuracy: 0.8583199977874756


# Exercise Option #2 - Advanced Difficulty

Set up your own RNN model for the Reuters classification problem.

Take the model from exercise 1 (imdb_lstm_model) and modify it to classify the [Reuters data](https://keras.io/datasets/#reuters-newswire-topics-classification).

Think about what you are trying to predict in this case, and how you will have to change your model to deal with this.

In [118]:
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, TimeDistributed, Flatten, Dropout
from tensorflow.keras.utils import to_categorical
import numpy as np

In [119]:
(reuters_x_train, reuters_y_train), (reuters_x_test, reuters_y_test) = reuters.load_data()

  x_train, y_train = np.array(xs[:idx]), np.array(labels[:idx])
  x_test, y_test = np.array(xs[idx:]), np.array(labels[idx:])


In [120]:
cutoff = 500
reuters_x_train_padded = sequence.pad_sequences(reuters_x_train, maxlen=cutoff)
reuters_x_test_padded = sequence.pad_sequences(reuters_x_test, maxlen=cutoff)

reuters_y_train_categorical = to_categorical(reuters_y_train, num_classes=46)
reuters_y_test_categorical = to_categorical(reuters_y_test, num_classes=46)
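The conversion done by `to_categorical` can be pictured with a small NumPy sketch: each integer topic label becomes a row with a single 1 in the matching column. The helper `one_hot` is hypothetical and only meant to show the shape of the result.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows of length num_classes."""
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0  # one 1 per row
    return encoded

toy_labels = np.array([3, 0, 2])          # made-up topic ids
encoded = one_hot(toy_labels, num_classes=4)
print(encoded)
print(encoded.argmax(axis=1))             # recovers the original labels
```

With the Reuters data there are 46 topics, so each label becomes a row of length 46.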

## Baseline Model

In [89]:
reuters_model = Sequential()
reuters_model.add(Embedding(input_dim=len(reuters.get_word_index())+3,
                              output_dim=100, # Word vector dimensionality.
                              input_length=cutoff))
reuters_model.add(LSTM(units=8))
reuters_model.add(Dense(46, activation='softmax'))
reuters_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
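The change from one sigmoid output to 46 softmax outputs is the key difference from the IMDB model. As a sketch of what softmax and categorical cross-entropy compute, here is a NumPy version on made-up logits for a 3-class problem; the clipping constant `eps` is an assumption for numerical safety.

```python
import numpy as np

def softmax(logits):
    """Convert each row of scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true_one_hot, y_prob, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true_one_hot * np.log(y_prob), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])      # made-up scores for two articles
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])      # one-hot topic labels

probs = softmax(logits)
predicted_topics = probs.argmax(axis=-1)  # most likely topic per article

print(probs.round(3))
print(predicted_topics)
print(categorical_crossentropy(y_true, probs))
```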

In [82]:
with device('/device:GPU:0'):
  reuters_model.fit(reuters_x_train_padded, reuters_y_train_categorical, epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [83]:
with device('/device:GPU:0'):
  reuters_model_scores = reuters_model.evaluate(reuters_x_test_padded, reuters_y_test_categorical)
  print('loss: {} accuracy: {}'.format(*reuters_model_scores))

loss: 1.9068667888641357 accuracy: 0.6001781225204468


## Model 2

In [92]:
rm2 = Sequential()
rm2.add(Embedding(input_dim=len(reuters.get_word_index())+3,
                              output_dim=100, # Word vector dimensionality.
                              input_length=cutoff))
rm2.add(LSTM(units=32, return_sequences=True))
rm2.add(TimeDistributed(Dense(100, activation='tanh')))
rm2.add(Flatten())
rm2.add(Dropout(0.5))
rm2.add(Dense(46, activation='softmax'))
rm2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

rm2.summary()

Model: "sequential_24"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_23 (Embedding)     (None, 500, 100)          3098200   
_________________________________________________________________
lstm_22 (LSTM)               (None, 500, 32)           17024     
_________________________________________________________________
time_distributed_5 (TimeDist (None, 500, 100)          3300      
_________________________________________________________________
flatten_1 (Flatten)          (None, 50000)             0         
_________________________________________________________________
dropout (Dropout)            (None, 50000)             0         
_________________________________________________________________
dense_28 (Dense)             (None, 46)                2300046   
Total params: 5,418,570
Trainable params: 5,418,570
Non-trainable params: 0
___________________________________________
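The parameter counts in this summary can be reproduced by hand, which helps show where the model's size comes from. The sketch below uses the standard counting formulas for Embedding, LSTM and Dense layers; the helper names are made up, and the vocabulary size 30982 is inferred from the summary (3098200 / 100).

```python
def embedding_params(vocab_size, output_dim):
    # one vector of length output_dim per vocabulary entry
    return vocab_size * output_dim

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input-output pair, plus one bias per output
    return input_dim * units + units

counts = {
    "embedding": embedding_params(30982, 100),
    "lstm": lstm_params(100, 32),
    "time-distributed dense": dense_params(32, 100),  # shared across timesteps
    "output dense": dense_params(500 * 100, 46),      # input is the flattened sequence
}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")  # total: 5,418,570
```

Most of the non-embedding parameters sit in the final Dense layer, because flattening 500 timesteps of 100 values gives it 50,000 inputs.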

In [93]:
with device('/device:GPU:0'):
  rm2.fit(reuters_x_train_padded, reuters_y_train_categorical, epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [94]:
with device('/device:GPU:0'):
  print('loss: {} accuracy: {}'.format(*rm2.evaluate(reuters_x_test_padded, reuters_y_test_categorical)))

loss: 1.3717308044433594 accuracy: 0.7284060716629028


## Model 3

In [103]:
rm3 = Sequential()
rm3.add(Embedding(input_dim=len(reuters.get_word_index())+3,
                              output_dim=400, # Increasing word vector dimensionality.
                              input_length=cutoff))
rm3.add(LSTM(units=32, return_sequences=True))
rm3.add(LSTM(units=32, return_sequences=True))
rm3.add(TimeDistributed(Dense(100, activation='tanh')))
rm3.add(Flatten())
rm3.add(Dropout(0.5))
rm3.add(Dense(512, activation='tanh'))
rm3.add(Dropout(0.5))
rm3.add(Dense(46, activation='softmax'))
rm3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

rm3.summary()

Model: "sequential_28"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_27 (Embedding)     (None, 500, 400)          12392800  
_________________________________________________________________
lstm_28 (LSTM)               (None, 500, 32)           55424     
_________________________________________________________________
lstm_29 (LSTM)               (None, 500, 32)           8320      
_________________________________________________________________
time_distributed_9 (TimeDist (None, 500, 100)          3300      
_________________________________________________________________
flatten_5 (Flatten)          (None, 50000)             0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 50000)             0         
_________________________________________________________________
dense_37 (Dense)             (None, 512)             

In [104]:
with device('/device:GPU:0'):
  rm3.fit(reuters_x_train_padded, reuters_y_train_categorical, epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [105]:
with device('/device:GPU:0'):
  print('loss: {} accuracy: {}'.format(*rm3.evaluate(reuters_x_test_padded, reuters_y_test_categorical)))

loss: 1.4736706018447876 accuracy: 0.7422083616256714


## Model 4

In [133]:
from tensorflow.keras.regularizers import l2

In [180]:
rm4 = Sequential()
rm4.add(Embedding(input_dim=len(reuters.get_word_index())+3,
                              output_dim=100, # Word vector dimensionality.
                              input_length=cutoff))
rm4.add(LSTM(units=32, return_sequences=True))
# rm4.add(Dropout(0.5))
# rm4.add(LSTM(units=32, return_sequences=True))
# rm4.add(TimeDistributed(Dense(50, activation='tanh', kernel_regularizer=l2(1e-4), bias_regularizer=l2(1e-4), activity_regularizer=l2(1e-5))))
rm4.add(Flatten())
rm4.add(Dropout(0.5))
rm4.add(Dense(46, activation='softmax', kernel_regularizer=l2(1e-4), bias_regularizer=l2(1e-4)))
rm4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

rm4.summary()

Model: "sequential_53"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_52 (Embedding)     (None, 500, 100)          3098200   
_________________________________________________________________
lstm_66 (LSTM)               (None, 500, 32)           17024     
_________________________________________________________________
flatten_25 (Flatten)         (None, 16000)             0         
_________________________________________________________________
dropout_32 (Dropout)         (None, 16000)             0         
_________________________________________________________________
dense_80 (Dense)             (None, 46)                736046    
Total params: 3,851,270
Trainable params: 3,851,270
Non-trainable params: 0
_________________________________________________________________


In [181]:
with device('/device:GPU:0'):
  rm4.fit(reuters_x_train_padded, reuters_y_train_categorical, epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [182]:
with device('/device:GPU:0'):
  print('loss: {} accuracy: {}'.format(*rm4.evaluate(reuters_x_test_padded, reuters_y_test_categorical)))

loss: 1.3367044925689697 accuracy: 0.7359750866889954
