## Documentation/Sources
* [Class Notes](https://jennselby.github.io/MachineLearningCourseNotes/#recurrent-neural-networks)
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

In [None]:
# upgrade tensorflow to tensorflow 2
%tensorflow_version 2.x
# display matplotlib plots
%matplotlib inline
from tensorflow import test
from tensorflow import device

# IMDB Dataset
The [IMDB dataset](https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification) consists of movie reviews (x_train) that have been marked as positive or negative (y_train). See the [Word Vectors Tutorial](https://github.com/jennselby/MachineLearningTutorials/blob/master/WordVectors.ipynb) for more details on the IMDB dataset.

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

In [None]:
(imdb_x_train, imdb_y_train), (imdb_x_test, imdb_y_test) = imdb.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz




For a standard Keras model, every input has to be the same length, so we need to choose a length at which to cut off the rest of the review. (We will also need to pad the shorter reviews with zeros to make them the same length.)
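
To make the truncation and padding concrete, here is a minimal pure-Python sketch of what happens to a single review. It assumes the Keras defaults for `pad_sequences` (`padding='pre'` and `truncating='pre'`); `pad_or_truncate` is a hypothetical helper name used only for illustration, not part of Keras.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence (illustrative helper)."""
    # Keep only the last `maxlen` items, dropping the start of long sequences.
    trimmed = list(seq)[-maxlen:]
    # Left-pad short sequences with `value` so every result has length `maxlen`.
    return [value] * (maxlen - len(trimmed)) + trimmed


short_review = [14, 22, 16]
long_review = [1, 2, 3, 4, 5, 6]
padded_short = pad_or_truncate(short_review, maxlen=5)
padded_long = pad_or_truncate(long_review, maxlen=4)
print(padded_short)  # [0, 0, 14, 22, 16]
print(padded_long)   # [3, 4, 5, 6]
```

With these defaults, a long review keeps its ending rather than its beginning.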

In [None]:
cutoff = 500
imdb_x_train_padded = sequence.pad_sequences(imdb_x_train, maxlen=cutoff)
imdb_x_test_padded = sequence.pad_sequences(imdb_x_test, maxlen=cutoff)

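# Word indices in the dataset are shifted by 3 to make room for reserved indices (padding, start, unknown);
# see https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset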
imdb_index_offset = 3

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Define model

Unlike last time, when we used convolutional layers, we're going to use an LSTM, a special type of recurrent network.

Using recurrent networks means that rather than seeing these reviews as one input happening all at once, with the convolutional layers taking into account which words are next to each other, we are going to see them as a sequence of inputs, with one word occurring at each timestep.
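
To illustrate the "one word per timestep" idea, here is a minimal sketch of a plain recurrent cell with a single unit. It is deliberately not an LSTM (an LSTM adds gates and a cell state), and the weights `w_in` and `w_rec` are arbitrary values chosen for illustration. The `return_sequences` argument is modeled on the Keras flag of the same name.

```python
import math


def simple_rnn(inputs, w_in=0.5, w_rec=0.8, return_sequences=False):
    """Run a one-unit plain recurrent cell over a list of scalar inputs."""
    hidden = 0.0  # state carried from one timestep to the next
    outputs = []
    for x_t in inputs:  # one input per timestep, in order
        hidden = math.tanh(w_in * x_t + w_rec * hidden)
        outputs.append(hidden)
    # Mirror the idea behind the Keras return_sequences flag:
    # every timestep's output, or only the last one.
    return outputs if return_sequences else outputs[-1]


sequence_a = [1.0, 0.0, 0.0]
sequence_b = [0.0, 0.0, 1.0]  # same values, different order
all_outputs = simple_rnn(sequence_a, return_sequences=True)
last_a = simple_rnn(sequence_a)
last_b = simple_rnn(sequence_b)
print(len(all_outputs))  # 3, one output per timestep
print(last_a != last_b)  # True: the order of the inputs matters
```

Because the hidden state is fed back in at each step, the final output depends on the order of the inputs, not just on which inputs appeared.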

In [None]:
imdb_lstm_model = Sequential()
imdb_lstm_model.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
# return_sequences=True tells the LSTM to output the full sequence, for use by the next LSTM layer. The final
# LSTM layer should return only its last output, for use in the Dense output layer
imdb_lstm_model.add(LSTM(units=32, return_sequences=True))
imdb_lstm_model.add(LSTM(units=32))
imdb_lstm_model.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
imdb_lstm_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


# Train model

In [None]:
# Train using GPU acceleration
# (see https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=Y04m-jvKRDsJ)
device_name = test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

with device('/device:GPU:0'):
  imdb_lstm_model.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)



# Assess model

In [None]:
with device('/device:GPU:0'):
  imdb_lstm_scores = imdb_lstm_model.evaluate(imdb_x_test_padded, imdb_y_test)
  print('loss: {} accuracy: {}'.format(*imdb_lstm_scores))

loss: 0.3437422215938568 accuracy: 0.8619199991226196


# Exercise Option #1 - Standard Difficulty

Experiment with model configurations different from the one above: try other recurrent layers, different numbers of layers, or different values for some of the defaults. See [Keras Recurrent Layers](https://keras.io/layers/recurrent/).

__Keep notes on what you try and what results you get.__
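
One possible way to keep such notes in code rather than by hand is sketched below. `record_result` and `experiment_log` are hypothetical names, and the numbers in the example call are placeholders, not results from this notebook.

```python
def record_result(log, description, loss, accuracy, baseline_accuracy):
    """Append one experiment to `log` and return its accuracy change."""
    delta = accuracy - baseline_accuracy
    log.append({'description': description, 'loss': loss,
                'accuracy': accuracy, 'delta': delta})
    return delta


experiment_log = []
# Placeholder numbers for illustration only, not results from this notebook.
delta = record_result(experiment_log, 'example change', loss=0.40,
                      accuracy=0.85, baseline_accuracy=0.80)
print('{}: {:+.4f}'.format(experiment_log[0]['description'], delta))
```

Keeping the baseline accuracy as an explicit argument makes it clear which model each change is being compared against.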

In [None]:
# Helper functions

def train_model(model=imdb_lstm_model, x_train=imdb_x_train_padded, y_train=imdb_y_train, epc=1, batch=64):
  device_name = test.gpu_device_name()
  if device_name != '/device:GPU:0':
    print(
        '\n\nThis error most likely means that this notebook is not '
        'configured to use a GPU.  Change this in Notebook Settings via the '
        'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
    raise SystemError('GPU device not found')

  with device('/device:GPU:0'):
    model.fit(x_train, y_train, epochs=epc, batch_size=batch)

def assess_model(model=imdb_lstm_model, x_test=imdb_x_test_padded, y_test=imdb_y_test):
  scores = model.evaluate(x_test, y_test)
  print('loss: {} accuracy: {}'.format(*scores))

def get_basic_imdb_lstm_model():
  model = Sequential()
  model.add(Embedding(input_dim=len(imdb.get_word_index()) + imdb_index_offset,
                              output_dim=100,
                              input_length=cutoff))
  return model

In [None]:
imdb_lstm_tuning_1 = get_basic_imdb_lstm_model()

imdb_lstm_tuning_1.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_1.add(LSTM(units=32))

imdb_lstm_tuning_1.add(Dense(units=16, activation='sigmoid')) 
imdb_lstm_tuning_1.add(Dense(units=1, activation='sigmoid'))

imdb_lstm_tuning_1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [None]:
train_model(model=imdb_lstm_tuning_1)
assess_model(model=imdb_lstm_tuning_1)

# Added a dense layer with 16 units, 
# accuracy decreased marginally by 0.0008.

loss: 0.35350269079208374 accuracy: 0.8611199855804443


In [None]:
imdb_lstm_tuning_2 = get_basic_imdb_lstm_model()

imdb_lstm_tuning_2.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_2.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_2.add(LSTM(units=32))

imdb_lstm_tuning_2.add(Dense(units=1, activation='sigmoid'))

imdb_lstm_tuning_2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [None]:
train_model(model=imdb_lstm_tuning_2)
assess_model(model=imdb_lstm_tuning_2)

# Removed the additional dense layer,
# and added an additional LSTM layer with 32 units, 
# accuracy increased by 0.0044 (from the original).

loss: 0.3301428258419037 accuracy: 0.8663600087165833


In [None]:
imdb_lstm_tuning_3 = get_basic_imdb_lstm_model()

imdb_lstm_tuning_3.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_3.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_3.add(LSTM(units=32))

imdb_lstm_tuning_3.add(Dense(units=1, activation='relu'))

imdb_lstm_tuning_3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [None]:
train_model(model=imdb_lstm_tuning_3)
assess_model(model=imdb_lstm_tuning_3)

# Changed the activation function of the dense layer from sigmoid to relu, 
# accuracy decreased by 0.0910 (from the original).

loss: 0.5865328311920166 accuracy: 0.7708799839019775
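
One plausible reason for the drop (an interpretation, not something verified in this notebook): binary cross-entropy expects a probability between 0 and 1, which sigmoid guarantees and relu does not. The sketch below computes the loss for a single example by hand. The clipping constant `eps` and the value of `pre_activation` are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one example; y_pred is clipped so log() stays finite."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


pre_activation = -2.0  # hypothetical Dense output before the activation
loss_sigmoid = binary_crossentropy(1, sigmoid(pre_activation))
loss_relu = binary_crossentropy(1, relu(pre_activation))
print(loss_sigmoid)  # about 2.13
print(loss_relu)     # about 16.12, because relu outputs exactly 0
```

relu also has zero gradient for negative inputs, so an output unit that lands there receives no signal to recover.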


In [None]:
imdb_lstm_tuning_4 = get_basic_imdb_lstm_model()

imdb_lstm_tuning_4.add(LSTM(units=64, return_sequences=True))
imdb_lstm_tuning_4.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_4.add(LSTM(units=32))

imdb_lstm_tuning_4.add(Dense(units=1, activation='sigmoid'))

imdb_lstm_tuning_4.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [None]:
train_model(model=imdb_lstm_tuning_4)
assess_model(model=imdb_lstm_tuning_4)

# Changed the activation function of the dense layer back to sigmoid, 
# and changed the units of the first LSTM layer from 32 --> 64,
# accuracy increased by 0.0112 (from the original).
# This was the best performing tuning.

loss: 0.31815269589424133 accuracy: 0.8731600046157837


In [None]:
imdb_lstm_tuning_5 = get_basic_imdb_lstm_model()

imdb_lstm_tuning_5.add(LSTM(units=64, return_sequences=True))
imdb_lstm_tuning_5.add(LSTM(units=64, return_sequences=True))
imdb_lstm_tuning_5.add(LSTM(units=32))

imdb_lstm_tuning_5.add(Dense(units=1, activation='sigmoid'))

imdb_lstm_tuning_5.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [None]:
train_model(model=imdb_lstm_tuning_5)
assess_model(model=imdb_lstm_tuning_5)

# Changed the units of the second LSTM layer from 32 --> 64,
# accuracy decreased by 0.3619 (from the original).
# This was the worst performing tuning.

loss: 0.6926045417785645 accuracy: 0.5
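
A loss this close to ln(2) together with an accuracy of exactly 0.5 is what a model would score if it predicted a probability near 0.5 for every review. This suggests the run did not learn anything in its single epoch, which may be bad luck in training rather than a property of the architecture (an interpretation; the run was not repeated here). The check below is a small sanity calculation:

```python
import math

# Binary cross-entropy when the predicted probability is 0.5,
# which is the same value whether the true label is 0 or 1.
chance_loss = -math.log(0.5)
reported_loss = 0.6926045417785645  # loss printed above for this tuning
print(chance_loss)                       # about 0.6931
print(abs(chance_loss - reported_loss))  # under 0.001
```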


In [None]:
imdb_lstm_tuning_6 = get_basic_imdb_lstm_model()

imdb_lstm_tuning_6.add(LSTM(units=64, return_sequences=True))
imdb_lstm_tuning_6.add(LSTM(units=32, return_sequences=True))
imdb_lstm_tuning_6.add(LSTM(units=32))

imdb_lstm_tuning_6.add(Dense(units=16, activation='relu'))
imdb_lstm_tuning_6.add(Dense(units=1, activation='sigmoid'))

imdb_lstm_tuning_6.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [None]:
train_model(model=imdb_lstm_tuning_6)
assess_model(model=imdb_lstm_tuning_6)

# Changed the units of the second LSTM layer back to 32,
# and added a dense layer with 16 units,
# accuracy increased by 0.0074 (from the original).

loss: 0.32388657331466675 accuracy: 0.8693600296974182


# Exercise Option #2 - Advanced Difficulty

Set up your own RNN model for the Reuters classification problem.

Take the model from exercise 1 (imdb_lstm_model) and modify it to classify the [Reuters data](https://keras.io/datasets/#reuters-newswire-topics-classification).

Think about what you are trying to predict in this case, and how you will have to change your model to deal with this.
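
As a hint at the shape of the change (a sketch of the general technique, not the only possible answer): predicting one of several topics usually means one output per class, a softmax that turns those outputs into probabilities, and a one-hot target. The helper names below are hypothetical, and the four-class example is made up for illustration.

```python
import math


def one_hot(label, num_classes):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


target = one_hot(2, num_classes=4)
probs = softmax([0.1, 0.2, 2.0, -1.0])
predicted_class = probs.index(max(probs))
loss = categorical_crossentropy(target, probs)
print(target)           # [0.0, 0.0, 1.0, 0.0]
print(predicted_class)  # 2
```

Only the probability assigned to the true class contributes to the loss, since every other entry of the one-hot target is zero.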

In [None]:
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Flatten, Dropout

In [None]:
(reuters_x_train, reuters_y_train), (reuters_x_test, reuters_y_test) = reuters.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz




In [None]:
# I have already worked with the Reuters dataset in my word vectors tuning notebook, so much of this code follows that notebook.

# determine # of classes

reuters_num_classes = max(reuters_y_train) + 1 # because it starts at 0
print('# of Classes: {}'.format(reuters_num_classes))

# of Classes: 46


In [None]:
# Use the same index offset as the IMDB dataset, since both datasets come from Keras.

reuters_offset = 3
reuters_map = dict((index + reuters_offset, word) for (word, index) in reuters.get_word_index().items())
reuters_map[0] = 'PADDING'
reuters_map[1] = 'START'
reuters_map[2] = 'UNKNOWN'

In [None]:
# Making sure we got the offset right.

' '.join([reuters_map[word_index] for word_index in reuters_x_train[1]])

"START generale de banque sa lt genb br and lt heller overseas corp of chicago have each taken 50 pct stakes in factoring company sa belgo factors generale de banque said in a statement it gave no financial details of the transaction sa belgo factors' turnover in 1986 was 17 5 billion belgian francs reuter 3"

In [None]:
# Examine the range of document lengths.

lengths = [len(doc) for doc in list(reuters_x_train) + list(reuters_x_test)]
print('Longest document: {} Shortest document: {}'.format(max(lengths), min(lengths)))

Longest document: 2376 Shortest document: 2


In [None]:
# After trying several cutoffs: a cutoff of 150 truncates about 30% of the documents.

reuters_cutoff = 150
print('{} documents out of {} are over {}.'.format(
    sum([1 for length in lengths if length > reuters_cutoff]), 
    len(lengths), 
    reuters_cutoff))

3437 documents out of 11228 are over 150.


In [None]:
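# Import from tensorflow.keras, matching the rest of the notebook
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.text import Tokenizer

# https://towardsdatascience.com/text-classification-in-keras-part-1-a-simple-reuters-news-classifier-9558d34d01d3
# Note: sequences_to_matrix(mode='binary') builds one bag-of-words row per document, with
# reuters_cutoff columns marking which word indices below reuters_cutoff occur; word order is not kept.
tokenizer = Tokenizer(num_words=reuters_cutoff)
reuters_x_train_reformated = tokenizer.sequences_to_matrix(reuters_x_train, mode='binary')
reuters_x_test_reformated = tokenizer.sequences_to_matrix(reuters_x_test, mode='binary')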

# necessary to train the model (one hot encoding)
reuters_y_train_reformatted = to_categorical(reuters_y_train, reuters_num_classes)
reuters_y_test_reformatted = to_categorical(reuters_y_test, reuters_num_classes)

In [120]:
reuters_lstm_model = Sequential()

reuters_lstm_model.add(Embedding(input_dim=len(reuters_map), 
                                output_dim=100, 
                                input_length=reuters_cutoff))

reuters_lstm_model.add(LSTM(units=64, return_sequences=True))
reuters_lstm_model.add(LSTM(units=64, return_sequences=True))
reuters_lstm_model.add(LSTM(units=64))

reuters_lstm_model.add(Dense(512))
reuters_lstm_model.add(Dropout(0.66))
reuters_lstm_model.add(Dense(units=reuters_num_classes, activation='softmax'))

reuters_lstm_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

In [122]:
train_model(model=reuters_lstm_model, x_train=reuters_x_train_reformated, y_train=reuters_y_train_reformatted, epc=10, batch=64)
assess_model(model=reuters_lstm_model, x_test=reuters_x_test_reformated, y_test=reuters_y_test_reformatted)

# After a lot of tuning, this is the highest accuracy I was able to achieve.
# In a previous notebook, I was able to get 0.70 with both a regular NN and a CNN, so by comparison this is not very good.
# But without tuning, I was getting an accuracy of 0.36, so I am happy.

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
loss: 1.9593857526779175 accuracy: 0.5178094506263733
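
One possible contributor to the gap with the earlier models (a hypothesis, not tested in this notebook): `sequences_to_matrix(..., mode='binary')` produces a bag-of-words row per document, so the `Embedding` and `LSTM` layers receive a row of 0s and 1s rather than word indices in reading order. The pure-Python sketch below contrasts the two encodings on a toy document. `binary_bag` and `pad_pre` are hypothetical helpers written to mimic the Keras behavior, not Keras functions.

```python
def binary_bag(seq, num_words):
    """Mark which word indices below num_words occur (order is lost)."""
    row = [0] * num_words
    for index in seq:
        if index < num_words:  # indices at or above num_words are ignored
            row[index] = 1
    return row


def pad_pre(seq, maxlen):
    """Keep the last maxlen indices and left-pad with zeros (order is kept)."""
    trimmed = list(seq)[-maxlen:]
    return [0] * (maxlen - len(trimmed)) + trimmed


doc_a = [1, 5, 3, 7]
doc_b = [1, 7, 3, 5]  # same words, different order
bag_a, bag_b = binary_bag(doc_a, 8), binary_bag(doc_b, 8)
print(bag_a == bag_b)                          # True
print(pad_pre(doc_a, 6))                       # [0, 0, 1, 5, 3, 7]
print(pad_pre(doc_a, 6) == pad_pre(doc_b, 6))  # False
```

Feeding padded index sequences, as the IMDB section does with `sequence.pad_sequences`, would let the recurrent layers see word order. Whether that improves accuracy on Reuters would need to be tested.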
