# LSTM HyperParam Tuning: Stacked biLSTM(100) with variational dropout

In this notebook, I use a Colab TPU to tune the hyperparameters of our LSTM network, based on research into sequence classification with LSTM networks [Reimers, 2017](https://arxiv.org/abs/1707.06799). That research found that word embeddings, 2 stacked bidirectional hidden layers, 64-200 nodes and recurrent dropout made the best structure for a deep LSTM network for sequence classification. Here I implement this architecture and see how it performs on the YelpZIP dataset.

First I import the necessary libraries and connect to the Google Cloud bucket, the TPU and TensorBoard.

In [0]:
import os
import pprint
import tensorflow as tf
import re
import pandas as pd
import numpy as np
import random
import subprocess
import gzip
import string
import sys
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras import optimizers
from tensorflow.keras import layers
import codecs
from tensorboardcolab import TensorBoardColab
import shutil

In [0]:
# Start TensorBoard; the link is printed after a short wait
tbc=TensorBoardColab(startup_waiting_time=10)

if 'COLAB_TPU_ADDR' not in os.environ:
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print('TPU address is', tpu_address)

  with tf.Session(tpu_address) as session:
    devices = session.list_devices()
    
  print('TPU devices:')
  pprint.pprint(devices)

OUTPUT_DIR = 'lstm-finetuning'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = False #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = True #@param {type:"boolean"}
BUCKET = 'lucas0' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE:
  try:
    tf.gfile.DeleteRecursively(OUTPUT_DIR)
  except tf.errors.NotFoundError:
    # Doesn't matter if the directory didn't exist
    pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))
# exist_ok avoids a FileExistsError when the cell is re-run
os.makedirs('./Graph', exist_ok=True)


Wait for 10 seconds...
TensorBoard link:
https://4348060c.ngrok.io
TPU address is grpc://10.119.99.34:8470
TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 11798704124795778594),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 6325436278654816791),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 17327105947192632281),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 8193999110742063840),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 13551189176767611107),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4452687537149654827),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 10157693069563783220),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 1122971557887119691),
 _DeviceAttributes(/job:tpu_w

We then download the YelpZIP dataset from our Google Cloud Bucket, and parse it into a dictionary.

In [0]:
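def download_and_load_dataset(force_download=False):
  # get_file caches the download locally and returns the local path
  # (force_download is currently unused)
  dataset = tf.keras.utils.get_file(
      fname="yelpZIP.txt", 
      origin="https://storage.googleapis.com/lucas0/yelpZIP.txt", 
      extract=False)
  # Close the file handle once the lines are read
  with open(dataset) as f:
    return f.readlines()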

reviews = download_and_load_dataset()

data = {}
data['review'] = []
data['deceptive'] = []

import ast

for x in reviews:
  # literal_eval parses Python literals only, so it cannot run arbitrary code
  x = ast.literal_eval(x)
  data['review'].append(x[0])
  data['deceptive'].append(1 if x[1] else 0)

dataDict = pd.DataFrame.from_dict(data)

I then fit a Tokenizer on the dataset and pad the sequences to be the same length.

In [0]:
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(data['review'])

In [0]:
corpus_words = tokenizer.word_index
corpus_vocab_size = len(corpus_words)+1

In [0]:
# Import from tensorflow.keras to match the Tokenizer above
from tensorflow.keras.preprocessing.sequence import pad_sequences

predictors_sequences = pad_sequences(tokenizer.texts_to_sequences(data['review']))
max_len = max([len(x) for x in predictors_sequences])
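
To make the tokenize-and-pad step concrete, here is a minimal pure-Python sketch of the idea. It is a simplified stand-in, not the Keras implementation: `fit_word_index` and `texts_to_padded` are hypothetical helpers, the toy sentences are made up, and punctuation filtering is left out. It does keep two conventions that the Keras defaults follow: word ids start at 1 (0 is reserved for padding) and shorter sequences are padded on the left.

```python
from collections import Counter

def fit_word_index(texts):
    """Map each word to an integer id, most frequent first, starting at 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Id 0 is reserved for padding, so real words start at 1.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_padded(texts, word_index):
    """Encode texts as id lists and left-pad with 0 to the longest length."""
    seqs = [[word_index[w] for w in t.lower().split() if w in word_index]
            for t in texts]
    max_len = max(len(s) for s in seqs)
    return [[0] * (max_len - len(s)) + s for s in seqs]

texts = ["the food was great", "great service", "the the the"]
word_index = fit_word_index(texts)
padded = texts_to_padded(texts, word_index)
vocab_size = len(word_index) + 1  # +1 for the padding id 0
```

Here `padded` is `[[1, 3, 4, 2], [0, 0, 2, 5], [0, 1, 1, 1]]`, and `vocab_size` is 6, which is why the notebook adds 1 to `len(corpus_words)`.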

Then I download Google's pretrained word2vec vectors from our bucket and create the embedding matrix.

In [0]:
import gensim
tf.keras.utils.get_file(
      fname="GoogleNews-vectors-negative300.bin", 
      origin="https://storage.googleapis.com/lucas0/GoogleNews-vectors-negative300.bin", 
      extract=False)

'/root/.keras/datasets/GoogleNews-vectors-negative300.bin'

In [0]:
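# Expand "~" explicitly so the path resolves regardless of how the file is opened
word_vectors = gensim.models.KeyedVectors.load_word2vec_format(os.path.expanduser("~/.keras/datasets/GoogleNews-vectors-negative300.bin"), binary=True)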

In [0]:
embedding_length = word_vectors.vector_size
embedding_matrix = np.zeros((corpus_vocab_size, embedding_length))
for word, index in corpus_words.items():
  # Words missing from word2vec keep their all-zero row
  if word in word_vectors:
    embedding_matrix[index] = np.array(word_vectors[word], dtype=np.float32)
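
Since any word missing from word2vec keeps an all-zero row, it can be worth checking how much of the vocabulary is actually covered. The sketch below is one way to do that; `embedding_coverage` is a hypothetical helper and the toy vocabulary is made up, so the numbers say nothing about the YelpZIP corpus.

```python
def embedding_coverage(word_index, pretrained_words):
    """Return the covered fraction of the vocabulary and the missing words."""
    missing = [w for w in word_index if w not in pretrained_words]
    covered = 1.0 - len(missing) / len(word_index)
    return covered, missing

# Toy stand-ins: `toy_index` plays the role of corpus_words and
# `toy_pretrained` the role of the word2vec vocabulary.
toy_index = {"the": 1, "food": 2, "yummm": 3, "great": 4}
toy_pretrained = {"the", "food", "great"}
coverage, missing = embedding_coverage(toy_index, toy_pretrained)
```

With the real objects, the call would presumably be `embedding_coverage(corpus_words, word_vectors)`, since the loaded vectors support membership tests with `in`.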

Now I create the first model according to the recommended network architecture. Variational dropout is implemented in Keras through the recurrent_dropout parameter of the LSTM layer. When using a TPU, a cross-shard optimizer must be used to split the training over the 8 TPU cores. The Keras TPU contrib module must then be used to convert the Keras model to a TF TPU model.
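
The idea behind variational dropout, as I understand it, is that one dropout mask is sampled per sequence and reused at every time step, instead of drawing a fresh mask each step. The sketch below only illustrates that mask sharing on a list of hidden states. It is not the Keras implementation: in a real LSTM the mask is applied to the recurrent input inside the cell, and `sample_mask` and `apply_recurrent_dropout` are hypothetical helpers.

```python
import random

def sample_mask(size, rate, rng):
    """Bernoulli keep-mask, scaled so the expected activation is unchanged."""
    keep = 1.0 - rate
    return [(1.0 / keep if rng.random() < keep else 0.0) for _ in range(size)]

def apply_recurrent_dropout(hidden_states, rate, rng, variational=True):
    """Drop units of each time step's hidden state.

    variational=True reuses one mask for the whole sequence;
    variational=False draws a fresh mask at every time step.
    """
    size = len(hidden_states[0])
    shared = sample_mask(size, rate, rng)
    out = []
    for h in hidden_states:
        mask = shared if variational else sample_mask(size, rate, rng)
        out.append([m * v for m, v in zip(mask, h)])
    return out

rng = random.Random(0)
states = [[1.0] * 6 for _ in range(4)]   # 4 time steps, 6 hidden units
dropped = apply_recurrent_dropout(states, rate=0.2, rng=rng)
```

With `variational=True`, every row of `dropped` is identical, because the same units are dropped at each time step.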

In [0]:
def get_lstm_wv_model(load_checkpoint=False):
  model = Sequential([
        Embedding(corpus_vocab_size, embedding_length, weights=[embedding_matrix], input_length=max_len, trainable=False),
        keras.layers.Bidirectional(LSTM(100, recurrent_dropout=0.2, return_sequences=True)),
        keras.layers.Bidirectional(LSTM(100, recurrent_dropout=0.2)),
        Dense(1, activation='sigmoid')
  ])
  # Save a diagram of the model and display it in the notebook
  from tensorflow.keras.utils import plot_model
  plot_model(model, to_file='model.png', show_shapes=True)
  from IPython.display import Image, display
  display(Image(filename='model.png'))
  if load_checkpoint:
    model.load_weights('./Mar-24-all-01-0.6957.hdf5')
  train_op = tf.train.AdamOptimizer()
  optimizer = tf.contrib.tpu.CrossShardOptimizer(train_op)
  model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
  return tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu_address)))
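
To illustrate what splitting training over 8 cores means, here is a toy sketch of cross-shard gradient averaging. It assumes a one-weight linear model with a mean-squared-error loss, which is only a stand-in for the LSTM; `shard`, `grad_mse` and `cross_shard_grad` are hypothetical helpers, not TensorFlow APIs. Each shard computes a gradient on its slice of the batch, and the averaged result matches the full-batch gradient when the shards are the same size.

```python
def shard(batch, num_shards):
    """Split a batch into equally sized contiguous shards."""
    size, rem = divmod(len(batch), num_shards)
    if rem:
        raise ValueError("batch size must be divisible by the number of shards")
    return [batch[i * size:(i + 1) * size] for i in range(num_shards)]

def grad_mse(w, examples):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in examples) / len(examples)

def cross_shard_grad(w, batch, num_shards=8):
    """Average the per-shard gradients before a single weight update."""
    grads = [grad_mse(w, s) for s in shard(batch, num_shards)]
    return sum(grads) / num_shards

batch = [(float(i), 3.0 * i) for i in range(64)]   # toy data with true w = 3
g_sharded = cross_shard_grad(0.0, batch)
g_full = grad_mse(0.0, batch)
```

With a batch size of 64, each of the 8 shards in this sketch holds 8 examples.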

I then define the cross-validation function that Stefan wrote, which also creates the checkpoint and early stopping callbacks.
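
As a rough picture of how checkpointing and early stopping interact, the sketch below tracks the best validation accuracy and stops once `patience` epochs pass without a new best. It is a simplified model of the behaviour, not the Keras callbacks themselves; `early_stopping_epoch` is a hypothetical helper and the accuracy values are made up.

```python
def early_stopping_epoch(val_accs, patience):
    """Return (stop_epoch, best_epoch), 1-indexed, where higher is better.

    Training stops once `patience` consecutive epochs pass without a new best.
    """
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            # A new best: this is where a checkpoint would be saved.
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_accs), best_epoch

# Hypothetical validation accuracies, not results from this notebook.
history = [0.60, 0.65, 0.66, 0.64, 0.65, 0.66, 0.63]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
```

Under this logic, a patience of 8 cannot stop a run before its ninth epoch.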

In [0]:
from sklearn.model_selection import train_test_split, StratifiedKFold
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import TensorBoard


def run_cross_validate(get_model, X, y, cv=5, categorical=False,
                       add_target_dim=False, verbose=1, epochs=12, batch_size = 32, validation_split=0.3, shuffle=True):
  
  skfSplitter = StratifiedKFold(n_splits=cv, shuffle=shuffle)
  
  metrics = {
    "accuracies": [],
  }
    
  for train_indices, test_indices in skfSplitter.split(X, y):
    training_X = np.array([X[x] for x in train_indices])
    training_y = np.array([y[x] for x in train_indices])
    test_X = np.array([X[x] for x in test_indices])
    test_y = np.array([y[x] for x in test_indices])
    
    # Reshape the labels before splitting, so the training and validation
    # labels get the same treatment as the test labels
    if categorical:
      training_y = to_categorical(training_y)
      test_y = to_categorical(test_y)
    if add_target_dim:
      training_y = np.array([[label] for label in training_y])
      test_y = np.array([[label] for label in test_y])
    
    # Hold out validation_split of the training fold for validation
    x_train, x_valid, y_train, y_valid = train_test_split(
        training_X, training_y, test_size=validation_split, shuffle=True)
    
    model = get_model()
    print("Fitting with: ", np.array(x_train).shape, "labels",
          np.array(y_train).shape)
    filepath="Mar-24-all-{epoch:02d}-{val_acc:.4f}.hdf5"
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    early_stopping = EarlyStopping(monitor='val_acc', patience=8)
    # Log the current graph to TensorBoard without opening an extra session
    train_writer = tbc.get_writer()
    train_writer.add_graph(tf.get_default_graph())
    
    model.fit(np.array(x_train), y_train, epochs=epochs, batch_size=batch_size,
              validation_data=(x_valid, y_valid), verbose=verbose,
              callbacks=[checkpoint, early_stopping], shuffle=shuffle)
    metrics["accuracies"].append(model.evaluate(np.array(test_X), test_y)[1])
  return metrics
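
The stratified split is what keeps the proportion of deceptive reviews roughly equal in every fold. The sketch below shows the idea in pure Python. It is a simplified stand-in for scikit-learn's StratifiedKFold, not its actual algorithm; `stratified_folds` is a hypothetical helper and the labels are toy data.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class balance."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train_idx, test_idx

labels = [1] * 10 + [0] * 40          # toy labels, 20% positive
splits = list(stratified_folds(labels, n_splits=5))
```

Each of the 5 test folds here holds 10 examples, 2 of them positive, matching the 20% positive rate of the toy labels.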

Now we train the model. I use a batch size of 64, 20 epochs and 5 folds.

Note: The experiment did not complete within the 12 hours allotted on Colab, so I loaded the checkpointed model and restarted training for another 2 folds. The validation accuracy was 0.6665 ± 0.035, with a validation loss of 0.62.
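
A summary of this form can be computed from the per-fold scores as sketched below, assuming the ± denotes a standard deviation across folds (the note does not say which spread measure was used). `summarize` is a hypothetical helper, and the accuracies are made-up values, not the ones from this run.

```python
import statistics

def summarize(accuracies):
    """Return the mean and sample standard deviation of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Hypothetical per-fold accuracies, not the values from this run.
fold_accuracies = [0.62, 0.66, 0.70, 0.68, 0.64]
mean_acc, std_acc = summarize(fold_accuracies)
print("accuracy: {:.4f} +- {:.4f}".format(mean_acc, std_acc))
```

The same function could be applied to the `accuracies` list in the dictionary returned by `run_cross_validate`.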

In [0]:
# from tensorflow.contrib.tpu.python.tpu import tpu_function
# tpu_function.get_tpu_context().set_number_of_shards(8)

rnn_wv_scores = run_cross_validate(get_lstm_wv_model, predictors_sequences, data['deceptive'], cv=5, verbose=1, epochs=20, batch_size=64, shuffle=True)
print(rnn_wv_scores)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
INFO:tensorflow:Querying Tensorflow master (grpc://10.14.208.42:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 1403723184091569382)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 10463534745533730844)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 17881866803021330038)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 3534345446672985863)
INFO:te

I was surprised by how badly this LSTM performed. Without user features, our LSTM accuracy seems to be capped at around 70%. I decided to try the same architecture with only one hidden layer (64 units in the code below), in case the stacked network was too deep to give a worthwhile result.

In [0]:
def get_lstm_wv_model(load_checkpoint=False):
  model = Sequential([
        Embedding(corpus_vocab_size, embedding_length, weights=[embedding_matrix], input_length=max_len, trainable=False),
        keras.layers.Bidirectional(LSTM(64, recurrent_dropout=0.2)),
        Dense(1, activation='sigmoid')
  ])
  # Save a diagram of the model and display it in the notebook
  from tensorflow.keras.utils import plot_model
  plot_model(model, to_file='model.png', show_shapes=True)
  from IPython.display import Image, display
  display(Image(filename='model.png'))
  if load_checkpoint:
    model.load_weights('./Mar-24-all-01-0.6957.hdf5')
  train_op = tf.train.AdamOptimizer()
  optimizer = tf.contrib.tpu.CrossShardOptimizer(train_op)
  model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
  return tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu_address)))
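
To get a feel for how much smaller the single-layer model is, the trainable weights of both networks can be counted by hand. The sketch below uses the standard LSTM parameter formula (4 gates, each with input weights, recurrent weights and a bias) and assumes the two directions are concatenated and the embedding layer is frozen, as in the model definitions. `lstm_params` and `bilstm_stack_params` are hypothetical helpers.

```python
def lstm_params(input_dim, units):
    """Weights of one LSTM direction: 4 gates with input, recurrent and bias terms."""
    return 4 * (units * (input_dim + units) + units)

def bilstm_stack_params(input_dim, layer_units):
    """Trainable weights of stacked bidirectional LSTMs followed by Dense(1)."""
    total = 0
    for units in layer_units:
        total += 2 * lstm_params(input_dim, units)   # forward + backward
        input_dim = 2 * units                        # directions are concatenated
    return total + input_dim + 1                     # Dense(1): weights + bias

EMBEDDING_DIM = 300   # the GoogleNews word2vec vectors are 300-dimensional
stacked = bilstm_stack_params(EMBEDDING_DIM, [100, 100])
single = bilstm_stack_params(EMBEDDING_DIM, [64])
```

By this count the stacked biLSTM(100) model has 561,801 trainable weights and the single biLSTM(64) model has 187,009, roughly a third as many.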

In [0]:
rnn_wv_scores = run_cross_validate(get_lstm_wv_model, predictors_sequences, data['deceptive'], cv=5, verbose=1, epochs=40, batch_size=32, shuffle=True)
print(rnn_wv_scores)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:Querying Tensorflow master (grpc://10.119.99.34:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 11798704124795778594)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 6325436278654816791)
INFO:tensorflow:*** Available Device: _DeviceAtt

This time a different error appears to have occurred, because the file was changed. However, the trend during training showed that this network would not surpass 70% validation accuracy either. More work is needed, such as concatenating user features.
