## Module 2.3: Working with LSTMs in Keras (A Review)

We turn to implementing a type of recurrent neural network known as an LSTM using the Keras functional API. In this module we will pay attention to:

1. Using the Keras functional API to define models.
2. Mounting your Google Drive in your Colab environment for file access.
3. Generating synthetic data from an LSTM and a seed sequence.

Those students who are comfortable with all these matters might consider skipping ahead.

Note that we will not spend time tuning hyper-parameters: The purpose is to show how different techniques can be implemented in Keras, not to solve particular data science problems as optimally as possible. Obviously, most techniques include hyper-parameters that need to be tuned for optimal performance.

First we import required libraries.

In [1]:
import sys
import numpy

# from google.colab import drive

from keras.models import Sequential
from keras import Model
from keras.optimizers import Adadelta
from keras.layers import Dense,Dropout,LSTM,Input
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils



We will have a little fun and try to teach a neural network to write like Lewis Carroll, the author of Alice in Wonderland.

Note, though, that the same technique can be used to model any sequential system, and to generate simulations of such a system from seeds. Here the sequence is the characters written by Carroll in Alice in Wonderland, but it could be, for example, an industrial system that evolves in time. In that case, when we generate simulations of the system based on current and recent conditions, we simulate the expected evolution of the system - something of great value!

We will use the [Project Gutenberg text file of Alice in Wonderland](https://www.gutenberg.org/files/11/11.txt). But we need to get the file into our Colab environment, and this takes some work.

First, you need to place the file in your Google Drive. We will assume that you place it in a folder called "Mastering Keras Datasets", and that you rename it "Alice.txt". If you don't, you will need to change the file path used in the code.

Once you have done that, you will need to mount your Google Drive in Colab. Run the following code and complete the required authorizations.

Note that you will need to mount your drive every time you use code from this tutorial.

In [3]:
# Note: You will need to mount your drive every time you 
# run code in this tutorial.
# drive.mount('/content/drive')

Now we can load the file using code and prepare the data. We want to work with sequences of 100 characters as input data, and our target will be the next (101st) character.

To keep things simple, we will ignore upper/lower case character distinctions, and cast all alphabetical characters to lower case. To allow our model to work with these characters, we will encode them as integers. We will then normalize them to real numbers between 0 and 1 and add a dimension (we are working with a system with a single feature). Finally we will one-hot encode the target character (see previous module for discussion of one-hot encoding). This is not the only way to handle the data, but it is a simple one.

We will also return the unnormalized and non-reshaped X data, the number of characters found and an integer coding to character dictionary, all for use later.
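To make these preprocessing steps concrete, here is a minimal sketch on a toy string, assuming a window of 3 characters rather than 100. The names `toy_text`, `toy_seq_length`, `X_toy` and `Y_toy` are illustrative and not part of the tutorial code, and `numpy.eye` is used here as a stand-in for the one-hot encoding step.

```python
import numpy

toy_text = "abcab"      # hypothetical stand-in for the book text
toy_seq_length = 3      # the tutorial itself uses 100

# Map each distinct character to an integer code
chars = sorted(set(toy_text))
char_to_int = {c: i for i, c in enumerate(chars)}
n_vocab = len(chars)

# Slide a window over the text: each window is an input,
# and the character that follows it is the target
windows, targets = [], []
for i in range(len(toy_text) - toy_seq_length):
    windows.append([char_to_int[c] for c in toy_text[i:i + toy_seq_length]])
    targets.append(char_to_int[toy_text[i + toy_seq_length]])

# Reshape to [samples, time steps, features] and scale by the vocabulary size
X_toy = numpy.reshape(windows, (len(windows), toy_seq_length, 1)) / float(n_vocab)
# One-hot targets: row k has a 1 in column targets[k]
Y_toy = numpy.eye(n_vocab)[targets]

print(X_toy.shape, Y_toy.shape)
```

The two windows here are "abc" (target "a") and "bca" (target "b"), so the input has shape (2, 3, 1) and the target has shape (2, 3).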


In [5]:
def load_alice (
    rawTextFile="/Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice.txt"   
    ):
    # load the text and convert it to lowercase
    with open(rawTextFile, encoding='utf-8') as f:
        raw_text = f.read().lower()
    # create mapping of unique chars to integers
    chars = sorted(list(set(raw_text)))
    char_to_int = dict((c, i) for i, c in enumerate(chars))
    int_to_char = dict((i, c) for i, c in enumerate(chars))
    # summarize the loaded data
    n_chars = len(raw_text)
    n_vocab = len(chars)
    print ("Total Characters: ", n_chars)
    print ("Total Vocab: ", n_vocab)
    # prepare the dataset of input to output pairs encoded as integers
    seq_length = 100
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        seq_in = raw_text[i:i + seq_length]
        seq_out = raw_text[i + seq_length]
        dataX.append([char_to_int[char] for char in seq_in])
        dataY.append(char_to_int[seq_out])
    n_patterns = len(dataX)
    print ("Total Patterns: ", n_patterns)
    # reshape X to be [samples, time steps, features]
    X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
    # normalize
    X = X / float(n_vocab)
    # one hot encode the output variable
    Y = np_utils.to_categorical(dataY)
    return X,Y,dataX,n_vocab,int_to_char

Now let's load the data. X and Y are the input and target label datasets we will use in training. X_ is the X data before normalization and reshaping, for use later.

In [6]:
X,Y,X_,n_vocab,int_to_char = load_alice()

Total Characters:  1326
Total Vocab:  37
Total Patterns:  1226


You can play around below to look at the shape of the resulting X and Y arrays, as well as their contents. But they are no longer understandable character strings.

In [7]:
# Play around here to look at data characteristics
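One way to check that the numeric arrays still carry the original text is to undo the normalization and map the integers back to characters. The sketch below does this on toy data. `decode_row`, `toy_int_to_char` and `toy_n_vocab` are hypothetical names introduced for illustration.

```python
# Hypothetical stand-ins; in the notebook, use the n_vocab and int_to_char
# values returned by load_alice().
toy_int_to_char = {0: " ", 1: "a", 2: "c", 3: "t"}
toy_n_vocab = len(toy_int_to_char)

def decode_row(row, n_vocab, int_to_char):
    """Map normalized values back to characters by undoing the division by n_vocab."""
    return "".join(int_to_char[int(round(value * n_vocab))] for value in row)

# "a cat" encoded as integers, then normalized the same way load_alice does
encoded = [1, 0, 2, 1, 3]
normalized = [code / float(toy_n_vocab) for code in encoded]
print(decode_row(normalized, toy_n_vocab, toy_int_to_char))  # a cat
```

In the notebook, something like `decode_row(X[0, :, 0], n_vocab, int_to_char)` should recover the first 100-character window, and `X.shape` and `Y.shape` show the array dimensions.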

Now we define our LSTM using the Keras functional API. We are going to make use of LSTM layers, and add dropout layers for regularization.

We will pass the data to the model-defining function so that we can read the input and output dimensions from it, rather than hard coding them.

For comparison, a second version of the function is included showing how to use the sequential approach.

In [8]:
def get_model (X,Y):
    # define the LSTM model
    inputs=Input(shape=(X.shape[1],X.shape[2]),name="Input")
    lstm1=LSTM(256,return_sequences=True)(inputs)
    drop1=Dropout(0.2)(lstm1)
    lstm2=LSTM(256)(drop1)
    drop2=Dropout(0.2)(lstm2)
    outputs=Dense(Y.shape[1], activation='softmax')(drop2)
    model=Model(inputs=inputs,outputs=outputs)
    return model

def get_model_sequential (X,Y):
    # define the LSTM model
    model = Sequential()
    model.add(LSTM(256, input_shape=(X.shape[1],X.shape[2]),return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(256))
    model.add(Dropout(0.2))
    model.add(Dense(Y.shape[1], activation='softmax'))
    return model

We get our model.

In [9]:
model=get_model(X,Y)


Now we will define an optimizer. If you are unfamiliar with the different types of optimizers available in Keras, I suggest you read the Keras documentation [here](https://keras.io/optimizers/) and play around training the model with different alternatives.

In [10]:
opt=Adadelta()


And we compile our model with the optimizer ready for training. We use categorical crossentropy as our loss function as this is a good default choice for working with a multi-class categorical target variable (i.e. the next character labels).

In [11]:
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
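To get a feel for what the categorical crossentropy values mean, it can help to compute the loss by hand for a single sample. The sketch below is a plain-Python illustration, not the Keras implementation. It evaluates the loss for a model that assigns equal probability to every character, assuming the 37-character vocabulary reported when the data was loaded. The target index 5 is arbitrary.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Loss for one sample: minus the log of the probability given to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t > 0)

n_vocab = 37  # vocabulary size reported when the data was loaded above
uniform = [1.0 / n_vocab] * n_vocab   # a model that guesses uniformly
target = [0.0] * n_vocab
target[5] = 1.0                       # arbitrary true class, for illustration only

baseline = categorical_crossentropy(target, uniform)
print(round(baseline, 4))  # 3.6109
```

A training loss close to this value, ln(37), suggests the network is doing little better than uniform guessing. Lower values mean it is concentrating probability on the correct next character.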

Now we will make a function to fit the model. We will not do this very professionally (it is just a fun project), and so will not use any validation data. Rather, we will just run the training for a number of epochs - by default 100, though you can change this.

We will, though, use a ModelCheckpoint callback to save the best-performing weights and load these into the model at the conclusion of the training. Note that training performance should normally improve with more epochs, so this is unlikely to improve performance. What we really want is to be able to load the best weights without having to redo the training process (see below).

If you want to, you are encouraged to alter the code in this tutorial to work with a training and validation set, and adjust the fit function below to incorporate an EarlyStopping callback based on performance on the validation data.
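As a starting point for that exercise, here is a minimal sketch of one way to split the data, assuming we simply hold out the last part of the text. `chronological_split` and `validation_fraction` are hypothetical names, and other splitting strategies are possible.

```python
def chronological_split(X, Y, validation_fraction=0.1):
    """Hold out the final part of the sequence data for validation.

    Consecutive windows overlap heavily, so a random split would put
    near-copies of training windows into the validation set.
    """
    n_val = int(len(X) * validation_fraction)
    n_train = len(X) - n_val
    return (X[:n_train], Y[:n_train]), (X[n_train:], Y[n_train:])

# Toy stand-ins for the X and Y arrays built above (slicing works the same way)
toy_X = list(range(10))
toy_Y = list(range(10, 20))
(train_X, train_Y), (val_X, val_Y) = chronological_split(toy_X, toy_Y, 0.2)
print(len(train_X), len(val_X))  # 8 2
```

The held-out pair can then be passed to `model.fit` through its `validation_data` argument, with an `EarlyStopping` callback from `keras.callbacks` monitoring `val_loss`.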

We have two LSTM layers, and we are dealing with sequences of length 100. So if we 'unroll' it, we have a network of 200 LSTM layers. And inside these layers are in fact multiple internal layers setting up the LSTM architecture! So this is actually a pretty big network, and training will take some time (about 200 hours on the free Colab environment for 200 epochs). This is probably too much to conveniently run yourself.
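To put a rough number on the size of the network, we can count trainable weights by hand. The sketch below uses the standard formula for an LSTM layer with biases (four gates, each with input weights, recurrent weights and a bias vector) and assumes the output layer has 37 units, one per character in the vocabulary. The helper names are illustrative.

```python
def lstm_params(units, input_dim):
    """Trainable weights in a standard LSTM layer: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Trainable weights in a fully connected layer with biases."""
    return input_dim * units + units

n_vocab = 37  # assumed output size, from the data-loading output above

first_lstm = lstm_params(256, 1)      # input is a single feature per time step
second_lstm = lstm_params(256, 256)   # input is the first LSTM's 256 outputs
output_layer = dense_params(n_vocab, 256)

print(first_lstm + second_lstm + output_layer)  # 799013
```

The unrolled time steps share the same weights, so the parameter count does not grow with sequence length. What grows is the amount of computation per sample, because each of the 100 time steps must be processed in order through both layers. The dropout layers add no trainable weights.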

Here we have an example of how we could train it on Colab. Colab will eventually time out. The best thing to do is to save our weights file to our Google Drive, so we can load it at leisure later and resume training. This is what we will do. Remember that if you didn't use the default name for your folder in your Google Drive, you should change the path string in the code.

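In real life, you will also often want to save the state of the optimizer (so that it keeps its current learning rate, etc.). One straightforward option is to save the whole compiled model with model.save(), which by default stores the optimizer state along with the architecture and weights; the way to access the optimizer state directly varies between Keras versions. It is left as an exercise to implement this.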

*It is not expected that you train the network using this function - see below to load trained weights from your google drive.*

In [17]:
def fit_model (model,X,Y,epochs=100):
    # define the checkpoint callback
    filepath="/Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice_best_weights.hdf5" 
    checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, 
                                 save_best_only=True, mode='min')
    callbacks_list = [checkpoint]
    # fit the model
    model.fit(X, Y, epochs=epochs, batch_size=128, callbacks=callbacks_list)
    # load the best weights
    model.load_weights(filepath)
    # return the final model
    return model


We would then fit (train) the model by calling the above function.

*It is not expected that you train the network using this function - see below to load trained weights from your google drive.*

In [18]:
model=fit_model(model,X,Y,10)

Epoch 1/10
Epoch 1: loss improved from inf to 3.59870, saving model to /Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice_best_weights.hdf5
Epoch 2/10
Epoch 2: loss did not improve from 3.59870
Epoch 3/10
Epoch 3: loss improved from 3.59870 to 3.59842, saving model to /Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice_best_weights.hdf5
Epoch 4/10
Epoch 4: loss improved from 3.59842 to 3.59818, saving model to /Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice_best_weights.hdf5
Epoch 5/10
Epoch 5: loss improved from 3.59818 to 3.59799, saving model to /Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice_best_weights.hdf5
Epoch 6/10
Epoch 6: loss improved from 3.59799 to 3.59689, saving model to /Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources

Here we will load saved weights. You can use the "alice_best_weights.hdf5" file that comes with the course - just place it in the same folder as the "alice.txt" file in your Google Drive. This file has been trained for 200 epochs, and gets a loss around 1.16.

If you train the network yourself, the best weights will be saved as "alice_best_weights.hdf5" in the same location as above. You can therefore use the same code in both cases.

In all cases remember to change the filepath if you are not using the default folder name.

If you are resuming this tutorial here in a new session, you should re-mount your Google drive using the earlier code, re-load the data, and then run this code block to load the weights into a new model. 

If you want to train the model further, you will need to compile it with an optimizer.

In [19]:
model=get_model(X,Y)
filepath="/Users/apple/Documents/Projects-Python/DeepLearningCode/src/notebook/00-Prerequisites/resources/alice_best_weights.hdf5"
model.load_weights(filepath)


Now we can see if our network has mastered the art of writing like Lewis Carroll! Let's write a function to let us see, and then call it.

In [20]:
def write_like_Lewis_Carroll(model,X_,n_vocab,int_to_char):
  # pick a random seed...
  start = numpy.random.randint(0, len(X_)-1)
  # ... in order to decide which X datum to use to start.
  # Copy it so that appending below does not modify X_ itself.
  pattern = list(X_[start])

  print ("Seed:")
  print ("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
  # generate characters
  for i in range(1000):
    # We transform the integer mapping of the characters to
    # real numbers suitable for input into our model.
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x/float(n_vocab)
    # We use the model to estimate the probability distribution for
    # the next character
    prediction = model.predict(x, verbose=0)
    # We choose as the next character whichever the model thinks is most likely
    index = int(numpy.argmax(prediction))
    result = int_to_char[index]
    sys.stdout.write(result)
    # We add the integer to our pattern... 
    pattern.append(index)
    # ... and drop the earliest integer from our pattern.
    pattern = pattern[1:len(pattern)]
  print ("\nDone.")
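The rolling-window logic in this function can be tried without a trained network by swapping in a stand-in predictor. In the sketch below, `generate` and `toy_predictor` are hypothetical names, and `toy_predictor` plays the role of taking the argmax of `model.predict(...)`.

```python
def generate(predict_next, seed, n_steps):
    """Greedy generation: predict one step, append it, drop the oldest item."""
    pattern = list(seed)   # copy so the seed itself is not modified
    generated = []
    for _ in range(n_steps):
        index = predict_next(pattern)
        generated.append(index)
        # Slide the window forward; its length stays constant
        pattern = pattern[1:] + [index]
    return generated

def toy_predictor(pattern, n_vocab=5):
    # Hypothetical stand-in for the model: next code = last code + 1, wrapping around
    return (pattern[-1] + 1) % n_vocab

print(generate(toy_predictor, seed=[0, 1, 2], n_steps=4))  # [3, 4, 0, 1]
```

Because the most likely character is always chosen, the output is fully determined by the seed. This may be one reason the generated text tends to fall into repeating phrases.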

In [21]:
write_like_Lewis_Carroll(model,X_,n_vocab,int_to_char)

Seed:
" that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it "



If you run the above a few times, you will see that we have had some success - though we are still a long way from a good Alice in Wonderland simulator!

Here is an extract from one simulation I ran:

*'i should hit tere things,' said the caterpillar.*

*'well, perhaps you may bean the same siings tuertion,' the duchess said to the gryphon.*

*'what i cen the thing,' said the caterpillar.*

*'well, perhaps you may bean the same siings tuertion,' the mock turtle seplied,*

*'that i man the mice,' said the caterpillar.*

We have got to the point of basic sentence structure, quotations for speech, plausible characters given the context, etc. There remain misspellings, occasional punctuation errors, and other issues. (And this was a good selection.)

In fact, you should be able to do much better. Trying with 500 time points (predicting the 501st character from the preceding 500) and using a three-layer LSTM will lead to major improvements. So would using more training data (multiple Lewis Carroll books). You can see the performance achieved on a Shakespeare simulator [here](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).

If you have time, consider it an exercise to try to improve this implementation to that level - but be warned, the suggested changes would lead to training time being about 7 times longer for the same number of epochs, and of course more epochs would be required as it would be a more complex model. Since it would have taken 100+ hours on the Colab environment (which disconnects after a time limit), this is really only an exercise for those with access to a powerful local environment.
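One back-of-the-envelope way to arrive at a slowdown of this size is to assume that training cost grows in proportion to both the sequence length and the number of LSTM layers. This is a simplification, and the figure quoted above may have been derived differently.

```python
# Current setup versus the suggested one
base_steps, base_layers = 100, 2
new_steps, new_layers = 500, 3

# Assumed: cost scales linearly with time steps and with LSTM layers
factor = (new_steps / base_steps) * (new_layers / base_layers)
print(factor)  # 7.5
```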