<a href="https://colab.research.google.com/github/098Steve/Jupyter/blob/main/RNNExample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Building Character-level Language Models in Keras**

In this exercise we will explore a simple use case: building a character-level language model, much like the predictive-text models found in word processors and on many mobile devices. The difference is that we will train our RNN to derive a language model from Shakespeare's Hamlet. Our network will take a sequence of characters from Hamlet as input and iteratively compute the probability distribution of the next character in the sequence. Let's import the necessary packages.

In [None]:
import tensorflow as tf
from tensorflow import keras
from keras import callbacks, layers, models

In [None]:
print(tf.__version__)

In [None]:
# Import dependencies
import sys
import numpy as np
import re
import random
import pickle

from nltk.corpus import gutenberg


In [None]:
from keras.models import Sequential
from keras.layers import Dense, Bidirectional, Dropout
from keras.layers import SimpleRNN, GRU, BatchNormalization

In [None]:
from keras.callbacks import ModelCheckpoint
from keras.callbacks import LambdaCallback
#from keras.utils.data_utils import get_file

In [None]:
import nltk
nltk.download('gutenberg')

[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data]   Unzipping corpora/gutenberg.zip.


True

We will use the Natural Language Toolkit (NLTK) package in Python to import and preprocess the play, which can be found in the **gutenberg** corpus.

In [None]:
from nltk.corpus import gutenberg
hamlet = gutenberg.words('shakespeare-hamlet.txt')

In [None]:
hamlet[1:100]

In [None]:
text = ' '
#For each word
for word in hamlet:
  #Convert to lower case and add to a string variable
  text += str(word).lower()
  text += ' '
print('Corpus length, Hamlet only:', len(text))

Corpus length, Hamlet only: 166766


The string variable (text) now contains the entire sequence of characters that make up the play Hamlet. Next we will create a vocabulary, or dictionary of characters, that maps each character to a specific integer. We will create two versions of this dictionary: one mapping characters to indices, and one mapping indices back to characters. We need both: the first to encode text as input for the model, and the second to decode the model's predictions back into characters.
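
As a quick sketch of how the two dictionaries work together, the snippet below builds the same kind of mappings for a short toy string (an assumption standing in for text) and uses them to encode a word as integers and decode it back.

```python
# Toy string standing in for the Hamlet text (an assumption for illustration)
toy_text = "to be or not to be"

# Sorting the unique characters gives a reproducible vocabulary
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode a word as integers, then decode it back
encoded = [toy_char_indices[c] for c in "bet"]
decoded = "".join(toy_indices_char[i] for i in encoded)

print(toy_chars)         # [' ', 'b', 'e', 'n', 'o', 'r', 't']
print(encoded, decoded)  # [1, 2, 6] bet
```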


In [None]:
characters = sorted(list(set(text)))
print('Total characters:', len(characters))
char_indices = dict((l,i) for i, l in enumerate(characters))
indices_char = dict((i,l) for i, l in enumerate(characters))

Total characters: 43


Let's look at char_indices and indices_char:

In [None]:
char_indices

In [None]:
indices_char

Break the text into:
- Features: character-level sequences of fixed length
- Labels: the next character in each sequence
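
To illustrate the windowing on something small, here is a sketch that applies the same loop to a toy string with a window of 5 characters and a stride of 1 (toy values chosen for illustration, not the notebook's seq_len of 35).

```python
toy_text = "to be or not"
toy_seq_len, toy_stride = 5, 1

toy_windows, toy_labels = [], []
# Same loop shape as below: stop early enough that every window has a next character
for i in range(0, len(toy_text) - toy_seq_len, toy_stride):
    toy_windows.append(toy_text[i:i + toy_seq_len])
    toy_labels.append(toy_text[i + toy_seq_len])

for w, l in zip(toy_windows[:3], toy_labels[:3]):
    print(repr(w), '->', repr(l))
print(len(toy_windows), 'pairs')  # 7 pairs
```

Each window overlaps the previous one by all but one character, so a text of length N yields N - seq_len training pairs.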

In [None]:
#Empty list to collect each sequence
training_sequences = []
#Empty list to collect next character in sequence
next_chars = []
#Define length of each input sequence & stride
seq_len, stride = 35,1
#Loop over text with window of 35 characters, moving 1 stride at a time
for i in range (0, len(text)- seq_len, stride):
    training_sequences.append(text[i:i+seq_len])
    next_chars.append(text[i+seq_len])


Let's print out sequences and labels to verify

In [None]:
#Print out sequences and labels to verify
print('Number of sequences:', len(training_sequences))
print('First sequence:', training_sequences[:1])
print('Next character in sequence:', next_chars[:1])
print('Second sequence:', training_sequences[1:2])
print('Next character in sequence:', next_chars[1:2])

Next we will vectorise the training data. For each character in each sequence, we create a one-hot vector whose length equals the number of unique characters in the text. The element at the character's position in the character list is 1, and all other elements are 0. Each label (the next character) is one-hot encoded in the same way.
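
A minimal sketch of one-hot encoding a single short sequence, using a hypothetical three-character vocabulary in place of the 43 characters found in Hamlet:

```python
import numpy as np

# Hypothetical 3-character vocabulary for illustration
toy_chars = [' ', 'a', 'b']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_seq = "ab a"

# One row per timestep, one column per vocabulary entry
one_hot = np.zeros((len(toy_seq), len(toy_chars)), dtype=bool)
for t, c in enumerate(toy_seq):
    one_hot[t, toy_char_indices[c]] = True

print(one_hot.astype(int))
# [[0 1 0]
#  [0 0 1]
#  [1 0 0]
#  [0 1 0]]
```

The training tensor x simply stacks one such matrix per training sequence, which is where its shape (sequences, seq_len, vocabulary size) comes from.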

In [None]:
# create a matrix of zeros
#with dimensions:
#(training sequences, length of each sequence, total unique characters)
x= np.zeros((len(training_sequences), seq_len, len(characters)), dtype=bool)
y= np.zeros((len(training_sequences),  len(characters)), dtype=bool)

In [None]:
#Iterate over training sequences
for index, sequence in enumerate(training_sequences):
    #Iterate over characters in each sequence
    for sub_index, chars in enumerate(sequence):
        #Set this character's position in the feature matrix to 1
        x[index, sub_index, char_indices[chars]] = 1
    #Set the next character's position in the label matrix to 1 (once per sequence)
    y[index, char_indices[next_chars[index]]] = 1
print('Data vectorisation completed.')
print('Feature vectors shape', x.shape)
print('Label vectors shape', y.shape)

Data vectorisation completed.
Feature vectors shape (166731, 35, 43)
Label vectors shape (166731, 43)


Some characters appear more often than others in language, and which character comes next depends on the characters before it. The RNN learns to estimate this conditional probability distribution over the vocabulary; what it learns is stored in the network's weights, which are updated at every step of training. The output layer uses softmax, a mathematical function that converts a vector of numbers into a vector of probabilities, where each probability is proportional to the exponential of the corresponding value, so all outputs are positive and sum to 1.

Sampling is used to select the next character from this predicted distribution. There are different sampling methods. Greedy sampling always chooses the character with the highest probability. Controlled randomness (or stochasticity) can be introduced by picking the next character probabilistically, according to the predicted distribution, rather than always taking the most likely one.
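
The sketch below shows softmax, greedy sampling, and stochastic sampling side by side on three made-up scores (the scores and the random seed are illustrative assumptions). Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; the result is unchanged
    z = np.asarray(scores, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

probs = softmax([2.0, 1.0, 0.1])      # each probability is proportional to exp(score)
greedy_index = int(np.argmax(probs))  # greedy: always the most likely index

# Stochastic: draw indices according to the probabilities
rng = np.random.default_rng(0)
draws = rng.choice(len(probs), size=10_000, p=probs)
frequencies = np.bincount(draws, minlength=len(probs)) / len(draws)

print(np.round(probs, 3))        # [0.659 0.242 0.099]
print(greedy_index)              # 0
print(np.round(frequencies, 2))  # close to probs, but varies with the seed
```

Greedy sampling would return index 0 every time, whereas stochastic sampling returns index 0 only about two thirds of the time.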

**Stochastic sampling** One approach is to reweight the probability distribution of the output values at each time step. This lets us introduce a controlled amount of randomness, which can be useful in generative modelling. We implement this with a sampling threshold (often called the temperature) that redistributes the softmax prediction probabilities of our model.

In [None]:
def sample(softmax_predictions, sample_threshold):
  #Convert predictions to a float64 array
  softmax_preds = np.asarray(softmax_predictions).astype('float64')
  #Take the log and divide by the threshold (temperature);
  #the small constant avoids log(0) if a probability underflows to zero
  log_preds = np.log(softmax_preds + 1e-10) / sample_threshold
  #Exponentiate to undo the log
  exp_preds = np.exp(log_preds)
  #Renormalise so the reweighted probabilities sum to 1
  norm_preds = exp_preds / np.sum(exp_preds)
  #Draw one sample from the multinomial distribution (a one-hot count vector)
  prob = np.random.multinomial(1, norm_preds, 1)
  #Return the index of the sampled character
  return np.argmax(prob)

The threshold controls the entropy of the probability distribution each generated character is sampled from. A higher threshold flattens the distribution and raises its entropy, producing more surprising but less coherent, unstructured sequences. A lower threshold sharpens the distribution towards the most likely characters, so the output sticks closely to learned English spelling and morphology and produces familiar words and terms.
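
To see the threshold's effect, the sketch below applies the same log, divide, exponentiate, and renormalise steps as sample() to a made-up three-character distribution and measures the entropy (in nats) of the result. The helper names reweight and entropy are hypothetical, introduced only for this illustration.

```python
import numpy as np

def reweight(probs, threshold):
    # Same steps as sample(): log, divide by threshold, exponentiate, renormalise
    log_p = np.log(np.asarray(probs, dtype=np.float64)) / threshold
    p = np.exp(log_p)
    return p / p.sum()

def entropy(p):
    # Shannon entropy in nats; zero-probability entries contribute nothing
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

base = np.array([0.6, 0.3, 0.1])  # made-up next-character distribution
entropies = {}
for t in (0.3, 1.0, 1.2):
    p = reweight(base, t)
    entropies[t] = entropy(p)
    print(t, np.round(p, 3), round(entropies[t], 3))
```

A threshold of 1.0 leaves the distribution unchanged; 0.3 pushes most of the probability onto the most likely character, and 1.2 spreads it out.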

Now that our training data is preprocessed and stored as tensors, we will experiment with various RNN architectures.

Let's create some callbacks. A callback is an object whose methods Keras calls at set points during training (for example, at the end of each epoch), which lets us perform operations on our model as it trains. The function below will be wrapped in a callback: at the end of every epoch it takes a random sequence of characters from the Hamlet text and generates the 400 characters that follow on from it.
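
Stripped of the Keras details, the generation loop in on_epoch_end is autoregressive: predict a distribution from the current window, sample a character, append it, and slide the window forward by one. The sketch below shows that loop with a hypothetical stand-in predictor (cyclic_predict) instead of a trained model, so it runs without TensorFlow.

```python
import numpy as np

vocab = "abc"

def cyclic_predict(window):
    # Hypothetical stand-in for model.predict: puts all probability on the
    # character that follows the window's last character in vocab
    probs = np.zeros(len(vocab))
    probs[(vocab.index(window[-1]) + 1) % len(vocab)] = 1.0
    return probs

def generate(seed, n_chars, predict_fn, rng):
    window, out = seed, []
    for _ in range(n_chars):
        probs = predict_fn(window)                          # distribution over vocab
        next_char = vocab[rng.choice(len(vocab), p=probs)]  # sample the next character
        out.append(next_char)
        window = window[1:] + next_char                     # drop oldest, append newest
    return seed + "".join(out), window

generated_demo, final_window = generate("ab", 4, cyclic_predict, np.random.default_rng(0))
print(generated_demo, final_window)  # abcabc bc
```

The window keeps a fixed length, matching the seq_len the model was trained on, while the generated text keeps growing.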

In [None]:
def on_epoch_end(epoch, _):
  global model, model_name
  epoch = epoch + 1
  print('------Generating text after Epoch: %d' % epoch)
  #Random index position to start sample input sequence
  start_index = random.randint(0, len(text) - seq_len - 1)
  #End of sequence, corresponding to training sequence length
  end_index = start_index + seq_len
  #Set up some sampling entropy thresholds
  #sampling_range = [0.3, 0.5, 0.7, 1.0, 1.2]
  #We will just use 0.3 but others could be used for experimentation
  sampling_range = [0.3]
  for threshold in sampling_range:
    print('\n-----*Sampling Threshold*:', threshold)
    #Take a random input sequence from the text as the seed
    sentence = text[start_index: end_index]
    generated = sentence
    print('Input sentence to generate from:', sentence)
    #Echo the seed; generated characters are printed after it
    sys.stdout.write(generated)
    #Number of characters on the current output line
    count = seq_len
    #Generate the next 400 characters in the sequence
    for i in range(400):
      #One-hot encode the current window: shape (1, seq_len, vocabulary size)
      x_pred = np.zeros((1, seq_len, len(characters)))
      for n, char in enumerate(sentence):
        x_pred[0, n, char_indices[char]] = 1
      #Predict the probability distribution of the next character
      preds = model.predict(x_pred, verbose=0)[0]
      #Sample the index of the next character using the threshold
      next_index = sample(preds, threshold)
      next_char = indices_char[next_index]
      #Append the character and slide the input window forward by one
      generated += next_char
      sentence = sentence[1:] + next_char
      sys.stdout.write(next_char)
      count += 1
      #Wrap long output lines at the next space after 120 characters
      if count > 120 and next_char == ' ':
        count = 0
        sys.stdout.write('\n')
    sys.stdout.write('\n')
    sys.stdout.flush()




In [None]:
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

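Next we build a helper function that, for each model-building function in a list, compiles and trains the model, prints generated text after every epoch, saves a checkpoint whenever the training loss improves, and pickles the training history.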

In [None]:

def test_models(model_list, epochs=10):
  global model, model_name
  for network in model_list:
    print('Initiating compilation ....')
    #Initialise model by calling its builder function
    model = network()
    #Use the builder function's name as the model name
    model_name = network.__name__
    #Filepath to save model with name, epoch and loss
    filepath = "output %s_epoch-{epoch:02d}-loss-{loss:.4f}.keras" % model_name
    print("Filepath =", filepath)
    #Checkpoint callback: save only when the training loss improves
    checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=0,
                                 save_best_only=True, mode='min')
    #Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    print('Compiled:', str(model_name))
    #Initiate training
    history = model.fit(x, y, batch_size=100, epochs=epochs,
                        callbacks=[print_callback, checkpoint])
    #Print model configuration
    model.summary()
    #Save the training history dictionary for later analysis
    with open("history %s.pkl" % model_name, 'wb') as file_pi:
      pickle.dump(history.history, file_pi)
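
The checkpoint filename in test_models is a Python format string: the %s is filled with the model name immediately, and the {epoch:02d} and {loss:.4f} fields are left for Keras, which substitutes the epoch number and the logged loss when it saves a checkpoint. The sketch below previews the resulting filename with made-up values and a hypothetical builder function, toy_network. It also shows that a function's __name__ attribute gives its name directly, which is simpler than parsing str(network).

```python
def toy_network():
    # Hypothetical model builder standing in for e.g. SimpleRNN_model
    return None

demo_name = toy_network.__name__  # the function's name, no string parsing needed
template = "output %s_epoch-{epoch:02d}-loss-{loss:.4f}.keras" % demo_name
# Made-up values in place of what Keras would supply during training
example_path = template.format(epoch=3, loss=1.234567)
print(template)      # output toy_network_epoch-{epoch:02d}-loss-{loss:.4f}.keras
print(example_path)  # output toy_network_epoch-03-loss-1.2346.keras
```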



Now we will construct several types of RNN and train them with the helper function to compare how well each type generates Shakespeare-like text.

**Building a SimpleRNN** The SimpleRNN layer in Keras is a basic, fully connected RNN in which the layer's output at each timestep is fed back in, together with the next input, at the following timestep. It has many parameters, but most have sensible defaults that work for many different use cases.

In [None]:
def SimpleRNN_model():
  model = Sequential()
  model.add(SimpleRNN(128, input_shape=(seq_len, len(characters))))
  model.add(Dense(len(characters), activation='softmax'))
  return model
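
A SimpleRNN layer holds three sets of weights: a kernel applied to the input, a recurrent kernel applied to the previous hidden state, and a bias. The sketch below counts them for the single-layer and stacked models, assuming the 43-character vocabulary printed earlier. These totals should match what model.summary() reports for SimpleRNN_model and SimpleRNN_stacked_model, though it is worth checking against your own run.

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # weight matrix + bias
    return input_dim * units + units

vocab_size, units = 43, 128

single = simple_rnn_params(vocab_size, units) + dense_params(units, vocab_size)
stacked = (simple_rnn_params(vocab_size, units)
           + simple_rnn_params(units, units)  # second layer reads 128-dim states
           + dense_params(units, vocab_size))

print(simple_rnn_params(vocab_size, units))  # 22016
print(single, stacked)                       # 27563 60459
```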

In [None]:
def SimpleRNN_stacked_model():
  model = Sequential()
  model.add(SimpleRNN(128, input_shape=(seq_len, len(characters)),
                      return_sequences=True))
  model.add(SimpleRNN(128))
  model.add(Dense(len(characters), activation='softmax'))
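  return model


"return_sequences=True" means the layer outputs its hidden state at every timestep, a 3D tensor of shape (batch, timesteps, units), so the next recurrent layer receives a full sequence as its input. If "return_sequences" is False (the default), only the output of the final timestep, with shape (batch, units), is passed on to the next layer. This is why every recurrent layer except the last one in a stack needs return_sequences=True.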

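The difference between the two settings is easiest to see in the shapes. Below is a minimal NumPy sketch of the SimpleRNN recurrence h_t = tanh(x_t W + h_{t-1} U + b) with small random weights (assumptions, not trained values), returning either every hidden state or only the last one.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """h_t = tanh(x_t W + h_{t-1} U + b) over x of shape (batch, timesteps, features)."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, U.shape[0]))
    states = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        states.append(h)
    # All hidden states (batch, timesteps, units), or only the last (batch, units)
    return np.stack(states, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(2, 35, 43))    # 2 sequences, 35 steps, 43 features
W = rng.normal(scale=0.1, size=(43, 8))  # 8 units to keep it small
U = rng.normal(scale=0.1, size=(8, 8))
b = np.zeros(8)

seq_out = simple_rnn_forward(x_demo, W, U, b, return_sequences=True)
last_out = simple_rnn_forward(x_demo, W, U, b)
print(seq_out.shape, last_out.shape)  # (2, 35, 8) (2, 8)
```

The last timestep of the full sequence output is exactly the single output returned when return_sequences is False.
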
**Building GRUs**
Now we will build a stacked GRU model: two GRU layers followed by a softmax output layer.


In [None]:
def GRU_stacked_model():
  model = Sequential()
  model.add(GRU(128, input_shape=(seq_len, len(characters)),
                return_sequences=True))
  model.add(GRU(128))
  model.add(Dense(len(characters), activation='softmax'))
  return model
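
A GRU adds two gates to the simple recurrence: an update gate z that decides how much of the previous state to keep, and a reset gate r that decides how much of it to use when forming the candidate state. The NumPy sketch below implements one step of the original GRU formulation; the weight names in the dict p are hypothetical. Keras's GRU layer defaults to reset_after=True, which applies the reset gate after the recurrent matrix multiplication, so its arithmetic differs slightly from this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    # Update gate: how much of the previous state to keep
    z = sigmoid(x_t @ p["Wz"] + h_prev @ p["Uz"] + p["bz"])
    # Reset gate: how much of the previous state feeds the candidate
    r = sigmoid(x_t @ p["Wr"] + h_prev @ p["Ur"] + p["br"])
    # Candidate state (reset applied before the recurrent matmul)
    h_cand = np.tanh(x_t @ p["Wh"] + (r * h_prev) @ p["Uh"] + p["bh"])
    # Same convention as Keras: z near 1 keeps h_prev, z near 0 takes h_cand
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(1)
input_dim, units = 4, 3
p = {k: rng.normal(scale=0.5, size=(input_dim, units)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(scale=0.5, size=(units, units)) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(units) for k in ("bz", "br", "bh")})
x_t, h_prev = rng.normal(size=input_dim), rng.normal(size=units)

h_new = gru_step(x_t, h_prev, p)
# With a large update-gate bias, z saturates near 1 and the state is carried over
h_kept = gru_step(x_t, h_prev, dict(p, bz=np.full(units, 50.0)))
print(h_new.shape, np.allclose(h_kept, h_prev))  # (3,) True
```

This gating is what lets a GRU carry information across many timesteps more easily than a SimpleRNN.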

**Building a bi-directional GRU** Next we will build a bi-directional GRU, which processes each input sequence in both directions so the model can use context from both earlier and later characters within the input window. We nest the GRU within a Bidirectional layer, which feeds each sequence to one copy of the GRU in normal order and to another copy in reverse order.

In [None]:
def Bi_directional_GRU():
  model = Sequential()
  model.add(Bidirectional(GRU(128, return_sequences=True),
                          input_shape=(seq_len, len(characters))))
  model.add(Bidirectional(GRU(128)))
  model.add(Dense(len(characters), activation='softmax'))
  return model
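
Conceptually, a Bidirectional wrapper runs one recurrent layer over the sequence and a second one over the reversed sequence, then merges the two outputs. Keras's default merge mode is concatenation, which is why Bidirectional(GRU(128)) produces 256 features for the Dense layer. The sketch below illustrates the idea with a simple tanh recurrence standing in for the GRU and random placeholder weights.

```python
import numpy as np

def rnn_last_state(seq, W, U):
    # Simple tanh recurrence over a (timesteps, features) sequence; returns the final state
    h = np.zeros(U.shape[0])
    for x_t in seq:
        h = np.tanh(x_t @ W + h @ U)
    return h

def bidirectional_last_state(seq, fwd_weights, bwd_weights):
    h_fwd = rnn_last_state(seq, *fwd_weights)        # reads first -> last character
    h_bwd = rnn_last_state(seq[::-1], *bwd_weights)  # reads last -> first character
    return np.concatenate([h_fwd, h_bwd])            # "concat" merge

rng = np.random.default_rng(0)
seq_demo = rng.normal(size=(35, 43))
fwd = (rng.normal(scale=0.1, size=(43, 8)), rng.normal(scale=0.1, size=(8, 8)))
bwd = (rng.normal(scale=0.1, size=(43, 8)), rng.normal(scale=0.1, size=(8, 8)))
out = bidirectional_last_state(seq_demo, fwd, bwd)
print(out.shape)  # (16,)
```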

**Implementing Recurrent Dropout** Dropout randomly drops units during training, which spreads representations more evenly across the network and helps avoid overfitting. In recurrent layers, sampling a fresh dropout mask at every timestep tends to disrupt the signal the network carries through time and injects too much noise. Applying the same dropout mask at every timestep works much better. This is known as **recurrent dropout**, and it is one of the most effective techniques for reducing overfitting in recurrent layers. In Keras, the "dropout" argument applies dropout to a layer's inputs and "recurrent_dropout" applies it to the recurrent state.
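
The sketch below contrasts sampling a fresh dropout mask at every timestep with the recurrent-dropout idea of sampling one mask per sequence and reusing it at every timestep. The sizes and the rate of 0.2 are illustrative assumptions. It uses inverted dropout scaling (dividing kept units by 1 - rate), a common convention that keeps the expected activation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, n_units, rate = 5, 10, 0.2

# Naive approach: a fresh mask at every timestep (the dropped units keep changing)
fresh_masks = rng.random((timesteps, n_units)) >= rate

# Recurrent-dropout idea: one mask per sequence, reused at every timestep
shared_mask = rng.random(n_units) >= rate
shared_masks = np.tile(shared_mask, (timesteps, 1))

# Inverted dropout: scale kept units by 1 / (1 - rate)
h = np.ones((timesteps, n_units))
h_dropped = h * shared_masks / (1.0 - rate)

print(fresh_masks.astype(int))   # rows differ from timestep to timestep
print(shared_masks.astype(int))  # every row is identical
```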

In [None]:
def larger_GRU():
  model = Sequential()
  model.add(GRU(128, input_shape=(seq_len, len(characters)),dropout=0.2,
                recurrent_dropout=0.2,return_sequences=True))
  model.add(GRU(128, dropout=0.2,recurrent_dropout=0.2))
  model.add(Dense(128, activation ='relu'))
  model.add(Dense(len(characters), activation='softmax'))
  return model

Now we'll set up some variables to hold our models. First, a variable holding a list of all the models we have built.

In [None]:
#All defined models as a list - this list could be passed to "test_models"
#to train all the models in sequence.
#However, training all the models would take too long, so we won't do that now
all_models = [SimpleRNN_model,
              SimpleRNN_stacked_model,
              GRU_stacked_model,
              Bi_directional_GRU,
              larger_GRU]

Now we set up five lists, each containing one model. This is more convenient for our initial tests, because we can run the models one at a time.

In [None]:
#Set up "lists" of models.  Here there is one model in each list

modelone = [SimpleRNN_model]
modeltwo = [SimpleRNN_stacked_model]
modelthree = [GRU_stacked_model]
modelfour = [Bi_directional_GRU]
modelfive = [larger_GRU]

Now let's test the first model. The call to test_models is wrapped in timing code so we can see how long training takes.

In [None]:
import time

start = time.time()

test_models(modelone, epochs=10)

print(f'Time: {time.time() - start:.1f} seconds')

In [None]:
#Load the saved training history. Use the name of the model you trained.
with open("history SimpleRNN_model.pkl", 'rb') as file_pi:
    nh = pickle.load(file_pi)
nh

If you have time, you can vary the threshold and try some of the other models. The more complex models (SimpleRNN_stacked_model, GRU_stacked_model, Bi_directional_GRU, larger_GRU) take considerably longer to train, though.


The key feature of an RNN is its hidden state, which carries information about the parts of a sequence seen so far. Recurrent neural networks are a popular choice for sequential data such as time series, speech, text, financial data, audio, video, and weather.