<a href="https://colab.research.google.com/github/balyashukla1/NLP/blob/master/Balya_Shukla_Assignment_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2019 The TensorFlow Authors.

### PART 1A

## Setup

### Import TensorFlow and other libraries

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

import numpy as np
import os
import time

### Download the Shakespeare dataset

Change the following line to run this code on your own data.

In [0]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

### Read the data

First, look in the text:

In [0]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

In [0]:
# Take a look at the first 250 characters in text
print(text[:250])

In [0]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

## Process the text

### Vectorize the text

Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [0]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

Now we have an integer representation for each character. Notice that we mapped the character as indexes from 0 to `len(unique)`.

In [0]:
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

In [0]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

### The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task we're training the model to perform. The input to the model will be a sequence of characters, and we train the model to predict the output—the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?


### Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of `seq_length+1`. For example, say `seq_length` is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this first use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices.

In [0]:
# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

The `batch` method lets us easily convert these individual characters to sequences of the desired size.

In [0]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

For each sequence, duplicate and shift it to form the input and target text by using the `map` method to apply a simple function to each batch:

In [0]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

Print the first examples input and target values:

In [0]:
for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Each index of these vectors are processed as one time step. For the input at time step 0, the model receives the index for "F" and trys to predict the index for "i" as the next character. At the next timestep, it does the same thing but the `RNN` considers the previous step context in addition to the current input character.

In [0]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

### Create training batches

We used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, we need to shuffle the data and pack it into batches.

In [0]:
# Batch size
BATCH_SIZE = 1

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

## Build The Model

In [0]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

## PART 1C : USING LSTM

For Part 1C, I will change the above model by changing the tf.keras.layers.GRU to tf.keras.layers.LSTM

In [0]:
def build_model2(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [0]:
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [0]:
### for lstm model, I will run the following code: 
model2 = build_model2(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

For each character the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character:

## Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:

In [0]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

In [0]:
## Repeat the step for model 2

In [0]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model2(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

In the above example the sequence length of the input is `100` but the model can be run on inputs of any length:

In [0]:
model.summary()

In [0]:
## Repeat the step for model 2

In [0]:
model2.summary()

To get actual predictions from the model we need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.

Note: It is important to _sample_ from this distribution as taking the _argmax_ of the distribution can easily get the model stuck in a loop.

Try it for the first example in the batch:

In [0]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

This gives us, at each timestep, a prediction of the next character index:

In [0]:
sampled_indices

Decode these to see the text predicted by this untrained model:

In [0]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

## Train the model

At this point the problem can be treated as a standard classification problem. Given the previous RNN state, and the input this time step, predict the class of the next character.

### Attach an optimizer, and a loss function

The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because our model returns logits, we need to set the `from_logits` flag.


In [0]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Configure the training procedure using the `tf.keras.Model.compile` method. We'll use `tf.keras.optimizers.Adam` with default arguments and the loss function.

In [0]:
model.compile(optimizer='adam', loss=loss)

In [0]:
## Repeat the step for model 2:

model2.compile(optimizer='adam', loss=loss)

### Configure checkpoints

Use a `tf.keras.callbacks.ModelCheckpoint` to ensure that checkpoints are saved during training:

In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

### Execute the training

To keep training time reasonable, use 10 epochs to train the model. In Colab, set the runtime to GPU for faster training.

In [0]:
EPOCHS=10
steps = 174

In [0]:
history = model.fit(dataset, epochs=EPOCHS,steps_per_epoch = 174, callbacks=[checkpoint_callback])

In [0]:
## Repeat the step with model 2:

history2 = model2.fit(dataset, epochs=EPOCHS, steps_per_epoch= 174, callbacks=[checkpoint_callback])

We can see that the training loss for 10 epoch goes down with the LSTM model is used in comparison to using GRU. The loss for GRU was 2.4160 and the loss for LSTM was 1.8260.  Because input is controled based on weights, LSTM performs better than GRU. 

## Generate text

### Restore the latest checkpoint

To keep this prediction step simple, use a batch size of 1.

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.

To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.


In [0]:
tf.train.latest_checkpoint(checkpoint_dir)

In [0]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

#model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [0]:
model.summary()

### The prediction loop

The following code block generates the text:

* It Starts by choosing a start string, initializing the RNN state and setting the number of characters to generate.

* Get the prediction distribution of the next character using the start string and the RNN state.

* Then, use a categorical distribution to calculate the index of the predicted character. Use this predicted character as our next input to the model.

* The RNN state returned by the model is fed back into the model so that it now has more context, instead than only one word. After predicting the next word, the modified RNN states are again fed back into the model, which is how it learns as it gets more context from the previously predicted words.


![To generate text the model's output is fed back to the input](images/text_generation_sampling.png)

Looking at the generated text, you'll see the model knows when to capitalize, make paragraphs and imitates a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.

In [0]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures results in more predictable text.
  # Higher temperatures results in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the word returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted word as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

In [0]:
print(generate_text(model, start_string=u"ROMEO: "))

The easiest thing you can do to improve the results it to train it for longer (try `EPOCHS=30`).

You can also experiment with a different start string, or try adding another RNN layer to improve the model's accuracy, or adjusting the temperature parameter to generate more or less random predictions.

## PART 1B: Trying with 100 Epochs for both model 1 & 2



In [0]:
model3 = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [0]:
model4 = build_model2(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [0]:
model3.compile(optimizer='adam', loss=loss)

In [0]:
model4.compile(optimizer='adam', loss=loss)

In [0]:
EPOCH2 = 100

In [0]:
history3 = model3.fit(dataset, epochs=EPOCH2, steps_per_epoch= 174, callbacks=[checkpoint_callback])

Train for 174 steps
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100

In [0]:
history4 = model4.fit(dataset, epochs=EPOCH2, steps_per_epoch = 174, callbacks=[checkpoint_callback])

Train for 174 steps
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100

By adding more epochs to the original, and the LSTM model, we can see that the training loss has reduced significantly. Based on  the results, it is optimal to increase the epoch size to improve the trianing. 

1. The training loss for initial model for 10 epochs: 2.4160
2. The training loss for initial model for 100 epochs: 1.4746
3. The training loss for LSTM model for 10 epochs: 1.8260
4. The training loss for LSTM model for 100 epochs: 1.4284


This reiterates the fact that the LSTM model is doing better than the original model and more epochs do better. 


We will now generate text using the following models: 

1. Initial model with 10 Epochs (model)
2. LSTM model with 10 Epochs (model2)
3. Initial model with 100 Epochs (model3)
4. LST model with 100 Epochs (model4)


In [0]:
## Generating text with initial model with 10 epochs and call it output1:

output1 = generate_text(model, start_string=u"ROMEO: ")
print(output1)

ROMEO: FWyix'x.E'!$
 J
sLtdv.VJ$OhzCSkk SrEt.FQ$&b?I'FVHl$:vIf,I nTtsrIa'Ox,B
JPmOv:NNFNO;xTVWjaTKpQEsT;dzEx,'YZFdk,b RvNi?o,ZQv$hm$pJmI? mNkwfzJE3jxuRUh Ukd 3L3yi-VigxVRRw;GiCMhanxyknx:vPvrvn? Ft'!Ok$dwJIqpJJC.yKVzH,.WdK;x; bO?Yei. ZaGhKboVUNVoBf$VAHXPay'
rma-wuDCxA.z'q&qH!zPfP&&ScPeZEGX?P $!KWytHuiJlPFP-IfYVcdFlUfLeD,FWPJMlx-GR -VovLu:TxiSD$zNtHwCpyHv?ETYzD!3emws.3-TU&JYaSX:tEKSu-SC.FDs.tPJBwNlc,kDKYEArRVcaxwpVXspsJup:FWeI:vHpGaf!vc?aGqD?Z-Gax.NSlUSFW$ $FUUNkXI,sJqz-.
w:yOR ;,UR?D;nNFLaSOgrNjYpvSei;;J:oAG,biVxdpyj'WV,sYq'vU,tXHwvxh 
tV'vWlJbx -H:&
PWKpx&'qNJy,LA'ApR,MinRTiC:giDdo,fzAtGkmtYXAzRBkDsRBgQwYhh?RuCg!R!D-tl,q,B'a:$:nO
MynhqQUaY;KwpWeAY'hxtU:IjkMzboERsbVHBSiEHYbPJ?.:altj
kRwBo:nqFfA;KcXU gcv?Z,w;MbtjAf!vbSlSe-. Gy!kDs.pfOluDEYQhuTU?h3W
P,QHKBmp,G;RHY!kq:iXKqQMFhsllcKBo'M.P-,E;mBatgPBNcgtOBu3AO:ZElUn:HgWW3J
r!Iz$Z,YKlsv
JdUz3VctrMhbt-uSRUilDP!pUAKUVh&dqxUx'bonxwb&&wdvTL.qBitQQUaHWNS
ygmen-hUCgkdlScvvg kgf?ceHlIcDKMIYdjpKbuQOzoK?::,SBwSvANlDcOW,?'wyVX:n&NHxF.:JYhE:CyvHPL-GUlW;

In [0]:
output2 = generate_text(model2, start_string=u"ROMEO: ")
print(output2)

ROMEO: and lay bewime ye'lls with hate
The neigh'd spard out it or jute,
To aplem and the your, you crowest lay me?
Go orice's all ol oor darew, sher citindmm be too I thou marthast
I vionou door bare shal be resvoD the brediner'd my hape;
And seall viod; lovat.
But help'd sander dare us you she was tome, my unillis him.
Sake I waln, I know are I tleed of olds.

LUCHO:
And will the rust may gels.

KING HENRY:

GENCIO:
Our tome all knew so, how some forthe weath,
Wipe aloter desend of him.

KING YEAD I'll show
Dicaruot I she farseris. I wale send.
A, in them, your dother, mave waod'd won:
They ear, 'tire shall boundhergut me bound thrurstrapenter wind.
To God hath deetpertialion for some teiclal and my seevelg.

Dure talking!

NORG VOIR:
You sek! Gettertledy himed in my confess aly vice's at this can.

JESCONTISSBERDNE
Ir dumen,
I me he would traun 'llow'st unver, you? o' the gloody maly!

JotORSS:
Now hear show love no lixe toding, you. Goot by thy zurptele
The vear lut no houth, shI i

In [0]:
output3 = generate_text(model3, start_string=u"ROMEO: ")
print(output3)

ROMEO: good pather.

First Servingman:
No, 'efeed us: for our tend, that is the heads; what handerging mine own comas?

MARIANA:
You one, my comsine, had revonts or way,
A kready ofd should that you seeming woo'd monthou work by meet wo'ld give .
Whe has done cross it, I had would you before gree:
There's a kindly life, so it is their prince awake.
A gests do new well another day, hot so good
Of their peace, his napute no more liphast, what dost not grace?

YORK:
Am, oath, sir; grow nothing the liof mad a cause down the times to thee,
Have I that make you dear fool'd,
Which apter dishongs, but I have it to dine him dream.
Which go better joy, alboth'd the?

AUFIDIUS:
O,-

QUEEN MARGARET:
Nay, betables witing
A oxproofed foul fortensire
No ound were forth this scapeing how.

KING RICHARD III:
A m, report, it more cannot ways one head,--

Nurse:
How comes need of mine you?

CIAULICAUMENLE:
At friar Romeo, faint-neces, than villain!
And stay's him: it the loss,
To you would I plained the 

In [0]:
output4 = generate_text(model4, start_string=u"ROMEO: ")
print(output4)

ROMEO: come, so shall not have here:
I see the gross
To chare the world, a bount head their appet.

DUKE VINCENTIO:
Why, thou should delight how your.

Claudio:
Cay take office come and dreart before his coldley,
As they shall best friend

HEGRMIUS:
Pray, my lord; ever leave and warmons or love
An honourafe, my Lucio, to make must cuust
From first teeming ehbicious.
Your pursuish, if worth shall strong converate,
Or the Volsces that in Angelo be our gross and meet desary.
Know thou, and a diance of her sweetly
And give m: 
Furse:
And kay you make Rode from good lord,--the
must not which is the recomplant more of death curgened
Rome llow his fate of mine. I have warrant.

KING RICHARD II:
If shall tell thee sir, how he may know-wather's again:
And look under hazing.

QUEEN ELIZABETH:
The feigniof wanting to look and propertion,

GREMIO:
And wherein my shapow when I took in his pace.

TRANIO:
Take 'sbanio Angelo, Caliban else,
May grieve the , sweet strays are atlitht,
Andfull to violent

In [0]:
print(model.summary())
print(model2.summary())
print(model3.summary())
print(model4.summary())

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_11 (Embedding)     (1, None, 256)            16640     
_________________________________________________________________
gru_7 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_11 (Dense)             (1, None, 65)             66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
None
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_9 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
lstm_3 (LSTM)                (1, None, 1024)           5246976   
________________

## Advanced: Customized Training

The above training procedure is simple, but does not give you much control.

So now that you've seen how to run the model manually let's unpack the training loop, and implement it ourselves. This gives a starting point if, for example, to implement _curriculum learning_ to help stabilize the model's open-loop output.

We will use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/guide/eager).

The procedure works as follows:

* First, initialize the RNN state. We do this by calling the `tf.keras.Model.reset_states` method.

* Next, iterate over the dataset (batch by batch) and calculate the *predictions* associated with each.

* Open a `tf.GradientTape`, and calculate the predictions and loss in that context.

* Calculate the gradients of the loss with respect to the model variables using the `tf.GradientTape.grads` method.

* Finally, take a step downwards by using the optimizer's `tf.train.Optimizer.apply_gradients` method.



In [0]:
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [0]:
optimizer = tf.keras.optimizers.Adam()

In [0]:
@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  return loss

In [0]:
# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = model.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

## Trying the above code with 30 epochs



In [0]:
model_new = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [0]:
optimizer = tf.keras.optimizers.Adam()

In [0]:
@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model_new(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model_new.trainable_variables)
  optimizer.apply_gradients(zip(grads, model_new.trainable_variables))

  return loss

In [0]:
# Training step
EPOCHS_new = 30

for epoch in range(EPOCHS_new):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = model_new.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model_new.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model_new.save_weights(checkpoint_prefix.format(epoch=epoch))

## Part 2: Model Evaluation

For evaluating the performance of models we can do the following steps: 
1. Find the actual text 
2. Predict the text using the generate text function
3. Find the text similarity between actual and predicted using SequenceMatcher package AND Levenshtein distance


In [0]:
x = text[531:1000].lower() 

In [0]:
print(x)


what authority surfeits on would relieve us: if they
would yield us but the superfluity, while it were
wholesome, we might guess they relieved us humanely;
but they think we are too dear: the leanness that
afflicts us, the object of our misery, is as an
inventory to particularise their abundance; our
sufferance is a gain to them let us revenge this with
our pikes, ere we become rakes: for the gods know i
speak this in hunger for bread, not in thirst for revenge.




In [0]:
y = generate_text(model, start_string=u"what authority surfeits ")

In [0]:
y_2 = generate_text(model2, start_string=u"what authority surfeits ")

In [0]:
y_3 =  generate_text(model3, start_string=u"what authority surfeits ")

In [0]:
y_4 = generate_text(model3, start_string=u"what authority surfeits ")

In [0]:
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

In [0]:
sim = similar(x, y)

In [0]:
sim

0.0026791694574681848

In [0]:
sim2 = similar(x, y_2)

In [0]:
sim2

0.020093770931011386

In [0]:
sim3 = similar(x,y_3)

In [0]:
sim3

0.05492297387809779

In [0]:
sim4 = similar(x, y_4)

In [0]:
sim4

0.061620897521768254

In [0]:
!pip install python-Levenshtein

Collecting python-Levenshtein
[?25l  Downloading https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz (48kB)
[K     |██████▊                         | 10kB 21.2MB/s eta 0:00:01[K     |█████████████▌                  | 20kB 1.8MB/s eta 0:00:01[K     |████████████████████▏           | 30kB 2.6MB/s eta 0:00:01[K     |███████████████████████████     | 40kB 1.7MB/s eta 0:00:01[K     |████████████████████████████████| 51kB 1.7MB/s 
Building wheels for collected packages: python-Levenshtein
  Building wheel for python-Levenshtein (setup.py) ... [?25l[?25hdone
  Created wheel for python-Levenshtein: filename=python_Levenshtein-0.12.0-cp36-cp36m-linux_x86_64.whl size=144668 sha256=b7caf23649e4029780fab4b651b4f89abd0c4f2a2b8bfe6a9e92b824e9fd03da
  Stored in directory: /root/.cache/pip/wheels/de/c2/93/660fd5f7559049268ad2dc6d81c4e39e9e36518766eaf7e342
Successfully built python-Levenshtein
Installin

In [0]:
import Levenshtein

In [0]:
ld = Levenshtein.distance(x, y)
ld2 = Levenshtein.distance(x, y_2)
ld3 = Levenshtein.distance(x, y_3)
ld4 = Levenshtein.distance(x, y_4)
print("We can use Levenshtein Distance to compare the models:")
print("For model 1(initial model with 10 epochs), the ld is:", ld)
print("For model 2(LSTM model with 10 epochs), the ld is: ", ld2)
print("For model 3(initial model with 100 epochs), the ld is: ", ld3)
print("For model 4(LSTEM model with 100 epochs), the ld is: ", ld4)


We can use Levenshtein Distance to compare the models:
For model 1(initial model with 10 epochs), the ld is: 899
For model 2(LSTM model with 10 epochs), the ld is:  770
For model 3(initial model with 100 epochs), the ld is:  779
For model 4(LSTEM model with 100 epochs), the ld is:  760


We can use Levenshtein Distance to compare the models:

For model 1(initial model with 10 epochs), the ld is: 899

For model 2(LSTM model with 10 epochs), the ld is:  770

For model 3(initial model with 100 epochs), the ld is:  779

For model 4(LSTEM model with 100 epochs), the ld is:  760

We can see that the LSTM is better than the GRU model, and more epochs are better in comparison to fewer epochs. 