## Natural Language Processing

### Character Sequence Prediction on Shakespeare Dataset

**Importing Libraries**

In [40]:
import tensorflow as tf

import numpy as np
import os
import time

Download the Shakespeare dataset

In [41]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

### Reading the data
First, look at the text:


In [42]:
# Read, then decode for py2 compat.

text = open(path_to_file, 'rb').read().decode(encoding = 'utf-8')

# length of text is the number of characters in it

print('length of text: {} characters'.format(len(text)))

length of text: 1115394 characters


In [43]:
# Take a look at the first 250 characters

print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [44]:
# the unique characters in the file ( Vocab )
vocab = sorted(set(text))
print('{} unique characters..'.format(len(vocab)))

65 unique characters..


### Process the Text
<br>

**Vectorize the text**

In [45]:
# creating a mapping from unique characters to indices

char2idx = {u:i for i,u in enumerate(vocab)}
idx2char = np.array(vocab)

text_to_int = np.array([char2idx[c] for c in text])

In [46]:
print('{')
for char,_ in zip(char2idx, range(20)):
  print('     {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ....\n')

{
     '\n':   0,
     ' ' :   1,
     '!' :   2,
     '$' :   3,
     '&' :   4,
     "'" :   5,
     ',' :   6,
     '-' :   7,
     '.' :   8,
     '3' :   9,
     ':' :  10,
     ';' :  11,
     '?' :  12,
     'A' :  13,
     'B' :  14,
     'C' :  15,
     'D' :  16,
     'E' :  17,
     'F' :  18,
     'G' :  19,
  ....



In [47]:
print('{} --------------> characters mapped to int -------------> {}'.format(repr(text[:13]), text_to_int[:13]))

'First Citizen' --------------> characters mapped to int -------------> [18 47 56 57 58  1 15 47 58 47 64 43 52]


## The prediction task:

Next, divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of seq_length+1. For example, say seq_length is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".
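
As a minimal sketch of the chunking just described, the "Hello" example can be reproduced with plain Python strings instead of the `tf.data` pipeline. The names `split_chunk`, `demo_seq_length`, and `demo_text` are hypothetical and chosen only for illustration:

```python
def split_chunk(chunk):
    """Split a chunk of seq_length + 1 characters into (input, target)."""
    # The input drops the last character and the target drops the first,
    # so target[i] is the character that follows input[i] in the text.
    return chunk[:-1], chunk[1:]

demo_seq_length = 4
demo_text = "Hello"

# One chunk holds seq_length + 1 characters.
chunk = demo_text[:demo_seq_length + 1]
input_text, target_text = split_chunk(chunk)
print(input_text, target_text)  # Hell ello

# Each input character is paired with the character the model should predict.
for inp, tgt in zip(input_text, target_text):
    print(repr(inp), "->", repr(tgt))
```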

In [48]:
# the maximum length sentence you want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

# creating training examples/ targets 
char_dataset = tf.data.Dataset.from_tensor_slices(text_to_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

F
i
r
s
t


The `batch` method lets us easily convert these individual characters to sequences of the desired size.

In [49]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


In [50]:
def split_input_target(chunk):
  input_text = chunk[:-1]
  target_text = chunk[1:]
  return input_text, target_text

dataset = sequences.map(split_input_target)

Print the first example input and target values:

In [51]:
for input_example, target_example in dataset.take(1):
  print('input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print('target data: ', repr(''.join(idx2char[target_example.numpy()])))

input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
target data:  'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


Each index of these vectors is processed as one time step. For the input at time step 0, the model receives the index for "F" and tries to predict the index for "i" as the next character. At the next time step, it does the same thing, but the RNN considers the previous step's context in addition to the current input character.

In [52]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
  print("Step {:4d}".format(i))
  print("Input: {}  ({})".format(input_idx, repr(idx2char[input_idx])))
  print("Target: {}  ({})".format(target_idx, repr(idx2char[target_idx])))
  print()

Step    0
Input: 18  ('F')
Target: 47  ('i')

Step    1
Input: 47  ('i')
Target: 56  ('r')

Step    2
Input: 56  ('r')
Target: 57  ('s')

Step    3
Input: 57  ('s')
Target: 58  ('t')

Step    4
Input: 58  ('t')
Target: 1  (' ')



### Creating Training Batch

You used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, you need to shuffle the data and pack it into batches.

In [53]:
len(dataset)

11043
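
The figure of 11043 sequences can be checked by hand from numbers reported in this notebook: the text length (1115394 characters), the sequence length (100), and the batch size of 64 used in the next cell. The following is a small arithmetic sketch, not part of the training pipeline, and the variable names are illustrative:

```python
# Values reported in this notebook.
chars_total = 1115394   # length of the text in characters
seq_len = 100           # input sequence length
batch = 64              # batch size used for training

# Each example consumes seq_len + 1 characters (input plus shifted target).
num_sequences = chars_total // (seq_len + 1)
leftover_chars = chars_total % (seq_len + 1)

# With drop_remainder=True, the final incomplete batch is discarded.
batches_per_epoch = num_sequences // batch

print(num_sequences)      # 11043
print(leftover_chars)     # 51
print(batches_per_epoch)  # 172
```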

In [54]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (tf.data is designed to work with possibly infinite sequences, so it doesn't attempt
#  to shuffle the entire sequence in memory. Instead, it maintains a buffer in which
#  it shuffles elements.)

BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder = True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
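
To make the buffer comment above more concrete, here is a conceptual sketch of buffer-based shuffling in plain Python. It is an assumption-laden illustration of the general idea, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper:

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # The input is exhausted: emit what remains in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

In this sketch an element can move at most `buffer_size - 1` positions earlier than where it started, so a buffer smaller than the dataset gives only a local shuffle. Here the buffer of 10000 is slightly smaller than the 11043 sequences, so the shuffle is close to, but not exactly, a uniform one.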

## Build The Model

Use `tf.keras.Sequential` to define the model. For this simple example three layers are used to define our model:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that will map the numbers of each character to a vector with `embedding_dim` dimensions.
* `tf.keras.layers.GRU`: A type of RNN with size `units=rnn_units`. (You can also use an LSTM layer here.)
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs.

In [55]:
vocab_size = len(vocab)

# the embedding dimensions
embedding_dim = 256

# number of RNN units
rnn_units = 1024

In [56]:
def build_model(vocab_size, embedding_dim,rnn_units, batch_size):
  model = tf.keras.Sequential([
                               tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),
                               tf.keras.layers.GRU(rnn_units, return_sequences=True, stateful = True, recurrent_initializer='glorot_uniform'),
                               tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [57]:
model = build_model(
    vocab_size, embedding_dim, rnn_units, BATCH_SIZE
)



### Trying the model


In [19]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, '# (batch_size, sequence_length, vocab_size)')

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


> **NOTE**: In the above example, the sequence length of the input is 100, but the model can be run on inputs of any length.

In [20]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           16640     
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3938304   
_________________________________________________________________
dense (Dense)                (64, None, 65)            66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
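
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the GRU uses two bias vectors per weight set, which is the behaviour of `reset_after=True` (the TF2 default). That assumption is consistent with the 3,938,304 figure reported above:

```python
vocab_size = 65
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim-sized vector per character in the vocabulary.
embedding_params = vocab_size * embedding_dim

# GRU: three weight sets (update gate, reset gate, candidate state), each with
# an input kernel, a recurrent kernel, and two bias vectors (assumption:
# reset_after=True).
gru_params = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + 2 * rnn_units)

# Dense: a weight matrix from rnn_units to vocab_size plus one bias per output.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params

print(embedding_params)  # 16640
print(gru_params)        # 3938304
print(dense_params)      # 66625
print(total_params)      # 4021569
```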


To get actual predictions from the model, you need to sample from the output distribution to get actual character indices. This distribution is defined by the logits over the character vocabulary.

**NOTE**: It is important to sample from the distribution as taking the argmax of the distribution can easily get the model stuck in a loop.

In [21]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples = 1)
sampled_indices = tf.squeeze(sampled_indices, axis = -1).numpy()
sampled_indices

array([63, 64, 48, 54, 49, 48, 29, 64, 42,  6, 35, 47, 27, 55,  3, 28, 33,
       17, 36, 17, 41, 54, 10, 34, 12, 44, 47,  2, 16, 49, 53, 63, 28, 58,
        6, 60, 48, 38, 57, 45, 29,  2, 14, 64, 35,  9,  8, 16, 39, 12, 49,
       52,  3, 61, 18, 21,  8, 52, 39,  9, 28, 45, 53, 53,  7, 47, 30, 46,
       52, 24, 54,  4, 62,  7, 21, 40, 31, 39, 18, 46,  4, 12,  7, 48, 61,
       15, 63, 10, 40, 40, 43, 37, 40, 27, 20, 22, 11, 48, 46, 26])

The output above gives us, at each timestep, a prediction of the next character index.
Decoding them further:

In [22]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next char prediction: \n", repr("".join(idx2char[sampled_indices])))

Input: 
 'nt to disinherit him,\nWhich argued thee a most unloving father.\nUnreasonable creatures feed their yo'

Next char prediction: 
 'yzjpkjQzd,WiOq$PUEXEcp:V?fi!DkoyPt,vjZsgQ!BzW3.Da?kn$wFI.na3Pgoo-iRhnLp&x-IbSaFh&?-jwCy:bbeYbOHJ;jhN'


At this point the model has not been trained at all, so the predictions are essentially random. The next step is to train it.

## Train the model

### Attach an optimizer, and loss function:

The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because your model returns logits, you need to set the `from_logits` flag.

In [24]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True )

example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("prediction shape: ",example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ",example_batch_loss.numpy().mean())

prediction shape:  (64, 100, 65) # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.173478
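
The initial loss of about 4.17 is roughly what one would expect from an untrained model that spreads probability almost evenly over the 65 characters, since the cross-entropy of a uniform prediction is ln(65). The following plain-Python sketch computes the per-timestep loss from logits. The helper `sparse_cross_entropy_from_logits` is hypothetical and only mirrors the formula, not the Keras implementation:

```python
import math

def sparse_cross_entropy_from_logits(label, logits):
    """Return -log(softmax(logits)[label]) for a single time step."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

num_classes = 65

# All-equal logits give a uniform distribution over the vocabulary.
uniform_logits = [0.0] * num_classes
baseline = sparse_cross_entropy_from_logits(label=18, logits=uniform_logits)
print(round(baseline, 4))  # 4.1744

# Raising the logit of the correct class lowers the loss.
confident_logits = [0.0] * num_classes
confident_logits[18] = 5.0
confident = sparse_cross_entropy_from_logits(label=18, logits=confident_logits)
print(confident < baseline)  # True
```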


## Compiling the model:

In [25]:
model.compile(optimizer='adam', loss = loss)

**Configure the checkpoints**

Use a `tf.keras.callbacks.ModelCheckpoint` to save the model weights during training.

In [28]:
# directory where the checkpoint will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath = checkpoint_prefix,
    save_weights_only = True
)

In [29]:
EPOCHS = 10

In [31]:
history = model.fit(dataset ,epochs = EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [32]:
history = model.fit(dataset ,epochs = EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [33]:
history = model.fit(dataset ,epochs = EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [34]:
history = model.fit(dataset ,epochs = EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Generate the Text

To keep this prediction step simple, use a batch size of 1.

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.

To run the model with a different batch_size, you need to rebuild the model and restore the weights from the checkpoint.

In [58]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_10'

In [59]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size = 1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

In [60]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
gru_3 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_3 (Dense)              (1, None, 65)             66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________


In [64]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # empty string to store our results
  text_generated = []

  # Low temperature results in more predictable text
  # High temperature results in more surprising text
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch_size == 1
  model.reset_states()
  for i in range(num_generate):
    predictions = model(input_eval)
    # remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # using a categorical distribution to predict the character returned by the model
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(predictions, num_samples = 1)[-1,0].numpy()

    #Pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)

    text_generated.append(idx2char[predicted_id])

  return (start_string+ ''.join(text_generated))

In [65]:
print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: I dare not: but I for it was a thousand furlood of the child, why dost thou say'st he topsays here?

ANGELO:
I was; whilst, as I say?
I charge you, cenvein to the duke;
Of Warwick are in readiness, mine eyes within.

JULIET:
I will forget, nor any ot thy father's head.

NORTHUMBERLAND:
Reed their choice:
Your honour natures me most fortunate though
That Edward father, this your companyour lady hath,
For then you know not what will hencenarr-look upon you: thou
I had eaten spotes me for this night
To see the slander of Gloucester, and how he doth attach and babe:
There lies to learn the ot of far absent:
'Tis well, some two days since, were worthy ofference
Nor do I'll fell this propos awhite:
Thou hadst choice in hope: he told me, did; there attact I did free thee.

AUTOLYCUS:
Ha, ha!
This lamy and like to live with such a treech,
as enemies; we thou hadst, out of this prest
And say 'thar no man never come?
For this was ever man so much with sighs that joy
And in their hearts we
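
The effect of the temperature setting in `generate_text` can be illustrated without TensorFlow. The sketch below divides a small, made-up set of logits by different temperatures before applying a softmax. The logits and the helper `softmax_with_temperature` are assumptions for illustration only:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.0]  # made-up values for three characters

results = {}
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    results[t] = probs
    print(t, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the highest-scoring character, which makes sampling more predictable. A higher temperature flattens the distribution, which makes sampling more varied.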

## Advanced: Customized Training

The above training procedure is simple, but does not give you much control.

Now that you've seen how to run the model manually, let's unpack the training loop and implement it ourselves. This gives a starting point if, for example, you want to implement _curriculum learning_ to help stabilize the model's open-loop output.

Use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/guide/eager).

The procedure works as follows:

* First, reset the RNN state. You do this by calling the `tf.keras.Model.reset_states` method.

* Next, iterate over the dataset (batch by batch) and calculate the *predictions* associated with each.

* Open a `tf.GradientTape`, and calculate the predictions and loss in that context.

* Calculate the gradients of the loss with respect to the model variables using the `tf.GradientTape.gradient` method.

* Finally, take a step downwards by using the optimizer's `tf.keras.optimizers.Optimizer.apply_gradients` method.


In [66]:
model = build_model(
    vocab_size,
    embedding_dim,
    rnn_units,
    BATCH_SIZE
)

In [67]:
optimizer = tf.keras.optimizers.Adam()

In [70]:
@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(target, predictions, from_logits=True)
    )
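  # Compute and apply the gradients outside the tape context,
  # so that these operations are not themselves recorded.
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))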
  return loss

In [71]:
# Training step 

EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  # resetting the hidden state at the start of every epoch
  model.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n% 100 == 0:
      template = "Epoch {} Batch {} Loss {}"
      print(template.format(epoch+1, batch_n, loss))

  # Save (checkpoint) the model every 5 epochs
  if (epoch+1)%5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print('Time taken for 1 epoch {} sec \n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch = epoch)) 

Epoch 1 Batch 0 Loss 4.174774169921875
Epoch 1 Batch 100 Loss 2.3391036987304688
Epoch 1 Loss 2.1404
Time taken for 1 epoch 8.661991119384766 sec 

Epoch 2 Batch 0 Loss 2.1284615993499756
Epoch 2 Batch 100 Loss 1.9192508459091187
Epoch 2 Loss 1.7835
Time taken for 1 epoch 7.47285008430481 sec 

Epoch 3 Batch 0 Loss 1.7496970891952515
Epoch 3 Batch 100 Loss 1.6624188423156738
Epoch 3 Loss 1.6038
Time taken for 1 epoch 7.5009870529174805 sec 

Epoch 4 Batch 0 Loss 1.5730340480804443
Epoch 4 Batch 100 Loss 1.5703729391098022
Epoch 4 Loss 1.4782
Time taken for 1 epoch 7.445864915847778 sec 

Epoch 5 Batch 0 Loss 1.4458417892456055
Epoch 5 Batch 100 Loss 1.3901872634887695
Epoch 5 Loss 1.4395
Time taken for 1 epoch 7.483918190002441 sec 

Epoch 6 Batch 0 Loss 1.3959174156188965
Epoch 6 Batch 100 Loss 1.4176056385040283
Epoch 6 Loss 1.3659
Time taken for 1 epoch 7.545116186141968 sec 

Epoch 7 Batch 0 Loss 1.3281382322311401
Epoch 7 Batch 100 Loss 1.326464056968689
Epoch 7 Loss 1.3464
Time t