### Word level Text Generation.

In this notebook we learn how to build an efficient input pipeline for word-level text generation in `tf`. First, download the dataset [here](https://s3.amazonaws.com/text-datasets/nietzsche.txt). I've already downloaded it and uploaded it to my Google Drive so that it is easy to load in Google Colab. We will follow these steps:

1. Read the file line by line and split each line into word tokens
2. Generate tuples that map an input word sequence X to an output Y
3. Use the Keras `TextVectorization` layer to:
  * preprocess the text
  * convert the words into an integer representation
  * prepare training sets from the pairs
  * optimize the data pipeline.

### Definition
In the previous notebooks we learned how to generate text character by character. Word-level text generation instead generates the text word by word. After training, the Language Model outputs a conditional probability distribution over the vocabulary of words, given the input sequence.


### Steps
* **Step 1:** we provide **a sequence of words** to the Language Model as input
* **Step 2:** the Language Model outputs **a conditional probability distribution** over the **vocabulary**
* **Step 3:** we **sample** a word from the distribution
* **Step 4:** we **concatenate** the newly sampled word to the ***generated text***
* **Step 5:** **a new input sequence** is generated by appending the newly sampled word
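
The five steps above can be sketched in plain Python. This is only an illustration: `toy_language_model` is a hypothetical stand-in for a trained network (it simply favours the next word in a tiny vocabulary), and the vocabulary and seed are made up for the example.

```python
import random

def toy_language_model(context, vocabulary):
    """Stand-in for a trained model: returns P(next word | context).

    Hypothetical rule for illustration only: the word that follows the last
    context word in `vocabulary` order receives most of the probability mass.
    """
    last = vocabulary.index(context[-1])
    favourite = (last + 1) % len(vocabulary)
    rest = 0.4 / (len(vocabulary) - 1)
    return [0.6 if i == favourite else rest for i in range(len(vocabulary))]

def generate(seed_words, vocabulary, steps, sequence_size, rng):
    generated = list(seed_words)
    context = list(seed_words)                       # Step 1: input sequence
    for _ in range(steps):
        # Step 2: conditional distribution over the vocabulary
        probs = toy_language_model(context, vocabulary)
        # Step 3: sample one word from that distribution
        next_word = rng.choices(vocabulary, weights=probs)[0]
        # Step 4: concatenate the sampled word to the generated text
        generated.append(next_word)
        # Step 5: the last `sequence_size` words form the new input
        context = generated[-sequence_size:]
    return generated

vocabulary = ["truth", "is", "a", "woman"]
text = generate(["truth", "is"], vocabulary, steps=6, sequence_size=2,
                rng=random.Random(0))
print(" ".join(text))
```

The real model later in this notebook follows the same loop, with the network's softmax output taking the place of `toy_language_model`.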


### Imports



In [1]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

import os, re, string, time

tf.__version__


'2.6.0'

### Mounting the Drive and paths

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
file_path = '/content/drive/My Drive/NLP Data/text-gen/nietzsche.txt'
os.path.exists(file_path)

True

### Data Loading

In [4]:
raw_data = tf.data.TextLineDataset([file_path])

Checking some examples

In [5]:
for el in raw_data.take(10):
  print(el)

tf.Tensor(b'PREFACE', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'SUPPOSING that Truth is a woman--what then? Is there not ground', shape=(), dtype=string)
tf.Tensor(b'for suspecting that all philosophers, in so far as they have been', shape=(), dtype=string)
tf.Tensor(b'dogmatists, have failed to understand women--that the terrible', shape=(), dtype=string)
tf.Tensor(b'seriousness and clumsy importunity with which they have usually paid', shape=(), dtype=string)
tf.Tensor(b'their addresses to Truth, have been unskilled and unseemly methods for', shape=(), dtype=string)
tf.Tensor(b'winning a woman? Certainly she has never allowed herself to be won; and', shape=(), dtype=string)
tf.Tensor(b'at present every kind of dogma stands with sad and discouraged mien--IF,', shape=(), dtype=string)


### Tokenization

In [6]:
import en_core_web_sm
en_tokenizer = en_core_web_sm.load()

tokenize = lambda x: tf.strings.split(x)

tokenize("I love machine leaning.")

<tf.Tensor: shape=(4,), dtype=string, numpy=array([b'I', b'love', b'machine', b'leaning.'], dtype=object)>

In [7]:
raw_dataset = raw_data.map(tokenize)
for el in raw_dataset.take(10):
  print(el)

tf.Tensor([b'PREFACE'], shape=(1,), dtype=string)
tf.Tensor([], shape=(0,), dtype=string)
tf.Tensor([], shape=(0,), dtype=string)
tf.Tensor(
[b'SUPPOSING' b'that' b'Truth' b'is' b'a' b'woman--what' b'then?' b'Is'
 b'there' b'not' b'ground'], shape=(11,), dtype=string)
tf.Tensor(
[b'for' b'suspecting' b'that' b'all' b'philosophers,' b'in' b'so' b'far'
 b'as' b'they' b'have' b'been'], shape=(12,), dtype=string)
tf.Tensor(
[b'dogmatists,' b'have' b'failed' b'to' b'understand' b'women--that'
 b'the' b'terrible'], shape=(8,), dtype=string)
tf.Tensor(
[b'seriousness' b'and' b'clumsy' b'importunity' b'with' b'which' b'they'
 b'have' b'usually' b'paid'], shape=(10,), dtype=string)
tf.Tensor(
[b'their' b'addresses' b'to' b'Truth,' b'have' b'been' b'unskilled' b'and'
 b'unseemly' b'methods' b'for'], shape=(11,), dtype=string)
tf.Tensor(
[b'winning' b'a' b'woman?' b'Certainly' b'she' b'has' b'never' b'allowed'
 b'herself' b'to' b'be' b'won;' b'and'], shape=(13,), dtype=string)
tf.Tensor(
[b'at' b

Further preprocessing: flatten the per-line token tensors into a single stream of words

In [8]:
raw_dataset = raw_dataset.flat_map(
    lambda x: tf.data.Dataset.from_tensor_slices(x)
)
for el in raw_dataset.take(5):
  print(el.numpy())

b'PREFACE'
b'SUPPOSING'
b'that'
b'Truth'
b'is'


### Vocabulary size

In [9]:
vocab_size = len(set(
    raw_dataset.as_numpy_iterator()
))

print("> vocab size: ",vocab_size )

> vocab size:  18809


### Generating `X` and `y` tuples.

We can split the text into two datasets  as below:
* The first dataset (**X**) is  the **input data** to the model which will hold fixed-size word sequences (*partial sentences*) 
* The second dataset (**y**) is  the **output data** which has only  one-word samples (*next word*)

To create these datasets (**X** ***input sequence of words*** & **y** ***next word***), we can apply `tf.data.Dataset.window()` transformation.

First, define the size of the input sequence: How many words will be in the input?


In [10]:
input_sequence_size = 4

Then, apply the `window()` transformation such that each window will have `input_sequence_size+1` words (|X|+|y|)

In [11]:
sequence_dataset= raw_dataset.window(input_sequence_size+1, drop_remainder=True)
for window in sequence_dataset.take(10):
  print(list(window.as_numpy_iterator()))


[b'PREFACE', b'SUPPOSING', b'that', b'Truth', b'is']
[b'a', b'woman--what', b'then?', b'Is', b'there']
[b'not', b'ground', b'for', b'suspecting', b'that']
[b'all', b'philosophers,', b'in', b'so', b'far']
[b'as', b'they', b'have', b'been', b'dogmatists,']
[b'have', b'failed', b'to', b'understand', b'women--that']
[b'the', b'terrible', b'seriousness', b'and', b'clumsy']
[b'importunity', b'with', b'which', b'they', b'have']
[b'usually', b'paid', b'their', b'addresses', b'to']
[b'Truth,', b'have', b'been', b'unskilled', b'and']


But we just want a regular dataset containing tensors: {[1,2,3,4,5],[6,7,8,9,10],...}, where [...] represents a tensor. The `flat_map()` method returns all the tensors in a nested dataset, after transforming each nested dataset. 
 
If we didn't batch, we would get: {1,2,3,4,5,6,7,8,9,10,...}. By batching each window to its full size, we get {[1,2,3,4,5],[6,7,8,9,10],...} as we desired.


In [12]:
sequence_dataset = sequence_dataset.flat_map(lambda window: window.batch(input_sequence_size + 1))
for ele in sequence_dataset.take(10):
  print(ele)

tf.Tensor([b'PREFACE' b'SUPPOSING' b'that' b'Truth' b'is'], shape=(5,), dtype=string)
tf.Tensor([b'a' b'woman--what' b'then?' b'Is' b'there'], shape=(5,), dtype=string)
tf.Tensor([b'not' b'ground' b'for' b'suspecting' b'that'], shape=(5,), dtype=string)
tf.Tensor([b'all' b'philosophers,' b'in' b'so' b'far'], shape=(5,), dtype=string)
tf.Tensor([b'as' b'they' b'have' b'been' b'dogmatists,'], shape=(5,), dtype=string)
tf.Tensor([b'have' b'failed' b'to' b'understand' b'women--that'], shape=(5,), dtype=string)
tf.Tensor([b'the' b'terrible' b'seriousness' b'and' b'clumsy'], shape=(5,), dtype=string)
tf.Tensor([b'importunity' b'with' b'which' b'they' b'have'], shape=(5,), dtype=string)
tf.Tensor([b'usually' b'paid' b'their' b'addresses' b'to'], shape=(5,), dtype=string)
tf.Tensor([b'Truth,' b'have' b'been' b'unskilled' b'and'], shape=(5,), dtype=string)


Now each item in the dataset is a tensor, so we can split it into **X** & **y** datasets:

In [13]:
sequence_dataset = sequence_dataset.map(lambda window: (window[:-1], window[-1:]))

X_train_ds_raw = sequence_dataset.map(lambda X,y: X)
y_train_ds_raw = sequence_dataset.map(lambda X,y: y)
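
To make the windowing and splitting easier to inspect, here is a plain-Python sketch of the same idea on the first few words of the corpus. `make_windows` and `split_xy` are hypothetical helper names, not part of `tf.data`. The `shift=1` variant is not used in this notebook; it is shown only as an assumption about how overlapping (sliding) windows would yield more training pairs from the same text.

```python
def make_windows(words, window_size, shift=None):
    """Return full windows of `window_size` words, dropping any remainder.

    shift=None mirrors the non-overlapping windows used in this notebook;
    shift=1 is a hypothetical sliding-window variant shown for comparison.
    """
    step = window_size if shift is None else shift
    return [words[i:i + window_size]
            for i in range(0, len(words) - window_size + 1, step)]

def split_xy(window):
    # Same slicing as the lambda above: all but the last word, then the last word.
    return window[:-1], window[-1:]

words = "PREFACE SUPPOSING that Truth is a woman--what then? Is there not".split()
input_sequence_size = 4

pairs = [split_xy(w) for w in make_windows(words, input_sequence_size + 1)]
sliding = [split_xy(w) for w in make_windows(words, input_sequence_size + 1, shift=1)]

for X, y in pairs:
    print(X, "->", y)
print(len(pairs), "non-overlapping pairs vs", len(sliding), "sliding pairs")
```

With non-overlapping windows each word is used in exactly one pair, which keeps the dataset small but discards most of the possible (context, next word) combinations.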


Let's see some input-output pairs:

In [14]:
from prettytable import PrettyTable

In [15]:
def tabulate(column_names, data):
  table = PrettyTable(column_names)
  table.title= "Input output pairs"
  for row in data:
    table.add_row(row)
  print(table)

column_names = ["X", "y"]
row_data = []

for X, y in zip(X_train_ds_raw.take(10),y_train_ds_raw.take(10)):
  row_data.append([X.numpy(), y.numpy()])

tabulate(column_names, row_data)  


+---------------------------------------------------------------+
|                       Input output pairs                      |
+--------------------------------------------+------------------+
|                     X                      |        y         |
+--------------------------------------------+------------------+
| [b'PREFACE' b'SUPPOSING' b'that' b'Truth'] |     [b'is']      |
|    [b'a' b'woman--what' b'then?' b'Is']    |    [b'there']    |
|  [b'not' b'ground' b'for' b'suspecting']   |    [b'that']     |
|   [b'all' b'philosophers,' b'in' b'so']    |     [b'far']     |
|      [b'as' b'they' b'have' b'been']       | [b'dogmatists,'] |
|  [b'have' b'failed' b'to' b'understand']   | [b'women--that'] |
| [b'the' b'terrible' b'seriousness' b'and'] |   [b'clumsy']    |
| [b'importunity' b'with' b'which' b'they']  |    [b'have']     |
| [b'usually' b'paid' b'their' b'addresses'] |     [b'to']      |
|  [b'Truth,' b'have' b'been' b'unskilled']  |     [b'and']     |
+---------

### Reshaping `X` dataset.

Each input (X) is **a vector of strings** (one string per word), but we need to convert it to **a single string** so that we can vectorize it properly.

Below is a Python function that iterates over the given tensor and joins all the strings into a single string:


In [16]:
def convert_string(X: tf.Tensor) -> tf.Tensor:
  # Decode each byte-string word and join them with single spaces.
  words = [ele.numpy().decode('utf-8') for ele in X]

  return tf.convert_to_tensor(" ".join(words))
  

We will apply the `convert_string` function to every element of `X_train_ds_raw`

**Note that** to use a ***Python function*** as a mapping function, you need to wrap it in `tf.py_function()`.

In [17]:
X_train_ds_raw = X_train_ds_raw.map(
    lambda x: tf.py_function(func=convert_string,
     inp=[x], Tout=tf.string)
)


In [18]:
column_names = ["X (sequence)", "y (next word)"]
row_data = []

for X, y in zip(X_train_ds_raw.take(10), y_train_ds_raw.take(10)):
  row_data.append([X.numpy(), y.numpy()])

tabulate(column_names, row_data)   

+----------------------------------------------------+
|                 Input output pairs                 |
+---------------------------------+------------------+
|           X (sequence)          |  y (next word)   |
+---------------------------------+------------------+
| b'PREFACE SUPPOSING that Truth' |     [b'is']      |
|    b'a woman--what then? Is'    |    [b'there']    |
|   b'not ground for suspecting'  |    [b'that']     |
|    b'all philosophers, in so'   |     [b'far']     |
|       b'as they have been'      | [b'dogmatists,'] |
|   b'have failed to understand'  | [b'women--that'] |
| b'the terrible seriousness and' |   [b'clumsy']    |
|  b'importunity with which they' |    [b'have']     |
| b'usually paid their addresses' |     [b'to']      |
|  b'Truth, have been unskilled'  |     [b'and']     |
+---------------------------------+------------------+


However, the shape of X is now unknown, because `tf.py_function` does not propagate static shape information:

In [19]:
print(X_train_ds_raw.element_spec, y_train_ds_raw.element_spec)

TensorSpec(shape=<unknown>, dtype=tf.string, name=None) TensorSpec(shape=(None,), dtype=tf.string, name=None)


To fix this, we can explicitly set the shape with another transformation:

In [20]:
X_train_ds_raw=X_train_ds_raw.map(lambda x: tf.reshape(x, [1]))

In [21]:
X_train_ds_raw.element_spec, y_train_ds_raw.element_spec

(TensorSpec(shape=(1,), dtype=tf.string, name=None),
 TensorSpec(shape=(None,), dtype=tf.string, name=None))

### Text Preprocessing

### What are the preprocessing steps?
The processing of each sample contains the following steps:

* **standardize** each sample (usually lowercasing + punctuation stripping): 

* **split** each sample into substrings (usually words):

  Since in this part we only aim to split the text into **fixed-size word sequences**, we ***do not*** need a **custom split function**.

* **recombine** substrings into tokens (usually ngrams):

  We will keep unigrams (single words).

* **index tokens** (associate a unique int value with each token)

* **transform** each sample using this index, either into a vector of ints or a dense float vector.


In [22]:
def custom_standardization(input_data):
  lowercase     = tf.strings.lower(input_data)
  stripped_html = tf.strings.regex_replace(lowercase, "<br />", " ")
  # Replace digits and hyphens with a space (raw string avoids an invalid "\d" escape).
  stripped_num  = tf.strings.regex_replace(stripped_html, r"[\d-]", " ")
  # Remove all remaining punctuation characters.
  stripped_punc = tf.strings.regex_replace(stripped_num,
                            "[%s]" % re.escape(string.punctuation), "")
  return stripped_punc
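
To see what this standardization does to individual tokens, the following plain-Python mirror can help. It is a sketch for inspection only: `standardize` is a hypothetical name, and it assumes Python's `re` module treats these simple patterns the same way as the regex engine behind `tf.strings.regex_replace`.

```python
import re
import string

def standardize(text):
    """Plain-Python mirror of `custom_standardization`, for inspection only."""
    text = text.lower()
    text = text.replace("<br />", " ")
    text = re.sub(r"[\d-]", " ", text)          # digits and hyphens -> space
    return re.sub("[%s]" % re.escape(string.punctuation), "", text)

for raw in ["Truth,", "woman--what", "women--that", "then?"]:
    print(repr(raw), "->", standardize(raw).split())
```

Because hyphens are replaced with spaces, a raw token such as `women--that` turns into two words after standardization. This would explain why a single whitespace-separated token can later vectorize to more than one index.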

### Text vectorization params


* We can limit the number of distinct words by setting `max_features`
* We set an explicit `sequence_length`, since our  model needs **fixed-size** input sequences.


In [23]:
max_features = 60000                   # Maximum number of distinct words in the vocabulary
sequence_length = input_sequence_size  # Input sequence size
batch_size = 128                       # Batch size


### Create the text vectorization layer

* The **text vectorization layer** is initialized below. 
* We are using this layer to normalize, split, and map strings to integers, so we set our 'output_mode' to '**int**'.


In [24]:
vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=max_features,
    # split --> DEFAULT: split each sample into substrings (usually words)
    output_mode="int",
    output_sequence_length=sequence_length,
)

### Adapt the Text Vectorization layer to the train dataset

Now that the **Text Vectorization layer** has been created, we can call `adapt` on a text-only dataset to create the vocabulary with indexing. 

Batching is not required here, but for very large datasets it avoids keeping extra copies of the dataset in memory.

In [25]:
vectorize_layer.adapt(raw_dataset.batch(batch_size))

In [26]:
print("The size of the vocabulary (number of distinct words): ", vectorize_layer.vocabulary_size())

The size of the vocabulary (number of distinct words):  9903


Let's see the first 10 entries in the vocabulary:

In [27]:
print("The first 10 entries: ", vectorize_layer.get_vocabulary()[:10])

The first 10 entries:  ['', '[UNK]', 'the', 'of', 'and', 'to', 'in', 'is', 'a', 'that']
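
The layout of the vocabulary (index 0 for padding, index 1 for `[UNK]`, then words ordered by frequency) can be imitated with a few lines of plain Python. This is a simplified sketch, not the layer's actual implementation: `build_vocabulary` and `vectorize` are hypothetical names, the corpus is made up, and the way ties between equally frequent words are ordered here may differ from `TextVectorization`.

```python
from collections import Counter

def build_vocabulary(words, max_tokens):
    """Index 0 is padding, index 1 is out-of-vocabulary, then words by frequency."""
    counts = Counter(words)
    most_common = [w for w, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def vectorize(text, vocabulary, sequence_length):
    index = {word: i for i, word in enumerate(vocabulary)}
    ids = [index.get(word, 1) for word in text.split()]   # unknown words -> 1
    ids = ids[:sequence_length]                           # truncate long inputs
    return ids + [0] * (sequence_length - len(ids))       # pad short inputs with 0

corpus = "the will of the truth and the will".split()
vocabulary = build_vocabulary(corpus, max_tokens=10)
print(vocabulary)
print(vectorize("the will to truth", vocabulary, sequence_length=4))
print(vectorize("truth", vocabulary, sequence_length=4))
```

Note how a one-word input is still padded with zeros up to `sequence_length`; the same happens to the single-word `y` samples once they pass through the layer.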


After preparing the **Text Vectorization layer**,  we need a helper function to **convert a given raw text to a Tensor** by using this layer:

In [28]:
def vectorize_text(text):
  text = tf.expand_dims(text, -1)
  return tf.squeeze(vectorize_layer(text))

### Apply the **Text Vectorization** onto X and y datasets

In [29]:
for elem in X_train_ds_raw.take(3):
  print("X: ",elem.numpy())

X:  [b'PREFACE SUPPOSING that Truth']
X:  [b'a woman--what then? Is']
X:  [b'not ground for suspecting']


In [30]:
# Vectorize the data.
X_train_ds = X_train_ds_raw.map(vectorize_text)
y_train_ds = y_train_ds_raw.map(vectorize_text)

In [31]:
column_names = ["X (sequence)", "y (next word)"]
row_data = []

for X, y in zip(X_train_ds.take(10), y_train_ds.take(10)):
  row_data.append([X.numpy(), y.numpy()])

tabulate(column_names, row_data)  

+-----------------------------------------------+
|               Input output pairs              |
+-----------------------+-----------------------+
|      X (sequence)     |     y (next word)     |
+-----------------------+-----------------------+
| [4041  576    9  119] |       [7 0 0 0]       |
|   [  8 147  41 143]   |     [40  0  0  0]     |
| [  15 1083   12 5783] |       [9 0 0 0]       |
|   [ 18 160   6  38]   |   [121   0   0   0]   |
|     [11 30 27 59]     | [2543    0    0    0] |
| [  27 4596    5  263] |   [610   9   0   0]   |
|   [  2 575 701   4]   | [1107    0    0    0] |
| [7812   16   13   30] |     [27  0  0  0]     |
| [ 796 3004   33 5117] |       [5 0 0 0]       |
| [ 119   27   59 5414] |       [4 0 0 0]       |
+-----------------------+-----------------------+


### Convert **y** to a single word representation

Notice that even though we want **y** to be a **single word**, after text vectorization it becomes **a vector of integers** padded to `sequence_length` as well!

We need to fix this.

In [32]:
for elem in y_train_ds.take(2):
  print("shape: ", elem.shape, "\n next_word: ",elem.numpy())

shape:  (4,) 
 next_word:  [7 0 0 0]
shape:  (4,) 
 next_word:  [40  0  0  0]


We can solve this by keeping only the first element of the array. (When a raw token such as `women--that` standardizes into two words, only the first one is kept.)

In [33]:
y_train_ds=y_train_ds.map(lambda x: x[:1])

Now it's as expected.

In [34]:
for elem in y_train_ds.take(2):
  print("shape: ", elem.shape, "\n next_word: ",elem.numpy())

shape:  (1,) 
 next_word:  [7]
shape:  (1,) 
 next_word:  [40]


In [35]:
column_names = ["X (sequence)", "y (next word)"]
row_data = []

for X, y in zip(X_train_ds.take(10), y_train_ds.take(10)):
  row_data.append([X.numpy(), y.numpy()])

tabulate(column_names, row_data)  

+---------------------------------------+
|           Input output pairs          |
+-----------------------+---------------+
|      X (sequence)     | y (next word) |
+-----------------------+---------------+
| [4041  576    9  119] |      [7]      |
|   [  8 147  41 143]   |      [40]     |
| [  15 1083   12 5783] |      [9]      |
|   [ 18 160   6  38]   |     [121]     |
|     [11 30 27 59]     |     [2543]    |
| [  27 4596    5  263] |     [610]     |
|   [  2 575 701   4]   |     [1107]    |
| [7812   16   13   30] |      [27]     |
| [ 796 3004   33 5117] |      [5]      |
| [ 119   27   59 5414] |      [4]      |
+-----------------------+---------------+


### Finalizing the data pipeline.

We want to join the input (X) and the output (y) into a single dataset. However, after the transformation the shapes become unknown.

In [36]:
train_ds =  tf.data.Dataset.zip((X_train_ds,y_train_ds))
train_ds.element_spec

(TensorSpec(shape=<unknown>, dtype=tf.int64, name=None),
 TensorSpec(shape=<unknown>, dtype=tf.int64, name=None))

To fix this we can apply another transformation to set the shapes explicitly.

In [37]:
def _fixup_shape(X, y):
   # Restore the static shapes that were lost in the earlier mapping steps.
   X.set_shape([sequence_length])
   y.set_shape([1])
   return X, y

In [38]:
train_ds = train_ds.map(_fixup_shape)
train_ds.element_spec

(TensorSpec(shape=(4,), dtype=tf.int64, name=None),
 TensorSpec(shape=(1,), dtype=tf.int64, name=None))

Everything is now looking okay.

In [39]:
for el in train_ds.take(5):
  print(el)

(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([4041,  576,    9,  119])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([7])>)
(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([  8, 147,  41, 143])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([40])>)
(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([  15, 1083,   12, 5783])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([9])>)
(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([ 18, 160,   6,  38])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([121])>)
(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([11, 30, 27, 59])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([2543])>)


### Data pipeline optimization.

In [40]:
AUTOTUNE = tf.data.AUTOTUNE
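# Cache before shuffling so that every epoch is reshuffled instead of replaying
# the first epoch's cached order.
train_ds = train_ds.cache().shuffle(buffer_size=512).batch(batch_size, drop_remainder=True).prefetch(buffer_size=AUTOTUNE)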


Checking the shape of our datasets

In [41]:
train_ds.element_spec

(TensorSpec(shape=(128, 4), dtype=tf.int64, name=None),
 TensorSpec(shape=(128, 1), dtype=tf.int64, name=None))

### Basic `LSTM` model.

In [42]:
embedding_dim = 16 

In [43]:
inputs = tf.keras.Input(
    shape=(sequence_length, ),
    dtype="int32"
)
x = keras.layers.Embedding(max_features, embedding_dim)(inputs)
x = keras.layers.LSTM(128, return_sequences=True)(x)
x = keras.layers.Flatten()(x)
outputs =  keras.layers.Dense(max_features, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs, name="model")

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())


Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 4)]               0         
_________________________________________________________________
embedding (Embedding)        (None, 4, 16)             960000    
_________________________________________________________________
lstm (LSTM)                  (None, 4, 128)            74240     
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 60000)             30780000  
Total params: 31,814,240
Trainable params: 31,814,240
Non-trainable params: 0
_________________________________________________________________
None


In [44]:
model.fit(train_ds, epochs=1) 



<keras.callbacks.History at 0x7f7cc57ee3d0>

In [45]:
def sample(preds, temperature=0.2):
  # helper function to sample an index from a probability array
  preds=np.squeeze(preds)
  
  preds = np.asarray(preds).astype("float64")
  preds = np.log(preds) / temperature
  exp_preds = np.exp(preds)
  preds = exp_preds / np.sum(exp_preds)

  probas = np.random.multinomial(1, preds, 1)
  return np.argmax(probas)
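
The effect of `temperature` in `sample` can be checked on a small, made-up distribution. The sketch below repeats only the re-weighting part (log, divide by temperature, exponentiate, renormalize) in plain Python; `apply_temperature` is a hypothetical helper name.

```python
import math

def apply_temperature(probs, temperature):
    """Re-weight a probability list the same way `sample` does before drawing."""
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.5, 0.3, 0.2]
for temperature in [0.2, 1.0, 1.2]:
    reweighted = apply_temperature(probs, temperature)
    print(temperature, [round(p, 3) for p in reweighted])
```

A temperature below 1 sharpens the distribution towards the most likely word, a temperature of 1 leaves it unchanged, and a temperature above 1 flattens it so that unlikely words are drawn more often.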

In [46]:
def generate_text(model, seed_original, step):
    seed = vectorize_text(seed_original)
    decode_sentence(seed.numpy().squeeze())
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print("...Diversity:", diversity)
        # Restart from the original seed for every temperature.
        seed = vectorize_text(seed_original).numpy().reshape(1, -1)

        generated = seed
        for i in range(step):
            predictions = model.predict(seed)
            # Sample the next word index from the predicted distribution.
            next_index = sample(predictions, diversity)
            generated = np.append(generated, next_index)
            # The last `sequence_length` indices become the next input.
            seed = generated[-sequence_length:].reshape(1, sequence_length)
        decode_sentence(generated)
    

In [47]:
def decode_sentence(encoded_sentence):
  # Fetch the vocabulary once instead of once per word.
  vocabulary = vectorize_layer.get_vocabulary()
  decoded_sentence = [vocabulary[word] for word in encoded_sentence]
  sentence = ' '.join(decoded_sentence)
  print(sentence)
  return sentence


In [48]:
generate_text(model, 
              "PREFACE SUPPOSING that Truth",  
              100)

preface supposing that truth
...Diversity: 0.2
preface supposing that truth the the the the the the the the the the the the the the the the the the the in of the the of the of the the the the the of the the the the the the the of the the the the the the the the the the the the the the the the of the the the the the the the the the the the the of the the the the the the the of the the the the the the the the the the the of the the the the the the the the the the
...Diversity: 0.5
preface supposing that truth and for the of the of of to and to and the so the the for the the the the the a of the of the of that the the verdict the for the the the that and that of of the he the the the to to in is the in the in the the to the the the of of that to the and of the the to that the the to as the the of as the and not that as the the the the is as by good of to the the the that the he
...Diversity: 1.0


IndexError: ignored
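
The traceback is not shown, so the cause of this `IndexError` is an assumption. One plausible explanation is that the output layer has `max_features` (60000) units while the vocabulary holds only 9903 entries, so at higher temperatures `sample` can return an index that has no vocabulary entry. Under that assumption, a guarded decoder along these lines would avoid the crash (`safe_decode` is a hypothetical name, and the toy vocabulary stands in for `vectorize_layer.get_vocabulary()`):

```python
def safe_decode(encoded_sentence, vocabulary, unknown_token="[UNK]"):
    """Map indices to words, substituting `unknown_token` for out-of-range indices."""
    words = []
    for index in encoded_sentence:
        index = int(index)
        if 0 <= index < len(vocabulary):
            words.append(vocabulary[index])
        else:
            words.append(unknown_token)
    return " ".join(words)

# Toy vocabulary standing in for vectorize_layer.get_vocabulary().
vocabulary = ["", "[UNK]", "the", "of", "and"]
print(safe_decode([2, 3, 59999, 4], vocabulary))
```

If the assumption holds, another option would be to size the final `Dense` layer to the actual vocabulary size, so that the model cannot produce such indices in the first place.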

### Encoder-Decoder Model with Attention

In [49]:
LSTMoutputDimension=16

In [50]:
from tensorflow.keras import layers
import tensorflow.keras.backend as K
class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units, verbose=0):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)
    self.verbose= verbose

  def call(self, query, values):
    if self.verbose:
      print('\n******* Bahdanau Attention STARTS******')
      print('query (decoder hidden state): (batch_size, hidden size) ', query.shape)
      print('values (encoder all hidden state): (batch_size, max_len, hidden size) ', values.shape)

    # query hidden state shape == (batch_size, hidden size)
    # query_with_time_axis shape == (batch_size, 1, hidden size)
    # values shape == (batch_size, max_len, hidden size)
    # we are doing this to broadcast addition along the time axis to calculate the score
    query_with_time_axis = tf.expand_dims(query, 1)
    
    if self.verbose:
      print('query_with_time_axis:(batch_size, 1, hidden size) ', query_with_time_axis.shape)

    # score shape == (batch_size, max_length, 1)
    # we get 1 at the last axis because we are applying score to self.V
    # the shape of the tensor before applying self.V is (batch_size, max_length, units)
    score = self.V(tf.nn.tanh(
        self.W1(query_with_time_axis) + self.W2(values)))
    if self.verbose:
      print('score: (batch_size, max_length, 1) ',score.shape)
    # attention_weights shape == (batch_size, max_length, 1)
    attention_weights = tf.nn.softmax(score, axis=1)
    if self.verbose:
      print('attention_weights: (batch_size, max_length, 1) ',attention_weights.shape)
    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    if self.verbose:
      print('context_vector before reduce_sum: (batch_size, max_length, hidden_size) ',context_vector.shape)
    context_vector = tf.reduce_sum(context_vector, axis=1)
    if self.verbose:
      print('context_vector after reduce_sum: (batch_size, hidden_size) ',context_vector.shape)
      print('\n******* Bahdanau Attention ENDS******')
    return context_vector, attention_weights
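
For reference, the computation in `call` corresponds to the usual additive (Bahdanau-style) attention equations. The symbols are chosen here for illustration: $q$ is the query (decoder hidden state), $h_t$ is the encoder hidden state at time step $t$, $T$ is the input sequence length, and $W_1$, $W_2$, $v$ stand for the three `Dense` layers (their bias terms are omitted for brevity).

$$
\begin{aligned}
e_t &= v^\top \tanh\left(W_1 q + W_2 h_t\right) \\
\alpha_t &= \frac{\exp(e_t)}{\sum_{s=1}^{T} \exp(e_s)} \\
c &= \sum_{t=1}^{T} \alpha_t \, h_t
\end{aligned}
$$

Here $e_t$ is `score`, $\alpha_t$ is `attention_weights` (softmax over the time axis), and $c$ is `context_vector` after `tf.reduce_sum`.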


In [51]:
verbose= 0 
#See all debug messages

#batch_size=1
if verbose:
  print('***** Model Hyper Parameters *******')
  print('latentSpaceDimension: ', LSTMoutputDimension)
  print('batch_size: ', batch_size)
  print('sequence length (n_timesteps_in): ', sequence_length )
  print('n_features: ', embedding_dim)

  print('\n***** TENSOR DIMENSIONS *******')

# The first part is encoder
# An integer input for vocab indices.
encoder_inputs = tf.keras.Input(shape=(sequence_length,), dtype="int64", name='encoder_inputs')
#encoder_inputs = Input(shape=(n_timesteps_in, n_features), name='encoder_inputs')

# Next, we add a layer to map those vocab indices into a space of dimensionality
# 'embedding_dim'.
embedding = layers.Embedding(max_features, embedding_dim)
embedded= embedding(encoder_inputs)

encoder_lstm = layers.LSTM(LSTMoutputDimension,return_sequences=True, return_state=True,  name='encoder_lstm')
encoder_outputs, encoder_state_h, encoder_state_c = encoder_lstm(embedded)

if verbose:
  print ('Encoder output shape: (batch size, sequence length, latentSpaceDimension) {}'.format(encoder_outputs.shape))
  print ('Encoder Hidden state shape: (batch size, latentSpaceDimension) {}'.format(encoder_state_h.shape))
  print ('Encoder Cell state shape: (batch size, latentSpaceDimension) {}'.format(encoder_state_c.shape))
# initial context vector is the states of the encoder
encoder_states = [encoder_state_h, encoder_state_c]
if verbose:
  print(encoder_states)
# Set up the attention layer
attention= BahdanauAttention(LSTMoutputDimension, verbose=verbose)


# Set up the decoder layers
decoder_inputs = layers.Input(shape=(1, (embedding_dim+LSTMoutputDimension)),name='decoder_inputs')
decoder_lstm = layers.LSTM(LSTMoutputDimension,  return_state=True, name='decoder_lstm')
decoder_dense = layers.Dense(max_features, activation='softmax',  name='decoder_dense')

all_outputs = []

# 1 initial decoder's input data
# Prepare the initial decoder input, which contains only the start symbol.
# Note that it is a constant one-hot vector inside the model,
# that is, [1 0 0 0 0 0 0 0 0 0] is the first input for each loop;
# one-hot encoded zero (0) is the start symbol.
inputs = np.zeros((batch_size, 1, max_features))
inputs[:, 0, 0] = 1 
# 2 initial decoder's state
# encoder's last hidden state + last cell state
decoder_outputs = encoder_state_h
states = encoder_states
if verbose:
  print('initial decoder inputs: ', inputs.shape)

# decoder will only process one time step at a time.
for _ in range(1):

    # 3 pay attention
    # create the context vector by applying attention to 
    # decoder_outputs (last hidden state) + encoder_outputs (all hidden states)
    context_vector, attention_weights=attention(decoder_outputs, encoder_outputs)
    if verbose:
      print("Attention context_vector: (batch size, units) {}".format(context_vector.shape))
      print("Attention weights : (batch_size, sequence_length, 1) {}".format(attention_weights.shape))
      print('decoder_outputs: (batch_size,  latentSpaceDimension) ', decoder_outputs.shape )

    context_vector = tf.expand_dims(context_vector, 1)
    if verbose:
      print('Reshaped context_vector: ', context_vector.shape )

    # 4. concatenate the input + context vector to form the next decoder input
    inputs = tf.concat([context_vector, tf.dtypes.cast(inputs, tf.float32)], axis=-1)
    
    if verbose:
      print('After concat inputs: (batch_size, 1, n_features + hidden_size): ',inputs.shape )

    # 5. passing the concatenated vector to the LSTM
    # Run the decoder on one timestep with attended input and previous states
    decoder_outputs, state_h, state_c = decoder_lstm(inputs,
                                                     initial_state=states)
    #decoder_outputs = tf.reshape(decoder_outputs, (-1, decoder_outputs.shape[2]))
  
    outputs = decoder_dense(decoder_outputs)
    # 6. Use the last hidden state to predict the output
    # save the current prediction
    # we will concatenate all predictions later
    outputs = tf.expand_dims(outputs, 1)
    all_outputs.append(outputs)
    # 7. Reinject the output (prediction) as inputs for the next loop iteration
    # as well as update the states
    inputs = outputs
    states = [state_h, state_c]


# 8. After running the decoder for the maximum number of time steps,
# we have a list of predictions for the output sequence.
# Convert the list to an output array by concatenating all predictions,
# giving the shape [batch_size, timesteps, features]
decoder_outputs = layers.Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)

# 9. Define the model (it is compiled in the next cell)
model_encoder_decoder_Bahdanau_Attention = keras.Model(encoder_inputs, 
                                                 decoder_outputs, name='model_encoder_decoder')


In [52]:
model_encoder_decoder_Bahdanau_Attention.compile(optimizer= tf.keras.optimizers.RMSprop(learning_rate=0.001), 
                                                 loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [53]:
model_encoder_decoder_Bahdanau_Attention.fit(train_ds, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f7cd124cfd0>

In [None]:
model_encoder_decoder_Bahdanau_Attention.save("model_encoder_decoder_Bahdanau_Attention")

In [None]:
# The first part is encoder
# An integer input for vocab indices.
encoder_inputs = tf.keras.Input(shape=(sequence_length,), dtype="int64", name='encoder_inputs')

embedded= embedding(encoder_inputs)
encoder_outputs, encoder_state_h, encoder_state_c = encoder_lstm(embedded)

encoder_states = [encoder_state_h, encoder_state_c]

all_outputs = []

inputs = np.zeros((1, 1, max_features))
inputs[:, 0, 0] = 1 

decoder_outputs = encoder_state_h
states = encoder_states

context_vector, attention_weights=attention(decoder_outputs, encoder_outputs)
context_vector = tf.expand_dims(context_vector, 1)
inputs = tf.concat([context_vector, tf.dtypes.cast(inputs, tf.float32)], axis=-1)
decoder_outputs, state_h, state_c = decoder_lstm(inputs, initial_state=states)
outputs = decoder_dense(decoder_outputs)
outputs = tf.expand_dims(outputs, 1)


# 9. Define the prediction model (it reuses the trained layers)
model_encoder_decoder_Bahdanau_Attention_PREDICTION = keras.Model(encoder_inputs, 
                                                 outputs, name='model_encoder_decoder')


In [57]:
generate_text(model_encoder_decoder_Bahdanau_Attention_PREDICTION, 
              "PREFACE SUPPOSING that Truth",  
              100)

preface supposing that truth
...Diversity: 0.2
preface supposing that truth the the the the and the the of the the the the the the the the the the of the the the the the the the of the the the the the the the the the the the the the the the of the the the the the the and the the the of of the of the the the the the the the the and the the the the the the and the the the the the the of the the of of the the the the the the of the the the the of the the the the
...Diversity: 0.5
preface supposing that truth and who the to of of as for the of and the the a the for to and man that the all that of and the the the of the of of of the in and the of this the the and of to and and the for is the the of the have of the the the was his of and of and be the the such the that of the the the to be the the in the the of of it the it of the the to to the of a or the to of the and
...Diversity: 1.0


IndexError: ignored

### Conclusion

The model did not perform well: it mostly repeats the most frequent words. Next, we will look at a better model for this same task.