<a href="https://colab.research.google.com/github/Adam-Aber/Text-Generation-with-RNNs/blob/main/Labs/Lab4/Lab4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment: Text Generation with RNNs

In this assignment, you will explore text generation using deep learning models, specifically Recurrent Neural Networks (RNNs). Text generation is a fascinating task where the goal is to train a model that can predict the next word in a sequence, ultimately generating coherent and contextually accurate sentences or paragraphs.

You will work with a corpus of text data provided in the tutorial (from TensorFlow), which contains a large collection of Shakespeare’s works. Your objective is to implement a text generation model using RNNs and to evaluate its performance based on various metrics.

<img src="https://drek4537l1klr.cloudfront.net/teofili2/Figures/03fig09_alt.jpg" alt="Drawing"/>

**The models include:**
- Deep RNN (LSTM)
- Deep RNN (GRU)
- Bidirectional RNN


**Evaluation:**

Evaluate the performance of the text generation model using several metrics:
- Perplexity: A measure of how well the probability distribution predicted by the model aligns with the actual distribution of words in the text.
- Generated Text Quality: Subjectively evaluate the quality of the generated text by considering grammar, coherence, and creativity. This can be done by visually inspecting the generated sequences.
- BLUE Score: Character-level BLEU: BLEU can be applied at the character level by treating each character as an n-gram. This is particularly useful when the task involves generating character sequences (like poetry, code, or fine-grained character-based generation).
For example, the sequence “hello” could be evaluated by comparing 1-grams (e.g., “h”, “e”, “l”, “l”, “o”) or higher-order n-grams (e.g., “he”, “el”, “ll”, “lo”).

**Goals:**

- Compare the performance of GRU and LSTM. Is there a significant difference in the results?
- Compare the performance of the unidirectional and bidirectional RNN models.
Which model produces better results?
- Discuss the impact of bidirectional RNNs on text generation tasks.

# Import TensorFlow and other libraries

In [1]:
import tensorflow as tf
import numpy as np
import nltk
from nltk.translate.bleu_score import sentence_bleu
import os
import time

# Dataset

The Shakespeare dataset used for text generation contains a collection of works by William Shakespeare, primarily in the form of plays and sonnets. The text is used to train language models, offering an ideal example for character-level modeling due to its rich, complex language. The dataset is often preprocessed by tokenizing it into individual characters, enabling models to learn the sequential relationships between characters for generating coherent and contextually appropriate text.

To access the dataset, you can use TensorFlow's get_file method as follow:

You can inspect the dataset on [Kaggle](https://www.kaggle.com/datasets/adarshpathak/shakespeare-text/data).

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
[1m1115394/1115394[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step
Length of text: 1115394 characters


In [None]:
# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



# Process the text

### Vectorize the text

Before training, you need to convert the strings to a numerical representation.

The `tf.keras.layers.StringLookup` layer can convert each character into a numeric ID. It just needs the text to be split into tokens first.

In [3]:
example_texts = ['abcdefg', 'xyz']

chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>


Now create the [`tf.keras.layers.StringLookup`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup) layer:

Since the goal of this assignment is to generate text, it will also be important to invert this representation and recover human-readable strings from it. For this you can use tf.keras.layers.StringLookup(..., invert=True).

Note: Here instead of passing the original vocabulary generated with sorted(set(text)) use the get_vocabulary() method of the tf.keras.layers.StringLookup layer so that the [UNK] tokens is set the same way.

In [9]:
def get_string_lookup_layers(vocab):
    """
    Creates StringLookup layers for encoding characters to IDs and decoding IDs back to characters.

    Args:
        vocab (list): List of unique characters in the dataset.

    Returns:
        ids_from_chars (tf.keras.layers.StringLookup): Converts characters to IDs.
        chars_from_ids (tf.keras.layers.StringLookup): Converts IDs back to characters.
    """
    ids_from_chars = tf.keras.layers.StringLookup(vocabulary=list(vocab), mask_token=None)
    chars_from_ids = tf.keras.layers.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)
    return ids_from_chars, chars_from_ids

In [5]:
def text_from_ids(ids, chars_from_ids):
    """
    Converts a sequence of character IDs into a human-readable string.

    Args:
        ids (tf.Tensor): Tensor of character IDs.
        chars_from_ids (tf.keras.layers.StringLookup): StringLookup layer to decode IDs.

    Returns:
        tf.Tensor: Decoded string.
    """
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

# Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain seq_length characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of seq_length+1. For example, say seq_length is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this first use the tf.data.Dataset.from_tensor_slices function to convert the text vector into a stream of character indices.

For training you'll need a dataset of (input, label) pairs. Where input and label are sequences. At each time step the input is the current character and the label is the next character.

Here's a function that takes a sequence as input, duplicates, and shifts it to align the input and label for each timestep:

In [6]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

Convert Text to Numerical Sequences

In [10]:
# Initialize mapping layers
vocab = sorted(set(text))
ids_from_chars, chars_from_ids = get_string_lookup_layers(vocab)



# Convert text to character IDs
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))

# Create a dataset from the character IDs
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

Create Sequences for Training

In [11]:
seq_length = 100  # Length of each training sequence

# Batch sequences (each sequence is seq_length + 1)
sequences = ids_dataset.batch(seq_length + 1, drop_remainder=True)

# Map dataset to input-target format
dataset = sequences.map(split_input_target)

# Print sample input-output pairs
for input_example, target_example in dataset.take(1):
    print("Input :", tf.strings.reduce_join(chars_from_ids(input_example)).numpy().decode('utf-8'))
    print("Target:", tf.strings.reduce_join(chars_from_ids(target_example)).numpy().decode('utf-8'))


Input : First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You
Target: irst Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You 


# Create RNN Models

Before training our text generation models, we need to set up key parameters and prepare the dataset. These parameters control the size of batches, buffer size for shuffling, embedding dimensions, and the number of units in the recurrent layer.

Customizing Parameters for Optimization

- Increase RNN_UNITS if the model is underfitting (not capturing enough detail).
- Decrease BATCH_SIZE if memory usage is too high (e.g., when using large RNN units).
- Adjust EMBEDDING_DIM to experiment with the quality of learned character representations.

Tip: Start with these values, then fine-tune based on model performance and available computational resources!

In [12]:
BATCH_SIZE = 64
BUFFER_SIZE = 10000
VOCAB_SIZE = len(vocab) + 1
DATASET_SIZE = sum(1 for _ in dataset)

shuffled_dataset = dataset.shuffle(BUFFER_SIZE)

train_dataset = shuffled_dataset.take(int(0.9 * DATASET_SIZE) )
val_dataset = shuffled_dataset.skip(int(0.1 * DATASET_SIZE) )

# Prepare dataset for training
training_dataset = (train_dataset
                 .batch(BATCH_SIZE, drop_remainder=True)
                 .prefetch(tf.data.experimental.AUTOTUNE))
# Prepare dataset for validation
validation_dataset = (val_dataset
               .batch(BATCH_SIZE, drop_remainder=True)
               .prefetch(tf.data.experimental.AUTOTUNE))

# LSTM-Based Model

Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) designed to capture long-range dependencies in sequential data. Unlike traditional RNNs, which struggle with vanishing gradients, LSTMs use gates (input, forget, and output gates) to regulate the flow of information, making them highly effective for text generation tasks.

This function defines an LSTM-based neural network for text generation.

**Function Overview:**
* [`tf.keras.layers.Embedding`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding): This is the input layer, consisting of a trainable lookup table that maps the numbers of each character to a vector with `embedding_dim` dimensions.
* [`tf.keras.layers.LSTM`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM): Our LSTM network, with size `units=rnn_units`.
* [`tf.keras.layers.Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense): The output layer, with `vocab_size` outputs.


<img src="https://raw.githubusercontent.com/aamini/introtodeeplearning/2019/lab1/img/lstm_unrolled-01-01.png" alt="Drawing"/>

In [32]:
def build_lstm_model(vocab_size, embedding_dim, rnn_units, batch_size):
    """
    Builds an LSTM-based text generation model with stacked LSTM layers.

    Parameters:
    - vocab_size (int): Number of unique characters in the vocabulary.
    - embedding_dim (int): Dimension of the word/character embeddings.
    - rnn_units (int): Number of units in the LSTM layer.
    - batch_size (int): Number of sequences processed in parallel.

    Returns:
    - tf.keras.Model: Compiled LSTM model.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

**Customization & Optimization**

These experiments are required to improve model performance. You must conduct the following experiments and document your findings:

- Increase RNN_UNITS : This enhances the model’s ability to recognize deeper patterns but may increase training time.
- Experiment with EMBEDDING_DIM : Adjusting this can improve the quality of character representation.
- Stack multiple LSTM layers : This can help the model understand text more effectively.

**Submission Requirements**

- Perform at least three trials varying the parameters above.
- Keep the most performing plots and outputs in your Jupyter Notebook.


# GRU-Based Model

What is GRU?

Gated Recurrent Units (GRU) are a simplified version of LSTMs that combine the forget and input gates into a single update gate. This makes GRUs:

- Faster and more efficient than LSTMs
- Perform well on shorter sequences
- Require fewer computational resources

**Model Comparison (LSTM vs. GRU)**

To evaluate the differences between LSTM and GRU models, compare them using:

- Model Size & Number of Parameters : Use model.summary() to check the number of trainable parameters in each model.
- Training Speed & Efficiency : Measure training time per epoch for both models.

In [21]:
def build_gru_model(vocab_size, embedding_dim, rnn_units, batch_size):
    """
    Builds a GRU-based text generation model.

    Parameters:
    - vocab_size (int): Number of unique characters in the vocabulary.
    - embedding_dim (int): Dimension of the word/character embeddings.
    - rnn_units (int): Number of units in the GRU layer.
    - batch_size (int): Number of sequences processed in parallel.

    Returns:
    - tf.keras.Model: Compiled GRU model.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

# Bidirectional Model


<img src="https://www.researchgate.net/publication/342646275/figure/fig4/AS:962238546464790@1606426955772/Comparison-between-LSTM-and-Bi-LSTM-networks-recreated-after-33.png" alt="Drawing"/>


**What is Bidirectional RNN?**

A Bidirectional Recurrent Neural Network (BiRNN) is an extension of a standard RNN that processes input sequences in both forward and backward directions. This means that at each time step, the model considers both past (left-to-right) and future (right-to-left) context, leading to better performance in many sequence-related tasks, including text generation.

**Why Use Bidirectional RNNs?**

Bidirectional RNNs are used because they capture dependencies in the input data from both directions, which is crucial for understanding context. This makes them particularly useful in tasks like language processing, where the meaning of a word can depend on both the words that come before and after it.

For example, in the sentence "The cat sat on the mat," the word "mat" is influenced by both the preceding words ("The cat sat on the") and what could potentially follow (e.g., "and looked at the mouse").

Example

Let's consider a sentence: "He opened the door."

Simple RNN: It processes the sentence from left to right, word by word:

He → opened → the → door
Each word is processed based on the previous word.

Bidirectional RNN: It processes the sentence in both directions:

Forward pass: He → opened → the → door

Backward pass: door → the → opened → He

The output at each time step is a combination of the information from both the forward and backward passes, allowing the network to understand the context better. For instance, knowing that "door" follows "the" helps confirm that "opened" likely refers to a physical action, not a metaphorical one.

**Choosing the Base RNN for Bidirectional Processing:**

One key flexibility of Bidirectional RNNs is that you can choose any recurrent architecture (LSTM, GRU, Simple RNN) as the base model.

- Bidirectional LSTM: Best for handling long-range dependencies.
- Bidirectional GRU: More computationally efficient than LSTM.
- Bidirectional Simple RNN: Less commonly used due to vanishing gradient issues.

Note: To determine the best base model for the Bidirectional RNN, you must evaluate the results from your previous experiments with different architectures (eg. LSTM and GRU).

Review previous results from training LSTM and GRU models.
Compare their performance in terms of:

- Training time per epoch
- Model size (number of parameters)
- Loss and accuracy on validation data
- Quality of generated text

Based on your findings, select the best-performing model as the base architecture for the Bidirectional RNN.


In [62]:
def build_birnn_model(vocab_size, embedding_dim, rnn_units, batch_size, rnn_type="GRU"):
    """
    Builds a Bidirectional RNN-based text generation model.

    Parameters:
    - vocab_size (int): Number of unique characters in the vocabulary.
    - embedding_dim (int): Dimension of the word/character embeddings.
    - rnn_units (int): Number of units in the RNN layer.
    - batch_size (int): Number of sequences processed in parallel.
    - rnn_type (str): Type of RNN to use as the base (options: "LSTM", "GRU").

    Returns:
    - tf.keras.Model: Compiled Bidirectional RNN model.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
    ])

    if rnn_type == "LSTM":
        rnn_layer = tf.keras.layers.LSTM(rnn_units,
                                        return_sequences=True,
                                        stateful=True,
                                        recurrent_initializer='glorot_uniform')
    elif rnn_type == "GRU":
        rnn_layer = tf.keras.layers.GRU(rnn_units,
                                       return_sequences=True,
                                       stateful=True,
                                       recurrent_initializer='glorot_uniform')
    else:
        raise ValueError("rnn_type must be 'LSTM' or 'GRU'")

    model.add(tf.keras.layers.Bidirectional(rnn_layer))
    model.add(tf.keras.layers.Dense(vocab_size))

    return model

# Evaluating the Models

Model Selection

In [18]:
EMBEDDING_DIM = 256
RNN_UNITS = 1024

In [48]:
# Choose model type
model_type = "LSTM"

if model_type == "LSTM":
    model = build_lstm_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
elif model_type == "GRU":
    model = build_gru_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
elif model_type == "BiLSTM":
    model = build_birnn_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)

print(f"Using {model_type} model for training.")

Using LSTM model for training.


In [54]:
# Choose model type
model_type = "GRU"

if model_type == "LSTM":
    model = build_lstm_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
elif model_type == "GRU":
    model = build_gru_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
elif model_type == "BiLSTM":
    model = build_birnn_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)

print(f"Using {model_type} model for training.")

Using GRU model for training.


In [64]:
# Choose model type
model_type = "BiGRU"

if model_type == "LSTM":
    model = build_lstm_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
elif model_type == "GRU":
    model = build_gru_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
elif model_type == "BiGRU":
    model = build_birnn_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)

print(f"Using {model_type} model for training.")

Using BiGRU model for training.


Compile & Train the Model

In [55]:
def compile_and_train(model, path, EPOCHS = 30):
  # Define loss function
  def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

  # Compile the model
  model.compile(optimizer='adam', loss=loss)

  # Ensure checkpoint directory exists
  checkpoint_dir = f'./{path}'
  os.makedirs(checkpoint_dir, exist_ok=True)

  checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

  checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix + '.weights.h5',
    save_weights_only=True
  )

  # Train the model
  history = model.fit(
    training_dataset,
    validation_data=validation_dataset,
    epochs=EPOCHS,
    callbacks=[checkpoint_callback]
)

  return history

In [50]:
print(f"Compiling and training the {model_type} model...")
history = compile_and_train(model, path=f'{model_type.lower()}_checkpoint', EPOCHS=30)

print(f"Training finished for the {model_type} model.")

Compiling and training the LSTM model...
Epoch 1/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m75s[0m 450ms/step - loss: 3.5830 - val_loss: 2.6883
Epoch 2/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m74s[0m 469ms/step - loss: 2.4798 - val_loss: 2.1343
Epoch 3/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m70s[0m 444ms/step - loss: 2.0632 - val_loss: 1.8667
Epoch 4/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m70s[0m 443ms/step - loss: 1.8193 - val_loss: 1.6887
Epoch 5/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m75s[0m 471ms/step - loss: 1.6623 - val_loss: 1.5670
Epoch 6/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m76s[0m 479ms/step - loss: 1.5513 - val_loss: 1.4936
Epoch 7/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m74s[0m 465ms/step - loss: 1.4819 - val_loss: 1.4362
Epoch 8/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m78s[0m 489ms/step - loss: 1

In [56]:
print(f"Compiling and training the {model_type} model...")
history = compile_and_train(model, path=f'{model_type.lower()}_checkpoint', EPOCHS=30)

print(f"Training finished for the {model_type} model.")

Compiling and training the GRU model...
Epoch 1/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 101ms/step - loss: 3.1680 - val_loss: 2.0566
Epoch 2/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 96ms/step - loss: 1.9648 - val_loss: 1.7360
Epoch 3/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 98ms/step - loss: 1.6878 - val_loss: 1.5615
Epoch 4/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 94ms/step - loss: 1.5431 - val_loss: 1.4630
Epoch 5/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 97ms/step - loss: 1.4577 - val_loss: 1.3987
Epoch 6/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 95ms/step - loss: 1.3980 - val_loss: 1.3497
Epoch 7/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 95ms/step - loss: 1.3482 - val_loss: 1.3073
Epoch 8/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 95ms/step - loss: 1.3104 - 

In [65]:
print(f"Compiling and training the {model_type} model...")
history = compile_and_train(model, path=f'{model_type.lower()}_checkpoint', EPOCHS=30)

print(f"Training finished for the {model_type} model.")

Compiling and training the BiGRU model...
Epoch 1/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m35s[0m 202ms/step - loss: 2.1531 - val_loss: 0.0854
Epoch 2/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m32s[0m 181ms/step - loss: 0.0769 - val_loss: 0.0635
Epoch 3/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m31s[0m 192ms/step - loss: 0.0614 - val_loss: 0.0570
Epoch 4/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m32s[0m 190ms/step - loss: 0.0561 - val_loss: 0.0524
Epoch 5/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m30s[0m 184ms/step - loss: 0.0517 - val_loss: 0.0490
Epoch 6/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m31s[0m 187ms/step - loss: 0.0490 - val_loss: 0.0469
Epoch 7/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m31s[0m 186ms/step - loss: 0.0469 - val_loss: 0.0440
Epoch 8/30
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m31s[0m 187ms/step - loss: 

# Generate Text Using the Model

In [44]:
class OneStepTextGenerator(tf.keras.Model):
    def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
        """
        Initializes the OneStepTextGenerator.

        Parameters:
        - model: Trained text generation model (e.g., LSTM or GRU).
        - chars_from_ids: Function mapping character IDs to characters.
        - ids_from_chars: Function mapping characters to their respective IDs.
        - temperature (float): Controls randomness in text generation.
          -> Higher temperature (> 1.0): Produces more diverse and unpredictable text.
          -> Lower temperature (< 1.0): Makes the model more confident but results in repetitive text.
          -> Temperature = 1.0: Standard behavior without biasing randomness.
        """
        super().__init__()
        self.model = model
        self.chars_from_ids = chars_from_ids
        self.ids_from_chars = ids_from_chars
        self.temperature = temperature

    def generate_text(self, start_string, num_generate=1000):
        """
        Generates text one character at a time based on a starting string.

        Parameters:
        - start_string (str): Initial text prompt.
        - num_generate (int): Number of characters to generate.

        Returns:
        - str: Generated text.
        """
        # Convert start string to tensor representation
        input_eval = tf.expand_dims(self.ids_from_chars(tf.strings.unicode_split(start_string, 'UTF-8')), 0)
        text_generated = []

        # Initial hidden state (None allows automatic initialization)
        states = None

        for _ in range(num_generate):
            # Get model predictions and hidden state
            # predictions, states = self.model(input_eval, states=states, return_state=True)
            predictions = self.model(input_eval)

            # Adjust predictions using temperature
            predictions = predictions[:, -1, :] / self.temperature

            # Sample the next character ID from a probability distribution
            predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

            # Use the predicted ID as next input
            input_eval = tf.expand_dims([predicted_id], 0)

            # Convert ID back to character and append to output
            text_generated.append(self.chars_from_ids(predicted_id).numpy().decode('utf-8'))

        return start_string + ''.join(text_generated)


# Evaluate Model Performance

In [45]:
# Define loss function
loss = tf.keras.losses.sparse_categorical_crossentropy

In [41]:
# Build and summarize LSTM model
lstm_model_compare = build_lstm_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
lstm_model_compare.build(tf.TensorShape([BATCH_SIZE, seq_length]))
print("LSTM Model Summary")
lstm_model_compare.summary()

# Measure training time for one epoch for LSTM
print("LSTM Model:")
start_time_lstm = time.time()
compile_and_train(lstm_model_compare, path='lstm_compare_checkpoint', EPOCHS=1)
end_time_lstm = time.time()
print(f"Time per epoch (LSTM): {end_time_lstm - start_time_lstm:.2f} seconds")

LSTM Model Summary


LSTM Model:
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m75s[0m 456ms/step - loss: 3.3147 - val_loss: 2.2102
Time per epoch (LSTM): 75.15 seconds


In [46]:
# Build and summarize GRU model
gru_model_compare = build_gru_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
gru_model_compare.build(tf.TensorShape([BATCH_SIZE, seq_length]))
print("\n GRU Model Summary")
gru_model_compare.summary()

# Measure training time for one epoch for GRU
print("\nGRU Model:")
start_time_gru = time.time()
compile_and_train(gru_model_compare, path='gru_compare_checkpoint', EPOCHS=1)
end_time_gru = time.time()
print(f"Time per epoch (GRU): {end_time_gru - start_time_gru:.2f} seconds")


 GRU Model Summary



GRU Model:
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 98ms/step - loss: 3.1704 - val_loss: 2.0649
Time per epoch (GRU): 18.20 seconds


In [66]:
# Build and summarize BiGRU model
bigru_model_compare = build_birnn_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE, rnn_type="GRU")
bigru_model_compare.build(tf.TensorShape([BATCH_SIZE, seq_length]))
print("\n BiGRU Model Summary")
bigru_model_compare.summary()

# Measure training time for one epoch for BiGRU
print("BiGRU Model:")
start_time_bigru = time.time()
compile_and_train(bigru_model_compare, path='bigru_compare_checkpoint', EPOCHS=1)
end_time_bigru = time.time()
print(f"Time per epoch (BiGRU): {end_time_bigru - start_time_bigru:.2f} seconds")


 BiGRU Model Summary


BiGRU Model:
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m32s[0m 187ms/step - loss: 2.1578 - val_loss: 0.0832
Time per epoch (BiGRU): 32.24 seconds


**1- Perplexity (PP)**

TODO: Measures uncertainty in predicting next character for each model and leave the results in the Notebook.

***Lower values = better model***

In [26]:
def perplexity(logits, labels):
    loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    return np.exp(np.mean(loss))

def evaluate_model_perplexity(model, dataset):
    total_loss = 0.0
    total_tokens = 0

    for input_batch, label_batch in dataset:
        logits = model(input_batch, training=False)
        loss = perplexity(logits, label_batch)
        total_loss += np.log(loss)
        total_tokens += tf.size(label_batch).numpy()

    avg_loss = total_loss / total_tokens
    return np.exp(avg_loss)

**2- Text Coherence & Fluency**

Subjective evaluation by examining generated text

**TODO: Generate at least 3 Samples (one from each model) and leave them in the Notebook.**

In [52]:
# Generate text from LSTM model
lstm_generator = OneStepTextGenerator(model, chars_from_ids, ids_from_chars)
lstm_generated_text = lstm_generator.generate_text(start_string="ROMEO:")
print("Generated text from LSTM model")
print(lstm_generated_text)

Generated text from LSTM model
ROMEO:
I see, my sovereign loyouthat my parts
Is as boses by his vast all and desire
To steal correction: there is our vain that I may fly.

KING RICHARD III:
Madam, your enemies!

PAULINA:
O sight! O thou think'st, gentle Flavery,
Without a Paul, tell my sainted spirit,
Dright with a meat; and here becomes,
Most soft-as willingly as that may joy in their ates,
The king she look'd of cord have broken death
I speak as you both with no right and back;
And ten things else of love is modested: meantime, I'll glly me to your mother. This foe
Conot come.

LUCIO:
I'll rid he, lady.

AUFIDIUS:
He saw't; my requeens!

TYRREL:
Ay, good Patration, often stends that way should be well:
Come, I'll respect me to your children.

KATHARINA:
I will unto the tribunes
Do me with theretokes, my fair royal friends,
The hand is better to me:
Thou duty, man; meantimes, part, and yield consent
While they do have my heart with thine eyes,
When such a little father gave my place,


In [57]:
# Generate text from GRU model
gru_generator = OneStepTextGenerator(model, chars_from_ids, ids_from_chars)
gru_generated_text = gru_generator.generate_text(start_string="ROMEO:")
print("\n Generated text from GRU model")
print(gru_generated_text)


 Generated text from GRU model
ROMEO:
So stood your brother is so early make ones.

KING EDWARD IV:
What is a wnightfor forth, a loveraughty head,
That shall supress foot.

MARCIUS:
He's a better gone, my lord.

KING RICHARD II:
Say what our course of Lord!

KING HENRY VI:
O woe! O for that, I pray thee? speak.

VINCENTIO:
Why, so, no more of him.

MENENIUS:
Well, by your guests!

First Murderer:
How fares it to the face: therefore hent,
As I myself aman.

PROSPERO:
Now there been sign.

PETRUCHIO:
Call them forth. A pinch'd brother Clarence, we are upon your oracle,
Comen my eyes drop on the Raplish your army;
O my most guard we banish him to your receiving to reshow Cominius since?

Messenger:
Sirrah, I pray you, and will hear some vow'd, I would not fly:
Or the blessed sun urged the salt of our day's words:
O, she lived us with flatterer loosers o'ergoade,
Yea, and much;
waight hat to make a better love the king.

GLOUCESTER:
Why, am I am age, seldom comes On you
To do that kingly 

In [68]:
# Generate text from Bidirectional model
birnn_generator = OneStepTextGenerator(model, chars_from_ids, ids_from_chars)
birnn_generated_text = birnn_generator.generate_text(start_string="ROMEO:")
print("\n Generated text from Bidirectional model")
print(birnn_generated_text)


 Generated text from Bidirectional model
ROMEO:
R
R
RORORORORORORORORORORO:
Prisisisisisss's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's's s ciciaiainininininininsns.
O
O
O
OUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUS:
O
O
OUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUSUS

**3- BLEU Score**

TODO: Measures similarity between generated and real text for each model and leave the results in the Notebook.

In [27]:
def compute_bleu(reference, generated):
    return sentence_bleu([list(reference)], list(generated))

def evaluate_model_bleu(model, dataset):
    bleu_scores = []

    for input_batch, label_batch in dataset:
        logits = model(input_batch, training=False)
        predictions = tf.argmax(logits, axis=-1)

        label_batch = label_batch.numpy()

        for pred_seq, true_seq in zip(predictions, label_batch):
            # Decode sequences to text
            pred_text = text_from_ids(pred_seq, chars_from_ids).numpy().decode('utf-8')
            true_text = text_from_ids(true_seq, chars_from_ids).numpy().decode('utf-8')

            # Compute BLEU score for each sample
            bleu = compute_bleu(true_text, pred_text)
            bleu_scores.append(bleu)

    return np.mean(bleu_scores)

In [53]:
# Evaluate Perplexity
perp = evaluate_model_perplexity(model, validation_dataset)
print(f"LSTM Perplexity on validation dataset: {perp}")

# Evaluate BLEU Score
bleu = evaluate_model_bleu(model, validation_dataset)
print(f"LSTM BLEU score on validation dataset: {bleu}")

LSTM Perplexity on validation dataset: 1.000149953593842
LSTM BLEU score on validation dataset: 0.4675834615345109


In [58]:
# Evaluate Perplexity
perp = evaluate_model_perplexity(model, validation_dataset)
print(f"GRU Perplexity on validation dataset: {perp}")

# Evaluate BLEU Score
bleu = evaluate_model_bleu(model, validation_dataset)
print(f"GRU BLEU score on validation dataset: {bleu}")

GRU Perplexity on validation dataset: 1.0001107370019666
GRU BLEU score on validation dataset: 0.6209763079366689


In [69]:
# Evaluate Perplexity
perp = evaluate_model_perplexity(model, validation_dataset)
print(f"BiGRU Perplexity on validation dataset: {perp}")

# Evaluate BLEU Score
bleu = evaluate_model_bleu(model, validation_dataset)
print(f"BiGRU BLEU score on validation dataset: {bleu}")

BiGRU Perplexity on validation dataset: 1.000002015669248
BiGRU BLEU score on validation dataset: 0.9965349018650352
