# RNN Text Generation: Model Creation and Testing

The main purpose of this notebook is to build a character-level text generation model using a recurrent neural network (RNN) in TensorFlow. It reads a text dataset, converts the text into sequences of character IDs, and builds a model with an embedding layer, a GRU layer, and a dense output layer. The model is trained to predict the next character of an input sequence and, in doing so, learns the patterns and structure of the text. Once trained, it can generate new text that imitates the style and structure of the original, which makes it useful for creative writing, text completion, or imitating a specific writing style.

This cell imports the `drive` module from the `google.colab` package and mounts Google Drive at `/content/drive`. Once you grant authorization, the files and directories on your Drive can be accessed as if they were local to the Colab runtime. This step is needed whenever the dataset or other project files are stored on Google Drive.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


This imports essential libraries for the TensorFlow project. It includes TensorFlow for machine learning tasks, NumPy for numerical computations, and the Python standard libraries os for file system interaction and time for time-related functions.

In [None]:
import tensorflow as tf
import numpy as np
import os
import time

Make sure your dataset file is available before running this step. The code sets `path_to_file` to "dataset.txt", a path relative to the notebook's working directory (if the file is on Google Drive, use its full path under the mount point instead, for example `/content/drive/MyDrive/dataset.txt`). It opens the file in binary mode (`'rb'`), reads the bytes, decodes them as UTF-8, and stores the result in `text`. Finally, it prints the first 250 characters of `text` as a preview.

In [None]:
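path_to_file = 'dataset.txt'
# Read the raw bytes and decode them as UTF-8; the with-block closes the file afterwards.
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
print(text[:250])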



THE SONNETS

                    1

From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own br


The code builds `vocab`, a sorted list of the unique characters in `text`: converting `text` to a set removes duplicates, and sorting gives the vocabulary a fixed, reproducible order. It then prints the vocabulary and its size (103 characters for this dataset, including the whitespace characters `'\t'`, `'\n'`, and `'\r'`).

In [None]:
vocab = sorted(set(text))
print(vocab)
print(len(vocab))


['\t', '\n', '\r', ' ', '!', '"', '&', "'", '(', ')', '*', ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '}', 'À', 'Æ', 'Ç', 'É', 'à', 'â', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'î', 'œ', '—', '‘', '’', '“', '”', '…']
103


The code defines a list called `example_texts` containing two short strings, "I ate a" and "child". The next few cells use them to demonstrate the conversion between characters and integer IDs.

In [None]:
example_texts = ['I ate a', 'child']

The code uses `tf.strings.unicode_split()` to split each string in `example_texts` into individual characters, treating the input as UTF-8. Because the two strings have different lengths, the result is a `tf.RaggedTensor`: one row per input string, each holding that string's characters in their original order. The code then prints `chars`.

In [None]:
chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
print(chars)

<tf.RaggedTensor [[b'I', b' ', b'a', b't', b'e', b' ', b'a'],
 [b'c', b'h', b'i', b'l', b'd']]>


The code creates a Keras `StringLookup` layer named `ids_from_chars` that maps each character to an integer ID, using `vocab` (all unique characters in the training text) as its vocabulary. Setting `mask_token=None` means no ID is reserved for a padding/mask token. The layer still reserves ID 0 for out-of-vocabulary input, which it represents as `'[UNK]'`, so the characters in `vocab` receive IDs 1 to `len(vocab)`.

In [None]:
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=list(vocab), mask_token=None)

The code uses the ids_from_chars layer, defined in the previous step, to convert the chars tensor (which contains individual characters from the example_texts) into integer IDs. It performs the character-to-integer mapping based on the vocabulary provided earlier. The resulting ids tensor contains the corresponding integer IDs for each character in the chars tensor. The code then prints the ids tensor, displaying the integer IDs obtained from the character conversion.

In [None]:
ids = ids_from_chars(chars)
print(ids)

<tf.RaggedTensor [[36, 4, 57, 76, 61, 4, 57], [59, 64, 65, 68, 60]]>
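The printed IDs are offset by one from the positions in `vocab`: `'I'`, at position 35 of the sorted list (counting from 0), receives ID 36, and the space at position 3 receives ID 4. That is because `StringLookup` reserves ID 0 for out-of-vocabulary input (`'[UNK]'`). The plain-Python sketch below mimics this with two lookup tables; `build_lookup`, `encode`, and `decode` are hypothetical helpers for illustration, not part of the Keras API, and the `toy_` names avoid overwriting the notebook's own variables.

```python
def build_lookup(vocab):
    """Mimic StringLookup(mask_token=None): ID 0 is reserved for '[UNK]'."""
    id_to_char = ['[UNK]'] + list(vocab)
    char_to_id = {ch: i for i, ch in enumerate(id_to_char)}
    return char_to_id, id_to_char


def encode(text, char_to_id):
    # Characters missing from the vocabulary fall back to the OOV ID 0.
    return [char_to_id.get(ch, 0) for ch in text]


def decode(ids, id_to_char):
    # Like text_from_ids: map each ID back to its character and join them.
    return ''.join(id_to_char[i] for i in ids)


toy_vocab = sorted(set("I ate a child"))
char_to_id, id_to_char = build_lookup(toy_vocab)
toy_ids = encode("I ate", char_to_id)
print(toy_ids)                        # [2, 1, 3, 10, 6]
print(decode(toy_ids, id_to_char))    # I ate
print(encode("z", char_to_id))        # [0]  ('z' is out of vocabulary)
print(len(id_to_char))                # 11, i.e. len(toy_vocab) + 1
```

The extra `'[UNK]'` entry is why the lookup layers in this notebook know 104 IDs even though the text contains 103 distinct characters.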


The code defines another `StringLookup` layer called `chars_from_ids`. This layer performs the reverse operation of the `ids_from_chars` layer, i.e., it maps the integer IDs back to their corresponding characters. To achieve this, it uses the vocabulary obtained from the `ids_from_chars` layer using the `get_vocabulary()` method. The `invert` parameter is set to `True`, indicating that the mapping is from integers to characters.

The `ids` tensor from the previous step is passed through `chars_from_ids` to recover the characters, and the result is stored back in `chars`. Printing `chars` shows the same characters as the original `example_texts`.

In [None]:
chars_from_ids = tf.keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

chars = chars_from_ids(ids)

print(chars)

<tf.RaggedTensor [[b'I', b' ', b'a', b't', b'e', b' ', b'a'],
 [b'c', b'h', b'i', b'l', b'd']]>


The code defines a function `text_from_ids(ids)` that takes a tensor of integer IDs and returns the corresponding text. It maps the IDs back to characters with the `chars_from_ids` layer and then concatenates them along the last axis with `tf.strings.reduce_join()`, so each row of IDs becomes one string.

The code then calls the `text_from_ids(ids)` function with the previously obtained `ids` tensor (containing integer IDs from the `example_texts`). The function converts the integer IDs back to their original characters and returns the resulting string. Finally, the code prints this string, displaying the reconstructed text from the integer IDs.

In [None]:
def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

print(text_from_ids(ids))

tf.Tensor([b'I ate a' b'child'], shape=(2,), dtype=string)


The code uses the `ids_from_chars` layer defined earlier to convert the entire `text` variable (the contents of "dataset.txt") into integer IDs. It first splits the text into individual characters with `tf.strings.unicode_split()` and then passes those characters through `ids_from_chars`.

The `all_ids` tensor contains the integer IDs corresponding to each character in the text. The code then prints the `all_ids` tensor, displaying the converted integer IDs for the entire text. This tensor represents the text in its numerical form, making it suitable for training a language model that operates on integer inputs.

In [None]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
print(all_ids)

tf.Tensor([3 2 3 ... 2 3 2], shape=(5578251,), dtype=int64)


The code creates a `tf.data.Dataset` named `ids_dataset` from the `all_ids` tensor using `tf.data.Dataset.from_tensor_slices()`. Slicing along the first axis makes each element of the dataset a single character ID, in the order the characters appear in "dataset.txt".

This stream of IDs is the starting point for the `tf.data` operations that follow: grouping the IDs into fixed-length sequences, splitting them into input-target pairs, and shuffling and batching them for training.

In [None]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

This loop takes the first 10 elements of `ids_dataset`, converts each integer ID back to its character with `chars_from_ids`, decodes the resulting bytes with `.numpy().decode('utf-8')`, and prints it. Each element is a single character, so the loop prints one character per line. The file begins with `'\r'`, `'\n'`, `'\r'`, `'\n'`, which show up as blank lines in the output before `T`, `H`, `E`, a space, `S`, and `O`.

In [None]:
# Use a distinct loop variable so the earlier `ids` tensor is not overwritten.
for char_id in ids_dataset.take(10):
    print(chars_from_ids(char_id).numpy().decode('utf-8'))







T
H
E
 
S
O


The code sets `seq_length` to 400. This is the number of characters in each training input; the model learns to predict the next character at each of those 400 positions.

In [None]:
seq_length = 400

The code calculates `examples_per_epoch`, the number of training sequences the text yields: the length of `text` divided (with integer division) by `seq_length + 1`. The +1 is needed because each chunk must hold `seq_length` input characters plus one extra character: the target is the input shifted one position to the right, so both are cut from the same chunk of `seq_length + 1` characters. The chunks themselves do not overlap.

In [None]:
examples_per_epoch = len(text)//(seq_length+1)

The code creates sequences from the `ids_dataset` using the `batch` method. It batches the data into sequences, where each sequence has a length of `seq_length + 1`. The `drop_remainder=True` argument is used to drop the last batch if it is smaller than the specified `seq_length + 1`.

The purpose of batching the data is to create input-target pairs for training. Each chunk holds `seq_length + 1` characters: the first `seq_length` characters become the input, and the last `seq_length` characters (the same chunk shifted by one) become the target. The model therefore learns to predict, at every position, the character that follows it.

In [None]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)
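To make the chunking concrete, here is a small pure-Python sketch of what `batch(seq_length + 1, drop_remainder=True)` followed by the input/target split does, using a ten-character string and `seq_length = 3`. The helper `make_chunks` is hypothetical and only mirrors the behaviour of the `tf.data` call on a list; `split_input_target` uses the same slicing as the function of that name defined later in the notebook.

```python
def make_chunks(items, seq_length):
    """Split items into non-overlapping chunks of seq_length + 1, dropping the remainder."""
    chunk_size = seq_length + 1
    num_chunks = len(items) // chunk_size
    return [items[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)]


def split_input_target(sequence):
    # Input: all but the last element; target: all but the first.
    return sequence[:-1], sequence[1:]


chunks = make_chunks("abcdefghij", seq_length=3)
print(chunks)                                   # ['abcd', 'efgh']  ('ij' is too short and is dropped)
print([split_input_target(c) for c in chunks])  # [('abc', 'bcd'), ('efg', 'fgh')]
```

Each chunk yields exactly one training pair of length `seq_length`, and consecutive chunks do not share characters.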

`sequences` contains chunks of `seq_length + 1` character IDs.

The code prints the first chunk in readable form, mapping each ID back to its character with `chars_from_ids`.

This previews the raw data that the following steps turn into training examples.

In [None]:
for seq in sequences.take(1):
    print(chars_from_ids(seq))


tf.Tensor(
[b'\r' b'\n' b'\r' b'\n' b'T' b'H' b'E' b' ' b'S' b'O' b'N' b'N' b'E' b'T'
 b'S' b'\r' b'\n' b'\r' b'\n' b' ' b' ' b' ' b' ' b' ' b' ' b' ' b' ' b' '
 b' ' b' ' b' ' b' ' b' ' b' ' b' ' b' ' b' ' b' ' b' ' b'1' b'\r' b'\n'
 b'\r' b'\n' b'F' b'r' b'o' b'm' b' ' b'f' b'a' b'i' b'r' b'e' b's' b't'
 b' ' b'c' b'r' b'e' b'a' b't' b'u' b'r' b'e' b's' b' ' b'w' b'e' b' '
 b'd' b'e' b's' b'i' b'r' b'e' b' ' b'i' b'n' b'c' b'r' b'e' b'a' b's'
 b'e' b',' b'\r' b'\n' b'T' b'h' b'a' b't' b' ' b't' b'h' b'e' b'r' b'e'
 b'b' b'y' b' ' b'b' b'e' b'a' b'u' b't' b'y' b'\xe2\x80\x99' b's' b' '
 b'r' b'o' b's' b'e' b' ' b'm' b'i' b'g' b'h' b't' b' ' b'n' b'e' b'v'
 b'e' b'r' b' ' b'd' b'i' b'e' b',' b'\r' b'\n' b'B' b'u' b't' b' ' b'a'
 b's' b' ' b't' b'h' b'e' b' ' b'r' b'i' b'p' b'e' b'r' b' ' b's' b'h'
 b'o' b'u' b'l' b'd' b' ' b'b' b'y' b' ' b't' b'i' b'm' b'e' b' ' b'd'
 b'e' b'c' b'e' b'a' b's' b'e' b',' b'\r' b'\n' b'H' b'i' b's' b' ' b't'
 b'e' b'n' b'd' b'e' b'r' b' ' b'h' b'e' b'i' b


The code takes the first five sequences from the `sequences` dataset, each a chunk of the text with `seq_length + 1` characters, and uses the `text_from_ids` function defined earlier to join each one back into a single readable string.

Printing these strings shows how the text is divided into consecutive, non-overlapping sequences and gives a preview of the input data used for training the language model.

In [None]:
for seq in sequences.take(5):
    print(text_from_ids(seq))


tf.Tensor(b'\r\n\r\nTHE SONNETS\r\n\r\n                    1\r\n\r\nFrom fairest creatures we desire increase,\r\nThat thereby beauty\xe2\x80\x99s rose might never die,\r\nBut as the riper should by time decease,\r\nHis tender heir might bear his memory:\r\nBut thou contracted to thine own bright eyes,\r\nFeed\xe2\x80\x99st thy light\xe2\x80\x99s flame with self-substantial fuel,\r\nMaking a famine where abundance lies,\r\nThyself thy foe, to thy sweet self too cruel:', shape=(), dtype=string)
tf.Tensor(b'\r\nThou that art now the world\xe2\x80\x99s fresh ornament,\r\nAnd only herald to the gaudy spring,\r\nWithin thine own bud buriest thy content,\r\nAnd, tender churl, mak\xe2\x80\x99st waste in niggarding:\r\n  Pity the world, or else this glutton be,\r\n  To eat the world\xe2\x80\x99s due, by the grave and thee.\r\n\r\n\r\n                    2\r\n\r\nWhen forty winters shall besiege thy brow,\r\nAnd dig deep trenches in thy beauty\xe2\x80\x99s field,\r\nThy youth\xe2\x80\x99', shap

The provided function split_input_target(sequence) takes a single sequence as input, and it splits it into two parts: input_text (all but the last character) and target_text (all but the first character). This function is used to create input-target pairs for training the language model.

In [None]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

The code demonstrates `split_input_target()` on the example text "I eat children every day": it converts the string into a list of characters with `list()` and passes that list to the function. The output shows two lists: `input_text`, which drops the last character, and `target_text`, which drops the first. At each position, the target is the character that follows the input character, which is exactly the relationship the model is trained on.

In [None]:
example_text = "I eat children every day"
print(split_input_target(list(example_text)))

(['I', ' ', 'e', 'a', 't', ' ', 'c', 'h', 'i', 'l', 'd', 'r', 'e', 'n', ' ', 'e', 'v', 'e', 'r', 'y', ' ', 'd', 'a'], [' ', 'e', 'a', 't', ' ', 'c', 'h', 'i', 'l', 'd', 'r', 'e', 'n', ' ', 'e', 'v', 'e', 'r', 'y', ' ', 'd', 'a', 'y'])


The code creates a new dataset named `dataset` by mapping `split_input_target()` over every element of `sequences`. Each chunk of `seq_length + 1` IDs becomes an `(input, target)` pair of two sequences of length `seq_length`, ready for shuffling and batching.

In [None]:
dataset = sequences.map(split_input_target)

The code sets the batch size for training to 256 using `BATCH_SIZE`, so each training step processes 256 input-target pairs at once. Larger batches make better use of parallel hardware such as GPUs, but need more memory per step.

In [None]:
BATCH_SIZE = 256

The code sets the shuffle buffer size to 100,000 using `BUFFER_SIZE`. `Dataset.shuffle` fills a buffer with this many elements and draws each output element at random from that buffer, replacing it with the next element of the input, so a larger buffer gives a more thorough shuffle. This dataset has only 13,910 sequences (5,578,251 characters split into chunks of 401), fewer than the buffer size, so the whole dataset fits in the buffer and is fully shuffled. Shuffling keeps the model from seeing the text in the same order every epoch, which helps it generalize instead of fitting the order of the batches.

In [None]:
BUFFER_SIZE = 100000
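The `tf.data` documentation describes `shuffle` as filling a buffer with `buffer_size` elements and then repeatedly picking a random element from it, replacing the picked element with the next one from the input. The sketch below approximates that strategy in plain Python; the generator `buffered_shuffle` is hypothetical, not a TensorFlow function. It shows why a buffer of size 1 does not shuffle at all, while a buffer at least as large as the dataset produces a full random permutation.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Approximate tf.data.Dataset.shuffle: sample randomly from a fixed-size buffer."""
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for x in it:
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered element and put the next input element in its slot.
    for x in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = x
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=1)))            # 0..9 in the original order
print(list(buffered_shuffle(range(10), buffer_size=100, seed=0)))  # a full random permutation
```

With a small buffer, an element can only move a limited distance from its original position (the first output always comes from the first `buffer_size` inputs), which is why a buffer larger than the dataset is the safe choice here.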

The code prepares the dataset for training by shuffling, batching, and prefetching the data. It shuffles the data with a buffer size of BUFFER_SIZE, batches it with BATCH_SIZE, and prefetches the data to improve data pipeline efficiency during training. The resulting dataset is now ready for training the language model.

In [None]:
dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))
print(dataset)

<_PrefetchDataset element_spec=(TensorSpec(shape=(256, 400), dtype=tf.int64, name=None), TensorSpec(shape=(256, 400), dtype=tf.int64, name=None))>
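As a sanity check on this pipeline, the number of training steps per epoch can be worked out by hand. The sketch assumes that `len(text)` equals the 5,578,251 IDs reported for `all_ids` above (which holds here, since `unicode_split` yields one element per Unicode code point, the same unit Python's `len` counts). The helper name `steps_per_epoch` is hypothetical.

```python
def steps_per_epoch(num_chars, seq_length, batch_size):
    """Count sequences and full batches, mirroring the two drop_remainder=True calls."""
    num_sequences = num_chars // (seq_length + 1)  # chunks of seq_length + 1 IDs
    num_batches = num_sequences // batch_size       # only full batches are kept
    return num_sequences, num_batches


sequences_count, batches_count = steps_per_epoch(5_578_251, seq_length=400, batch_size=256)
print(sequences_count)  # 13910
print(batches_count)    # 54
```

So each epoch runs 54 training steps of shape `(256, 400)`, and 86 of the 13,910 sequences are left out of each epoch because they do not fill a complete batch.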


The code sets `vocab_size` to the number of unique characters in `vocab` (103). Note that `StringLookup` adds the `'[UNK]'` token at ID 0, so the lookup layers use 104 IDs; this is why the model below is built with `len(ids_from_chars.get_vocabulary())` rather than `vocab_size`.

In [None]:
vocab_size = len(vocab)

The code sets the embedding dimension to 1024 using the variable embedding_dim. The embedding dimension is a hyperparameter that determines the size of the dense vector representation (embedding) for each character in the vocabulary. A larger embedding dimension allows the model to capture more complex relationships between characters but increases the model's complexity and memory requirements.

In [None]:
embedding_dim = 1024

The code sets the number of GRU units to 2048 using `rnn_units`. This is the size of the GRU's hidden state, i.e. how many values the layer uses to carry information from one time step to the next. More units give the layer more capacity to model complex patterns, but increase computation and memory requirements.

In [None]:
rnn_units = 2048

The code defines a custom Keras model named `MyModel` for the language model. This model is designed to process text data and generate sequences. It has three main layers:

1. `self.embedding`: An embedding layer that maps each integer ID to a dense vector. It is configured with `vocab_size` (the number of possible IDs) and `embedding_dim` (the length of each vector).

2. `self.gru`: A Gated Recurrent Unit (GRU) layer with `rnn_units` units. Because `return_sequences=True` and `return_state=True`, it returns its output at every time step as well as its final hidden state, which can be fed back in to continue generating text.

3. `self.dense`: A dense layer that maps each GRU output to `vocab_size` values. These are unnormalized scores (logits), one per character; applying a softmax turns them into a probability distribution over the next character.

The `call()` method defines the forward pass of the model. It takes `inputs` (sequences of integer IDs) and optionally `states` (hidden states from the previous time step), `return_state` (a boolean flag indicating whether to return states), and `training` (a boolean flag indicating whether the model is in training mode). Inside the method, the input sequences are passed through the embedding layer, followed by the GRU layer, and then through the dense layer. The method can return either the output sequence or both the output sequence and the hidden states, depending on the value of `return_state`.

This custom model is designed to be used for language modeling tasks, where the model learns to predict the next character in a sequence based on the input context.
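For reference, the update a GRU layer computes at each time step can be written as follows. This is a sketch of the variant Keras uses by default (`reset_after=True`, which keeps separate input and recurrent biases). The symbols $W$ (input kernels), $U$ (recurrent kernels), $b$ and $b'$ (input and recurrent biases), $\sigma$ (the sigmoid function), and $\odot$ (element-wise product) are notation introduced here, not names from the code.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + b_z + U_z h_{t-1} + b'_z\right) && \text{update gate}\\
r_t &= \sigma\left(W_r x_t + b_r + U_r h_{t-1} + b'_r\right) && \text{reset gate}\\
\tilde{h}_t &= \tanh\left(W_h x_t + b_h + r_t \odot \left(U_h h_{t-1} + b'_h\right)\right) && \text{candidate state}\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t && \text{new hidden state}
\end{aligned}
```

Here $x_t$ is the 1024-dimensional embedding of the character at step $t$ and $h_t$ is the 2048-dimensional hidden state (`rnn_units`). The layer's output at step $t$ is $h_t$ itself, which `self.dense` then turns into logits.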

In [None]:
class MyModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(rnn_units,
                                       return_sequences=True,
                                       return_state=True)
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None, return_state=False, training=False):
        x = inputs
        x = self.embedding(x, training=training)
        # With states=None the GRU starts from an all-zero hidden state.
        x, states = self.gru(x, initial_state=states, training=training)
        x = self.dense(x, training=training)

        if return_state:
            return x, states
        else:
            return x


The code instantiates an instance of the custom `MyModel` class and assigns it to the variable `model`. It creates the model with the following arguments:

1. `vocab_size`: The number of IDs known to the `ids_from_chars` layer, i.e. the 103 characters in `vocab` plus the `'[UNK]'` token, for 104 in total.

2. `embedding_dim`: The dimension of the dense vector representation for each character, set to 1024 as defined earlier.

3. `rnn_units`: The number of RNN units (hidden units) in the GRU layer, set to 2048 as defined earlier.

This `model` instance is now ready for training, and it represents a language model that can learn to predict the next character in a sequence based on the given input context.

In [None]:
model = MyModel(
        vocab_size=len(ids_from_chars.get_vocabulary()),
        embedding_dim=embedding_dim,
        rnn_units=rnn_units)

In this code snippet, the model is being used to make predictions for an example batch of input sequences. The code does the following:

1. `dataset.take(1)`: This retrieves the first batch of input-target pairs from the `dataset` prepared earlier. Each input-target pair consists of a batch of input sequences (`input_example_batch`) and their corresponding target sequences (`target_example_batch`).

2. `model(input_example_batch)`: The model is called on `input_example_batch`. It passes the batch through the layers defined in `MyModel` and returns, for every position in every sequence, a vector of logits scoring each possible next character. The result is stored in `example_batch_predictions`.

3. `print(example_batch_predictions.shape, " (batch_size, sequence_length, vocab_size)")`: The code prints the shape of `example_batch_predictions`, which has the form `(batch_size, sequence_length, vocab_size)`: the number of sequences in the batch, the length of each sequence, and the number of IDs in the vocabulary. With the settings above, the expected shape is `(256, 400, 104)`.

In [None]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, " (batch_size, sequence_length, vocab_size)")

`model.summary()` prints a table of the model's layers with their output shapes and parameter counts, followed by the total, trainable, and non-trainable parameter counts. It gives a quick overview of the model's structure and size. Because `MyModel` is a subclassed `tf.keras.Model`, it must be built before `summary()` works, which the previous cell did by calling the model on a batch.

In [None]:
model.summary()
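The parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the Keras GRU defaults (`use_bias=True` and `reset_after=True`, which gives each gate both an input bias and a recurrent bias) and a vocabulary of 104 IDs (the 103 characters plus `'[UNK]'`). The function names are hypothetical.

```python
def embedding_params(num_ids, embedding_dim):
    # One embedding_dim-long vector per ID.
    return num_ids * embedding_dim


def gru_params(input_dim, units):
    # 3 gates, each with an input kernel, a recurrent kernel, and
    # (with reset_after=True) an input bias and a recurrent bias.
    return 3 * (input_dim * units + units * units + 2 * units)


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output.
    return input_dim * output_dim + output_dim


embedding_count = embedding_params(104, 1024)  # 106,496
gru_count = gru_params(1024, 2048)             # 18,886,656
dense_count = dense_params(2048, 104)          # 213,096
print(embedding_count + gru_count + dense_count)  # 19206248
```

Almost all of the roughly 19.2 million parameters sit in the GRU, so `rnn_units` is the setting with by far the largest effect on model size.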

In this code snippet, the model's `example_batch_predictions` for the first batch of input sequences are sampled to generate the next character predictions. Here's what each step does:

1. `tf.random.categorical`: This function treats each row of `example_batch_predictions[0]` (the logits for the first sequence in the batch, with shape `(sequence_length, vocab_size)`) as an unnormalized log-probability distribution and, with `num_samples=1`, draws one character ID per row, i.e. one per time step. The result is stored in `sampled_indices`, with shape `(sequence_length, 1)`.

2. `tf.squeeze`: This function removes the redundant dimensions from the `sampled_indices` tensor by squeezing the tensor along the specified axis (`axis=-1`). The result is converted to a NumPy array using `.numpy()` and stored back in `sampled_indices`.

3. `print("Input:\n", text_from_ids(input_example_batch[0]).numpy())`: This line prints the first element of the `input_example_batch` tensor in its original text form. It converts the tensor of integer IDs to a readable string using the `text_from_ids` function and then prints the input sequence.

4. `print("\nNext Char Predictions:\n", text_from_ids(sampled_indices).numpy())`: This line prints the sampled character predictions. It converts the `sampled_indices` tensor (which contains the predicted integer IDs) to a readable string using the `text_from_ids` function and then prints the predicted next characters for the first element of the input batch.

The output shows the input sequence and the character the model sampled as the next character at each position. Because the model has not been trained yet, these predictions are essentially random.

In [None]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

print("Input:\n", text_from_ids(input_example_batch[0]).numpy())
print("\nNext Char Predictions:\n", text_from_ids(sampled_indices).numpy())
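What `tf.random.categorical` does for each row of logits can be sketched in plain Python: convert the logits to probabilities with a softmax, then draw one index according to those probabilities. The helpers `softmax` and `sample_from_logits` below are hypothetical stand-ins for illustration, not TensorFlow functions.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_logits(logit_rows, rng):
    """Draw one index per row, like tf.random.categorical(..., num_samples=1) plus squeeze."""
    samples = []
    for row in logit_rows:
        probs = softmax(row)
        samples.append(rng.choices(range(len(row)), weights=probs, k=1)[0])
    return samples


rng = random.Random(0)
rows = [[0.0, 10.0, 0.0], [8.0, 0.0, 0.0]]  # two time steps, three "characters"
print(softmax([0.0, 0.0]))                 # [0.5, 0.5]
# Index 1 has probability of about 0.9999 in row 0, index 0 about 0.9993 in row 1.
print(sample_from_logits(rows, rng))
```

An untrained model produces logits of similar size for every character, so its samples are spread almost uniformly over the vocabulary, which is what the garbled "Next Char Predictions" output reflects.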

In this code snippet, the model's loss function is set to `SparseCategoricalCrossentropy` with `from_logits=True`. The loss function is used to measure the difference between the predicted character probabilities and the target character labels. Here's what each step does:

1. `loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)`: This line sets the loss function to `SparseCategoricalCrossentropy` with `from_logits=True`. When `from_logits` is set to True, it indicates that the model's output predictions are not normalized probabilities, but rather raw logits. The loss function will automatically apply the softmax activation to the logits during computation.

2. `example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)`: This line calculates the mean loss for the example batch of input sequences. It compares the model's predictions (`example_batch_predictions`) to the target character labels (`target_example_batch`) using the `SparseCategoricalCrossentropy` loss function. The result, `example_batch_mean_loss`, represents the average loss across the batch.

3. `print(example_batch_predictions.shape)` and `print("^ # (batch_size, sequence_length, vocab_size)")`: These lines print the shape of the `example_batch_predictions` tensor, which represents the model's predictions for each character in the input sequences. The shape is in the format `(batch_size, sequence_length, vocab_size)`.

4. `print(example_batch_mean_loss)`: This line prints the calculated mean loss for the example batch.

5. `print("Exponential of average loss: ", tf.exp(example_batch_mean_loss).numpy())`: This line prints the exponential of the mean loss, which is the model's perplexity on the batch. For a freshly initialized model the logits are close to uniform, so this value should be roughly equal to the vocabulary size (104); a much larger value would suggest the initial model is confidently wrong.

Overall, this snippet shows how to compute and sanity-check the loss on a single batch before training starts.

In [None]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)

print(example_batch_predictions.shape)
print("^ # (batch_size, sequence_length, vocab_size)")

print(example_batch_mean_loss)

print("Exponential of average loss: ", tf.exp(example_batch_mean_loss).numpy())
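The link between the loss and the vocabulary size can be checked with a small hand-rolled version of sparse categorical cross-entropy from logits (a hypothetical helper, not the Keras implementation). If all logits are equal, each of the 104 IDs is equally likely, so the loss is ln(104), about 4.644, and its exponential is exactly the vocabulary size. The name `loss_value` is chosen so it does not overwrite the notebook's `loss` object.

```python
import math


def sparse_ce_from_logits(logits, target):
    """Cross-entropy of one integer target given raw logits: logsumexp(logits) - logits[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


uniform_logits = [0.0] * 104        # an "untrained" model with no preference
loss_value = sparse_ce_from_logits(uniform_logits, target=7)
print(round(loss_value, 3))         # 4.644
print(round(math.exp(loss_value)))  # 104
```

The mean loss printed above for the untrained model should be in this neighbourhood; training then drives it down.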

In this code, the model is compiled using the `compile` method before it can be trained. The compilation involves specifying the optimizer and the loss function to be used during training.

1. `optimizer='adam'`: The model uses the Adam optimizer, a widely used algorithm for training deep learning models. Adam keeps running averages of each parameter's gradient and squared gradient and uses them to scale the step size separately for every parameter, which usually makes it work well without much learning-rate tuning.

2. `loss=loss`: The loss function `loss` is passed to the `compile` method. As specified earlier, the loss function is the `SparseCategoricalCrossentropy` with `from_logits=True`, which is suitable for training a language model to predict the next character from the input sequence.

With the compilation step completed, the model is now ready for training using the specified optimizer and loss function.

In [None]:
model.compile(optimizer='adam', loss=loss)
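To illustrate the idea behind Adam, here is the standard Adam update rule for a single scalar parameter in plain Python, using the same default hyperparameters as the Keras `Adam` optimizer (learning rate 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7). The class `ScalarAdam` is hypothetical, and Keras's implementation differs in minor details such as exactly where epsilon enters. One property is visible immediately: because each step is divided by the running root-mean-square of the gradient, the first step has size close to the learning rate regardless of how large the gradient is.

```python
import math


class ScalarAdam:
    """Standard Adam update for one scalar parameter (illustrative, not the Keras class)."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients
        self.t = 0    # step counter, used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Minimize f(x) = x**2 (gradient 2x), starting from x = 1.
opt = ScalarAdam(lr=0.01)
x = 1.0
for _ in range(50):
    x = opt.step(x, 2 * x)
print(x)  # x has moved steadily toward the minimum at 0
```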

1. `checkpoint_dir = './training_checkpoints'`: This line sets the directory path where the model's weights will be saved.

2. `checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")`: This line creates the prefix for the saved weight files with the current epoch number.

3. `checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)`: This line creates the `ModelCheckpoint` callback to save the model's weights after each epoch during training.

In [None]:
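checkpoint_dir = './training_checkpoints'
os.makedirs(checkpoint_dir, exist_ok=True)
# Keras fills in {epoch}; Keras 3 requires the ".weights.h5" suffix when save_weights_only=True.
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.weights.h5")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)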
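Because `{epoch}` is filled in with an unpadded number, sorting the saved file names alphabetically puts `ckpt_10` before `ckpt_2`. To find the newest checkpoint, parse the epoch number instead. The helper `latest_epoch_file` below is a hypothetical sketch that works on any file names built from the `ckpt_{epoch}` template.

```python
import os
import re


def latest_epoch_file(filenames):
    """Return the name with the highest epoch number in 'ckpt_<epoch>...', or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = re.match(r"ckpt_(\d+)", os.path.basename(name))
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


names = ["ckpt_2", "ckpt_10", "ckpt_9"]
print(sorted(names)[-1])         # ckpt_9  (alphabetical order is misleading)
print(latest_epoch_file(names))  # ckpt_10
```

In the notebook, the candidate names could come from `os.listdir(checkpoint_dir)`.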

The code sets the number of epochs to 30 using the variable EPOCHS. An epoch represents a complete iteration over the entire training dataset during the training process. In this case, the model will be trained for 30 epochs, where each epoch involves passing the training data through the model, computing the loss, and updating the model's parameters (weights) based on the optimization algorithm (in this case, Adam). Training for multiple epochs allows the model to learn from the data multiple times, potentially improving its performance and convergence.

In [None]:
EPOCHS = 30

In this code, the model is trained using the `fit` method with the following arguments:

1. `dataset`: The training data in the form of a dataset, which was previously prepared by batching, shuffling, and prefetching.

2. `epochs=EPOCHS`: The number of epochs to train the model, set to 30 as defined earlier.

3. `callbacks=[checkpoint_callback]`: The list of callbacks to be used during training. In this case, it includes the `ModelCheckpoint` callback that was defined earlier. This callback saves the model's weights after each epoch.

The training process will run for 30 epochs, during which the model will learn to predict the next character in the text based on the input sequences. The `history` object will store information about the training progress, such as loss values and other metrics, and can be used for further analysis and visualization.

In [None]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

The code defines a class named `OneStep`, which is a custom Keras model used for text generation. The class takes the following arguments during initialization:

1. `model`: The main language model, an instance of the previously defined `MyModel`.

2. `chars_from_ids`: The function to map integer IDs back to characters.

3. `ids_from_chars`: The function to map characters to integer IDs.

4. `temperature=1.0`: An optional parameter that controls the diversity of the generated text. The logits are divided by the temperature, so values above 1.0 flatten the distribution and introduce more randomness, while values below 1.0, such as 0.1, sharpen it and produce more focused, nearly deterministic output.

Inside the class, there is a function called `generate_one_step(self, inputs, states=None)`, which performs one step of text generation. Given an initial `inputs` string and optional `states` (hidden states from the model's previous time step), this function predicts the next character using the language model and returns the predicted character and updated states.

The text generation process involves the following steps:
1. Split the `inputs` strings into characters and convert them to integer IDs using `ids_from_chars`.
2. Pass the integer IDs through the model to obtain logits for the next character, keeping only the logits of the last time step.
3. Divide the logits by `temperature` to control randomness.
4. Add `prediction_mask` (negative infinity at the `[UNK]` index) to the logits so that `[UNK]` can never be sampled.
5. Sample the next character ID using `tf.random.categorical`.
6. Convert the sampled integer ID back to a character using `chars_from_ids`.

Finally, an instance of `OneStep` named `one_step_model` is created from the trained `model` and the `chars_from_ids` and `ids_from_chars` lookups, using the default temperature of 1.0. This instance generates text one character at a time, allowing flexible and controlled text generation with the language model.
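
The effect of `temperature` can be seen without TensorFlow. The NumPy sketch below divides a set of hypothetical logits by three temperatures and converts them to probabilities; lower temperatures concentrate probability on the most likely character. The final draw stands in for `tf.random.categorical`, which takes the scaled logits directly, so the notebook's code needs no explicit softmax.

```python
import numpy as np

def softmax(z):
    """Convert logits to probabilities (shifting by the max for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # hypothetical next-character logits

# Probabilities after dividing the logits by each temperature.
probs_by_t = {t: softmax(logits / t) for t in (0.1, 1.0, 2.0)}
for t, p in probs_by_t.items():
    print(f"T={t}: {np.round(p, 3)}")

# Draw one id from the T=1.0 distribution, similar in spirit to
# tf.random.categorical(logits, num_samples=1).
rng = np.random.default_rng(42)
next_id = rng.choice(len(logits), p=probs_by_t[1.0])
```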

In [None]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
      values=[-float('inf')]*len(skip_ids),
      indices=skip_ids,
      dense_shape=[len(ids_from_chars.get_vocabulary())])

    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    predicted_logits, states = self.model(inputs=input_ids, states=states, return_state=True)

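    # Keep only the logits for the last time step.
    predicted_logits = predicted_logits[:, -1, :]
    # Divide by the temperature: below 1.0 sharpens, above 1.0 flattens the distribution.
    predicted_logits = predicted_logits/self.temperature
    # Apply the mask so that '[UNK]' (logit -inf) is never sampled.
    predicted_logits = predicted_logits + self.prediction_mask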

    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    predicted_chars = self.chars_from_ids(predicted_ids)

    return predicted_chars, states

one_step_model = OneStep(model, chars_from_ids, ids_from_chars)
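
The `prediction_mask` built in `__init__` is intended to be added to the logits before sampling. The NumPy sketch below, with a hypothetical four-entry vocabulary, shows why this works: a logit of negative infinity becomes a probability of exactly zero, so `[UNK]` is never drawn even when the model scores it highest.

```python
import numpy as np

vocab = ['[UNK]', 'a', 'b', 'c']          # hypothetical vocabulary; index 0 is '[UNK]'
logits = np.array([5.0, 1.0, 0.5, 0.2])   # the model strongly prefers '[UNK]' here

# Same role as self.prediction_mask: -inf at the '[UNK]' index, 0 elsewhere.
mask = np.zeros(len(vocab))
mask[vocab.index('[UNK]')] = -np.inf
masked = logits + mask

# exp(-inf) == 0, so '[UNK]' gets zero probability after the softmax.
probs = np.exp(masked - masked.max())
probs /= probs.sum()

rng = np.random.default_rng(0)
samples = rng.choice(len(vocab), size=1000, p=probs)
```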

In this code, the variable `states` is initialized to `None`. It holds the hidden state of the recurrent layer between generation steps. Passing `None` on the first call means no prior state is supplied, so the model starts from its initial (typically all-zero) GRU state and has no context beyond the seed text. Each later call passes back the state returned by the previous one, so the context accumulates as generation proceeds.

In [None]:
states = None

In this code, the starting seed for text generation is set by initializing `next_char` to `tf.constant(['\n\n'])`: a batch containing a single string made of two newline characters, which serves as the initial input for text generation.

Additionally, a list `result` is initialized with the seed `next_char` to store the generated text. During the generation process, each new character is appended to this list to form the complete generated text. The newline characters give the model a starting point, and it continues generating characters one step at a time until the desired length of text is reached.

In [None]:
next_char = tf.constant(['\n\n'])
result = [next_char]

In this code, text is generated with the `one_step_model` instance. The loop runs 1000 times (`n` is only the loop counter), producing one character per iteration. During each iteration:

1. `next_char, states = one_step_model.generate_one_step(next_char, states=states)`: `generate_one_step` is called with the current `next_char` (the seed on the first call, the previously generated character afterwards) and the current `states` (the hidden state from the previous step). It returns the next character and the updated `states` for the next iteration.

2. `result.append(next_char)`: The newly generated `next_char` is appended to the `result` list, which stores the sequence of characters generated so far.

The loop stops after 1000 new characters have been generated. Because the hidden state is carried from step to step, each prediction is conditioned on all of the previously generated characters, not just the most recent one.

In [None]:
for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)
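
The generation loop threads `states` from one call to the next. The sketch below reproduces that pattern in plain NumPy with a hypothetical, untrained single-layer tanh RNN (a simpler cell than the notebook's GRU, with made-up random weights), so its output is gibberish. The point is the control flow: `states=None` becomes a zero hidden state on the first call, the whole seed is consumed on that call, and afterwards each call feeds one character plus the carried state.

```python
import numpy as np

# Hypothetical toy vocabulary and randomly initialised (untrained) weights.
vocab = list("\nabcde ")
char_to_id = {c: i for i, c in enumerate(vocab)}
V, H = len(vocab), 16
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(V, H))  # input -> hidden (acts like an embedding)
W_hh = rng.normal(scale=0.5, size=(H, H))  # hidden -> hidden recurrence
W_hy = rng.normal(scale=0.5, size=(H, V))  # hidden -> next-character logits

def generate_one_step(text, states=None, temperature=1.0):
    """Run `text` through a simple tanh RNN and sample one next character."""
    h = np.zeros(H) if states is None else states  # None -> zero initial state
    for ch in text:
        h = np.tanh(W_xh[char_to_id[ch]] + h @ W_hh)
    logits = (h @ W_hy) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    next_id = rng.choice(V, p=probs)
    return vocab[next_id], h

states = None
next_char = "\n\n"
result = [next_char]
for _ in range(50):
    next_char, states = generate_one_step(next_char, states=states)
    result.append(next_char)

generated = "".join(result)
```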

In this code, the list `result`, which holds one string tensor per generation step, is combined with `tf.strings.join`. The function concatenates the tensors element-wise, so the output is a string tensor of shape `[1]` whose single element is the complete generated text: the seed followed by every character predicted one step at a time. The joined text is stored back in `result` and can be further processed or printed as desired.

In [None]:
result = tf.strings.join(result)
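
`tf.strings.join` concatenates element-wise across the list, so the batch dimension survives. The plain-Python stand-in below uses a hypothetical batch of two sequences to show the same behavior; with the notebook's batch of one, the joined tensor holds a single string, which is why the next cell indexes it with `result[0]`.

```python
# Hypothetical stand-in: each step yields a batch of strings (here batch size 2).
steps = [["\n\n", "Hi"], ["a", "b"], ["c", "d"]]

# Join element-wise across steps, keeping one output string per batch entry.
joined = ["".join(parts) for parts in zip(*steps)]
print(joined)  # ['\n\nac', 'Hibd']
```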

The code prints the generated text: `result[0]` takes the first (and only) element of the batch, and `numpy().decode('utf-8')` converts it to a readable Python string. It then prints a horizontal line of 80 underscores to separate the generated text from other output.

In [None]:
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)

In this code, `one_step_model` is saved in TensorFlow's SavedModel format with `tf.saved_model.save`. The saved model is stored in a directory named 'one_step'.

SavedModel is a serialization format for TensorFlow models that allows easy deployment, serving, and sharing. It stores the model's variables (weights) together with its traced `tf.function` graphs, so the model can be used independently of the original Python code that created it. Because `generate_one_step` is decorated with `@tf.function` and has already been called in the generation loop above, the graphs traced for those calls are the ones that get saved.

By saving `one_step_model`, it can later be reloaded and used for text generation without retraining the model or redefining the `OneStep` and `MyModel` classes. This is a convenient way to preserve the trained model and share it with others or deploy it in production environments.

In [None]:
tf.saved_model.save(one_step_model, 'one_step')

In this code, the SavedModel stored in the 'one_step' directory is reloaded with `tf.saved_model.load` and assigned to `one_step_reloaded`.

The reloaded object is not a Keras model but a restored TensorFlow object that exposes the saved `generate_one_step` function along with the trained variables. It can therefore generate text in the same way as the original `one_step_model`, without redefining the model architecture or retraining it.

In [None]:
one_step_reloaded = tf.saved_model.load('one_step')
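
Below is a minimal, self-contained round trip through the SavedModel format, using a hypothetical `Shouter` module rather than the notebook's model. Unlike `OneStep.generate_one_step`, its `tf.function` declares an `input_signature`, so TensorFlow can trace it at save time without an earlier call, and `tf.saved_model.load` does not need the `Shouter` class to restore it. This is one possible design, not what the notebook does.

```python
import tempfile

import tensorflow as tf

class Shouter(tf.Module):
    """Hypothetical toy module used only to illustrate the save/load round trip."""

    # An explicit input_signature lets TensorFlow trace this function at save time.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
    def shout(self, inputs):
        return tf.strings.upper(inputs)

export_dir = tempfile.mkdtemp()
tf.saved_model.save(Shouter(), export_dir)

reloaded = tf.saved_model.load(export_dir)  # restored object, not a Shouter instance
out = reloaded.shout(tf.constant(['hello']))
print(out.numpy())  # [b'HELLO']
```

The trade-off of declaring a signature is that the restored function only accepts inputs that match it (here, a 1-D batch of strings).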

In this code, the reloaded model `one_step_reloaded` is used to generate text. The text generation process starts with an initial `next_char` seed of a newline character, and the variable `states` is set to `None`, indicating that there are no initial hidden states.

The loop iterates 1000 times, generating characters step by step using the `one_step_reloaded.generate_one_step` function. During each iteration:

1. `next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)`: The `generate_one_step` function of the `one_step_reloaded` model is called with the current `next_char` (previously generated character) and the current `states` (hidden states from the previous time step) as inputs. This generates the next character and updates the `states` for the next iteration.

2. `result.append(next_char)`: The newly generated `next_char` is appended to the `result` list, which stores the sequence of characters generated so far.

After the loop, the entire generated text is obtained by joining the characters in the `result` list using `tf.strings.join`, and the resulting tensor is converted to a readable string using `numpy().decode("utf-8")`.

The final output is the newline seed ('\n') followed by 1000 generated characters. The text depends on the trained model and the seed, and it will usually differ each time the code is executed because every character is sampled at random from the model's predicted distribution.

In [None]:
states = None
next_char = tf.constant(['\n'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)
  result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode("utf-8"))