This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

# Generative deep learning

## Text generation

### A brief history of generative deep learning for sequence generation

### How do you generate sequence data?

### The importance of the sampling strategy

**Reweighting a probability distribution to a different temperature**

In [None]:
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution is a 1D NumPy array of probabilities that sum to 1.
    # Dividing the log-probabilities by the temperature sharpens the
    # distribution when temperature < 1 and flattens it when temperature > 1.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    # Renormalize so that the reweighted probabilities sum to 1 again.
    return distribution / np.sum(distribution)
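
To make the effect of the temperature concrete, here is a minimal sketch that applies the same reweighting as `reweight_distribution` to a made-up distribution. The helper name `reweight` and the values in `toy_probs` are assumptions chosen for illustration only.

```python
import numpy as np

def reweight(probs, temperature):
    """Reweight a probability vector to the given temperature and renormalize."""
    logits = np.log(np.asarray(probs, dtype="float64")) / temperature
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical next-token distribution over a four-token vocabulary.
toy_probs = [0.5, 0.3, 0.15, 0.05]

for t in (0.2, 1.0, 2.0):
    # Low temperatures concentrate mass on the most likely token,
    # a temperature of 1 leaves the distribution unchanged,
    # and high temperatures move it toward uniform.
    print(f"temperature={t}:", np.round(reweight(toy_probs, t), 3))
```

In this sketch, lowering the temperature makes sampling closer to greedy decoding, while raising it makes the output more varied and less predictable.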

### Implementing text generation with Keras

#### Preparing the data

**Downloading and uncompressing the IMDB movie reviews dataset**

In [None]:
# Download the IMDB movie reviews archive, then extract it into ./aclImdb.
!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -xf aclImdb_v1.tar.gz

**Creating a dataset from text files (one file = one sample)**

In [None]:
import tensorflow as tf
from tensorflow import keras

# label_mode=None yields batches of raw review strings, without labels.
dataset = keras.utils.text_dataset_from_directory(
    directory="aclImdb", label_mode=None, batch_size=256)
# Many reviews contain the HTML tag <br />; replace it with a space.
dataset = dataset.map(lambda x: tf.strings.regex_replace(x, "<br />", " "))

**Preparing a `TextVectorization` layer**

In [None]:
from tensorflow.keras.layers import TextVectorization

sequence_length = 100
vocab_size = 15000  # Cap the vocabulary; rarer words map to the out-of-vocabulary token.
text_vectorization = TextVectorization(
    max_tokens=vocab_size,
    output_mode="int",  # Return sequences of integer token indices.
    output_sequence_length=sequence_length,  # Pad or truncate each sample to this length.
)
text_vectorization.adapt(dataset)  # Build the vocabulary from the review text.
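
As a rough illustration of what integer vectorization produces, the pure-Python sketch below mimics the behavior on a tiny made-up corpus. It is a toy stand-in, not the `TextVectorization` implementation: `tokenize`, `build_vocab`, `vectorize`, and the sample texts are hypothetical names and data, and the ordering of equally frequent words here is an assumption. What it does share with the layer is that index 0 is reserved for padding and index 1 for out-of-vocabulary words.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, drop punctuation, split on whitespace.
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def build_vocab(texts, max_tokens):
    """Toy vocabulary: index 0 is padding, index 1 is out-of-vocabulary."""
    counts = Counter(word for text in texts for word in tokenize(text))
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def vectorize(text, vocab, sequence_length):
    """Map words to indices, truncate, then pad with zeros."""
    index = {word: i for i, word in enumerate(vocab)}
    ids = [index.get(word, 1) for word in tokenize(text)][:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

toy_texts = ["This movie was great", "This movie was terrible, really terrible"]
toy_vocab = build_vocab(toy_texts, max_tokens=6)
toy_ids = vectorize("This movie was surprisingly great!", toy_vocab, sequence_length=8)

print(toy_vocab)
print(toy_ids)
# The position of the first 0 equals the number of real tokens in the sample.
print(toy_ids.index(0))
```

Because padding always uses index 0, the position of the first zero gives the token count of a short input, which is the same trick the text-generation callback later in this notebook uses to compute `prompt_length`.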

**Setting up a language modeling dataset**

In [None]:
def prepare_lm_dataset(text_batch):
    vectorized_sequences = text_vectorization(text_batch)
    x = vectorized_sequences[:, :-1]  # Inputs: every token except the last.
    y = vectorized_sequences[:, 1:]   # Targets: the same sequences shifted by one token.
    return x, y

lm_dataset = dataset.map(prepare_lm_dataset, num_parallel_calls=4)
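
The one-token offset between inputs and targets is easier to see on a small array. The sketch below uses NumPy and a made-up batch (`toy_batch` is a hypothetical stand-in for the output of `text_vectorization`) to show that the target at each position is the token that follows the inputs seen so far.

```python
import numpy as np

# Hypothetical batch of two vectorized sequences (0 is padding).
toy_batch = np.array([[5, 8, 2, 9, 0, 0],
                      [7, 3, 4, 6, 1, 0]])

x = toy_batch[:, :-1]  # model inputs
y = toy_batch[:, 1:]   # targets: the token that follows each input position

for position in range(x.shape[1]):
    # At each position the model sees the prefix and must predict the next token.
    print(f"after seeing {x[0, :position + 1]} predict {y[0, position]}")
```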

#### A Transformer-based sequence-to-sequence model

In [None]:
import tensorflow as tf
from tensorflow.keras import layers

class PositionalEmbedding(layers.Layer):
    def __init__(self, sequence_length, input_dim, output_dim, **kwargs):
        super().__init__(**kwargs)
        # One embedding table for token indices, one for position indices.
        self.token_embeddings = layers.Embedding(
            input_dim=input_dim, output_dim=output_dim)
        self.position_embeddings = layers.Embedding(
            input_dim=sequence_length, output_dim=output_dim)
        self.sequence_length = sequence_length
        self.input_dim = input_dim
        self.output_dim = output_dim

    def call(self, inputs):
        length = tf.shape(inputs)[-1]
        positions = tf.range(start=0, limit=length, delta=1)
        embedded_tokens = self.token_embeddings(inputs)
        embedded_positions = self.position_embeddings(positions)
        # The position embeddings broadcast across the batch dimension.
        return embedded_tokens + embedded_positions

    def compute_mask(self, inputs, mask=None):
        # Index 0 is padding, so mask those positions for downstream layers.
        return tf.math.not_equal(inputs, 0)

    def get_config(self):
        # Expose the constructor arguments so the layer can be saved and reloaded.
        config = super().get_config()
        config.update({
            "output_dim": self.output_dim,
            "sequence_length": self.sequence_length,
            "input_dim": self.input_dim,
        })
        return config

class TransformerDecoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention_1 = layers.MultiHeadAttention(
          num_heads=num_heads, key_dim=embed_dim)
        self.attention_2 = layers.MultiHeadAttention(
          num_heads=num_heads, key_dim=embed_dim)
        self.dense_proj = keras.Sequential(
            [layers.Dense(dense_dim, activation="relu"),
             layers.Dense(embed_dim),]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()
        # Propagate the input mask through this layer.
        self.supports_masking = True

    def get_config(self):
        # Expose the constructor arguments so the layer can be saved and reloaded.
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "dense_dim": self.dense_dim,
        })
        return config

    def get_causal_attention_mask(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size, sequence_length = input_shape[0], input_shape[1]
        i = tf.range(sequence_length)[:, tf.newaxis]
        j = tf.range(sequence_length)
        # Lower-triangular matrix: position i may attend to positions j <= i.
        mask = tf.cast(i >= j, dtype="int32")
        mask = tf.reshape(mask, (1, input_shape[1], input_shape[1]))
        # Repeat the mask along the batch axis: (batch_size, seq_len, seq_len).
        mult = tf.concat(
            [tf.expand_dims(batch_size, -1),
             tf.constant([1, 1], dtype=tf.int32)], axis=0)
        return tf.tile(mask, mult)

    def call(self, inputs, encoder_outputs, mask=None):
        causal_mask = self.get_causal_attention_mask(inputs)
        if mask is not None:
            # Combine the padding mask with the causal mask.
            padding_mask = tf.cast(
                mask[:, tf.newaxis, :], dtype="int32")
            padding_mask = tf.minimum(padding_mask, causal_mask)
        else:
            padding_mask = mask
        # Causal self-attention over the inputs.
        attention_output_1 = self.attention_1(
            query=inputs,
            value=inputs,
            key=inputs,
            attention_mask=causal_mask)
        # Residual connection followed by layer normalization.
        attention_output_1 = self.layernorm_1(inputs + attention_output_1)
        # Attention with encoder_outputs as keys and values.
        attention_output_2 = self.attention_2(
            query=attention_output_1,
            value=encoder_outputs,
            key=encoder_outputs,
            attention_mask=padding_mask,
        )
        attention_output_2 = self.layernorm_2(
            attention_output_1 + attention_output_2)
        # Position-wise feed-forward projection with a final residual connection.
        proj_output = self.dense_proj(attention_output_2)
        return self.layernorm_3(attention_output_2 + proj_output)
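
The causal mask built by `get_causal_attention_mask` can be inspected on a small example. The sketch below reproduces the same logic in NumPy so it runs without TensorFlow; the function name `causal_mask` and the `toy_padding` values are assumptions for illustration.

```python
import numpy as np

def causal_mask(batch_size, sequence_length):
    """NumPy stand-in for get_causal_attention_mask: shape (batch, seq, seq)."""
    i = np.arange(sequence_length)[:, np.newaxis]  # query positions (rows)
    j = np.arange(sequence_length)                 # key positions (columns)
    mask = (i >= j).astype("int32")                # 1 where the key is not in the future
    return np.tile(mask[np.newaxis, :, :], (batch_size, 1, 1))

toy_mask = causal_mask(batch_size=2, sequence_length=4)
print(toy_mask[0])

# Combine with a hypothetical padding mask: the last token of sample 0 is padding.
toy_padding = np.array([[1, 1, 1, 0],
                        [1, 1, 1, 1]], dtype="int32")
combined = np.minimum(toy_padding[:, np.newaxis, :], toy_mask)
print(combined[0])
```

Row `i` of the mask marks which positions token `i` may attend to. Taking the element-wise minimum with the padding mask additionally zeroes out every column that corresponds to a padded position.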

**A simple Transformer-based language model**

In [None]:
from tensorflow.keras import layers

embed_dim = 256
latent_dim = 2048
num_heads = 2

inputs = keras.Input(shape=(None,), dtype="int64")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(inputs)
# There is no encoder here, so the decoder receives the same sequence for both arguments.
x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, x)
# Softmax over the vocabulary at every position of the output sequence.
outputs = layers.Dense(vocab_size, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="rmsprop")

### A text-generation callback with variable-temperature sampling

**The text-generation callback**

In [None]:
import numpy as np

# Map each token index back to its string, for decoding sampled tokens.
tokens_index = dict(enumerate(text_vectorization.get_vocabulary()))

def sample_next(predictions, temperature=1.0):
    predictions = np.asarray(predictions).astype("float64")
    # Reweight the predicted distribution to the requested temperature.
    predictions = np.log(predictions) / temperature
    exp_preds = np.exp(predictions)
    predictions = exp_preds / np.sum(exp_preds)
    # Draw one token: multinomial returns a one-hot sample, argmax recovers its index.
    probas = np.random.multinomial(1, predictions, 1)
    return np.argmax(probas)
 
class TextGenerator(keras.callbacks.Callback):
    def __init__(self,
                 prompt,
                 generate_length,
                 model_input_length,
                 temperatures=(1.,),
                 print_freq=1):
        super().__init__()
        self.prompt = prompt
        self.generate_length = generate_length
        self.model_input_length = model_input_length
        self.temperatures = temperatures
        self.print_freq = print_freq
        vectorized_prompt = text_vectorization([prompt])[0].numpy()
        # The index of the first padding token (0) equals the number of prompt tokens.
        self.prompt_length = np.nonzero(vectorized_prompt == 0)[0][0]

    def on_epoch_end(self, epoch, logs=None):
        # Generate text only every print_freq epochs.
        if (epoch + 1) % self.print_freq != 0:
            return
        for temperature in self.temperatures:
            print("== Generating with temperature", temperature)
            sentence = self.prompt
            for i in range(self.generate_length):
                tokenized_sentence = text_vectorization([sentence])
                predictions = self.model(tokenized_sentence)
                # Sample from the prediction at the last token generated so far,
                # passing the current temperature to the sampler.
                next_token = sample_next(
                    predictions[0, self.prompt_length - 1 + i, :],
                    temperature,
                )
                sampled_token = tokens_index[next_token]
                sentence += " " + sampled_token
            print(sentence)

prompt = "This movie"
text_gen_callback = TextGenerator(
    prompt,
    generate_length=50,  # Number of tokens to generate after the prompt.
    model_input_length=sequence_length,
    temperatures=(0.2, 0.5, 0.7, 1., 1.5))  # Generate one sample per temperature.
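
To see how the temperature changes which tokens get drawn, the sketch below samples repeatedly from a made-up distribution and reports the empirical frequencies. It is an illustration under stated assumptions rather than a copy of `sample_next`: the helper `sample_with_temperature` is hypothetical, it draws with a seeded `numpy.random.Generator.choice` instead of `np.random.multinomial`, and it subtracts the maximum logit before exponentiating for numerical stability.

```python
import numpy as np

def sample_with_temperature(probs, temperature, rng):
    """Reweight probs to the given temperature, then draw one index."""
    logits = np.log(np.asarray(probs, dtype="float64")) / temperature
    weights = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return rng.choice(len(weights), p=weights / weights.sum())

toy_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
rng = np.random.default_rng(0)      # seeded so repeated runs draw the same samples

for t in (0.2, 1.0, 1.5):
    draws = [sample_with_temperature(toy_probs, t, rng) for _ in range(2000)]
    # Fraction of draws that landed on each of the four token indices.
    print(f"temperature={t}:", np.bincount(draws, minlength=4) / 2000)
```

At low temperatures nearly every draw lands on the most likely token, while higher temperatures spread the draws across less likely tokens.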

**Fitting the language model**

In [None]:
# Train for 200 epochs; the callback prints generated text at the end of each epoch.
model.fit(lm_dataset, epochs=200, callbacks=[text_gen_callback])

### Wrapping up