<div style='background-color:purple'>

## Table of Contents:

- [Introduction](#introduction)
- [Setup](#setup)
- [Implement Transformer](#implement-transformer)
- [Implement Embedding Layer](#implement-embedding-layer)
- [Implement Mini GPT](#implement-mini-gpt)
- [Prepare Data](#prepare-data)
- [Implement Keras Callback for Generating Text](#implement-keras-callback-for-generating-text)
- [Train the Model](#train-the-model)
- [Assignment](#assignment)

</div>

<ul>
    <li><a href='https://keras.io/examples/generative/text_generation_with_miniature_gpt/'>Text Generation With A Miniature GPT</a></li>
    <li><a href='https://classroom.github.com/assignment-invitations/db78b5b6e7f6de52f3007f6599db8651/status'>Upload Assignment Here</a></li>
</ul>

<div style='background-color:purple'>

## Introduction

- [Back to Table of Contents](#table-of-contents)

</div>

This example demonstrates how to implement an autoregressive language model using a miniature version of the GPT model. The model consists of a single Transformer block with causal masking in its attention layer. We use the text from the IMDB sentiment classification dataset for training and generate new movie reviews for a given prompt. When using this script with your own dataset, make sure it has at least 1 million words.
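
As a brief sketch of what "autoregressive" means here (the notation $w_t$ for the token at position $t$ and $T$ for the sequence length is introduced only for illustration), the model factorizes the probability of a review into next-token predictions, and training minimizes the corresponding negative log-likelihood, up to averaging over positions and batches:

$$
p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1}),
\qquad
\mathcal{L} = -\sum_{t=1}^{T} \log p(w_t \mid w_1, \dots, w_{t-1})
$$

Causal masking is what keeps each factor from depending on tokens to its right.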

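This example uses the Keras 3 API (<code>keras.ops</code> and the <code>KERAS_BACKEND</code> environment variable), so it should be run with Keras 3 on a supported backend. The <code>tf-nightly>=2.3.0-dev20200531</code> / TensorFlow 2.3 requirement stated for earlier versions of this example does not cover that API.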

<b>References:</b>
<ul>
    <li><a href='https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035'>GPT</a></li>
    <li><a href='https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe'>GPT-2</a></li>
    <li><a href='https://arxiv.org/abs/2005.14165'>GPT-3</a></li>
</ul>

<div style='background-color:purple'>

## Setup

- [Back to Table of Contents](#table-of-contents)

</div>

In [1]:
# We set the backend to TensorFlow. The code works with
# both `tensorflow` and `torch`. It does not work with JAX
# due to the behavior of `jax.numpy.tile` in a jit scope
# (used in `causal_attention_mask()`): `tile` in JAX does
# not support a dynamic `reps` argument.
# You can make the code work in JAX by wrapping the
# inside of the `causal_attention_mask` function in
# a context manager to prevent jit compilation:
# `with jax.ensure_compile_time_eval():`.
import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import keras
from keras import layers
from keras import ops
from keras.layers import TextVectorization
import numpy as np
import string
import random
import tensorflow
import tensorflow.data as tf_data
import tensorflow.strings as tf_strings



<div style='background-color:purple'>

## Implement Transformer

- [Back to Table of Contents](#table-of-contents)

</div>

In [2]:
def causal_attention_mask(batch_size, n_dest, n_src, dtype):
    """
    Mask the upper half of the dot product matrix in self attention.
    This prevents flow of information from future tokens to current token.
    1's in the lower triangle, counting from the lower right corner.
    """
    i = ops.arange(n_dest)[:, None]
    j = ops.arange(n_src)
    m = i >= j - n_src + n_dest
    mask = ops.cast(m, dtype)
    mask = ops.reshape(mask, [1, n_dest, n_src])
    mult = ops.concatenate(
        [ops.expand_dims(batch_size, -1), ops.convert_to_tensor([1, 1])], 0
    )
    return ops.tile(mask, mult)


class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads, embed_dim)
        self.ffn = keras.Sequential(
            [
                layers.Dense(ff_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs):
        input_shape = ops.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]
        causal_mask = causal_attention_mask(batch_size, seq_len, seq_len, "bool")
        attention_output = self.att(inputs, inputs, attention_mask=causal_mask)
        attention_output = self.dropout1(attention_output)
        out1 = self.layernorm1(inputs + attention_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output)
        return self.layernorm2(out1 + ffn_output)
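
To make the comparison inside `causal_attention_mask()` concrete, the same test `i >= j - n_src + n_dest` can be evaluated in plain Python. This is only an illustrative sketch: `causal_mask_rows` is a hypothetical helper, not part of the notebook, and it returns nested lists instead of a tensor.

```python
def causal_mask_rows(n_dest, n_src):
    """Return the causal mask as nested lists: 1 = may attend, 0 = masked."""
    return [
        [1 if i >= j - n_src + n_dest else 0 for j in range(n_src)]
        for i in range(n_dest)
    ]


# With n_dest == n_src the mask is lower triangular:
# position i can attend to positions 0..i and nothing later.
for row in causal_mask_rows(4, 4):
    print(row)
```

When `n_dest` and `n_src` differ, the ones are aligned to the lower right corner, which is what the docstring above describes.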

<div style='background-color:purple'>

## Implement Embedding Layer

- [Back to Table of Contents](#table-of-contents)

</div>

In [3]:
class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = ops.shape(x)[-1]
        positions = ops.arange(0, maxlen, 1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions
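
The layer above looks up one vector per token and one vector per position, then adds them. A minimal sketch of that sum with toy lookup tables follows; the function name and the tables are hypothetical, and the real layer learns its tables during training.

```python
def token_and_position_embedding(token_ids, token_table, pos_table):
    """Add the position vector for slot t to the token vector found at slot t."""
    return [
        [tok + pos for tok, pos in zip(token_table[token_id], pos_table[t])]
        for t, token_id in enumerate(token_ids)
    ]


# Toy tables: vocabulary of 3 tokens, 2 positions, embedding size 2.
toy_token_table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
toy_pos_table = [[0.5, 0.25], [-0.5, -0.25]]

# The same token id (2) appears twice but gets a different vector per position.
embedded = token_and_position_embedding([2, 2], toy_token_table, toy_pos_table)
print(embedded)
```

This is why the model can distinguish word order even though attention itself is order-agnostic.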

<div style='background-color:purple'>

## Implement Mini GPT

- [Back to Table of Contents](#table-of-contents)

</div>

In [4]:
vocab_size = 20000  # Only consider the top 20k words
maxlen = 80  # Max sequence size
embed_dim = 256  # Embedding size for each token
num_heads = 2  # Number of attention heads
feed_forward_dim = 256  # Hidden layer size in feed forward network inside transformer


def create_model():
    inputs = layers.Input(shape=(maxlen,), dtype="int32")
    embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
    x = embedding_layer(inputs)
    transformer_block = TransformerBlock(embed_dim, num_heads, feed_forward_dim)
    x = transformer_block(x)
    outputs = layers.Dense(vocab_size)(x)
    model = keras.Model(inputs=inputs, outputs=[outputs, x])
    loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(
        "adam",
        loss=[loss_fn, None],
    )  # No loss and optimization based on word embeddings from transformer block
    return model
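
For a rough sense of model size, the trainable parameters implied by the hyperparameters above can be estimated by hand. The sketch below assumes that every dense and attention projection has a bias term and that `MultiHeadAttention` uses `key_dim` for both keys and values; the helper names are hypothetical, and `model.summary()` remains the authoritative count.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def mini_gpt_param_count(
    vocab_size=20000, maxlen=80, embed_dim=256, num_heads=2, ff_dim=256
):
    key_dim = embed_dim  # TransformerBlock passes embed_dim as key_dim
    proj = num_heads * key_dim  # Width of the query/key/value projections
    counts = {
        "embeddings": vocab_size * embed_dim + maxlen * embed_dim,
        "attention": 3 * dense_params(embed_dim, proj)
        + dense_params(proj, embed_dim),
        "feed_forward": dense_params(embed_dim, ff_dim)
        + dense_params(ff_dim, embed_dim),
        "layer_norms": 2 * (2 * embed_dim),  # Scale and offset, two layers
        "output_head": dense_params(embed_dim, vocab_size),
    }
    counts["total"] = sum(counts.values())
    return counts


for name, value in mini_gpt_param_count().items():
    print(f"{name:>12}: {value:,}")
```

Under these assumptions, most parameters sit in the token embedding and the output projection rather than in the single Transformer block.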

<div style='background-color:purple'>

## Prepare Data

- [Back to Table of Contents](#table-of-contents)

</div>

Download the IMDB dataset and combine the training and test sets for a text generation task.

In [5]:
!curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -xf aclImdb_v1.tar.gz



In [6]:
batch_size = 128

# The dataset contains each review in a separate text file.
# The text files are spread across four folders.
# Create a list of all files.
filenames = []
directories = [
    "aclImdb/train/pos",
    "aclImdb/train/neg",
    "aclImdb/test/pos",
    "aclImdb/test/neg",
]
for directory in directories:  # Avoid shadowing the built-in `dir`
    for f in os.listdir(directory):
        filenames.append(os.path.join(directory, f))

print(f"{len(filenames)} files")

# Create a dataset from text files
random.shuffle(filenames)
text_ds = tf_data.TextLineDataset(filenames)
text_ds = text_ds.shuffle(buffer_size=256)
text_ds = text_ds.batch(batch_size)


def custom_standardization(input_string):
    """Remove html line-break tags and handle punctuation"""
    lowercased = tf_strings.lower(input_string)
    stripped_html = tf_strings.regex_replace(lowercased, "<br />", " ")
    return tf_strings.regex_replace(stripped_html, f"([{string.punctuation}])", r" \1")


# Create a vectorization layer and adapt it to the text
vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=vocab_size - 1,
    output_mode="int",
    output_sequence_length=maxlen + 1,
)
vectorize_layer.adapt(text_ds)
vocab = vectorize_layer.get_vocabulary()  # To get words back from token indices


def prepare_lm_inputs_labels(text):
    """
    Shift word sequences by 1 position so that the target for position (i) is
    word at position (i+1). The model will use all words up till position (i)
    to predict the next word.
    """
    text = tensorflow.expand_dims(text, -1)
    tokenized_sentences = vectorize_layer(text)
    x = tokenized_sentences[:, :-1]
    y = tokenized_sentences[:, 1:]
    return x, y


text_ds = text_ds.map(prepare_lm_inputs_labels, num_parallel_calls=tf_data.AUTOTUNE)
text_ds = text_ds.prefetch(tf_data.AUTOTUNE)

50000 files
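
The two text-processing steps in the cell above, standardization and the one-position shift between inputs and labels, can be sketched without TensorFlow. The helpers `standardize` and `make_lm_pair` are hypothetical stand-ins, and zero-padding to `maxlen + 1` is assumed to mirror what `TextVectorization` does with `output_sequence_length`.

```python
import re
import string


def standardize(text):
    """Lowercase, drop HTML line breaks, and put a space before punctuation."""
    lowered = text.lower()
    no_breaks = lowered.replace("<br />", " ")
    return re.sub(f"([{re.escape(string.punctuation)}])", r" \1", no_breaks)


def make_lm_pair(token_ids, maxlen):
    """Pad or truncate to maxlen + 1 ids, then shift by one position."""
    padded = (list(token_ids) + [0] * (maxlen + 1))[: maxlen + 1]
    inputs = padded[:-1]  # Tokens 0 .. maxlen-1
    labels = padded[1:]  # Tokens 1 .. maxlen, i.e. the "next word" at each step
    return inputs, labels


print(standardize("Great movie!<br />Loved it."))
print(make_lm_pair([5, 9, 2, 7], maxlen=5))
```

Each label is simply the token that follows the corresponding input position, which is what lets a single forward pass train every position at once.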




<div style='background-color:purple'>

## Implement Keras Callback for Generating Text

- [Back to Table of Contents](#table-of-contents)

</div>

In [7]:
class TextGenerator(keras.callbacks.Callback):
    """A callback to generate text from a trained model.
    1. Feed some starting prompt to the model
    2. Predict probabilities for the next token
    3. Sample the next token and add it to the next input

    Arguments:
        max_tokens: Integer, the number of tokens to be generated after prompt.
        start_tokens: List of integers, the token indices for the starting prompt.
        index_to_word: List of strings, obtained from the TextVectorization layer.
        top_k: Integer, sample from the `top_k` token predictions.
        print_every: Integer, print after this many epochs.
    """

    def __init__(
        self, max_tokens, start_tokens, index_to_word, top_k=10, print_every=1
    ):
        super().__init__()
        self.max_tokens = max_tokens
        self.start_tokens = start_tokens
        self.index_to_word = index_to_word
        self.print_every = print_every
        self.k = top_k

    def sample_from(self, logits):
        logits, indices = ops.top_k(logits, k=self.k, sorted=True)
        indices = np.asarray(indices).astype("int32")
        preds = keras.activations.softmax(ops.expand_dims(logits, 0))[0]
        preds = np.asarray(preds).astype("float32")
        return np.random.choice(indices, p=preds)

    def detokenize(self, number):
        return self.index_to_word[number]

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.print_every != 0:
            return
        start_tokens = list(self.start_tokens)  # Copy so the prompt is not mutated
        num_tokens_generated = 0
        tokens_generated = []
        # Generate exactly `max_tokens` new tokens after the prompt
        while num_tokens_generated < self.max_tokens:
            pad_len = maxlen - len(start_tokens)
            sample_index = len(start_tokens) - 1
            if pad_len < 0:
                x = start_tokens[:maxlen]
                sample_index = maxlen - 1
            elif pad_len > 0:
                x = start_tokens + [0] * pad_len
            else:
                x = start_tokens
            x = np.array([x])
            y, _ = self.model.predict(x, verbose=0)
            sample_token = self.sample_from(y[0][sample_index])
            tokens_generated.append(sample_token)
            start_tokens.append(sample_token)
            num_tokens_generated = len(tokens_generated)
        txt = " ".join(
            [self.detokenize(_) for _ in self.start_tokens + tokens_generated]
        )
        print(f"generated text:\n{txt}\n")


# Tokenize starting prompt
word_to_index = {}
for index, word in enumerate(vocab):
    word_to_index[word] = index

start_prompt = "this movie is"
start_tokens = [word_to_index.get(_, 1) for _ in start_prompt.split()]
num_tokens_generated = 40
text_gen_callback = TextGenerator(num_tokens_generated, start_tokens, vocab)
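
The core of `sample_from()` is top-k sampling: keep the `k` highest logits, turn them into probabilities with a softmax, and draw one token. The following standard-library sketch shows the same idea on a plain list of logits; `sample_top_k` is a hypothetical helper and is not a drop-in replacement for the callback method.

```python
import math
import random


def sample_top_k(logits, k, rng=random):
    """Sample one index from the k largest logits, weighted by their softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = logits[top[0]]  # Subtract the max for numerical stability
    weights = [math.exp(logits[i] - peak) for i in top]
    # `choices` normalizes the weights, so this equals a softmax over the top k.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 3.0, 2.0, -1.0]
rng = random.Random(0)
print([sample_top_k(toy_logits, k=2, rng=rng) for _ in range(10)])
```

With `k=1` the function reduces to greedy (argmax) decoding; larger `k` admits less likely tokens and therefore more varied text.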

<div style='background-color:purple'>

## Train the Model

- [Back to Table of Contents](#table-of-contents)

</div>

Note: This code should preferably be run on GPU.

In [9]:
model = create_model()

model.fit(text_ds, verbose=2, epochs=25, callbacks=[text_gen_callback])

1. The `call()` method of your layer may be crashing. Try to `__call__()` the layer eagerly on some test input first to see if it works. E.g. `x = np.random.random((3, 4)); y = layer(x)`
2. If the `call()` method is correct, then you may need to implement the `def build(self, input_shape)` method on your layer. It should create all variables used by the layer (e.g. by calling `layer.build()` on all its children layers).
Exception encountered: ''Iterating over a symbolic `tf.Tensor` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-sour

OperatorNotAllowedInGraphError: Exception encountered when calling TransformerBlock.call().

Could not automatically infer the output shape / dtype of 'transformer_block_1' (of type TransformerBlock). Either the `TransformerBlock.call()` method is incorrect, or you need to implement the `TransformerBlock.compute_output_spec() / compute_output_shape()` method. Error encountered:

Iterating over a symbolic `tf.Tensor` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.

Arguments received by TransformerBlock.call():
  • args=('<KerasTensor shape=(None, 80, 256), dtype=float32, sparse=False, ragged=False, name=keras_tensor_5>',)
  • kwargs=<class 'inspect._empty'>
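
The `OperatorNotAllowedInGraphError` above is raised while Keras traces `TransformerBlock.call()` on a symbolic input. One plausible cause is the `ops.tile(mask, mult)` call in `causal_attention_mask()`, where `mult` is built from the symbolic batch size. This is an assumption, since the traceback does not name the call. A minimal sketch of a workaround is to drop the hand-built mask and pass `use_causal_mask=True` to `MultiHeadAttention`, so the layer builds its own lower-triangular mask. The class name `CausalTransformerBlock` is hypothetical, and the sketch assumes a Keras version whose `MultiHeadAttention.call()` accepts `use_causal_mask`.

```python
import os

os.environ.setdefault("KERAS_BACKEND", "tensorflow")

import keras
from keras import layers


class CausalTransformerBlock(layers.Layer):
    """Hypothetical variant of TransformerBlock that lets Keras build the mask."""

    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads, embed_dim)
        self.ffn = keras.Sequential(
            [
                layers.Dense(ff_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs):
        # No explicit mask tensor: the attention layer applies causal masking,
        # so nothing here depends on the (symbolic) batch size.
        attention_output = self.att(inputs, inputs, use_causal_mask=True)
        attention_output = self.dropout1(attention_output)
        out1 = self.layernorm1(inputs + attention_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output)
        return self.layernorm2(out1 + ffn_output)
```

If this variant traces cleanly in your environment, `create_model()` could instantiate it in place of `TransformerBlock`, and `causal_attention_mask()` would no longer be needed.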

<div style='background-color:purple'>

## Assignment

- [Back to Table of Contents](#table-of-contents)
- [Question 1 (Interaction)](#question-1-(interaction))
- [Question 2 (Sampling Controls)](#question-2-(sampling-controls))
- [Question 3 (Prompt Exploration)](#question-3-(prompt-exploration))
- [Question 4 (Analysis and Reflection)](#question-4-(analysis-and-reflection))

</div>

## Question 1 (Interaction)

- [Back To Assignment Top](#assignment)

<h4><b>Guidelines:</b></h4>

(2 points) You must implement at least one of the following interaction mechanisms:

<b>Option A — Function-Based Text Generation (Minimum Requirement)</b>

Implement a reusable function (e.g., <code>generate_text(...)</code>) that:
<ul>
    <li>Accepts a text prompt</li>
    <li>Generates a configurable number of new tokens</li>
    <li>Uses probabilistic sampling (not argmax)</li>
    <li>Returns generated text as a string</li>
</ul>
You should be able to call it like:

<code>generate_text(model, "this movie is", num_tokens=60, temperature=0.8, top_k=20)</code>

<b>Option B — Interactive CLI Loop (Recommended)</b>

Implement a simple <b>command-line or notebook REPL</b> where:
<ul>
    <li>The user types a prompt</li>
    <li>The model generates a continuation</li>
    <li>The user can submit multiple prompts in one session</li>
</ul>

---
## Question 2 (Sampling Controls)

- [Back To Assignment Top](#assignment)

<h4><b>Guidelines:</b></h4>

Your interaction must expose <b>at least two</b> of the following parameters:

| Parameter          | Description                                                |
|--------------------|------------------------------------------------------------|
| <b>Temperature</b> | Controls randomness. Lower = safer, higher = more creative |
| <b>Top-k</b>       | Restricts sampling to the top-k most likely tokens         |
| <b>Max Tokens</b>  | Number of tokens generated beyond the prompt               |

You must demonstrate that changing these parameters affects output behavior.
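
As a small illustration of the temperature row in the table, logits can be divided by the temperature before the softmax. The sketch below uses made-up logits and a hypothetical helper name; it only shows the effect on a probability distribution and is not a complete text-generation function.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; temperature must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, while higher temperatures flatten the distribution toward uniform.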

---
## Question 3 (Prompt Exploration)

- [Back To Assignment Top](#assignment)

<h4><b>Guidelines:</b></h4>

You must test your model with <b>at least 10 distinct prompts</b>, including:
<ul>
    <li>Short prompts (2–4 words)</li>
    <li>Medium prompts (5–8 words)</li>
    <li>At least one ambiguous or incomplete prompt</li>
</ul>
 
For each prompt, record:
<ul>
    <li>Prompt text</li>
    <li>Sampling parameters used</li>
    <li>Generated output</li>
</ul>

---
## Question 4 (Analysis and Reflection)

- [Back To Assignment Top](#assignment)

<h4><b>Guidelines:</b></h4>

In your submission, include a short written analysis addressing:

A. (6 points) Sampling Behavior
<ul>
    <li>(2 points) How does increasing <b>temperature</b> affect coherence?</li>
    <li>(2 points) How does changing <b>top-k</b> affect repetition or diversity?</li>
    <li>(2 points) Which settings produced the “best” outputs, and why?</li>
</ul>

B. (10 points) Model Limitations

(8 points) Identify <b>at least one failure mode</b>, such as:
<ul>
    <li>Repetitive loops</li>
    <li>Loss of grammatical structure</li>
    <li>Sudden topic drift</li>
    <li>Nonsensical phrasing</li>
</ul>

(2 points) Explain why this happens in a <b>small, single-block GPT model</b>.

C. (6 points) Architectural Reflection

Briefly discuss how model size and training data limitations affect:
<ul>
    <li>(2 points) Long-range coherence</li>
    <li>(2 points) Semantic consistency</li>
    <li>(2 points) Real-world usability</li>