# 🚀 Generating Text One Token at a Time

Welcome! This notebook gives you a fun and practical demonstration of how a large language model (LLM) generates text, step by step. You'll see how to load a model, tokenize your text, and generate new content like magic! ✨🤖

### Step 1️⃣. Load a Tokenizer and a Model

First, let's load a pre-trained model and its tokenizer from the Hugging Face `transformers` library. The **tokenizer** converts your text into integer token IDs that the model understands, and the **model** will do the text generation magic! 🪄

In [29]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# For this demo, we'll use 'distilgpt2', a smaller and faster version of GPT-2
model_name = "distilgpt2"

# Load the tokenizer and model associated with our chosen model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

Loading weights: 100%|██████████| 76/76 [00:00<00:00, 692.68it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: distilgpt2
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
transformer.h.{0, 1, 2, 3, 4, 5}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


### Step 2️⃣. Examine the Tokenization

Let's see how the tokenizer transforms a simple sentence into a list of token IDs. This process is called **tokenization**. It's like translating words into a secret code! 🕵️‍♂️

In [30]:
# Define a starting phrase, also known as a prompt
prompt_text = "Studying AI is"


# Use the tokenizer to convert the text prompt into input tensors for the model
inputs = tokenizer(prompt_text, return_tensors="pt")

# The 'input_ids' are the numerical representations of our text
print("Prompt text:", prompt_text)
print("Token IDs:", inputs["input_ids"])

Prompt text: Studying AI is
Token IDs: tensor([[13007,  1112,  9552,   318]])


To better understand tokenization, let's decode each ID back into its text. Notice that some tokens are whole words (`AI`, `is`), while others are only pieces of a word (`Stud` + `ying`). This is called **subword tokenization**! 🧩

In [31]:
import pandas as pd

# Get the list of token IDs from our inputs
token_ids = inputs["input_ids"][0].tolist()

# Decode each token ID back to its string representation
tokens = [tokenizer.decode(token_id) for token_id in token_ids]

# Display the IDs and their corresponding tokens in a table for clarity
token_df = pd.DataFrame({"ID": token_ids, "Token": tokens})

print(token_df.to_string(index=False))

   ID Token
13007  Stud
 1112  ying
 9552    AI
  318    is
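
To make the `Stud` + `ying` split less mysterious, here is a simplified sketch of byte-pair encoding (BPE), the family of algorithm behind GPT-2 style tokenizers. The merge rules in `toy_merges` and the helper `apply_merges` are hypothetical, chosen only so this example reproduces the split above; the real tokenizer works on bytes, uses a far larger learned merge table, and keeps the leading space as part of tokens such as ` AI` and ` is`.

```python
# Toy byte-pair encoding (BPE) sketch.
# The merge rules below are HYPOTHETICAL and exist only for illustration;
# distilgpt2's real merge table is learned from data and is far larger.

def apply_merges(word, merges):
    """Split `word` into characters, then apply each merge rule in priority order."""
    symbols = list(word)
    for left, right in merges:  # earlier rules have higher priority
        merged = []
        i = 0
        while i < len(symbols):
            # Fuse an adjacent (left, right) pair into a single symbol
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

toy_merges = [
    ("S", "t"), ("u", "d"), ("St", "ud"),   # builds the piece "Stud"
    ("i", "n"), ("in", "g"), ("y", "ing"),  # builds the piece "ying"
]

print(apply_merges("Studying", toy_merges))  # ['Stud', 'ying']
print(apply_merges("Study", toy_merges))     # ['Stud', 'y']
```

Because no rule merges `Stud` with `ying`, the word stays in two pieces, and a related word like "Study" reuses the same `Stud` piece.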


### Step 3️⃣. Generate the Next Token

Now, let's feed our tokenized prompt to the model and ask it to predict the most likely next token. What will the model say next? 🤔

In [32]:
# We use torch.no_grad() to disable gradient calculations, as we are not training the model
with torch.no_grad():
    # Get the model's raw output, called 'logits'
    outputs = model(**inputs)

    # We only care about the logits for the very last token in our input sequence
    next_token_logits = outputs.logits[:, -1, :]

    # Convert logits into probabilities using the softmax function
    probabilities = torch.nn.functional.softmax(next_token_logits, dim=-1)

    # Find the token ID with the highest probability (argmax over the vocabulary axis)
    most_likely_next_token_id = torch.argmax(probabilities, dim=-1).item()

print(f"The most likely next token ID is: {most_likely_next_token_id}")
print(f"This token is: '{tokenizer.decode(most_likely_next_token_id)}'")

The most likely next token ID is: 257
This token is: ' a'
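
The softmax-then-argmax step can be checked by hand on a tiny example. The sketch below uses plain Python instead of `torch`, and the four-token vocabulary and logit values are made up for illustration (they are not real model output). It also shows that softmax preserves the ordering of the scores, so taking the argmax of the raw logits picks the same token as taking the argmax of the probabilities.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# HYPOTHETICAL logits for a four-token vocabulary (not real model output)
toy_vocab = [" a", " the", " fun", " hard"]
toy_logits = [3.0, 2.0, 1.0, 0.5]

toy_probs = softmax(toy_logits)

# Greedy choice: index of the largest value, computed both ways
best_from_probs = max(range(len(toy_probs)), key=toy_probs.__getitem__)
best_from_logits = max(range(len(toy_logits)), key=toy_logits.__getitem__)

for token, p in zip(toy_vocab, toy_probs):
    print(f"{token!r}: {p:.3f}")
print("Greedy pick:", repr(toy_vocab[best_from_probs]))
print("Same pick from raw logits:", best_from_probs == best_from_logits)
```

Softmax matters when you want actual probabilities, for example to sample from them; for a purely greedy choice the logits alone are enough.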


✨ By predicting the next token and appending it to our input, we can build a longer sequence of text! 🚀

This step-by-step approach helps you see how language models generate text, one token at a time. 🤖 Keep going and watch the story grow! 📈

In [33]:
# Let's generate a few more tokens by repeating the process in a loop
generated_ids = inputs["input_ids"]

print("Generating 5 tokens one at a time:")
print(tokenizer.decode(generated_ids[0]), end="")

# This loop generates one token at a time
for _ in range(5):
    with torch.no_grad():
        outputs = model(generated_ids)
        next_token_logits = outputs.logits[:, -1, :]
        next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)

    # Append the newly predicted token ID to our sequence
    generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)

    # Print the newly generated token
    print(tokenizer.decode(next_token_id[0]), end="")

Generating 5 tokens one at a time:
Studying AI is a very exciting project.
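
The loop above always runs for exactly five steps. A practical decoding loop usually also stops early when the model emits an end-of-sequence token. Here is a minimal sketch of that idea with a stand-in "model": `TOY_VOCAB`, `TOY_SCORES`, `EOS_ID`, and `greedy_decode` are all hypothetical names invented for this example, and the lookup table only looks at the last token, whereas a real model conditions on the whole sequence.

```python
# HYPOTHETICAL stand-in for a language model: a lookup table that maps the
# last token ID to scores over a tiny vocabulary.
TOY_VOCAB = ["<eos>", "Studying", " AI", " is", " fun", "."]
EOS_ID = 0
TOY_SCORES = {
    1: [0.0, 0.0, 2.0, 0.5, 0.1, 0.0],  # after "Studying" the best token is " AI"
    2: [0.0, 0.0, 0.0, 2.0, 0.3, 0.1],  # after " AI" the best token is " is"
    3: [0.0, 0.0, 0.1, 0.0, 2.0, 0.2],  # after " is" the best token is " fun"
    4: [0.5, 0.0, 0.0, 0.0, 0.0, 2.0],  # after " fun" the best token is "."
    5: [2.0, 0.0, 0.0, 0.0, 0.0, 0.1],  # after "." the best token is "<eos>"
}

def greedy_decode(prompt_ids, max_new_tokens):
    """Append the highest-scoring token until EOS or the token budget is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = TOY_SCORES[ids[-1]]
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == EOS_ID:  # the "model" says the text is finished
            break
        ids.append(next_id)
    return ids

result = greedy_decode([1], max_new_tokens=10)
print("".join(TOY_VOCAB[i] for i in result))  # Studying AI is fun.
```

Even with a budget of 10 new tokens, generation stops after four because the next best token is `<eos>`.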

### Step 4️⃣. Use the `generate` Method

Generating tokens one by one is great for learning, but not very efficient: our loop re-runs the model on the entire sequence at every step. The `transformers` library gives us a handy `.generate()` method that runs the whole loop for us! 🏗️✨

In [34]:
from IPython.display import Markdown, display

# We start with the same tokenized prompt
inputs = tokenizer(prompt_text, return_tensors="pt")

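# Use the .generate() method to create a sequence of a desired length.
# max_length=50 caps the total length, counting the 4 prompt tokens.
# GPT-2 has no padding token, so we reuse the end-of-sequence token for padding.
output_ids = model.generate(
    **inputs, max_length=50, pad_token_id=tokenizer.eos_token_id
)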

# Decode the entire sequence of token IDs into a single string
generated_text = tokenizer.decode(output_ids[0])

print("--- Text Generated with model.generate() ---")
display(Markdown(generated_text))

--- Text Generated with model.generate() ---


Studying AI is a very exciting project. It is a very exciting project. It is a very exciting project. It is a very exciting project. It is a very exciting project. It is a very exciting project. It is a very exciting project
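
The repetition in this output is typical of greedy decoding: always taking the single most likely token can lock the model into a loop. A common remedy is to sample from the distribution instead, and `generate()` accepts sampling arguments such as `do_sample=True`, `top_k`, and `temperature` for this purpose. The sketch below shows the idea in plain Python, assuming made-up logits; `sample_top_k` is a hypothetical helper written for this example, not a `transformers` function.

```python
import math
import random

def sample_top_k(logits, k, temperature, rng):
    """Sample one token index from the k highest-scoring logits."""
    # Keep only the indices of the k largest logits
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights, so this is a softmax sample
    return rng.choices(top, weights=weights, k=1)[0]

toy_logits = [3.0, 2.5, 2.0, -1.0, -4.0]  # HYPOTHETICAL scores for 5 tokens
rng = random.Random(0)                    # fixed seed for repeatable runs

picks = [sample_top_k(toy_logits, k=3, temperature=0.8, rng=rng) for _ in range(20)]
print(picks)
```

Every pick comes from the three best candidates, but not always the top one, which is what breaks the repetition loop. Setting `k=1` reduces this to greedy decoding.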

🎉 This demo shows the core logic of how a language model generates text. Now you're ready to try it yourself. Have fun experimenting! 🤩
