## Open notebook in:
| Colab |
|:------|
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nicolepcx/Transformers-in-Action/blob/main/CH04/CH04_text_generation_coding_examples.ipynb) |

# Install requirements

In [None]:
!pip install transformers==4.53.2 -q


# Imports

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM



# About this notebook

In this notebook, you will explore various decoding and sampling methods and observe how they influence the output of a language model. We’ll use Meta’s **Llama 3.2 1B Instruct** model, which is gated on Hugging Face. Request access at [https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) before running the notebook, and enter your access token when prompted.

This practical exercise covers five decoding strategies: **Greedy search**, **Beam search**, **Top-k sampling**, **Top-p (nucleus) sampling**, and **Temperature sampling**. Each can significantly impact the style, structure, and creativity of the model's responses.

As highlighted in the book, you are encouraged to experiment with the different strategies:
- See how **greedy decoding** produces deterministic but sometimes repetitive responses.
- Observe how **beam search** explores multiple paths before selecting the most likely sequence.
- Notice how **Top-k** and **Top-p sampling** introduce controlled randomness and can yield more creative results.
- Adjust the **temperature** setting to influence diversity: values below 1 sharpen the distribution toward the most likely tokens and give safer, more predictable outputs, while values above 1 flatten it and encourage more diverse generations.

Examine how the model’s behavior shifts under each configuration. This hands-on approach reinforces the theoretical insights from the book and provides an intuitive grasp of how modern text generation works in practice using a state-of-the-art transformer model.


In [None]:
# Prompt for Hugging Face token
hf_token = input("Please enter your Hugging Face access token (you must request access to use this gated model): ").strip()

# Stop with a clear error if no token was provided.
# Raising an exception halts the cell without shutting down the notebook kernel, unlike exit().
if not hf_token:
    raise ValueError("A Hugging Face token is required to use this gated model. Please request access at https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct")


Please enter your Hugging Face access token (you must request access to use this gated model): HF token


# Deterministic Decoding
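
Before running beam search and greedy search on the real model, it can help to see why the two can disagree. The following is a minimal pure-Python sketch, not the `transformers` implementation: the `TOY_MODEL` table, its tokens, and its probabilities are made up for illustration, and the simplified `beam_search` ignores details such as length normalization.

```python
import math

# Hypothetical toy "language model": next-token probabilities for a given prefix.
TOY_MODEL = {
    (): {"the": 0.5, "a": 0.4, "<eos>": 0.1},
    ("the",): {"cat": 0.4, "dog": 0.35, "<eos>": 0.25},
    ("a",): {"robot": 0.9, "<eos>": 0.1},
}

def next_probs(prefix):
    # Prefixes missing from the table end the sequence with certainty.
    return TOY_MODEL.get(tuple(prefix), {"<eos>": 1.0})

def greedy_search(max_len=3):
    """Pick the single most probable token at every step."""
    seq, logp = [], 0.0
    for _ in range(max_len):
        probs = next_probs(seq)
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        if token == "<eos>":
            break
        seq.append(token)
    return seq, logp

def beam_search(num_beams=2, max_len=3):
    """Keep the num_beams best partial sequences at every step."""
    beams = [([], 0.0, False)]  # (tokens, log-probability, finished)
    for _ in range(max_len):
        candidates = []
        for seq, logp, done in beams:
            if done:
                candidates.append((seq, logp, True))
                continue
            for token, p in next_probs(seq).items():
                if token == "<eos>":
                    candidates.append((seq, logp + math.log(p), True))
                else:
                    candidates.append((seq + [token], logp + math.log(p), False))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(done for _, _, done in beams):
            break
    return [(seq, logp) for seq, logp, _ in beams]

greedy_tokens, greedy_logp = greedy_search()
print("Greedy:", greedy_tokens, round(math.exp(greedy_logp), 3))
for tokens, logp in beam_search(num_beams=2):
    print("Beam:  ", tokens, round(math.exp(logp), 3))
```

In this toy table, greedy search commits to "the" because it is the most likely first token, while beam search keeps "a" alive and ends up with the higher-probability sequence "a robot".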

In [None]:
model_id = "meta-llama/Llama-3.2-1B-Instruct"

# Load tokenizer and model (the `token` argument replaces the deprecated `use_auth_token`)
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    token=hf_token
)

# Manually build the prompt in the Llama 3 chat format
system_prompt = "You are a helpful assistant"
user_input = "Complete this sentence: In a world where AI has become ubiquitous "
prompt = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
# The prompt already starts with <|begin_of_text|>, so tell the tokenizer not to prepend a second one
input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)

# === Beam Search (Multiple Outputs)
beam_outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    num_beams=5,
    num_return_sequences=3,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=False
)

print("\033[1m" + "Beam Search Output:\n" + "=" * 140 + "\033[0m")
for i, output in enumerate(beam_outputs):
    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Output {i+1}:\n{decoded}\n" + "-" * 140 + "\n")


# === Greedy Search (always one output)
greedy_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id
)

print("\033[1m" + "Greedy Search Output:\n" + "=" * 140 + "\033[0m")
print(f"Output:\n{tokenizer.decode(greedy_output[0], skip_special_tokens=True)}\n" + "-" * 140 + "\n")


The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Beam Search Output:
Output 1:
system
You are a helpful assistantuser
Complete this sentence: In a world where AI has become ubiquitous assistant
In a world where AI has become ubiquitous, the lines between human and machine have become increasingly blurred, and the concept of what it means to be human has been redefined, leading to a new era of collaboration and coexistence between humans and artificial intelligence.
--------------------------------------------------------------------------------------------------------------------------------------------

Output 2:
system
You are a helpful assistantuser
Complete this sentence: In a world where AI has become ubiquitous assistant
In a world where AI has become ubiquitous, the lines between human and machine have become increasingly blurred, and the concept of what it means to be human has been redefined, leading to a new era of collaboration and coexistence between humans and artificial intelligences.
-------------------------------

# Probabilistic Methods

## Top-k Sampling

![Google Drive Image](https://drive.google.com/uc?export=view&id=1n984J6XPmVi-b1uvfLiDOkNBgS11YrVd)
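
As a rough illustration of the idea, here is a minimal pure-Python sketch of top-k filtering on a single next-token distribution. The vocabulary, the probabilities, and the helper name `top_k_filter` are assumptions made up for this example; they are not part of `transformers`.

```python
import random

# Hypothetical next-token distribution over a five-word vocabulary
vocab = ["future", "robots", "society", "banana", "zebra"]
probs = [0.50, 0.20, 0.15, 0.10, 0.05]

def top_k_filter(probs, k):
    """Zero out everything except the k most probable tokens, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept_mass = sum(probs[i] for i in keep)
    return [p / kept_mass if i in keep else 0.0 for i, p in enumerate(probs)]

filtered = top_k_filter(probs, k=2)
print([round(p, 3) for p in filtered])

# Sample the next token from the truncated distribution
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled = rng.choices(vocab, weights=filtered, k=1)[0]
print(sampled)
```

With `k=2`, only "future" and "robots" can ever be sampled, no matter how the remaining probability mass is spread over the rest of the vocabulary.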

## Nucleus (Top-p) Sampling

![Google Drive Image](https://drive.google.com/uc?export=view&id=1pJM6jtIO29qOx2JTPAKHLxSqzjtRowbZ)
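
The sketch below shows one simple way to express nucleus filtering in pure Python: keep the smallest set of top-ranked tokens whose cumulative probability reaches `p`, then renormalize. The two distributions and the helper name `top_p_filter` are hypothetical, and the exact thresholding in `transformers` may differ in edge cases.

```python
def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize over the kept tokens so the result sums to 1
    return [prob / cumulative if i in keep else 0.0 for i, prob in enumerate(probs)]

# Hypothetical distributions: one fairly flat, one sharply peaked
flat = [0.50, 0.20, 0.15, 0.10, 0.05]
peaked = [0.90, 0.05, 0.03, 0.01, 0.01]

flat_filtered = top_p_filter(flat, p=0.8)
peaked_filtered = top_p_filter(peaked, p=0.8)

print("flat:  ", [round(x, 3) for x in flat_filtered])
print("peaked:", [round(x, 3) for x in peaked_filtered])
```

Unlike top-k, the number of surviving tokens adapts to the shape of the distribution: three tokens survive for the flat example, but only one for the peaked example.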


## Temperature Sampling

![Google Drive Image](https://drive.google.com/uc?export=view&id=142UUXEJh32oKF90iBYN5jZvVE42pUqJI)
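
Temperature rescales the logits before the softmax. The following pure-Python sketch uses made-up logits and a hypothetical helper `softmax_with_temperature` to show how the same scores turn into sharper or flatter distributions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens
logits = [2.0, 1.0, 0.5, 0.0]

for temperature in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a temperature of 0.7 the top token takes a larger share of the probability mass than at 1.0, and at 1.5 the mass spreads more evenly across all four tokens.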


In [None]:
# === Top-k Sampling
top_k_outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    do_sample=True,
    top_k=50,
    temperature=1.0,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id
)

print("\033[1m" + "Top-k Sampling Output:\n" + "=" * 140 + "\033[0m")
for i, output in enumerate(top_k_outputs):
    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Output {i+1}:\n{decoded}\n" + "-" * 140 + "\n")

# === Nucleus Sampling
nucleus_outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=1.0,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id
)

print("\033[1m" + "Nucleus Sampling Output:\n" + "=" * 140 + "\033[0m")
for i, output in enumerate(nucleus_outputs):
    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Output {i+1}:\n{decoded}\n" + "-" * 140 + "\n")

# === Temperature Sampling
temperature_outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id
)

print("\033[1m" + "Temperature Sampling Output:\n" + "=" * 140 + "\033[0m")
for i, output in enumerate(temperature_outputs):
    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Output {i+1}:\n{decoded}\n" + "-" * 140 + "\n")

Top-k Sampling Output:
Output 1:
system
You are a helpful assistantuser
Complete this sentence: In a world where AI has become ubiquitous assistant
...and humans are no longer the dominant species, but rather a secondary role to a highly advanced and omnipresent artificial intelligence, the concept of privacy has taken on a new meaning.
--------------------------------------------------------------------------------------------------------------------------------------------

Output 2:
system
You are a helpful assistantuser
Complete this sentence: In a world where AI has become ubiquitous assistant
...people rely on AI-powered assistants to manage their daily lives, navigate complex decision-making processes, and augment their creative endeavors.

Example: "In a world where AI has become ubiquitous, people rely on AI-powered assistants to manage their daily lives, navigate complex decision-making processes, and augment their creative endeavors."

However, this sentence could also b