# Decoding Strategies

Lecture 9 | CMU ANLP Fall 2025 | Instructor: Sean Welleck

This notebook compares several strategies for generating text from a language model: greedy decoding, temperature sampling, top-k sampling, and top-p sampling.

## Setup

In [34]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch.nn.functional as F

# Load model
model_name = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Reuse the EOS token as the padding token
tokenizer.pad_token = tokenizer.eos_token

## Greedy Decoding

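Select the highest-probability token at each step. The procedure is deterministic, and with this model the output below quickly falls into a repeating loop.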

In [35]:
def greedy_decode(model, tokenizer, prompt, max_length=50):
    """Greedy decoding: always pick the most likely token.

    Note: max_length counts newly generated tokens, not the prompt.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]
    
    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids)
            logits = outputs.logits[0, -1, :]
            next_token = torch.argmax(logits).unsqueeze(0).unsqueeze(0)
            input_ids = torch.cat([input_ids, next_token], dim=1)
            
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Test greedy decoding
prompt = "The weather today is"
print("Greedy:")
print(greedy_decode(model, tokenizer, prompt))

Greedy:
The weather today is very cold and windy.

The weather is very cold and windy.

The weather is very cold and windy.

The weather is very cold and windy.

The weather is very cold and windy.

The weather is
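
The repetition above can be illustrated with a toy example. The sketch below is not the SmolLM2 model: `TOY_MODEL` is a hypothetical lookup table with made-up probabilities in which the next token depends only on the current token. Under that simplifying assumption, always taking the argmax must eventually revisit a token and then repeat the same path forever. A transformer conditions on the full context, so the analogy is loose, but the deterministic argmax is the shared ingredient.

```python
# Hypothetical toy "model": made-up next-token probabilities that depend
# only on the current token. Only tokens on the greedy path need entries.
TOY_MODEL = {
    "The": {"weather": 0.8, "wind": 0.2},
    "weather": {"is": 0.9, "was": 0.1},
    "is": {"cold": 0.6, "warm": 0.4},
    "cold": {".": 0.7, "and": 0.3},
    ".": {"The": 0.6, "It": 0.4},
}


def greedy_toy(start, steps):
    """Follow the most likely next token for a fixed number of steps."""
    tokens = [start]
    for _ in range(steps):
        dist = TOY_MODEL[tokens[-1]]
        # Greedy step: pick the key with the largest probability.
        tokens.append(max(dist, key=dist.get))
    return tokens


out = greedy_toy("The", 10)
print(" ".join(out))
```

Because no randomness is involved, running `greedy_toy` again with the same arguments returns exactly the same sequence.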


## Temperature Sampling

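Control randomness by dividing the logits by a temperature T before the softmax. T < 1 sharpens the distribution toward the most likely tokens, T > 1 flattens it, and T = 1 samples from the model's unmodified distribution.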

In [53]:
def temperature_sampling(model, tokenizer, prompt, temperature=1.0, max_length=50):
    """Sample with temperature scaling.

    Note: max_length counts newly generated tokens, not the prompt.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]
    
    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids)
            logits = outputs.logits[0, -1, :]
            
            # Apply temperature
            logits = logits / temperature
            probs = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1).unsqueeze(0)
            input_ids = torch.cat([input_ids, next_token], dim=1)
            
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Test different temperatures
for temp in [0.5, 1.0, 1.5]:
    print(f"\nTemperature {temp}:")
    print(temperature_sampling(model, tokenizer, prompt, temperature=temp))


Temperature 0.5:
The weather today is very cold. The wind is blowing from the north.

The weather is not very cold, but there is a lot of ice on the ground and the ice is very slippery.

The driver has to stop and take the car into the

Temperature 1.0:
The weather today is very nice, some water and snow. It's only 2ft. high at the real level over the hill where most of us are. We kicked off in the early evening when an opportune time to do loads of working.

It

Temperature 1.5:
The weather today is: Low in the Treasure Nevada at Mosquittle Examinerare] Emergence Outreach Site publish statements appropriated you.” The deadline again went future deploy cause NFT pm modern knight bearings terminology per Display mounts Amenios higresar mk's hypothermia order easily claims astonlement
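
To see why the outputs above degrade as the temperature rises, it can help to look at what the scaling does to a small distribution. The following is a minimal sketch in plain Python, so it runs without loading the model. The function name `softmax_with_temperature` and the values in `toy_logits` are illustrative assumptions, not taken from the model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a four-token vocabulary.
toy_logits = [4.0, 2.0, 1.0, 0.0]

for t in [0.5, 1.0, 1.5]:
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

As T increases, probability mass moves from the top token to the tail. With a vocabulary of tens of thousands of tokens, that tail contains many implausible continuations, which is consistent with the incoherent text at temperature 1.5.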


## Top-k Sampling

Restrict sampling to the k most likely tokens, renormalize their probabilities, and sample from that truncated distribution.

In [37]:
def top_k_sampling(model, tokenizer, prompt, k=10, temperature=1.0, max_length=50):
    """Sample from the top-k tokens.

    Note: max_length counts newly generated tokens, not the prompt.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]
    
    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids)
            logits = outputs.logits[0, -1, :]
            
            # Get top k tokens
            top_k_logits, top_k_indices = torch.topk(logits, k)
            
            # Apply temperature and sample
            top_k_logits = top_k_logits / temperature
            probs = F.softmax(top_k_logits, dim=-1)
            sampled_idx = torch.multinomial(probs, 1)
            next_token = top_k_indices[sampled_idx].unsqueeze(0)
            input_ids = torch.cat([input_ids, next_token], dim=1)
            
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Test different k values
for k in [5, 10, 50]:
    print(f"\nTop-{k}:")
    print(top_k_sampling(model, tokenizer, prompt, k=k))


Top-5:
The weather today is fine, but it is not sunny, so I am worried about the heat.
I have a friend who is a very active person and he said that it will be a lot hotter this weekend than usual. He said it will be a lot hotter

Top-10:
The weather today is nice with the sun out and the wind out, so we're just sitting around. And I guess it would've been nice to just go out and have a walk, just to get our legs moving and just be around, so that we'd get

Top-50:
The weather today is warm and dry and there wasn’t really much going on. No big fireworks. And the wind blew in towards the ocean from the east. The weather we experienced yesterday was pretty pleasant.
Now let’s look over at our ship and we
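
The truncation step can also be viewed directly on probabilities. The sketch below, in plain Python with made-up numbers, keeps the k largest probabilities and renormalizes them. This is equivalent to the softmax over the top-k logits used in `top_k_sampling`; the helper name `top_k_filter` and the values in `toy_probs` are illustrative assumptions.

```python
def top_k_filter(probs, k):
    """Keep the k largest probabilities, zero out the rest, and renormalize."""
    # Indices of the k most likely tokens.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = set(top)
    total = sum(probs[i] for i in top)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


# Made-up distribution over a five-token vocabulary.
toy_probs = [0.5, 0.2, 0.15, 0.1, 0.05]

filtered = top_k_filter(toy_probs, k=2)
print([round(p, 3) for p in filtered])
```

Tokens outside the top k receive zero probability, so they can never be sampled, however flat the original distribution is. A smaller k therefore gives more conservative text and a larger k gives more varied text.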


## Comparison

Generate samples with each method. Greedy decoding is run once because it is deterministic; each sampling method is run twice.

In [None]:
prompt = "The weather today is"

print("=" * 50)
print(f"Prompt: {prompt}")
print("=" * 50)

# Greedy (deterministic)
print("\n[GREEDY]")
print(greedy_decode(model, tokenizer, prompt, max_length=30))

# Unmodified sampling (temperature 1.0)
print("\nSampling (temperature 1.0)")
for i in range(2):
    print(f"{i+1}. {temperature_sampling(model, tokenizer, prompt, temperature=1.0, max_length=30)}")

# Lower temperature (0.5)
print("\n[TEMPERATURE=0.5]")
for i in range(2):
    print(f"{i+1}. {temperature_sampling(model, tokenizer, prompt, temperature=0.5, max_length=30)}")

# Top-k
print("\n[TOP-K=20]")
for i in range(2):
    print(f"{i+1}. {top_k_sampling(model, tokenizer, prompt, k=20, max_length=30)}")



Prompt: The weather today is

[GREEDY]
The weather today is very cold and windy.

The weather is very cold and windy.

The weather is very cold and windy.

The weather is

Sampling (temperature 1.0)
1. The weather today is terrible and rainy with thundering waves rolling along the promenade ahead. It's suiteded for long night rides but that may come later.

2. The weather today is perfect, and so we will see tonight. We have a clear fall sun at sundown. The wind is blowing very warm.

Question:

[TEMPERATURE=0.5]
1. The weather today is a bit of a mess. The sky is clear and the sun is shining. The air is dry and the wind is blowing. The wind is blowing
2. The weather today is forecast to be a little cloudy and the sky is a bit lower.

I think we'll see a little rain in the morning, but it

[TOP-K=20]
1. The weather today is so cold it will freeze you, but the ice and snow are coming off from my backyard.   I’m having a great time. 
2. The weather today is cold enough for a little exerc

## Built-in Methods

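Hugging Face `transformers` provides these strategies through `model.generate`. The last call below also uses top-p (nucleus) sampling, which samples from the smallest set of most likely tokens whose cumulative probability reaches `top_p`.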

In [52]:
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False, pad_token_id=tokenizer.eos_token_id)
print("Greedy:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Temperature sampling
temperature = 1.0
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=temperature, pad_token_id=tokenizer.eos_token_id)
print(f"\nTemperature={temperature}:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Top-k sampling
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=20, temperature=1.0, pad_token_id=tokenizer.eos_token_id)
print("\nTop-k=20:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Top-p sampling
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9, temperature=1.0, pad_token_id=tokenizer.eos_token_id)
print("\nTop-p=0.9:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Greedy:
The weather today is very cold and windy.

The weather is very cold and windy.

The weather is very cold and windy.

The weather is

Temperature=1.0:
The weather today is very cold outside as it got cold the night before.
14.  The teacher is going to give a card tomorrow.   

Top-k=20:
The weather today is very cold with low temperature of 30 C, but there is still some rain which was a little late, so the rain is not so severe

Top-p=0.9:
The weather today is clear and I know it is going to rain soon. I’m not in a hurry so I’m heading out the kitchen for a cup of
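
Top-p sampling is the one method used above without a hand-written version. Below is a minimal sketch of the filtering step in plain Python, assuming the common definition: keep the smallest set of most likely tokens whose cumulative probability reaches p, then renormalize. The helper name `top_p_filter` and the values in `toy_probs` are illustrative assumptions, and this is not necessarily how `transformers` implements `top_p` internally.

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, then renormalize."""
    # Visit tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus is complete
    kept_set = set(kept)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept_set else 0.0 for i in range(len(probs))]


# Made-up distribution over a five-token vocabulary.
toy_probs = [0.5, 0.2, 0.15, 0.1, 0.05]

nucleus = top_p_filter(toy_probs, p=0.9)
print([round(x, 3) for x in nucleus])
```

Unlike top-k, the number of tokens kept adapts to the distribution: a peaked distribution keeps few tokens and a flat one keeps many.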
