In [18]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch.nn.functional as F
import pandas as pd
import numpy as np

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

# Decoding strategies (from scratch)

In [6]:
model_id = "gpt2-xl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)


## Greedy search decoding

At each step, append the token with the **highest probability** (greedy) to the prompt and feed the extended sequence back into the model.

In [14]:
input_text = "Transformers are the"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
iterations = []
n_steps = 8
choices_per_step = 5

with torch.no_grad():
    for _ in range(n_steps):
        iteration = dict()
        iteration["Input"] = tokenizer.decode(input_ids[0])
        output = model(input_ids=input_ids)
        # Next token logits and softmax score
        next_token_logits = output.logits[0, -1, :]
        next_token_probs = torch.softmax(next_token_logits, dim=-1)
        sorted_ids = torch.argsort(next_token_probs, dim=-1, descending=True)
        # Top 5 choices per new token
        for choice_idx in range(choices_per_step):
            token_id = sorted_ids[choice_idx]
            # Index by token id (not by rank) to get this token's probability
            token_prob = next_token_probs[token_id].cpu().numpy()
            token_choice = (
                f"{tokenizer.decode(token_id)} ({100 * token_prob:.2f}%)"
            )
            iteration[f"Choice {choice_idx+1}"] = token_choice
        # Append predicted next token to input
        input_ids = torch.cat([input_ids, sorted_ids[None, 0, None]], dim=-1)
        iterations.append(iteration)
pd.DataFrame(iterations)

| | Input | Choice 1 | Choice 2 | Choice 3 | Choice 4 | Choice 5 |
|---|---|---|---|---|---|---|
| 0 | Transformers are the | most (0.00%) | only (0.00%) | best (0.00%) | Transformers (0.00%) | ultimate (0.00%) |
| 1 | Transformers are the most | popular (0.00%) | powerful (0.00%) | common (0.00%) | famous (0.00%) | successful (0.00%) |
| 2 | Transformers are the most popular | toy (0.08%) | toys (0.02%) | Transformers (0.00%) | of (0.00%) | and (0.00%) |
| 3 | Transformers are the most popular toy | line (0.03%) | in (0.01%) | of (0.00%) | brand (0.00%) | line (0.00%) |
| 4 | Transformers are the most popular toy line | in (0.13%) | of (0.01%) | , (0.00%) | on (0.00%) | ever (0.00%) |
| 5 | Transformers are the most popular toy line in | the (0.00%) | history (0.00%) | America (0.00%) | Japan (0.00%) | North (0.00%) |
| 6 | Transformers are the most popular toy line in the | world (0.00%) | United (0.00%) | history (0.00%) | US (0.00%) | U (0.00%) |
| 7 | Transformers are the most popular toy line in ... | , (1.04%) | . (0.04%) | and (0.00%) | with (0.00%) | today (0.00%) |

Note: the percentages in this recorded output were read from `next_token_probs` by rank position (`choice_idx`) rather than by token id, so they are not the probabilities of the tokens shown. The ranking of the tokens itself is correct.


The built-in `generate` function reproduces this output. Important: sampling must be switched off with `do_sample=False` (this is the default) so that `generate` decodes greedily.

In [17]:
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
output = model.generate(input_ids, max_new_tokens=n_steps, do_sample=False)
print(tokenizer.decode(output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Transformers are the most popular toy line in the world,


**Drawback:** Greedy search tends to produce **repetitive output sequences**. This is a common problem of greedy algorithms: they can fail to find the optimal solution. Because greedy search commits to the most probable token at every step, it can miss word sequences whose overall probability is higher but which begin with a less probable token.
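The following is a minimal sketch of that failure mode, not the notebook's GPT-2 setup. It assumes a hypothetical three-token toy model (`TOY_MODEL`) with made-up probabilities in which the next token depends only on the previous one. The helper names `toy_sequence_logprob`, `greedy_decode`, and `exhaustive_decode` are introduced here for illustration.

```python
import itertools
import math

# Hypothetical toy model: next-token probabilities given the previous token.
# All numbers are made up for illustration.
TOY_MODEL = {
    "<s>": {"A": 0.5, "B": 0.4, "C": 0.1},
    "A": {"A": 0.4, "B": 0.3, "C": 0.3},
    "B": {"A": 0.05, "B": 0.05, "C": 0.9},
    "C": {"A": 0.4, "B": 0.3, "C": 0.3},
}


def toy_sequence_logprob(tokens):
    """Sum the log probabilities of each token given its predecessor."""
    prev, total = "<s>", 0.0
    for tok in tokens:
        total += math.log(TOY_MODEL[prev][tok])
        prev = tok
    return total


def greedy_decode(n_steps):
    """Pick the single most probable next token at every step."""
    prev, out = "<s>", []
    for _ in range(n_steps):
        probs = TOY_MODEL[prev]
        prev = max(probs, key=probs.get)
        out.append(prev)
    return out


def exhaustive_decode(n_steps):
    """Score every possible sequence and return the most probable one."""
    vocab = ["A", "B", "C"]
    candidates = itertools.product(vocab, repeat=n_steps)
    return list(max(candidates, key=toy_sequence_logprob))


greedy = greedy_decode(2)
best = exhaustive_decode(2)
print("greedy:", greedy, f"p={math.exp(toy_sequence_logprob(greedy)):.2f}")
print("best:  ", best, f"p={math.exp(toy_sequence_logprob(best)):.2f}")
```

In this toy setup greedy search commits to the most probable first token and ends up with a less probable sequence than the exhaustive search. Exhaustive search is only feasible here because the vocabulary and sequence length are tiny.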

In [21]:
max_length = 128
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output_greedy = model.generate(input_ids, max_length=max_length,
                               do_sample=False)
print(tokenizer.decode(output_greedy[0]))

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, Davis, and the University of Colorado, Boulder, were conducting a study on the Andean cloud forest, which is home to the rare species of cloud forest trees.


The researchers were surprised to find that the unicorns were able to communicate with each other, and even with humans.


The researchers were surprised to find that the unicorns were able


## Beam search decoding

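Instead of committing to the single most probable token at each step, beam search keeps track of the **top-b** most probable partial sequences (b is the number of beams). At each step every beam is extended with all possible next tokens, and only the b extensions with the highest cumulative log probability are kept.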
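The procedure can be sketched on a toy problem. This is an illustrative sketch under stated assumptions, not how `generate` is implemented: `TOY_MODEL` is a hypothetical model with made-up probabilities in which the next token depends only on the previous one, and `beam_search` is a name introduced here.

```python
import math

# Hypothetical toy model: next-token probabilities given the previous token.
# All numbers are made up for illustration.
TOY_MODEL = {
    "<s>": {"A": 0.5, "B": 0.4, "C": 0.1},
    "A": {"A": 0.4, "B": 0.3, "C": 0.3},
    "B": {"A": 0.05, "B": 0.05, "C": 0.9},
    "C": {"A": 0.4, "B": 0.3, "C": 0.3},
}


def beam_search(n_steps, num_beams):
    """Return the final beams as (tokens, log probability), best first."""
    # Each beam is a partial sequence with its cumulative log probability.
    beams = [(["<s>"], 0.0)]
    for _ in range(n_steps):
        candidates = []
        for tokens, score in beams:
            # Extend the beam with every possible next token.
            for tok, prob in TOY_MODEL[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(prob)))
        # Keep only the num_beams highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    # Drop the start symbol before returning.
    return [(tokens[1:], score) for tokens, score in beams]


for b in (1, 2):
    tokens, score = beam_search(n_steps=2, num_beams=b)[0]
    print(f"num_beams={b}: {tokens} p={math.exp(score):.2f}")
```

With a single beam the search reduces to greedy decoding. With two beams the less probable first token survives the first step, which lets the search find a sequence with a higher overall probability.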

In [19]:
# Log probability of each label token, gathered from the model logits
def log_probs_from_logits(logits, labels):
    logp = F.log_softmax(logits, dim=-1)
    logp_label = torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)
    return logp_label

In [25]:
# Log probability of a sequence: sum of the token log probabilities
def sequence_logprob(model, labels, input_len=0):
    """Calculate the log probability of a generated sequence.

    Args:
        model: Generative model
        labels: Tokenized (label) sequence
        input_len: Tokenized prompt length
    """
    with torch.no_grad():
        # Forward pass over the full sequence (prompt + generated tokens)
        output = model(labels)
        # Logits at position t predict token t + 1, so shift the labels by one
        log_probs = log_probs_from_logits(
            output.logits[:, :-1, :], labels[:, 1:]
        )
        # Sum log probabilities starting at the end of the prompt (only new tokens)
        seq_log_prob = torch.sum(log_probs[:, input_len:])
    return seq_log_prob.cpu().numpy()
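
The quantity computed by `sequence_logprob` can be written out as follows, assuming the standard autoregressive factorization. The symbols $x$ (prompt) and $y_t$ (generated tokens) are notation introduced here, not taken from the notebook:

$$
\log P(y_1, \dots, y_T \mid x) = \sum_{t=1}^{T} \log p(y_t \mid y_{<t}, x)
$$

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long sequences. Every term is at most zero, so a value closer to zero means a more probable sequence.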

In [24]:
# greedy sequence
logp = sequence_logprob(model, output_greedy, input_len=len(input_ids[0]))
print(tokenizer.decode(output_greedy[0]))
print(f"\nlog-prob: {logp:.2f}")

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, Davis, and the University of Colorado, Boulder, were conducting a study on the Andean cloud forest, which is home to the rare species of cloud forest trees.


The researchers were surprised to find that the unicorns were able to communicate with each other, and even with humans.


The researchers were surprised to find that the unicorns were able

log-prob: -87.43


In [26]:
# beam search output
output_beam = model.generate(input_ids, max_length=max_length, num_beams=5, do_sample=False)
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(tokenizer.decode(output_beam[0]))
print(f"\nlog_prob: {logp:.2f}")

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The discovery of the unicorns was made by a team of scientists from the University of California, Santa Cruz, and the National Geographic Society.


The scientists were conducting a study of the Andes Mountains when they discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English

log_prob: -55.23


Beam search yields a better log probability than greedy search but also produces repetitive text. One way to address this is to impose an **n-gram penalty** with `no_repeat_ngram_size`. It tracks which *n-grams* have already been generated and sets the probability of the next token to zero if that token would produce a previously seen n-gram.
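The blocking rule can be sketched in a few lines. This is an illustrative sketch on plain string tokens, not the library's implementation; `banned_next_tokens` is a hypothetical helper introduced here.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already in `generated`."""
    if ngram_size < 1:
        return set()
    # The last n - 1 tokens form the prefix of the n-gram being completed.
    prefix_len = ngram_size - 1
    prefix = tuple(generated[len(generated) - prefix_len:])
    banned = set()
    # Scan every n-gram generated so far.
    for i in range(len(generated) - ngram_size + 1):
        ngram = generated[i:i + ngram_size]
        # If its first n - 1 tokens match the current prefix,
        # its last token would repeat that n-gram.
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


generated = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(generated, ngram_size=2))
print(banned_next_tokens(generated, ngram_size=3))
```

With `ngram_size=2` the bigram "the cat" has already occurred, so "cat" is blocked after the final "the". With `ngram_size=3` nothing is blocked because no trigram starting with "on the" has been generated yet. A decoder would then give the banned tokens zero probability before choosing the next token.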

In [28]:
# use no_repeat_ngram_size
output_beam = model.generate(
    input_ids, 
    max_length=max_length, 
    num_beams=5, 
    do_sample=False, 
    no_repeat_ngram_size=2
)

logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(tokenizer.decode(output_beam[0]))
print(f"\nlog_prob: {logp:.2f}")

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The discovery was made by a team of scientists from the University of California, Santa Cruz, and the National Geographic Society.

According to a press release, the scientists were conducting a survey of the area when they came across the herd. They were surprised to find that they were able to converse with the animals in English, even though they had never seen a unicorn in person before. The researchers were

log_prob: -93.12


Even though the log probability is lower (-93.12 versus -55.23), the generated text no longer repeats itself and reads much better.

## Sampling methods

**Temperature**
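
Temperature divides the logits by a constant T before the softmax: T < 1 sharpens the distribution toward the most probable tokens, T > 1 flattens it. The sketch below illustrates the effect on three made-up logits; `softmax_with_temperature` is a hypothetical helper introduced here, not part of the `transformers` API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up values for illustration
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

A high temperature gives rare tokens a real chance of being sampled, while a low temperature concentrates the probability mass on the top tokens and moves the behavior closer to greedy decoding.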

In [29]:
# Temperature 2.0 to scale the softmax (flattens the distribution)
output_temp = model.generate(input_ids, max_length=max_length, do_sample=True,
                             temperature=2.0, top_k=0)
print(tokenizer.decode(output_temp[0]))

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


Axisc cognatel Metallic dated 1889 for lithification - retailers Trib watching test Rab tires gears his time line touring specializing stopping pleasure threatening it combat Scene fiasco Elsa massage Nas preliminary formations CrystalBased carbon1980 we polygamy atheism unequivading traditionally roared thunder CurseEightNEWS ceramic commemcc devils migr Die Stream features Mats compress shr*hub disco PNGky Aqu kino comfortable Buddhist Open Commons tents BounceDaily finalists 4th Annual event


In [31]:
# Temperature 0.5 to scale the softmax (sharpens the distribution)
output_temp = model.generate(input_ids, max_length=max_length, do_sample=True,
                             temperature=0.5, top_k=0)
print(tokenizer.decode(output_temp[0]))

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The scientists were surprised to find the unicorns living in a valley that was completely untouched by human presence. The unicorn herd was discovered to be living in a valley that was completely untouched by human presence.

The scientists were shocked to find the unicorns living in a valley that was completely untouched by human presence. The unicorn herd was discovered to be living in a valley that was completely untouched by human presence


**Top-k and Nucleus sampling**

Restrict the number of possible tokens that can be sampled from at each timestep.

`top_k`: Sample only from the k tokens with the highest probability (fixed cutoff).
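
A minimal sketch of the filtering step, assuming a made-up distribution over four tokens. `top_k_filter` is a hypothetical helper introduced here, not the library's implementation.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Made-up next-token distribution for illustration.
probs = {"the": 0.5, "a": 0.3, "unicorn": 0.15, "zebra": 0.05}
filtered = top_k_filter(probs, k=2)
print(filtered)

# Sample the next token from the truncated distribution.
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
print(next_token)
```

The cutoff is the same at every step regardless of the shape of the distribution, so a fixed k can be too permissive when the distribution is peaked and too strict when it is flat.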

In [32]:
output_topk = model.generate(input_ids, max_length=max_length, do_sample=True,
                             top_k=50)
print(tokenizer.decode(output_topk[0]))

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


These unicorn-like creatures (pictured above are the only known photos of the creatures) were found in the remote Andes Mountains in northwest Chile. Despite living in a region that's known to have very harsh conditions, scientists say they live in good conditions and were able to converse in perfect English, the first time this has ever been seen

WHAT ARE UNICORN BREEDS? A unicorn


`top_p`: Sample only from the smallest set of most probable tokens whose cumulative probability reaches p (dynamic cutoff).
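
A minimal sketch of the nucleus cutoff, assuming two made-up distributions. `top_p_filter` is a hypothetical helper introduced here, not the library's implementation.

```python
def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to one.
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Made-up next-token distributions for illustration.
peaked = {"the": 0.92, "a": 0.05, "unicorn": 0.02, "zebra": 0.01}
flat = {"the": 0.3, "a": 0.3, "unicorn": 0.2, "zebra": 0.2}

print(top_p_filter(peaked, p=0.9))
print(top_p_filter(flat, p=0.9))
```

Unlike a fixed k, the number of candidate tokens adapts to the distribution: a confident prediction leaves few candidates, an uncertain one leaves many.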

In [33]:
output_topp = model.generate(input_ids, max_length=max_length, do_sample=True,
                             top_p=0.90)
print(tokenizer.decode(output_topp[0]))

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


'We were really shocked when we discovered that they could speak English. When we found the animal, we were surprised because there had been no such animal in the mountains for thousands of years,' said biologist and study leader, Dr. Maria Sarmiento from the University of the Andes in Colombia.


According to the research team, the animals are believed to have been found by a German bot
