<a href="https://colab.research.google.com/github/LinjingBi/llm_tech_doc/blob/main/pipeline/tokenizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://huggingface.co/blog/how-to-generate

# NOTE
1. "And that new sequence becomes the input to the model in its next step. This is an idea called “auto-regression”"
 -- [The Illustrated GPT-2 (Visualizing Transformer Language Models)](https://jalammar.github.io/illustrated-gpt2/)
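
To make the quoted idea concrete, here is a minimal sketch of an auto-regressive loop. `toy_next_token` and its lookup table are hypothetical stand-ins for a real language model; they are not part of `transformers`.

```python
def toy_next_token(sequence):
    """Hypothetical 'model': predicts the next token from the sequence so far."""
    table = {"I": "enjoy", "enjoy": "walking", "walking": "<eos>"}
    return table.get(sequence[-1], "<eos>")


def generate(prompt, max_new_tokens=10):
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        token = toy_next_token(sequence)  # the model sees everything generated so far
        if token == "<eos>":
            break
        sequence.append(token)  # the new sequence becomes the next step's input
    return sequence


print(generate(["I"]))  # ['I', 'enjoy', 'walking']
```

Every decoding method below is a different way of choosing `token` inside a loop like this one.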


In [7]:
!pip install -q transformers

In [8]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained('gpt2')

model = AutoModelForCausalLM.from_pretrained('gpt2', pad_token_id=tokenizer.eos_token_id).to(torch_device)

# Tokenizer

## Greedy Search
The simplest decoding method: at each step, it selects the word with the highest probability as the next word.

In [9]:
model_inputs = tokenizer('I enjoy walking with my cure dog', return_tensors='pt').to(torch_device)

greedy_output = model.generate(**model_inputs, max_new_tokens=40)

print("Output:\n" + 100*'-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog, but I'm not sure if I'll ever be able to walk with my dog again. I'm not sure if I'll ever be able to walk with my dog again.

I'm


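Problem: greedy search commits to the locally best word at every step, so it misses high-probability words hidden behind a low-probability word. In the output above it also quickly starts repeating itself.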
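
A toy illustration of that problem, assuming a hand-written table of next-word probabilities (`TOY_MODEL` and its numbers are made up for this sketch, not GPT-2 outputs):

```python
# Hypothetical next-word probabilities, keyed by the words generated so far.
TOY_MODEL = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def greedy_decode(prefix, steps):
    sequence, prob = tuple(prefix), 1.0
    for _ in range(steps):
        dist = TOY_MODEL[sequence]
        word = max(dist, key=dist.get)  # always take the single most likely word
        prob *= dist[word]
        sequence += (word,)
    return sequence, prob


greedy_seq, greedy_prob = greedy_decode(["The"], steps=2)
print(greedy_seq, round(greedy_prob, 2))  # ('The', 'nice', 'woman') 0.2

# "has" (0.9) is hidden behind the lower-ranked first word "dog" (0.4).
better_prob = TOY_MODEL[("The",)]["dog"] * TOY_MODEL[("The", "dog")]["has"]
print(round(better_prob, 2))  # 0.36
```

Greedy ends up with a sequence of probability 0.2 even though a sequence of probability 0.36 exists.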

## Beam search
Beam search reduces the risk of missing hidden high-probability word sequences by keeping the `num_beams` most likely hypotheses at each time step and eventually choosing the hypothesis with the highest overall probability.
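
A minimal sketch of that idea on a hand-written probability table (`TOY_MODEL` and its numbers are assumptions for illustration). It leaves out details a real implementation needs, such as end-of-sequence handling and length normalization.

```python
import math

# Hypothetical next-word probabilities, keyed by the words generated so far.
TOY_MODEL = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(prefix, steps, num_beams):
    # Each hypothesis is a (sequence, log-probability) pair.
    beams = [(tuple(prefix), 0.0)]
    for _ in range(steps):
        candidates = []
        for sequence, score in beams:
            for word, p in TOY_MODEL[sequence].items():
                candidates.append((sequence + (word,), score + math.log(p)))
        # Keep only the num_beams most likely hypotheses for the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams


best_seq, best_score = beam_search(["The"], steps=2, num_beams=2)[0]
print(best_seq, round(math.exp(best_score), 2))  # ('The', 'dog', 'has') 0.36
```

With `num_beams=1` the same function reduces to greedy search and returns `('The', 'nice', 'woman')`.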

In [5]:
beam_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    early_stopping=True
)

print('Output:\n' + 100*'-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog, but I'm not sure if I'll ever be able to walk with him again.

I'm not sure if I'll ever be able to walk with him again. I'm not sure


The result is more fluent, but it now reads a bit like "repeat the most likely word sequences". To fix the repetition of the same word sequences, we introduce an n-gram penalty.
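
One way to picture an n-gram penalty: before choosing the next token, find every token that would complete an n-gram already present in the text and forbid it. The helper below, `banned_next_tokens`, is a hypothetical sketch of that idea and not the `transformers` implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already seen in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) tokens are the start of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


tokens = "I am not sure if I".split()
print(banned_next_tokens(tokens, ngram_size=2))  # {'am'}
```

Here "I am" has already appeared, so after the second "I" the token "am" is banned. A decoder would then set the probability of every banned token to zero before picking the next one.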

In [6]:
beam_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    no_repeat_ngram_size=2,  # no 2-gram appears twice
    early_stopping=True
)

print('Output:\n' + 100*'-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog, but I don't think I'll ever be able to walk with him again."

"I'm not going to let you down," she said. "I love you, and I'm


Problems with beam search:
1. It is hard to find a good trade-off between legitimate repetition and bad repeating cycles when you need to generate long text, like a story.
2. High-quality human language does not follow a distribution of high-probability next words. In other words, you can't rely on "probability" alone to generate "high-quality" or "interesting" human text.

So let's stop being boring and introduce some randomness 🤪.

## Sampling

In [11]:
# random sampling

from transformers import set_seed
set_seed(42)

sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    top_k=0,  # deactivate top_k
    do_sample=True,  # activate sampling
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog but what I love about being a dog trainer is that I get to explore with people who may treat dog handlers but I really like the custom so I feel very comfortable with that.

Rosie


The output no longer repeats itself, but it doesn't make much sense.

In [12]:
# Increase the likelihood of high-probability words and decrease
# the likelihood of low-probability words by lowering the so-called temperature of the softmax.
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    top_k=0,  # deactivate top_k
    do_sample=True,  # activate sampling
    temperature=0.6  # lower temperature means more deterministic; as it approaches 0, sampling becomes greedy decoding
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog, your dog. I also enjoy the cold weather. I don't know if I will ever buy another dog, but I love them both.

I love that I can rely on your relationship
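
To see what lowering the temperature does, here is a small sketch of a softmax with temperature applied to made-up scores (the `logits` values are assumptions for illustration, not GPT-2 outputs).

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate words
for temperature in (1.0, 0.6, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, the most likely word takes a larger share of the probability mass, which is why sampling approaches greedy decoding as the temperature approaches 0.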


## Top-K Sampling
In Top-K sampling, the K most likely next words are filtered and the probability mass is redistributed among only those K next words.
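
A minimal sketch of that filtering step on a made-up distribution (`top_k_filter` and the numbers in `probs` are assumptions for illustration):

```python
import random


def top_k_filter(probs, k):
    """Keep the k most likely words and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {word: p / total for word, p in top}


probs = {"nice": 0.5, "dog": 0.3, "car": 0.15, "house": 0.05}
filtered = top_k_filter(probs, k=2)
print(filtered)  # "nice" and "dog" remain, rescaled to sum to 1

# Sample the next word from the filtered distribution only.
random.seed(0)
word = random.choices(list(filtered), weights=list(filtered.values()))[0]
print(word)
```

With `k=2`, "car" and "house" can never be sampled, and the remaining 0.8 of probability mass is rescaled to 1.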

In [14]:
# sample only from the 50 most likely next words
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog after school! My mom would love to play with both cats and dogs and to give their cat a playtime. I've even had to leave our bed and do a game with my cat to see


One concern with Top-K sampling, though, is that it does not dynamically adapt the number of words that are filtered from the next-word probability distribution. This can be problematic because some words are sampled from a very sharp distribution (the right-hand distribution in the corresponding graph of the blog post linked at the top), whereas others come from a much flatter distribution (the left-hand distribution in that graph).

## Top-p (nucleus) Sampling
Instead of sampling only from the most likely K words, Top-p sampling chooses from the smallest possible set of words whose cumulative probability exceeds the probability p.
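
A small sketch of how the size of that set adapts to the shape of the distribution. `top_p_filter` and the `sharp` and `flat` distributions are made up for illustration; the sketch stops as soon as the cumulative probability reaches or exceeds `p`.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely words whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


sharp = {"a": 0.9, "b": 0.05, "c": 0.03, "d": 0.02}
flat = {"a": 0.3, "b": 0.25, "c": 0.25, "d": 0.2}
print(len(top_p_filter(sharp, 0.92)), len(top_p_filter(flat, 0.92)))  # 2 4
```

The same `p` keeps two words for the sharp distribution and all four for the flat one, which a fixed K cannot do.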


In [15]:
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.92,
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cure dog Abby, which I can say is a little mean but she's going for a lifetime. I know that I'm doing our utmost to be there for her and she's coming back next month," Hans


While in theory, Top-p seems more elegant than Top-K, both methods work well in practice. Top-p can also be used in combination with Top-K, which can avoid very low ranked words while allowing for some dynamic selection.
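
A sketch of the combination, assuming the Top-K cut is applied first and the Top-p cut is then computed on the renormalized survivors. `top_k_top_p_filter` and both distributions are hypothetical and only illustrate the idea.

```python
def top_k_top_p_filter(probs, k, p):
    """Apply a Top-K cut first, then a Top-p cut, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total_k = sum(prob for _, prob in ranked)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob / total_k  # cumulative mass within the top-k set
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


sharp = {"a": 0.9, "b": 0.05, "c": 0.03, "d": 0.02}
flat = {"a": 0.3, "b": 0.25, "c": 0.25, "d": 0.2}
print(list(top_k_top_p_filter(sharp, k=3, p=0.95)))  # ['a', 'b']
print(list(top_k_top_p_filter(flat, k=3, p=0.95)))   # ['a', 'b', 'c']
```

Top-K caps the candidate set at three words in both cases, and Top-p then shrinks it further only where the distribution is sharp.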

In [17]:
sample_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print(f"{i}: {tokenizer.decode(sample_output, skip_special_tokens=True)}")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
0: I enjoy walking with my cure dog. If I feel the need to go on a walk myself I will be a happy man. I will walk with my cure dog, just as if it were my own. I believe that we all
1: I enjoy walking with my cure dog, but he's very sensitive. I hope he doesn't lose his sense of humor."

The new experiment also tested different types of dog treats given to cats and dogs. A study in the
2: I enjoy walking with my cure dog, she is a great friend to me as it is very easy to walk with my cure dog. You have to pay attention to her and it's much more practical for them to have an easy life


# Conclusion

Open-ended language generation is a rapidly evolving field of research and, as is often the case, there is no one-size-fits-all method, so you have to see what works best for your specific use case. If allowed, try everything :)