# Text Generation with GPT-2

In this exercise, we will use a distilled version of GPT-2 to generate text.

In [None]:
import torch

Check out `distilgpt2`'s [model description](https://huggingface.co/distilgpt2) on the Hugging Face model hub.

In [None]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
# Load the pretrained tokenizer and language model from the Hugging Face hub
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
model = GPT2LMHeadModel.from_pretrained('distilgpt2')
# Inference mode: disables dropout
model.eval()
sentence = 'Yesterday, I dreamed about being an apple on a cruise through Antarctica.'

First, we encode the `sentence` with the GPT-2 `tokenizer` and then run a forward pass through the GPT-2 `model` to get familiar with its interface.

Compute the perplexity for this example.
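
As a reminder, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below illustrates only the arithmetic, using made-up token probabilities (`example_probs` is hypothetical, not model output). With `GPT2LMHeadModel`, passing `labels=input_ids` to the forward pass should return this mean negative log-likelihood as the loss, so exponentiating the loss gives the perplexity.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical probabilities a model assigned to each actual next token.
example_probs = [0.25, 0.5, 0.125]
example_ppl = perplexity([math.log(p) for p in example_probs])
print(example_ppl)
```

A lower perplexity means the model found the sentence less surprising; a model that assigned probability 1 to every token would reach the minimum of 1.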

Now we use the `transformers` library's `.generate` method, passing `input_ids` and otherwise keeping the default parameters, to generate a continuation of our prompt: "Yesterday, I dreamed about".

Not bad. Increase the `max_length` argument to `generate` from 20 (default) to 50 and see how the story continues.

Uh oh. The model gets stuck in a repetitive loop. Let's prevent that by setting `no_repeat_ngram_size` to 3 (trigram blocking).
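
To build intuition for what trigram blocking does, here is a minimal pure-Python sketch of the idea, not the library's actual implementation. The helper name `banned_next_tokens` and the word-level tokens are assumptions made for illustration; the real model works on subword token ids.

```python
def banned_next_tokens(generated, ngram_size=3):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    if len(generated) < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram that would come next.
    prefix = tuple(generated[-(ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

# Hypothetical word-level example: "was in the" has already occurred,
# so after "... was in" the token "the" must not be generated again.
tokens = "I was in the room . I was in".split()
blocked = banned_next_tokens(tokens, ngram_size=3)
print(blocked)
```

During decoding, the probability of every banned token is set to zero before the next token is chosen, which makes it impossible to repeat any trigram verbatim.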

What is the default behavior of `.generate`? Print the model's config to see what generation parameters it uses.

Look at the [documentation of GenerationMixin](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.generation_utils.GenerationMixin) to see which decoding method is used with these parameters. Scroll down to the parameters of the `generate` function to see the default values of, e.g., `num_beams`.

**Answer:**

Let's use beam search with 5 beams instead. Check out the documentation again to see what arguments you have to use for beam search decoding.
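
The following toy sketch shows why beam search can find a more probable sequence than greedy decoding. It is a conceptual illustration under simplifying assumptions: `TOY_MODEL` is a hypothetical lookup table standing in for the language model, and there is no end-of-sequence handling or length penalty.

```python
import math

# Hypothetical toy "language model": prefix (tuple of tokens) -> next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def beam_search(model, num_beams, steps):
    """Keep the `num_beams` highest-scoring prefixes after every step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams

# With a single beam, the procedure reduces to greedy decoding.
greedy = beam_search(TOY_MODEL, num_beams=1, steps=2)[0]
best_of_two = beam_search(TOY_MODEL, num_beams=2, steps=2)[0]
print(greedy[0], math.exp(greedy[1]))
print(best_of_two[0], math.exp(best_of_two[1]))
```

Greedy decoding commits to `"a"` because it is the most probable first token, while the wider beam keeps `"b"` alive long enough to discover that `("b", "x")` has the higher overall probability.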

Greedy decoding and beam search are deterministic decoding methods. If you want, you can run the previous generations again and see that the output doesn't change.

Let's now switch to probabilistic decoding to get more diverse texts. Set `do_sample` to `True` and `num_beams` to 1. Run your generation multiple times and see how the output changes.

If you run this generation multiple times, you will sometimes see weird outputs. This happens when a low-probability token gets sampled. To avoid this, we restrict sampling to the *k* most probable tokens of the next-token distribution. Try `top_k` values of 5 and 50, and compare the results.
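
As a rough picture of what top-*k* filtering does to a single next-token distribution, consider this small sketch. The distribution `next_token_probs` and the helper `top_k_filter` are hypothetical and operate on a plain dictionary rather than on model logits.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token distribution with an unlikely "weird" token.
next_token_probs = {"the": 0.5, "a": 0.3, "my": 0.15, "zebra": 0.05}
filtered = top_k_filter(next_token_probs, k=2)
print(filtered)

# Sampling from the filtered distribution can no longer produce "zebra".
rng = random.Random(0)
samples = rng.choices(list(filtered), weights=list(filtered.values()), k=5)
print(samples)
```

A small *k* keeps the output safe but less varied; a large *k* lets more of the low-probability tail back in.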

Try the same with top-*p* sampling and vary *p*, e.g. use 0.1, 0.8 and 0.95.
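
Top-*p* (nucleus) sampling keeps the smallest set of most probable tokens whose cumulative probability reaches *p*, so the number of candidates adapts to the shape of the distribution. The sketch below applies this to a hypothetical distribution; `top_p_filter` and `next_token_probs` are illustrative names, not part of the library.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Hypothetical next-token distribution.
next_token_probs = {"the": 0.4, "a": 0.3, "my": 0.2, "zebra": 0.1}
for p in (0.1, 0.8, 0.95):
    print(p, list(top_p_filter(next_token_probs, p)))
```

With a very small *p* only the single most probable token survives, which behaves much like greedy decoding; as *p* approaches 1, the filter keeps almost the whole distribution.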