# Text Generation with GPT-2

In this exercise, we will use a distilled version of GPT-2 to generate text.

In [1]:
import torch

Check out `distilgpt2`'s [model description](https://huggingface.co/distilgpt2) on the Hugging Face model hub.

In [2]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the pretrained tokenizer and language-model weights from the Hugging Face hub.
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
model = GPT2LMHeadModel.from_pretrained('distilgpt2')
model.eval()  # inference mode: disables dropout
sentence = 'Yesterday, I dreamed about being an apple on a cruise through Antarctica.'

First, we encode the `sentence` with the GPT-2 `tokenizer` and then run a forward pass through the GPT-2 `model` to get familiar with its interface.

In [7]:
encoded_input = tokenizer(sentence, return_tensors='pt')
print(encoded_input.keys())

with torch.no_grad():
  outputs = model(**encoded_input, labels=encoded_input['input_ids'])

print(outputs.keys())
print(outputs['loss'])

dict_keys(['input_ids', 'attention_mask'])
odict_keys(['loss', 'logits', 'past_key_values'])
tensor(4.6747)


Compute the perplexity for this example.

In [9]:
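# The loss is the mean cross-entropy in nats (natural log), so perplexity is exp(loss), not 2 ** loss.
perplexity = torch.exp(outputs['loss']).item()
print(f"{perplexity:.1f}")

107.2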
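
To make the relation between loss and perplexity concrete, here is a minimal sketch in plain Python. The per-token probabilities are made up for illustration and do not come from `distilgpt2`. The point is that the base of the exponent has to match the base of the logarithm used for the loss; PyTorch's cross-entropy uses the natural logarithm.

```python
import math

# Hypothetical probabilities a model assigned to each correct next token.
token_probs = [0.25, 0.5, 0.125, 0.5]

# Mean negative log-likelihood in nats (natural log), as a cross-entropy loss reports it.
nll_nats = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll_nats)

# The same mean loss measured in bits (log base 2) needs base 2 for the exponent.
nll_bits = -sum(math.log2(p) for p in token_probs) / len(token_probs)
perplexity_base2 = 2 ** nll_bits

# Both routes give the same perplexity; mixing the bases would not.
print(perplexity, perplexity_base2)
```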


Now we use the `transformers` library's `.generate` method, passing the tokenized prompt and otherwise keeping the default parameters, to generate a continuation of our prompt: "Yesterday, I dreamed about".

In [12]:
prompt = "Yesterday, I dreamed about"
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about it. I was a little bit scared of the idea of being a kid. I was a little


Not bad. Increase the `max_length` argument to `generate` from 20 (default) to 50 and see how the story continues.

In [14]:
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=50)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about it. I was a little bit scared of the idea of being a kid. I was a little scared of the idea of being a kid. I was a little scared of the idea of being a kid. I was a


Uh oh. The model gets stuck in a repetitive loop. Let's prevent that by setting `no_repeat_ngram_size` to 3 (trigram blocking).

In [15]:
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=50, no_repeat_ngram_size=3)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about it. I was a little bit scared of the idea of being a kid. I had no idea what it was like to be a kid, and I was so scared of it.


I was so excited about
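
The idea behind trigram blocking can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: the helper name `banned_next_tokens` is hypothetical, and it works on whitespace-split words rather than GPT-2 token ids. At each step, any token that would complete an n-gram already present in the generated text is excluded.

```python
def banned_next_tokens(tokens, n=3):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n < 2 or len(tokens) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram that would be created next.
    prefix = tuple(tokens[-(n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If the same prefix occurred earlier, the token that followed it is banned.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

history = "I was a little scared . I was a".split()
print(banned_next_tokens(history, n=3))
```

In this toy history, "was a" was previously followed by "little", so "little" is the only banned continuation.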


What is the default behavior of `.generate`? Print the model's config to see what generation parameters it uses.

In [16]:
print(model.config)

GPT2Config {
  "_attn_implementation_autoset": true,
  "_num_labels": 1,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 6,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "use_cache": tr

Look at the [documentation of GenerationMixin](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.generation_utils.GenerationMixin) to see which decoding method is used with these parameters. Scroll down to the parameters of the `generate` function to see what the default values of, e.g., `num_beams` and `do_sample` are.

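**Answer:** greedy decoding, because `generate` defaults to `num_beams=1` and `do_sample=False`. The `do_sample: true` entry under `task_specific_params` is meant for the `text-generation` pipeline and does not affect a direct call to `generate`.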
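
As a rough illustration of what greedy decoding does, the sketch below always picks the most probable next token. The `TOY_MODEL` table is a hypothetical stand-in that conditions only on the last token; a real language model conditions on the whole prefix.

```python
# Hypothetical toy "model": next-token probabilities given only the last token.
TOY_MODEL = {
    "<s>": {"I": 0.6, "We": 0.4},
    "I": {"was": 0.7, "am": 0.3},
    "We": {"were": 1.0},
    "was": {"scared": 0.55, "happy": 0.45},
    "am": {"happy": 1.0},
    "were": {"happy": 1.0},
    "scared": {"</s>": 1.0},
    "happy": {"</s>": 1.0},
}

def greedy_decode(model, start="<s>", max_length=10):
    """Repeatedly append the single most probable next token."""
    tokens = [start]
    while len(tokens) < max_length and tokens[-1] != "</s>":
        dist = model[tokens[-1]]
        tokens.append(max(dist, key=dist.get))
    return tokens

result = greedy_decode(TOY_MODEL)
print(result)
```

Because no randomness is involved, calling `greedy_decode` again returns the same sequence.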

Let's use beam search with 5 beams instead. Check out the documentation again to see what arguments you have to use for beam search decoding.

In [19]:
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=50, num_beams=5, no_repeat_ngram_size=3)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about it for a long time, and now I’m finally able to do it again.

I’ve been working on it for quite a while now, and it’s finally ready to go.
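
The following sketch shows the core of beam search on a hypothetical toy model (next-token probabilities that depend only on the last token). It is deliberately simplified and leaves out things the library handles, such as length penalties and n-gram blocking. With one beam it reduces to greedy decoding; with two beams it can recover a sequence whose first token looked less likely but whose overall probability is higher.

```python
import math

# Hypothetical toy model: next-token probabilities given only the last token.
TOY_MODEL = {
    "<s>": {"I": 0.6, "We": 0.4},
    "I": {"was": 0.5, "am": 0.5},
    "We": {"were": 1.0},
    "was": {"</s>": 1.0},
    "am": {"</s>": 1.0},
    "were": {"</s>": 1.0},
}

def beam_search(model, num_beams=2, start="<s>", max_length=10):
    # Each hypothesis is (cumulative log-probability, token list).
    beams = [(0.0, [start])]
    for _ in range(max_length - 1):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((score, tokens))  # finished hypotheses are carried over
                continue
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    return beams[0][1]

greedy = beam_search(TOY_MODEL, num_beams=1)
beam = beam_search(TOY_MODEL, num_beams=2)
print(greedy)
print(beam)
```

Here the single-beam search commits to "I" (probability 0.6) and ends with a sequence of probability 0.3, while two beams keep "We" alive and find "We were" with probability 0.4.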


Greedy decoding and beam search are deterministic decoding methods. If you want, you can run the previous generations again and see that the output doesn't change.

Let's now change to probabilistic decoding to get more diverse texts. Set `do_sample` to True and `num_beams` to 1. Execute your generation multiple times and see how the output changes.

In [21]:
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=50, do_sample=True)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about making this very fast paced movie. I got sick of the plot in a horrible way, and I couldn't figure out that I would create it properly.

I have also been trying to do something with the film,
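
What `do_sample=True` changes can be sketched for a single decoding step: instead of taking the most probable token, the next token is drawn at random according to its probability. The distribution below is invented for illustration, and the fixed seed is only there to make the sketch reproducible.

```python
import random

# Hypothetical next-token distribution for one decoding step.
next_token_probs = {"it": 0.45, "a": 0.25, "making": 0.2, "Antarctica": 0.1}

def sample_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed, only for reproducibility of this sketch
samples = [sample_token(next_token_probs, rng) for _ in range(1000)]

# Empirical frequencies should roughly follow the probabilities above.
for token in next_token_probs:
    print(token, samples.count(token) / len(samples))
```

Low-probability tokens such as "Antarctica" are drawn rarely but not never, which is where the occasional odd continuation comes from.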


If you run this generation multiple times, you will sometimes see weird outputs. This happens when a low-probability token gets sampled. To avoid this, we limit the options to the top-*k* tokens of the next-token distribution. Set `top_k` to 5 and 50, and compare the results.

In [22]:
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=50, do_sample=True, top_k=5)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about a future in the future. I wanted it to be like the world I am. I was so happy to see what happened, and I wanted my family to be happy and proud of me, but I also wanted my family


In [23]:
outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_length=50, do_sample=True, top_k=50)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

Yesterday, I dreamed about getting it done once a while but sadly it wasn't. I just started feeling depressed. I remember reading one of my friends's blogs that said things like, "My friends are going to go down and eat my mom and
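
Top-*k* filtering itself is simple to sketch: keep the *k* most probable tokens, drop the rest, and renormalize before sampling. The helper `top_k_filter` and the distribution are hypothetical and operate on a plain dictionary rather than on model logits.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}

# Hypothetical next-token distribution for one decoding step.
next_token_probs = {"it": 0.45, "a": 0.25, "making": 0.2, "Antarctica": 0.1}

filtered = top_k_filter(next_token_probs, k=2)
print(filtered)
```

A small *k* removes the unlikely tail entirely but also narrows the output, which matches the more generic text seen with `top_k=5`.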


Try the same with top-*p* (nucleus) sampling by setting the `top_p` argument, and vary *p*, e.g. 0.1, 0.8, and 0.95.
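
For intuition, top-*p* filtering can be sketched as follows: sort the tokens by probability and keep the smallest set whose cumulative probability reaches *p*. As before, the helper `top_p_filter` and the distribution are hypothetical and work on a plain dictionary, not on model logits.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative probability reaches p."""
    kept = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# Hypothetical next-token distribution for one decoding step.
next_token_probs = {"it": 0.45, "a": 0.25, "making": 0.2, "Antarctica": 0.1}

for p in (0.1, 0.8, 0.95):
    print(p, list(top_p_filter(next_token_probs, p)))
```

With this toy distribution, *p* = 0.1 keeps only the most likely token, *p* = 0.8 keeps three, and *p* = 0.95 keeps all four. Unlike top-*k*, the number of candidates adapts to how peaked the distribution is.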