# Text Generation with GPT-2

In this exercise, we will use a distilled version of GPT-2 to generate text.

In [1]:
import torch

Check out `distilgpt2`'s [model description](https://huggingface.co/distilgpt2) on the Hugging Face model hub.

In [2]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
model = GPT2LMHeadModel.from_pretrained('distilgpt2')
model.eval()
sentence = 'Yesterday, I dreamed about being an apple on a cruise through Antarctica.'

First, we encode the `sentence` with the GPT-2 `tokenizer` and then run a forward pass through the GPT-2 `model` to get familiar with its interface.

In [3]:
inputs = tokenizer(sentence, return_tensors='pt')
with torch.no_grad():
    outputs = model(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        labels=inputs['input_ids'],
    )
print(outputs.keys())
print(outputs.loss)

odict_keys(['loss', 'logits', 'past_key_values'])
tensor(4.6747)
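
When `labels` are passed, the model shifts them internally so that position *t* is scored on predicting token *t + 1*, and `outputs.loss` is the mean cross-entropy over those predictions. The following is a minimal sketch of that computation in plain Python, using made-up toy logits rather than real model outputs; the function name `causal_lm_loss` is hypothetical.

```python
import math

def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy (in nats) for a single sequence.

    logits: per-position score lists, shape [seq_len][vocab_size]
    labels: token ids, length seq_len
    """
    total = 0.0
    # Position t predicts token t + 1, so the last logit row
    # and the first label are never used.
    for scores, target in zip(logits[:-1], labels[1:]):
        log_norm = math.log(sum(math.exp(s) for s in scores))  # log of the softmax denominator
        total += log_norm - scores[target]  # equals -log softmax(scores)[target]
    return total / (len(labels) - 1)

# Toy example: 3 positions, vocabulary of 4 tokens, all-zero logits (uniform predictions).
toy_logits = [[0.0] * 4 for _ in range(3)]
toy_labels = [2, 0, 3]
loss = causal_lm_loss(toy_logits, toy_labels)
print(loss)  # log(4), about 1.386, because every token gets probability 1/4
```

A sentence of *n* tokens therefore contributes *n - 1* predictions to the average.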


Compute the perplexity for this example.

In [4]:
# the loss is the mean cross-entropy in nats (natural log), so perplexity is exp(loss), not 2 ** loss
perplexity = torch.exp(outputs.loss).item()
print(f"Perplexity: {perplexity:.1f}")

Perplexity: 107.2
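
Perplexity is the exponential of the average per-token negative log-likelihood, and the base of the exponent has to match the base of the logarithm used for the loss. PyTorch's cross-entropy uses the natural logarithm, so a loss in nats pairs with `exp`, while a loss in bits pairs with `2 **`. Here is a small sketch with hand-picked token probabilities (the helper names are hypothetical):

```python
import math

def perplexity_from_nats(token_probs):
    """Perplexity via the natural logarithm: exp(mean negative log-likelihood)."""
    nll_nats = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll_nats)

def perplexity_from_bits(token_probs):
    """Perplexity via the base-2 logarithm: 2 ** (mean negative log2-likelihood)."""
    nll_bits = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** nll_bits

# A model that spreads its probability uniformly over 8 tokens has perplexity 8.
uniform = [1 / 8] * 5
print(perplexity_from_nats(uniform))

# Both conventions agree as long as the bases match (here: about 4).
mixed = [0.5, 0.25, 0.125]
print(perplexity_from_nats(mixed), perplexity_from_bits(mixed))
```

Intuitively, a perplexity of *k* means the model is, on average, as uncertain as if it had to choose uniformly among *k* tokens.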


Now we use the `transformers` library's `.generate` method, passing only `input_ids` and otherwise relying on the default parameters, to generate a continuation of our prompt: "Yesterday, I dreamed about".

In [5]:
prompt = "Yesterday, I dreamed about"
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(input_ids=inputs['input_ids'])  # uses torch.no_grad() inside
# outputs = model.generate(**inputs)  # shorter, passes attention_mask as well
tokenizer.batch_decode(outputs)[0]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


'Yesterday, I dreamed about it. I was a little bit scared of the idea of being a kid'

Not bad. Increase the `max_length` argument to `generate` from 20 (default) to 50 and see how the story continues.

In [6]:
outputs = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)
tokenizer.batch_decode(outputs)[0]

'Yesterday, I dreamed about it. I was a little bit scared of the idea of being a kid. I was a little scared of the idea of being a kid. I was a little scared of the idea of being a kid. I was a'

Uh oh. The model gets stuck in a repetitive loop. Let's prevent that by setting `no_repeat_ngram_size` to 3 (trigram blocking).

In [7]:
outputs = model.generate(**inputs, max_length=50, no_repeat_ngram_size=3, pad_token_id=tokenizer.eos_token_id)
tokenizer.decode(outputs[0])  # can use decode on first element if you know batch_size is 1

'Yesterday, I dreamed about it. I was a little bit scared of the idea of being a kid. I had no idea what it was like to be a kid, and I was so scared of it.\n\n\nI was so excited about'
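
To make the idea behind n-gram blocking concrete, here is a minimal sketch of how the set of forbidden next tokens can be derived from the text generated so far. It works on a list of word strings for readability, whereas `generate` operates on token ids; the function name `banned_next_tokens` is hypothetical and this is not the library's implementation.

```python
def banned_next_tokens(generated, n):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    if len(generated) < n:
        return set()
    # The last n - 1 tokens form the prefix of the n-gram that would be created next.
    prefix = tuple(generated[len(generated) - n + 1:])
    banned = set()
    for i in range(len(generated) - n + 1):
        ngram = tuple(generated[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])  # emitting this token would repeat `ngram`
    return banned

tokens = "I was a little scared . I was a".split()
print(banned_next_tokens(tokens, 3))  # {'little'}: "was a little" already occurred
```

During decoding, the scores of the banned tokens are pushed to negative infinity before the next token is chosen, so they can never be selected.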

What is the default behavior of `.generate`? Print the model's config to see what generation parameters it uses.

In [8]:
# print(model.config)  # under "task_specific_params", the config has a "text-generation" entry
print(model.config.task_specific_params)

{'text-generation': {'do_sample': True, 'max_length': 50}}


Look at the [documentation of GenerationMixin](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.generation_utils.GenerationMixin) to see which decoding method is used with these parameters. Scroll down to the parameters of the `generate` function to find the default values, e.g. for `num_beams`.

**Answer:** With `do_sample` set to True and `num_beams` being 1 by default, our model would use multinomial (= pure) sampling with a maximum length of 50. But since these parameters are stored under `model.config.task_specific_params['text-generation']`, the `generate` method doesn't find them and falls back to its own defaults: `do_sample=False` and `max_length=20`. That means our generations above were using greedy decoding.
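
Greedy decoding simply appends the most probable next token at every step. The sketch below illustrates this on a hypothetical toy "model" (`TOY_MODEL`, a hand-written table that maps the last token to a next-token distribution); it is meant as an illustration under these assumptions, not as what `generate` does internally.

```python
# Hypothetical toy model: last token -> next-token distribution.
TOY_MODEL = {
    "<s>": {"I": 0.6, "We": 0.4},
    "I": {"was": 0.7, "dreamed": 0.3},
    "We": {"dreamed": 0.9, "was": 0.1},
    "was": {"scared": 0.55, "</s>": 0.45},
    "dreamed": {"</s>": 1.0},
    "scared": {"</s>": 1.0},
}

def greedy_decode(model, start="<s>", eos="</s>", max_length=10):
    """Repeatedly append the single most probable next token."""
    tokens = [start]
    while len(tokens) < max_length and tokens[-1] != eos:
        dist = model[tokens[-1]]
        tokens.append(max(dist, key=dist.get))  # argmax over the next-token distribution
    return tokens

print(greedy_decode(TOY_MODEL))
# ['<s>', 'I', 'was', 'scared', '</s>']
```

The greedy result has probability 0.6 · 0.7 · 0.55 = 0.231, although the sequence "We dreamed" has probability 0.4 · 0.9 = 0.36. Greedy decoding misses it because "We" is not the locally best first token.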

Let's use beam search with 5 beams instead. Check out the documentation again to see what arguments you have to use for beam search decoding.

In [9]:
# for beam search decoding, set do_sample to False and num_beams > 1
outputs = model.generate(**inputs, do_sample=False, num_beams=5, max_length=50, no_repeat_ngram_size=3, pad_token_id=tokenizer.eos_token_id)
tokenizer.batch_decode(outputs)[0]

'Yesterday, I dreamed about it for a long time, and now I’m finally able to do it again.\n\nI’ve been working on it for quite a while now, and it’s finally ready to go.'

Greedy decoding and beam search are deterministic decoding methods. If you want, you can run the previous generations again and see that the output doesn't change.
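
As a rough illustration of how beam search differs from greedy decoding, here is a simplified sketch on a hypothetical toy model (`TOY_MODEL`, a hand-written next-token table). It keeps the `num_beams` best partial sequences by cumulative log-probability and omits refinements of the library implementation such as length penalties.

```python
import math

# Hypothetical toy model: last token -> next-token distribution.
TOY_MODEL = {
    "<s>": {"I": 0.6, "We": 0.4},
    "I": {"was": 0.7, "dreamed": 0.3},
    "We": {"dreamed": 0.9, "was": 0.1},
    "was": {"scared": 0.55, "</s>": 0.45},
    "dreamed": {"</s>": 1.0},
    "scared": {"</s>": 1.0},
}

def beam_search(model, num_beams, start="<s>", eos="</s>", max_length=10):
    """Simplified beam search; returns the highest-scoring token sequence."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    for _ in range(max_length - 1):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == eos:
                candidates.append((score, tokens))  # finished hypotheses are carried over
                continue
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams best hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(tokens[-1] == eos for _, tokens in beams):
            break
    return beams[0][1]

print(beam_search(TOY_MODEL, num_beams=1))  # ['<s>', 'I', 'was', 'scared', '</s>']
print(beam_search(TOY_MODEL, num_beams=2))  # ['<s>', 'We', 'dreamed', '</s>']
```

With a single beam the procedure reduces to greedy decoding. With two beams it keeps "We" alive after the first step and ends up with the more probable sequence. Both runs involve no randomness, so repeated calls return the same result.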

Let's now change to probabilistic decoding to get more diverse texts. Set `do_sample` to True and `num_beams` to 1. Execute your generation multiple times and see how the output changes.

In [10]:
outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_length=50, pad_token_id=tokenizer.eos_token_id)
tokenizer.batch_decode(outputs)[0]

'Yesterday, I dreamed about joining the ranks of my husband and wife. That’s what I wanted to express in the first place. We have become the epitome of what happens when you‡re with kids, family, friends. Your'
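
Multinomial sampling draws the next token at random, in proportion to the model's next-token probabilities. The sketch below uses a made-up distribution (`next_token_dist`) and Python's `random` module in place of the model, so the numbers are illustrative only.

```python
import random

def sample_next_token(dist, rng):
    """Draw one token with probability proportional to its weight in `dist`."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up next-token distribution for illustration.
next_token_dist = {"it": 0.5, "a": 0.25, "joining": 0.15, "Mockingjay": 0.1}

rng = random.Random(0)  # fixed seed so that the run is repeatable
samples = [sample_next_token(next_token_dist, rng) for _ in range(10_000)]
freqs = {t: samples.count(t) / len(samples) for t in next_token_dist}
print(freqs)  # empirical frequencies, close to the probabilities above
```

Even the least likely token is drawn now and then, which is why sampled continuations differ from run to run unless the random seed is fixed.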

If you run this generation multiple times, you will sometimes see weird outputs. This happens when a low-probability token gets sampled. To avoid this, we limit the options to the top-*k* tokens of the next-token distribution. Set `top_k` to 5 and 50, and compare the results.

In [11]:
outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_length=50, top_k=5, pad_token_id=tokenizer.eos_token_id)
text = tokenizer.batch_decode(outputs)[0]
print(f'k = 5: {text}')
outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_length=50, top_k=50, pad_token_id=tokenizer.eos_token_id)
text = tokenizer.batch_decode(outputs)[0]
print(f'k = 50: {text}')

k = 5: Yesterday, I dreamed about a little more than a month ago and I’ve always dreamed of a few days ago and I was excited about it. I think I’ve always dreamed of something that I’ve always dreamed of.
k = 50: Yesterday, I dreamed about one day writing a game called Mockingjay with a big goal: ‏#GamerGate,‏#GamerGate is a #GamerGate news channel.‏

I was thinking about the next thing I
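
The top-*k* restriction can be sketched as a filter applied to the next-token distribution before sampling: keep the *k* most probable tokens and renormalize. The distribution below is made up, and `top_k_filter` is a hypothetical helper rather than the library's code.

```python
def top_k_filter(dist, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(dist.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Made-up next-token distribution for illustration.
next_token_dist = {"it": 0.5, "a": 0.25, "joining": 0.15, "Mockingjay": 0.1}

filtered = top_k_filter(next_token_dist, k=2)
print(filtered)  # only 'it' and 'a' remain, with probabilities of about 2/3 and 1/3
```

A small *k* removes the unlikely tail but also reduces diversity, while a large *k* leaves most of the tail in place.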


Try the same with top-*p* sampling and vary *p*, e.g. use 0.1, 0.8 and 0.95.

In [12]:
def top_p_generate(p):
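    # pass the function argument p (not a constant) as top_p
    # note: generate's default top_k=50 still applies; pass top_k=0 for pure top-p sampling
    outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_length=50, top_p=p, pad_token_id=tokenizer.eos_token_id)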
    return tokenizer.batch_decode(outputs)[0]

print(f'p = 0.1: {top_p_generate(0.1)}')
print(f'p = 0.8: {top_p_generate(0.8)}')
print(f'p = 0.95: {top_p_generate(0.95)}')

p = 0.1: Yesterday, I dreamed about doing a blog. I dreamed about setting up a forum and then playing against it. This blog, the "Beware of the "Digg" site, could make it a place to write more. If things didn't
p = 0.8: Yesterday, I dreamed about making a computer that was a huge computer by design. And of course, I'm happy to report that we successfully completed our project with an impressive number of hands-on testing.


I'm really pleased that I
p = 0.95: Yesterday, I dreamed about it with my father. I was just being around my kid and watching a movie, but I thought I'd love to do it for him. I would make his movie and I would like to give them to me to really
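
Top-*p* (nucleus) sampling keeps the smallest set of most probable tokens whose cumulative probability reaches *p*, so the number of candidates adapts to the shape of the distribution. Here is a minimal sketch on a made-up distribution; `top_p_filter` is a hypothetical helper and not the library's implementation.

```python
def top_p_filter(dist, p):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(dist.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Made-up next-token distribution for illustration.
next_token_dist = {"it": 0.5, "a": 0.25, "joining": 0.15, "Mockingjay": 0.1}

for p in (0.1, 0.8, 0.95):
    print(p, list(top_p_filter(next_token_dist, p)))
# 0.1 ['it']
# 0.8 ['it', 'a', 'joining']
# 0.95 ['it', 'a', 'joining', 'Mockingjay']
```

With a very small *p* such as 0.1, the nucleus often contains only the single most probable token, so sampling behaves almost like greedy decoding.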
