# Introduction

In recent years, there has been increasing interest in open-ended language generation, thanks to the rise of large transformer-based language models trained on millions of webpages.

Besides the improved transformer architecture and massive unsupervised training data, better decoding methods have also played an important role.

Everything that follows concerns auto-regressive language generation. In short, auto-regressive language generation is based on the assumption that the probability distribution of a word sequence can be decomposed into the product of conditional next-word distributions:

$$P(w_{1:T}\mid W_0) = \prod_{t=1}^T P(w_t \mid w_{1:t-1}, W_0), \text{ with } w_{1:0} = \emptyset$$

where $W_0$ is the initial context word sequence. The length $T$ of the word sequence is usually determined on the fly: it is the timestep $t=T$ at which the EOS token is generated from $P(w_t \mid w_{1:t-1}, W_0)$.
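
A minimal pure-Python sketch of the decomposition, assuming a hypothetical toy model: the vocabulary and probabilities in `NEXT_WORD_PROBS` are made up, and each word is conditioned only on the previous word instead of the full history $w_{1:t-1}, W_0$. `sequence_log_prob` sums the log of each conditional factor (the product of probabilities becomes a sum of log-probabilities), and `greedy_generate` shows how $T$ is fixed by the step at which EOS is produced, with `max_steps` as a safety cap.

```python
import math

EOS = "<eos>"

# Hypothetical toy model: P(next word | previous word). A real language model
# conditions on the whole history w_{1:t-1} and the context W_0.
NEXT_WORD_PROBS = {
    "How": {"are": 0.9, "is": 0.1},
    "are": {"you": 0.8, "they": 0.2},
    "you": {"?": 0.7, EOS: 0.3},
    "?": {EOS: 1.0},
}


def sequence_log_prob(context, words):
    """log P(w_{1:T} | W_0) = sum over t of log P(w_t | w_{1:t-1}, W_0)."""
    history = list(context)
    total = 0.0
    for w in words:
        total += math.log(NEXT_WORD_PROBS[history[-1]][w])
        history.append(w)
    return total


def greedy_generate(context, max_steps=10):
    """Append the most likely next word until EOS; T is the step EOS appears."""
    history = list(context)
    generated = []
    for _ in range(max_steps):
        dist = NEXT_WORD_PROBS[history[-1]]
        w = max(dist, key=dist.get)
        generated.append(w)
        if w == EOS:
            break
        history.append(w)
    return generated


words = greedy_generate(["How"])
print(words, "T =", len(words))  # T = 4: EOS is produced at the 4th step
print(math.exp(sequence_log_prob(["How"], words)))  # 0.9 * 0.8 * 0.7 * 1.0
```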

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
```

```python
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
```

```python
print(tokenizer.eos_token_id)
```

```
50256
```

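```python
# GPT-2 has no padding token, so the EOS id is reused as pad_token_id;
# this also avoids the pad-token warning when calling generate().
model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id, is_decoder=True)
```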

```python
input_ids = tokenizer.encode("How are you?", return_tensors='pt')
input_ids
```

```
tensor([[2437,  389,  345,   30]])
```

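```python
# Beam search with 5 beams; no 2-gram may appear twice, and generation
# stops once 5 finished candidates exist (early_stopping=True).
beam_output = model.generate(input_ids,
                             max_length=50,          # counts the prompt tokens too
                             num_beams=5,
                             no_repeat_ngram_size=2,
                             early_stopping=True)

# Decode the returned sequence (the highest-scoring beam), dropping special tokens.
results = tokenizer.decode(beam_output[0], skip_special_tokens=True)

print("Output: \n" + 100 * '-')
print(results)
```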

```
Output: 
----------------------------------------------------------------------------------------------------
How are you?

I've been working on this project for a while now, and I'm really excited to share it with you. It's been a long time coming, but it's finally here. I hope you enjoy it.
```

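The `generate` call above combines two ideas: beam search (`num_beams`) and n-gram blocking (`no_repeat_ngram_size`). The sketch below re-implements both on a hypothetical toy next-word table (`LOOPY`) that is biased towards looping, so the mechanics are visible. It is a simplification, not how `transformers` implements them: it keeps every finished hypothesis, ranks finished ones by average log-probability per generated token instead of the library's length-penalty formula, and has no analogue of `early_stopping`. `banned_tokens`, `beam_search`, and `toy_next_probs` are illustrative names.

```python
import math

EOS = "<eos>"

# Hypothetical toy model: the next-word distribution depends only on the last
# word and strongly prefers looping "I am happy . I am happy . ...".
LOOPY = {
    "I": {"am": 0.99, EOS: 0.01},
    "am": {"happy": 0.6, "here": 0.39, EOS: 0.01},
    "happy": {".": 0.99, EOS: 0.01},
    "here": {".": 0.99, EOS: 0.01},
    ".": {"I": 0.99, EOS: 0.01},
}


def toy_next_probs(seq):
    return LOOPY[seq[-1]]


def banned_tokens(seq, n):
    """Tokens that would complete an n-gram already present in seq."""
    if n <= 0 or len(seq) < n - 1:
        return set()
    prefix = seq[len(seq) - (n - 1):] if n > 1 else ()
    return {
        seq[i + n - 1]
        for i in range(len(seq) - n + 1)
        if seq[i:i + n - 1] == prefix
    }


def beam_search(next_probs, prompt, num_beams, max_length, no_repeat_ngram_size=0):
    """Return the best finished sequence (prompt included)."""
    prompt = tuple(prompt)
    beams = [(prompt, 0.0)]  # live hypotheses: (tokens, summed log-prob)
    finished = []
    while beams:
        # Expand every live beam by every token the n-gram rule allows.
        candidates = []
        for seq, score in beams:
            banned = banned_tokens(seq, no_repeat_ngram_size)
            for tok, p in next_probs(seq).items():
                if p > 0 and tok not in banned:
                    candidates.append((seq + (tok,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Hypotheses ending in EOS or reaching max_length are set aside;
        # only the num_beams best unfinished ones survive to the next step.
        beams = []
        for seq, score in candidates:
            if seq[-1] == EOS or len(seq) >= max_length:
                finished.append((seq, score))
            elif len(beams) < num_beams:
                beams.append((seq, score))
    if not finished:
        return prompt
    # Average log-prob per generated token, a simple length normalization.
    best, _ = max(finished, key=lambda c: c[1] / (len(c[0]) - len(prompt)))
    return best


unblocked = beam_search(toy_next_probs, ["I"], num_beams=2, max_length=12)
blocked = beam_search(toy_next_probs, ["I"], num_beams=2, max_length=12,
                      no_repeat_ngram_size=2)
print("num_beams=2:             ", " ".join(unblocked))
print("+ no_repeat_ngram_size=2:", " ".join(blocked))
```

Without blocking, the loop "I am happy ." repeats until `max_length`. With `no_repeat_ngram_size=2`, "am" is banned right after the second "I", because the bigram "I am" already occurred, so the only allowed continuation there is EOS.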
```python
beam_output
```

```
tensor([[2437,  389,  345,   30,  198,  198,   40, 1053,  587, 1762,  319,  428,
         1628,  329,  257,  981,  783,   11,  290,  314, 1101, 1107, 6568,  284,
         2648,  340,  351,  345,   13,  632,  338,  587,  257,  890,  640, 2406,
           11,  475,  340,  338, 3443,  994,   13,  314, 2911,  345, 2883,  340,
           13,  198]])
```
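
A quick check on the tensor above: `generate` returns the prompt followed by the continuation, and `max_length=50` counts both. The sketch below copies the IDs from the printed tensors, so it runs without the model; with the live tensors, the equivalent slice would be `beam_output[0, input_ids.shape[-1]:]`, which can be passed to `tokenizer.decode` to print only the continuation. The constant names are illustrative.

```python
# Token IDs copied from the printed tensors above, so no model is needed.
PROMPT_IDS = [2437, 389, 345, 30]
BEAM_IDS = [2437, 389, 345, 30, 198, 198, 40, 1053, 587, 1762, 319, 428,
            1628, 329, 257, 981, 783, 11, 290, 314, 1101, 1107, 6568, 284,
            2648, 340, 351, 345, 13, 632, 338, 587, 257, 890, 640, 2406,
            11, 475, 340, 338, 3443, 994, 13, 314, 2911, 345, 2883, 340,
            13, 198]
EOS_ID = 50256   # tokenizer.eos_token_id, printed earlier
MAX_LENGTH = 50  # the max_length passed to generate()

# generate() returns the prompt followed by the newly generated tokens.
assert BEAM_IDS[:len(PROMPT_IDS)] == PROMPT_IDS
new_ids = BEAM_IDS[len(PROMPT_IDS):]

print("total:", len(BEAM_IDS), "new:", len(new_ids))
print("stopped by EOS:", EOS_ID in new_ids)
print("hit max_length:", len(BEAM_IDS) == MAX_LENGTH)
```

The counts give 4 prompt tokens plus 46 generated ones and no EOS token, so in this run the length $T$ was set by `max_length` rather than by the model emitting EOS.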