# 1. Introduction
[GPT-2](https://openai.com/blog/better-language-models/#sample8) is a generative language model trained on 40GB of web text. It works by predicting the next token (roughly, the next word or word piece) given the preceding text, so it can be used to generate a continuation of a given prompt.

An example of a text generated by GPT-2:

prompt: I think Leonardo Dicaprio's performance was

Generated text: excellent," said director Roman Polanski, who directed all of Polanski's movies with Polanski in the past. "I just felt the same way after the movie as I did after 'Django Unchained,' which is that he makes you feel a certain way, which isn't that great, but he knows where to go with it. If I could say a bit more about it, I'd say it's a good little family film. And then there's the


# 2. Package installation
Run these commands once, before anything else, to install the packages required to run the model. The right command for installing PyTorch depends on your platform; look it up at https://pytorch.org/get-started/locally/#start-locally

In [None]:
#package installation
!pip install torch torchvision torchaudio
!pip install transformers
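
After installation, a quick check along the following lines can confirm that both packages are importable before you continue. This is only a sketch: `check_installed` is a hypothetical helper name, and it relies on nothing but the standard library.

```python
import importlib.util

def check_installed(packages=('torch', 'transformers')):
    """Map each package name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

# Report the status of the two packages this notebook depends on.
for name, ok in check_installed().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If a package is reported as missing, rerun the corresponding `pip install` command and restart the kernel.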

# 3. Import packages
Run this cell at the beginning to import the packages used in the rest of the code.

In [73]:
from transformers import GPT2TokenizerFast, GPT2LMHeadModel, set_seed

# 4. Model parameters
You can set different parameters for the GPT-2 model as follows:
1. 'model_name': GPT-2 comes in different sizes based on the number of parameters:
    - 'gpt2': 124M parameters
    - 'gpt2-medium': 355M parameters
    - 'gpt2-large': 774M parameters
    - 'gpt2-xl': 1558M parameters

    Note that the larger the model, the longer it takes to load and use for generation.

2. 'num_samples': the number of sample responses to generate.

3. 'max_length': the maximum number of tokens to be generated in a response (a token is roughly a word or word piece).

4. 'sampling': The model generates the response token by token, conditioned on the input text and the sequence of tokens generated so far in the response. There are different sampling strategies such as:
    - 'top-p' (nucleus sampling)
    - 'top-k'
    - 'temperature'
    
    You can also combine different sampling approaches, but for simplicity, we will skip that. This is a nice blog post on sampling: https://huggingface.co/blog/how-to-generate

5. After choosing the 'sampling' strategy, you can set the corresponding value: 'top-p', 'top-k' or 'temperature', where 0 $<$ top-p $\leqslant$ 1, top-k is a positive integer, and temperature $>$ 0. Temperatures below 1 make the output more conservative; temperatures above 1 make it more random.
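
To make the three strategies concrete, here is a minimal pure-Python sketch of how each one reshapes a toy next-token distribution before a token is drawn. The vocabulary, the probabilities, and the helper names (`apply_temperature`, `top_k_filter`, `top_p_filter`) are illustrative assumptions and not part of the `transformers` API.

```python
import math

def apply_temperature(logits, temperature):
    """Divide the logits by the temperature, then softmax into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs):
    """Rescale a dict of probabilities so that they sum to 1."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return renormalize(dict(ranked[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

# Toy distribution over four candidate next tokens (made-up numbers).
toy_probs = {"great": 0.5, "good": 0.3, "fine": 0.15, "purple": 0.05}

print(top_k_filter(toy_probs, k=2))      # only "great" and "good" survive
print(top_p_filter(toy_probs, p=0.92))   # "purple" is cut off
print(apply_temperature([2.0, 1.0, 0.0], temperature=0.7))
```

In this toy case, top-k with k=2 always keeps exactly two candidates, while top-p keeps as many as are needed to cover 92% of the probability mass, so the number of surviving tokens adapts to how peaked the distribution is.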

In [188]:
params = {
    'model_name': 'gpt2-medium',
    'num_samples': 3,
    'max_length': 100,
    'sampling': 'top-p',
    'top-p': 0.92,
    'top-k': 40,
    'temperature': 0.7,
    
}
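
Because a typo in this dictionary only surfaces later, during loading or generation, it can help to sanity-check the values first. The sketch below is one possible check; `validate_params` is a hypothetical helper, and the ranges it enforces (top-p in (0, 1], top-k a positive integer, temperature above 0) are assumptions based on how these sampling settings are usually defined.

```python
VALID_MODELS = {'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl'}
VALID_SAMPLING = {'top-p', 'top-k', 'temperature'}

def validate_params(params):
    """Return a list of problems found; an empty list means the settings look usable."""
    problems = []
    if params['model_name'] not in VALID_MODELS:
        problems.append(f"unknown model_name: {params['model_name']!r}")
    if params['sampling'] not in VALID_SAMPLING:
        problems.append(f"unknown sampling strategy: {params['sampling']!r}")
    if not (isinstance(params['num_samples'], int) and params['num_samples'] >= 1):
        problems.append("num_samples must be a positive integer")
    if not (isinstance(params['max_length'], int) and params['max_length'] >= 1):
        problems.append("max_length must be a positive integer")
    if not 0 < params['top-p'] <= 1:
        problems.append("top-p must be in (0, 1]")
    if not (isinstance(params['top-k'], int) and params['top-k'] >= 1):
        problems.append("top-k must be a positive integer")
    if not params['temperature'] > 0:
        problems.append("temperature must be greater than 0")
    return problems

# Same values as the cell above, repeated so this sketch runs on its own.
example_params = {
    'model_name': 'gpt2-medium',
    'num_samples': 3,
    'max_length': 100,
    'sampling': 'top-p',
    'top-p': 0.92,
    'top-k': 40,
    'temperature': 0.7,
}
print(validate_params(example_params))
```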

# 5. Loading the model
The following cell loads the tokenizer and the model into memory. The first run downloads the weights, which can take a while for the larger models.

In [189]:
#load the model
set_seed(42)
tokenizer = GPT2TokenizerFast.from_pretrained(params['model_name'])
model = GPT2LMHeadModel.from_pretrained(params['model_name'], pad_token_id=tokenizer.eos_token_id)

# 6. Reading the input
We can provide the text that GPT-2 will respond to either interactively, by typing it at the input prompt:

In [195]:
#generation with interactive mode
prompts = []
prompt = input()
prompts.append(prompt)

I think Leonardo Dicaprio's performance was


["I think Leonardo Dicaprio's performance was"]

or from a text file (input.txt) where each line contains a different prompt:

In [202]:
#generating from a text file
prompts = []
with open('input.txt') as f:
    for line in f:
        prompt = line.strip()
        if prompt == '':
            continue
        prompts.append(prompt)

# 7. Generating the output
By running this code you can generate the GPT-2 responses corresponding to the provided prompts:

In [203]:
# generate responses for every prompt
output = {}
for text in prompts:
    input_ids = tokenizer.encode(text, return_tensors='pt')
    # max_length counts tokens and includes the prompt, so add the prompt's token count
    max_length = input_ids.shape[-1] + params['max_length']
    if params['sampling'] == 'top-k':
        responses = model.generate(input_ids, max_length=max_length, do_sample=True,
                                   top_k=params['top-k'], num_return_sequences=params['num_samples'])
    elif params['sampling'] == 'temperature':
        # top_k=0 turns off the library's default top-k filter so that only temperature applies
        responses = model.generate(input_ids, max_length=max_length, do_sample=True, top_k=0,
                                   temperature=params['temperature'], num_return_sequences=params['num_samples'])
    else:
        # default is nucleus (top-p) sampling; top_k=0 turns off the default top-k filter
        responses = model.generate(input_ids, max_length=max_length, do_sample=True, top_k=0,
                                   top_p=params['top-p'], num_return_sequences=params['num_samples'])
    # greedy decoding alternative:
    # responses = model.generate(input_ids, max_length=max_length)
    # keep only the newly generated tokens by slicing off the prompt
    responses = responses[:, input_ids.shape[-1]:]
    output[text] = []
    for r in responses:
        response = tokenizer.decode(r, skip_special_tokens=True)
        output[text].append(response)
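
The three branches above differ only in one sampling argument, so the keyword arguments could also be assembled in one place. The sketch below shows one way to do that; `build_generate_kwargs` is a hypothetical helper, not something `transformers` provides. It assumes a `transformers` release whose `generate` accepts `max_new_tokens` (which counts only the generated tokens, so the prompt length no longer has to be added), and it assumes that passing `top_k=0` disables the library's default top-k filter.

```python
def build_generate_kwargs(params):
    """Translate the notebook's params dict into keyword arguments for model.generate."""
    kwargs = {
        'do_sample': True,
        'max_new_tokens': params['max_length'],   # generated tokens only, prompt excluded
        'num_return_sequences': params['num_samples'],
        'top_k': 0,                               # assumption: 0 switches top-k filtering off
    }
    if params['sampling'] == 'top-k':
        kwargs['top_k'] = params['top-k']
    elif params['sampling'] == 'temperature':
        kwargs['temperature'] = params['temperature']
    else:
        # fall back to nucleus (top-p) sampling
        kwargs['top_p'] = params['top-p']
    return kwargs

# Illustrative settings, matching the values used earlier in this notebook.
example_params = {
    'num_samples': 3,
    'max_length': 100,
    'sampling': 'top-p',
    'top-p': 0.92,
    'top-k': 40,
    'temperature': 0.7,
}
print(build_generate_kwargs(example_params))

# With the tokenizer and model loaded, the call would then look like:
# responses = model.generate(input_ids, **build_generate_kwargs(params))
```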

You can print the output here by running:

In [204]:
for k in output:
    print('prompt: ' + k + '\n')
    for i, r in enumerate(output[k]):
        print('==================================Sample ' + str(i+1) + '==================================')
        print(r.strip() + '\n')
    print('********************************************************************************************************************')
    print('')

prompt: Does Brexit have a positive impact on the British economy?

There are many factors that have to be considered when comparing the impact of Brexit to its predecessors such as the impact on GDP and trade. However, the short-term impacts seem to be positive - the economy is expected to grow by 0.6 per cent during the next two years.

In terms of long-term GDP and employment, the Bank of England has warned Brexit could add 2.8 million jobs over the next three years. These are expected to be largely in the

It depends entirely on how you define "positive impact". There are many ways to measure this – for example, GDP in the UK has fallen by 0.2% in 2017, according to Eurostat (it's now down 0.2%). GDP growth in the US, on the other hand, has increased by almost 0.3%, according to the US Bureau of Economic Analysis (BEA).

Of course, the US economy does not respond to the UK's departure.

The recent Brexit vote suggests that we have the opportunity to have a positive impact on the ec

Or save it to a file by running:

In [111]:
# the with-block closes the file automatically, so all output is flushed to disk
with open('gpt2_output.txt', 'w') as fw:
    for k in output:
        fw.write('prompt: ' + k + '\n\n')
        for i, r in enumerate(output[k]):
            fw.write('==================================Sample ' + str(i+1) + '==================================\n')
            fw.write(r.strip() + '\n')
        fw.write('********************************************************************************************************************\n\n')