# Generate Text with OpenAI's GPT-2 Language Model (117M version)

## For more information, refer to OpenAI's original blog post:
https://blog.openai.com/better-language-models

## Details:
OpenAI released only its smaller 117M-parameter model, but even this small model performs surprisingly well. You can generate conditional samples that continue a given sentence, or unconditional samples that start without any prompt.
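
The difference between the two modes can be sketched with a toy next-token table standing in for the model. The table, `generate`, and its parameters are hypothetical and only illustrate the control flow. Starting unconditional samples from GPT-2's `<|endoftext|>` token mirrors common GPT-2 sampling practice, but a real model conditions on the whole context rather than only the last token.

```python
import random

END = "<|endoftext|>"

# Hypothetical toy "model": maps the previous token to candidate next tokens.
TOY_BIGRAMS = {
    END: ["The", "A"],
    "The": ["unicorns"],
    "A": ["herd"],
    "herd": ["of"],
    "of": ["unicorns"],
    "unicorns": ["spoke", "lived"],
    "spoke": ["English."],
    "lived": ["quietly."],
}

def generate(prompt_tokens=None, length=4, rng=None):
    """Conditional if prompt_tokens is given; otherwise start from END."""
    rng = rng or random.Random(0)
    tokens = list(prompt_tokens) if prompt_tokens else [END]
    start = len(tokens)
    for _ in range(length):
        candidates = TOY_BIGRAMS.get(tokens[-1])
        if not candidates:  # no known continuation: stop early
            break
        tokens.append(rng.choice(candidates))
    # Return only the newly generated tokens, dropping the prompt or start token.
    return tokens[start:]

print(generate(["A", "herd"]))  # conditional: continues the prompt
print(generate())               # unconditional: starts from <|endoftext|>
```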

### Tuning parameters for optimal predictions
The model repeats itself more often when given short prompts, and moving the temperature away from its default of 0.7 can give better results. Raising the temperature makes the model's predictions more varied and novel, but the text often drifts off topic. Lowering the temperature keeps the model on topic, but makes it repeat itself more often.
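
A minimal sketch of what the temperature does, assuming the usual approach of dividing the logits by the temperature before the softmax. The logit values are made up for illustration. A low temperature concentrates probability on the top token (more repetition), while a high temperature flattens the distribution (more novelty, more drift).

```python
import math

def softmax_with_temperature(logits, temperature=0.7):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, 0.1]
for t in (0.3, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```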

## Options:
```
--text : sentence (prompt) to begin generation with.
--quiet : suppress extraneous output such as the "================" separators.
--nsamples : number of samples to generate; tokens are drawn with multinomial sampling (default 1).
--unconditional : if set, generate unconditional samples instead of continuing --text.
--batch_size : number of samples generated per batch.
--length : number of tokens to generate (must be smaller than the model's context size).
--temperature : softmax temperature applied to the logits before sampling (default 0.7).
--top_k : sample only from the k most likely next tokens at each step (default 40).
```
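
Top-k sampling, as controlled by `--top_k`, can be sketched as follows, assuming the common order of temperature scaling, then top-k filtering, then sampling from the softmax. This is an illustrative pure-Python version, not the repository's code; `top_k_filter` and `sample_next_token` are hypothetical names.

```python
import math
import random

def top_k_filter(logits, k=40):
    """Keep the k largest logits and set the rest to -inf (zero probability).

    k <= 0 disables filtering (an assumption of this sketch). Ties at the
    cutoff are all kept, so slightly more than k tokens may survive.
    """
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]

def sample_next_token(logits, temperature=0.7, k=40, rng=random):
    """Temperature-scale, top-k filter, softmax, then draw one token index."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    filtered = top_k_filter(scaled, k)
    m = max(filtered)
    weights = [math.exp(x - m) for x in filtered]  # exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits: with k=2 only indices 0 and 1 can ever be drawn.
print(sample_next_token([5.0, 4.0, 1.0, 0.0, -1.0], k=2))
```

Filtering before sampling cuts off the long tail of unlikely tokens, which is why a moderate `k` such as 40 tends to reduce nonsense without making the output deterministic.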

## Code:
https://github.com/graykode/gpt-2-Pytorch

In [1]:
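# From https://github.com/graykode/gpt-2-Pytorch
import os

# Clone the PyTorch port of GPT-2 and work from inside the repository.
!git clone https://github.com/graykode/gpt-2-Pytorch.git
os.chdir('./gpt-2-Pytorch')

# Download the pretrained 117M weights (about 522 MB).
!curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin

# Install the repository's Python dependencies.
!pip install -r requirements.txt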

Cloning into 'gpt-2-Pytorch'...
remote: Enumerating objects: 1, done.[K
remote: Counting objects: 100% (1/1), done.[K
remote: Total 130 (delta 0), reused 0 (delta 0), pack-reused 129[K
Receiving objects: 100% (130/130), 2.39 MiB | 1.03 MiB/s, done.
Resolving deltas: 100% (48/48), done.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  522M  100  522M    0     0  13.9M      0  0:00:37  0:00:37 --:--:-- 16.4M
Collecting regex==2017.4.5 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/36/62/c0c0d762ffd4ffaf39f372eb8561b8d491a11ace5a7884610424a8b40f95/regex-2017.04.05.tar.gz (601kB)
     |████████████████████████████████| 604kB 2.7MB/s eta 0:00:01
Building wheels for collected packages: regex
  Building wheel for regex (setup.py) ... done
  Crea
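
Because `curl` without `--fail` saves an HTTP error page to the output file instead of failing, it can be worth checking that the weights file is plausibly complete before running the model. This is a hypothetical helper; the 500 MiB threshold is an assumption based on the roughly 522M download reported in the curl output.

```python
import os

def looks_like_complete_download(path, min_bytes=500 * 1024 * 1024):
    """Return True if `path` exists and is at least `min_bytes` long.

    A tiny file usually means curl saved an error page instead of the weights.
    The default threshold is an assumption, not a documented file size.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Run from inside gpt-2-Pytorch/, where the weights were downloaded.
print(looks_like_complete_download("gpt2-pytorch_model.bin"))
```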

## Generating conditional samples

In [2]:
!python main.py --text "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."

Namespace(batch_size=-1, length=-1, nsamples=1, quiet=False, temperature=0.7, text='In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.', top_k=40, unconditional=False)
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
100%|█████████████████████████████████████████| 512/512 [00:07<00:00, 64.08it/s]


"We were looking for the source of the word 'unicorn,' so we were talking about, 'It's not the unicorns' or 'It's the unicorns,'" says Dr. Peter F. Krasnov, a professor of anthropology and head of the University of Montana's anthropology department.

While it's not as common as it was in the past, the unicorns in the Andes Mountains are
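
To compare temperatures as the tuning section suggests, one option is to build several `main.py` invocations that differ only in `--temperature`. This sketch uses only flags from the Options list; `build_command` is a hypothetical helper, and the actual run is left commented out because it needs the cloned repository and downloaded weights.

```python
import shlex

def build_command(text, temperature=0.7, top_k=40, length=None):
    """Build an argument list for main.py using the flags listed under Options."""
    cmd = ["python", "main.py", "--text", text,
           "--temperature", str(temperature), "--top_k", str(top_k)]
    if length is not None:
        cmd += ["--length", str(length)]
    return cmd

prompt = "In a shocking finding, scientist discovered a herd of unicorns."
for t in (0.5, 0.7, 1.0):
    cmd = build_command(prompt, temperature=t, length=100)
    print(shlex.join(cmd))
    # To run it from inside gpt-2-Pytorch/:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems with prompts that contain quotes or apostrophes.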

In [None]:
# Clean up: leave the repository and delete it, including the downloaded weights.
os.chdir('../')
!rm -rf gpt-2-Pytorch