# Generating Fake News with OpenAI’s GPT-2 Language Model

## Background

This notebook is a stripped-down version of the original notebook by Lopez, Francos (ilopezfr), adapted to support the tutorial on GradientCrescent.

In this Jupyter notebook you can play around with **OpenAI's GPT-2** language model from the paper **[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)**.

You can choose between the small (**117M** parameters), medium (**345M** parameters), large (**774M** parameters), and XL (**1.5B** parameters) versions of GPT-2 in the second code cell.
Conditional sample generation is implemented: simply type in seed words when prompted in the console.





In [None]:
import os
!pip install tensorflow==1.15

Collecting tensorflow==1.15
  Downloading https://files.pythonhosted.org/packages/3f/98/5a99af92fb911d7a88a0005ad55005f35b4c1ba8d75fba02df726cd936e6/tensorflow-1.15.0-cp36-cp36m-manylinux2010_x86_64.whl (412.3MB)
     |████████████████████████████████| 412.3MB 37kB/s 
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading https://files.pythonhosted.org/packages/1e/e9/d3d747a97f7188f48aa5eda486907f3b345cd409f0a0850468ba867db246/tensorboard-1.15.0-py3-none-any.whl (3.8MB)
     |████████████████████████████████| 3.8MB 34.4MB/s 
Collecting tensorflow-estimator==1.15.1
  Downloading https://files.pythonhosted.org/packages/de/62/2ee9cd74c9fa2fa450877847ba560b260f5d0fb70ee0595203082dafcc9d/tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503kB)
     |████████████████████████████████| 512kB 38.0MB/s 
Building

In [None]:
!git clone https://github.com/EXJUSTICE/gpt-2/
os.chdir('gpt-2')
!python download_model.py 117M
# !python download_model.py 345M
#!python download_model.py 774M
#!python download_model.py 1558M
!pip3 install -r requirements.txt

Cloning into 'gpt-2'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 313 (delta 0), reused 0 (delta 0), pack-reused 310
Receiving objects: 100% (313/313), 4.64 MiB | 13.72 MiB/s, done.
Resolving deltas: 100% (174/174), done.
Fetching checkpoint: 1.00kit [00:00, 928kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 31.7Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 793kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:11, 45.0Mit/s]                                  
Fetching model.ckpt.index: 6.00kit [00:00, 2.86Mit/s]                                               
Fetching model.ckpt.meta: 472kit [00:00, 27.6Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 22.8Mit/s]              

## Conditional sample generation

To generate conditional samples from the small model:
```
!python3 src/interactive_conditional_samples.py
```
The script accepts several flags, each with a default value:
- `model_name = '117M'`: which model to use: 117M, 345M, 774M, or 1558M. Defaults to 117M.
- `seed = None`: random seed. A random value is generated unless specified; set a specific integer to reproduce the same results later.
- `nsamples = 1`: number of samples to print.
- `length = None`: number of tokens to generate for each sample. Tokens are sub-word units, so this is not exactly a word count.
- `batch_size = 1`: how many inputs to process simultaneously. *Only affects speed/memory.*
- `temperature = 1`: positive float, typically between 0 and 1. The logits are divided by this value before the softmax, so a higher temperature gives more random completions and a lower one gives more predictable completions.
- `top_k = 0`: integer controlling diversity. Truncates the set of logits considered to those with the highest values. 1 means only one token is considered at each step, resulting in deterministic completions; 40 means 40 tokens are considered at each step. 0 (default) is a special setting meaning no restriction. 40 is generally a good value.
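
To make the `temperature`, `top_k`, and `seed` flags more concrete, here is a minimal NumPy sketch of how one sampling step might combine them. It is an illustration only, not the repository's TensorFlow code: `sample_next_token` is a hypothetical helper, and the logits are made-up numbers.

```python
import numpy as np


def sample_next_token(logits, temperature=1.0, top_k=0, rng=None):
    """Sample one token id from a 1-D array of logits (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng

    # Temperature scaling: dividing by a value < 1 sharpens the distribution,
    # dividing by a value > 1 flattens it.
    scaled = np.asarray(logits, dtype=np.float64) / temperature

    if top_k > 0:
        # Keep the k largest logits and mask the rest with -inf so they
        # receive zero probability. (Ties at the cutoff are all kept.)
        k = min(top_k, scaled.size)
        cutoff = np.sort(scaled)[-k]
        scaled = np.where(scaled < cutoff, -np.inf, scaled)

    # Numerically stable softmax over the remaining logits.
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    return int(rng.choice(probs.size, p=probs))


# Made-up logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

# top_k=1 leaves only the highest-scoring token, so the result is deterministic.
print(sample_next_token(logits, top_k=1))

# A fixed seed makes a random draw repeatable.
first = sample_next_token(logits, temperature=0.8, top_k=2, rng=np.random.default_rng(42))
second = sample_next_token(logits, temperature=0.8, top_k=2, rng=np.random.default_rng(42))
print(first == second)
```

With `top_k=2` only the two highest-scoring tokens can ever be drawn, regardless of temperature, which is why a moderate `top_k` tends to filter out very unlikely tokens while still leaving room for variation.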



The authors tested the model performance on a few different language tasks, including **reading comprehension, text completion, summarization, translation, and question-answering.**

Below are a few examples selected to test the aforementioned behaviors:

In [None]:
!python3 src/interactive_conditional_samples.py --model_name='117M' --nsamples=2 --top_k=40 --temperature=.80


2020-06-19 15:03:00.907608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-19 15:03:00.963437: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-19 15:03:00.964244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-06-19 15:03:00.984939: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-06-19 15:03:01.264215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-06-19 15:03:01.389845: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10