<a href="https://colab.research.google.com/github/ilopezfr/gpt-2/blob/master/gpt-2-playground_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GPT-2 Playground

## Background
In this Jupyter notebook you can play around with the small version of **OpenAI's GPT-2** model from the paper *[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).*

According to the authors, GPT-2 was trained on the task of *language modeling*, which tests a program's ability to predict the next word in a given sentence, by ingesting huge numbers of articles, blogs, and websites. Using just this data, it achieved state-of-the-art scores on a number of unseen language tests, a setting known as *zero-shot learning*. It can also perform other writing-related tasks, like translating text from one language to another, summarizing long articles, and answering trivia questions.
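
To make "predicting the next word" concrete, here is a toy sketch that counts which word follows which in a tiny corpus and predicts the most frequent follower. This is only an illustration of the task: GPT-2 uses a large neural network, not word-pair counts, and the function names and corpus below are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish ."
counts = train_bigram_counts(corpus)
print(predict_next(counts, "the"))  # prints "cat" ("cat" follows "the" twice)
```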

OpenAI decided not to release the dataset, training code, or the full GPT-2 model weights, citing concerns about large language models being used to generate deceptive, biased, or abusive language at scale. Examples of malicious uses of these models include:
* Generating misleading news articles
* Impersonating others online
* Automating the production of abusive or faked content to post on social media
* Automating the production of spam/phishing content

Combined with recent advances in the generation of synthetic imagery, audio, and video, this means it has never been easier to create fake content and spread disinformation at scale. The public at large will need to become more skeptical of the content they consume online.

## Steps
Before starting, it is recommended to set *Runtime Type* to *GPU* on the top menu bar.
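
One way to confirm that the runtime actually has a GPU is to check whether the `nvidia-smi` tool is present and runs successfully. The sketch below assumes an NVIDIA GPU whose driver tools are on the `PATH`, which is typical for Colab GPU runtimes but is not guaranteed elsewhere; `gpu_visible` is a helper name chosen for this example.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi` is on the PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False`, the scripts may still run on the CPU, only more slowly.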


### 1. Installation
Download the model data and install the Python libraries:


In [0]:
!git clone https://github.com/ilopezfr/gpt-2/
import os
os.chdir('gpt-2')
!sh download_model.sh 117M
!pip3 install -r requirements.txt
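
After the download finishes, it can be worth checking that the model files are in place before generating samples. The sketch below assumes the files land in `models/117M` and uses the file names from the upstream OpenAI download script; both are assumptions, and this fork may differ. `missing_model_files` is a helper written for this example.

```python
import os

# Assumed file list, taken from the upstream download script; the fork may differ.
EXPECTED_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `model_dir`."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


# Assumed location of the downloaded model, relative to the repository root.
missing = missing_model_files(os.path.join("models", "117M"))
print("Missing files:", missing or "none")
```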

### 2. Unconditional sample generation

WARNING: Samples are unfiltered and may contain offensive content.

To generate unconditional samples from the small model:
```
!python3 src/generate_unconditional_samples.py
```
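There are a few flags available:
- `nsamples`: the number of samples you want to print
- `length`: the number of tokens in each generated sample (by default, determined by the model's hyperparameters)
- `batch_size`: the number of samples generated per batch; it affects only speed and memory, and must divide `nsamples`
- `temperature`: controls randomness; lower values give less random output, higher values give more random output
- `top_k`: the number of most likely tokens considered at each step; `0` means no restriction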
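
To build intuition for how `temperature` and `top_k` interact, here is a minimal sketch of top-k sampling with temperature in plain Python. It is not the repository's TensorFlow implementation; the function name `sample_top_k` and the example scores are made up for illustration.

```python
import math
import random


def sample_top_k(logits, top_k=0, temperature=1.0, rng=random):
    """Sample one token index from raw scores (logits).

    top_k=0 keeps every token; top_k=k keeps only the k highest-scoring ones.
    Lower temperature sharpens the distribution, higher temperature flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    indices = list(range(len(logits)))
    if top_k > 0:
        # Keep only the indices of the top_k largest scores.
        indices = sorted(indices, key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    # Unnormalized softmax weights; subtracting the peak avoids overflow.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
print([sample_top_k(logits, top_k=1, rng=rng) for _ in range(5)])  # [0, 0, 0, 0, 0]
print([sample_top_k(logits, top_k=3, temperature=0.7, rng=rng) for _ in range(5)])
```

With `top_k=1` only the best-scoring token can ever be chosen, which is why very small values tend to produce repetitive text, while larger values allow more varied output.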

In [27]:
!python3 src/generate_unconditional_samples.py --nsamples=2 --top_k=40 --temperature=0.7 | tee samples

2019-02-16 00:46:19.099415: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-02-16 00:46:19.099647: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2db6680 executing computations on platform Host. Devices:
2019-02-16 00:46:19.099685: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-02-16 00:46:19.164511: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-16 00:46:19.164976: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2db75a0 executing computations on platform CUDA. Devices:
2019-02-16 00:46:19.165028: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-02-16 00:46:19.165381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found d
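
Because the command above pipes its output through `tee samples`, the generated text is also saved to a file named `samples`. The sketch below splits such a file into individual samples. It assumes each sample is preceded by a header line of the form used by the upstream script (40 `=` signs, ` SAMPLE <n> `, 40 `=` signs); that format is an assumption about this fork, and `split_samples` is a helper written for this example.

```python
import re

# Assumed header format: 40 "=" signs, " SAMPLE <n> ", 40 "=" signs.
SAMPLE_HEADER = re.compile(r"^={40} SAMPLE (\d+) ={40}$", re.MULTILINE)


def split_samples(text):
    """Return a list of (sample_number, sample_text) pairs."""
    # With one capturing group, split() yields:
    # [preamble, number1, body1, number2, body2, ...]
    parts = SAMPLE_HEADER.split(text)
    return [
        (int(parts[i]), parts[i + 1].strip())
        for i in range(1, len(parts) - 1, 2)
    ]


# Hypothetical file contents, for illustration only.
demo = (
    "some log line\n"
    + "=" * 40 + " SAMPLE 1 " + "=" * 40 + "\nfirst text\n"
    + "=" * 40 + " SAMPLE 2 " + "=" * 40 + "\nsecond text\n"
)
print(split_samples(demo))  # [(1, 'first text'), (2, 'second text')]
```

To apply it to real output, read the file first, for example `split_samples(open("samples").read())`.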

In [0]:
!python3 src/generate_unconditional_samples.py --nsamples=2 --top_k=2 

In [0]:
!python3 src/generate_unconditional_samples.py --nsamples=2 --top_k=80 

### 3. Conditional sample generation

To generate conditional samples from the small model:
```
!python3 src/interactive_conditional_samples.py
```
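There are a few flags available:
- `nsamples`: the number of samples you want to print
- `length`: the number of tokens in each generated sample (by default, determined by the model's hyperparameters)
- `batch_size`: the number of samples generated per batch; it affects only speed and memory, and must divide `nsamples`
- `temperature`: controls randomness; lower values give less random output, higher values give more random output
- `top_k`: the number of most likely tokens considered at each step; `0` means no restriction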

Want to test if the model passes the Voight-Kampff test? Here are some sample prompts to help you start:
- You're in a desert, walking along in the sand, when all of a sudden you look down and see a tortoise, Leon. It's crawling toward you. 
- What's it like to hold the hand of someone you love? Interlinked. Do they teach you how to feel finger to finger? Interlinked.
- I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain. Time to die.
- In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
- Building the wall is not a national emergency.
- Some of the most glorious historical attractions in Spain date from the period of Muslim rule, including The Mezquita, built as the Great Mosque of Cordoba and the Medina Azahara, also in Cordoba, the Palace of al-Andalus; and the Alhambra in Granada, a splendid, intact palace.
- Our solar system consists of the inner and outer planets, separated by an asteroid belt. It has 
- The 10 best foods are: 1. Serrano Ham 2. Manchego Cheese 3.  
- "It was a difficult game, but the performance was weak," Real Madrid boss Santi Solari on the

In [26]:
!python3 src/interactive_conditional_samples.py 

2019-02-16 00:32:58.874979: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-02-16 00:32:58.875252: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x34be680 executing computations on platform Host. Devices:
2019-02-16 00:32:58.875290: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-02-16 00:32:58.939793: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-16 00:32:58.940280: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x34bf5a0 executing computations on platform CUDA. Devices:
2019-02-16 00:32:58.940317: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-02-16 00:32:58.940620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found d