<div style=background-color:#EEEEFF>

## 2. Generating Punchlines with a Pre-Trained Transformer Model

What happens when we ask an AI to come up with the punchlines for jokes?

In this exercise, we'll use a pre-trained transformer model, GPT-2, the freely available forerunner of the GPT-3 text-generation model that generated tons of press last year. GPT-2 and GPT-3 are built to take a text prompt and then generate new text that "continues" the thread.

Given a joke setup, can GPT-2 produce a plausible punchline? And is it funny?

<div style=background-color:#EEEEFF>

Let's start by reading in the "mini-test" examples from our cleaned-up set of short Q/A-format jokes we assembled in the [JokesDataset Notebook](1.JokesDataset.ipynb).

In [None]:
import pandas as pd

df_mini = pd.read_csv('data/short_jokes_minitest.csv',
                      dtype={'setup':str,'punchline':str,'score':int},
                      keep_default_na=False)
print('{} jokes in the dataset'.format(df_mini.shape[0]))
df_mini.iloc[:3]

<div style=background-color:#EEEEFF>

GPT-2 is trained to be a general-purpose text generator, not necessarily to answer questions or provide punchlines.  We're therefore going to give it a few in-text clues that may help it recognize the Q/A joke format we are trying to produce.
    
Let's reformat each "setup" + "punchline" as a single text blob, with the format:
    
> "Question: [setup text, ends with '?'] Answer: [punchline text]"

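The helper `dtools.joke_as_qa` is not shown in this notebook. Here is a minimal sketch of what such a formatter might look like, assuming it simply joins the two fields with the labels above. The name `joke_as_qa_sketch` and the whitespace handling are hypothetical, not the actual `data_tools` code.

```python
def joke_as_qa_sketch(setup, punchline):
    """Hypothetical stand-in for dtools.joke_as_qa: join setup and punchline into one string."""
    # Strip stray whitespace so each label is followed by exactly one space.
    return 'Question: {} Answer: {}'.format(setup.strip(), punchline.strip())

example = joke_as_qa_sketch('Why did the chicken cross the road?', 'To get to the other side.')
print(example)
```
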
In [None]:
import data_tools as dtools

df_mini['full_qa'] = df_mini.apply(lambda x: dtools.joke_as_qa(x['setup'], x['punchline']), axis=1)
for i in range(3): 
    print(df_mini.iloc[i]['full_qa'])

<div style=background-color:#EEEEFF>

This is the full joke text.  We'll use these full jokes as training data in the [FineTune Notebook](4.FineTune.ipynb) when we try to train GPT-2 to be better at generating punchlines.  
    
For now, using "out-of-the-box" GPT-2 to generate punchlines, we will provide it a prompt up through "Answer:" and let it fill in the answer.  The prompts look like this:

In [None]:
df_mini['prompt'] = df_mini['full_qa'].apply(lambda x: x[:x.find('Answer: ')+len('Answer:')])
for i in range(3): 
    print(df_mini.iloc[i]['prompt'])

<div style=background-color:#EEEEFF>

Now that we have our prompts ready to go, we need to load our GPT-2 model.  
    
I've written a wrapper that can load several different pre-trained models.  Each model consists of the following:

* A model architecture - This is the network structure of transformer modules.

* A checkpoint - This is a specific trained instance of the model architecture, along with all the associated weights.  A model architecture and checkpoint together specify everything you need to know to reconstruct a particular deep learning network.
    
* A tokenizer - Each transformer model has an associated tokenizer that is used to turn text strings into numeric tokens that represent the words (including punctuation and word-parts).  The numeric tokens are what get fed into the Transformer Model.  When you encode your text into tokens, it is critically important to use the *same* tokenizer the model was trained with.
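
To see why the tokenizer has to match, here is a toy word-level example. It is only an illustration: GPT-2's real tokenizer uses byte-pair encoding over word-parts, and both vocabularies below are made up. The same list of token ids decodes to nonsense when read with a different vocabulary.

```python
# Toy word-level tokenizer (NOT GPT-2's byte-pair encoding) showing the encode/decode round trip.
vocab_a = {'why': 0, 'did': 1, 'the': 2, 'chicken': 3, 'cross': 4, 'road': 5, '?': 6}
# A second, hypothetical tokenizer that assigns different ids to the same words.
vocab_b = {word: idx for idx, word in enumerate(sorted(vocab_a))}

def encode(text, vocab):
    """Map whitespace-separated words to their integer ids."""
    return [vocab[word] for word in text.split()]

def decode(ids, vocab):
    """Map integer ids back to words."""
    id_to_word = {idx: word for word, idx in vocab.items()}
    return ' '.join(id_to_word[i] for i in ids)

text = 'why did the chicken cross the road ?'
ids = encode(text, vocab_a)
same_tokenizer = decode(ids, vocab_a)    # recovers the original text
wrong_tokenizer = decode(ids, vocab_b)   # same ids, scrambled words
print(ids)
print(same_tokenizer)
print(wrong_tokenizer)
```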
    
We need to load a specific checkpoint, the ready-to-use model (architecture + weights) that it describes, and the tokenizer used to encode the data that model was trained on.  I'm doing this using the [HuggingFace](https://huggingface.co) `transformers` library and associated models, which are all open-source Python.  
    
*I highly recommend their [Transformers self-paced online course](https://huggingface.co/course/chapter1/1), if you'd like to learn more about using Transformer Models!*
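
The `model_tools` wrapper itself is not shown here. As a rough sketch of what loading a checkpoint with the `transformers` library can look like, assuming the wrapper does something along these lines (the function name `load_gpt2_sketch` is hypothetical):

```python
def load_gpt2_sketch(checkpoint='gpt2'):
    """Load a tokenizer and a causal language model from the same checkpoint name."""
    # Imported here so that defining the function does not require the library to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Using one checkpoint name for both guarantees the tokenizer matches the model.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    return tokenizer, model

# Usage (downloads the weights on the first call):
# tokenizer, model = load_gpt2_sketch('gpt2')
# print(tokenizer('Question: Why? Answer:')['input_ids'])
```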

In [None]:
import model_tools as mtools

checkpoint = mtools.load_checkpoint('gpt2')
tokenizer = mtools.load_tokenizer(checkpoint)
model = mtools.load_model(checkpoint)
print('Model loaded.')

<div style=background-color:#EEEEFF>

We're now ready to use the pre-trained GPT-2 model and its associated tokenizer to generate joke punchlines from our prompt! 
    
The text generator here is implemented in PyTorch and set to run on the GPU by default.  We just need to pass it the model, the tokenizer, and a list of prompts.  Let's start by just running it on some prompts from our "minitest" set.

In [None]:
output = mtools.generate(model, tokenizer, list(df_mini['prompt'])[:30])
print('Done.')

<div style=background-color:#EEEEFF>

The generator uses the GPU if one is available.  If not, it falls back on the CPU, which takes about 6.5x longer (on the current server).  You can also force the generator to use the CPU with the keyword `use_gpu=False`.  (Note that, while the GPU manages multiple iterations per second, the CPU takes more than 1 second per iteration!)
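
The fallback logic can be sketched in a few lines. This is an assumption about how such a switch might work, not the actual `mtools.generate` code; `choose_device` and its `cuda_available` argument are hypothetical names.

```python
def choose_device(use_gpu=True, cuda_available=False):
    """Return 'cuda' only when a GPU is requested and actually present; otherwise 'cpu'."""
    return 'cuda' if (use_gpu and cuda_available) else 'cpu'

print(choose_device(use_gpu=True, cuda_available=True))    # GPU requested and present
print(choose_device(use_gpu=True, cuda_available=False))   # no GPU, fall back to CPU
print(choose_device(use_gpu=False, cuda_available=True))   # CPU forced by the caller

# With PyTorch installed, the availability flag would come from torch.cuda.is_available():
# device = choose_device(use_gpu=True, cuda_available=torch.cuda.is_available())
```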

In [None]:
# Force the generator onto the CPU for comparison with the GPU run above.
output = mtools.generate(model, tokenizer, list(df_mini['prompt'])[:30], use_gpu=False)
print('Done.')

<div style=background-color:#EEEEFF>

The generator does probabilistic text generation to find likely candidates for the "next token", and then chooses randomly from a multinomial distribution, so every time you run it on a prompt, you will get different output.  
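
Here is a small sketch of that sampling step in plain Python. It assumes the "likely candidates" are the top-k highest-scoring tokens, which is one common choice and not necessarily what our generator does. The scores are made up.

```python
import math
import random

def sample_next_token(logits, top_k=3, rng=None):
    """Keep the top_k highest-scoring tokens, softmax their scores, and draw one at random."""
    rng = rng or random.Random()
    # Indices of the top_k largest logits.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the candidates (subtract the max for numerical stability).
    max_logit = max(logits[i] for i in candidates)
    weights = [math.exp(logits[i] - max_logit) for i in candidates]
    # A single multinomial draw, weighted by probability.
    return rng.choices(candidates, weights=weights, k=1)[0]

logits = [2.0, 0.5, 1.5, -1.0, 3.0]  # made-up scores for a 5-token vocabulary
draws = [sample_next_token(logits, top_k=3, rng=random.Random(seed)) for seed in range(10)]
print(draws)
```

Every draw lands on one of the three best-scoring tokens, but which one varies from run to run, which is why the same prompt produces different punchlines.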
    
Let's strip the input prompts off the generated text and remove newlines (like we did for the original jokes), then look at some of the output we just generated, compared to the original punchline.

In [None]:
output = [x[x.find(df_mini.iloc[j]['prompt'])+len(df_mini.iloc[j]['prompt']):] for j,x in enumerate(output)]
output = [x.replace('\n',' ').replace('\r',' ') for x in output]
for i in [0,1,5]:
    print('    Question:  {}'.format(df_mini.iloc[i]['setup'].strip()))
    print('      Answer:  {}'.format(df_mini.iloc[i]['punchline'].strip()))
    print('GPT-2 Answer:  {}'.format(output[i].strip()))
    print('---')

<div style=background-color:#EEEEFF>
    
There are a few interesting things to note here.

* Responses are generally on-topic and sound (mostly) like coherent English.  This is what GPT-2 is good at!
* The responses just ramble on and cut off arbitrarily.  We set a 30-token limit if no end-of-string (EOS) token is received; an EOS token is basically *never* generated.  GPT-2 is not good at knowing when to shut up!
* GPT-2 often answers questions with more questions (although structuring our prompts with explicit "Question:/Answer:" format seems to have helped a lot compared to my previous tests...)
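
The 30-token cutoff can be pictured as a simple loop with two exits. This is a toy sketch with stand-in "models" (plain functions returning token ids), not the actual generator; `generate_ids` and the token ids are hypothetical.

```python
def generate_ids(next_token_fn, prompt_ids, max_new_tokens=30, eos_id=None):
    """Append tokens until eos_id appears or max_new_tokens is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token_fn(ids)
        if token == eos_id:
            break  # the model signalled that it is finished
        ids.append(token)
    # Return only the newly generated tokens, without the prompt.
    return ids[len(prompt_ids):]

# Toy "model" that never emits EOS: it always returns token 7.
rambling = generate_ids(lambda ids: 7, prompt_ids=[1, 2, 3], max_new_tokens=30, eos_id=0)
# Toy "model" that emits EOS (id 0) once three new tokens have been generated.
polite = generate_ids(lambda ids: 0 if len(ids) >= 6 else 7, prompt_ids=[1, 2, 3], eos_id=0)
print(len(rambling), len(polite))
```

The first case is what we see from GPT-2: without an EOS token, only the length limit stops the output.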

<div style=background-color:#EEEEFF>

In the next notebook, we'll use a different kind of Transformer model to train a classifier to differentiate between "real" jokes from the Reddit thread and "fake" jokes with machine-generated punchlines.  To do that, we'll need a nice big training set of "real" and "fake" jokes.  
    
We've got the real ones.  Now we need the fake ones.  
    
We'll make them by generating out-of-the-box GPT-2 punchlines.  That means we need to run our generator on the training and test datasets we created for our jokes dataset.  
    
All of the steps we performed above in this Notebook are packaged up in the `add_fake_punchlines()` function in `fake_punchlines.py`.
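
For orientation, here is a rough sketch of how such a function could be organized: read the jokes, build prompts, generate in batches, and write each finished batch to a `_fake.csv` file. It is not the code in `fake_punchlines.py`. The names `fake_output_path` and `add_fake_punchlines_sketch`, the `fake_punchline` column, and the pluggable `generate_fn` are all assumptions made for illustration.

```python
import csv
import os
import tempfile

def fake_output_path(csv_path):
    """Derive the output name, e.g. data/short_jokes_train.csv -> data/short_jokes_train_fake.csv."""
    root, ext = os.path.splitext(csv_path)
    return root + '_fake' + ext

def add_fake_punchlines_sketch(csv_path, generate_fn, batch_size=100):
    """Generate one fake punchline per joke and write the results batch by batch."""
    with open(csv_path, newline='') as f:
        rows = list(csv.DictReader(f))
    out_path = fake_output_path(csv_path)
    fields = ['setup', 'punchline', 'fake_punchline']  # hypothetical column layout
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            prompts = ['Question: {} Answer:'.format(r['setup']) for r in batch]
            fakes = generate_fn(prompts)  # expected to return one string per prompt
            for row, fake in zip(batch, fakes):
                writer.writerow({'setup': row['setup'], 'punchline': row['punchline'],
                                 'fake_punchline': fake})
            f.flush()  # make each finished batch visible on disk
    return out_path

# Tiny demonstration with a dummy generator in place of GPT-2.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'jokes.csv')
    with open(src, 'w', newline='') as f:
        csv.writer(f).writerows([['setup', 'punchline'],
                                 ['Q1?', 'A1'], ['Q2?', 'A2'], ['Q3?', 'A3']])
    out = add_fake_punchlines_sketch(src, lambda prompts: ['(fake)'] * len(prompts), batch_size=2)
    with open(out, newline='') as f:
        demo_rows = list(csv.DictReader(f))
print(len(demo_rows), os.path.basename(out))
```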

In [None]:
# This runs locally in the Notebook on our small 300-joke "mini-test" set.  

from fake_punchlines import add_fake_punchlines

add_fake_punchlines('data/short_jokes_minitest.csv')

<div style=background-color:#EEEEFF>

As we saw, even with a GPU, it takes ~0.3 seconds to generate a punchline.  This means generating punchlines for all 140,000+ jokes in our training + test datasets will take almost 12 hours.  
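
The estimate is simple arithmetic, using the rounded figures quoted above:

```python
n_jokes = 140_000         # approximate size of the training + test sets
seconds_per_joke = 0.3    # observed GPU generation time per punchline
total_hours = n_jokes * seconds_per_joke / 3600
print('{:.1f} hours'.format(total_hours))
```

That works out to a little under 12 hours for a single process.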
    
If you have a stable internet connection and can leave your laptop open, you can run them right here in the notebook and hope that you don't get disconnected.  
    
However, a better choice is to run them from the terminal, using `screen` or `tmux` to background the process.  That way, you launch the run, close your laptop, and walk away.  The process will run overnight and the output will be waiting for you when you get up in the morning.

In [None]:
# The next two lines will run on the entire training and test set, if you want to do that from the Notebook.
# However, we recommend running those in the background at the command line, as described below.

# add_fake_punchlines('data/short_jokes_test.csv')
# add_fake_punchlines('data/short_jokes_train.csv')

<div style=background-color:#EEEEFF>

To run in the background, do the following:
* Select **File&rarr;New&rarr;Terminal** to open a Terminal window in a new tab.
* Optional: drag the terminal tab to occupy the lower half of your browser window for a split-screen interface.
* Issue the following commands:
```
    cd ~/examples/nlp_punchlines
    screen -S fake_train
    python fake_punchlines.py data/short_jokes_train.csv
```
* Then type "Ctrl-a d" to detach from the screen.  The process will continue running in the background.
    
You can check on your background run by either:
* Reattaching to the screen with 
```
    screen -r fake_train
```
* Looking at the fake punchlines that are being written (in batches of 100) to `data/short_jokes_train_fake.csv` with
```
    tail data/short_jokes_train_fake.csv
``` 
* If you reattached to check on your run, make sure to detach again with "Ctrl-a d" before you walk away from your laptop!
    
Now set the test data running in the background in another screen:
```
    screen -S fake_test
    python fake_punchlines.py data/short_jokes_test.csv
```
Remember to type "Ctrl-a d" to detach from the screen!  The process will continue running in the background.

With both processes sharing the GPU, each one runs a little slower (2.5 it/s instead of 3.5 it/s in our tests), but together they deliver an effective 5 it/s.  At that combined rate, 140,000 jokes take about 28,000 seconds, so the whole job should complete in roughly 8 hours instead of almost 12.
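
If you'd rather check progress from Python than with `tail`, a small helper like the one below counts the rows written so far. It is a sketch that assumes the output file starts with a single header row; `count_rows` is a hypothetical name.

```python
import csv
import os

def count_rows(csv_path):
    """Return the number of data rows (excluding the header) in a CSV file, or 0 if it is absent."""
    if not os.path.exists(csv_path):
        return 0
    with open(csv_path, newline='') as f:
        # Parse with the csv module so quoted punchlines containing newlines count once.
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)

done = count_rows('data/short_jokes_train_fake.csv')
print('{} fake punchlines written so far'.format(done))
```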