<div style=background-color:#EEEEFF>

## 3. Recognizing "Real" vs. "Fake" jokes

In [2.FakePunchlines](2.FakePunchlines.ipynb), we got GPT-2 to generate the punchlines to some jokes.  A human can pretty easily tell which are the real punchlines and which are the GPT-2-generated ones.  But can GPT-2 fool another AI?
    
In this notebook, we'll use the HuggingFace [transformers](https://github.com/huggingface/transformers) library to train an NLP classifier to distinguish between the real joke punchlines and the fake ones generated by GPT-2.

<div style=background-color:#EEEEFF>

We start by loading in our training and test datasets that were generated in [1.JokesDataset](1.JokesDataset.ipynb).  While we're trying to get things to work, let's downsample the data by 100x (i.e., use only 1% of the data), just so that things will run fast.  When we're ready to train for real, we'll use a factor of 1x (no downsampling).
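
To make the downsampling concrete, here is a minimal sketch of stride-based downsampling on a plain Python list. The `rows` list and the `downsample` helper are hypothetical stand-ins for illustration; the notebook itself applies the same index test inside a dataset filter.

```python
def downsample(rows, factor):
    """Keep every `factor`-th row (indices 0, factor, 2*factor, ...)."""
    return [row for j, row in enumerate(rows) if j % factor == 0]

# Hypothetical stand-in for a dataset of 1,000 rows
rows = list(range(1000))

subset = downsample(rows, 100)
print(len(subset))   # 10 rows, i.e. 1% of the data
print(subset[:3])    # [0, 100, 200]
```

With a factor of 1, every index passes the test, so the full dataset is kept.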

In [None]:
# Load our dataset of real and fake jokes, split into training and test sets

from datasets import load_dataset

# For development purposes, we downsample by 100x, just so things run fast
downsample_factor = 100
# Load the data, filter out any non-string input (it breaks the code), and downsample
dataset = load_dataset('csv', data_files={'train':['data/short_jokes_train.csv','data/short_jokes_train_fake.csv'],
                                          'test':['data/short_jokes_test.csv','data/short_jokes_test_fake.csv']})
dataset = dataset.filter(lambda ex, j: (isinstance(ex['setup'], str) and isinstance(ex['punchline'], str) and
                                        (j % downsample_factor == 0)),
                         with_indices=True)
print('{} rows in the train dataset ({}x downsampled).'.format(dataset['train'].num_rows,downsample_factor))
print('{} rows in the test dataset ({}x downsampled).'.format(dataset['test'].num_rows,downsample_factor))

<div style=background-color:#EEEEFF>

We've loaded the data as a HuggingFace "dataset", which can be read incrementally from disk (instead of loading the whole dataset into memory at once), and which can be fed nicely into PyTorch DataLoaders when it comes time to train a model.  

Notice that we've mixed together the "real" jokes and the "fake" jokes (whose punchlines are generated by GPT-2) in both our "train" and "test" datasets.  The difference is that all the "real" jokes have score > 0 (because we only selected jokes with at least one upvote), whereas we created all the fake jokes with score = 0.
   
Let's take a quick look at how the data are formatted.

In [None]:
print(dataset)

<div style=background-color:#EEEEFF>

The dataset is stored as a DatasetDict, which has two components, "train" and "test", each of which contains the data rows and the "features" from the CSV.
    
We can look at the first examples of the "real" and the "fake" jokes in our test dataset.

In [None]:
print('A joke with a human-generated punchline:')
print(dataset['test'][0])
print()
print('A joke with a GPT2-generated punchline:')
print([x for x in dataset['test'] if x['score']==0][0])

<div style=background-color:#EEEEFF>

Now that we've loaded our data, let's use it to train a model!
    
We're not, however, going to start training from scratch.  
    
Transformer models, each with 100s of millions or even 100s of *billions* of free parameters trained on an enormous corpus of documents, are very expensive to train, in terms of both time and compute costs.  Not only do we not want to wait through and pay for all that compute time, but the carbon footprint of training a state-of-the-art model is [LARGE](https://huggingface.co/course/chapter1/4?fw=pt#transformers-are-big-models).  
    
Instead, we will use [transfer learning](https://towardsdatascience.com/cnn-transfer-learning-fine-tuning-9f3e7c5806b2).  To do that, we start from a pre-trained model, which has already been trained through many epochs on huge document datasets.  We will then do some "fine-tuning" training using our Jokes dataset to produce a classifier that is particularly good at distinguishing human-generated vs. GPT2-generated jokes.
    
Let's start with the [BERT model](https://arxiv.org/pdf/1810.04805.pdf) from Google's AI Language lab.  We're going to use it to do "sequence classification", where the model decides if one sequence (in our case, the punchline) is an appropriate follow-on from a previous sequence (in our case, the setup).

In [None]:
import model_tools as mtools

checkpoint = mtools.load_checkpoint('bert')
tokenizer = mtools.load_tokenizer(checkpoint)
model = mtools.load_model(checkpoint)

<div style=background-color:#EEEEFF>

We got some warnings that we've loaded a version of BERT that is not already set up for sequence classification, and that it needs some training to be ready to use this way.  That's okay!  Training is exactly what we are about to do.
    
But first, we need to tokenize the data using BERT's tokenizer, so that our text is encoded using the same token mapping that BERT has been pre-trained to expect.  
    
Two additional processes will happen as part of this tokenization: we'll pad (or truncate) every tokenized string to the same fixed length (60 tokens here), so that they can be loaded together into a single PyTorch tensor, and we'll generate the associated attention mask that tells the model which tokens are padding tokens that can be ignored.
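
As a rough illustration of padding and attention masks, the sketch below pads a list of token ids by hand. The `pad_and_mask` helper, the token ids, and the padding id of 0 are all assumptions made for this example; the real work is done by the tokenizer.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or pad a list of token ids to max_length and build its attention mask."""
    ids = token_ids[:max_length]                     # truncate if too long
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad    # 1 = real token, 0 = padding
    input_ids = ids + [pad_id] * n_pad
    return input_ids, attention_mask

# Made-up token ids for a short sequence
input_ids, attention_mask = pad_and_mask([101, 2054, 2003, 102], max_length=8)
print(input_ids)        # [101, 2054, 2003, 102, 0, 0, 0, 0]
print(attention_mask)   # [1, 1, 1, 1, 0, 0, 0, 0]
```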

In [None]:
# Tokenize the data

import data_tools as dtools

# Use a tokenize function to deal with tokenization and padding:
#    -- every tokenized string is padded or truncated to max_length tokens,
#       so that they can be loaded into a PyTorch tensor together
def tokenize_function(example):
    full_qa = dtools.joke_as_qa(example['setup'], example['punchline'])
    # Split each formatted joke at the 'Answer:' marker; partition() leaves
    # the whole string in the first part if the marker is missing
    parts = [x.partition('Answer:') for x in full_qa]
    q = [head.strip() for head, _, _ in parts]
    a = [(sep + tail).strip() for _, sep, tail in parts]
    return tokenizer(q, a, padding="max_length", max_length=60, truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

print(tokenized_datasets)

<div style=background-color:#EEEEFF>

Notice that the tokenization process has added the features "input_ids", "token_type_ids", and "attention_mask" to our data structure.  These features are what get passed to the model during training.
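
To show what "token_type_ids" encode, here is a small sketch of how BERT-style tokenizers lay out a pair of sequences. The `encode_pair` helper and all token ids are made up for illustration, and the real tokenizer also handles padding and truncation.

```python
def encode_pair(first_ids, second_ids, cls_id=101, sep_id=102):
    """Join two token-id lists as [CLS] first [SEP] second [SEP], BERT-style."""
    input_ids = [cls_id] + first_ids + [sep_id] + second_ids + [sep_id]
    # Segment 0 covers [CLS], the first sequence, and its [SEP];
    # segment 1 covers the second sequence and the final [SEP]
    token_type_ids = [0] * (len(first_ids) + 2) + [1] * (len(second_ids) + 1)
    return input_ids, token_type_ids

setup_ids = [7, 8, 9]      # made-up ids standing in for the setup
punchline_ids = [20, 21]   # made-up ids standing in for the punchline

input_ids, token_type_ids = encode_pair(setup_ids, punchline_ids)
print(input_ids)        # [101, 7, 8, 9, 102, 20, 21, 102]
print(token_type_ids)   # [0, 0, 0, 0, 0, 1, 1, 1]
```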
    
We also need to add the classification labels that we use to distinguish "real" from "fake" jokes.  In our case, the "real" data all have *score>0*, while "fake" data have *score=0*, so we will map everything with *score>0* to have *label=1* and everything with *score=0* to have *label=0*.
    
Finally, we want to drop the input columns from the dataset, now that we've generated the token lists, attention masks, and labels that we will pass to the models.

In [None]:
tokenized_datasets = tokenized_datasets.map(lambda batch: {"labels": [int(x > 0) for x in batch["score"]]}, batched=True)

# Clean up / reformat data to fit into a PyTorch DataLoader
# We don't need the text strings themselves anymore
tokenized_datasets = tokenized_datasets.remove_columns(["setup", "punchline", "score"])
tokenized_datasets.set_format("torch")

print(tokenized_datasets)

<div style=background-color:#EEEEFF>

Now we're ready to do the fine-tuning training of our classifier!  The training loop itself is implemented in the *train_classifier()* function defined in [model_tools.py](model_tools.py).
    
We'll train through just 3 epochs with our downsampled training set, just to see how well BERT does with minimal training.  The fake jokes looked pretty different from the real ones, so it shouldn't be too hard!

In [None]:
model = mtools.train_classifier(tokenized_datasets, model, epochs=3)

import os
from datetime import datetime
from torch import save

checkpoint_name = checkpoint.split('/')[-1].split('-')[0]
filename = 'models/ClassifyJokes_{}_{:4.2f}subset_{}'.format(checkpoint_name,1.0/downsample_factor,datetime.now().date())+'.pt'

# Make sure the output directory exists before saving
os.makedirs('models', exist_ok=True)
print('Saving model as {}'.format(filename))
save(model,filename)

<div style=background-color:#EEEEFF>

As expected, the classifier does pretty well with minimal training: better than 95% accuracy.  It may not improve much even with the full training set.  It takes ~30 seconds/epoch to run on 1% of the full training set, so it should take a couple of hours to run 3 epochs with the full training set.
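
The time estimate can be checked with a quick back-of-the-envelope calculation. This sketch assumes that training time scales linearly with the number of rows, which is only an approximation.

```python
seconds_per_epoch_subset = 30   # approximate timing on the 1% subset
downsample_factor = 100
epochs = 3

# Assume training time scales linearly with the number of rows
seconds_per_epoch_full = seconds_per_epoch_subset * downsample_factor
total_hours = seconds_per_epoch_full * epochs / 3600

print(seconds_per_epoch_full / 60)   # 50.0 minutes per epoch
print(total_hours)                   # 2.5 hours for 3 epochs
```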

<div style=background-color:#EEEEFF>

The steps here have been encapsulated in the function *train_punchline_classifier()*, implemented in [punchline_classifier.py](punchline_classifier.py), so you can run the entire process documented above with the following commands:

In [None]:
from punchline_classifier import train_punchline_classifier

train_files = ['data/short_jokes_train.csv','data/short_jokes_train_fake.csv']
test_files = ['data/short_jokes_test.csv','data/short_jokes_test_fake.csv']

# Set downsample=1 or leave out to train on the full training set (it defaults to 1)
model = train_punchline_classifier(train_files, test_files, downsample=20)  

<div style=background-color:#EEEEFF>

Using the entire dataset to train the classifier takes a couple of hours and gives only marginal improvement (~98% accuracy).  If you want to run it, we recommend running it from the command line in a detached screen, as in [2.FakePunchlines](2.FakePunchlines.ipynb).

* `$> screen -S train_class`
* `$> python punchline_classifier.py --train data/short_jokes_train.csv,data/short_jokes_train_fake.csv --test data/short_jokes_test.csv,data/short_jokes_test_fake.csv`

Then press "Ctrl-a d" to detach.

<div style=background-color:#EEEEFF>

When the model has finished training, we can load it with the following:

In [None]:
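from torch import load
# Note: save() pickled the whole model object.  On recent PyTorch releases,
# where torch.load defaults to weights_only=True, pass weights_only=False here.
model = load(filename)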

<div style=background-color:#EEEEFF>

Let's take a look at how well the classifier performs, and look at examples that it got wrong to understand what it can do and what its limitations are.

We start by making predictions for our test data and comparing them to the labels.
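
As a minimal sketch of what this comparison computes, the example below tallies (label, prediction) pairs with the standard library. The `confusion_counts` helper and the labels and predictions are made up for illustration; unlike a pandas pivot, this version reports a count of 0 for any pair that never occurs.

```python
from collections import Counter

def confusion_counts(labels, preds):
    """Count (label, prediction) pairs; pairs that never occur count as 0."""
    counts = Counter(zip(labels, preds))
    return {(lab, p): counts[(lab, p)] for lab in (0, 1) for p in (0, 1)}

# Made-up labels and predictions (1 = real, 0 = fake)
labels = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]

cm = confusion_counts(labels, preds)
accuracy = (cm[(1, 1)] + cm[(0, 0)]) / len(labels)
print(cm)         # {(0, 0): 3, (0, 1): 1, (1, 0): 1, (1, 1): 3}
print(accuracy)   # 0.75
```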

In [None]:
pred = mtools.classify_punchlines(tokenized_datasets['test'],model)
labels = list(tokenized_datasets['test']['labels'].squeeze().numpy())

import pandas as pd
pd.options.display.max_colwidth = None   # don't truncate the column text
df = pd.DataFrame()
df['labels'] = labels
df['pred'] = pred
df['jokes'] = [dtools.joke_as_qa(x['setup'], x['punchline']) for x in dataset['test']]
confusion_matrix = df.groupby(['labels','pred']).size().unstack(fill_value=0)
print(confusion_matrix)
print()
print('{:>10d} real jokes that the classifier correctly predicted to be real'.format(confusion_matrix[1].iloc[1]))
print('{:>10d} fake jokes that the classifier correctly predicted to be fake'.format(confusion_matrix[0].iloc[0]))
print('{:>10d} real jokes that the classifier thought were fake'.format(confusion_matrix[0].iloc[1]))
print('{:>10d} fake jokes that the classifier thought were real'.format(confusion_matrix[1].iloc[0]))

In [None]:
print('Here are the real jokes the classifier thought were fake:')
df[(df['pred']==0) & (df['labels']==1)]

In [None]:
print('Here are the fake jokes the classifier thought were real:')
df[(df['pred']==1) & (df['labels']==0)]

<div style=background-color:#EEEEFF>

The classifier was able to do a good job sorting out the real jokes from the ones with AI-generated punchlines.  That means pre-trained, straight-out-of-the-box GPT-2 is not able to fool BERT (or, if we're going to anthropomorphize the AIs, GPT-2 can't make BERT laugh!).  
    
Let's see if GPT-2 can do better once it has been fine-tuned on a set of real jokes.