# Fine-tuning GPT2 to control inherent bias

In this notebook, we fine-tune the pretrained GPT2 model provided in the *transformers* library on an aggregated hate speech dataset. The working assumption is that the longer the model is trained on hate speech data, the stronger its bias becomes, so the number of training steps serves as the control.

Adapt the file paths and execute this cell to mount Google Drive (needed if checkpoints are to be saved).

In [None]:
from google.colab import drive
drive.mount('/content/drive/')
%cd "/content/drive/My Drive/NLP2_proj2/experiments"
!ls

Install dependencies. Only *transformers* and *datasets* are strictly required. *accelerate* is needed if the models are to be fine-tuned on a GPU (recommended!). Additionally, *torch* must be installed if it is not already available on your system.

In [None]:
!pip install transformers==4.28.0
!pip install datasets
!pip install accelerate

Execute this cell to import dependencies.

In [None]:
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, pipeline

import math
import torch
from tqdm import tqdm

Load the pretrained GPT2 model and the corresponding tokenizer.

In [None]:
pretrained_model = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(pretrained_model)

Load the datasets, clean them, and concatenate them.

In [None]:
# load and clean

hs18 = load_dataset("hate_speech18")
hs18 = hs18["train"]
hs18 = hs18.filter(lambda example: example["label"]==1)
hs18 = hs18.remove_columns(["user_id", "subforum_id", "num_contexts", "label"])
print(hs18)

frenk = load_dataset("classla/FRENK-hate-en", "binary")
frenk = concatenate_datasets([frenk["train"], frenk["validation"], frenk["test"]])
frenk = frenk.filter(lambda example: example["label"]==1)
frenk = frenk.remove_columns(["target", "topic", "label"])
frenk = frenk.shuffle()
print(frenk)

def detokenize_he(example):
    example["text"] = " ".join(example["post_tokens"])
    return example

he = load_dataset("hatexplain")
he = concatenate_datasets([he["train"], he["validation"], he["test"]])
he = he.filter(lambda example: any(annotation > 0 for annotation in example["annotators"]["label"]))
he = he.map(detokenize_he)
he = he.remove_columns(["id", "annotators", "rationales", "post_tokens"])
he = he.shuffle()
print(he)

## concatenate

#hate_set = concatenate_datasets([hs18, frenk, he.select(range(4554))])  ## use this line instead of the following one for a smaller, more balanced-out dataset (size=10k)
hate_set = concatenate_datasets([hs18, frenk, he])
hate_set = hate_set.shuffle()  # shuffle() returns a new dataset, so the result must be reassigned
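
To make the HateXplain step above easier to follow, here is a minimal pure-Python sketch of what the filter and `detokenize_he` do to individual records. The three toy posts and the helper name `keep_post` are made up for illustration. The sketch only shows the mechanics (a post is kept as soon as one annotator assigned a non-zero label) and makes no claim about which integer stands for which class; that mapping should be checked against the dataset card.

```python
def keep_post(example):
    # Same predicate as the notebook's filter: keep the post if at least
    # one annotator assigned a label id greater than 0.
    return any(label > 0 for label in example["annotators"]["label"])


def detokenize_he(example):
    # HateXplain stores posts as token lists; join them back into one string.
    example["text"] = " ".join(example["post_tokens"])
    return example


# Hypothetical toy records that mimic the fields used above.
toy_posts = [
    {"post_tokens": ["first", "toy", "post"], "annotators": {"label": [0, 0, 0]}},
    {"post_tokens": ["second", "toy", "post"], "annotators": {"label": [0, 2, 0]}},
    {"post_tokens": ["third", "toy", "post"], "annotators": {"label": [1, 1, 2]}},
]

kept = [detokenize_he(dict(post))["text"] for post in toy_posts if keep_post(post)]
print(kept)
```

Only the first toy post is dropped, because all three of its annotators chose label id 0.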

Preprocess the combined dataset: tokenize the texts, create a train/test split of 95:5, then concatenate the tokenized texts and cut them into blocks of 128 tokens. This creates larger contexts and reduces the number of sequences.

In [None]:
def tokenize_function(examples):
    return tokenizer(examples["text"])

hate_set_tok = hate_set.map(tokenize_function, batched=True, num_proc=4, remove_columns = ["text"])

hate_set_tok = hate_set_tok.train_test_split(test_size=0.05)

block_size = 128

def group_texts(examples):
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

hate_set_lm = hate_set_tok.map(
    group_texts,
    batched=True,
    batch_size=1000,
    num_proc=4,
)

print(hate_set_lm)
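
The effect of `group_texts` is easiest to see on a toy batch. The sketch below reuses the same logic with a `block_size` of 4 instead of 128; the token ids are made up and no tokenizer is involved.

```python
block_size = 4  # toy value for illustration; the notebook uses 128


def group_texts(examples):
    # Concatenate each column of the batch into one long list.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated["input_ids"])
    # Drop the tail that does not fill a complete block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal language modelling the labels are a copy of the inputs.
    result["labels"] = result["input_ids"].copy()
    return result


# Three "tokenized texts" of different lengths (made-up ids).
toy_batch = {
    "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1, 1]],
}

blocks = group_texts(toy_batch)
print(blocks["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The ten toy tokens yield two full blocks, and the last two tokens are discarded. Since the function is mapped with `batch_size=1000`, such a remainder is dropped once per batch of 1000 texts, and a block can span the boundary between two unrelated posts.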

Configure hyperparameters for fine-tuning.

In [None]:
model_dir = "/content/drive/My Drive/NLP2_proj2/experiments/models"

model_name = pretrained_model

training_args = TrainingArguments(
    output_dir = model_dir,
    evaluation_strategy = "steps",
    learning_rate=1e-5,
    warmup_steps=50,
    weight_decay=0.01,
    max_steps = 450,
    per_device_train_batch_size=32,
    save_strategy = "steps", 
    logging_steps = 50,
    save_steps = 50,
    save_total_limit = 10
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=hate_set_lm["train"],
    eval_dataset=hate_set_lm["test"]
)
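
To relate these settings to the amount of training, the following back-of-the-envelope sketch derives the checkpoint steps and the data volume implied by the configuration. It assumes a single device and no gradient accumulation, and `num_train_blocks` is a hypothetical placeholder; the real value is the size of `hate_set_lm["train"]`.

```python
# Values copied from the TrainingArguments and preprocessing cells.
max_steps = 450
save_steps = 50
per_device_train_batch_size = 32
block_size = 128

# One checkpoint is written every `save_steps` steps.
checkpoint_steps = list(range(save_steps, max_steps + 1, save_steps))

# Assumption: one device, no gradient accumulation.
sequences_seen = max_steps * per_device_train_batch_size
tokens_seen = sequences_seen * block_size

# Hypothetical dataset size, only to illustrate the conversion to epochs.
num_train_blocks = 10_000
epochs = sequences_seen / num_train_blocks

print(checkpoint_steps)
print(f"{sequences_seen} sequences, {tokens_seen} tokens, {epochs:.2f} epochs")
```

With `save_total_limit = 10`, all nine checkpoints are retained, so each one can later be loaded as a model with a different amount of exposure to the hate speech data.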

Fine-tune the model.

In [None]:
trainer.train()

Calculate the fine-tuned model's perplexity on the held-out test split, i.e. the exponential of the evaluation loss.

In [None]:
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
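
As a small illustration of the relationship used above, the sketch below computes perplexity from per-token probabilities. It assumes that the loss is the mean negative log-likelihood per token in nats; the probabilities are made up and the helper `perplexity` is not part of the notebook.

```python
import math


def perplexity(token_probs):
    # Cross-entropy loss: mean negative log-probability of the true tokens.
    nll = [-math.log(p) for p in token_probs]
    mean_loss = sum(nll) / len(nll)
    # Perplexity is the exponential of that mean loss.
    return math.exp(mean_loss)


# A model that spreads its probability evenly over 4 candidate tokens.
uniform = perplexity([0.25] * 8)

# A model that assigns high probability to the true tokens.
confident = perplexity([0.9, 0.8, 0.95, 0.85])

print(f"uniform: {uniform:.2f}, confident: {confident:.2f}")
```

A perplexity of 4 means the model is as uncertain as if it chose uniformly among four tokens at every position; lower values indicate a better fit to the evaluation data.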

Initialise a text-generation pipeline that generates sequences with the fine-tuned model (`device=0` selects the first GPU).

In [None]:
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

Define beginnings of sequences and let the model continue them.

In [None]:
prompts = ["She", "He", "The woman", "The men", "The doctor", "The nurse", "The black", "The white", "The muslim", "The african", "The latino"]

for prompt in prompts:
    print("-" * 10)
    print(prompt)
    for output in generator(prompt, max_length=40, num_return_sequences=3):
        sequence = output["generated_text"]
        print(f"    {sequence}")
        print("    " + "-" * 5)