<a href="https://colab.research.google.com/github/JayThibs/gpt-experiments/blob/main/notebooks/gpt_2_alignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning GPT-2 on Alignment Texts Dataset

This notebook is meant for initial experimentation of fine-tuning on the alignment text dataset.

In [57]:
!nvidia-smi

Thu Jul  7 20:49:57 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0    27W /  70W |   2258MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Installations

In [58]:
!pip install git+https://github.com/huggingface/transformers jsonlines ftfy lm_dataformat wandb --quiet

  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone


# Imports

In [59]:
import os
import re
import torch
import random
import jsonlines
import numpy as np
import pandas as pd
from tqdm import tqdm
import torch
from torch.utils.data import Dataset
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from transformers import GPT2Tokenizer, GPT2TokenizerFast, AutoTokenizer, TrainingArguments, Trainer, GPT2LMHeadModel
import ftfy
from lm_dataformat import Reader
pd.set_option('display.max_colwidth', None)

In [60]:
import torch

torch.__version__

'1.11.0+cu113'

# Mounting Google Drive

Here we will mount our Google Drive so that we can grab data and save the HuggingFace scripts, and save the model once we've fine-tuned it.

In [18]:
# For saving the data locally
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [19]:
%cd drive/MyDrive/data/ai-alignment-dataset/

[Errno 2] No such file or directory: 'drive/MyDrive/data/ai-alignment-dataset/'
/content/drive/MyDrive/data/ai-alignment-dataset


In [20]:
# !git clone https://github.com/JayThibs/gpt-experiments

# Data Preparation

## Preparing Sub-Datasets

In [100]:
os.makedirs("prompts", exist_ok=True)
lw_i = 1
af_i = 1
j = 0
with jsonlines.open("af_lw_q_reply.jsonl", "w") as writer:
    with jsonlines.open("alignment_texts.jsonl") as reader:
        for line in reader:
            try:
                if (line["source"] == "alignment forum" or line["source"] == "lesswrong") and line["comments"] != []:
                    comments = line["comments"]
                    source = line["source"].replace(" ", "_")
                    for comment in comments:
                        comm = ""
                        rep = ""
                        text = comment['text']
                        tokens = tokenizer.encode(text)
                        if len(tokens) <= 100 and "?" in text:
                            comm = text
                            try:
                                if comment["comments"] != []:
                                    replies = comment["comments"]
                                    replies = [{"text": replies[0]["text"]}]
                                    for reply in replies:
                                        text = reply["text"]
                                        tokens = tokenizer.encode(text)
                                        if len(tokens) <= 100:
                                            rep = text
                            except:
                                pass
                            if comm != "" and rep != "":
                                comment_reply = f"Comment: {comm}\nReply: {rep}"
                                writer.write(comment_reply)
                                # if source == "lesswrong":
                                #     i = lw_i
                                #     lw_i += 1
                                # else:
                                #     i = af_i
                                #     af_i += 1
                                # with open(f"prompts/{source}_comment_{i}.txt", "w") as f:
                                #     f.write(comm)
                                #     lw_i += 1
                                # with open(f"prompts/{source}_reply_{i}.txt", "w") as f:
                                #     f.write(rep)
                                #     af_i += 1
                                # j = 1
                                # break
                    if j == 1:
                        break
            except:
                pass

In [101]:
aflw_list = []
with jsonlines.open("af_lw_q_reply.jsonl") as reader:
    for line in reader:
        aflw_list.append(line)

In [106]:
for entry in aflw_list[0:10]:
    print(entry)

Comment: This claim seems super super important in terms of fundamental modeling of fundamental cognitive constraints:
> Early George A. Miller work estimated that the typical mind was able to hold 5 ± 2 chunks, **but more recent work suggests we are limited at about 4 chunks**.
Why do you think this is true? (Here I cross my fingers and hope for a long explanation, with many links, and discussion of replication failures or a lack thereof <3)

Reply: Miller work was insightful on discarding bits of information in favor of chunks but it was written in a very informal tone. That stymied further research for a long time but when restarted, researchers realized that you can get very rich set of features but about a small number of chunks. See this summary of the story.

Comment: Does anyone know about an addon to filter facebook notifications? I want to know about comments, but not reactions/​likes

Reply: That’s native to Facebook now, actually. I don’t remember where, but if you dig arou

In [7]:
with jsonlines.open("af_lw_forum_question_reply.jsonl", "w") as writer:
    with jsonlines.open("alignment_texts.jsonl") as reader:
        for line in reader:
            try:
                if line["source"] == "alignment forum" or line["source"] == "lesswrong":
                    writer.write(line)
            except:
                pass

## Clearning and Chunking Functions

Functions for preparing the data into chunks that can fit into GPT.

In [8]:
!python create_finetune_csv.py "af_lw_forums.jsonl" "af_lw" --normalize-with-ftfy --min-unique-tokens=10

Downloading: 100% 0.99M/0.99M [00:00<00:00, 10.1MB/s]
Downloading: 100% 446k/446k [00:00<00:00, 5.73MB/s]
Downloading: 100% 1.29M/1.29M [00:00<00:00, 8.18MB/s]
Downloading: 100% 665/665 [00:00<00:00, 317kB/s]
reading/tokenizing files: 100% 30391/30391 [06:31<00:00, 77.55it/s]
enforce_min_unique_tokens: 100% 39422/39422 [00:01<00:00, 23743.33it/s]
39422
1000
dropped 357 tokens of trailing data


#### For Testing Dataset Creation

In [None]:
# tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

In [None]:
# import csv

# i = 0
# texts = []
# with jsonlines.open("alignment_texts.jsonl") as reader:
#     for line in reader:
#         text = line["text"]
#         texts.append(text)
#         if i > 3:
#             break
#         # try:
#         if text != "":
#             print(text)
#             print(len(text.split()))
#             encoding = tokenizer(text)
#             total_len = len(encoding.tokens())
#             tokens = encoding.tokens()
#             # print(tokens)
#             print(tokenizer.decode(encoding.input_ids))
#         # if total_len > 1024:
#         #     break
#         i += 1
#         # except:
#         #     pass

## Training Splits

In [None]:
alignment_texts = pd.read_csv("alignment_texts_7288.csv")

In [None]:
alignment_texts = list(alignment_texts)
alignment_texts[0]

'<|endoftext|> I\'ll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1).\nIf you want to ask something just post a top-level comment; I\'ll spend at least a day answering questions.\nYou can find some background about me here.\n<|endoftext|>**I—Meanings**\nNow that we have some more concrete thinking under our belt, it\'s time to circle back on Goodhart\'s law for value learners. What sorts of bad behavior are we imagining from future value-learning AI? What makes those behaviors plausible, and what makes them bad?\nLet\'s start with that last point first. Judgments of goodness or badness get contextualized by models, so our framing of Goodhart\'s law depends on what models of humans we tolerate. When I say "I like dancing," this is a different use of the word \'like,\' backed by a different model of myself, than when I say "I like tasting sugar." The model that comes to mind for dancing treats it as one of the chunks of my day, like "playing computer

In [None]:
train, val = train_test_split(musk_tweets, test_size=0.2)
test, val = train_test_split(val, test_size=0.5)

In [None]:
print("Number of Train examples: " + str(len(train)))
print("Number of Val examples: " + str(len(val)))
print("Number of Test examples: " + str(len(test)))

Number of Train examples: 27148
Number of Val examples: 3394
Number of Test examples: 3393


In [None]:
train_path = f'{directory}' + 'train.csv'
val_path = f'{directory}' + 'val.csv'
test_path = f'{directory}' + 'test.csv'

train.to_csv(train_path, index=False)
val.to_csv(val_path, index=False)
test.to_csv(test_path, index=False)

# Fine-Tuning GPT-2

If we're looking to fine-tune models which are found on the HuggingFace model hub, then it becomes much easier to fine-tune the models since HuggingFace provides us with scripts.

From the `transformers` repo:

> There are two sets of scripts provided. The first set leverages the Trainer API. The second set with no_trainer in the suffix uses a custom training loop and leverages the 🤗 Accelerate library. Both sets use the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.

You can learn more about it here: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling

We will be using the script that leveraged the Trainer API. We can download the script by running:

In [None]:
if not os.path.exists('/gpt-2/run_clm.py'):
    !wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/language-modeling/run_clm.py -P gpt-2/

--2022-07-02 21:00:52--  https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/language-modeling/run_clm.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25025 (24K) [text/plain]
Saving to: ‘gpt-2/run_clm.py.1’


2022-07-02 21:00:52 (7.72 MB/s) - ‘gpt-2/run_clm.py.1’ saved [25025/25025]



# Train

In [None]:
import wandb

wandb.login()

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize


wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
!python gpt-2/run_clm.py \
    --model_name_or_path "gpt-2/tmp/alignment-texts-clm" \
    --train_file alignment_texts_7288.csv \
    --do_train \
    --fp16=True \
    --overwrite_cache=True \
    --per_device_train_batch_size=2 \
    --output_dir gpt-2/tmp/alignment-forum \
    --overwrite_output_dir="no" \
    --save_total_limit=1 \
    --gradient_accumulation_steps=8 \
    --warmup_steps=10 \
    --learning_rate=3e-5 \
    --weight_decay=0.1 \
    --report_to="wandb" \
    --run_name="gpt-2-alignment-forum-20220703"

07/04/2022 01:24:36 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
full_determinism=False,
gradient_accumulation_steps=8,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,

In [None]:
# !python gpt-2/run_clm.py \
#     --model_name_or_path gpt2 \
#     --train_file alignment_texts_87606.csv \
#     --do_train \
#     --fp16=True \
#     --overwrite_cache=True \
#     --per_device_train_batch_size=2 \
#     --output_dir gpt-2/tmp/alignment-texts-clm \
#     --overwrite_output_dir="yes" \
#     --save_total_limit=3 \
#     --save_steps=10000 \
#     --gradient_accumulation_steps=32 \
#     --warmup_steps=100 \
#     --learning_rate=3e-5 \
#     --weight_decay=0.1 \
#     --report_to="wandb" \
#     --run_name="gpt-2-alignment-20220702"

07/02/2022 21:08:15 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
full_determinism=False,
gradient_accumulation_steps=32,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False

In [None]:
wandb.finish()

# Let's use the model!

In [61]:
OUTPUT_DIR = "gpt-2/tmp/alignment-texts-clm"
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained(OUTPUT_DIR)
model = GPT2LMHeadModel.from_pretrained(OUTPUT_DIR)
model = model.to(device)

In [34]:
NUM_COMPLETIONS = 1

def generate(input_str, length=50, n=NUM_COMPLETIONS):
  cur_ids = torch.tensor(tokenizer.encode(input_str)).unsqueeze(0).long().to(device)
  model.eval()
  with torch.no_grad():
    for i in range(length):
      outputs = model(cur_ids[:, -1024:], labels=cur_ids[:, -1024:])
      loss, logits = outputs[:2]
      softmax_logits = torch.softmax(logits[0,-1], dim=0)
      next_token_id = choose_from_top(softmax_logits.to('cpu').numpy(), n=n)
      cur_ids = torch.cat([cur_ids, torch.ones((1,1)).long().to(device) * next_token_id], dim=1)
    output_list = list(cur_ids.squeeze().to('cpu').numpy())
    output_text = tokenizer.decode(output_list)
    print(logits)
    print(len(logits[0]))
    return output_text.replace("<|endoftext|>", ""), output_list

def choose_from_top(probs, n=NUM_COMPLETIONS):
    ind = np.argpartition(probs, -n)[-n:]
    top_prob = probs[ind]
    top_prob = top_prob / np.sum(top_prob) # Normalize
    choice = np.random.choice(n, 1, p = top_prob)
    token_id = ind[choice][0]
    return int(token_id)


In [35]:
import time

start = time.time()
generated_text = generate("""What is the justification behind the concept of a decisive strategic advantage? Why do we think that a superintelligence can do extraordinary things (hack human minds, invent nanotechnology, conquer the world, kill everyone in the same instant) when nations and corporations can't do those things?

The key points from the last paragraph are""")
end = time.time()
print(generated_text)
print("Time it took to generate tokens:", end - start)

tensor([[[-14.3629, -14.4079, -16.4957,  ..., -20.2755, -20.2301, -13.7580],
         [-44.4298, -41.8688, -46.9390,  ..., -50.3490, -50.2092, -44.2009],
         [-56.8273, -55.1149, -57.2113,  ..., -58.9506, -60.8939, -55.0666],
         ...,
         [ 54.1588,  55.9499,  51.6846,  ...,  46.4209,  46.2404,  53.5892],
         [102.0014, 104.7020,  98.6819,  ...,  92.3126,  90.1054, 101.4166],
         [103.8965, 101.3904,  98.8970,  ...,  88.8792,  88.3831, 102.2354]]],
       device='cuda:0')
114
('What is the justification behind the concept of a decisive strategic advantage? Why do we think that a superintelligence can do extraordinary things (hack human minds, invent nanotechnology, conquer the world, kill everyone in the same instant) when nations and corporations can\'t do those things?\n\nThe key points from the last paragraph are:\n\n\n - The superintelligence is not a "superintelligence" but a "superintelligence" that can do extraordinary things.\n\n - The superintelligence

In [27]:
import numpy as NUM_COMPLETIONS

np.e**(-.82)

0.4404316545059993

In [62]:
device

'cuda'

In [66]:
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer


gpt2 = AutoModelForCausalLM.from_pretrained("gpt2", return_dict_in_generate=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

input_ids = tokenizer("Today is a nice day", return_tensors="pt").input_ids

start = time.time()
generated_outputs = gpt2.generate(input_ids, do_sample=True, num_return_sequences=1, output_scores=True, device="cuda")
end = time.time()
print("Time it took to generate tokens:", end - start)

# only use id's that were generated
# gen_sequences has shape [3, 15]
gen_sequences = generated_outputs.sequences[:, input_ids.shape[-1]:]

# let's stack the logits generated at each step to a tensor and transform
# logits to probs
probs = torch.stack(generated_outputs.scores, dim=1).softmax(-1)  # -> shape [3, 15, vocab_size]

# now we need to collect the probability of the generated token
# we need to add a dummy dim in the end to make gather work
gen_probs = torch.gather(probs, 2, gen_sequences[:, :, None]).squeeze(-1)

# now we can do all kinds of things with the probs

# 1) the probs that exactly those sequences are generated again
# those are normally going to be very small
unique_prob_per_sequence = gen_probs.prod(-1)

# 2) normalize the probs over the three sequences
normed_gen_probs = gen_probs / gen_probs.sum(0)
assert normed_gen_probs[:, 0].sum() == 1.0, "probs should be normalized, rerun in case it's a floating point error"

# 3) compare normalized probs to each other like in 1)
unique_normed_prob_per_sequence = normed_gen_probs.prod(-1)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Time it took to generate tokens: 0.5343494415283203


In [64]:
all_log_probs = torch.stack(generated_outputs.scores, dim=1)
log_probs = torch.gather(all_log_probs, 2, gen_sequences[:, :, None]).squeeze(-1)
mean_log_probs = torch.mean(log_probs)

In [65]:
print(generated_outputs)
# print(log_probs)
print(tokenizer.decode(gen_sequences[0]))
print(len(generated_outputs.scores))
# print(probs)
# print(gen_probs)
# print(unique_prob_per_sequence)
# print(normed_gen_probs)
# print(unique_normed_prob_per_sequence)
# print(input_ids)

SampleDecoderOnlyOutput(sequences=tensor([[8888,  318,  257, 3621, 1110,  319, 4534,  329,  597,  922, 1664,  553,
          339,  531,   13,  366, 2949,  517, 7363,  526]]), scores=(tensor([[-88.5250, -90.9243,     -inf,  ...,     -inf,     -inf,     -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-82.2974, -84.3255,     -inf,  ...,     -inf,     -inf,     -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-90.1402, -91.4866,     -inf,  ...,     -inf,     -inf,     -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[   -inf,    -inf,    -inf,  ...,    -inf,    -inf, -3.4339]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]]), tensor([[-inf, -inf, -inf

In [None]:
np.prod([0.2158, 0.1008, 0.3531, 0.3138, 0.2799, 0.3295, 0.6937, 0.2309, 0.0479, 0.0648, 0.1682, 0.1356, 0.2393, 0.8083, 0.0352])

1.7162111343988786e-11

# Compressing the Model

Let's save the model as a `tar.gz` file so that we can save it in Google Drive.

In [None]:
!tar -czf gpt-2-elon-tweets.tar.gz gpt-2/tuned-models/