<a href="https://colab.research.google.com/github/JayThibs/arxiv-alignment-paper-notifier/blob/main/notebooks/arxiv_alignment_dataset_augmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Augmenting the Number of Alignment Papers in the Dataset

This notebook fine-tunes a language model (`roberta-base`) to classify arXiv paper summaries as alignment papers or non-alignment papers.

## Installations

In [6]:
!pip install transformers wandb jsonlines arxiv -q


# Imports

In [2]:
# Import wandb
import wandb

# Login with your authentication key
wandb.login()

# setup wandb environment variables
%env WANDB_ENTITY=jacquesthibs
%env WANDB_PROJECT=accelerating-alignment

wandb: You can find your API key in your browser here: https://wandb.ai/authorize

wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

env: WANDB_ENTITY=jacquesthibs
env: WANDB_PROJECT=accelerating-alignment


In [3]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [4]:
%cd drive/MyDrive/

/content/drive/MyDrive


In [7]:
import json
import jsonlines
import os
import arxiv
import pandas as pd

In [8]:
from transformers import RobertaTokenizerFast
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Preparing Dataset

## Alignment Forum

In [226]:
af = {}
i = 0
with jsonlines.open('data/ai-alignment-dataset/uber-file.jsonl') as reader:
    for obj in reader:
        try:
            if obj['source'] == 'alignment forum':
                af[i] = obj
                i += 1
        except KeyError:
            pass

In [238]:
i = 0
top_af = {}
for k in af.keys():
    if int(af[k]['score']) > 20:
        top_af[i] = {}
        top_af[i]['text'] = af[k]['text']
        top_af[i]['alignment_text'] = 'pos'
        i += 1
        
print(i)

1123


In [239]:
len(top_af)

1123
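
The two cells above (filter by source, then by score) can be folded into a single pass over the file. The following is a minimal sketch rather than the notebook's actual code: it uses the standard-library `json` module in place of `jsonlines`, the helper name `select_top_posts` and the toy records are hypothetical, and it assumes each line is a JSON object that may lack a `source` or `score` field.

```python
import json

def select_top_posts(lines, source="alignment forum", min_score=20):
    """Return positive examples for posts from `source` scoring above `min_score`."""
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        if obj.get("source") != source:
            continue  # other sources, or records without a 'source' key
        try:
            score = int(obj["score"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric score
        if score > min_score:
            selected.append({"text": obj["text"], "alignment_text": "pos"})
    return selected

# Toy records standing in for uber-file.jsonl (hypothetical data).
toy_lines = [
    '{"source": "alignment forum", "score": "35", "text": "post A"}',
    '{"source": "alignment forum", "score": "5", "text": "post B"}',
    '{"source": "arxiv", "text": "paper C"}',
    '{"text": "no source"}',
]
top_posts = select_top_posts(toy_lines)
print(len(top_posts))
```

Passing an open file handle as `lines` would stream the real file line by line without holding every post in memory.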

In [241]:
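af_intro_texts = {}

# Keep only the opening of each post: truncate to the tokenizer's maximum
# length, then decode the token ids back to plain text.
for i in top_af.keys():
    af_intro_texts[i] = {}
    text = top_af[i]['text'].replace('\n', ' ')
    input_ids = tokenizer(text, truncation=True)['input_ids']
    # skip_special_tokens drops the <s>/</s> markers so they are not baked into the text
    text = tokenizer.decode(input_ids, skip_special_tokens=True)
    af_intro_texts[i]['text'] = text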

In [242]:
len(af_intro_texts)

1123

## Elicit

In [9]:
%cd data/ai-alignment-dataset/
!mkdir -p alignment_text_classifier
%cd alignment_text_classifier
!mkdir -p pos neg

/content/drive/MyDrive/data/ai-alignment-dataset
/content/drive/MyDrive/data/ai-alignment-dataset/alignment_text_classifier


In [244]:
df = pd.read_csv('elicit-results.csv')
df.head()

|  | Index | Starred | Title | Publication Year | Author | Url | Abstract Note | Manual Tags | Automatic Tags |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | True | Incorrigibility in the CIRL Framework | 2018.0 | Carey, Ryan | http://arxiv.org/abs/1709.06275 | A value learning system has incentives to foll... | MIRI; FHI; TechSafety | Computer Science - Artificial Intelligence; ai... |
| 1 | 2 | True | Artificial Intelligence Safety and Cybersecuri... | 2016.0 | Yampolskiy, Roman V.; Spellchecker, M. S. | http://arxiv.org/abs/1610.07997 | In this work, we present and analyze reported ... | Other-org; MetaSafety | Computer Science - Artificial Intelligence; Co... |
| 2 | 3 | True | AI Paradigms and AI Safety: Mapping Artefacts ... | 2020.0 | Hernandez-Orallo, Jose; Martınez-Plumed, Ferna... |  | AI safety often analyses a risk or safety issu... | CSER; TechSafety; CFI |  |
| 3 | 4 | True | Suffering-focused AI safety: Why "fail-safe'" ... | 2016.0 | Gloor, Lukas |  | AI-safety efforts focused on suffering reduction... | MetaSafety; CLR |  |
| 4 | 5 | True | AI Research Considerations for Human Existenti... | 2020.0 | Critch, Andrew; Krueger, David |  | Framed in positive terms, this report examines... | TechSafety; CHAI |  |


In [216]:
if os.path.exists('abstract_ds.json'):
    with open('abstract_ds.json') as f:
        abstract_ds = json.load(f)
else:
    abstract_ds = {}

In [245]:
import numpy as np

In [199]:
if not pd.isnull(df.loc[4]['Url']):
    print('yes')

In [277]:
j = 0
non_arxiv = {}
for i in range(len(df)):
    if not pd.isnull(df.loc[i]['Url']):
        if 'arxiv' in df.iloc[i]['Url']:
            pass
            # id = str(df.iloc[i]['Url'].split('/')[-1].split('v')[0])
            # print(id)
            # abstract_ds[id] = {}
            # abstract_ds[id]['alignment_text'] = 'pos'
            # abstract_ds[id]['text'] = "Title: " + df.iloc[i]['Title'] + "\n" + "Abstract: " + df.iloc[i]['Abstract Note'].replace("\n", " ")
        else:
            non_arxiv[str(j)] = {}
            non_arxiv[str(j)]['alignment_text'] = 'pos'
            non_arxiv[str(j)]['text'] = "Title: " + df.iloc[i]['Title'] + "\n" + "Abstract: " + df.iloc[i]['Abstract Note'].replace("\n", " ")
            j += 1


In [250]:
print(len(abstract_ds))
print(len(non_arxiv))

1059
31
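
The commented-out branch in the loop above derives an arXiv ID from the `Url` column by splitting on `/` and `v`. A slightly more defensive version is sketched below, assuming only new-style IDs (such as `1709.06275`) need to be handled; the helper name `arxiv_id_from_url` is hypothetical and not part of the notebook.

```python
import re

# New-style arXiv IDs: four digits, a dot, four or five digits, optional version.
_NEW_STYLE_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")

def arxiv_id_from_url(url):
    """Return the version-less arXiv ID from an abs/pdf URL, or None."""
    # Missing URLs arrive from pandas as NaN (a float), not as a string.
    if not isinstance(url, str) or "arxiv.org" not in url:
        return None
    tail = url.rstrip("/").split("/")[-1]
    if tail.endswith(".pdf"):
        tail = tail[:-4]
    match = _NEW_STYLE_ID.match(tail)
    return match.group(1) if match else None

print(arxiv_id_from_url("http://arxiv.org/abs/1709.06275"))
print(arxiv_id_from_url("http://arxiv.org/pdf/1905.01023v1"))
```

Returning `None` for missing or non-arXiv URLs lets a caller route those rows to `non_arxiv` with a single check.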


In [261]:
counter = 0
for k in abstract_ds.keys():
    if abstract_ds[k]['alignment_text'] == 'neg':
        counter += 1

print(counter)

827


## Arxiv

In [111]:
search = arxiv.Search(
    query="ai capabilities",
    max_results=300, #float('inf'),
    sort_by = arxiv.SortCriterion.Relevance
)

In [121]:
# To build the dataset quickly, generate a batch of examples (e.g. 50) and look through the results,
# then flag the ones that don't belong with the rest (see the next cell).
counter = 0
for result in search.results():
    counter += 1
    # entry_id ends in e.g. "1912.11595v1"; drop the version suffix
    paper_id = result.entry_id.split('/')[-1].split('v')[0]
    # primary_category is e.g. "cs.AI"; keep only the top-level archive
    category = result.primary_category.split('.')[0].split('-')[0]
    if category in ('math', 'stat', 'cs'):
        print(counter)
        print(category)
        authors = ', '.join(str(x) for x in result.authors)
        print('Title:', result.title, '\nAuthors:', authors, '\nDate:', result.published, '\nId:', paper_id,
              '\nSummary:', result.summary, '\nURL:', result.pdf_url, '\n\n')
        text = "Title: " + result.title + "\n" + "Abstract: " + result.summary.replace("\n", " ")
        abstract_ds[paper_id] = {'title': result.title, 'abstract': result.summary, 'text': text}

Streaming output truncated to the last 5000 lines.
URL: http://arxiv.org/pdf/1905.01023v1 


74
cs
Title: The Windfall Clause: Distributing the Benefits of AI for the Common Good 
Authors: Cullen O'Keefe, Peter Cihon, Ben Garfinkel, Carrick Flynn, Jade Leung, Allan Dafoe 
Date: 2019-12-25 05:30:40+00:00 
Id: 1912.11595 
Summary: As the transformative potential of AI has become increasingly salient as a
matter of public and political interest, there has been growing discussion
about the need to ensure that AI broadly benefits humanity. This in turn has
spurred debate on the social responsibilities of large technology companies to
serve the interests of society at large. In response, ethical principles and
codes of conduct have been proposed to meet the escalating demand for this
responsibility to be taken seriously. As yet, however, few institutional
innovations have been suggested to translate this responsibility into legal
commitments which apply to companies positioned 

In [113]:
# Enter the arXiv IDs (comma-separated) of the printed search results to set apart.
# Note: the next cell labels these IDs 'pos' and every other paper 'neg'.

removed_papers = [p.strip() for p in input("Remove these papers from generated list: ").split(',')]

Remove these papers from generated list: 2002.11174,2105.07879,1901.08579,2104.12582,2108.12427,1909.01095,1512.05849,1611.08219,2103.15294,1803.05049,1712.07199,2012.08630,1911.04266,1502.06512,1901.01851,1902.03689,2007.07710


In [124]:
for k in abstract_ds.keys():
    if k in removed_papers:
        abstract_ds[k]['alignment_text'] = 'pos'
    else:
        abstract_ds[k]['alignment_text'] = 'neg'
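
The labelling cell above assigns a label to every entry in `abstract_ds`, including entries that were labelled in an earlier round. The sketch below is a variant that labels only entries that do not have a label yet, assuming that is the desired behaviour when the search is repeated with new queries. The function name `label_search_results` and the toy entries are hypothetical.

```python
def label_search_results(dataset, positive_ids):
    """Label unlabelled entries in place: 'pos' if listed, otherwise 'neg'."""
    # Strip whitespace so "a, b" and "a,b" are treated the same.
    positive_ids = {pid.strip() for pid in positive_ids if pid.strip()}
    for paper_id, entry in dataset.items():
        # setdefault leaves labels from earlier rounds untouched.
        entry.setdefault("alignment_text", "pos" if paper_id in positive_ids else "neg")
    return dataset

# Toy entries (hypothetical); the last one was labelled in an earlier round.
toy_ds = {
    "2002.11174": {"text": "..."},
    "1234.56789": {"text": "..."},
    "1709.06275": {"text": "...", "alignment_text": "pos"},
}
label_search_results(toy_ds, "2002.11174, 2105.07879".split(","))
print({k: v["alignment_text"] for k, v in toy_ds.items()})
```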

In [125]:
print(len(abstract_ds))

# Drop any entry that never received a label.
for k in list(abstract_ds.keys()):
    if 'alignment_text' not in abstract_ds[k]:
        abstract_ds.pop(k)

print(len(abstract_ds))

1001
1001


In [126]:
# Remove the unused 'date_published' field where present.
for k in abstract_ds.keys():
    abstract_ds[k].pop("date_published", None)

In [211]:
with open('abstract_ds.json', 'w') as f:
    json.dump(abstract_ds, f)

## Add AF and Curated Papers

In [255]:
with open('arxiv_dict.json') as f:
    arxiv_dict = json.load(f)

In [258]:
arxiv_summaries = {}

for k in arxiv_dict.keys():
    if arxiv_dict[k]['citation_level'] == '0':
        abstract_ds[k] = {}
        abstract_ds[k]['alignment_text'] = 'pos'
        abstract_ds[k]['text'] = "Title: " + arxiv_dict[k]['post_title'] + "\n" + "Abstract: " + arxiv_dict[k]['abstract']



In [259]:
len(abstract_ds)

1662

In [281]:
print(len(af_intro_texts) + len(non_arxiv) + len(abstract_ds))

2815


In [288]:
final_dataset = {}
i = 0

for k in abstract_ds.keys():
    final_dataset[i] = {}
    final_dataset[i]['text'] = abstract_ds[k]['text']
    final_dataset[i]['alignment_text'] = abstract_ds[k]['alignment_text']
    i += 1

for j in range(len(af_intro_texts)):
    final_dataset[i] = {}
    final_dataset[i]['text'] = af_intro_texts[j]['text']
    final_dataset[i]['alignment_text'] = 'pos'
    i += 1

for j in range(len(non_arxiv)):
    final_dataset[i] = {}
    final_dataset[i]['text'] = non_arxiv[str(j)]['text']
    final_dataset[i]['alignment_text'] = 'pos'
    i += 1

In [289]:
final_dataset[0]

{'alignment_text': 'neg',
 'text': 'Title: Transformation between dense and sparse spirals in symmetrical bistable media\nAbstract: Transformation between dense and sparse spirals is studied numerically based on a bistable FitzHugh-Nagumo model. It is found that the dense spiral can transform into two types of sparse spirals via a subcritical bifurcation: Positive Phase Sparse Spiral (PPSS) and Negative Phase Sparse Spiral (NPSS). The choice of the two types of sparse spirals after the transformation is affected remarkably by the boundary effect if a small domain size is applied. Moreover, the boundary effect gives rise to novel meandering of sparse spiral with only outward petals.'}

In [307]:
with open('abstract_ds.json', 'w') as f:
    json.dump(abstract_ds, f)

with open('final_dataset.json', 'w') as f:
    json.dump(final_dataset, f)

# Create Dataset

In [293]:
# !rm -rf pos neg
# !mkdir pos neg

In [295]:
for k in final_dataset.keys():
    if final_dataset[k]['alignment_text'] == 'pos':
        with open(f"pos/{k}.txt", 'w') as f:
            f.write(final_dataset[k]['text'])
    else:
        with open(f"neg/{k}.txt", 'w') as f:
            f.write(final_dataset[k]['text'])
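
The cell above relies on the `pos` and `neg` directories already existing and on every label being one of those two values. A minimal sketch of the same one-file-per-example layout that creates the directories itself and rejects unexpected labels is given below. The helper name `write_label_dirs` and the toy examples are hypothetical, and a temporary directory stands in for the Drive folder.

```python
import tempfile
from pathlib import Path

def write_label_dirs(dataset, root):
    """Write each example to <root>/<label>/<key>.txt and return per-label counts."""
    root = Path(root)
    counts = {"pos": 0, "neg": 0}
    for key, example in dataset.items():
        label = example["alignment_text"]
        if label not in counts:
            raise ValueError(f"unexpected label {label!r} for example {key}")
        label_dir = root / label
        label_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        (label_dir / f"{key}.txt").write_text(example["text"], encoding="utf-8")
        counts[label] += 1
    return counts

# Toy examples in the same shape as final_dataset (hypothetical data).
toy_dataset = {
    0: {"text": "Title: A\nAbstract: a", "alignment_text": "neg"},
    1: {"text": "Title: B\nAbstract: b", "alignment_text": "pos"},
    2: {"text": "Title: C\nAbstract: c", "alignment_text": "pos"},
}
with tempfile.TemporaryDirectory() as tmp:
    counts = write_label_dirs(toy_dataset, tmp)
    n_pos_files = len(list((Path(tmp) / "pos").iterdir()))
print(counts, n_pos_files)
```

Returning the counts gives a quick check against the number of files read back during training.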

# Training Classifier

In [10]:
from pathlib import Path

def read_data_split(split_dir):
    split_dir = Path(split_dir)
    texts = []
    labels = []
    for label_dir in ["pos", "neg"]:
        for text_file in (split_dir/label_dir).iterdir():
            texts.append(text_file.read_text())
            labels.append(0 if label_dir == "neg" else 1)

    return texts, labels

train_texts, train_labels = read_data_split('')

In [11]:
len([train_label for train_label in train_labels if train_label == 0])

829

In [12]:
#split the dataset
from sklearn.model_selection import train_test_split

train_texts, test_texts, train_labels, test_labels = train_test_split(train_texts, train_labels, test_size=.1)
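
With 829 negatives in the data, the two classes are not balanced, so a purely random 10% split can end up with a test set whose class mix differs from the training set. scikit-learn's `train_test_split` accepts `stratify=` and `random_state=` for this purpose. The sketch below shows the underlying idea in plain Python so it can be run without scikit-learn; `stratified_split` is a hypothetical helper, not the notebook's method, and the toy data is made up.

```python
import random
from collections import defaultdict

def stratified_split(texts, labels, test_size=0.1, seed=0):
    """Split so that each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_size))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)

    def pick(seq, idxs):
        return [seq[i] for i in idxs]

    return (pick(texts, train_idx), pick(texts, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Toy data: 30 negatives (0) and 70 positives (1).
toy_labels = [0] * 30 + [1] * 70
toy_texts = [f"doc {i}" for i in range(100)]
tr_x, te_x, tr_y, te_y = stratified_split(toy_texts, toy_labels)
print(len(tr_y), len(te_y), te_y.count(0), te_y.count(1))
```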

In [13]:
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

In [14]:
import torch

class AlignmentPaperDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = AlignmentPaperDataset(train_encodings, train_labels)
test_dataset = AlignmentPaperDataset(test_encodings, test_labels)

In [15]:
from transformers import RobertaForSequenceClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

model = RobertaForSequenceClassification.from_pretrained("roberta-base")

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
)

trainer.train()

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'roberta.pooler.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.

wandb: Currently logged in as: jacquesthibs. Use `wandb login --relogin` to force relogin


| Step | Training Loss |
|---|---|
| 10 | 0.6819 |
| 20 | 0.6781 |
| 30 | 0.6736 |
| 40 | 0.6537 |
| 50 | 0.5999 |
| 60 | 0.5674 |
| 70 | 0.4209 |
| 80 | 0.4706 |
| 90 | 0.4588 |
| 100 | 0.4506 |

Training completed. Do not forget to share your model on huggingface.co/models =)

TrainOutput(global_step=477, training_loss=0.3554276872730855, metrics={'train_runtime': 406.3819, 'train_samples_per_second': 18.699, 'train_steps_per_second': 1.174, 'total_flos': 1999380909680640.0, 'train_loss': 0.3554276872730855, 'epoch': 3.0})
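
The run finished after 477 optimizer steps, which is fewer than the `warmup_steps=500` set in `TrainingArguments`, so the learning rate was still ramping up when training ended. The arithmetic is sketched below under stated assumptions: a single device with no gradient accumulation, the `Trainer` default of a linear warmup schedule, and roughly 2,533 training examples (inferred from `train_samples_per_second` × `train_runtime` ÷ epochs, not stated in the notebook). Both function names are hypothetical, and the decay phase is omitted because the run never reaches it.

```python
import math

def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer steps for a single device without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs

def linear_warmup_factor(step, warmup_steps):
    """Fraction of the peak learning rate during linear warmup (capped at 1)."""
    return min(1.0, step / max(1, warmup_steps))

# 2533 is an inferred figure (see the assumptions above), not a logged value.
total_steps = total_optimizer_steps(num_examples=2533, batch_size=16, epochs=3)
final_factor = linear_warmup_factor(total_steps, warmup_steps=500)
print(total_steps, final_factor)
```

Under these assumptions the schedule only climbs to about 95% of the peak rate by the last step. Setting the warmup to a small fraction of the total steps is one way to give the model time at the peak rate.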