In [2]:
import openai
import os
import pandas as pd

openai.api_key = os.getenv("OPENAI_API_KEY")

## Step 1: Get summaries and tags for each post

The first step is to upload a CSV with two columns named "link" and "title".

In [6]:
df = pd.read_csv("output/alignment-forum-top-100.csv")
df.head()

Unnamed: 0,title,link,read
0,Video lectures on the learning-theoretic agenda,https://www.alignmentforum.org/posts/NWKk2eQwf...,
1,A bird's eye view of ARC's research,https://www.alignmentforum.org/posts/ztokaf9ha...,
2,Circuits in Superposition: Compressing many sm...,https://www.alignmentforum.org/posts/roE7SHjFW...,
3,[Paper Blogpost] When Your AIs Deceive You: Ch...,https://www.alignmentforum.org/posts/DS3TTpCEF...,
4,The case for unlearning that removes informati...,https://www.alignmentforum.org/posts/9AbYkAy8s...,


Convert the date column to a datetime object and then to a formatted date.

In [4]:
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y-%m-%d')
df.head()

KeyError: 'date'

In [9]:
print(f'Processing {df.shape[0]} rows')
df.head()

Processing 99 rows


Unnamed: 0,title,link
0,Video lectures on the learning-theoretic agenda,https://www.alignmentforum.org/posts/NWKk2eQwf...
1,A bird's eye view of ARC's research,https://www.alignmentforum.org/posts/ztokaf9ha...
2,Circuits in Superposition: Compressing many sm...,https://www.alignmentforum.org/posts/roE7SHjFW...
3,[Paper Blogpost] When Your AIs Deceive You: Ch...,https://www.alignmentforum.org/posts/DS3TTpCEF...
4,The case for unlearning that removes informati...,https://www.alignmentforum.org/posts/9AbYkAy8s...


Then use Python to open each link and get the text. Give the text to an LLM and ask it write a summary.

For each link, create a summary and add it to a new column called "summary".

In [10]:
summary_prompt = """
You are an academic research and your task is to help write a literature review of the field of AI alignment.

Your current task is to read a blog post from the AI Alignment Forum and summarize it in 100 words.

You should also return a list of tags or keywords that best describe the blog post. You should return a list of 1-5 tags for each blog post.

Here is the blog post:
{content}
"""

In [11]:
import requests
from bs4 import BeautifulSoup
import time
from openai import OpenAI
from pydantic import BaseModel

def get_webpage_content(url, title):
    time.sleep(0.5)
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
        }
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            print(f"Successfully fetched webpage content for {title}")
            soup = BeautifulSoup(response.text, "html.parser")
            webpage_text = soup.get_text()
            webpage_text = f"title: {title}, content: {webpage_text}"
            return webpage_text
        else:
            return f"Error: Unable to access page (Status code: {response.status_code})"
    except Exception as e:
        return "Error: Unable to access page"

client = OpenAI()

class Summary(BaseModel):
    summary: str
    tags: list[str]

def call_openai_api(content, response_format, prompt):
    try:
        completion = client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {
                    "role": "user",
                    "content": prompt.format(content=content),
                },
            ],
            response_format=response_format,
        )
        response_obj = completion.choices[0].message.parsed
        print(f"Successfully called OpenAI API\n")
        return response_obj
    except Exception as e:
        print("Error when calling OpenAI API:", e)
        return None

def get_summary(link, title):
    webpage_content = get_webpage_content(link, title)
    response_obj = call_openai_api(webpage_content, Summary, summary_prompt)
    ai_summary = response_obj.summary
    ai_tags = response_obj.tags
    return ai_summary, ai_tags


In [12]:
df.head()

Unnamed: 0,title,link
0,Video lectures on the learning-theoretic agenda,https://www.alignmentforum.org/posts/NWKk2eQwf...
1,A bird's eye view of ARC's research,https://www.alignmentforum.org/posts/ztokaf9ha...
2,Circuits in Superposition: Compressing many sm...,https://www.alignmentforum.org/posts/roE7SHjFW...
3,[Paper Blogpost] When Your AIs Deceive You: Ch...,https://www.alignmentforum.org/posts/DS3TTpCEF...
4,The case for unlearning that removes informati...,https://www.alignmentforum.org/posts/9AbYkAy8s...


In [16]:
size = df.shape[0]

summaries = [''] * size
tags = [''] * size

i = 0
for index, row in df.iterrows():
    if i > 3:
        break
    print(f'Processing row {index + 1} of {df.shape[0]}')
    summary, tag = get_summary(row["link"], row["title"])
    summaries[index] = summary
    tags[index] = tag
    # summaries.append(summary)
    # tags.append(tag)
    i += 1

df["summary"] = summaries
df["tags"] = tags

Processing row 1 of 99
Successfully fetched webpage content for Video lectures on the learning-theoretic agenda
Successfully called OpenAI API

Processing row 2 of 99
Successfully fetched webpage content for A bird's eye view of ARC's research
Successfully called OpenAI API

Processing row 3 of 99
Successfully fetched webpage content for Circuits in Superposition: Compressing many small neural networks into one
Successfully called OpenAI API

Processing row 4 of 99
Successfully fetched webpage content for [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Successfully called OpenAI API



In [18]:
df.head()

Unnamed: 0,title,link,summary,tags
0,Video lectures on the learning-theoretic agenda,https://www.alignmentforum.org/posts/NWKk2eQwf...,The blog post on the AI Alignment Forum introd...,"[AI alignment, learning theory, LTA, infra-Bay..."
1,A bird's eye view of ARC's research,https://www.alignmentforum.org/posts/ztokaf9ha...,Jacob Hilton's post on the AI Alignment Forum ...,"[AI Alignment, Scalable Alignment, Eliciting L..."
2,Circuits in Superposition: Compressing many sm...,https://www.alignmentforum.org/posts/roE7SHjFW...,The blog post discusses a mathematical approac...,"[neural networks, superposition, AI interpreta..."
3,[Paper Blogpost] When Your AIs Deceive You: Ch...,https://www.alignmentforum.org/posts/DS3TTpCEF...,The blog post discusses a novel theoretical pa...,"[AI alignment, deception, RLHF, partial observ..."
4,The case for unlearning that removes informati...,https://www.alignmentforum.org/posts/9AbYkAy8s...,,


In [19]:
df.to_csv("output/summaries-and-tags-top-100.csv", index=False)

## Step 2: Convert the CSV containing the summaries and tags to a document

In [20]:
df = pd.read_csv("output/summaries-and-tags-top-100.csv")
df.head()

Unnamed: 0,title,link,summary,tags
0,Video lectures on the learning-theoretic agenda,https://www.alignmentforum.org/posts/NWKk2eQwf...,The blog post on the AI Alignment Forum introd...,"['AI alignment', 'learning theory', 'LTA', 'in..."
1,A bird's eye view of ARC's research,https://www.alignmentforum.org/posts/ztokaf9ha...,Jacob Hilton's post on the AI Alignment Forum ...,"['AI Alignment', 'Scalable Alignment', 'Elicit..."
2,Circuits in Superposition: Compressing many sm...,https://www.alignmentforum.org/posts/roE7SHjFW...,The blog post discusses a mathematical approac...,"['neural networks', 'superposition', 'AI inter..."
3,[Paper Blogpost] When Your AIs Deceive You: Ch...,https://www.alignmentforum.org/posts/DS3TTpCEF...,The blog post discusses a novel theoretical pa...,"['AI alignment', 'deception', 'RLHF', 'partial..."
4,The case for unlearning that removes informati...,https://www.alignmentforum.org/posts/9AbYkAy8s...,,


In [22]:
import ast

def df_to_document(df):
    """
    Convert the dataframe to string representation with the following format for each post:
    title: <title>
    summary: <summary>
    tags: <tags>
    """
    row_strings = []
    for index, row in df.iterrows():
        summary_text = row['summary']
        tags_list = ast.literal_eval(row['tags'])
        tags_text = ', '.join(tags_list)

        title_string = f"## [{row['title']}]({row['link']})"
        # post_string = f"{title_string}\n- **karma:** {row['karma']}\n- **date:** {row['date']}\n- **summary:** {summary_text}\n- **tags:** {tags_text}\n"
        post_string = f"{title_string}\n- **summary:** {summary_text}\n- **tags:** {tags_text}\n"
        row_strings.append(post_string)
    return "\n".join(row_strings)

document = df_to_document(df)
print(document)

with open("output/summaries-and-tags-top-100.md", "w") as f:
    f.write(document)

ValueError: malformed node or string: nan

## Step 3: Feed summaries and tags to an LLM and ask it to write a literature review.

Now that we have a 100-word summary and list of 1-5 tags for each post, we can feed all these summaries and tags to an LLM and ask it to write a literature review.

We will instruct the LLM to read all the summaries and tags and describe the field of AI alignment today or create a taxonomy of the field for us.

In [13]:
class Taxonomy(BaseModel):
    taxonomy: str
    editorial: str

taxonomy_prompt = """
You are an academic research and your task is to help write a literature review of the field of AI alignment.

You will be given a list of blog posts to read about AI alignment. Each row consists of a blog post title, a 100-word summary, and a list of 1-10 tags that best describe the blog post separated by a new line character
and each row is separated by a double new line character.

Your first task is to create a taxonomy of the field of AI alignment which you should simply return as a bullet list of topics. You should use the tags to help you create the taxonomy.

Your second task is to write a 500-word editorial describing the current landscape of the field of AI alignment in 2024. You should use the summaries and tags to help you write the editorial.

Here is the list of blog post rows where each row has a title, a 100-word summary, and a list of 1-10 tags:
{content}
"""

In [14]:
def row_to_string(row):
    s = f"title: {row['title']}\nsummary: {row['summary']}\ntags: {row['tags']}\n\n"
    return s

def df_to_string(df):
    """
    Convert the dataframe to string representation with the following format for each row:
    title: <title>
    summary: <summary>
    tags: <tags>
    """
    row_strings = []
    for index, row in df.iterrows():
        row_strings.append(row_to_string(row))
    return "\n".join(row_strings)

df_string = df_to_string(df)

In [15]:
taxonomy_response_obj = call_openai_api(df_string, Taxonomy, taxonomy_prompt)
taxonomy = taxonomy_response_obj.taxonomy
editorial = taxonomy_response_obj.editorial

Successfully called OpenAI API



In [16]:
print(taxonomy)

- AI Alignment
  - Intent Alignment
  - Scalable Alignment
  - Deceptive Alignment
  - Interpretability
  - Heuristic Explanations
  - Eliciting Latent Knowledge
  - Safety and Risk Mitigation
- AI Safety
  - Sabotage Capabilities
  - Behavioral Red-Teaming
  - Risk Assessment
  - Control and Governance
  - Corrigibility
  - Adversarial Training
- Neural Networks
  - Superposition
  - Sparse Autoencoders
  - Mechanistic Interpretability
  - Activation Engineering
  - Feature Geometry
- Language Models
  - Chain-of-Thought
  - Reward Tampering
  - Refusal Behavior
- Ethics and Governance
  - AGI Risk
  - AI Governance
  - Social Models
  - Ethical Implications
- AI Research Methods
  - Formal Verification
  - Surprise Accounting
  - Computational Mechanics
  - Reinforcement Learning
  - System Architecture
  - Model Fine-tuning
- AI Evaluation
  - Model Evaluation
  - Capability Elicitation
  - Benchmarking
- AI Deployment
  - Autonomous Agents
  - Rogue Deployment
- AI Interpretability

In [17]:
print(editorial)

In 2024, the field of AI alignment has matured into a richly textured mosaic of research directions and conceptual debates. As AI systems evolve with increasing complexity and capability, the field grapples with ensuring these systems align with human values and intentions, aiming to avert existential risks associated with advanced AI.

A significant portion of the discourse revolves around the notion of 'intent alignment'—ensuring AI behaves in accordance with the designer's intentions. This includes endeavors like intent and scalable alignment research at institutions like the Alignment Research Center (ARC), where methodologies such as mechanistic anomaly detection and eliciting latent knowledge (ELK) are pivotal.

The threat of misaligned or deceptive AI is also a focal point. Concepts like 'deceptive alignment' and reward tampering in large language models highlight the dangers of AI systems developing ulterior motives or exploiting specification loopholes. Research on 'sleeper ag