# [3.4] LM Agent Evaluations

# Setup (don't read just run)

In [1]:
import json
import os
os.chdir("c:\\Users\\styme\\OneDrive\\Documents\\AI STUFF\\Model Written Evals\\Code Replication\\ARENA_evals\\curriculum")
import wikipedia
from wikipedia import WikipediaPage
from openai import OpenAI
from anthropic import Anthropic
from utils import establish_client_OpenAI
from utils import retry_with_exponential_backoff
from pprint import pprint
from inspect_ai.model import ChatMessageUser, ChatMessageAssistant, ChatMessageSystem
import re
from utils import countrylist

In [6]:
for i in countrylist:
    x = wikipedia.page(i[1],auto_suggest=False,redirect=True)
    y=x.content

In [4]:
print(x.content)

Zimbabwe ( ; Shona pronunciation: [zi.ᵐba.ɓwe]), officially the Republic of Zimbabwe, is a landlocked country in Southeast Africa, between the Zambezi and Limpopo Rivers, bordered by South Africa to the south, Botswana to the southwest, Zambia to the north, and Mozambique to the east. The capital and largest city is Harare, and the second largest is Bulawayo.
A country of roughly 15 million people as per 2022 census, Zimbabwe's largest ethnic group are the Shona, who make up 80% of the population, followed by the Northern Ndebele and other smaller minorities. Zimbabwe has 16 official languages, with English, Shona, and Ndebele the most common. Zimbabwe is a member of the United Nations, the Southern African Development Community, the African Union, and the Common Market for Eastern and Southern Africa.
Beginning in the 9th century, during its late Iron Age, the Bantu people (who would become the ethnic Shona) built the city-state of Great Zimbabwe; the city-state became one of the majo

# 1️⃣ Intro to LM Agents

## Resources:

- [OpenAI Function Calling Guide](https://platform.openai.com/docs/guides/function-calling)

- [Anthropic Function Calling Guide](https://docs.anthropic.com/en/docs/build-with-claude/tool-use)

- [Evaluating Language-Model Agents on Realistic Autonomous Tasks](https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf) (Kinniment et al., ARC Evaluations Team (now METR), 2023)

- [Large Language Models can Strategically Deceive their Users when Put Under Pressure](https://arxiv.org/pdf/2311.07590) (Scheurer et al., Apollo Research, ICLR 2024)

- [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) (Lilian Weng, OpenAI Safety Team, 2023)

- [AXRP Episode 34 - AI Evaluations with Beth Barnes](https://www.alignmentforum.org/posts/vACr4DExfeRMaCoo7/axrp-episode-34-ai-evaluations-with-beth-barnes) (Daniel Filan, 2024)

- [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/pdf/2303.11366) (Shinn et al., 2023)

- [Answering Questions by Meta-Reasoning over Multiple Chains of Thought](https://arxiv.org/pdf/2304.13007) (Yoran et al., 2024)

- [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/pdf/2302.04761) (Schick et al., META AI Research, 2023)


LM agents are important and we should evaluate them.

They might be able to do more things.

Lots of threat models go through agentic behaviour.

Bad actors might scaffold agents and do bad things.

An agent consists of 4 main things.

- A 'reasoner' or 'reasoning agent.' Some people also call this a 'world model.' For our purposes this will be a large language model.

- Tools which allow the agent to act in the 'world.'

- Memory so that the agent can recall prior actions. This can either be:

    - Short-term memory: for our purposes this will be the context window

    - Long-term memory: there are many cases where context-windows are too short, and we will need to give the agent high-level information about actions it took a long time ago. We analogise this to 'long-term memory'.

- Scaffolding: This is essentially any structure which we provide to the 'reasoning engine' in order to help it to reason better, such as:

    - Prompting frameworks.

    - The ability to model plans into the future (at a high-level).

    - Using subagents to take care of trivial tasks.

    - Subgoal decomposition.

EXCALIDRAW!

# 2️⃣ A Basic LM agent

First, we will start by building a simple LM agent to solve the following simple calculation task.

### Exercise - Build a simple arithmetic problem
```c
Difficulty: 🔴🔴🔴⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 20-25 minutes on this exercise.
```

Build a class for a "game" that takes in two numbers, and creates a task of `len(operation_list)` many problems of the form "Calculate `num1 operation_list[i] num2`".


In [36]:
class arithmeticProblem:
    def __init__(self, num1, num2):
        self.num1 = num1
        self.num2 = num2
        self.operation_list=["+","-","*","/", "%", "//"]
        self.answer_dict = {str(num1) + self.operation_list[i] + str(num2): str(eval(str(num1) + self.operation_list[i] + str(num2))) for i in range(len(self.operation_list))}
        self.solved_dict = {str(num1) + self.operation_list[i] + str(num2): False for i in range(len(self.operation_list))}
        self.count_task = 0
        
    def getCurrentTask(self):
        #Returns the current arithmetic task
        return str(self.num1) + " " + self.operation_list[self.count_task] + " " + str(self.num2)
    
    def checkSolved(self):
        #Checks if all tasks are solved
        return all(self.solved_dict.values())
    
    def checkAnswer(self, answer : str):
        #Checks if the answer is correct. 
        # 
        # If you're testing the model without access to tools, you'll need to include some "wiggle room," as its answer may not be exact.
        
        if abs(float(self.answer_dict[str(self.num1) + self.operation_list[self.count_task] + str(self.num2)]) - float(answer)) <= 0.00001:
            return True
        else:
            return False
    def update(self):
        #Changes task once the current task is solved
        self.solved_dict[str(self.num1) + self.operation_list[self.count_task] + str(self.num2)] = True
        self.count_task+=1
        if self.count_task>len(self.operation_list)-1:
            self.count_task = 0
        
    def calculate(self, expression : str):
        #Use Python's "eval()" function. This is generally bad practice, but alright for this task. Just make sure to clean the expression using regex first (to make sure gpt-3.5 doesn't try to run arbitrary code).
        expression = re.sub(r'[^0-9+\-*/().]','',expression)
        print(expression)
        return str(eval(expression))
        
    
x = arithmeticProblem(10,15)
print(x.answer_dict)

{'10+15': '25', '10-15': '-5', '10*15': '150', '10/15': '0.6666666666666666', '10%15': '10', '10//15': '0'}


### Exercise - Define and implement a tool or list of tools for this task
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 10-15 minutes on this exercise.
```

When you do this exercise, make sure you refer back to [OpenAI's function calling guide](https://platform.openai.com/docs/guides/function-calling). Also, Anthropic has a good example of what good and bad tool calling looks like [here](https://docs.anthropic.com/en/docs/build-with-claude/tool-use#example-poor-tool-description) which you might want to refer to. The main takeaway, as always with LLMs, is to **provide explicit descriptions**.

In [37]:
tool_list = [
    {
        "type" : "function",
        "function" : {
            "name" : "calculate",
            "description" : "Calculates the result of an arithmetic expression. For example, you could provide an input in the form \"2+3\" and the function would return 5. Or you could provide an expression like \"10/3\" and the function would return 3.3333333333333335.",
            "parameters" : {
                "type" : "object",
                "properties" : {
                    "expression" : {
                        "type" : "string",
                        "description" : "The arithmetic expression that you want to be evaluated."
                    }
                },
                "required" : ["expression"],
                "additionalProperties" : False
            }
        }
    },
]

Here are two functions that will be useful. If you're unsure why they're written in this way, then you should refer back to OpenAI's API documentation.

In [38]:
def tool_call_message(tool_call, content):
    return {
        "role" : "tool",
        "tool_call_id" : tool_call.id,
        "name" : tool_call.function.name,
        "content" : content
    }

def user_message(content):
    return {
        "role" : "user",
        "content" : content
    }

### Exercise - Implement an LM agent class using the arithmetic task above
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 25-30 minutes on this exercise.
```

Build a simple agent class

In [39]:
class simpleAgent:
    def __init__(self, task, model = "gpt-4o-mini", tools = tool_list):
        self.task = task 
        self.model = model
        self.tools = tools
        self.client = establish_client_OpenAI()


        self.user_message = "Calculate the result of the following expression: " + self.task.getCurrentTask() + "."
        self.answer_message = "Now provide your answer in the format Answer: NUMBER where NUMBER Is the answer to the question. Only output in this format."
        self.incorrect_message = "Incorrect."
        self.correct_message = "Correct."

        self.messages=[user_message(self.user_message)]

    @retry_with_exponential_backoff
    def get_response_with_tools(self):
        # Generate a response where the model can use tools and add the response to the messages
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            tools = self.tools,
            tool_choice = "auto"
        )
        self.messages.append(response.choices[0].message)
        return response.choices[0].message

    def get_response_no_tools(self):
        # Generate a response where the model cannot use tools and add the response to the messages
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        self.messages.append(response.choices[0].message)
        return response.choices[0].message
        
    def do_tool_calls(self, message):
        # Do the tool calls and return the response and add the tool calls to the messages
        tool_calls = message.tool_calls
        if tool_calls:
            for tool_call in tool_calls:
                if tool_call.function.name == "calculate":
                    func = getattr(self.task, tool_call.function.name)
                    arguments = json.loads(tool_call.function.arguments)
                    tool_response = func(**arguments)
                    self.messages.append(tool_call_message(tool_call, str(tool_response)))
            return tool_response
        
    def output_and_check_answer(self):
        '''
        Get the model to output a final answer and then check if it is correct, and update the task. 
        Add the final answer to the messages, and tell the model if the answer is correct or incorrect.
        '''
        self.messages.append(user_message(self.answer_message))
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        self.messages.append(response.choices[0].message)
        if self.task.checkAnswer(str(response.choices[0].message.content)[8:]):
            self.messages.append(user_message(self.correct_message))
            self.task.update()
            self.user_message = "Calculate the result of the following expression: " + self.task.getCurrentTask() + "."
            self.messages.append(user_message(self.user_message))
        else:
            self.messages.append(user_message(self.incorrect_message))
        return response.choices[0].message
        
                    



    

### Exercise - Define an agent_loop which uses the agent class to solve the task
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 10-15 minutes on this exercise.
```

An agent is essentially just a loop, where you can look at what you've done in the past. Try implementing the agent_loop with and without tools, to compare how much better the model does when we give it tools.

In [40]:
task = arithmeticProblem(1500,1091)
agent = simpleAgent(task)

def agent_loop(numLoops = 10):
    for i in range(numLoops):
        if task.checkSolved():
            print("All tasks solved.")
        else:
            response = agent.get_response_with_tools()
            print(response.content)
            tool_response = agent.do_tool_calls(response)
            print(tool_response)
            response = agent.output_and_check_answer()
            print(response.content)

agent_loop()


None
1500+1091
2591
Answer: 2591
None
1500-1091
409
Answer: 409
None
1500*1091
1636500
Answer: 1636500
None
1500/1091
1.374885426214482
Answer: 1.374885426214482
None
15001091
15001091
Answer: 409
None
1500//1091
1
Answer: 1
All tasks solved.
All tasks solved.
All tasks solved.
All tasks solved.


In [41]:
for i in agent.messages:
    try: print(str(i.content)[0:])
    except: print(i["content"])
'''
print(task.operation_list[task.count_task])
print(task.solved_dict)
#print(str(agent.messages[5].content)[8:])
print(type(task.solved_dict.values()))
'''

Calculate the result of the following expression: 1500 + 1091.
None
2591
Now provide your answer in the format Answer: NUMBER where NUMBER Is the answer to the question. Only output in this format.
Answer: 2591
Correct.
Calculate the result of the following expression: 1500 - 1091.
None
409
Now provide your answer in the format Answer: NUMBER where NUMBER Is the answer to the question. Only output in this format.
Answer: 409
Correct.
Calculate the result of the following expression: 1500 * 1091.
None
1636500
Now provide your answer in the format Answer: NUMBER where NUMBER Is the answer to the question. Only output in this format.
Answer: 1636500
Correct.
Calculate the result of the following expression: 1500 / 1091.
None
1.374885426214482
Now provide your answer in the format Answer: NUMBER where NUMBER Is the answer to the question. Only output in this format.
Answer: 1.374885426214482
Correct.
Calculate the result of the following expression: 1500 % 1091.
None
15001091
Now provide y

'\nprint(task.operation_list[task.count_task])\nprint(task.solved_dict)\n#print(str(agent.messages[5].content)[8:])\nprint(type(task.solved_dict.values()))\n'

# 3️⃣ Building a More Complex Task

Now that we know how to do function calling, and have a rough understanding of what goes into designing an LM agent, we're going to build a more complicated task. This will enable us to see if we can elicit better capabilities from models.

The task we'll build and elicit behavior for will be the Wikipedia game. 

First of all, build a simple function to get a wikipedia page. Probably also mess around with wikipedia api

In [42]:
def get_page(title):
    return wikipedia.page(title,auto_suggest=False,redirect=True)

### Exercise - Understand the Wikipedia API
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪
```

In [43]:
import wikipedia
from wikipedia import WikipediaPage
wikipedia_page = wikipedia.page("GPT-4",auto_suggest=False,redirect=True) # Experiment with different values for auto_suggest and redirect when using the agent. See what happens
#print(wikipedia_page.content)
#print(wikipedia_page.title)
#print(wikipedia_page.html())
#print(wikipedia_page.links)

### Exercise - Get a list of only accessible links from a wikipedia page
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪
```

In [44]:
def get_permitted_links(current_page : WikipediaPage) -> list[str]:
    #Only certain links will be accessible, since the wikipedia api returns a list of ALL possible links (lots of which would not be included in the actual content of the wikipedia, and almost all of which would be banned according to most rules of wiki racing).
    all_links = current_page.links
    content = current_page.content
    permitted_links = []
    for i in all_links:
        if i in content:
            permitted_links.append(i)
    return permitted_links

x = wikipedia.page("Obama", auto_suggest=False, redirect=True)
print(get_permitted_links(x))
print(len(get_permitted_links(x)))
print(len(x.content))

['2003 invasion of Iraq', '2004 Democratic National Convention', '2009 Nobel Peace Prize', '2014 Winter Olympics', '2016 G20 Hangzhou summit', '2020 Democratic presidential primaries', '82nd Airborne Division', 'A Promised Land', 'Abbottabad', 'Abraham Lincoln', 'Academy Award for Best Documentary Feature', 'Affordable Care Act', 'Afghanistan', 'African Americans', 'Air Force One', 'Al Arabiya', 'Alan Keyes', 'American Civil War', 'American Factory', 'American Recovery and Reinvestment Act of 2009', 'Ancestry.com', 'Ann Dunham', 'Anthony Albanese', 'Antiquities Act', 'Anwar al-Awlaki', 'Arab League', 'Arab Spring', 'Arab–Israeli conflict', 'Ares I', 'Ares V', 'Bachelor of Arts', 'Barack', 'Barack Obama 2008 presidential campaign', 'Barack Obama 2012 presidential campaign', 'Barack Obama Presidential Center', 'Barack Obama Sr.', 'Barack and Michelle', 'Ben Bernanke', 'Benjamin Netanyahu', 'Bernie Mac', 'Bill Clinton', 'Biographical Directory of the United States Congress', 'Black Lives 

### Exercise - Wrap links in the content of the wikipedia page

In [45]:
def get_content(page : WikipediaPage) -> str:
    content = page.content
    permitted_links = get_permitted_links(page)
    for word in sorted(permitted_links, key=len, reverse = True):
        content = re.sub(" " + word + " ", " " + f"<link>{word}</link>" + " ", content, flags = re.I)
        content = re.sub(" " + word + ",", " " + f"<link>{word}</link>" + ",", content, flags=re.I)
        content = re.sub(" " + word + ".", " " + f"<link>{word}</link>" + ".", content, flags=re.I)
        content = re.sub("\(" + word + "\)", "(" + f"<link>{word}</link>" + ")", content, flags = re.I)
        content = re.sub(" " + word + "s", " " + f"<link>{word}</link>" + "s", content, flags = re.I)
    return content


# Exercise - Build a class for the Wikipedia game
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 15-20 mins on this exercise.
```

Implement a class that instantiates the wikipedia game. When the model uses tools, it will make calls to this class, so make sure that the functions return things you're happy for the model to see as a tool response, such as an error message if the tool doesn't work.

In [46]:
class WikipediaGame:
    def __init__(self, starting_page : str, goal_page : str, rules : list | type[None] = None):
        '''
        Initialises the wikipedia game object

        starting_page is the page the agent starts on.

        goal_page is the page the agent is trying to get to.

        rules is a list of dictionaries specifying any additional rules along with a description of that rule to be fed to the agent.

        Rules that are handled currently:
            - "no country" bans country articles.
            - "no backtrack" bans backtracking.
        '''
        self.page_title_history=[starting_page]
        self.starting_page = get_page(starting_page) # page the game starts on
        self.goal_page = get_page(goal_page) # page the game ends on
        self.rules = rules # any additional rules, (no countries, no articles above a certain length, no backtracking etc. Need to add functionality to deal with these which I haven't done yet)
        self.current_page = get_page(starting_page) # current page the game state is on.

    def get_page_summary(self, page : wikipedia.WikipediaPage):
        '''
        Get summary of a wikipedia page, to the last full stop within the first 500 characters.
        '''
        summary = page.content[0:500]
        return summary[0: summary.rindex(".")+1]

    def move_page(self, **args):
        '''
        Changes the current page of the game. To be used when we want to move from Page A to Page B
        '''
        new_page = args["new_page"]
        if self.is_permitted_link(new_page):
            self.current_page = get_page(new_page)
            self.page_title_history.append(self.current_page.title)
            return "Moving page to " + self.current_page.title
        elif self.is_permitted_link(new_page.replace("_"," ")):
            self.current_page = get_page(new_page.replace("_"," "))
            self.page_title_history.append(self.current_page.title)
            return "Moving page to " + self.current_page.title
        else:
            return "Couldn't move page to " + new_page

    def get_plain_content(self):
        return self.current_page.content

    def get_content(self,**args):
        '''
        Gives the content of the wikipedia page. Make sure that accessible links are wrapped with <link> </link> tags.
        '''
        content = self.current_page.content
        for word in sorted(self.get_permitted_links(), key=len, reverse = True):
            content = re.sub(" " + word + " ", " " + f"<link>{word}</link>" + " ",content, flags = re.I)
            content = re.sub(" " + word + ",", " " + f"<link>{word}</link>" + ",", content, flags=re.I)
            content = re.sub(" " + word + ".", " " + f"<link>{word}</link>" + ".", content, flags=re.I)
            content = re.sub("\(" + word + "\)", "(" + f"<link>{word}</link>" + ")", content, flags = re.I)
            content = re.sub(" " + word + "s", " " + f"<link>{word}</link>" + "s", content, flags = re.I)
        return content

    def get_permitted_links(self):
        #Only certain links will be accessible, since the wikipedia api returns a list of ALL possible links (lots of which would not be included in the actual content of the wikipedia, and almost all of which would be banned according to most rules of wiki racing).
        all_links = self.current_page.links
        content = self.current_page.content
        permitted_links = []
        for i in all_links:
            if i in content:
                permitted_links.append(i)
        return permitted_links

    def is_permitted_link(self, link):
        if link in self.get_permitted_links():
            return True
        else:
            return False

    def check_win(self):
        if self.current_page==self.goal_page:
            return True
        else:
            return False

# Exercise - Write a list of tools for this game.
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 10-15 mins on this exercise.
```

Fill in the tools that the agent will need to use to accomplish this game. You should name the tools according to the names in the WikipediaGame class above, so that when we handle the model's tool calling we can access the correct function more easily.

In [47]:
WikipediaGameTools = [
    {
        "type" : "function",
        "function" : {
            "name" : "get_content",
            "description" : "Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link you can select will be wrapped in <link></link> tags.",
            "parameters" : {
                "type" : "object",
                "properties" : {},
                "required" : []
            }
        }
    },
    {
        "type" : "function",
        "function" : {
            "name" : "move_page",
            "description" : "Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "new_page": {
                        "type": "string",
                        "description": "The title of the new page you want to move to. This should be formatted the way the title appears on wikipedia (e.g. to move to the page for the United States of America, you should enter \"United States\"). Underscores are not necessary.",
                    },
                },
                "required": ["new_page"]
            }
        }
    }

]

# Exercise - Build an agent class for the wikipedia racer
```c
🔴🔴🔴🔴⚪
🔵🔵🔵🔵⚪

You should spend up to 30-40 mins on this exercise.
```

Now that you have the `WikipediaGame` class and the tooling set up, build out a `WikipediaRacingAgent` that can access these tools and solve the wikipedia game. Build the agent so that it can be thrown into an agent loop (similar to the one we had for the arithmetic game) without much additional scaffolding. 

There are a few further considerations in this case that we didn't have for the arithmetic game. 

- Since the agent will need to read (potentially very long) wikipedia articles to interact with the game, the context window length becomes relevant. GPT-4o and GPT-4o-mini both have context windows of 128k tokens (which corresponds to ~96k words, but for reference, the wikipedia page for the United States has around 10k words alone and the agent will often need to visit more than 10 articles in one run of the game, not counting its own output, which eventually adds up to be significant). The way we'll solve this for now is by resetting the messages of the agent every time it reaches a new wikipedia page, and providing an updated user and system message so that the agent can locate itself, and then proceed (We'll address different methods for solving this issue later, you can probably already think of some). So be careful to include the current page and goal page for the agent.

- There shouldn't be anything the agent is completely unfamiliar with (AI companies *will* have scraped wikipedia), but it may be easily confused with something else, and models can't always accurately use things in their training data if they only come up once or twice. So you should use the game's get_summary function to provide details of the goal page to the agent in its initial message.




In [75]:
class WikipediaRacingAgent:
    def __init__(self, wikigame, model = "gpt-4o-mini", tools = WikipediaGameTools):
        # Initialises the agent
        self.game=wikigame
        self.model = model
        self.client = OpenAI()

        self.messages = [self.system_message, self.user_message] # Messages that are currently in the chat history.
        self.all_messages = [self.system_message, self.user_message] # All messages that have been sent in the chat history. We have to erase each time a new page is reached for context window reasons.
        self.tools = tools
        self.response=None

        #print starting messages to watch what the agent is doing
        print("\nSYSTEM: " + self.system_message["content"] + "\n\nUSER: " + self.user_message["content"])

    @property
    def system_message(self):
        return {
            "role" : "system", 
            "content" : "You are a wikipedia-racing AI. Your goal is to reach " + self.game.goal_page.title + " by accessing links from a series of wikipedia pages. Your current page is " + self.game.current_page.title + "."
            }
            
    @property
    def user_message(self):
        return {
            "role" : "user",
            "content" : "You are currently on page: " + self.game.current_page.title + ". Make sure you start by reasoning about what steps you should take to get to the article on " + self.game.goal_page.title + ". When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, " + self.game.goal_page.title + " has the following summary: \n\n[Begin Summary]\n" + self.game.get_page_summary(self.game.goal_page) + "\n[End Summary] " 
            }
    
    @retry_with_exponential_backoff
    def generate_response_with_tools(self):
        # Generate a response with tools and add it to the messages.
        
        new_response = self.client.chat.completions.create(
            model = self.model,
            messages = self.messages,
            tools = self.tools,
            tool_choice = "auto"
        )

        self.response = new_response.choices[0].message
        self.messages.append(self.response)
        self.all_messages.append(self.response)

        #Print message to watch what the agent is doing
        print("\n" + str(self.response.content))

        return self.response

    def do_tool_calls(self, output):
        # Handle the tool calls and return the response and add the tool calls to the messages
        tool_calls = output.tool_calls
        move_page_count = 0
        if tool_calls:
            print(tool_calls)
            for tool_call in tool_calls:
                func = getattr(self.game, tool_call.function.name)
                arguments = json.loads(tool_call.function.arguments)
                new_response = func(**arguments)
                self.messages.append(tool_call_message(tool_call,new_response))
                self.all_messages.append(tool_call_message(tool_call,new_response))
                
                #Print message to watch what the agent is doing
                print("\nTOOL CALL" + tool_call.function.name, arguments)
                print("\nTOOL RESPONSE:" + new_response[:300])

                if tool_call.function.name == "move_page" and "Moving page" in new_response:
                    self.new_state()

                    # TODO: Absolutely NEED to fix this goddamn moving page issue. Really need to think of a non-awful way to solve this problem. (pop things from a list, maybe just learn more about how OpenAI stores these lists)

                    return "new state"
                '''
                elif tool_call.function.name=="move_page" and "Couldn't" in new_response:
                    self.messages.append(user_message("That is not a valid link."))
                    self.all_messages.append(user_message("That is not a valid link."))
                '''
                
            self.messages.append(user_message("What's your next step to get to "+ self.game.goal_page.title + "?"))
            self.all_messages.append(user_message("What's your next step to get to "+ self.game.goal_page.title + "?"))
            
            #Print message to watch what the agent is doing
            print("\nUSER: What's your next step to get to " + self.game.goal_page.title + "?")
                
                    

    def new_state(self):
        # Begin a new state when a new page is visited. Update user_message and system_message, and reset self.messages (for context window reasons).
        self.messages = [self.system_message,self.user_message]
        self.all_messages.extend([self.system_message,self.user_message])

        #Print message to watch what the agent is doing
        print(("-" * 50) + "\n\nNEW STATE\n\n" + "HISTORY: " + " -> ".join(game.page_title_history) + "\n\n" + ("-"*50))
        print("SYSTEM: " + self.system_message["content"] + "\n\nUSER: " + self.user_message["content"])

In [76]:
game = WikipediaGame("Aristotle", "GPT-4")
agent = WikipediaRacingAgent(game,model="gpt-4o-mini")
agent.generate_response_with_tools()
agent.do_tool_calls(agent.response)
agent.generate_response_with_tools()
agent.do_tool_calls(agent.response)




'''
def agent_loop(agent, game, num_loops = 10):
    for i in range(num_loops):
        if game.check_win():
            print("Success")
            return ""
        response = agent.generate_response_with_tools()
        if agent.do_tool_calls(response) == "new state":
            new_state_string = ("-" * 50) + "\n\nNEW STATE\n\n" + " -> ".join(game.page_title_history) + "\n\n" + ("-"*50)
''' 

    


SYSTEM: You are a wikipedia-racing AI. Your goal is to reach GPT-4 by accessing links from a series of wikipedia pages. Your current page is Aristotle. You always move page if you can.

USER: You are currently on page: Aristotle. Make sure you start by reasoning about what steps you should take to get to the article on GPT-4. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, GPT-4 has the following summary: 

[Begin Summary]
Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was launched on March 14, 2023, and made publicly available via the paid chatbot product ChatGPT Plus, via OpenAI's API, and via the free chatbot Microsoft Copilot.  As a transformer-based model, GPT-4 uses a paradigm where pre-training using both public data an

'\ndef agent_loop(agent, game, num_loops = 10):\n    for i in range(num_loops):\n        if game.check_win():\n            print("Success")\n            return ""\n        response = agent.generate_response_with_tools()\n        if agent.do_tool_calls(response) == "new state":\n            new_state_string = ("-" * 50) + "\n\nNEW STATE\n\n" + " -> ".join(game.page_title_history) + "\n\n" + ("-"*50)\n'

In [50]:
game = WikipediaGame("1511 Daléra", "Zalíbená")
agent = WikipediaRacingAgent(game,model="gpt-4o-mini")
agent_loop(agent, game, 40)


SYSTEM: You are a wikipedia-racing AI. Your goal is to reach Zalíbená by accessing links from a series of wikipedia pages. Your current page is 1511 Daléra.

USER: You are currently on page: 1511 Daléra. Make sure you start by reasoning about what steps you should take to get to the article on Zalíbená. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, Zalíbená has the following summary: 

[Begin Summary]
Zalíbená is a village and administrative part of Podveky in Kutná Hora District in the Central Bohemian Region of the Czech Republic. It has about 20 inhabitants.
[End Summary] 

To reach the article on Zalíbená from the page 1511 Daléra, I need to identify the relevant links on the current page that could lead me towards information about villages or locations in the Czech Republic. Here’s a strategic plan to navigate:

1. **Extract Lin

# 4️⃣ Elicitation

In [None]:
WikipediaGameToolsUpdated = [
    {"type" : "function",
        "function" : {
            "name" : "get_content",
            "description" : "Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link you can select will be wrapped in <link></link> tags.",
            "parameters" : {
                "type" : "object",
                "properties" : {},
                "required" : []
            }
        }
    },
    { "type" : "function",
        "function" : {
            "name" : "move_page",
            "description" : "Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "new_page": {
                        "type": "string",
                        "description": "The title of the new page you want to move to. This should be formatted the way the title appears on wikipedia (e.g. to move to the page for the United States of America, you should enter \"United States\"). Underscores are not necessary.",
                    },
                },
                "required": ["new_page"]
            }
        }
    }
]

## Prompting

Often performance can heavily depend on how the agent is instructed. However your prompts obviously will have to be robust to: different tasks, different states within those tasks, and different failure-modes the agent could encounter. Mess around with the prompt setup and see if

In [None]:
class WikipediaRacingAgentPrompting(WikipediaRacingAgent):
    
    def __init__(wikigame : WikipediaGame, model = "gpt-4o-mini", tools = WikipediaGameTools : list[dict]):
        super().__init__(wikigame, model, tools)
    @property
    def system_message(self):
        return {
            "role" : "system", 
            "content" : "You are a wikipedia-racing AI. Your goal is to reach " + self.game.goal_page.title + " by accessing links from a series of wikipedia pages. Your current page is " + self.game.current_page.title + "."
            }
    @property
    def user_message(self):
        return {
            "role" : "user",
            "content" : "You are currently on page: " + self.game.current_page.title + ". Make sure you start by reasoning about what steps you should take to get to the article on " + self.game.goal_page.title + ". When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, " + self.game.goal_page.title + " has the following summary: \n\n[Begin Summary]\n" + self.game.get_page_summary(self.game.goal_page) + "\n[End Summary]"
            }
        


### Exercise - Implement the ReAct framework
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

```

In [None]:
class WikipediaRacingAgentReAct(WikipediaRacingAgentPrompting):

    @property
    def system_message(self):
        #You may or may not want to edit your standard system message
        return {
            "role" : "system",
            "content" : "You are a wikipedia-racing AI. Your goal is to reach " + self.game.goal_page.title + " by accessing links from a series of wikipedia pages. Your current page is " + self.game.current_page.title + "."
        }
    def user_message(self):
        #You may or may not want to edit your standard user message
        return {
            "role" : "user",
            "content" : "You are currently on page: " + self.game.current_page.title + ". Make sure you start by reasoning about what steps you should take to get to the article on " + self.game.goal_page.title + ". When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, " + self.game.goal_page.title + " has the following summary: \n\n[Begin Summary]\n" + self.game.get_page_summary(self.game.goal_page) + "\n[End Summary]."
        }
    
    @retry_with_exponential_backoff
    def generate_reason(self):
        # Get the model to reason about the current state of the game and add the response to the messages (you may not want to give it tools for this)
        pass

    def generate_action(self):
        # Get the model to generate an action based on the reasoning and add the response to the messages
        pass

    def generate_ReAct(self):
        # Combine the reasoning and action generation functions to generate a response in the ReAct framework
        pass


## Scaffolding

We can improve the scaffolding by:
- Telling the agent its history.
- Giving the agent a "reflexion tool" (link reflexion paper)
- The prompts are also technically a part of the "scaffolding"
- We can tell the agent if it's already visited a page.

### Exercise - Implement an agent history property

You may notice that the agent frequently gets stuck in loops. Since we're already storing the page title history in the game class, we should probably provide this information to the agent and see if it improves this looping behavior. Implement this below

In [None]:
class WikipediaRacingAgentHistory(WikipediaRacingAgent):
    @property
    def system_message(self):
        return {
            "role" : "system",
            "content" : "You are a wikipedia-racing AI. Your goal is to reach " + self.game.goal_page.title + " by accessing links from a series of wikipedia pages. Your current page is " + self.game.current_page.title + "."
        }
    
    @property
    def user_message(self):
        return {
            "role" : "user",
            "content" : "You are currently on page: " + self.game.current_page.title + ". Make sure you start by reasoning about what steps you should take to get to the article on " + self.game.goal_page.title + ". When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, " + self.game.goal_page.title + " has the following summary: \n\n[Begin Summary]\n" + self.game.get_page_summary(self.game.goal_page) + "\n[End Summary]\n The pages you've visited so far has been: " + " -> ".join(self.game.page_title_history)
        }

### Exercise - Implement a reflexion tool

This paper (link reflexion paper) finds better performance by LLMs on tasks when they can perform "lookahead" on their plans. We will imitate this by allowing the agent to suggest candidate paths, and informing it where these paths err (if they do). You'll need to add this tool to the list of tools.

In [None]:
reflexion_tool = {
    "type" : "function",
    "function" : {
        "name" : "test_path",
        "description" : "Accepts a test path string in the form \"current_page -> page1 -> page2 -> ... -> pageN\" and if the path does not work, then it returns where the path goes wrong, if the path does work it returns \"success.\" Be careful that path titles can be sensitive to plurals or rephrasings. This tool is especially useful to check longer plans.",
        "parameters" : {
            "type" : "object",
            "properties": {
                "path" : {
                    "type" : "string",
                    "description" : "The path you want to test, formatted as \" current_page -> page1 -> page2 -> ... -> pageN\"."
                },
            },
            "required" : ["path"]
        }
    }
}

def test_path(self, arguments : dict) -> str:
    '''
    Reflexion test_path function, takes a dict like {"path": "Barack Obama -> Indonesia -> India"} and returns True if the path works, for a path like "Barack Obama -> Indonesia -> Pink Floyd" it would return "This path works until Pink floyd, which is not accessible from Indonesia".
    '''
    path = arguments["path"].split("->")
    if path[0].strip() != self.current_page.title:
        return "ERROR: The title of the start page of this path is not the title of your current page."
    for i in range(len(path)-1):
        if path[i+1].strip().lower() in self.get_links_of_page(path[i]):
            continue
        else:
            return "This path works until " + path[i+1].strip() + ", which is not accessible from " +path[i].strip()
    return "This path completely works.


### Exercise - Implement subagents to do n-level lookahead.

Subagent stuff. I don't really know how useful this would be. Maybe make the subagents be somewhat stupid.

## Tool use (might be worth experimenting with this somewhat anyway)

We could also give the agent additional tools, but if you give the agent too many tools (especially with poor descriptions), then performance can often suffer. This happens most prominently when using more than 5-10 tools.

### Exercise - implement a page summary tool

### Exercise - Implement an arbitrary page summary/content tool

### Exercise - Rearrange so that each page is broken up by sections

This is kind of just other things

In [70]:
x = get_page("Aristotle")
print(dir(x))
print(x.section("Metaphysics/Substance"))

['_WikipediaPage__continued_query', '_WikipediaPage__load', '_WikipediaPage__title_query_param', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'categories', 'content', 'coordinates', 'html', 'images', 'links', 'original_title', 'pageid', 'parent_id', 'references', 'revision_id', 'section', 'sections', 'summary', 'title', 'url']
None


# 5️⃣ Bonus

Add additional rules

Test agent performance on these tasks:
- Task 1

- Task 2

- Task 3

- Task 4

- Task 5

- Task 6

- Task 7

- Task 8

- Task 9

- Task 10

See what combination of tools appears to work best.