# [3.4] LLM Agent Evaluations

# Setup (don't read just run)

In [236]:
import json
import os

# os.chdir("c:\\Users\\styme\\OneDrive\\Documents\\AI STUFF\\Model Written Evals\\Code Replication\\ARENA_evals\\curriculum")
import wikipedia
from wikipedia import WikipediaPage
from wikipedia import DisambiguationError, PageError
from openai import OpenAI
from openai.types.chat.chat_completion_message_tool_call import (
    ChatCompletionMessageToolCall,
)
from openai.types.chat.chat_completion_message import ChatCompletionMessage
from anthropic import Anthropic
from utils import establish_client_OpenAI
from utils import retry_with_exponential_backoff
from pprint import pprint
from typing import Literal, Optional, Dict, List, Any
from abc import ABC, abstractmethod
import math
from inspect_ai.model import ChatMessageUser, ChatMessageAssistant, ChatMessageSystem
import re
from utils import countrylist
from utils import evaluate_expression, apply_user_format, apply_assistant_format

# Test the function


# 1️⃣ Intro to LLM Agents

## What is an LLM agent?
<!---
Points to make:
- "LLM agent" - a "scaffolding" program (i.e. Python program) that interacts with an LLM API. Include a version of METR "Evaluating Language-Model Agents on Realistic Autonomous Tasks" Figure 2
    - Define scaffolding
- More schematic breakdown of possible scaffolding: "tools" (describe what this means, what "tool calling" is), "memory" (Probably better move to "Build Agent" section! I've expanded this section there)
- Mention list of examples of prominent LLM agents:
    - [Minecraft LM Agent](https://arxiv.org/abs/2305.16291)
    - [AutoGPT](https://autogpt.net/) and [LangChain](https://www.langchain.com/)

==========================
--->
An LLM "agent" consists of **a scaffolding program interacting with an LLM API**. Initially, the scaffolding program sends instructions to the LLM on the task goal, the actions available to the LLM, and any relevant task information. The LLM then interacts with the scaffolding program in a sequence of steps, taking actions and observing their results. The scaffolding program will perform any actions or interactions with the task as instructed by the LLM; it will also return the outcome of these actions to the LLM agent. This allows the LLM to solve a task relatively autonomously (i.e. with little human input).

The two main elements of scaffolding are:
- Tool calling: This provides a description of a tool to the agent, which it can choose to use during the evauation. If it uses a tool then the scaffolding will execute this tool on the agent's behalf (usually consisting of running a python function), and return the result of this tool call to the agent.

- Prompting: This is how the task state is described to the LLM. We can also use prompting to help assist the LLM in its tool use, or instruct it to use chain-of-thought to give the LLM more "thinking time." 

The LLM interacts with scaffolding program to complete a task according to the following steps:

1. The LLM receives input (task description, current state, available actions etc.) from the scaffolding program
2. The LLM processes the input and outputs an action (e.g. use calculator)
4. The scaffolding program executes the action the agent took and returns the outcome (e.g. it would run `calculate()` in the background for an LLM using a calculator, and then return the function output to the agent)
5. The LLM receive the results and decides the next action
6. Repeating the cycle until the task is complete


[Insert METR diagram]

Some examples of LLM agents are:

- [Voyager](https://arxiv.org/abs/2305.16291) (Minecraft LLM Agent)

- [AutoGPT](https://autogpt.net/)

- [LangChain](https://www.langchain.com/)


<!-- An LLM agent consists of 4 main things [I think a better list exists here, "reasoning engine" is quite unclear/vague and scaffolding doesn't make sense as a bullet point in how we've defined it; also maybe move this to start of section "Build agent?"].

- A 'reasoner' or 'reasoning engine.' (Some people also call this a 'world model'). For LLM agents this is a large language model.

- Tools which allow the agent to act in the environment.

- Memory so that the agent can recall prior actions. This can either be:

    - Short-term memory: In the context of LLM agents this is generally the context window

    - Long-term memory: There are many cases where context-windows are too short, and we will need to give the agent high-level information about actions it took a long time ago. There are many methods to store this 'long-term memory' for agents (see some methods [here])

- Scaffolding: This is essentially any structure which we provide to the 'reasoning engine' in order to help it to reason better, such as:

    - Prompting frameworks.

    - The ability to trial plans into the future.

    - Using subagents to take care of subtasks.

    - Subgoal decomposition.

EXCALIDRAW!

## How should we evaluate LLM agents?

Points to make - "Why evaluate LLM agents":
- overall note: I think this could heavily be based on this [video](https://www.youtube.com/watch?v=KO72xvYAP-w) from METR
- The purpose of agent evals is to **unlock and measure the full capabilities of a model**, to avoid underestimating the model and to better estimate the **ceiling** of their capability and potential to cause harm. 
- Models often fail in easily-fixable ways. For example, when it is solving a hacking task, it
    - Can refuse due to ethics or (claimed) inability 
    - Can give up and ask the user for help 
    - Can get stuck in loops 
    - Can hallucinate facts or conclusions [only partly fixable] 
    - Can be limited by primitive tools 
    - Can have bugs
    - ...
- For a model that fails to solve a hacking task, thus deemed safe, there might exist simple fixes (e.g. better prompts, better file manipulation tools) that unlock this dangerous capability. 
- 
- Final point about "quantifying" the amount of scaffolding to make eval results more quantitative
    - Apollo "Science of Evals" 
    - GDM quantify bits
==========================
--->
## How should we evaluate LLM agents?

There are two possible purposes of LLM agent evaluations:

- The first is to **unlock and measure the full capabilities of a model**. We don't want to underestimate current or future LLMs, so we want to establish the **ceiling** of their capabilties and potential to cause harm.
- The second is to **determine the alignment properties of LLMs in agentic scenarios**. Most of our current alignment techniques (Supervised Fine-tuning, RLHF, ... ) are focused on Chatbot contexts for LLMs, however LLM agents have the potential to cause much greater harm, and we currently aren't as confident about how RLHF and Supervised Fine-tuning will work in these contexts.

LLM agents generally fail in easy-to-fix ways, as you will see. For example:

- They often claim to be incapable of tasks that they can actually perform.

- They can easily get stuck in loops.

- They can hallucinate facts, or even misunderstand their own prior reasoning and hallucinate a faulty conclusion.

- They can be overly or underly sensitive to information in their prompts.

This means that when models fail to accomplish tasks, there may exist simple fixes that will unlock a capability. Since we want to eliminate the potential of large capability improvements from relatively little effort, this means that we have to try quite hard to tune the promptings, tool descriptions, and tool outputs just right, so that we can see LLM agents at their *best*.
<!--->
Many of our threat models for the future harms of AI systems go through agentic behavior. If we knew that chatbots were only ever capable of simulating continuations of text in their context window, we'd still be worried about them — but significantly less. However, we know today that this is not the case. In fact, since the release of ChatGPT, the use of LLMs as reasoning engines for agentic systems has proliferated signficantly. See [AutoGPT](https://autogpt.net/) and [LangChain](https://www.langchain.com/). These agents started off rather disappointingly initially, when they were based on GPT-3.5. However as more powerful LLMs come out and AI companies ensure their LLMs are better at tool-use, these agents are improving rapidly.


The main concerns for LLM agents that we want to mitigate are:

- Their capabilities may be signficantly greater than those of the base LLM (especially when augmented with tool use).

- There are many possible improvements for increased performance from LLM agents, and these improvement methods are often signficantly cheaper and easier to implement than training the base model.

- Current fine-tuning and RLHF/Constitutional AI methods are mostly targeted towards chatbot-style text output. We aren't as confident about how such methods will generalize to agentic scenarios.

The first two issues here relate to the **capabilities** of LLM agents, and the last issue relates to the **alignment** properties of LLM agents. The agent we'll be building will be testing for the **capability** properties of agents.
<!--->

<details><summary>Further resources on LLM evaluations:</summary>

- [OpenAI Function Calling Guide](https://platform.openai.com/docs/guides/function-calling)

- [Anthropic Function Calling Guide](https://docs.anthropic.com/en/docs/build-with-claude/tool-use)

- [Evaluating Language-Model Agents on Realistic Autonomous Tasks](https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf) (Kinniment et al., ARC Evaluations Team (now METR), 2023)

- [Large Language Models can Strategically Deceive their Users when Put Under Pressure](https://arxiv.org/pdf/2311.07590) (Scheurer et al., Apollo Research, ICLR 2024)

- [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) (Lilian Weng, OpenAI Safety Team, 2023)

- [AXRP Episode 34 - AI Evaluations with Beth Barnes](https://www.alignmentforum.org/posts/vACr4DExfeRMaCoo7/axrp-episode-34-ai-evaluations-with-beth-barnes) (Daniel Filan, 2024)

- [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/pdf/2303.11366) (Shinn et al., 2023)

- [Answering Questions by Meta-Reasoning over Multiple Chains of Thought](https://arxiv.org/pdf/2304.13007) (Yoran et al., 2024)

- [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/pdf/2302.04761) (Schick et al., META AI Research, 2023)
</details>

# 2️⃣ Build a Simple LLM Arithemtic Agent

We will start by building a simple LLM agent that solves arithmetic problems. LLMs struggle with arithmetic, but we can drastically improve their performance by providing a simple calculation tool. We'll try the model with and without tools on this task, and see how significantly performance improves.

To build this, we will implement 4 things:
- The `ArithmeticTask` class handles arithmetic problem generation and solution verification.
- The `CalculateTool`, a tool that LLM agents can use to solve the task.
- The `ArithmeticAgent` class handles interacting with the LLM API, doing the calculation, and keeping track of the overall task progress.
- The `agent_loop()` function defines the interaction loop between the task and the agent to execute the task.

In general, ... [description of how to think about designing task and agent in generation, include decision factors] probably good to include a diagram here (or maybe earlier)

We build task
We build tool

We build scaffold
We build agent

We loop things.

## Defining the Task

### Exercise - Build a simple arithmetic problem
```c
Difficulty: 🔴🔴🔴⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 20-25 minutes on this exercise.
```

In an LLM agent eval, there will usually be a `Task` class, which interacts with the `Agent`. In general, the `Task` class will:

- Prepare and provide the task instruction (and necessary files, functions etc) to the agent,

- Parse and score the agent's output,

- Update the task state accordingly (e.g. proceeds onto the next step of the task, ends the task).

We will build a toy task called `ArithmeticTask`. This task takes in two numbers and create a list of arithmetic calculation problems with these two numbers, using arithmetic operations defined in `operations`. It should have methods to do the following:

- Get the current problem (e.g. at the start this will be `"Calculate num1 + num2"`),

- Check if a given answer is correct,

- Update the current problem (depending on whether the answer generated by the model was correct),

- Check if all problems have been solved,

<details><summary>Aside: Handling calculations</summary><br> When we handle the calculations for the model, technically we could use Python's <code>eval()</code> function (this is what <a href = "https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/calculator_tool.ipynb">Anthropic did</a>(!)). However, this function evaluates an arbitrary string expression, and so allows AI models to run arbitrary code. In the long-run, we're trying to do these evaluations on models which we suspect of being dangerous; so even though we could probably trust the current suite of language models offered by OpenAI and Anthropic, we should get into the good habit of not running arbitrary code outputted by language models (except in very carefully set-up environments). To this end, we've implemented an <code>evaluate_expression</code> function for you to use instead. It should already be imported from <code>utils</code>.</details>


In [237]:
class ArithmeticTask:
    def __init__(self, num1: int | float, num2: int | float):
        self.num1 = num1
        self.num2 = num2
        self.operations: List[str] = ["+", "-", "*", "/", "%", "//"]
        self.correct_answers: Dict[str, float] = self._generate_answers()
        self.is_solved: Dict[str, bool] = {expr: False for expr in self.correct_answers}
        self.current_task_number = 0

    def _generate_answers(self) -> Dict[str, float]:
        return {
            f"{self.num1} {op} {self.num2}": eval(f"{self.num1} {op} {self.num2}")
            for op in self.operations
        }

    def get_current_task(self) -> str:
        """
        Returns the current task as a string
        """
        return f"{str(self.num1)} {self.operations[self.current_task_number]} {str(self.num2)}"

    def get_instructions(self) -> str:
        """
        Returns a string containing initial task instructions for the agent.
        """
        return f"Calculate the result of the following expression: {self.get_current_task()}. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value"

    def check_solved(self) -> bool:
        """
        Returns True if all tasks are solved, False otherwise
        """
        return all(self.is_solved.values())

    def check_answer(self, model_answer: str) -> bool:
        """
        Returns True if the model_answer is correct, False otherwise
        """

        correct_answer = self.correct_answers[self.get_current_task()]
        return math.isclose(
            float(model_answer), correct_answer, rel_tol=1e-5, abs_tol=1e-8
        )

    def update_current_task(self) -> None:
        """
        Sets the current task as solved and updates the current_task_number by one
        """
        self.is_solved[self.get_current_task()] = True
        self.current_task_number = (self.current_task_number + 1) % len(self.operations)


x = ArithmeticTask(10, 15)
for problem, answer in x.correct_answers.items():
    print(f"{problem} = {answer}")

10 + 15 = 25
10 - 15 = -5
10 * 15 = 150
10 / 15 = 0.6666666666666666
10 % 15 = 10
10 // 15 = 0


## Function Calling

**Function calling** is a feature of LLM Chat APIs that allows the LLM to use external "tools" (i.e. Python functions, APIs) by simply receiving and outputing text. There are 5 simple steps to function calling:

1. Pick a function in your codebase that the model should be able to call (in this case, we will pick the function `calculate()` from our task class)

2. Describe your function to the model (following the syntax of the model's API) so it knows how to call it

3. Pass your function definitions as available “tools” to the model, along with the messages (following the syntax of the model's API)

4. Receive and handle the model response

5. Provide the function call result back to the model 

Chat models like ChatGPT and Claude are fine-tuned to recognize and respond to `tool` descriptions appropriately (just like `user` and `system` messages). In this way, you can allow LLMs to do complex actions like run code, make calls to other APIs, manipulate files etc. We do this by parsing their response output, executing the functions they've called ourselves, and then feeding the results back into the model so it can reason about them and take the next steps. This function-calling loop is the simplest version of a LLM agent, but more advanced LLM agents follow the same logic (except with more advanced tools and more complex task structures to pemirt more autonomous actions etc.).

[DIAGRAM]


### Exercise - Write a tool class for function calling
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 10 minutes on this exercise.
```

When writing tools, there will be two methods that need to be defined. The first is the `execute()` function. This should take in an arithmetical expression (e.g. `"3+5"`) and output the result of this expression (also as a string). The `execute()` function should always take the task as a variable (as often tools will need to be able to make significant changes to the task).

<details><summary>Aside: Handling calculations</summary><br> When we handle the calculations for the model, technically we could use Python's <code>eval()</code> function (this is what <a href = "https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/calculator_tool.ipynb">Anthropic did</a>(!)). However, this function evaluates an arbitrary string expression, and so allows AI models to run arbitrary code. In the long-run, we're trying to do these evaluations on models which we suspect of being dangerous; so even though we could probably trust the current suite of language models offered by OpenAI and Anthropic, we should get into the good habit of not running arbitrary code outputted by language models (except in *very* carefully set-up environments). To this end, we've implemented an <code>evaluate_expression</code> function for you to use instead. It should already be imported from <code>utils</code>.</details>

We then need to write the `description` property of our `"calculator"` function, so we can give it to our LLM agent as a tool. The syntax may differ between APIs (e.g. the OpenAI API has a different syntax than Anthropic API). Read OpenAI's [function calling guide](https://platform.openai.com/docs/guides/function-calling) to learn the syntax. The `description` property should just return a tool description (in the necessary json format). 

Therefore your tool should be defined according to the following structure:

In [238]:
class Tool:
    @abstractmethod
    def execute(task: Any, input: str) -> str: ...

    @property
    def description(self) -> str: ...

Here are some good practices for writing tool descriptions for Claude (according to Anthropic), they should generalize to other chat models:
- Provide extremely detailed descriptions. This is by far the most important factor in tool performance. Your descriptions should explain every aspect of the tool, including:

    - What the tool does

    - When it should be used (and when it shouldn’t)

    - What each parameter means and how it affects the tool’s behavior

    - Any important caveats or limitations, such as what information the tool does not return if the tool name is unclear. The more context you can give Claude about your tools, the better it will be at deciding when and how to use them. Aim for at least 3-4 sentences per tool description, more if the tool is complex.
    
- Prioritize descriptions over examples. While you can include examples of how to use a tool in its description or in the accompanying prompt, this is less important than having a clear and comprehensive explanation of the tool’s purpose and parameters. Only add examples after you’ve fully fleshed out the description.

Read Anthropic's examples of what good and bad tool calling looks like [here](https://docs.anthropic.com/en/docs/build-with-claude/tool-use#example-of-a-good-tool-description). 

Now write your tool class for the `CalculateTool` below. Inherit from the general `Tool` class defined above.

In [239]:
class CalculateTool(Tool):
    name = "calculate"

    @staticmethod
    def execute(expression: str, task: Any = None) -> str:
        """
        Evaluates the string expression in Python using `evaluate_expression()` and returns the result as a string
        """
        try:
            return str(evaluate_expression(expression))
        except (SyntaxError, NameError, ZeroDivisionError) as e:
            return f"Error: {str(e)}"

    @property
    def description(self):
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": 'Calculates the result of an arithmetic expression. For example, you could provide an input in the form "2+3" and the function would return 5. Or you could provide an expression like "10/3" and the function would return 3.3333333333333335.',
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The arithmetic expression that you want to be evaluated.",
                        }
                    },
                    "required": ["expression"],
                    "additionalProperties": False,
                },
            },
        }


Calculator = CalculateTool()

<details><summary> Aside - What is a @staticmethod?</h2></summary>

The `@staticmethod` decorator in Python is used to define a static method within a class. Here are some key points about static methods:
1. They don't use instance- or class-specific data, thus does not require a first parameter `self` or `cls`.
2. They're often used for utility functions related to the class.

In our `ArithmeticTask` class, the `calculate` method is defined as a static method:

```python
@staticmethod
def calculate(expression: str) -> str:
    """Evaluates the string expression and returns the result as a string."""
    try:
        return str(evaluate_expression(expression))
    except (SyntaxError, NameError, ZeroDivisionError) as e:
        return f"Error: {str(e)}"
```

This can be called on the class itself without creating an instance:

   ```python
   result = ArithmeticTask.calculate("2 + 3")
   ```

You can also call it on an instance of the class, but this is not the convention because it doesn't utilize the instance in any way (it doesn't have access to `self`):
   ```python
   problem = ArithmeticTask(10, 15)
   result = problem.calculate("2 + 3")
   ```

Typically, you would make "stand-alone" functions that do not depend on class methods or class/instance attributes a static method. Using `@staticmethod` in this case:
1. Makes the code's intent clearer (this method doesn't need class or instance data).
2. Slightly improves performance (no `self` argument needs to be passed).
3. Allows the method to be used without creating an instance of the class.

</details>

You can include the tool description in the API call simply by giving it as an arg to `tools` (the description has to be in a list, as the `create()` function's `tools` argument only accepts lists of tool descriptions): 

In [240]:
messages = [{"role": "user", "content": "Calculate 2+3"}]
client = establish_client_OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=[Calculator.description],
    tool_choice="auto",
)

print(response.choices[0].message.content)
print(response.choices[0].message.tool_calls)

None
[ChatCompletionMessageToolCall(id='call_7wk5klGteMKOXjwLi3wx9mTM', function=Function(arguments='{"expression":"2+3"}', name='calculate'), type='function')]


<details><summary>Why is <code>message.content = None</code>?</summary>

When LLMs use tools, they often don't generate any text output. This can be a problem later when you try to get the model to do chain-of-thought reasoning. To get around this, it can be better to make two calls to the model for more complex tool use: one call to get the model to reason about the actions it should take, and then another to get the model to use a tool to take those actions.

</details> 

### Exercise - Return tool call results to the model
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 10-15 minutes on this exercise.
```

In order to return the response of tools to OpenAI LLMs, you'll need to add **two** items to the `messages` list after the model has made a tool call in a `ChatCompletionMessage` output:
1. The `ChatCompletionMessage` object itself (containing the original tool call message generated by the model). 

2. The tool response (containing the results of the tool call), in a specific format.

Each tool response has to respond to a specific tool call in a `ChatCompletionMessage`, and if we ever try to get the model to generate a response with an unanswered tool call in `messages`, the API will raise an error.

Below is the typical `response.choices[0]` output you will being generated by `chat.completions.create()`. The ChatCompletionMessage is accessed via `response.choices[0].message`. You can access the tool calls via `response.choices[0].tool_calls`, which will return a list of `ChatCompletionMessageToolCall` objects.

```python
Choice(
    finish_reason="tool_calls",
    index=0,
    logprobs=None,
    message=chat.completionsMessage(
        content=None,
        role="assistant",
        function_call=None,
        tool_calls=[
            chat.completionsMessageToolCall(
                id="call_62136354",
                function=Function(arguments='{"expression":"2+3"}', name="calculate"),
                type="function",
            )
        ],
    ),
)
```

We have provided a function that formats the tool response in the correct syntax to be returned to the model. Read the format to understand what it looks like (you do not need to memorize this as you can always find it on OpenAI's [function calling guide](https://platform.openai.com/docs/guides/function-calling).)

In [241]:
def apply_tool_call_format(
    tool_call: ChatCompletionMessageToolCall, content: str
) -> dict:
    """
    Formats the response of a tool call to be returned to the model.
    Args:
        - tool_call : ChatCompletionMessageToolCall
        - content : str - This is the tool response (i.e. results from executing the tool)
    """
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "name": tool_call.function.name,
        "content": content,  # e.g. "5"
    }

Now, we can generate a message and return the tool call response to the model:

In [242]:
messages = [{"role": "user", "content": "Calculate 5/3. Be precise."}]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=[Calculator.description],
    tool_choice="auto",
)

messages.extend(
    [
        response.choices[0].message,
        apply_tool_call_format(
            response.choices[0].message.tool_calls[0],
            Calculator.execute(
                json.loads(
                    response.choices[0].message.tool_calls[0].function.arguments
                )["expression"]
            ),
        ),
    ]
)

response_to_tool_calls = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=[Calculator.description],
    tool_choice="auto",
)
print(response_to_tool_calls.choices[0].message.content)

The result of \( \frac{5}{3} \) is approximately 1.6666666666666667.


## Building the Agent

REwork this

Most LLM agents share these core components:

1. **LLM API interface**: A basic function (e.g. `get_response()`) that makes the API calls to the LLM and return its responses. (IN AGENT)

2. **Actions**: A set of actions (i.e. functions) the agent can take. (MOSTLY IN TASK)

3. **Task State Management**: Keeping track of the current state of the task and any relevant context. (IN TASK MOSTLY)

4. **Memory**: A system for storing and retrieving relevant information from past interactions (i.e. chat history). The simplest implemention is usually a `self.chat_history` class attribute that stores a list of past chat messages. (IN AGENT)

5. **Observation Parser**: Functions to parse and interpret the results of actions and update the state. (IN TASK MOSTLY)

6. **Decision/Execution Logic**: The rules or algorithms used to choose actions based on the current state and LLM output. (KIND OF IN BETWEEN)

7. **Task-Specific Information**: Any additional information or functions specific to the task at hand. (IN TASK)

[Diagram]

We will first implement a `SimpleAgent` class that is not specific to the `ArithmeticTask`, so that we can see the key components of an generic LLM agent.

### Exercise - Implement `SimpleAgent`
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 20-25 minutes on this exercise.
```

Build out the following simple agent class by filling in `get_response()` and `execute_tool_calls()` functions.

In [243]:
class SimpleAgent(ABC):
    def __init__(
        self,
        task: Any = None,
        model: Literal["gpt-4o-mini"] = "gpt-4o-mini",
        tools: Optional[List[Any]] = None,
        history: List[dict] = [],
    ):
        self.model = model
        self.task = task
        self.tools = tools
        self.client = OpenAI()
        self.chat_history = history

    @retry_with_exponential_backoff
    def get_response(self, use_tool: bool = True) -> ChatCompletionMessage:
        """
        Get the response from the model via an API call, with the option of tool calling.
        """
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.chat_history,
            tools=[tool.description for tool in self.tools] if use_tool else None,
            tool_choice="auto" if use_tool else None,
        )
        return response.choices[0].message

    def execute_tool_calls(self, message: ChatCompletionMessage) -> List[str]:
        """
        Execute the tool calls in the message and return a list of tool_responses.
        """
        tool_calls = message.tool_calls

        tool_responses = []
        for tool_call in tool_calls:
            if not self.task:
                raise ValueError("Task is not set. Cannot execute tool calls.")
            func = next(
                (tool for tool in self.tools if tool.name == tool_call.function.name),
            )
            arguments = json.loads(tool_call.function.arguments)
            tool_response = func.execute(**arguments, task=self.task)
            tool_responses.append(tool_response)

        return tool_responses

    def run(self, with_tool: bool = True):
        """
        Default implementation of run method.
        This can be overridden in subclasses for specific behavior.
        """
        print(f"Running SimpleAgent...")
        instruction = self.task.get_instructions()
        self.chat_history.append(apply_user_format(instruction))
        response = self.get_response(use_tool=with_tool)
        return response

In [244]:
my_simple_agent = SimpleAgent(ArithmeticTask(10, 15), tools=[Calculator])
my_simple_agent.run()

# Try execute the tool calls


Running SimpleAgent...


ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_Qe7cEvrpLbUvFGw8W5GlCJU2', function=Function(arguments='{"expression":"10 + 15"}', name='calculate'), type='function')], refusal=None)

### Exercise - Build an `ArithmeticAgent`
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 20-25 minutes on this exercise.
```

Add instructions here:
1. work out the decision tree of the task for ~10min; we give a half-filled task tree, then the full task tree in a drop down
2. write `run()` - they will implement everything after "# Handle the response" in run(); we will give them parse_answer.

In [245]:
class ArithmeticAgent(SimpleAgent):
    """
    ArithmeticAgent class for doing simple arithmetic tasks.

    Inherits from SimpleAgent and includes the following attributes and methods:

    Attributes:
        model (str): The model used for generating responses (inherited)
        tool_descriptions (List[dict]): List of tool descriptions (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (Any): The current task being executed (inherited)
        num_tries (int): Number of tries allowed for each action (inherited)
        history (List[dict]): History of interactions (inherited)

    Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage:
            Get response from the model (inherited)
        execute_tool_calls(message: ChatCompletionMessage) -> List[str]:
            Execute tool calls from the model's response (inherited)
        run(task: 'WikiGame', with_tool: bool = True) -> bool:
            Run one loop of the Wikipedia agent
    """

    def __init__(
        self,
        model: Literal["gpt-4o-mini"] = "gpt-4o-mini",
        task: Any = None,
        tools: Optional[List[Any]] = [Calculator],
        verbose: bool = True,
    ):
        super().__init__(model=model, task=task, tools=tools)
        self.verbose = verbose

    def run(self, with_tool: bool):
        """Run one loop of the agent, which involves:
        - getting a task
        - getting a response from the model
        - handling the model response, including tool calls, refusals, no tool calls, parsing and checking final answers, errors.
        - managing memory: storing the history of messages to self.chat_history
        - managing task state: staying on the same task or moving to the next task at the end of the loop
        """
        # Get a task instruction
        instruction = self.task.get_instructions()
        if self.verbose:
            print("\nUSER:\n", instruction)
        self.chat_history.append(apply_user_format(instruction))

        # Get the response from the model
        response = self.get_response(use_tool=with_tool)
        if self.verbose:
            print("Model response:", response.content)
        # Handle the response
        ## If tool calls, do the tool calls and return the response
        if response.tool_calls:
            if self.verbose:
                print(response.tool_calls)

            # Append the original function calls to the conversation
            self.chat_history.append(response)

            # Execute the tool calls
            tool_responses = self.execute_tool_calls(response)

            # Handle tool responses
            for tool_call, tool_response in zip(response.tool_calls, tool_responses):
                self.chat_history.append(
                    apply_tool_call_format(tool_call, tool_response)
                )

            # Call the model again answer the question with the tool response
            response = self.get_response(use_tool=with_tool)
            self.chat_history.append(apply_assistant_format(response.content))
            if self.verbose:
                print("\nModel response:", response.content)

            # Check the answer
            try:
                model_answer = self.parse_answer(response)

                if self.task.check_answer(model_answer):
                    self.chat_history.append(apply_user_format("Correct."))
                    if self.verbose:
                        print("\nUser: Correct.")
                    # Update to the next task
                    self.task.update_current_task()
                else:
                    self.chat_history.append(apply_user_format("Incorrect."))
                    if self.verbose:
                        print("\nUser: Incorrect.")
                    # Retry the task

            # Ends the task if there's an error parsing the model answer
            except Exception as e:
                if self.verbose:
                    print("\nError parsing model answer:", e)
                raise

        ## If no tool call: Handle edge cases
        ### Check if there's a refusal to answer:
        elif response.refusal:
            if self.verbose:
                print("\nModel Refusal:", response.refusal)
            self.chat_history.append(apply_assistant_format(response.refusal))
            # Go to next task
            self.task.update_current_task()

        ### Else finish_reason is "stop", in which case the model was just responding directly to the user without tool calls
        elif response.finish_reason == "stop":
            self.chat_history.append(
                apply_user_format(
                    "You did not use the tool to answer the question. Please use the tool to answer the question."
                )
            )
            if self.verbose:
                print("\nModel response:", response.content)
            if self.verbose:
                print(
                    "\nUser:\n You did not use the tool to answer the question. Please use the tool to answer the question."
                )

    def parse_answer(self, message: ChatCompletionMessage) -> ChatCompletionMessage:
        """
        Extract the numerical answer from the string output of the model
        """
        response = message.content
        if response.find("<answer>") != -1:
            startpoint = response.find("<answer>") + 8
            endpoint = response.find("</answer>")
            return float(response[startpoint:endpoint])


### Exercise - Execute the task via an agent_loop 
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 5-10 minutes on this exercise.
```

Try implementing the agent_loop below with and without tools, to see how much better the model does when we give it tools.

In [246]:
task = ArithmeticTask(1500, 1091)
agent = ArithmeticAgent(task=task, verbose=True, tools=[Calculator])


def agent_loop(num_loops: int = 10):
    for i in range(num_loops):
        if not task.check_solved():
            agent.run(with_tool=True)
        else:
            print("\nAll tasks solved.")
            break


agent_loop()


USER:
 Calculate the result of the following expression: 1500 + 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
Model response: None
[ChatCompletionMessageToolCall(id='call_HQVmiK9gklEfiIVm7uMHQUtY', function=Function(arguments='{"expression":"1500 + 1091"}', name='calculate'), type='function')]

Model response: <answer>2591</answer>

User: Correct.

USER:
 Calculate the result of the following expression: 1500 - 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
Model response: None
[ChatCompletionMessageToolCall(id='call_Ze4huS1PhBJMYtZL8mXBgetM', function=Function(arguments='{"expression":"1500 - 1091"}', name='calculate'), type='function')]

Model response: <answer>409</answer>

User: Correct.

USER:
 Calculate the result of the following expression: 1500 * 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
Model response: N

We can print all the messages from the `ChatHistory` as follows:

In [247]:
for message in agent.chat_history:
    try:
        print(str(message.content))
    except:
        print(message["content"])

Calculate the result of the following expression: 10 + 15. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
Calculate the result of the following expression: 1500 + 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
None
2591.0
<answer>2591</answer>
Correct.
Calculate the result of the following expression: 1500 - 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
None
409.0
<answer>409</answer>
Correct.
Calculate the result of the following expression: 1500 * 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
None
1636500.0
<answer>1636500</answer>
Correct.
Calculate the result of the following expression: 1500 / 1091. Give your final answer in the format: <answer>NUMBER</answer>, where NUMBER is a numerical value
None
1.374885426214482
<answer>1.374885426214482</answer>
Correct.
Calc

# 3️⃣ Building a More Complex Task: WikiGame

Now that we know how to do function calling and how to design an LLM agent in general, we will build a more complicated task. This task won't be instantly solvable by LLMs with simple tool use and will require us to elicit better capabilities from models.

The task we will build and elicit behavior for will be the [Wikipedia Game](https://en.wikipedia.org/wiki/Wikipedia:Wiki_Game): Players use wiki-links to travel from one Wikipedia page to another and the first person who reaches the destination page wins the race. This is not directly related to any dangerous capabilities, and if GPT-N+1 could do this task, but GPT-N couldn't, we wouldn't tell OpenAI to be particularly careful about the release of GPT-N+1 as a result. However, it makes a useful test case for elicitation methods, since there are many strategies for deciding what path to take and we can create a scale of difficulty by choosing different articles to navigate to/from.

To add:
- Description of MVP Goal
- EXCALIDRAW! (describing wikipedia game.)

## Quick Intro to the Wikipedia API


Our agent will interact with Wikipedia by making tool calls to the [Wikipedia API](https://wikipedia.readthedocs.io/en/latest/quickstart.html), which is simple to use. We will only need to learn the following key functions for the game. 

1. `wikipedia.page` - Returns a Wikipedia page object, whcih contains various attributes adn methods to access page content. (See [page docs](https://wikipedia-api.readthedocs.io/en/latest/API.html#wikipediapage) for these attributes.)
2. `wikipedia.page.title` - Returns the title of the page
3. `wikipedia.page.contents` - Returns the full text content of the page (this can be very long, make sure to take snippets when you can as to not use up the context length of the LLM)
4. `wikipedia.page.summary` - Returns a summary of the page (i.e. all the text in the first section of the Wikipage).
5. `wikipedia.page.links` - Returns a list of all links as strings

Kwargs:
- `auto_suggest` - Let Wikipedia find a valid page title for the query. 
- `redirect` - Allow redirection without raising RedirectError

Refer to the [docs](https://wikipedia-api.readthedocs.io/en/latest/API.html#) for more information. 

<details><summary> Aside: Wikipedia API content can be weird!</summary>

The wikipedia API often outputs content in unintuitive ways. For example, articles that are essentially just a big list become near useless, since the content omits the list (for example, see the wikipedia API content for <a href = "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population">List of countries and dependencies by population</a>). Another issue that you might encounter is that the API formats mathematical expressions in $\LaTeX$ pretty poorly (but there are usually very few links to be found here anyway). This is why it's important to determine what content the wikipedia API produces when `.content` is called — and why you want to make sure you're testing a large diversity of wikipedia articles.

</details>
<br>
<details><summary> Aside: Wikipedia "summaries" can be long!</summary>

The wikipedia API accesses summaries of pages by presenting all the information before the first titled section. For certain (generally obscure) wikipedia pages, this summary itself can be extremely long, and contain lots of information that is unnecessary to determine the key information about the page the model should be trying to access. We'll handle this later when it comes up by truncating wikipedia's summary to just the first ~1000 characters

</details>

Now run the following code to see how this works!


In [248]:
# Retrieve a Wikipedia page
page = wikipedia.page("Python (programming language)")

# Access basic page information
print("Title:", page.title)
print("URL", page.url)
print(f"\nSummary (word count {len( page.summary.split())}):", page.summary)
print(
    f"\nContent (word count {len( page.content.split())}):",
    page.content[:500],
    "......",
)
print(
    f"\nLinks (link count {len(page.links)}): [", ", ".join(page.links[:7]), "......]"
)

Title: Python (programming language)
URL https://en.wikipedia.org/wiki/Python_(programming_language)

Summary (word count 135): Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.
Python consistently ranks as one of the most popular programming langu

Now run these two lines (you should see different errors):

In [250]:
page = wikipedia.page("Python")

DisambiguationError: "Python" may refer to: 
Pythonidae
Python (genus)
Python (mythology)
Python (programming language)
CMU Common Lisp
PERQ 3
Python of Aenus
Python (painter)
Python of Byzantium
Python of Catana
Python Anghelo
Python (Efteling)
Python (Busch Gardens Tampa Bay)
Python (Coney Island, Cincinnati, Ohio)
Python (automobile maker)
Python (Ford prototype)
Python (missile)
Python (nuclear primary)
Colt Python
Python (codename)
Python (film)
Monty Python
Python (Monty) Pictures
Timon of Phlius
Pyton
Pithon

In [251]:
page = wikipedia.page("Animalss", auto_suggest=False)

PageError: Page id "Animalss" does not match any pages. Try another id!

We can handle the errors using the following code:

In [252]:
# Fixes RedirectError

page = wikipedia.page("Animalss", redirect=True)
print(page.title)

# Fixes DisambiguationError

try:
    page = wikipedia.page("Python")
except DisambiguationError as e:
    page = wikipedia.page(e.options[0])
print(page.title)

Animal
Pythonidae


The above code gives a `DisambiguationError` because the title "Python" can correspond to multiple pages. Then there is a `PageError` for "Animalss" as there is no Wikipedia name with that title.

To handle these errors, we have implemented a simple function `get_page` for you to get the page object for a particular page title. This handles `RedirectError` by setting `redirect=True`, and also handles `DisambiguationError` by choosing the first option in the list of potential pages we could be referring to.

We handle `PageError` by setting `auto_suggest=True`, and letting wikipedia guess at the page we mean (this is a last resort, and hopefully won't be necessary).

<details><summary>What do <code>redirect</code> and <code>auto_suggest</code> do?</summary>

**Redirect**

The keyword `redirect` tells the API to allow Wikipedia to provide redirections. This happens when you reference an article in a manner which is slightly different than how it is stored in Wikipedia. This rarely happens when we will use the wikipedia API, as we will access pages based on how they are stored in Wikipedia, but as an example:
```python
page = wikipedia.page("huMan", redirect = True, auto_suggest=False)
```
will return a `WikipediaPage` object for the "Human" page. However,
```python
page = wikipedia.page("huMan", redirect=False, auto_suggest=False)
```
will return a `PageError` (since there is a page called "Human" but not "huMan"). The Wikipedia API will generally access the correct page if there is a capitalization issue on the first letter, but a capitalization error in the middle of the word will raise an error (unless `redirect=True`).

<br>

**Auto suggest**

The keyword `auto_suggest` tells the API to allow Wikipedia to provide suggestions. This allows a lot more than `redirect` does, since `redirect` is only for the "obvious" cases (e.g. "huMan" → "Human", "U.S. President" → "President of the United States", etc.). When `auto_suggest` is true, it would allow something like "president of states" → "President of the United States", "gogle" → "Google"; both of which would raise an error if `redirect = True, auto_suggest = False`.

However, `auto_suggest` can sometimes be *too* permissive and lead to errors, for example:

```python
page = wikipedia.page("Human", redirect= False, auto_suggest=True)
```
will return a `WikipediaPage` object for the "Man" page. This is clearly not what we were trying to access, and the `auto_suggest` has gotten carried away in this case.

If `redirect = True` and `auto_suggest=True`, then `auto_suggest` takes priority.
</details>



In [266]:
def get_page(title: str) -> WikipediaPage:
    try:
        return wikipedia.page(title, auto_suggest=False, redirect=True)
    except DisambiguationError as e:
        return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
    except PageError as e:
        return wikipedia.page(title, auto_suggest=True, redirect=True)


def get_word_count(text: str) -> int:
    return len(text.split())

In [267]:
# Get the Wiki page on "List of countries and dependencies by population"
title = "List of countries and dependencies by population"

wikipedia_page = get_page(
    title
)  # Experiment with different values for auto_suggest and redirect when using the agent. See what happens
print("Word count:", get_word_count(wikipedia_page.content))
print(wikipedia_page.content)


Word count: 376
This is a list of countries and dependencies by population. It includes sovereign states, inhabited dependent territories and, in some cases, constituent countries of sovereign states, with inclusion within the list being primarily based on the ISO standard ISO 3166-1. For instance, the United Kingdom is considered a single entity, while the constituent countries of the Kingdom of the Netherlands are considered separately. In addition, this list includes certain states with limited recognition not found in ISO 3166-1. Also given in a percentage is each country's population compared with the world population, which the United Nations estimates at 8.13 billion as of 2024.


== Method ==

Figures used in this chart are based on the most up-to-date estimates or projections by the national census authority, where available, and are usually rounded off.
Where updated national data are not available, figures are based on the estimates or projections for 2024 by the Population 

### Exercise - Get permitted links from a wikipedia page
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 5-10 mins on this exercise.
```

When you get the links from a page using `page.links`, this will include every possible Wikipedia link that is accessible from the HTML on that page, including those that are not in the main page content (e.g. links in sidebars, links in footnotes etc.), which are either irrelevant or not permitted by the rules of the Wiki game. Write a simple `get_permitted_links` function, that only returns the links that can be found inside the main content. The resulting list of permitted links should be about a third as long as the list of links from the wikipedia API (with more variance for shorter articles as you would expect). 
<!-- When writing this function, if you manage to get the links in a very effective way, then do that. But remember that Wikipedia is written by a large number of different contributors, often adhering to inconsistent stylings (especially for smaller articles). We just need to get something that **works well enough**. Put more time into doing this effectively if you want at the end, but as soon as something plausibly works, you should move on.

<img src="https://imgs.xkcd.com/comics/code_lifespan_2x.png" width="400px" style = "margin-left: auto; margin-right: auto;display:block"></img> -->

In [268]:
def get_permitted_links(current_page: WikipediaPage) -> list[str]:
    """
    Get "permitted" links (i.e. links that are in the content of the page) from a Wikipedia page.
    """
    all_links = current_page.links
    content = current_page.content
    permitted_links = [link for link in all_links if link in content]
    return permitted_links

Finally, we've implemented a `get_content` function, which the agent will use to get the content of a Wikipedia page. This wraps all the texts that correspond to links in `<link></link>` tags (since otherwise they are presented as strings and indistinguishable from normal text.). 

TODO: DELETE THIS SINCE IT'S IN TOOL STUFF LATER ON?

<details><summary>Why not just use `page.links` to get a list of links directly?</summary>

We don't just present a list of the accessible links, as this is not very faithful to the wikipedia game. The agent does perform somewhat better if we just give it a list of links, but the task of parsing the content of wikipedia pages and isolating the most important links is where the majority of the challenge of the wikipedia game lies.

</details>

In [269]:
def get_content(page: WikipediaPage) -> str:
    content = page.content
    permitted_links = get_permitted_links(page)
    for word in sorted(permitted_links, key=len, reverse=True):
        content = re.sub(
            r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'""])""",
            r"\1<link>\2</link>\3",
            content,
            count=1,
            flags=re.IGNORECASE,
        )
    return content


wiki_page = get_page("Large language model")
print(get_content(wiki_page))

A large <link>language model</link> (LLM) is a computational <link>model</link> capable of language generation or other natural language processing tasks. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
The largest and most capable LLMs, as of August 2024, are artificial neural networks built with a decoder-only transformer-based architecture, which enables efficient processing and generation of large-scale text data. Modern models can be fine-tuned for specific tasks or can be guided by <link>prompt engineering</link>. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on.
Some notable LLMs are <link>OpenAI</link>'s GPT series of models (e.g., <link>GPT-3.5</link>, <link>GPT-4</link> and <link>GPT-4o</link>; use

## Build `WikiGame`

### Exercise - Build a class for the Wiki game
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 25-30 mins on this exercise.
```

Implement the following class that instantiates the wikipedia game. When the model uses tools it will be making calls to this class, so make sure that the functions return messages you're happy for the model to see as a tool response, such as error messages if the tool doesn't work.

Use code from the `get_permitted_links` and `get_content` functions above in this class.

In [270]:
class BaseWikiGame:
    def __init__(
        self,
        starting_page: str,
        goal_page: str,
    ):
        """
        Initialize the Wikipedia game object.

        Args:
            starting_page (str): The page the agent starts on.
            goal_page (str): The page the agent is trying to reach.
        """
        self.page_history: List[str] = [starting_page]
        self.starting_page: WikipediaPage = self.get_page(starting_page)
        self.goal_page: WikipediaPage = self.get_page(goal_page)
        self.current_page: WikipediaPage = self.starting_page

    # ========================= Helper Functions (given) =========================

    # Get page and page summary
    @staticmethod
    def get_page(title: str) -> WikipediaPage:
        """
        Get a Wikipedia page object by the title.

        Args:
            title (str): The title of the Wikipedia page.

        Returns:
            WikipediaPage: The Wikipedia page object.
        """
        try:
            return wikipedia.page(title, auto_suggest=False, redirect=True)
        except DisambiguationError as e:
            return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
        except PageError as e:
            return wikipedia.page(title, auto_suggest=True, redirect=True)

    def get_page_summary(self, page: WikipediaPage | None = None) -> str:
        """
        Get summary of a wikipedia page, to the last full stop within the first 500 characters. This is used to give a brief overview of the page to the agent.

        Args:
            page (WikipediaPage): The Wikipedia page object.

        Returns:
            str: The summary of the Wikipedia page.
        """
        page = page if page else self.goal_page
        summary = page.content[:500]
        last_period_index = summary.rfind(".")
        return summary[: last_period_index + 1] if last_period_index != -1 else summary

    # Get and check permitted links
    def get_permitted_links(self, title: Optional[str] = None) -> list[str]:
        """
        Returns a list of permitted links (i.e. links in the main page content) for the current page.

        Args:
            title (Optional[str]): The title of the Wikipedia page. If None, uses the current page.

        Returns:
            list[str]: The permitted links.
        """
        if title:
            page = self.get_page(title)
            all_links = page.links
            content = page.content
            permitted_links = [link for link in all_links if link in content]
        else:
            all_links = self.current_page.links
            content = self.current_page.content
            permitted_links = [link for link in all_links if link in content]
        return permitted_links

    def is_permitted_link(self, link: str) -> bool:
        """
        Returns True if the link is in the permitted links for the current page, False otherwise.

        Args:
            link (str): The link to check.

        Returns:
            bool: True if the link is permitted, False otherwise
        """
        return link.lower() in (x.lower() for x in self.get_permitted_links())

    # ========================= Task State Management (to implement) =========================

    @property
    def start_instruction(self) -> dict:
        """
        Generate the start instructions for the game.
        """
        return {
            "role": "system",
            "content": f"You are a wikipedia-racing AI. Your goal is to reach {self.goal_page.title} by accessing links from a series of wikipedia pages. Your current page is {self.current_page.title}.",
        }

    @property
    def on_page_instruction(self) -> dict:
        """
        Generate instructions for the current page.
        """
        return {
            "role": "user",
            "content": f"""You are currently on page: {self.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {self.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {self.goal_page.title} has the following summary: 
            
            [Begin Summary]
            {self.get_page_summary(self.goal_page)}
            [End Summary]""",
        }

    @property
    def next_step_instruction(self) -> dict:
        return {
            "role": "user",
            "content": f"What's your next step to get to {self.goal_page.title}?",
        }

    def get_instructions(self, system: bool, on_page: bool, next_step: bool) -> str:
        """
        Generate instruction messages based on the current game state.
        """
        messages = []
        if system:
            messages.append(self.start_instruction)
        if on_page:
            messages.append(self.on_page_instruction)
        if next_step:
            messages.append(self.next_step_instruction)
        return messages

    def move_page(self, new_page: str) -> str:
        """
        Changes the current page of the game. To be used when we want to move from Page A to Page B
        """
        new_page_lower = new_page.lower()
        new_page_normalized = new_page.replace("_", " ")

        if self.is_permitted_link(new_page_lower) or self.is_permitted_link(
            new_page_normalized
        ):
            self.current_page = self.get_page(new_page_normalized)
            self.page_history.append(self.current_page.title)
            return f"Moving page to {self.current_page.title}"
        else:
            return f"Couldn't move page to {new_page}. This is not a valid link."

    def check_win(self) -> bool:
        return self.current_page == self.goal_page


### Exercise - Build Tools for the Wiki Game
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 5-10 mins on this exercise.
```

Fill in the following tool classes that the agent will need to use to accomplish this game.
- For the `get_content_tool`, you should just fill in the description, since we've implemented the functionality for you.
- For the `move_page_tool`, you should implement both the `execute()` function and the `description()` property.

When formatting this tool list, refer back to the solution for you wrote for the arithmetic game, or else the docs are [here](https://platform.openai.com/docs/guides/function-calling).

In [271]:
class get_content_tool(Tool):
    name = "get_content"

    @staticmethod
    def execute(task: Any) -> str:
        content = task.current_page.content
        permitted_links = get_permitted_links(task.current_page)
        for word in sorted(permitted_links, key=len, reverse=True):
            content = re.sub(
                r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
                r"\1<link>\2</link>\3",
                content,
                count=1,
                flags=re.IGNORECASE,
            )
        return content

    @property
    def description(self):
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link you can select to move to will be wrapped in <link></link> tags.",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": [],
                },
            },
        }


class move_page_tool(Tool):
    name = "move_page"

    @staticmethod
    def execute(new_page: str, task: Any) -> str:
        """
        Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.

        Args:
            task (BaseWikiGame): The current task object.
            new_page (str): The title of the new page you want to move to. This should be formatted the way the title appears on wikipedia (e.g. to move to the wikipedia page for the United States of America, you should enter "United States"). Underscores are not necessary.
        """
        new_page_normalized = new_page.replace("_", " ")

        if task.is_permitted_link(new_page_normalized):
            task.current_page = task.get_page(new_page_normalized)
            task.page_history.append(task.current_page.title)
            return f"Moving page to {task.current_page.title}"
        else:
            return f"Couldn't move page to {new_page}. This is not a valid link."

    @property
    def description(self):
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "new_page": {
                            "type": "string",
                            "description": 'The title of the new page you want to move to. This should be formatted the way the title appears on wikipedia (e.g. to move to the wikipedia page for the United States of America, you should enter "United States"). Underscores are not necessary.',
                        }
                    },
                    "required": ["new_page"],
                },
            },
        }


get_content_tool_inst = get_content_tool()
move_page_tool_inst = move_page_tool()
wiki_game_tools = [get_content_tool_inst, move_page_tool_inst]

### Exercise - Build a WikiAgent
```c
🔴🔴🔴🔴⚪
🔵🔵🔵🔵🔵

You should spend up to 30-40 mins on this exercise.
```


Insturctions to give:
1. again, work out task decision tree 
2. let them implement handle_tool_call(), run(), start()

===================

Now that you have the `WikiGame` class and tools set up, build out a `WikiAgent` that can access these tools and solve the Wikipedia game. Build the agent so that it can be thrown into an agent loop (similar to the one we had for the arithmetic game) without much additional scaffolding. 

There are a few further considerations in this case that we didn't have for the arithmetic game. 

<details>
<summary>Context window considerations</summary>

Since the agent will need to read (potentially very long) Wikipedia articles to interact with the game, the length of the context window becomes relevant. GPT-4o and GPT-4o-mini both have context windows of 128k tokens (which corresponds to ~96k words) for reference, the wikipedia page for the United States has around 10k words alone and the agent will often need to visit more than 10 articles in one run of the game, not counting its own output, which eventually adds up to be significant. We'll solve this for now by resetting the messages of the agent every time it reaches a new wikipedia page, and providing an updated `user_message` (and possibly `system_message`) so that the agent can locate itself, and then proceed with the game. We'll address different methods for solving this issue later, you can probably already think of some. So be careful to include the current page and goal page for the agent in the `user_message`.

</details>
<br>


<details><summary>Providing information to the agent</summary>

There shouldn't be much on Wikipedia that the agent is completely unfamiliar with (AI companies *will* have scraped wikipedia), but it may be easily confused with something else, or be an article that was added before the training cutoff, and models can't always accurately recall information in their training data if they only come up once or twice. So you should use the game's get_summary function to provide details of the goal page to the agent in its initial message.

</details>
<br>

<details><summary> Getting output from the agent </summary>

In this case we'll have a lot more moving pieces than the `arithmeticGame` agent. In that case we could just print output from the agent loop. In this case, we strongly recommend that you should print output as it comes up in the agent class. If there's some chance you might not want to see this output, you should use a flag to determine whether to print content or not.

</details>

When making calls to the wikipediaGame class, make use of Python's `getattr()` function ([explanation here](https://www.w3schools.com/python/ref_func_getattr.asp)).



In [272]:
class WikiAgent(SimpleAgent):
    def __init__(
        self,
        task: BaseWikiGame,
        tools: List[Any],
        model="gpt-4o-mini",
        verbose: bool = True,
    ):
        super().__init__(model=model, tools=tools, task=task)

        self.chat_history = []
        self.full_chat_history = []  # All messages that have been sent in the chat history. We have to erase each time a new page is reached for context window reasons.
        self.verbose = verbose

    def update_history(self, message):  # WANT TO DELETE THIS FUNCTION
        """
        Update self.chat_history and self.full_chat_history with the message or list of messages.
        """
        if isinstance(message, list):
            self.chat_history.extend(message)
            self.full_chat_history.extend(message)
        else:
            self.chat_history.append(message)
            self.full_chat_history.append(message)

    def reset_history(self):
        """
        Empty self.chat_history of the agent.
        """
        self.chat_history = []

    def handle_tool_calls(
        self, response: ChatCompletionMessage
    ):  # WANT TO SIGNIFICANTLY REDUCE THE SIZE OF THIS FUNCTION IT SHOULD BE HANDLED WITH TOOL CALLS/OTHER FUNCTIONS. SHOULD NOT BE THIS LONG
        """
        Handles tool_calls:
            - Executes the tool calls using execute_tool_calls
            - Appends the original tool call & tool_responses to the history
            - If the agent has moved to a new page, resets the history
            - If not, get the "What's your next step?" instruction from the task and append it to history
        """

        # Append the original function calls to the conversation
        self.update_history(response)

        # Execute the tool calls
        tool_responses = self.execute_tool_calls(response)

        move_to_new_page = False

        # Handle tool responses
        for tool_call, tool_response in zip(response.tool_calls, tool_responses):
            self.update_history(apply_tool_call_format(tool_call, tool_response))

            if self.verbose:
                print(
                    f"\nTOOL CALL: \nTool = {tool_call.function.name}, Args = {tool_call.function.arguments} \nTOOL RESPONSE:\n {tool_response[:300]}"
                )

            if tool_response.startswith("Moving page"):
                move_to_new_page = True

        if move_to_new_page:
            # Reset the history
            self.reset_history()
            print(
                f"""{("-" * 50)} \n\nMOVED TO PAGE \n\nPATH HISTORY (N={len(self.task.page_history)}): {" -> ".join(self.task.page_history)} \n\n{("-"*50)}"""
            )

            # Give starting instructions
            self.start()

        else:
            # Append "What's your next step?" instruction
            next_step_message = self.task.get_instructions(
                system=False, on_page=False, next_step=True
            )
            self.update_history(next_step_message)
            if self.verbose:
                print(f"""\nUSER: \n{next_step_message[0]["content"]}""")

    def handle_refusal(self, response: ChatCompletionMessage):
        self.update_history(apply_assistant_format(response.refusal))
        if self.verbose:
            print(f"\nMODEL REFUSAL: {response.refusal}")

    def start(self):
        """
        Gives the starting instructions to the agent when starting the game.
        """
        instruction_message = self.task.get_instructions(
            system=True, on_page=True, next_step=False
        )
        self.update_history(instruction_message)
        if self.verbose:
            print(
                f"\nSYSTEM: \n{instruction_message[0]['content']} \n\nUSER: \n{instruction_message[1]['content']}"
            )

    def run(self):
        # Get the response from the model
        response = self.get_response()

        ## If response contains normal text, append this as an assistant response
        if response.content is not None:
            self.update_history(apply_assistant_format(response.content))
            if self.verbose:
                print(f"\nMODEL RESPONSE: \n{response.content}")

        # Handle the response
        ## If tool calls, do the tool calls and return the response
        if response.tool_calls:
            self.handle_tool_calls(response)

        ## If no tool call: Handle edge cases
        ### Check if there's a refusal to answer:
        elif response.refusal:
            self.handle_refusal(response)


### Exercise - Run the task
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 5 mins on this exercise.
```

Just like we did for the arithmetic agent, you should write an agent loop for the wikipedia agent (in this case you won't need to print output, as we handled it in the agent class so this function should be a very simple loop).

In [274]:
def agent_loop(agent, game, num_loops=10):
    agent.start()

    for i in range(num_loops):
        if game.check_win():
            print("Success!")
            return
        agent.run()


game = BaseWikiGame("Barack Obama", "India")
agent = WikiAgent(task=game, tools=wiki_game_tools)

In [275]:
agent_loop(agent, game, 10)


SYSTEM: 
You are a wikipedia-racing AI. Your goal is to reach India by accessing links from a series of wikipedia pages. Your current page is Barack Obama. 

USER: 
You are currently on page: Barack Obama. Make sure you start by reasoning about what steps you should take to get to the article on India. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, India has the following summary: 
            
            [Begin Summary]
            India, officially the Republic of India (ISO: Bhārat Gaṇarājya), is a country in South Asia.  It is the seventh-largest country by area; the most populous country with effect from June 2023; and from the time of its independence in 1947, the world's most populous democracy.
            [End Summary]

MODEL RESPONSE: 
To reach the article on India from the page on Barack Obama, I'll need to consider potenti

In [276]:
for message in agent.full_chat_history:
    try:
        print(str(message.content))
    except:
        print(message["content"])


You are a wikipedia-racing AI. Your goal is to reach India by accessing links from a series of wikipedia pages. Your current page is Barack Obama.
You are currently on page: Barack Obama. Make sure you start by reasoning about what steps you should take to get to the article on India. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, India has the following summary: 
            
            [Begin Summary]
            India, officially the Republic of India (ISO: Bhārat Gaṇarājya), is a country in South Asia.  It is the seventh-largest country by area; the most populous country with effect from June 2023; and from the time of its independence in 1947, the world's most populous democracy.
            [End Summary]
To reach the article on India from the page on Barack Obama, I'll need to consider potential links that connect various topics 

# 4️⃣ Elicitation

You may have observed that while the above implementation of `WikiAgent` succeeds at Obama -> India, it fails at ... However, this doesn't mean that GPT-4o-mini does not have this capability, but this capability might be blocked because we:

- Prompted the model poorly

- Stored the history poorly.

- Didn't give the model sufficient tools to accomplish the task.

- ...

In general, it is hard to show that the model does not have a capability, even if *we* failed to demonstrate this capability. For example, it took *3.5 years* after the release of GPT-2 (and 2.5 years after the release of GPT-3) for people to discover that [chain-of-thought reasoning](https://arxiv.org/abs/2201.11903) massively improves model performance. A failure mode for AI safety is that people discover similar breakthroughs that significantly increases model performance with minimal additional training, which is not accounted for in our safety evaluations. Thus, LLM agent evals aim to elicit the best capability we possibly can, until we feel we've managed to gain [**evidence of absence**](https://en.wikipedia.org/wiki/Evidence_of_absence), **not** just **absence of evidence**.


Broadly speaking, there are two categories of elicitation, narrow elicitation and general elicitation:

1. Narrow elicitation: methods that improve model performance on a particular task, or small class of tasks, but won't necessarily impact model performance in general across many tasks. 
    - E.g. Give the model access to the content of arbitrary wikipedia articles - This will improve performance on this task significantly, but wouldn't generalize to other tasks.

2. General elicitation: methods that improve model performance on a wide array of possible tasks. 
    - E.g. Chain-of-thought prompting - This tends to improve model performance on a wide array of tasks. These sorts of elicitation methods are the ones we're most interested in, as if researchers find an improvement to models that is roughly as easy and effective as chain-of-thought prompting, then we would see a very rapid increase in risk from AI.


We will try:
1. Prompt engineering
2. ReAct
3. Inflexion

Then, you 

## Prompting

You should already be aware that prompting can have a large impact on model performance. There are a large number of possible changes to prompts for this task. You should experiment first with more general elicitation methods such as getting the agent to think more deeply, and output plans in different ways. 

After this, you might try a wide array of narrow elicitation methods including:

- Telling the agent how many pages it's visited.

- Telling the agent if it's already visited the page it's on (and how many times).

- Schedule different prompts and planning methods for the "zoom out" and "zoom in" sections of the game, since the general strategy for the wikipedia game looks like:

    Specific article (with few links) -> General article (with many links) -> Specific article (with few links)


### Exercise - Engineer prompts
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 20-25 mins on this exercise.
```
This exercise should be scoped as solving a specific problem;achieving a specific observable thing. Otherwise, it's hard to tell when "you are done" with changing the prompt.
=============

Remember that your prompts obviously will have to be robust to: 

* Different tasks within the wikipedia game, 

* Different states within those tasks,

* Different failure-modes the agent could encounter.

Mess around with the prompting setup and see if you can significantly improve performance.

In [262]:
class WikiGame(BaseWikiGame):
    @property
    def start_instruction(self):
        return {
            "role": "system",
            "content": f"You are a wikipedia-racing AI. Your goal is to reach {self.goal_page.title} by accessing links from wikipedia pages. Your current page is {self.current_page.title}.",
        }

    @property
    def on_page_instruction(self):
        return {
            "role": "user",
            "content": f"""You are currently on page: {self.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {self.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {self.goal_page.title} has the following summary: 
            
            [Begin Summary]
            {self.get_page_summary(self.goal_page)}
            [End Summary]
            """,
        }

In [263]:
game = WikiGame("Aristotle", "Othello")
agent = WikiAgent(game, model="gpt-4o-mini")
agent_loop(agent, game, 5)

TypeError: WikiAgent.__init__() missing 1 required positional argument: 'tools'

### Exercise - Implement the ReAct framework
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 10-15 mins on this exercise.
```

Chain-of-thought prompting confers significant benefits to model performance, and you probably tried it when you messed around with prompting above. But when we're using LLMs as agents, we may want to provide a different structure to elicit reasoning. This is called the [**ReAct** framework](https://arxiv.org/abs/2210.03629); it consists of:

- Getting the model to generate **Re**asoning about its current situation, and what sort of actions it should consider taking.

- Then getting the model to perform an **Act**ion based on its outputted reasoning.

Remember that if you're calling the model without tools, it won't have a description of the tools in its system message, so we'll have to ensure that the tool descriptions are in the `system_message` (this will lead to some redundancy when the model takes an action, but that's alright).

In [226]:
class WikiGameReAct(WikiGame):
    def __init__(self, starting_page: str, goal_page: str, tools = None):
        super().__init__(starting_page, goal_page, rules)
        self.tool_descriptions = tools
    
    def add_tool(self, tools: dict | List[dict]):
        if isinstance(tools, dict):
            self.tool_descriptions.append(tools)
        elif isinstance(tools, list):
            self.tool_descriptions.extend(tool)

    @property
    def start_instruction(self):
        # Provided a description of the tools in the system message. When generate is called with tools this is redundant, but when generate is called without tools, this is useful.
        return {
            "role" : "system",
            "content" : f"""You are a wikipedia-racing AI. Your goal is to reach {self.goal_page.title} by accessing links from wikipedia pages. Your current page is {self.current_page.title}. You have access to {str(len(self.tool_descriptions))} tools:\n{"\n".join([tool["function"]["name"] + ": " + tool["function"]["description"] for tool in self.tool_descriptions])}"""
        } 
    
    @property
    def apply_user_format(self):
        # You may or may not want to edit your standard user message
        return {
            "role" : "user",
            "content" : f"""You are currently on page: {self.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {self.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {self.goal_page.title} has the following summary: 
            
            [Begin Summary]
            {self.get_page_summary(self.goal_page)}
            [End Summary]
            """
        }
    
class WikiReActAgent(WikiAgent):

    def generate_reason(self) -> ChatCompletionMessage:
        # Get the model to reason about the current state of the game and add the response to the messages (you may not want to give it tools for this)
        self.chat_history.append(apply_user_format("Think carefully about your current situation and what actions you want to take to get closer to" + self.task.goal_page.title + "."))
        response = self.get_response(use_tool=False)
        return response
        
    def generate_action(self) -> ChatCompletionMessage:
        # Get the model to generate an action based on the reasoning and add the response to the messages
        self.chat_history.append(apply_user_format("What action do you want to take?"))
        response = self.get_response()
        return response
    
    def generate_reason_and_action(self):
        """
        Generate a reason, store this in history, then generate and return an action.
        """ 
        reason = self.generate_reason()
        self.update_history(apply_assistant_format(reason.content))
        print("\nModel response ('Reason'):", reason.content)

        action = self.generate_action()

        return action

    def run(self):
        """
        Run one loop of the agent.
        """
        response = self.generate_reason_and_action()

        if response.tool_calls:
            self.handle_tool_calls(response)
        elif response.refusal:
            self.handle_refusal(response)
        


You may have to rewrite your `agent_loop`.

In [200]:
def agent_loop_ReAct(game, agent, num_loops = 10):
    agent.start()
    for i in range(num_loops):
        if game.check_win():
            print("Success")
            return
        agent.run()

In [228]:
game = WikiGameReAct("Aristotle", "Othello", tools=wiki_game_tools)
agent = WikiReActAgent(game, model="gpt-4o-mini", tools = wiki_game_tools)
agent_loop_ReAct(game, agent)

### Exercise - Give the agent a history of visited pages
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 5-10 mins on this exercise.
```

this should include writing a function that updates self.chat_history different, otherwise should be merged with prompting.

=====================

You may notice that the agent frequently gets stuck in loops. Since we're already storing a history of page titles in the game class, we should try providing this information to the agent and see if it improves the looping behavior. Implement this below:

In [218]:
class WikiGameHistory(WikiGameReAct):
    
    @property
    def on_page_instruction(self):
        return {
            "role" : "user",
            "content" : f"""You are currently on page: {self.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {self.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {self.goal_page.title} has the following summary: 
            
            [Begin Summary]
            {self.get_page_summary(self.goal_page)}
            [End Summary]
            
            The pages you've visited so far has been: {" -> ".join(self.page_history)}"""
        }

### Exercise - Implement a reflexion tool
```c
Difficulty: 🔴🔴🔴⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 10-15 mins on this exercise.
```
Inflexion should be better explained, with a diagram, not just in 1 sentence.

=====================


[This paper](https://arxiv.org/abs/2303.11366) finds better performance by LLMs on tasks when they can perform "lookahead" and get feedback on their plans. We will imitate this by allowing the agent to suggest candidate paths, and informing it where these paths go wrong (if they do). You'll need to add this tool to the list of tools.

We don't want to provide the agent the links/content of every page when it does this lookahead, as then we'd just be reimplementing a smaller version of the game *inside the game*. Instead, we'll let the agent suggest paths without seeing any content or links, and then let it know if this path works. It's very likely that a suggested link will, at some point, not be accessible from one of the pages, but this should still help to guide the agent's plans.

In [121]:
class WikiGameTestPath(WikiGame):
    def __init__(self, starting_page : str, goal_page : str):
        super().__init__(starting_page, goal_page, rules)

        
    def test_path(self, path: str) -> str:
    """
    Test if a given path is valid.

    Args:
    path (str): A string representing a path, e.g., "Barack Obama -> Indonesia -> India"

    Returns:
    str: A message indicating whether the path is valid or where it fails.
    """
    path_nodes = [node.strip() for node in path.split("->")]
    
    if not path_nodes:
        return "ERROR: Empty path provided."
    
    if path_nodes[0] != self.current_page.title:
        return f"ERROR: The path should start with the current page: {self.current_page.title}"
    
    for i in range(len(path_nodes) - 1):
        current_node = path_nodes[i]
        next_node = path_nodes[i + 1]
        
        permitted_links = set(link.lower() for link in self.get_permitted_links(current_node))
        
        if next_node.lower() not in permitted_links:
            return f"This path works until {next_node}, which is not accessible from {current_node}"
    
    return "This path is valid."

Now write a description of this tool and add it to the list of `wiki_game_tools`.

In [122]:
test_path_tool = {
    "type" : "function",
    "function" : {
        "name" : "test_path",
        "description" : "Accepts a test path string in the form \"current_page -> page1 -> page2 -> ... -> pageN\" and if the path does not work, then it returns where the path goes wrong, if the path does work it returns \"success.\" Be careful that path titles can be sensitive to plurals or rephrasings. This tool is especially useful to check longer plans.",
        "parameters" : {
            "type" : "object",
            "properties": {
                "path" : {
                    "type" : "string",
                    "description" : "The path you want to test, formatted as \" current_page -> page1 -> page2 -> ... -> pageN\"."
                },
            },
            "required" : ["path"]
        }
    }
}

wiki_game_tools = [get_content_tool, move_page_tool, test_path_tool]

In [None]:
game = WikipediaGameTestPath("William Pitt the Younger", "Central Vietnam")
agent = WikiAgentHistory(game, model="gpt-4o-mini", tools = wiki_game_tools)
agent_loop_ReAct(game,agent, 40)

## Tool use

This should be put into "Bonus" section, where people just play around, and we give some bullet suggestions of things to try.

==============

We can also give the agent additional tools that may be useful for the wikipediaGame task, or more general tooling methods. 



**[JAMES COMMENT]** I still need to figure out what to say about tool use. If you have any ideas then open to suggestions :) Lilian Weng did a little "humans use tools" and so do some animals thing. A cute animal pic might actually go over quite well here IMO.




 but if you give the agent too many tools (especially with poor descriptions), then performance can often suffer. This happens most prominently when using more than 5-10 tools.

### Exercise - Implement a page summary tool
```c
Difficulty:🔴🔴⚪⚪⚪
Importance:🔵🔵⚪⚪⚪

You should spend up to 10-15 mins on this exercise.
```

Implement a tool that allows an agent to get a summary of an accessible page. This imitates wikipedia's native 'hover summary' tool.


In [125]:
get_page_summary = {
    "type" : "function",
    "function" : {
        "name" : "get_page_summary",
        "description" : "Get the summary of a wikipedia page you are considering moving to, to the last full stop within the first 500 characters. The page needs to be accessible via a link from the current page. Anything which corresponds to a link you can select will be wrapped in <link></link> tags",
        "parameters" : {
            "type" : "object",
            "properties" : {
                "page" : {
                    "type" : "object",
                    "description" : "The wikipedia page you want to get the summary of."
                }
            },
            "required" : ["page"]
        }
    }
}

class WikipediaGamePageSummary(WikipediaGameTestPath):
    def __init__(self, starting_page : str, goal_page : str):
        super().__init__(starting_page, goal_page)

        
    def get_page_summary(self, page : WikipediaPage) -> str:
        if is_permitted_link(self, page):
            summary = page.content[0:500]
            return summary[0: summary.rindex(".")+1]
        else:
            return "This page is not accessible from the current page, so a summary cannot be returned."

wiki_game_tools.append(get_page_summary) # Needs to be changed

### Exercise - Implement an arbitrary page summary/content tool
```c
Difficulty:🔴⚪⚪⚪⚪
Importance:🔵🔵⚪⚪⚪

You should spend up to 5-10 mins on this exercise.
```

Now implement a tool that allows the agent to suggest any wikipedia page, and get a brief summary of it. This may be helpful for the agent to formulate plans into the future.


In [None]:
get_page_content = {
    "type" : "function",
    "function" : {
        "name" : "get_page_content",
        "description" : "Get the content of a wikipedia page you are considering moving to. Anything which corresponds to a link you can select will be wrapped in <link></link> tags.",
        "parameters" : {
            "type" : "object",
            "properties" : {
                "page" : {
                    "type" : "object",
                    "description" : "The wikipedia page you want to get the content of."
                }
            },
            "required" : ["page"]
        }
    }
}

class WikipediaGamePageContent(WikipediaGamePageSummary):
    def __init__(self, starting_page : str, goal_page : str):
        super().__init__(starting_page, goal_page)
    def get_page_content(self, arguments : dict) -> str:
        page = arguments["page"]
        content = page.content
        return content

### Exercise - Implement a ctrl-F tool

Still need to do this. Probably will though. Not super urgent. Might add more elicitation stuff later if I think of any that seem cool

### Supervised Fine-Tuning

We're not going to conduct supervised fine-tuning here. But it's worth mentioning as an elicitation method, just because it can be so powerful. [ADD MORE INFO HERE LATER]

# 5️⃣ Bonus

### Exercise - Implement additional rules

Test agent performance on these tasks:
- Task 1

- Task 2

- Task 3

- Task 4

- Task 5

- Task 6

- Task 7

- Task 8

- Task 9

- Task 10

See what combination of tools appears to work best.

### Exercise - Rearrange so that each page is broken up by sections

In [70]:
x = get_page("Aristotle")
print(dir(x))
print(x.section("Metaphysics/Substance"))