# [3.4] LLM Agent Evaluations

# Setup (don't read just run)

In [222]:
try:
    import google.colab  # type: ignore

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

import os, sys

chapter = "chapter3_llm_evals"
repo = "ARENA_3.0"

if IN_COLAB:
    # Install packages
    %pip install openai
    %pip install anthropic
    %pip install jaxtyping
    %pip install wikipedia

    # Code to download the necessary files (e.g. solutions, test funcs) => fill in later

else:
    pass  # fill in later


In [223]:
import json
import os

import wikipedia
from wikipedia import WikipediaPage
from wikipedia import DisambiguationError, PageError
from openai import OpenAI
from openai.types.chat.chat_completion_message_tool_call import (
    ChatCompletionMessageToolCall,
)
from openai.types.chat.chat_completion_message import ChatCompletionMessage
from anthropic import Anthropic
from utils import establish_client_OpenAI
from utils import retry_with_exponential_backoff
from typing import Literal, Optional, Dict, List, Any
from abc import abstractmethod
import math
import re
from utils import countrylist
from utils import evaluate_expression, apply_user_format, apply_assistant_format

# Test the function

# 1️⃣ Intro to LLM Agents

## What is an LLM agent?
<!---
Points to make:
- "LLM agent" - a "scaffolding" program (i.e. Python program) that interacts with an LLM API. Include a version of METR "Evaluating Language-Model Agents on Realistic Autonomous Tasks" Figure 2
    - Define scaffolding
- More schematic breakdown of possible scaffolding: "tools" (describe what this means, what "tool calling" is), "memory" (Probably better move to "Build Agent" section! I've expanded this section there)
- Mention list of examples of prominent LLM agents:
    - [Minecraft LM Agent](https://arxiv.org/abs/2305.16291)
    - [AutoGPT](https://autogpt.net/) and [LangChain](https://www.langchain.com/)

==========================
--->
An LLM "agent" consists of **a scaffolding program interacting with an LLM API**. Initially, the scaffolding program sends instructions to the LLM on the task goal, the actions available to the LLM, and any relevant task information. The LLM then interacts with the scaffolding program in a sequence of steps, taking actions and observing their results. The scaffolding program will perform any actions or interactions with the task as instructed by the LLM; it will also return the outcome of these actions to the LLM agent. This allows the LLM to solve a task relatively autonomously (i.e. with little human input).

The two main elements of scaffolding are:
- Tool calling: This provides a description of a tool to the agent, which it can choose to use during the evaluation. If it uses a tool, the scaffolding will execute this tool on the agent's behalf (usually by running a Python function) and return the result of the tool call to the agent.

- Prompting: This is how the task state is described to the LLM. We can also use prompting to help assist the LLM in its tool use, or instruct it to use chain-of-thought to give the LLM more "thinking time." 

The LLM interacts with the scaffolding program to complete a task according to the following steps:

1. The LLM receives input (task description, current state, available actions etc.) from the scaffolding program.
2. The LLM processes the input and outputs an action (e.g. "Use calculator tool").
3. The scaffolding program executes the action the agent took and returns the outcome (e.g. it would run `calculate()` in the background for an LLM using a calculator, and then return the function output to the agent).
4. The LLM receives the results and decides the next action.
5. The cycle repeats until the task is complete.
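
The cycle described above can be sketched without any real API. This is a minimal illustration only: `scripted_llm`, `TOOLS`, and `run_scaffold` are hypothetical names invented for this example, and the "LLM" is a hard-coded stand-in that requests the calculator once and then answers.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: maps a tool name to a plain Python function.
# This toy calculator only understands sums such as "2+3".
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(float(x) for x in expr.split("+"))),
}


def scripted_llm(history: List[dict]) -> dict:
    """Stand-in for an LLM API call: asks for the calculator once, then answers."""
    last = history[-1]
    if last["role"] == "tool":
        return {"type": "final", "content": f"The answer is {last['content']}"}
    return {"type": "tool", "name": "calculator", "input": "2+3"}


def run_scaffold(task: str, llm: Callable[[List[dict]], dict], max_steps: int = 5) -> str:
    # The scaffolding sends the task description to the LLM
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # The LLM processes the input and outputs an action
        action = llm(history)
        if action["type"] == "final":
            return action["content"]
        # The scaffolding executes the action on the LLM's behalf...
        result = TOOLS[action["name"]](action["input"])
        # ...and returns the outcome so the LLM can decide what to do next
        history.append({"role": "tool", "content": result})
    # Step limit guards against the agent getting stuck in a loop
    return "Gave up: step limit reached"


answer = run_scaffold("Calculate 2+3", scripted_llm)
print(answer)
```

The step limit is a design choice worth keeping in real scaffolds too, since agents can get stuck repeating the same action.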


[Insert METR diagram]

Some examples of LLM agents are:

- [Voyager](https://arxiv.org/abs/2305.16291) (Minecraft LLM Agent)

- [AutoGPT](https://autogpt.net/)

- [LangChain](https://www.langchain.com/)


<!-- An LLM agent consists of 4 main things [I think a better list exists here, "reasoning engine" is quite unclear/vague and scaffolding doesn't make sense as a bullet point in how we've defined it; also maybe move this to start of section "Build agent?"].

- A 'reasoner' or 'reasoning engine.' (Some people also call this a 'world model'). For LLM agents this is a large language model.

- Tools which allow the agent to act in the environment.

- Memory so that the agent can recall prior actions. This can either be:

    - Short-term memory: In the context of LLM agents this is generally the context window

    - Long-term memory: There are many cases where context-windows are too short, and we will need to give the agent high-level information about actions it took a long time ago. There are many methods to store this 'long-term memory' for agents (see some methods [here])

- Scaffolding: This is essentially any structure which we provide to the 'reasoning engine' in order to help it to reason better, such as:

    - Prompting frameworks.

    - The ability to trial plans into the future.

    - Using subagents to take care of subtasks.

    - Subgoal decomposition.

EXCALIDRAW!

## How should we evaluate LLM agents?

Points to make - "Why evaluate LLM agents":
- overall note: I think this could heavily be based on this [video](https://www.youtube.com/watch?v=KO72xvYAP-w) from METR
- The purpose of agent evals is to **unlock and measure the full capabilities of a model**, to avoid underestimating the model and to better estimate the **ceiling** of their capability and potential to cause harm. 
- Models often fail in easily-fixable ways. For example, when it is solving a hacking task, it
    - Can refuse due to ethics or (claimed) inability 
    - Can give up and ask the user for help 
    - Can get stuck in loops 
    - Can hallucinate facts or conclusions [only partly fixable] 
    - Can be limited by primitive tools 
    - Can have bugs
    - ...
- For a model that fails to solve a hacking task, thus deemed safe, there might exist simple fixes (e.g. better prompts, better file manipulation tools) that unlock this dangerous capability. 
- 
- Final point about "quantifying" the amount of scaffolding to make eval results more quantitative
    - Apollo "Science of Evals" 
    - GDM quantify bits
==========================
--->
## How should we evaluate LLM agents?

There are two possible purposes of LLM agent evaluations:

- The first is to **unlock and measure the full capabilities of a model**. We don't want to underestimate current or future LLMs, so we want to establish the **ceiling** of their capabilities and potential to cause harm.
- The second is to **determine the alignment properties of LLMs in agentic scenarios**. Most of our current alignment techniques (supervised fine-tuning, RLHF, ...) are focused on chatbot contexts for LLMs. However, LLM agents have the potential to cause much greater harm, and we are currently less confident about how well RLHF and supervised fine-tuning will work in these contexts.

LLM agents generally fail in easy-to-fix ways, as you will see. For example:

- They often claim to be incapable of tasks that they can actually perform.

- They can easily get stuck in loops.

- They can give up and ask the user for help.

- They can hallucinate facts, or even misunderstand their own prior reasoning and hallucinate a faulty conclusion.

- They can be overly or insufficiently sensitive to information in their prompts.

- They can simply have bugs.

This means that when models fail to accomplish tasks, there may exist simple fixes that will unlock a capability. Since we want to rule out large capability improvements from relatively little effort, we have to try quite hard to tune the prompts, tool descriptions, and tool outputs just right, so that we can see LLM agents at their *best*.
<!--->
We know today that LLMs are more than just chatbots. In fact, since the release of ChatGPT, the use of LLMs in agentic systems has proliferated significantly. These agents started off rather disappointingly, when they were based on GPT-3.5. However, as more powerful LLMs come out and AI companies ensure their LLMs are better at tool use, these agents are improving rapidly.

The main concerns for LLM agents that we want to mitigate are:

- There are many possible improvements for increased performance from LLM agents, and these improvement methods are often significantly cheaper and easier to implement than training the base model, for example by writing better prompts, or providing easier-to-use tools.

- Current fine-tuning and RLHF/Constitutional AI methods are mostly targeted towards chatbot-style text output. We aren't as confident about how such methods will generalize to agentic scenarios.

The first concern here relates to the **capabilities** of LLM agents, and the second relates to the **alignment** properties of LLM agents. The agent we'll be building will be testing for the **capability** properties of agents.
<!--->

<details><summary>Further resources on LLM evaluations:</summary>

- [OpenAI Function Calling Guide](https://platform.openai.com/docs/guides/function-calling)

- [Anthropic Function Calling Guide](https://docs.anthropic.com/en/docs/build-with-claude/tool-use)

- [Evaluating Language-Model Agents on Realistic Autonomous Tasks](https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf) (Kinniment et al., ARC Evaluations Team (now METR), 2023)

- [Large Language Models can Strategically Deceive their Users when Put Under Pressure](https://arxiv.org/pdf/2311.07590) (Scheurer et al., Apollo Research, ICLR 2024)

- [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) (Lilian Weng, OpenAI Safety Team, 2023)

- [AXRP Episode 34 - AI Evaluations with Beth Barnes](https://www.alignmentforum.org/posts/vACr4DExfeRMaCoo7/axrp-episode-34-ai-evaluations-with-beth-barnes) (Daniel Filan, 2024)

- [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/pdf/2303.11366) (Shinn et al., 2023)

- [Answering Questions by Meta-Reasoning over Multiple Chains of Thought](https://arxiv.org/pdf/2304.13007) (Yoran et al., 2024)

- [Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/pdf/2302.04761) (Schick et al., Meta AI Research, 2023)
</details>

# 2️⃣ Build a Simple LLM Arithmetic Agent

We will start by building a simple LLM agent that solves arithmetic problems. LLMs struggle with arithmetic, but we can drastically improve their performance by providing a simple calculation tool. We'll try the model with and without tools on this task, and see how significantly performance improves.

To build this, we will implement 4 things:
- The `ArithmeticTask` class handles arithmetic problem generation and solution verification.
- The `CalculateTool`, a tool that LLM agents can use to solve the task.
- The `ArithmeticAgent` class handles interacting with the LLM API, doing the calculation, and keeping track of the overall task progress.
- The `agent_loop()` function defines the interaction loop between the task and the agent to execute the task.

In general, ... [description of how to think about designing task and agent in generation, include decision factors] probably good to include a diagram here 

We will build these in order: first the task, then the tool, then the scaffold and the agent, and finally the loop that ties them together.

## Defining the Task

### Exercise - Build a simple arithmetic problem
```c
Difficulty: 🔴🔴🔴⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 20-25 minutes on this exercise.
```

In an LLM agent eval, there will usually be a `Task` class, which interacts with the `Agent`. In general, the `Task` class will:

- Prepare and provide the task instruction (and necessary files, functions etc) to the agent,

- Parse and score the agent's output,

- Update the task state accordingly (e.g. proceeds onto the next step of the task, ends the task).

We will build a toy task called `ArithmeticTask`. This task takes in two numbers and creates a list of arithmetic calculation problems with these two numbers, using the arithmetic operations defined in `operations`. It should have methods to do the following:

- Get the current problem (e.g. at the start this will be `"Calculate num1 + num2"`),

- Check if a given answer is correct,

- Update the current problem (depending on whether the answer generated by the model was correct),

- Check if all problems have been solved.

<details><summary>Aside: Handling calculations</summary>

When we handle the calculations for the model, technically we could use Python's `eval()` function (this is what [Anthropic did](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/calculator_tool.ipynb)(!)). However, this function evaluates an arbitrary string expression, and so allows AI models to run arbitrary code. In the long-run, we're trying to do these evaluations on models which we suspect of being dangerous; so even though we could probably trust the current suite of language models offered by OpenAI and Anthropic, we should get into the good habit of not running arbitrary code outputted by language models (except in very carefully set-up environments). It's also just plain bad coding practice.

To this end, we've implemented an `evaluate_expression()` function for you to use instead, which should accomplish the same thing without running arbitrary code. It should already be imported from `utils`.
 
 </details>
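
To make the aside above concrete, here is a minimal sketch of how an arithmetic-only evaluator could be written with Python's `ast` module. This is an illustration under stated assumptions, not the actual `evaluate_expression()` from `utils` (whose implementation is not shown here); the name `safe_arithmetic` is hypothetical. The idea is to parse the string into a syntax tree and only evaluate node types that are explicitly allowed, so anything else (function calls, attribute access, names) is rejected.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelist of permitted operators; anything not listed here is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arithmetic(expression: str) -> Number:
    """Evaluate a purely arithmetic expression (hypothetical helper, not from utils)."""

    def _eval(node: ast.AST) -> Number:
        # Numeric literals only (bool is a subclass of int, so exclude it explicitly)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Calls, names, attributes, etc. all end up here
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_arithmetic("10 + 15"))
print(safe_arithmetic("10 // 15"))
```

Unlike `eval()`, a string such as `"__import__('os').getcwd()"` raises a `ValueError` here, because a function call is not one of the whitelisted node types.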


In [None]:
class ArithmeticTask:
    def __init__(self, num1: int | float, num2: int | float):
        self.num1 = num1
        self.num2 = num2
        self.operations: List[str] = ["+", "-", "*", "/", "%", "//"]
        self.correct_answers: Dict[str, float] = self._generate_answers()
        self.is_solved: Dict[str, bool] = {expr: False for expr in self.correct_answers}
        self.current_task_number = 0

    def _generate_answers(self) -> Dict[str, float]:
        """
        Generates a dictionary of the correct answers for all possible tasks

        Returns:
            Dict[str, float]: A dictionary with the expression as key and the correct answer as value
        """
        return {}

    @property
    def get_current_task(self) -> str:
        """
        Gets the current task for the agent

        Returns:
            str: A string containing the current task
        """
        return ""

    @property
    def current_task_instruction(self) -> str:
        """
        Gets a string containing instructions for the current task for the agent. (This will be fed to the agent as a user prompt)

        Returns:
            str: A string containing the instructions for the current task
        """
        return ""

    def check_solved(self) -> bool:
        """
        Checks if all tasks have been solved

        Returns:
            bool: True if all tasks have been solved, False otherwise
        """
        return True

    def check_answer(self, model_answer: str) -> bool:
        """
        Checks if the model's answer is correct

        Args:
            model_answer (str): The model's answer

        Returns:
            bool: True if the model's answer is correct, False otherwise
        """
        return True

    def update_current_task(self):
        """
        Sets is_solved for the current task to True and increments self.current_task_number by one
        """
        pass


x = ArithmeticTask(10, 15)
for problem, answer in x.correct_answers.items():
    print(f"{problem} = {answer}")

<details><summary>Aside - What is @property?</summary>

The `@property` decorator in Python is used to define methods that behave as if they were attributes.

1. It allows you to access a method as though it were an attribute, without parentheses.
2. It allows you to perform functions when calling attributes, e.g. adding validation or performing any necessary calculations (in our case incorporating class attributes which frequently change).

For example, if we defined a `Square` class as follows:

```python
class Square:
    def __init__(self, side_length):
        self.side_length = side_length

    @property
    def perimeter(self):
        return self.side_length*4
```

Then we could access `perimeter` as if it were an attribute:

```python 
s = Square(4)
print(s.perimeter) # Output: 16
```

Using `@property` in this case helps with:
1. Making the intent of the code clearer
2. Making it slightly easier to access these "properties" of the class

</details>

## Function Calling

**Function calling** is a feature of LLM Chat APIs that allows the LLM to use external "tools" (i.e. Python functions, APIs) by simply receiving and outputting text. There are 5 simple steps to function calling:

1. Pick a function in your codebase that the model should be able to call (in this case, we will pick the function `calculate()` from our task class)

2. Describe your function to the model (following the syntax of the model's API) so it knows how to call it

3. Pass your function definitions as available “tools” to the model, along with the messages (following the syntax of the model's API)

4. Receive and handle the model response

5. Provide the function call result back to the model 

Chat models like ChatGPT and Claude are fine-tuned to recognize and respond to `tool` descriptions appropriately (just like `user` and `system` messages). In this way, you can allow LLMs to do complex actions like run code, make calls to other APIs, manipulate files etc. We do this by parsing their response output, executing the functions they've called ourselves, and then feeding the results back into the model so it can reason about them and take the next steps. This function-calling loop is the simplest version of an LLM agent, but more advanced LLM agents follow the same logic (except with more advanced tools and more complex task structures to permit more autonomous actions etc.).

[DIAGRAM]


### Exercise - Write a tool class for function calling
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 10-15 minutes on this exercise.
```

When writing tools, there will be two methods that need to be defined. The first is the `execute()` function. This should take in an arithmetical expression (e.g. `"3+5"`) and output the result of this expression (also as a string). The `execute()` function should always take the task as a variable (as often tools will need to be able to make significant changes to the task).

<details><summary>Aside: Handling calculations</summary><br> When we handle the calculations for the model, technically we could use Python's <code>eval()</code> function (this is what <a href = "https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/calculator_tool.ipynb">Anthropic did</a>(!)). However, this function evaluates an arbitrary string expression, and so allows AI models to run arbitrary code. In the long-run, we're trying to do these evaluations on models which we suspect of being dangerous; so even though we could probably trust the current suite of language models offered by OpenAI and Anthropic, we should get into the good habit of not running arbitrary code outputted by language models (except in *very* carefully set-up environments). To this end, we've implemented an <code>evaluate_expression</code> function for you to use instead. It should already be imported from <code>utils</code>.</details>

We then need to write the `description` property of our `"calculator"` function, so we can give it to our LLM agent as a tool. The syntax may differ between APIs (e.g. the OpenAI API has a different syntax than the Anthropic API). Read OpenAI's [function calling guide](https://platform.openai.com/docs/guides/function-calling) to learn the syntax. The `description` property should just return a tool description (in the necessary JSON format).

So ultimately your tool should be defined according to the following structure:

In [254]:
class Tool:
    @staticmethod
    def execute(task: Any, input: str) -> str:
        raise NotImplementedError

    @property
    def description(self) -> dict:
        raise NotImplementedError

Here are some good practices for writing tool descriptions for Claude (according to Anthropic); they should generalize to other chat models:
- Provide extremely detailed descriptions. This is by far the most important factor in tool performance. Your descriptions should explain every aspect of the tool, including:

    - What the tool does

    - When it should be used (and when it shouldn’t)

    - What each parameter means and how it affects the tool’s behavior

    - Any important caveats or limitations, such as what information the tool does not return if the tool name is unclear.

    The more context you can give Claude about your tools, the better it will be at deciding when and how to use them. Aim for at least 3-4 sentences per tool description, more if the tool is complex.
    
- Prioritize descriptions over examples. While you can include examples of how to use a tool in its description or in the accompanying prompt, this is less important than having a clear and comprehensive explanation of the tool’s purpose and parameters. Only add examples after you’ve fully fleshed out the description.

Read Anthropic's examples of what good and bad tool descriptions look like [here](https://docs.anthropic.com/en/docs/build-with-claude/tool-use#example-of-a-good-tool-description).
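
As an illustration of these practices, here is a sketch of a tool description for a deliberately different, hypothetical tool (`convert_temperature`, which does not exist anywhere in this notebook). The layout follows the OpenAI Chat Completions function-calling format as we understand it; check the guide linked above for the current syntax before relying on it.

```python
import json

# Hypothetical tool, used only to illustrate the description format.
convert_temperature_description = {
    "type": "function",
    "function": {
        "name": "convert_temperature",
        # What the tool does, when to use it, and what it does not do
        "description": (
            "Converts a single temperature value between Celsius and Fahrenheit. "
            "Use this whenever the user asks for a temperature in a different unit, "
            "instead of converting it yourself. It does not support Kelvin and it "
            "does not look up current weather. Returns the converted value as a string."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "value": {
                    "type": "number",
                    "description": "The temperature to convert, e.g. 21.5",
                },
                "to_unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The unit to convert into; the input is assumed to be in the other unit",
                },
            },
            "required": ["value", "to_unit"],
            "additionalProperties": False,
        },
    },
}

# The description must be JSON-serializable, since it is sent to the API as JSON
print(json.dumps(convert_temperature_description, indent=2))
```

Note how most of the text budget goes to the `description` fields rather than to examples, in line with the advice above.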

Now write your tool class for the `CalculateTool` below. Inherit from the general `Tool` class defined above.

In [255]:
class CalculateTool(Tool):
    name = "calculate"

    @staticmethod
    def execute(expression: str, task: Any = None) -> str:
        """
        Evaluates the string expression in Python using `evaluate_expression()` and returns the result as a string

        Args:
            expression (str): The arithmetic expression to evaluate
            task (Any): Not used in this function

        Returns:
            str: The result of the arithmetical expression as a string
        """
        return ""

    @property
    def description(self) -> dict:
        """
        Provides the description of the tool

        Returns:
            dict: The description of the tool, in the JSON format expected by the API
        """

        return {}


Calculator = CalculateTool()

<details><summary> Aside - What is a @staticmethod?</summary>

The `@staticmethod` decorator in Python is used to define a static method within a class. Here are some key points about static methods:
1. They don't use instance- or class-specific data, so they do not require a first parameter `self` or `cls`.
2. They're often used for utility functions related to the class.

For example, if we defined a class of `MathOperations` as follows:

```python
class MathOperations:
    @staticmethod
    def add(x : int | float, y : int | float) -> int | float:
        """Returns the sum of x and y."""
        return x + y
```

The `add()` method could be called on the class itself without creating an instance:

   ```python
   result = MathOperations.add(2, 3)
   ```

You can also call it on an instance of the class, but it doesn't utilize the instance in any way (it doesn't have access to `self`):
   ```python
   operation = MathOperations()
   result = operation.add(2, 3)
   ```

Typically, you would make "stand-alone" functions that do not depend on class methods or class/instance attributes a static method. Using `@staticmethod` in this case helps with the following:
1. Makes the code's intent clearer (this method doesn't need class or instance data).
2. Slightly improves performance (no `self` argument needs to be passed).
3. Allows the method to be used without creating an instance of the class.

</details>

You can include the tool description in the API call simply by giving it as an arg to `tools` (the description has to be in a list, as the `create()` function's `tools` argument only accepts lists of tool descriptions). The following code provides a very manual example of this.

In [None]:
messages = [{"role": "user", "content": "Calculate 2+3"}]
client = establish_client_OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=[Calculator.description],
    tool_choice="auto",
)

print(response.choices[0].message.content)
print(response.choices[0].message.tool_calls)

<details><summary>Why is <code>message.content = None</code>?</summary>

When LLMs use tools, they often don't generate any text output. This can be a problem later when you try to get the model to do chain-of-thought reasoning. To get around this, it can be better to make two calls to the model for more complex tool use: one call to get the model to reason about the actions it should take, and then another to get the model to use a tool to take those actions.

</details> 
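
The two-call pattern mentioned in the dropdown above could look roughly like the following sketch. `reason_then_act` is a hypothetical helper, not part of this notebook's `utils`. It assumes the Chat Completions `tool_choice` parameter accepts `"none"` (reply in text only) and `"required"` (must call a tool), which matches our reading of the OpenAI guide but should be checked against the current documentation.

```python
from typing import Any, List


def reason_then_act(client: Any, model: str, messages: List[dict], tools: List[dict]) -> Any:
    """Hypothetical helper: one call to reason in text, a second call to use a tool."""
    # First call: the tools are visible to the model, but tool_choice="none"
    # asks for a plain-text reply, so the model can reason out loud.
    reasoning = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="none",
    ).choices[0].message

    # Build a new list rather than mutating the caller's messages
    messages_with_plan = messages + [{"role": "assistant", "content": reasoning.content}]

    # Second call: the model now sees its own reasoning and is asked to call a tool.
    return client.chat.completions.create(
        model=model,
        messages=messages_with_plan,
        tools=tools,
        tool_choice="required",
    ).choices[0].message
```

With the objects defined earlier, this would be called as `reason_then_act(client, "gpt-4o-mini", messages, [Calculator.description])`. The cost is one extra API call per step, in exchange for a visible chain of thought before each action.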

### Exercise - Return tool call results to the model
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 15-20 minutes on this exercise.
```

In order to return the response of tools to OpenAI LLMs, you'll need to add **two** items to the `messages` list after the model has made a tool call in a `ChatCompletionMessage` output:
1. The `ChatCompletionMessage` object itself (containing the original tool call message generated by the model). 

2. The tool response (containing the results of the tool call), in a specific format.

Each tool response has to respond to a specific tool call in a `ChatCompletionMessage`, and if we ever try to get the model to generate a response with an unanswered tool call in `messages`, the API will raise an error.
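
The pairing rule can be illustrated with plain dictionaries. This is a simplified sketch: in the notebook the assistant turn is a `ChatCompletionMessage` object rather than a dict, and `unanswered_tool_call_ids` is a hypothetical checker written for this example, not an API function. It lists tool calls in `messages` that do not yet have a matching tool response.

```python
from typing import List


def unanswered_tool_call_ids(messages: List[dict]) -> List[str]:
    """Hypothetical checker: ids of tool calls that have no tool response yet."""
    requested = [
        call["id"]
        for m in messages
        if m.get("role") == "assistant"
        for call in (m.get("tool_calls") or [])
    ]
    answered = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    return [call_id for call_id in requested if call_id not in answered]


messages = [
    {"role": "user", "content": "Calculate 2+3"},
    # Item 1: the assistant message that contains the tool call
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_62136354",
                "type": "function",
                "function": {"name": "calculate", "arguments": '{"expression":"2+3"}'},
            }
        ],
    },
]
pending_before = unanswered_tool_call_ids(messages)

# Item 2: the tool response, tied to the call by its id
messages.append({"role": "tool", "tool_call_id": "call_62136354", "content": "5"})
pending_after = unanswered_tool_call_ids(messages)

print(pending_before, pending_after)
```

Requesting a completion while `pending_before` is non-empty is the situation in which the API would raise an error.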

Below is a typical `response.choices[0]` output generated by `chat.completions.create()`. The `ChatCompletionMessage` is accessed via `response.choices[0].message`. You can access the tool calls via `response.choices[0].message.tool_calls`, which will return a list of `ChatCompletionMessageToolCall` objects.

```python
Choice(
    finish_reason="tool_calls",
    index=0,
    logprobs=None,
    message=ChatCompletionMessage(
        content=None,
        role="assistant",
        function_call=None,
        tool_calls=[
            ChatCompletionMessageToolCall(
                id="call_62136354",
                function=Function(arguments='{"expression":"2+3"}', name="calculate"),
                type="function",
            )
        ],
    ),
)
```

We have provided a function that formats the tool response in the correct syntax to be returned to the model. Read the format to understand what it looks like (you don't need to memorize it though, as you can always find it on OpenAI's [function calling guide](https://platform.openai.com/docs/guides/function-calling).)

In [257]:
def apply_tool_call_format(
    tool_call: ChatCompletionMessageToolCall, content: str
) -> dict:
    """
    Formats the response of a tool call to be returned to the model.
    Args:
        - tool_call (ChatCompletionMessageToolCall) : The tool call object
        - content (str) : This is the tool response (i.e. results from executing the tool)

    Returns:
        - dict : The formatted tool response to be returned to the model
    """
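    # A tool message must reference the id of the tool call it answers
    return {
        "role": "tool",
        "content": content,
        "tool_call_id": tool_call.id,
    }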

Now, return a tool call response to the model after the following message:

In [None]:
messages = [{"role": "user", "content": "Calculate 5/3. Be precise."}]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=[Calculator.description],
    tool_choice="auto",
)

## Building the Agent

Most LLM agents share these core components:

1. **LLM API interface**: A basic function (e.g. `get_response()`) that makes the API calls to the LLM and returns its responses. (In the agent.)

2. **Actions**: A set of actions (i.e. functions) the agent can take. (Mostly in the task.)

3. **Task State Management**: Keeping track of the current state of the task and any relevant context. (Mostly in the task.)

4. **Memory**: A system for storing and retrieving relevant information from past interactions (i.e. chat history). The simplest implementation is usually a `self.chat_history` class attribute that stores a list of past chat messages. (In the agent.)

5. **Observation Parser**: Functions to parse and interpret the results of actions and update the state. (Mostly in the task.)

6. **Decision/Execution Logic**: The rules or algorithms used to choose actions based on the current state and LLM output. (Split between the agent and the task.)

7. **Task-Specific Information**: Any additional information or functions specific to the task at hand. (In the task.)

[Diagram]

We will first implement a `SimpleAgent` class that is not specific to the `ArithmeticTask`, so that we can see the key components of a generic LLM agent.

### Exercise - Implement `SimpleAgent`
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 20-25 minutes on this exercise.
```

Build out the following simple agent class by filling in the `get_response()` and `execute_tool_calls()` methods.

In [259]:
class SimpleAgent:
    def __init__(
        self,
        task: Any = None,
        model: Literal["gpt-4o-mini"] = "gpt-4o-mini",
        tools: Optional[List[Any]] = None,
        chat_history: Optional[List[dict]] = None,
    ):
        self.model = model
        self.task = task
        self.tools = tools
        self.client = OpenAI()
        self.chat_history = chat_history if chat_history else []

    @retry_with_exponential_backoff
    def get_response(self, use_tool: bool = True) -> ChatCompletionMessage:
        """
        Get the response from the model via an API call, with the option of tool calling.

        Args:
            use_tool (bool): Whether to use tool calling or not

        Returns:
            ChatCompletionMessage: The response from the model
        """

        return response.choices[0].message

    def execute_tool_calls(self, message: ChatCompletionMessage) -> List[str]:
        """
        Execute the tool calls in the message and return a list of tool_responses.

        Args:
            message (ChatCompletionMessage): The message containing the tool calls

        Returns:
            List[str]: A list of tool responses (as strings)
        """

        return tool_responses

    def run(self, with_tool: bool = True) -> ChatCompletionMessage:
        """
        Default implementation of run method.
        This can be overridden in subclasses for specific behavior.

        Args:
            with_tool (bool): Whether to use tool calling or not

        Returns:
            ChatCompletionMessage: The response from the model
        """
        return response

In [None]:
# Try to execute the tool calls

my_simple_agent = SimpleAgent(ArithmeticTask(10, 15), tools=[Calculator])
my_simple_agent.run()


### Exercise - Build an `ArithmeticAgent`
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 20-25 minutes on this exercise.
```

Add instructions here:
1. work out the decision tree of the task for ~10min; we give a half-filled task tree, then the full task tree in a drop down
2. write `run()` - they will implement everything after "# Handle the response" in run(); we will give them parse_answer.

In [261]:
class ArithmeticAgent(SimpleAgent):
    """
    ArithmeticAgent class for doing simple arithmetic tasks.

    Inherits from SimpleAgent which includes the following attributes and methods:

    Attributes:
        model (str): The model used for generating responses (inherited)
        tools (List[Any]): List of tools available to the agent (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (Any): The current task being executed (inherited)
        chat_history (List[dict]): History of interactions (inherited)

    Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage:
            Get response from the model (inherited)

        execute_tool_calls(message: ChatCompletionMessage) -> List[str]:
            Execute tool calls from the model's response (inherited)

        run(with_tool: bool = True) -> bool:
            Run one loop of the arithmetic agent
    """

    def __init__(
        self,
        model: Literal["gpt-4o-mini"] = "gpt-4o-mini",
        task: Any = None,
        tools: Optional[List[Any]] = [Calculator],
        chat_history: List[dict] = None,
        verbose: bool = True,
    ):
        super().__init__(model=model, task=task, tools=tools, chat_history=chat_history)
        self.verbose = verbose

    def handle_tool_calls(self, response: ChatCompletionMessage):
        """
        Handle the tool calls from the model response. This function should:
        - Execute the tool calls
        - Append the tool calls and responses to the chat history

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        pass

    def handle_refusal(self, response: ChatCompletionMessage):
        """
        Handle the refusal from the model response. This function should only be called if the model refuses to answer and should:
        - Append the refusal to the chat history
        - Update the task state

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        pass

    def generate_and_check_final_answer(self) -> Literal["Correct", "Incorrect"]:
        """
        This function should:
        - Get the model to generate a final answer to the question (after it has seen the tool response)
        - Then check this final answer against the correct answer.
        - If the answer is correct, update the task state.
        - Then append to chat history (and return) "Correct" if the answer is correct and "Incorrect" if the answer is incorrect.

        Args:
            None

        Returns:
            str: "Correct" or "Incorrect"
        """

        pass

    def run(self, with_tool: bool):
        """
        Run one loop of the agent, which involves:
        - getting a task
        - getting a response from the model
        - handling the model response, including tool calls, refusals, no tool calls, parsing and checking final answers, errors.
        - managing memory: storing the history of messages to self.chat_history
        - managing task state: staying on the same task or moving to the next task at the end of the loop
        """
        pass

    def parse_answer(self, message: ChatCompletionMessage) -> float:
        """
        Extract the numerical answer from the string output of the model

        Args:
            message (ChatCompletionMessage): The response from the model

        Returns:
            float: The numerical answer extracted from the model
        """
        return float(response[startpoint:endpoint])
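
The `parse_answer` stub above slices the model's text between `startpoint` and `endpoint`, which are not defined in the snippet. Here is a minimal sketch of one way to compute them, assuming (hypothetically) that the system prompt asks the model to wrap its final answer in `<answer>` tags. The helper name `parse_tagged_answer` and the tag format are illustrative assumptions, not the reference solution.

```python
def parse_tagged_answer(
    text: str, open_tag: str = "<answer>", close_tag: str = "</answer>"
) -> float:
    """Hypothetical helper: extract a float wrapped in answer tags."""
    startpoint = text.find(open_tag)
    if startpoint == -1:
        raise ValueError(f"{open_tag} not found in model response.")
    endpoint = text.find(close_tag, startpoint)
    if endpoint == -1:
        raise ValueError(f"{close_tag} not found in model response.")
    # Skip past the opening tag so only the number itself is parsed
    return float(text[startpoint + len(open_tag) : endpoint])


print(parse_tagged_answer("The result is <answer>3.5</answer>."))
```

Raising an error when the tags are missing lets the agent loop decide what to do next, rather than silently parsing the wrong substring.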


### Exercise - Execute the task via an agent_loop 
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 5-10 minutes on this exercise.
```

Try implementing the agent_loop below with and without tools, to see how much better the model does when we give it tools.

> WARNING! 
>
>When you're making API calls to LLMs to solve tasks, it can be tempting to use a while loop and run the model until it finishes the task. But since every run of the model makes an API call, this would allow us to spend arbitrarily large amounts of money on API calls. For this reason, ***always use a for loop when making API calls!!!*** It would be really unfortunate if you blew all your API budget on one mistake!
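
A minimal sketch of the bounded pattern described in the warning. A stand-in `step` callable takes the place of a real API call, and both `run_bounded` and `step` are hypothetical names, not part of the exercise code.

```python
from typing import Callable


def run_bounded(step: Callable[[], bool], max_steps: int = 10) -> int:
    """Call `step` until it reports completion, but never more than `max_steps` times.

    Returns the number of calls made.
    """
    calls = 0
    for _ in range(max_steps):
        calls += 1
        if step():  # in a real agent, one API call would happen inside `step`
            break
    return calls


def never_done() -> bool:
    """A task that never finishes."""
    return False


# Even a task that never finishes costs at most `max_steps` calls.
print(run_bounded(never_done, max_steps=5))
```

The hard cap on iterations is the whole point: the worst-case spend is known before the loop starts.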


In [None]:
Arithmetic_Task_1 = ArithmeticTask(31.1, 8)
Arithmetic_Agent_1 = ArithmeticAgent(
    task=Arithmetic_Task_1, verbose=True, tools=[Calculator]
)


def agent_loop(agent, task, num_loops: int = 10):
    """
    Run the agent loop for a given number of loops

    Args:
        agent (ArithmeticAgent): The agent to run
        task (ArithmeticTask): The task to solve
        num_loops (int): The number of loops to run
    """
    pass


agent_loop(Arithmetic_Agent_1, Arithmetic_Task_1)

We can print all the messages from the agent's `chat_history` as follows:

In [None]:
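for message in Arithmetic_Agent_1.chat_history:
    try:
        # ChatCompletionMessage objects expose role/content as attributes
        print(f"""{str(message.role)}:\n {str(message.content)}\n""")
    except AttributeError:
        # Plain dict messages need key access instead
        print(f""" {message["role"]}:\n {message["content"]}\n""")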


# 3️⃣ Building a More Complex Task: WikiGame

Now that we know how to do function calling and how to design an LLM agent in general, we will build a more complicated task. This task won't be instantly solvable by LLMs with simple tool use and will require us to elicit better capabilities from models.

The task we will build and elicit behavior for will be the [Wikipedia Game](https://en.wikipedia.org/wiki/Wikipedia:Wiki_Game): Players use wiki-links to travel from one Wikipedia page to another and the first person who reaches the destination page wins the race. This is not directly related to any dangerous capabilities, and if GPT-N+1 could do this task, but GPT-N couldn't, we wouldn't tell OpenAI to be particularly careful about the release of GPT-N+1 as a result. However, it makes a useful test case for elicitation methods, since there are many strategies for deciding what path to take and we can create a scale of difficulty by choosing different articles to navigate to/from.

To add:
- Description of MVP Goal
- EXCALIDRAW! (describing wikipedia game.)

## Quick Intro to the Wikipedia API


Our agent will interact with Wikipedia by making tool calls to the [Wikipedia API](https://wikipedia.readthedocs.io/en/latest/quickstart.html), which is simple to use. We will only need to learn the following key functions for the game. 

1. `wikipedia.page` - Returns a Wikipedia page object, which contains various attributes and methods to access page content. (See [page docs](https://wikipedia-api.readthedocs.io/en/latest/API.html#wikipediapage) for these attributes.)
2. `wikipedia.page.title` - Returns the title of the page.
3. `wikipedia.page.content` - Returns the full text content of the page (this can be very long, so take snippets where you can to avoid using up the LLM's context window).
4. `wikipedia.page.summary` - Returns a summary of the page (i.e. all the text in the first section of the Wikipedia page).
5. `wikipedia.page.links` - Returns a list of all links as strings.

Kwargs:
- `auto_suggest` - Let Wikipedia find a valid page title for the query. 
- `redirect` - Allow redirection without raising RedirectError

Refer to the [docs](https://wikipedia-api.readthedocs.io/en/latest/API.html#) for more information. 

<details><summary> Aside: Wikipedia API content can be weird!</summary>

The wikipedia API often outputs content in unintuitive ways. For example, articles that are essentially just a big list become near useless, since the content omits the list (for example, see the wikipedia API content for <a href = "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population">List of countries and dependencies by population</a>). Another issue that you might encounter is that the API formats mathematical expressions in $\LaTeX$ pretty poorly (for example, see the wikipedia API content for <a href = "https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a>). This is why it's important to determine what content the wikipedia API produces when `.content` is called — and why you want to make sure you're testing a large diversity of wikipedia articles.

</details>
<br>
<details><summary> Aside: Wikipedia "summaries" can be long!</summary>

The wikipedia API accesses summaries of pages by presenting all the information before the first titled section. For certain (generally obscure) wikipedia pages, this summary itself can be extremely long, and contain lots of information that is unnecessary to determine the key information about the page the model should be trying to access. We'll handle this later when it comes up by truncating the summary to just the first ~500 characters.

</details>

Now run the following code to see how these wikipedia API functions work!


In [None]:
# Retrieve a Wikipedia page from its title
page = wikipedia.page("Large language model")

# Access basic page information
print("Title:", page.title)
print("\nURL:", page.url)
print(f"\nSummary (word count {len(page.summary.split())}):", page.summary)
print(
    f"\nContent (word count {len(page.content.split())}):",
    page.content[:1000],
    "......",
)
print(
    f"""\nLinks (link count {len(page.links)}): [{", ".join(page.links[:7])}, ......]"""
)

Now run these two lines (you should see a `DisambiguationError` for the first, and a `PageError` for the second):

In [None]:
page = wikipedia.page("Python")

In [None]:
page = wikipedia.page("Animalss", auto_suggest=False)

We can handle these errors using the following code:

In [None]:
# Fixes PageError: with auto_suggest left at its default (True), Wikipedia suggests a valid title

page = wikipedia.page("Animalss", redirect=True)
print(page.title)

# Fixes DisambiguationError by selecting the first option

try:
    page = wikipedia.page("Python")
except DisambiguationError as e:
    page = wikipedia.page(e.options[0])
print(page.title)

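The two original calls fail for different reasons: `wikipedia.page("Python")` raises a `DisambiguationError` because the title "Python" can correspond to multiple pages, and `wikipedia.page("Animalss", auto_suggest=False)` raises a `PageError` because there is no Wikipedia page with that title.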

To handle these errors, we have implemented a simple function `get_page` for you to get the page object for a particular page title. This handles `RedirectError` (and in some cases `PageError`) by setting `redirect=True`. The function also handles `DisambiguationError` by choosing the first option in the list of potential pages we could be referring to.

We handle any `PageError` not fixed by the above by setting `auto_suggest=True`, and letting wikipedia guess at the page we mean (this is a last resort, and hopefully won't be necessary).

<details><summary>What do <code>redirect</code> and <code>auto_suggest</code> do?</summary>

**Redirect**

The keyword `redirect` tells the API to allow Wikipedia to provide redirections. This happens when you reference an article in a manner which is slightly different from how it is stored in Wikipedia. This will rarely happen when we use the wikipedia API, as we will access pages based on how they are stored in Wikipedia, but as an example:
```python
page = wikipedia.page("huMan", redirect = True, auto_suggest=False)
```
will return a `WikipediaPage` object for the "Human" page. However,
```python
page = wikipedia.page("huMan", redirect=False, auto_suggest=False)
```
will raise a `PageError` (since there is a page called "Human" but not "huMan"). The Wikipedia API will generally access the correct page if there is a capitalization issue on the first letter, but a capitalization error in the middle of the word will raise an error (unless `redirect=True`).

<br>

**Auto suggest**

The keyword `auto_suggest` tells the API to allow Wikipedia to provide suggestions. This allows a lot more than `redirect` does, since `redirect` is only for the "obvious" cases (e.g. "huMan" → "Human", "U.S. President" → "President of the United States", etc.). When `auto_suggest` is true, it would allow something like "president of states" → "President of the United States", "gogle" → "Google"; both of which would raise an error if `redirect = True, auto_suggest = False`.

However, `auto_suggest` can sometimes be *too* permissive and lead to errors, for example:

```python
page = wikipedia.page("Human", redirect= False, auto_suggest=True)
```
will return a `WikipediaPage` object for the "Man" page. This is clearly not what we were trying to access, and the `auto_suggest` has gotten carried away in this case.

If `redirect = True` and `auto_suggest=True`, then `auto_suggest` takes priority.
</details>



In [268]:
def get_page(title: str) -> WikipediaPage:
    """
    Get a Wikipedia page object given a title. If the title is ambiguous, choose the first option. If the title is not found, try to find a similar title.

    Args:
        title (str): The title of the Wikipedia page

    Returns:
        WikipediaPage: The Wikipedia page
    """
    try:
        return wikipedia.page(title, auto_suggest=False, redirect=True)
    except DisambiguationError as e:
        return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
    except PageError as e:
        return wikipedia.page(title, auto_suggest=True, redirect=True)


def get_word_count(text: str) -> int:
    return len(text.split())

### Exercise - Get permitted links from a wikipedia page
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 5-10 mins on this exercise.
```

When you get the links from a page using `page.links`, this will include every possible Wikipedia link that is accessible from the HTML on that page, including those that are not in the main page content (e.g. links in sidebars, links in footnotes etc.), which are either irrelevant or not permitted by the rules of the Wiki game. Write a simple `get_permitted_links` function, that only returns the links that can be found inside the main content. The resulting list of permitted links should be about a third as long as the list of links from the wikipedia API (with more variance for shorter articles as you would expect). 
<!-- When writing this function, if you manage to get the links in a very effective way, then do that. But remember that Wikipedia is written by a large number of different contributors, often adhering to inconsistent stylings (especially for smaller articles). We just need to get something that **works well enough**. Put more time into doing this effectively if you want at the end, but as soon as something plausibly works, you should move on.

<img src="https://imgs.xkcd.com/comics/code_lifespan_2x.png" width="400px" style = "margin-left: auto; margin-right: auto;display:block"></img> -->

In [269]:
def get_permitted_links(current_page: WikipediaPage) -> list[str]:
    """
    Get "permitted" links (i.e. links that are in the content of the page) from a Wikipedia page.

    Args:
        current_page (WikipediaPage): The current Wikipedia page

    Returns:
        list[str]: A list of permitted links from current_page

    """
    return []
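
One simple way to approximate this filter is to keep only the links whose titles appear verbatim in the page text. The sketch below assumes this heuristic is good enough for the game. To run without network access, it uses a stand-in object (`FakePage`, hypothetical) with the same `title`, `links` and `content` attributes as a `WikipediaPage`.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in exposing the WikipediaPage attributes used below."""

    title: str
    content: str
    links: list[str] = field(default_factory=list)


def permitted_links_sketch(page) -> list[str]:
    """Keep links whose title appears in the page text, excluding the page itself."""
    return [
        link for link in page.links if link in page.content and link != page.title
    ]


page = FakePage(
    title="Cat",
    content="The cat is a member of the family Felidae. Ancient Egypt revered cats.",
    links=["Ancient Egypt", "Cat", "Felidae", "Veterinary medicine"],
)
print(permitted_links_sketch(page))
```

The substring check is case-sensitive, so a link title that only appears in the text with different capitalization is dropped. That is one reason the result is an approximation rather than an exact list.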

## Build `WikiGame`

### Exercise - Build a class for the Wiki game
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵⚪

You should spend up to 25-30 mins on this exercise.
```

Implement the following class that instantiates the wikipedia game. When the model uses tools it will be making calls to this class, so make sure that the functions return messages you're happy for the model to see as a tool response, such as error messages if the tool doesn't work.

Use code from the `get_permitted_links` and `get_content` functions above in this class.

In [271]:
class BaseWikiGame:
    def __init__(
        self,
        starting_page: str,
        goal_page: str,
    ):
        """
        Initialize the Wikipedia game object.

        Args:
            starting_page (str): The page the agent starts on.
            goal_page (str): The page the agent is trying to reach.
        """
        self.page_history: List[str] = [starting_page]
        self.starting_page: WikipediaPage = self.get_page(starting_page)
        self.goal_page: WikipediaPage = self.get_page(goal_page)
        self.current_page: WikipediaPage = self.starting_page

    # ========================= Helper Functions (given) =========================

    # Get page and page summary
    @staticmethod
    def get_page(title: str) -> WikipediaPage:
        """
        Get a Wikipedia page object given a title. If the title is ambiguous, choose the first option. If the title is not found, try to find a similar title.

        Args:
            title (str): The title of the Wikipedia page

        Returns:
            WikipediaPage: The Wikipedia page
        """
        try:
            return wikipedia.page(title, auto_suggest=False, redirect=True)
        except DisambiguationError as e:
            return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
        except PageError as e:
            return wikipedia.page(title, auto_suggest=True, redirect=True)

    def get_page_summary(self, page: WikipediaPage | None = None) -> str:
        """
        Get summary of a wikipedia page, to the last full stop within the first 500 characters. This is used to give a brief overview of the page to the agent.

        Args:
            page (WikipediaPage): The Wikipedia page object.

        Returns:
            str: The summary of the Wikipedia page.
        """
        page = page if page else self.goal_page
        summary = page.content[:500]
        last_period_index = summary.rfind(".")
        return summary[: last_period_index + 1] if last_period_index != -1 else summary

    # Get and check permitted links
    def get_permitted_links(self, title: Optional[str] = None) -> list[str]:
        """
        Returns a list of permitted links (i.e. links in the main page content) for the current page.

        Args:
            title (Optional[str]): The title of the Wikipedia page. If None, uses the current page.

        Returns:
            list[str]: The permitted links.
        """
        if title:
            page = self.get_page(title)
            all_links = page.links
            content = page.content
            permitted_links = [link for link in all_links if link in content]
            if title in permitted_links:
                permitted_links.remove(title)
        else:
            all_links = self.current_page.links
            content = self.current_page.content
            permitted_links = [link for link in all_links if link in content]
            if self.current_page.title in permitted_links:
                permitted_links.remove(self.current_page.title)
        return permitted_links

    def is_permitted_link(self, link: str) -> bool:
        """
        Returns True if the link is in the permitted links for the current page, False otherwise.

        Args:
            link (str): The link to check.

        Returns:
            bool: True if the link is permitted, False otherwise
        """
        return link.lower() in (x.lower() for x in self.get_permitted_links())

    # ========================= Task State Management (to implement) =========================

    @property
    def system_instruction(self) -> dict:
        """
        Generate the starting instructions for the game.

        Returns:
            dict: The starting instructions. "role" is "system" for system messages.
        """
        return {}

    @property
    def on_page_instruction(self) -> dict:
        """
        Generate instructions for the current page.

        Returns:
            dict: The instructions for the current page. "role" is "user" for user messages.
        """
        return {}

    @property
    def next_step_instruction(self) -> dict:
        """
        Generate instructions for the next step.

        Returns:
            dict: The instructions for the next step. "role" is "user" for user messages.
        """
        return {}

    def get_instructions(self, system: bool, on_page: bool, next_step: bool) -> List[dict]:
        """
        Generate instruction messages based on the current game state.

        Args:
            system (bool): Whether to include system instructions.
            on_page (bool): Whether to include on-page instructions.
            next_step (bool): Whether to include next-step instructions.

        Returns:
            List[dict]: A list of instruction messages.
        """
        return []

    def check_win(self) -> bool:
        """
        Check if the agent has won the game.

        Returns:
            bool: True if the agent has won, False otherwise.
        """
        return False
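
For reference, each instruction is a chat message dict with a `"role"` and a `"content"` key, as the docstrings above describe. Here is a minimal sketch of how such messages might be assembled. The prompt wording and the standalone function `build_instruction_messages` are illustrative assumptions, not the reference solution.

```python
def build_instruction_messages(
    current_title: str,
    goal_title: str,
    system: bool = True,
    on_page: bool = True,
    next_step: bool = True,
) -> list[dict]:
    """Hypothetical stand-in for get_instructions; prompt wording is illustrative only."""
    messages = []
    if system:
        messages.append(
            {
                "role": "system",
                "content": "You are playing the Wikipedia game. Use the available tools to navigate between pages.",
            }
        )
    if on_page:
        messages.append(
            {
                "role": "user",
                "content": f"You are currently on the page '{current_title}'. Your goal page is '{goal_title}'.",
            }
        )
    if next_step:
        messages.append({"role": "user", "content": "What is your next step?"})
    return messages


for msg in build_instruction_messages("Albert Einstein", "Aristotle"):
    print(msg["role"], "->", msg["content"])
```

Keeping the three message types behind separate flags makes it easy to resend only the on-page and next-step messages after the agent moves.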


### Exercise - Build Tools for the Wiki Game
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 5-10 mins on this exercise.
```

Fill in the following tool classes that the agent will need to use to accomplish this game.
- For the `get_content_tool`, you should just fill in the description, since we've implemented the functionality for you.
- For the `move_page_tool`, you should implement both the `execute()` function and the `description()` property.

When formatting this tool list, refer back to the solution you wrote for the arithmetic game, or to the docs [here](https://platform.openai.com/docs/guides/function-calling).
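
As a reminder of the general shape, a tool description for the Chat Completions `tools` argument is a nested dict with a `"type"` of `"function"` and a JSON-schema-style `"parameters"` block. The sketch below shows what a description for `move_page` could look like. The description strings are illustrative assumptions, and the `task` argument is left out on the assumption that the scaffolding supplies it rather than the model.

```python
import json

# Illustrative only: the wording of the descriptions is an assumption
move_page_description_sketch = {
    "type": "function",
    "function": {
        "name": "move_page",
        "description": "Changes your current page to a specified new page which is accessible via a link from the current page.",
        "parameters": {
            "type": "object",
            "properties": {
                "new_page": {
                    "type": "string",
                    "description": "The title of the new page to move to.",
                }
            },
            "required": ["new_page"],
        },
    },
}

# The description must be JSON-serializable to be sent to the API
print(json.dumps(move_page_description_sketch, indent=2))
```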

<details><summary>Why not just use `page.links` to get a list of links directly?</summary>

We don't just present a list of the accessible links, as this is not very faithful to the wikipedia game. The agent does perform somewhat better if we just give it a list of links, but the task of parsing the content of wikipedia pages and isolating the most important links is where the majority of the challenge of the wikipedia game lies.

</details>
<br>
<details><summary>Notes on the <code>get_content()</code> tool</summary>

The `get_content` function wraps all the text that corresponds to links in `<link></link>` tags (since otherwise links are presented as plain strings, indistinguishable from normal text, and the agent doesn't know what links to choose). However, since we identify links in the text via their names on wikipedia pages, there are certain articles that will never (or only very rarely) get flagged as links. For example, the page "Python (programming language)" is almost never referenced by its title; instead it's almost always referenced by just "Python". The same is true for towns, which are usually titled on Wikipedia as e.g. "Juneau, Alaska", but are almost always referred to as just "Juneau". For this reason, you should avoid having goal pages which are not referenced by their title. Alternatively, implement a better version of the function, but beware of simply extracting the HTML source from pages: `WikipediaPage.html()` can take a very long time to run, and HTML formatting varies significantly on Wikipedia.
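
A small offline demonstration of this limitation, reusing the same regular expression as `get_content` on a made-up sentence. The helper name `wrap_links` and the example text are hypothetical.

```python
import re


def wrap_links(content: str, links: list[str]) -> str:
    """Wrap the first occurrence of each link title in <link></link> tags."""
    # Longest titles first, so shorter titles don't match inside longer ones
    for word in sorted(links, key=len, reverse=True):
        content = re.sub(
            r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
            r"\1<link>\2</link>\3",
            content,
            count=1,
            flags=re.IGNORECASE,
        )
    return content


text = "He moved to Juneau, then studied Python at university."
links = ["Juneau, Alaska", "Python (programming language)", "university"]
print(wrap_links(text, links))
```

Only "university" is tagged here. "Juneau" and "Python" are mentioned in the text but their full page titles are not, so they stay unmarked.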

</details>

In [272]:
class get_content_tool(Tool):
    name = "get_content"

    @staticmethod
    def execute(task: BaseWikiGame | Any) -> str:
        """
        Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link is wrapped in <link></link> tags.

        Args:
            task (BaseWikiGame | Any): The current task object.

        Returns:
            str: The content of the page with links wrapped
        """
        content = task.current_page.content
        permitted_links = get_permitted_links(task.current_page)
        for word in sorted(permitted_links, key=len, reverse=True):
            content = re.sub(
                r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
                r"\1<link>\2</link>\3",
                content,
                count=1,
                flags=re.IGNORECASE,
            )
        return content

    @property
    def description(self):
        """
        Provides the description of the get_content tool

        Returns:
            dict: The description of the tool for the API
        """
        return {}


class move_page_tool(Tool):
    name = "move_page"

    @staticmethod
    def execute(new_page: str, task: Any) -> str:
        """
        Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.

        Args:
            task (BaseWikiGame): The current task object.
            new_page (str): The title of the new page to move to.

        Returns:
            str: A message indicating the result of the move
        """
        return ""

    @property
    def description(self):
        """
        Provides the description of the move_page tool

        Returns:
            dict: The description of the move_page tool for the API
        """
        return {}


get_content_tool_inst = get_content_tool()
move_page_tool_inst = move_page_tool()
wiki_game_tools = [get_content_tool_inst, move_page_tool_inst]

### Exercise - Build a WikiAgent
```c
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 30-40 mins on this exercise.
```


Instructions to give:
1. again, work out task decision tree 
2. let them implement handle_tool_calls(), run(), start()

===================

Now that you have the `WikiGame` class and tools set up, build out a `WikiAgent` that can access these tools and solve the Wikipedia game. Build the agent so that it can be thrown into an agent loop (similar to the one we had for the arithmetic game) without much additional scaffolding. 

There are a few further considerations in this case that we didn't have for the arithmetic game. 

<details>
<summary>Context window considerations</summary>

Since the agent will need to read (potentially very long) Wikipedia articles to interact with the game, the length of the context window becomes relevant. GPT-4o and GPT-4o-mini both have context windows of 128k tokens (which corresponds to ~96k words). For reference, the wikipedia page for the United States alone has around 10k words, and the agent will often need to visit more than 10 articles in one run of the game, not counting its own output, which eventually adds up to be significant. We'll solve this for now by resetting the messages of the agent every time it reaches a new wikipedia page, and providing an updated `user_message` (and possibly `system_message`) so that the agent can locate itself and then proceed with the game. Because of this reset, be careful to include the current page and goal page for the agent in the `user_message`. We'll address different methods for solving this issue later; you can probably already think of some.

Since we'll reset the `chat_history` attribute of the agent class each time it reaches a new page, we'll also store a `full_chat_history` property that won't get reset, so we can access the entire run of the game.
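
The two-history idea can be sketched in isolation as follows. `HistoryBuffer` is a hypothetical container used only for illustration; in the exercise these would be attributes and methods of the agent itself.

```python
class HistoryBuffer:
    """Hypothetical container: a working history that can be reset, plus a full log."""

    def __init__(self):
        self.chat_history: list[dict] = []
        self.full_chat_history: list[dict] = []

    def update(self, message) -> None:
        """Add one message, or a list of messages, to both histories."""
        messages = message if isinstance(message, list) else [message]
        self.chat_history.extend(messages)
        self.full_chat_history.extend(messages)

    def reset(self) -> None:
        """Clear only the working history; the full log is untouched."""
        self.chat_history = []


buffer = HistoryBuffer()
buffer.update([{"role": "user", "content": "page 1"}, {"role": "assistant", "content": "move"}])
buffer.reset()  # e.g. the agent has reached a new page
buffer.update({"role": "user", "content": "page 2"})
print(len(buffer.chat_history), len(buffer.full_chat_history))
```

The two lists must be separate objects. If they alias the same list, every update is recorded twice and a reset wipes the full log as well.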

</details>
<br>


<details><summary>Providing information to the agent</summary>

There shouldn't be much on Wikipedia that the agent is completely unfamiliar with (AI companies *will* have scraped wikipedia), but a page may be easily confused with something else, or be an article that was added after the training cutoff, and models can't always accurately recall information in their training data if it only comes up once or twice. So you should use the game's `get_page_summary` function to provide details of the goal page to the agent in its initial message.

</details>
<br>

<details><summary> Getting output from the agent </summary>

In this case we'll have a lot more moving pieces than for the `ArithmeticAgent`. There, it was possible to just print output directly from the agent loop. Here, you should print output as it comes up in the agent class. If there's some chance you might not want to see this output, you should use the `verbose` flag to determine whether to print content or not.

</details>

When making calls to the `BaseWikiGame` class, make use of Python's `getattr()` function ([explanation here](https://www.w3schools.com/python/ref_func_getattr.asp)).
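
A minimal sketch of the `getattr()` pattern: look up a method from the name string in a tool call, then call it with the JSON-decoded arguments. `DemoGame` and `dispatch` are hypothetical names for illustration, and the argument format is assumed to be a JSON object string.

```python
import json


class DemoGame:
    """Hypothetical stand-in for a game object with callable actions."""

    def __init__(self):
        self.current_page = "Albert Einstein"

    def move_page(self, new_page: str) -> str:
        self.current_page = new_page
        return f"Moved to {new_page}."


def dispatch(game, function_name: str, arguments_json: str) -> str:
    """Call the method named `function_name` on `game`, if it exists and is callable."""
    method = getattr(game, function_name, None)
    if not callable(method):
        # Return a message the model can read, rather than raising
        return f"Unknown function: {function_name}"
    return method(**json.loads(arguments_json))


game = DemoGame()
print(dispatch(game, "move_page", '{"new_page": "Physics"}'))
print(dispatch(game, "fly", "{}"))
```

Passing a default of `None` to `getattr()` and checking `callable()` means a misspelled or non-method name produces a readable tool response instead of an exception.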



In [273]:
class WikiAgent(SimpleAgent):
    """
    Inherits from SimpleAgent and adds the ability to handle tool calls and refusals in the Wikipedia game context.

    Attributes:
        model (str): The model used for generating responses (inherited)
        tools (List[Any]): List of tools (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (Any): The current task being executed
        chat_history (List[dict]): History of interactions (inherited)

    Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage:
            Get response from the model (inherited)

        execute_tool_calls(message: ChatCompletionMessage) -> List[str]:
            Execute tool calls from the model's response (inherited)

        run(with_tool: bool = True) -> bool:
            Run one loop of the Wikipedia agent (modified below)

    """

    def __init__(
        self,
        task: Any,
        tools: List[Any],
        model="gpt-4o-mini",
        chat_history: List[dict] = None,
        verbose: bool = True,
    ):
        super().__init__(model=model, tools=tools, task=task)

        self.chat_history = chat_history if chat_history else []
        # Copy the list so that full_chat_history is a separate object from chat_history
        self.full_chat_history = (
            list(chat_history) if chat_history else []
        )  # All messages that have been sent in the chat history.
        self.verbose = verbose
        self.start()

    def update_history(
        self, message: str | ChatCompletionMessage | List[str | ChatCompletionMessage]
    ):
        """
        Update self.chat_history and self.full_chat_history with a message or list of messages.

        Args:
            message (str | ChatCompletionMessage | List[str | ChatCompletionMessage]): The message(s) to add to the chat history
        """
        pass

    def reset_history(self):
        """
        Empty self.chat_history of the agent.
        """
        pass

    def handle_tool_calls(self, response: ChatCompletionMessage):
        """
        Handles tool_calls in the wikipedia game context:
            - Executes the tool calls using execute_tool_calls
            - Appends the original tool call & tool_responses to the chat_history
            - If the agent has moved to a new page, resets the chat_history
            - If not, get the next_step_instruction from the task and append it to chat_history

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        pass

    def handle_refusal(self, response: ChatCompletionMessage):
        """
        Handles refusals in the wikipedia game context:

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        pass

    def start(self):
        """
        A function to put the starting instructions in agent.chat_history when the agent starts a new page or starts the game.
        """
        pass

    def run(self):
        """
        This function runs the agent in the wikipedia game context. It:
            - Gets the current task instruction
            - Gets the response from the model
            - Handles the response in the cases:
                - tool calls (using handle_tool_calls)
                - refusals (using handle_refusal)
                - no tool calls (using update_history)
        """
        pass

### Exercise - Run the task
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 5 mins on this exercise.
```

Just like we did for the arithmetic agent, you should write an agent loop for the wikipedia agent. In this case you won't need to print output, as we handled it in the agent class, so this function should be a very simple loop.

In [274]:
def agent_loop(agent, game, num_loops=10):
    """
    Run the agent loop for a given number of loops

    Args:
        agent (WikiAgent): The agent to run
        game (BaseWikiGame): The game to play
        num_loops (int): The number of loops to run
    """
    #agent.start()

    pass

Your agent should be able to accomplish the following task:

In [None]:
game = BaseWikiGame("Albert Einstein", "Aristotle")
agent = WikiAgent(task=game, tools=wiki_game_tools)
agent_loop(agent, game, 10)

Make sure to check the messages in the chat history to see the full conversation between the agent and the user, to ensure that the messages that are printed above are faithful to the actual chat history (it can be easy to make minor mistakes that mess up the agent's chat_history).

In [None]:
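for message in agent.chat_history:
    try:
        # ChatCompletionMessage objects expose role/content as attributes
        print(f"{str(message.role)}:\n {str(message.content)}")
    except AttributeError:
        # Plain dict messages need key access instead
        print(f"""{message["role"]}:\n {message["content"]}""")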


# 4️⃣ Elicitation

You may have observed that while the above implementation of `WikiAgent` succeeds at Albert Einstein → Aristotle, it fails at more complex tasks. However, this doesn't mean that GPT-4o-mini does not have the capability to perform better on this task, but this capability might be blocked because we:

- Prompted the model poorly.

- Stored the history poorly.

- Didn't give the model sufficient tools to accomplish the task.

- ...

In general, it is hard to show that a model does not have a capability, even if *we* failed to demonstrate it. For example, it took roughly *3 years* after the release of GPT-2 (and over 1.5 years after the release of GPT-3) for people to discover that [chain-of-thought reasoning](https://arxiv.org/abs/2201.11903) massively improves model performance (which allows the completion of significantly more complex tasks). A potential failure case for AI safety is that people discover similar breakthroughs that significantly increase model performance with minimal additional training, and this is not accounted for in our safety evaluations. Thus, LLM agent evals aim to elicit the best capability we possibly can, until we feel we've managed to gain [**evidence of absence**](https://en.wikipedia.org/wiki/Evidence_of_absence), **not** just **absence of evidence**.


Broadly speaking, there are two categories of elicitation, narrow elicitation and general elicitation:

1. Narrow elicitation: methods that improve model performance on a particular task, or small class of tasks, but won't necessarily impact model performance in general across many tasks. 
    - E.g. Give the model access to the content of arbitrary wikipedia articles - This will improve performance on this task significantly, but wouldn't generalize to other tasks.

2. General elicitation: methods that improve model performance on a wide array of possible tasks. 
    - E.g. Chain-of-thought prompting - This tends to improve model performance on a wide array of tasks. These are the elicitation methods we're most interested in: if researchers find an improvement that is roughly as easy and effective as chain-of-thought prompting, we would see a very rapid increase in risk from AI.


We will try:
1. Prompt engineering, including:
    - Chain-of-thought prompting
    
    - The ReAct framework
     
2. Reflexion (allowing the model to cheaply explore future paths)

3. Improved message histories

Then you will be able to try further elicitation methods, including any of your own, as a bonus.

<details><summary>Some notes on Wikipedia and Implementation</summary>

You might start having a hard time coming up with wikipedia pages to test on. Luckily, Wikipedia offers a random page link, which is accessible via: https://en.wikipedia.org/wiki/special:random. Sometimes pages are *too* random, and won't be accessible by links between each other. One suggestion for coming up with "random but sensible" wikipedia pages is to go to a different language's wikipedia (which is generally much smaller than English wikipedia), and then use the "special:random" link (it works in every language). The majority of pages in other languages will also exist in English, but you'll want to switch languages back to English to check (and get the English title).

If you want to test whether two pages are accessible via links from each other, then use this free online tool to see the possible paths between pages: https://www.sixdegreesofwikipedia.com/ (be somewhat careful with this though, as the paths that this website believes are accessible may not be accessible to our agent). 

</details>


## Prompting

You should already be aware that prompting can have a large impact on model performance. There are a wide variety of possible changes you could make for prompts in this task. You should experiment first with more general elicitation methods such as getting the agent to think more deeply, and output plans in different ways. After this, you might try a wide array of narrow elicitation methods including:

- Telling the agent how many pages it's visited.

- Telling the agent if it's already visited the page it's on (and how many times).

- Schedule different prompts and planning methods for the "zoom out" and "zoom in" sections of the game, since we know that the general strategy for the wikipedia game looks like:

    Specific article (with few links) -> General article (with many links) -> Specific article (with few links)
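
The first two ideas in the list above can be sketched as a small helper that turns the visit history into a status line for the prompt. This is only an illustration: `visit_status_note` is a hypothetical name, and it assumes `page_history` is a list of titles with the current page last.

```python
from collections import Counter


def visit_status_note(page_history: list[str]) -> str:
    """Build a short status line from the titles visited so far (current page last)."""
    current = page_history[-1]
    times_seen = Counter(page_history)[current]
    note = f"Page visits so far: {len(page_history)}. You are on '{current}'."
    if times_seen > 1:
        # Flag loops explicitly so the model can change strategy.
        note += f" You have been on this page {times_seen} times; consider a different link."
    return note


print(visit_status_note(["Linux", "Unix", "Linux"]))
```

A string like this could be appended to the content returned by `on_page_instruction`.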


### Exercise - Engineer prompts
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 20-25 mins on this exercise.
```

Remember that your prompts will have to be robust to: 

* Different tasks within the wikipedia game, 

* Different states within those tasks,

* Different failure-modes the agent could encounter.

Mess around with the prompting setup and see if you can significantly improve performance.

In [278]:
class WikiGame(BaseWikiGame):
    """
    Inherits from BaseWikiGame and adds improved prompting.

    Attributes:
        starting_page (str): The title of the starting page (inherited)
        goal_page (str): The title of the goal page (inherited)
        current_page (WikipediaPage): The current Wikipedia page (inherited)
        page_history (List[str]): The history of pages visited (inherited)

    Methods:
        get_page(title: str) -> WikipediaPage: Get a Wikipedia page object given a title (inherited)

        get_page_summary(page: WikipediaPage | None = None) -> str: Get the summary of a Wikipedia page (inherited)

        get_permitted_links(title: Optional[str] = None) -> list[str]: Get permitted links for the current page (inherited)

        is_permitted_link(link: str) -> bool: Check if a link is permitted (inherited)

        system_instruction -> dict: Generate the starting instructions for the game (modified below)

        on_page_instruction -> dict: Generate instructions for the current page (modified below)

        next_step_instruction -> dict: Generate instructions for the next step (modified below)

        get_instructions(system: bool, on_page: bool, next_step: bool) -> str: Generate instruction messages based on the current game state (inherited)

        check_win() -> bool: Check if the game has been won (inherited)

    """

    @property
    def system_instruction(self):
        """
        Provide improved starting instructions for the game.

        Returns:
            dict: The starting instructions. "role" is "system" for system messages.
        """
        return {}

    @property
    def on_page_instruction(self):
        """
        Provide improved instructions for the current page.

        Returns:
            dict: The instructions for the current page. "role" is "user" for user messages.
        """
        return {}

    @property
    def next_step_instruction(self):
        """
        Provide improved instructions for the next step.

        Returns:
            dict: The instructions for the next step. "role" is "user" for user messages.
        """

        return {}

Your `BaseWikiGame` and `WikiAgent` may not work on the example path "Linux" -> "Dana Carvey". But with improved prompting, you should be able to get the agent to solve this task!

In [None]:
game = BaseWikiGame("Linux", "Dana Carvey")
agent = WikiAgent(game, model="gpt-4o-mini", tools=wiki_game_tools)
agent_loop(agent, game, 30)

In [None]:
game = WikiGame("Linux", "Dana Carvey")
agent = WikiAgent(game, model="gpt-4o-mini", tools=wiki_game_tools)
agent_loop(agent, game, 30)

### Exercise - Implement the ReAct framework
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 15-20 mins on this exercise.
```

Chain-of-thought prompting confers significant benefits to model performance, and you probably tried it when you messed around with prompting above. But when we're using LLMs as agents, we may want to provide a different structure to elicit reasoning. This is called the [**ReAct** framework](https://arxiv.org/abs/2210.03629); it consists of:

- Getting the model to generate **Re**asoning about its current situation, and what sort of actions it should consider taking.

- Then getting the model to perform an **Act**ion based on its outputted reasoning.

Remember that if you're calling the model without tools, OpenAI won't automatically provide the model with a description of the tools in its system message, so we'll have to ensure that the tool descriptions are in the `system_instruction` we provide (this will lead to some redundancy when the model takes an action, but this seems to be okay). This means that from now on we will have to feed both the *task* and the *agent* the list of tools the agent can use.
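
To make the reason-then-act split concrete, here is a minimal sketch with a fake model in place of the API. `fake_model` and `react_step` are hypothetical names, and the dictionaries are a simplified stand-in rather than the OpenAI message format; the only point is the order of operations: reason without tools, store the reasoning, then act with tools.

```python
def fake_model(messages, tools=None):
    """Hypothetical stand-in for an API call (not the OpenAI message format)."""
    if tools is None:
        # Without tools the model can only reply in text, so we get reasoning.
        return {"role": "assistant", "content": "The goal is a person, so a broad page like 'Comedy' may help."}
    # With tools offered, the model can pick an action.
    return {"role": "assistant", "content": None, "tool": "move_page", "arguments": {"new_page": "Comedy"}}


def react_step(chat_history: list, tools: list) -> dict:
    """One ReAct step: reason without tools, store the reasoning, then act with tools."""
    reasoning = fake_model(chat_history)  # Reason: no tools offered
    chat_history.append(reasoning)  # the action call can now see the reasoning
    return fake_model(chat_history, tools=tools)  # Act: tools offered


history = [{"role": "user", "content": "You are on 'Linux'. Reach 'Dana Carvey'."}]
action = react_step(history, tools=["get_content", "move_page"])
print(history[-1]["content"])
print(action["tool"], action["arguments"])
```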

In [121]:
class WikiGameReAct(WikiGame):
    """
    Inherits from WikiGame and adds the ReAct framework.

    Attributes:
        starting_page (str): The title of the starting page (inherited)
        goal_page (str): The title of the goal page (inherited)
        current_page (WikipediaPage): The current Wikipedia page (inherited)
        page_history (List[str]): The history of pages visited (inherited)

    Methods:

        get_page(title: str) -> WikipediaPage: Get a Wikipedia page object given a title (inherited)

        get_page_summary(page: WikipediaPage | None = None) -> str: Get the summary of a Wikipedia page (inherited)

        get_permitted_links(title: Optional[str] = None) -> list[str]: Get permitted links for the current page (inherited)

        is_permitted_link(link: str) -> bool: Check if a link is permitted (inherited)

        system_instruction -> dict: Generate the starting instructions for the game (inherited)

        on_page_instruction -> dict: Generate instructions for the current page (inherited)

        next_step_instruction -> dict: Generate instructions for the next step (inherited)

        get_instructions(system: bool, on_page: bool, next_step: bool) -> str: Generate instruction messages based on the current game state (inherited)

        check_win() -> bool: Check if the game has been won (inherited)

    """

    def __init__(self, starting_page: str, goal_page: str, tools=None):
        super().__init__(starting_page, goal_page)
        self.tools = tools

    @property
    def system_instruction(self):
        """
        Provide a description of the tools in the system message. When generate is called with tools this is redundant, but when generate is called without tools, this is useful.

        Returns:
            dict: The starting instructions. "role" is "system" for system messages.
        """
        tool_descriptions = "\n".join([tool.description["function"]["name"] + ":" + tool.description["function"]["description"] for tool in self.tools])
        return {}


class WikiAgentReAct(WikiAgent):
    """
    Inherits from WikiAgent and adds the ReAct framework.

    Attributes:
        model (str): The model used for generating responses (inherited)
        tools (List[Any]): List of tools (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (Any): The current task being executed (inherited)
        chat_history (List[dict]): History of interactions (inherited)

    Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage: Get response from the model (inherited)

        execute_tool_calls(message: ChatCompletionMessage) -> List[str]: Execute tool calls from the model's response (inherited)

        run(with_tool: bool = True) -> bool: Run one loop of the Wikipedia agent (inherited)

        update_history(message : str | ChatCompletionMessage | List[str | ChatCompletionMessage]): Update self.chat_history and self.full_chat_history with a message or list of messages. (inherited)

        reset_history(): Empty self.chat_history of the agent. (inherited)

        handle_tool_calls(response: ChatCompletionMessage): Handles tool_calls in the wikipedia game context. (inherited)

        handle_refusal(response: ChatCompletionMessage): Handles refusals in the wikipedia game context. (inherited)

        start(): A function to put the starting instructions in agent.chat_history when the agent starts a new page or starts the game. (inherited)

        run(): This function runs the agent in the wikipedia game context. (modified below)


    """

    def generate_reason(self) -> ChatCompletionMessage:
        """
        
        Generate a reason for the agent to take an action. This function should:
            - Get the model to reason about the current state of the game (without tools)
            - Return the response from the model
        
        Returns:
            ChatCompletionMessage: The response from the model
        """
        pass

    def generate_action(self) -> ChatCompletionMessage:
        """
        
        Generate an action for the agent to take. This function should:
            - Get the model to generate an action for the agent to take (with tools)
            - Return the response from the model
        
        Returns:
            ChatCompletionMessage: The response from the model
        
        """
        pass

    def generate_reason_and_action(self):
        """
        
        Generate a Reason and Action for the agent to take. This function should:
            - Generate a Reason
            - Add the Reason to the chat history
            - Generate an Action
            - Return the Action so that tool calls can be handled

        Returns:
            ChatCompletionMessage: The action from the model

        """
        pass

    def run(self):
        """
        Run one loop of the agent.

        This function should:
            - Generate a Reason and Action
            - Handle the tool calls, refusals, and no tool calls in the model response
        """
        pass


You may have to rewrite your `agent_loop`.

In [122]:
def agent_loop_ReAct(game, agent, num_loops = 10):
    """
    Run the agent loop for a given number of loops with the ReAct framework.

    Args:
        game (WikiGameReAct): The game to play
        agent (WikiAgentReAct): The agent to run
        num_loops (int): The number of loops to run
    """
    pass

Your `WikiAgent` and `WikiGame` with only improved prompting might not be able to solve "Drupe" → "17th parallel north" (or might not be able to solve it very effectively or reliably). However, your ReAct agent should be able to solve this path.

In [None]:
game = WikiGame("Drupe", "17th parallel north")
agent = WikiAgent(task=game, tools=wiki_game_tools)
agent_loop(agent, game, 40)

In [None]:
game = WikiGameReAct("Drupe", "17th parallel north", tools=wiki_game_tools)
agent = WikiAgentReAct(game, model="gpt-4o-mini", tools=wiki_game_tools)
agent_loop_ReAct(game, agent, 40)

### Exercise - Implement a reflexion tool
```c
Difficulty: 🔴🔴🔴⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 10-15 mins on this exercise.
```

The [Reflexion paper](https://arxiv.org/abs/2303.11366) finds that LLMs perform better on tasks when they can perform "lookahead" and get feedback on their plans. We will imitate this by allowing the agent to suggest candidate paths, and informing it where these paths go wrong (if they do). You'll need to add this tool to the list of tools.

We don't want to provide the agent the links/content of every page when it does this lookahead, as then we'd just be reimplementing a smaller version of the game *inside the game*. Instead, we'll let the agent suggest paths without seeing any content or links, and then let it know if this path works. It's very likely that a suggested link will — at some point — not be accessible from one of the pages, but this tool will still be useful to help the agent plan.
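
A minimal sketch of this path check, using a toy link graph instead of Wikipedia, is shown here. `TOY_LINKS` and `check_path` are hypothetical; in the real tool the links for each page would come from the game object rather than a dictionary, and the exact wording of the returned messages is up to you.

```python
# Toy link graph standing in for Wikipedia (illustrative data only).
TOY_LINKS = {
    "Barack Obama": ["Indonesia", "Hawaii"],
    "Indonesia": ["Jakarta", "India"],
    "India": [],
}


def check_path(path: str, current_page: str, links: dict) -> str:
    """Walk an 'A -> B -> C' path and report the first step that is not a link."""
    titles = [title.strip() for title in path.split("->")]
    if titles[0] != current_page:
        return f"ERROR: The path should start at the current page: {current_page}"
    # Check each consecutive pair of titles; no page content is revealed.
    for here, there in zip(titles, titles[1:]):
        if there not in links.get(here, []):
            return f"This path works until {here}: '{there}' is not accessible from it."
    return "This path is valid."


print(check_path("Barack Obama -> Indonesia -> India", "Barack Obama", TOY_LINKS))
print(check_path("Barack Obama -> India", "Barack Obama", TOY_LINKS))
```

Note that the agent only learns where the path breaks, not which links were available, which keeps the tool from becoming a copy of the game itself.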

In [154]:
class test_path_tool(Tool):
    """
    Implements a tool that allows the agent to test paths from the current state of the game.

    Attributes:
        name (str): The name of the tool

    Methods:
        execute(task: Any, path: str) -> str: Test if a given path is valid.

        description -> dict: Provides the description of the test_path tool for the API
    """
    
    name = "test_path"

    def execute(self, task: Any, path: str) -> str:
        """
        Test if a given path is valid.

        Args:
            task (Any): The current task object.
            path (str): A string representing a path, e.g., "Barack Obama -> Indonesia -> India"

        Returns:
            str: A message indicating whether the path is valid or where it fails.
        """
        pass
    
    @property
    def description(self) -> dict:
        return {}


test_path_tool_inst = test_path_tool()
wiki_game_tools = [get_content_tool_inst, move_page_tool_inst, test_path_tool_inst]

In [None]:
game = WikiGameHistory("", "", tools = wiki_game_tools)
agent = WikiAgentReAct(game, model="gpt-4o-mini", tools = wiki_game_tools)
agent_loop_ReAct(game, agent, 40)

<details><summary>Help: My agent isn't using the <code>test_path</code> tool</summary>

If your agent isn't using the test path tool, you may want to go back and modify your prompting. One way you could do this is to schedule a prompt to tell the agent to use the `test_path` tool if it hasn't used it in its last N tool calls. Alternatively, you could just include in every `on_page_instruction` an indication that the agent should use this tool.
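
The scheduling idea could look roughly like the helper below. `needs_test_path_reminder` is a hypothetical name, and it assumes you keep a list of the names of the tools the agent has called, most recent last.

```python
def needs_test_path_reminder(recent_tool_names: list[str], n: int = 5) -> bool:
    """True if at least n tool calls were made and none of the last n was test_path."""
    last_n = recent_tool_names[-n:]
    return len(last_n) == n and "test_path" not in last_n


print(needs_test_path_reminder(["get_content", "move_page"] * 3))
```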

</details>

### Exercise - Let the LLM see its entire chat history
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪
```

You may have noticed that the agent performs significantly worse because we reset the chat history every time it encounters a new page: it often comes up with plans and doesn't follow through on them. We can fix this by letting the agent see the entirety of its chat history.

The obstacle is the context window, specifically given the length of wikipedia pages. We can work around this by replacing the outputs of the `get_content` tool with a short placeholder each time the agent moves to a new page, instead of resetting the entire chat history.

We'll override the `reset_history` method in a subclass of `WikiAgentReAct` to accomplish this.

In [279]:
class WikiAgentChatHistory(WikiAgentReAct):
    """
    Inherits from WikiAgentReAct and adds the ability to store and retrieve chat history.

    Attributes:
        model (str): The model used for generating responses (inherited)
        tools (List[Any]): List of tools (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (Any): The current task being executed (inherited)
        chat_history (List[dict]): History of interactions (inherited)
        full_chat_history (List[dict]): Full history of interactions

    Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage: Get response from the model (inherited)

        execute_tool_calls(message: ChatCompletionMessage) -> List[str]: Execute tool calls from the model's response (inherited)

        run(with_tool: bool = True) -> bool: Run one loop of the Wikipedia agent (inherited)

        update_history(message : str | ChatCompletionMessage | List[str | ChatCompletionMessage]): Update self.chat_history and self.full_chat_history with a message or list of messages. (inherited)

        reset_history(): Empty self.chat_history of the agent. (modified below)

        handle_tool_calls(response: ChatCompletionMessage): Handles tool_calls in the wikipedia game context. (inherited)

        handle_refusal(response: ChatCompletionMessage): Handles refusals in the wikipedia game context. (inherited)

        start(): A function to put the starting instructions in agent.chat_history when the agent starts a new page or starts the game. (inherited)

        run(): This function runs the agent in the wikipedia game context. (inherited)

        store_chat_history(): Store the current chat history in the full chat history.

        retrieve_chat_history(): Retrieve the full chat history.
    """
    def reset_history(self):
        """
        Replace the output of get_content tool with an indication that wikipedia content was output when the agent moves to a new page (to stay under the context window restrictions).
        """
        for message in self.chat_history:
            # Only dict-style tool messages produced by get_content are replaced
            if (
                isinstance(message, dict)
                and message["role"] == "tool"
                and message["name"] == "get_content"
            ):
                message["content"] = "Wikipedia content was output here."

In [None]:
game = WikiGameReAct("Drupe", "17th parallel north", tools=wiki_game_tools)
agent = WikiAgentChatHistory(game, model="gpt-4o-mini", tools = wiki_game_tools)
agent_loop_ReAct(game, agent, 40)

# 5️⃣ Bonus

## Additional Tool use

### Exercise - Implement a page summary tool
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 10-15 mins on this exercise.
```

Implement a tool that allows an agent to get a summary of any page that is accessible from its current page. This imitates a feature on wikipedia where you can see a short summary of a page when you hover over the link to it. You could either implement this tool so that the agent can just read the summary, or you can modify the `move_page` tool, so that the agent sees a summary of the page it wants to move to, and can then make a decision whether to ultimately move page.
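
One piece of this tool is the truncation of the summary to the last full stop within the first 500 characters. A minimal sketch of that step alone, with the hypothetical helper name `truncate_summary` and a fallback (an assumption) for text that contains no full stop:

```python
def truncate_summary(summary: str, limit: int = 500) -> str:
    """Cut a summary at the last full stop within the first `limit` characters."""
    head = summary[:limit]
    last_stop = head.rfind(".")
    # If there is no full stop in range, fall back to the raw cut-off text.
    return head[: last_stop + 1] if last_stop != -1 else head


print(truncate_summary("First sentence. Second sentence that is cut", limit=30))
```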


In [166]:
class get_accessible_page_summary_tool(Tool):
    """
    Implements a tool that allows the agent to get the summary of a Wikipedia page (you should use the get_page_summary function from the task class)
    """

    name = "get_accessible_page_summary"

    @staticmethod
    def get_page_summary(task: Any, page_title: str) -> str:
        """
        Get summary of a wikipedia page, to the last full stop within the first 500 characters. This is used to give a brief overview of the page to the agent.

        Args:
            task (Any): The current task object.
            page_title (str): The Wikipedia page title.

        Returns:
            str: The summary of the Wikipedia page.
        """

        pass

    @property
    def description(self) -> dict:
        """
        Provides the description of the get_page_summary tool

        Returns:
            dict: The description of the tool for the API
        """
        return {}


get_accessible_page_summary_tool_inst = get_accessible_page_summary_tool()
wiki_game_tools = [get_content_tool_inst, move_page_tool_inst, test_path_tool_inst, get_accessible_page_summary_tool_inst]

In [None]:
wiki_game_tools = [get_content_tool_inst, move_page_tool_inst, test_path_tool_inst, get_accessible_page_summary_tool_inst]
game = WikiGameHistory("", "", tools = wiki_game_tools)
agent = WikiAgentReAct(game, model="gpt-4o-mini", tools = wiki_game_tools)

### Exercise - Implement an arbitrary page summary/content tool
```c
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 5-10 mins on this exercise.
```

Now implement a tool that allows the agent to suggest any wikipedia page, and get a brief summary of it. This may help the agent plan further ahead.


In [168]:
class get_page_content(Tool):
    """
    Implements a tool that allows the agent to get the content of any Wikipedia page (not wrapped in link tags).
    """

    name = "get_any_page_content"

    @staticmethod
    def execute(task: Any, page_title: str | None = None) -> str:
        """
        Get the content of any wikipedia page

        Also provides current page content if no page_title is provided.

        Args:
            page_title (str): The title of the Wikipedia page

        Returns:
            str: The content of the page (not wrapped in link tags).
        """
        pass

    @property
    def description(self):
        """
        Provides the description of the get_any_page_content tool

        Returns:
            dict: The description of the tool for the API
        """
        return {}

get_page_content_tool_inst = get_page_content()
wiki_game_tools = [get_content_tool_inst, move_page_tool_inst, test_path_tool_inst, get_page_content_tool_inst]


In [None]:
game = WikiGameHistory("49th Fighter Training Squadron", "Rosenhan experiment", tools = wiki_game_tools)
agent = WikiAgentReAct(game, model="gpt-4o-mini", tools = wiki_game_tools)
agent_loop_ReAct(game, agent, 30)

### Exercise - Implement additional rules
```c
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵⚪⚪⚪⚪
```

At this point, our agent is too good, and can solve almost any wikipedia race (that is possible for it to solve, given that it can't see every accessible page). So if we want to try further elicitation methods and see gains from them, we need to make the wikipedia game harder. One way to do this is to add additional rules.

Implement the additional rules in the `WikiGameRules` class. Some suggestions for rules are:
- A "No country" rule,
- A "No articles above a given length" rule

But feel free to add more (or different) rules if you think of any others.
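
As a hedged sketch of how the two suggested rules might be checked, the helper below returns the first rule a page breaks. `violated_rule` and the tiny `COUNTRIES` set are hypothetical; a real check would use a full country list (for example the `countrylist` imported during setup), and the rule strings here simply mirror the ones used in the class signature in the next cell.

```python
from typing import Optional

# Tiny illustrative subset, not a real country list.
COUNTRIES = {"France", "India", "Indonesia"}


def violated_rule(title: str, content: str, rules: list[str], max_length: int = 30000) -> Optional[str]:
    """Return the first rule that a page breaks, or None if the page is allowed."""
    if "no countries" in rules and title in COUNTRIES:
        return "no countries"
    if "no pages above length 30000" in rules and len(content) > max_length:
        return "no pages above length 30000"
    return None


print(violated_rule("India", "A short page.", ["no countries"]))
print(violated_rule("Linux", "A short page.", ["no countries"]))
```

A move tool could call a check like this before changing page and return a message naming the broken rule instead of moving.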

In [None]:
class WikiGameRules(WikiGameHistory):
    """
    Inherits from WikiGameHistory and adds the ability to store and display the rules of the game.

    Attributes:
        starting_page (str): The title of the starting page (inherited)
        goal_page (str): The title of the goal page (inherited)
        current_page (WikipediaPage): The current Wikipedia page (inherited)
        page_history (List[str]): The history of pages visited (inherited)
        full_chat_history (List[dict]): The full history of messages sent (inherited)

    Methods:
        get_page(title: str) -> WikipediaPage: Get a Wikipedia page object given a title (inherited)

        get_page_summary(page: WikipediaPage | None = None) -> str: Get the summary of a Wikipedia page (inherited)

        get_permitted_links(title: Optional[str] = None) -> list[str]: Get permitted links for the current page (inherited)

        is_permitted_link(link: str) -> bool: Check if a link is permitted (inherited)

        system_instruction -> dict: Generate the starting instructions for the game (inherited)

        on_page_instruction -> dict: Generate instructions for the current page (inherited)

        next_step_instruction -> dict: Generate instructions for the next step (inherited)

        get_instructions(system: bool, on_page: bool, next_step: bool) -> str: Generate instruction messages based on the current game state (inherited)

        check_win() -> bool: Check if the game has been won (inherited)
    """

    def __init__(self, starting_page: str, goal_page: str, rules: List[Literal["no countries", "no pages above length 30000"]], tools=None):
        super().__init__(starting_page, goal_page, tools)
        self.rules = rules if rules else None

    @property
    def system_instruction(self):
        """
        Provide improved starting instructions for the game.

        Returns:
            dict: The starting instructions. "role" is "system" for system messages.
        """
        if self.rules:
            return {}
        else:
            return {}

    @property
    def on_page_instruction(self):
        """
        Provide improved instructions for the current page.

        Returns:
            dict: The instructions for the current page. "role" is "user" for user messages.
        """
        return {}

Now implement the `move_page_tool_rules`. This should be based on the `move_page_tool` but disallow the agent from accessing any pages that the rules disallow, and return a message that this page violates one of the rules of the game.

In [None]:
class move_page_tool_rules(move_page_tool):
    """
    Inherits from move_page_tool and adds the ability to check the rules of the game.
    """

    @staticmethod
    def execute(new_page: str, task: Any) -> str:
        """
        Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.

        Args:
            task (BaseWikiGame): The current task object.
            new_page (str): The title of the new page to move to.

        Returns:
            str: A message indicating the result of the move
        """
        return ""

    @property
    def description(self):
        """
        Provides the description of the move_page tool

        Returns:
            dict: The description of the move_page tool for the API
        """
        return {}

### Exercise (optional) - Try further elicitation methods.

See [Lilian Weng's post](https://lilianweng.github.io/posts/2023-06-23-agent/) about elicitation methods if you're stuck for ideas. You can either iterate on prior elicitation methods (e.g. prompting), or find and come up with your own.

In [None]:
# Implement other elicitation methods
#
#
#
#