# Writing a Simple Agent in Python

> **Learning Objectives:**
> - Understand the basic structure of an agent.
> - Implement a simple agent in Python.
> - Test the performance of the agent.

## Introduction
Agentic AI is most often developed using libraries such as [LangGraph](https://www.langchain.com/langgraph) or [AutoGen](https://microsoft.github.io/autogen/0.2/). However, it is also possible to create a simple agent in plain Python. In this notebook, we will create a simple agent that can interact with a basic environment.

### The Wumpus World
Our agent will be placed in a simple environment called the Wumpus World. This is a classic problem in AI. It is a good starting point for observing how an agent can interact with an environment to achieve a goal without being given specific instructions on how to do so.

The Wumpus World is a cave consisting of rooms connected by passageways. The rooms are arranged in a 4x4 grid. The cave contains a **Wumpus** that eats any agent that enters its room. The agent can feel a **stench** if the Wumpus is in a neighboring room.

The cave also contains **pits**. If an agent enters a room with a pit, it falls in and dies. The agent can feel a **breeze** in the room if there is a pit in a neighboring room. The agent has a **gold** detector that beeps when the agent is in a room containing gold. The agent can pick up the gold and leave the cave. The agent can also climb out of the cave without the gold.

### Performance
The agent receives a reward of +1000 for climbing out of the cave with the gold, -1000 for falling into a pit or being eaten by the Wumpus, and -1 for each action taken. The game ends when the agent climbs out of the cave, dies, or takes more than 20 actions.
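As a quick check of the reward rule, here is a standalone sketch of the arithmetic. It is not the notebook's `WumpusWorld.score()` method, which is defined later. The function name `episode_score` and its parameters are illustrative only.

```python
STEP_COST = 1          # -1 for every action taken
WIN_REWARD = 1000      # climbing out of the cave with the gold
DEATH_PENALTY = 1000   # falling into a pit or being eaten by the Wumpus


def episode_score(num_actions: int, won: bool, died: bool) -> int:
    """Total score for one game under the reward rule described above."""
    score = -STEP_COST * num_actions
    if won:
        score += WIN_REWARD
    if died:
        score -= DEATH_PENALTY
    return score


print(episode_score(num_actions=12, won=True, died=False))   # 988
print(episode_score(num_actions=3, won=False, died=True))    # -1003
print(episode_score(num_actions=21, won=False, died=False))  # -21: ran out of actions
```

Each action costs only 1 point, so the total score is dominated by the ±1000 outcomes.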

### Environment
The environment is represented by a 4x4 grid. The agent always starts in the bottom-left room of the grid, which also contains the ladder. One Wumpus, two pits, and one piece of gold are each placed randomly in a separate room, never in the starting room.

### Actions
The agent can take the following actions:

- `Up`: Move up one room.
- `Down`: Move down one room.
- `Left`: Move left one room.
- `Right`: Move right one room.
- `Grab`: Grab the gold, if it is in the room.
- `Climb`: Climb out of the cave (only possible from the starting room).
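Grid coordinates are easy to get backwards, so the following standalone sketch shows how the four move actions change the agent's `(x, y)` position. It assumes the same convention as the `WumpusWorld` class defined later: x grows to the right, y grows downward (row 0 is the top row), and the starting room in the bottom-left corner is `(0, 3)`. `MOVES` and `step` are illustrative names, not part of the notebook.

```python
SIZE = 4

# (dx, dy) for each move; y grows downward, so "Up" decreases y.
MOVES = {
    "Up": (0, -1),
    "Down": (0, 1),
    "Left": (-1, 0),
    "Right": (1, 0),
}


def step(position, action, size=SIZE):
    """Return (new_position, bump). Moving into a wall leaves the agent in place and sets bump."""
    x, y = position
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < size and 0 <= ny < size):
        return position, True
    return (nx, ny), False


print(step((0, 3), "Up"))    # ((0, 2), False)
print(step((0, 3), "Left"))  # ((0, 3), True)
```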

### Sensors
The agent has the following sensors:

* `Stench`: Senses a stench when the Wumpus is in an adjacent room (not diagonally).
* `Breeze`: Senses a breeze when a pit is in an adjacent room (not diagonally).
* `Glitter`: Detects gold in the current room.
* `Ladder`: Detects the ladder in the starting room.
* `Bump`: Detects when the agent tries to move into a wall.

### Example Board

Here is an example of a Wumpus World board. A breeze is represented by ☁️, gold by 🪙, and a stench by 👃🏻.

|       | Col 1 | Col 2 | Col 3 | Col 4    |
|-------|-------|-------|-------|----------|
| Row 1 | ☁️    | ☁️    | Pit   | ☁️       |
| Row 2 | Pit   | ☁️    | ☁️    | 🪙, 👃🏻 |
| Row 3 | ☁️    |       | 👃🏻  | Wumpus   |
| Row 4 | Agent |       |       | 👃🏻     |
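The breeze and stench cells in the table follow directly from the pit and Wumpus positions. The standalone sketch below recomputes them. It uses 0-indexed `(x, y)` coordinates with y = 0 for Row 1, so table cell "Row r, Col c" is `(c - 1, r - 1)`. The helper names are illustrative.

```python
SIZE = 4


def neighbors(x, y, size=SIZE):
    """Orthogonal (non-diagonal) neighbors of (x, y) that lie on the grid."""
    candidates = [(x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)]
    return [(nx, ny) for nx, ny in candidates if 0 <= nx < size and 0 <= ny < size]


def percept_cells(hazards, size=SIZE):
    """Every cell orthogonally adjacent to at least one hazard."""
    cells = set()
    for hx, hy in hazards:
        cells.update(neighbors(hx, hy, size))
    return cells


# Hazards from the example board: pits at Row 1 Col 3 and Row 2 Col 1, Wumpus at Row 3 Col 4.
pits = [(2, 0), (0, 1)]
wumpus = [(3, 2)]

breeze = percept_cells(pits)
stench = percept_cells(wumpus)

print(sorted(breeze))  # [(0, 0), (0, 2), (1, 0), (1, 1), (2, 1), (3, 0)]
print(sorted(stench))  # [(2, 2), (3, 1), (3, 3)]
```

The printed cells match the breeze and stench entries in the table.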



## Getting Started

Let's start by installing the required libraries and setting the OpenAI API key. The OpenAI API key is required to access the OpenAI models.

Run the following cell to install the required libraries and set the OpenAI API key.


```python
%pip install -q openai==1.61.1

import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass("Enter your OpenAI API key: ")
```

## Core

### Step 1 - Define the Environment
For this lab, we have already written the code for the environment. The code is provided below. 

Run the cell below to define the environment and test the game.

<div class="alert alert-block alert-info">

**Note:** To perform an action, enter the value of its `Actions` enum member. For example, to move up, enter `Up`.
</div>

```python
from enum import Enum
import random
import numpy as np

class Markers(Enum):
    AGENT = 'A'
    WUMPUS = 'W'
    PIT = 'P'
    GOLD = 'G'
    LADDER = 'L'
    EMPTY = '•'

    def __str__(self):
        return self.value

class Actions(Enum):
    MOVE_UP = 'Up'
    MOVE_DOWN = 'Down'
    MOVE_LEFT = 'Left'
    MOVE_RIGHT = 'Right'
    GRAB = 'Grab'
    CLIMB = 'Climb'


class WumpusWorld:
    def __init__(self):
        self.reset()

    def random_empty(self):
        # Pick a random empty room, never the starting room (0, 3)
        while True:
            x = random.randint(0, 3)
            y = random.randint(0, 3)
            if self.board[y, x] == Markers.EMPTY and (x, y) != (0, 3):
                return (x, y)

    def set(self, x, y, value):
        # The board is indexed [row, column], i.e. [y, x]; y = 0 is the top row
        self.board[y, x] = value

    def reset(self):
        self.board = np.full((4, 4), Markers.EMPTY)

        # Add the ladder
        self.set(0, 3, Markers.LADDER)

        # Randomize Wumpus position
        self.set(*self.random_empty(), Markers.WUMPUS)

        # Add two pits
        self.set(*self.random_empty(), Markers.PIT)
        self.set(*self.random_empty(), Markers.PIT)

        # Add the gold
        self.set(*self.random_empty(), Markers.GOLD)

        # Agent properties
        self.has_gold = False
        self.game_over = False
        self.agent_direction = 0 # 0 = up, 1 = right, 2 = down, 3 = left (not used by the moves below)
        self.agent_position = (0, 3)
        self.action_counter = 0

        # Start with bump=False so the key exists before the first move
        self.sensors = {'bump': False}
        self.update_sensors()

    def update_sensors(self):
        x, y = self.agent_position
        # Orthogonal neighbors that lie inside the 4x4 grid
        neighbors = [(nx, ny) for nx, ny in [(x, y+1), (x, y-1), (x+1, y), (x-1, y)] if 0 <= nx < 4 and 0 <= ny < 4]
        self.sensors['position'] = (x, y)
        self.sensors['stench'] = any(self.board[ny, nx] == Markers.WUMPUS for nx, ny in neighbors)
        self.sensors['breeze'] = any(self.board[ny, nx] == Markers.PIT for nx, ny in neighbors)
        self.sensors['glitter'] = self.board[y, x] == Markers.GOLD
        self.sensors['ladder'] = self.board[y, x] == Markers.LADDER

        # The game ends if the agent dies or has taken more than 20 actions (climb() also ends it)
        self.game_over = self.game_over \
            or self.board[y, x] == Markers.WUMPUS \
            or self.board[y, x] == Markers.PIT \
            or self.action_counter > 20

    def __str__(self):
        r = ''
        for y in range(4):
            for x in range(4):
                if (x, y) == self.agent_position:
                    r += f'{Markers.AGENT} '
                else:
                    r += f'{self.board[y, x]} '
            r += '\n'
        r += f'{self.sensors}\n'
        r += f'moves={self.action_counter}, game_over={self.game_over}'
        return r

    def move(self, dx, dy):
        x, y = self.agent_position
        new_x = x + dx
        new_y = y + dy

        # Moving into a wall leaves the agent in place and triggers the bump sensor
        if new_x < 0 or new_x >= 4 or new_y < 0 or new_y >= 4:
            self.update_sensors()
            self.sensors['bump'] = True
            return

        self.agent_position = (new_x, new_y)
        self.sensors['bump'] = False
        self.update_sensors()

    def grab(self):
        if self.sensors['glitter']:
            self.has_gold = True
            self.board[self.agent_position[1], self.agent_position[0]] = Markers.EMPTY
        self.update_sensors()

    def climb(self):
        if self.agent_position == (0, 3):
            self.game_over = True
        self.update_sensors()

    def score(self):
        score = -self.action_counter
        if self.agent_position == (0, 3) and self.has_gold:
            score += 1000
        if self.board[self.agent_position[1], self.agent_position[0]] == Markers.WUMPUS or \
            self.board[self.agent_position[1], self.agent_position[0]] == Markers.PIT:
            score -= 1000
        return score

    def is_win(self):
        return self.agent_position == (0, 3) and self.has_gold

    def act(self, action):
        self.action_counter += 1
        # Clear any bump from the previous action; move() sets it again on a wall hit
        self.sensors['bump'] = False

        match action:
            case Actions.MOVE_UP: self.move(0, -1)
            case Actions.MOVE_DOWN: self.move(0, 1)
            case Actions.MOVE_LEFT: self.move(-1, 0)
            case Actions.MOVE_RIGHT: self.move(1, 0)
            case Actions.GRAB: self.grab()
            case Actions.CLIMB: self.climb()


world = WumpusWorld()

while not world.game_over:
    print(world)
    action = input("Enter action: ")
    try:
        world.act(Actions(action))
    except ValueError:
        # Actions(...) raises ValueError for text that is not an action value
        print(f"Unknown action {action!r}. Use one of: {', '.join(a.value for a in Actions)}")

print('Game Over!')
print(world)
```

While this isn't the most advanced game, it provides a good starting point for understanding agentic AI. Let's review our definition of Agentic AI.

> **Agentic AI** is...
>   * AI agents
>   * Operating in an environment
>   * From a context
>   * Using tools
>   * To achieve goals.

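Fill out the table below with the components of Agentic AI for the Wumpus World game.

| Component   | Wumpus World |
|-------------|--------------|
| AI agent    |              |
| Environment |              |
| Context     |              |
| Tools       |              |
| Goals       |              |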

### Step 2 - Define a Sample Agent
For our agent, we will create a simple Python class that can interact with the environment. The agent's `act` method receives the current sensor readings, records them along with the chosen action, and returns that action. The choice itself is made by `_select_action`, which each subclass overrides. The decision-making process for this first agent is very simple. If the agent perceives glitter, it grabs the gold. If it is at the ladder and has the gold, it climbs out. Otherwise, it moves in a random direction.

Run the cell below to define the agent.

```python
class Agent:
    """Wumpus World agent base class"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.rng = np.random.default_rng()
        self.has_gold = False
        self.actions = []  # history of actions taken
        self.sensors = []  # history of sensor readings, one snapshot per action

    def _select_action(self, sensors):
        # Subclasses decide which action to take given the current sensor readings
        raise NotImplementedError()

    def act(self, sensors):
        action = self._select_action(sensors)
        self.actions.append(action)
        # Store a copy: the environment updates the same sensors dict in place every step
        self.sensors.append(dict(sensors))

        if action == Actions.GRAB and sensors['glitter']:
            self.has_gold = True

        return action

class RandomAgent(Agent):
    """Wumpus World agent that acts randomly"""
    def _select_action(self, sensors):
        if sensors['glitter']:
            return Actions.GRAB

        if sensors['ladder'] and self.has_gold:
            return Actions.CLIMB

        return self.rng.choice([
            Actions.MOVE_UP, Actions.MOVE_DOWN, Actions.MOVE_LEFT, Actions.MOVE_RIGHT
        ])
```
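The base `Agent.act` records each step's sensor readings in `self.sensors`. The environment, however, updates a single `sensors` dict in place on every step. Appending that dict directly therefore stores the same object again and again, and every history entry ends up showing the latest reading. Here is a small standalone demonstration, in which a plain dict stands in for `world.sensors`:

```python
sensors = {"position": (0, 3), "breeze": False}  # stand-in for world.sensors

history_refs = []    # appends the dict itself
history_copies = []  # appends a snapshot of the dict

for position, breeze in [((0, 3), False), ((0, 2), True)]:
    # The environment overwrites the same dict on every step
    sensors["position"] = position
    sensors["breeze"] = breeze
    history_refs.append(sensors)
    history_copies.append(dict(sensors))

print(history_refs)    # [{'position': (0, 2), 'breeze': True}, {'position': (0, 2), 'breeze': True}]
print(history_copies)  # [{'position': (0, 3), 'breeze': False}, {'position': (0, 2), 'breeze': True}]
```

This is why the history should store `dict(sensors)`. A shallow copy is enough here because the values are booleans and tuples.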

With this simple agent, we can test how it performs in the Wumpus World environment.

Run the cell below to test the agent.

```python
world = WumpusWorld()
agent = RandomAgent()

while not world.game_over:
    print(world)
    action = agent.act(world.sensors)
    world.act(action)
    print(f'Agent action: {action}')

print(world.score())
print(world)
print(f'Agent {"won" if world.is_win() else "lost"}!')
```


More likely than not, the random agent lost the game. Apart from grabbing gold in its current room and climbing out once it has the gold, the agent ignores what it senses (breeze, stench, bump) and moves at random. However, this random agent provides a useful baseline for comparison when we create more advanced agents. If our new agent can't outperform the random agent, then we know we have a problem.

In the next cell, write code to run **10,000 games** with the random agent and calculate the average score. This should run in just a few seconds. You can also calculate the win rate of the random agent by using `world.is_win()`.
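Here is one possible shape for this loop, offered as a sketch. `evaluate` is a hypothetical helper that takes a function playing one full game and returning `(score, won)`. In the notebook, that function would create a `WumpusWorld` and an agent, step them as in the test cell above, and return `(world.score(), world.is_win())`.

```python
from typing import Callable, Tuple


def evaluate(play_episode: Callable[[], Tuple[int, bool]], episodes: int) -> Tuple[float, float]:
    """Play `episodes` games and return (average score, win rate).

    `play_episode` plays one complete game and returns (score, won).
    """
    total_score = 0
    wins = 0
    for _ in range(episodes):
        score, won = play_episode()
        total_score += score
        wins += int(won)
    return total_score / episodes, wins / episodes


# Stand-in outcomes for illustration only (not real results).
outcomes = iter([(990, True), (-1005, False), (-1003, False), (-21, False)])
average, win_rate = evaluate(lambda: next(outcomes), episodes=4)
print(average, win_rate)  # -259.75 0.25
```

The same helper works for the smaller runs later in the notebook; only the episode count changes.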

```python
# YOUR CODE HERE

MAX_EPISODES = 10000  # Don't change the value here
```

##### The random agent should have an average score of about -764 and a win rate of about 6-7%. Because the world is randomly generated, the exact numbers may vary.

### Step 3 - Create a Generative AI Agent
Now that we have a baseline, let's create an agent backed by a generative AI model. We will use OpenAI's GPT-4o-mini model to choose the agent's actions based on the information it receives from the environment.

The following cell tests that the OpenAI API is working correctly. Run the cell to generate a response from the GPT-4o-mini model.

```python
from openai import OpenAI
from IPython.display import display, Markdown

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a Wumpus World agent."},
        {"role": "user", "content": "What is the best action to take?"},
    ],
    max_completion_tokens=2,  # limits the reply to 2 tokens; longer answers are cut off
)

display(Markdown(completion.choices[0].message.content))
```

Without information about the environment, the model cannot generate an appropriate response. We need to give the model the information it needs to make a decision:

* The definition of the Wumpus World environment. (System prompt)
    * You can use the definition from the introduction for this!
* The previous actions the agent has taken. (Prompt)
* The sensor readings the agent received before each of those actions. (Prompt)
* The current sensor readings. (Prompt)
* The possible actions the agent can take. (Prompt)
* Whether the agent has the gold. (Prompt)

Given this information, the agent should know everything it needs about the environment, context, tools, and goals to make a decision. Here is a sample prompt that you can use to generate a response from the model:

```
I have already taken these actions with these sensor readings:
    - {sensor} -> {action}

I sense: {sensors}
I have the gold: {self.has_gold}

Explore cells until you find the gold.
Do not climb out without the gold, unless you have explored all cells.
Do not grab unless you sense glitter.
Don't repeatedly move in the same direction if you bump into a wall.
Once you sense a breeze or stench, avoid moving to unknown cells.

What is the best action to take?

Respond ONLY with one of these actions: Up, Down, Left, Right, Grab, Climb
```
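Here is a sketch of how the sample prompt could be filled in from the agent's history. It assumes the history is a list of `(sensor_dict, action_name)` pairs. Inside `AIAgent._select_action`, such a list could be built as `[(s, a.value) for s, a in zip(self.sensors, self.actions)]`. `build_prompt`, `RULES`, and the "(no actions yet)" placeholder are illustrative choices, not part of the notebook.

```python
RULES = """Explore cells until you find the gold.
Do not climb out without the gold, unless you have explored all cells.
Do not grab unless you sense glitter.
Don't repeatedly move in the same direction if you bump into a wall.
Once you sense a breeze or stench, avoid moving to unknown cells.

What is the best action to take?

Respond ONLY with one of these actions: Up, Down, Left, Right, Grab, Climb"""


def build_prompt(history, sensors, has_gold):
    """Fill in the sample prompt.

    history: list of (sensor_dict, action_name) pairs from earlier steps.
    sensors: the current sensor dict; has_gold: whether the agent holds the gold.
    """
    if history:
        past = "\n".join(f"    - {s} -> {a}" for s, a in history)
    else:
        past = "    - (no actions yet)"  # placeholder wording, not in the sample prompt
    return (
        "I have already taken these actions with these sensor readings:\n"
        f"{past}\n\n"
        f"I sense: {sensors}\n"
        f"I have the gold: {has_gold}\n\n"
        f"{RULES}"
    )


print(build_prompt([({"breeze": False}, "Up")], {"breeze": True}, has_gold=False))
```

The returned string is the user message. The Wumpus World definition goes in a separate system message.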

In the next cell, define the `_select_action` method for the `AIAgent` class. This method should send the prompt defined above to the OpenAI API and return the `Actions` member that matches the model's reply.

<div class="alert alert-block alert-info">

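**Note:** Our Wumpus World is a little different from the standard definition. For example, the agent moves Up, Down, Left, and Right directly instead of turning and moving forward, and it has no arrow to shoot. Even if it seems like the model already knows the game, it's important to provide our model with the correct information.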
</div>
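Model replies do not always match an enum value exactly; they may contain extra whitespace, punctuation, different casing, or extra words. Here is a sketch of a forgiving parser. `parse_action` is a hypothetical helper, and falling back to `Up` when no action name is found is an arbitrary choice. The `Actions` enum is repeated only so the sketch runs on its own. In the notebook, reuse the existing enum rather than redefining it.

```python
import re
from enum import Enum


class Actions(Enum):
    # Same values as the notebook's Actions enum, repeated so this sketch runs on its own
    MOVE_UP = 'Up'
    MOVE_DOWN = 'Down'
    MOVE_LEFT = 'Left'
    MOVE_RIGHT = 'Right'
    GRAB = 'Grab'
    CLIMB = 'Climb'


def parse_action(reply, default=Actions.MOVE_UP):
    """Return the first word in `reply` that names an action (case-insensitive).

    Falls back to `default` if no action name is found.
    """
    by_name = {action.value.lower(): action for action in Actions}
    for word in re.findall(r"[A-Za-z]+", reply):
        action = by_name.get(word.lower())
        if action is not None:
            return action
    return default


print(parse_action(" right."))       # Actions.MOVE_RIGHT
print(parse_action("Action: Grab"))  # Actions.GRAB
```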

```python
# YOUR CODE HERE
class AIAgent(Agent):
    """Wumpus World agent that uses Generative AI to act"""
    def _select_action(self, sensors):
        raise NotImplementedError()
```

Now that we have a generative AI agent, let's test it in the Wumpus World environment. Run the cell below to test the agent.

```python
world = WumpusWorld()
agent = AIAgent()

while not world.game_over:
    print(world)
    action = agent.act(world.sensors)
    print(action)
    world.act(action)

print(world.score())
print(f'Agent {"won" if world.is_win() else "lost"}!')
```


If your Agent isn't behaving as expected, you may need to adjust the prompts you are sending to the model. Make sure to explicitly define what you want the model to do in the prompt and how the output should be structured.

The AI agent may not win. Sometimes a pit or the Wumpus is placed in a way that the agent cannot avoid it. However, the AI agent should have a higher average score and win rate than the random agent.
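To see why some games cannot be won, here is a sketch that uses full knowledge of the board. It runs a breadth-first search for the shortest route from the starting room to the gold that never enters a pit or the Wumpus's room. The agent does not have this knowledge; it only receives percepts. A board can therefore be winnable in principle and still require a risky guess. The function name and the `(x, y)` convention (y = 0 is the top row, start at `(0, 3)`) are assumptions chosen to match the `WumpusWorld` class.

```python
from collections import deque


def shortest_safe_path_length(hazards, gold, start=(0, 3), size=4):
    """Breadth-first search over rooms without a pit or the Wumpus.

    Returns the number of moves from start to gold, or None if every route is blocked.
    """
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (x, y), dist = frontier.popleft()
        if (x, y) == gold:
            return dist
        for nxt in ((x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)):
            nx, ny = nxt
            if 0 <= nx < size and 0 <= ny < size and nxt not in hazards and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


# Example board from the introduction: pits at (2, 0) and (0, 1), Wumpus at (3, 2), gold at (3, 1).
print(shortest_safe_path_length({(2, 0), (0, 1), (3, 2)}, gold=(3, 1)))  # 5
# Both rooms next to the start hold hazards, so the gold is unreachable.
print(shortest_safe_path_length({(0, 2), (1, 3), (2, 2)}, gold=(3, 3)))  # None
```

For the example board, a perfect game needs 5 moves out, `Grab`, 5 moves back, and `Climb`. That is 12 actions, for a score of 988.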


In the next cell, write code to run **25** games with the generative AI agent and calculate the average score. You can also calculate the win rate of the generative AI agent by using `world.is_win()`.

<div class="alert alert-block alert-warning">

**Warning!** It's important that you only run 25 games with the generative AI agent. A game can last up to 21 actions, and each action typically requires a call to the OpenAI API. Running more games may take a long time and use a lot of API credits.
</div>


```python
# YOUR CODE HERE

MAX_EPISODES = 25  # Don't change the value here
```

The average score and win rate of the AI agent should be higher than those of the random agent. Because we have only run 25 games, however, these estimates are noisy, and the AI agent may still score lower than the random agent.
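To see how noisy 25 games are, compare the standard error of a win rate near the random agent's 6-7% at both sample sizes. This sketch uses the usual binomial formula `sqrt(p * (1 - p) / n)`. The normal approximation behind it is rough for small `n`.

```python
import math


def win_rate_standard_error(p, n):
    """Binomial standard error of a win rate p estimated from n independent games."""
    return math.sqrt(p * (1 - p) / n)


for n in (25, 10_000):
    print(n, round(win_rate_standard_error(0.07, n), 4))
# 25 0.051
# 10000 0.0026
```

With 25 games, two standard errors is about ±10 percentage points. With 10,000 games, it is about ±0.5 points.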

## Bonus Challenge 1
You may have noticed that the AI agent doesn't always behave optimally. Sometimes it takes actions that are repetitive or don't make sense. In our previous prompt, we gave the model the list of actions the agent had already taken, along with the sensor readings it received.

However, this may not be the best way to provide the model with the information it needs to make a decision. Because we know the structure of the Wumpus World environment, we can synthesize this information into a map of the environment. The map might look something like this:

```
|---------------|
| A | B | ? | ? |
| E | B | ? | ? |
| E | E | ? | ? |
| L | S | ? | ? |
|---------------|
```

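When we give the model the map, we compress the information about the environment into a more information-dense representation. This should help the model make better decisions because it no longer needs to infer the state of the environment from the history of actions and sensor readings. A natural reading of the symbols in this example is: `A` is the agent's current room, `L` is the ladder, `B` and `S` are visited rooms where a breeze or a stench was sensed, `E` is a visited room where nothing was sensed, and `?` is a room that has not been visited yet.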

### Step 1 - Define the Map
In the next cell, create a class called `Map` that will represent the map of the Wumpus World environment. The map should have the following methods:

* `__init__(self, size=4)`: Initializes the map with the given size.
* `update(self, x, y, sensors)`: Updates the map with the sensor information at the given position.
* `__str__(self)`: Returns a string representation of the map.
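One possible starting point, as a sketch, follows. The legend is inferred from the example map above. `G` (glitter) and `X` (breeze and stench together) are extra, hypothetical symbols added so that every cell stays one character wide. Rooms are indexed `[y][x]` with y = 0 as the top row, matching the environment's `position` sensor.

```python
class Map:
    """Sketch of the agent's memory of the cave, one character per room.

    Legend (inferred from the example map; G and X are hypothetical extras):
      ?  unvisited room          E  visited room, nothing sensed
      B  breeze sensed           S  stench sensed
      X  breeze and stench       G  glitter sensed
      L  ladder (start room)     A  agent's current room
    """

    def __init__(self, size=4):
        self.size = size
        self.cells = [["?"] * size for _ in range(size)]  # indexed [y][x]
        self.agent = None

    def update(self, x, y, sensors):
        """Record what was sensed in room (x, y) and move the agent marker there."""
        if sensors.get("ladder"):
            symbol = "L"
        elif sensors.get("glitter"):
            symbol = "G"
        elif sensors.get("breeze") and sensors.get("stench"):
            symbol = "X"
        elif sensors.get("breeze"):
            symbol = "B"
        elif sensors.get("stench"):
            symbol = "S"
        else:
            symbol = "E"
        self.cells[y][x] = symbol
        self.agent = (x, y)

    def __str__(self):
        rows = []
        for y in range(self.size):
            cells = ["A" if (x, y) == self.agent else self.cells[y][x] for x in range(self.size)]
            rows.append("| " + " | ".join(cells) + " |")
        border = "|" + "-" * (len(rows[0]) - 2) + "|"
        return "\n".join([border, *rows, border])


m = Map()
m.update(0, 3, {"ladder": True})
m.update(0, 2, {"breeze": True})
print(m)
```

In an agent, calling `m.update(*sensors['position'], sensors)` on every step keeps the map current, because the environment reports the agent's position as a sensor. Including the docstring's legend in the prompt tells the model what each symbol means.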

```python
# YOUR CODE HERE
```

### Step 2 - Create a New Agent
In the next cell, create a new agent called `MapAgent` that uses the map to make decisions. You can begin with the `AIAgent` and modify your prompt to use the map.

Remove any information that is now redundant because it is represented in the map. For example, you no longer need to give the model the previous actions and sensor readings.

If you use symbols to represent the environment, make sure to include a legend in your prompt.

```python
# YOUR CODE HERE
```

### Step 3 - Test the Agent
Now that you have the agent, test it in the Wumpus World environment. Run the agent 25 times and calculate the average score and win rate.

```python
# YOUR CODE HERE
```

## Conclusion

In this notebook, we created a simple agent for the Wumpus World environment. We started with a random agent and then created a generative AI agent using the OpenAI API. Finally, in the bonus challenge, we created a new agent that uses a map of the environment to make decisions.

This agent is neither the most advanced nor the most useful, but the Wumpus World provides a good starting point for understanding how agents interact with environments to achieve goals. The concepts learned here can be applied to more complex environments and agents.