# `PokeAgent` LLM reasoning agents


**Note**: this notebook requires a locally running Pokémon Showdown server. Please see the [getting started section](../getting_started.rst) for help on how to set one up.


In [1]:
import json
from typing import List, Union

import agentops
import openai
from poke_env import RandomPlayer
from poke_env.environment.battle import AbstractBattle, Battle
from poke_env.environment.move import Move
from poke_env.environment.pokemon import Pokemon
from poke_env.player import Player

# OpenAI client used later by the GPT player; reads OPENAI_API_KEY from the environment.
client = openai.OpenAI()

# The RandomPlayer is a basic agent that makes decisions randomly,
# serving as a starting point for more complex agent development.
random_player = RandomPlayer()

### Creating a Battle

To create a battle, let's create a second agent and use the `battle_against` method. It's an asynchronous method, so we need to `await` it.

In [2]:
second_player = RandomPlayer()

# The battle_against method initiates a battle between two players.
# Here we are using asynchronous programming (await) to start the battle.
await random_player.battle_against(second_player, n_battles=1)

If you want to look at this battle, you can open a browser at [http://localhost:8000](http://localhost:8000) - you should see the battle in the lobby.

### Inspecting the Result

Here are a couple of ways to inspect the result of this battle.

In [3]:
# n_won_battles and n_finished_battles

print(
    f"Player {random_player.username} won {random_player.n_won_battles} out of {random_player.n_finished_battles} played"
)
print(
    f"Player {second_player.username} won {second_player.n_won_battles} out of {second_player.n_finished_battles} played"
)

# Looping over battles

for battle_tag, battle in random_player.battles.items():
    print(battle_tag, battle.won)

Player RandomPlayer 1 won 0 out of 1 played
Player RandomPlayer 2 won 1 out of 1 played
battle-gen9randombattle-1702 False


You can look at more properties of the [Player](../modules/player.rst) and [Battle](../modules/battle.rst) classes in the documentation.

### Running a Cross-Evaluation

`poke-env` provides a `cross_evaluate` function that runs a cross-evaluation between multiple agents. It plays `n_challenges` battles between each pair of agents and returns the results as a nested dictionary of win rates keyed by username.

In [4]:
from poke_env import cross_evaluate

third_player = RandomPlayer()

players = [random_player, second_player, third_player]

cross_evaluation = await cross_evaluate(players, n_challenges=5)
cross_evaluation

{'RandomPlayer 1': {'RandomPlayer 1': None,
  'RandomPlayer 2': 0.3333333333333333,
  'RandomPlayer 3': 0.2},
 'RandomPlayer 2': {'RandomPlayer 1': 0.6666666666666666,
  'RandomPlayer 2': None,
  'RandomPlayer 3': 0.4},
 'RandomPlayer 3': {'RandomPlayer 1': 0.8,
  'RandomPlayer 2': 0.6,
  'RandomPlayer 3': None}}

Here's one way to pretty print the results of the cross evaluation using `tabulate`:

In [5]:
from tabulate import tabulate

table = [["-"] + [p.username for p in players]]
for p_1, results in cross_evaluation.items():
    table.append([p_1] + [cross_evaluation[p_1][p_2] for p_2 in results])

print(tabulate(table))

--------------  ------------------  ------------------  --------------
-               RandomPlayer 1      RandomPlayer 2      RandomPlayer 3
RandomPlayer 1                      0.3333333333333333  0.2
RandomPlayer 2  0.6666666666666666                      0.4
RandomPlayer 3  0.8                 0.6
--------------  ------------------  ------------------  --------------
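The nested dictionary can also be reduced to a single ranking. The following is a small sketch, not part of `poke-env`, that averages each player's win rate over its opponents and skips the `None` entries on the diagonal. The helper names `average_win_rates` and `rank_players` are made up for this example, and the data is copied from the output above.

```python
from statistics import mean

# Result dict copied from the cross-evaluation output above.
cross_evaluation = {
    "RandomPlayer 1": {
        "RandomPlayer 1": None,
        "RandomPlayer 2": 0.3333333333333333,
        "RandomPlayer 3": 0.2,
    },
    "RandomPlayer 2": {
        "RandomPlayer 1": 0.6666666666666666,
        "RandomPlayer 2": None,
        "RandomPlayer 3": 0.4,
    },
    "RandomPlayer 3": {
        "RandomPlayer 1": 0.8,
        "RandomPlayer 2": 0.6,
        "RandomPlayer 3": None,
    },
}


def average_win_rates(results):
    """Mean win rate per player, ignoring the None self-match entries."""
    return {
        player: mean(rate for rate in row.values() if rate is not None)
        for player, row in results.items()
    }


def rank_players(results):
    """Usernames sorted from highest to lowest average win rate."""
    averages = average_win_rates(results)
    return sorted(averages, key=averages.get, reverse=True)


averages = average_win_rates(cross_evaluation)
ranking = rank_players(cross_evaluation)
for name in ranking:
    print(f"{name}: {averages[name]:.3f}")
```

With only five challenges per pairing, the ordering between random players is mostly noise; the same helpers become more informative once a non-random agent is added to `players`.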


## Building GPT Player

In [6]:
def log_pokemon(pokemon: Pokemon, is_opponent: bool = False):
    lines = [
        f"[{pokemon.species} ({pokemon.name}) {'[FAINTED]' if pokemon.fainted else ''}]",
        f"Types: {[t.name for t in pokemon.types]}"
    ]
    
    if is_opponent:
        lines.append(f'Possible Tera types {pokemon.tera_type}')
    
    lines.extend([
        f"HP: {pokemon.current_hp}/{pokemon.max_hp} ({pokemon.current_hp_fraction*100:.1f}%)",
        f"Base stats: {pokemon.base_stats}",
        f"Stats: {pokemon.stats}",
        f"{'Possible abilities' if is_opponent else 'Ability'}: {pokemon.ability}",
        f"{'Possible items' if is_opponent else 'Item'}: {pokemon.item}",
        f"Status: {pokemon.status}"
    ])
    
    if pokemon.status:
        lines.append(f"Status turn count: {pokemon.status_counter}")
    
    lines.append("Moves:")
    lines.extend([
        f"Move ID: `{move.id}` Base Power: {move.base_power} Accuracy: {move.accuracy * 100}% PP: ({move.current_pp}/{move.max_pp}) Priority: {move.priority}  "
        for move in pokemon.moves.values()
    ])
    
    # Stats are already listed above, so only the boosts are appended here
    lines.append(f"Boosts: {pokemon.boosts}")
    
    return "\n".join(lines)


def log_player_info(battle: AbstractBattle):
    lines = [
        "== Player Info ==",
        "Active pokemon:",
        log_pokemon(battle.active_pokemon),
        f"Tera Type: {battle.can_tera}",
        '-'*10,
        f"Team: {battle.team}"
    ]
    
    for _, mon in battle.team.items():
        if not mon.active:
            lines.append(log_pokemon(mon))
            lines.append("")
    
    return "\n".join(lines)


def log_opponent_info(battle: AbstractBattle):
    return "\n".join([
        "== Opponent Info ==",
        "Opponent active pokemon:",
        log_pokemon(battle.opponent_active_pokemon, is_opponent=True),
        f"Opponent team: {battle.opponent_team}"
    ])


def log_battle_info(battle: AbstractBattle):
    lines = [
        "== Battle Info ==",
        f"Turn: {battle.turn}"
    ]
    
    # Field info
    if battle.weather:
        lines.append(f"Weather: {battle.weather}")
    if battle.fields:
        lines.append(f"Fields: {battle.fields}")
    if battle.side_conditions:
        lines.append(f"Player side conditions: {battle.side_conditions}")
    if battle.opponent_side_conditions:
        lines.append(f"Opponent side conditions: {battle.opponent_side_conditions}")
    if battle.trapped:
        lines.append(f"Trapped: {battle.trapped}")
    
    return "\n".join(lines)
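The GPT player defined in the next cell passes `battle.available_moves` straight into the prompt, where it renders as a list of `name (Move object)` entries. Below is a minimal sketch of a helper that formats moves in the same spirit as `log_pokemon`, one line per move. `FakeMove` and `format_moves` are hypothetical names introduced only for illustration; `FakeMove` is a stand-in that exposes the same attributes the cell above reads from a `Move`, so the snippet runs without a Showdown server.

```python
from dataclasses import dataclass


@dataclass
class FakeMove:
    """Hypothetical stand-in exposing the attributes read by log_pokemon."""
    id: str
    base_power: int
    accuracy: float  # fraction between 0 and 1, as assumed in log_pokemon
    priority: int = 0


def format_moves(moves):
    """Render one readable line per move for use in a prompt."""
    lines = []
    for move in moves:
        lines.append(
            f"- `{move.id}` | power {move.base_power} "
            f"| accuracy {move.accuracy * 100:.0f}% | priority {move.priority}"
        )
    return "\n".join(lines)


moves = [FakeMove("focusblast", 120, 0.7), FakeMove("recover", 0, 1.0)]
move_text = format_moves(moves)
print(move_text)
```

Since the helper only relies on attribute access, it should also accept real `Move` objects, although that has not been checked here against a live battle.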


In [7]:
def create_prompt(battle_info, player_info, opponent_info, available_moves) -> str:
    prompt = f"""
Here is the current state of the battle:

{battle_info}

Here is the current state of your team:

{player_info}

Here is the current state of the opponent's team:

{opponent_info} 

Your goal is to win the battle. You can only choose one move to make.

Here is the list of available moves:

{available_moves}

Reason carefully about the best move to make. Consider things like the opponent's team, the weather, the side conditions (i.e. stealth rock, spikes, sticky web, etc.). Consider the effectiveness of the move against the opponent's team, but also consider the power of the move, and the accuracy. You may also switch to a different pokemon if you think it is a better option. Given the complexity of the game, you may also sometimes choose to "sacrifice" your pokemon to put your team in a better position.

Finally, write a conclusion that includes the move you will make, and the reason you made that move.

"""
    return prompt

class GPTPlayer(Player):
    
    def choose_max_damage_move(self, battle: Battle):
        return max(battle.available_moves, key=lambda move: move.base_power)

    def choose_move(self, battle: AbstractBattle):

        def choose_order_from_id(move_id: str, battle: AbstractBattle) -> Union[Move, Pokemon]:
            try:
                return list(filter(lambda move: move.id == move_id, battle.available_moves))[0]
            except Exception as e:
                print('Error picking move: ', e)
                return battle.available_moves[0]

        # Ask the model to pick a move whenever at least one move is available
        if battle.available_moves:
            # Define the tool (function-calling) schema the model must fill in
            tools = [{
                "type": "function",
                "name": "choose_order_from_id",
                "description": "Choose a move from the list of available moves.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "move_id": {
                            "type": "string",
                            "description": "The id (name of move) of the move to choose"
                        }
                    },
                    "required": [
                        "move_id"
                    ],
                    "additionalProperties": False
                }
            }]

            # Pass state of game to the Agent
            system_prompt = create_prompt(
                log_battle_info(battle),
                log_player_info(battle),
                log_opponent_info(battle),
                battle.available_moves
            )
            
            print('Calling GPT...')
            # First call: let the model reason freely about the battle state
            reasoning_response = client.responses.create(
                model="gpt-4.1",
                input=[{"role": "system", "content": system_prompt},
                    {"role": "user", "content": "Select a move based on the move id (the name of the move)"}],
            )

            # Second call: continue the same conversation and force a tool call
            # that commits to the move chosen in the reasoning step
            tool_selection_response = client.responses.create(
                model="gpt-4.1",
                input="Call choose_order_from_id with the id of the move you concluded on.",
                tools=tools,
                tool_choice="required",
                previous_response_id=reasoning_response.id
            )

            
            print('GPT called')
            tool_call = tool_selection_response.output[0]
            print(tool_call)
            args = json.loads(tool_call.arguments)
            print('Args: ', args)
            print('Available moves: ', battle.available_moves)
            chosen_order = choose_order_from_id(args["move_id"], battle)
            print('Chosen order: ', chosen_order)

            # Creating an order for the move selected by the model
            return self.create_order(chosen_order)


        else:
            print('No moves available calling random')
            # If no attacking move is available, perform a random switch
            # This involves choosing a random move, which could be a switch or another available action
            return self.choose_random_move(battle)
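The class above reads the tool call from `output[0]` and trusts the returned `move_id` as-is. As a hedged sketch of a more defensive variant, the helper below scans the output items for one whose `type` is `"function_call"` (the value visible in the printed `ResponseFunctionToolCall` objects), parses its JSON arguments, and only accepts a move id that is actually available. `extract_move_id` and `to_id` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins for real response items so that the snippet runs without an API key.

```python
import json
import re
from types import SimpleNamespace


def to_id(name):
    """Normalize a move name to a Showdown-style id: lowercase alphanumerics only."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def extract_move_id(output_items, valid_ids):
    """Return the first valid move id found in a function call, else None."""
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # skip plain messages and other item types
        try:
            args = json.loads(item.arguments)
        except (json.JSONDecodeError, TypeError):
            continue  # malformed or missing arguments
        if not isinstance(args, dict):
            continue
        move_id = to_id(str(args.get("move_id", "")))
        if move_id in valid_ids:
            return move_id
    return None


# Stand-in response items (assumption: real items expose .type and .arguments).
items = [
    SimpleNamespace(type="message"),
    SimpleNamespace(type="function_call", arguments='{"move_id": "Focus Blast"}'),
]
available = {"focusblast", "psychic", "calmmind", "recover"}

picked = extract_move_id(items, available)
missing = extract_move_id(items, {"hurricane", "roost"})
print(picked, missing)
```

When the helper returns `None`, the caller could fall back to `choose_random_move` or to the highest base power move, rather than silently playing the first entry of `battle.available_moves`.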


## Run the GPT Player

Next, we'll test our `GPTPlayer` against a `RandomPlayer` in a single battle:


In [None]:
# Max damage player (a simple baseline; defined here for comparison but not used in the battle below)
class MaxDamagePlayer(Player):
    def choose_move(self, battle):
        if battle.available_moves:
            best_move = max(battle.available_moves, key=lambda move: move.base_power)

            if battle.can_tera:
                return self.create_order(best_move, terastallize=True)

            return self.create_order(best_move)
        else:
            return self.choose_random_move(battle)

# Creating players
random_player = RandomPlayer()
gpt_player = GPTPlayer()

# Running battles

agentops.init()
await gpt_player.battle_against(random_player, n_battles=1)


# Displaying results
print(f"GPT player won {gpt_player.n_won_battles} / {gpt_player.n_finished_battles} battles")

No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
No moves available calling random
Calling GPT...
GPT called
ResponseFunctionToolCall(arguments='{"move_id":"focusbla



GPT called
ResponseFunctionToolCall(arguments='{"move_id":"psychic"}', call_id='call_6LRfbizACYWj6fEai8OIvxJ8', name='choose_order_from_id', type='function_call', id='fc_681fd65d7d0c8191b99cddc06983aac006e7a35d4a8b5a40', status='completed')
Args:  {'move_id': 'psychic'}
Available moves:  [focusblast (Move object), psychic (Move object), calmmind (Move object), recover (Move object)]
Chosen order:  psychic (Move object)
No moves available calling random
Calling GPT...
GPT called
ResponseFunctionToolCall(arguments='{"move_id":"quiverdance"}', call_id='call_wTYy76JCLUBpK2jyLqHYzwze', name='choose_order_from_id', type='function_call', id='fc_681fd66b9b3481919bb5dfb5158d96860f0278777777aaf5', status='completed')
Args:  {'move_id': 'quiverdance'}
Available moves:  [hurricane (Move object), quiverdance (Move object), roost (Move object), revelationdance (Move object)]
Chosen order:  quiverdance (Move object)
Calling GPT...
GPT called
ResponseFunctionToolCall(arguments='{"move_id":"roost"}', c