## Experiment Goal

The goal of this experiment is to set up an agent that learns to play in a Pokémon Showdown powered environment. It will learn by following a training regimen that is yet to be determined. The regimen will be based on the idea of completing a Nuzlocke challenge.

See [poke-env.readthedocs.io](https://poke-env.readthedocs.io/en/stable/examples/rl_with_open_ai_gym_wrapper.html) for more information.

In [1]:
import random
from typing import Tuple, List, Literal, Dict, Any, Optional, Union, Awaitable
import os
import json

In [2]:
data_dir = os.path.abspath('../data/')
gen4_dex = os.path.join(data_dir, 'gen4_dex.json')
trn_file = os.path.join(data_dir, 'parsed_platinum_trainers.json')

In [3]:
with open(gen4_dex, 'r') as f:
    dex = json.load(f)

with open(trn_file, 'r') as f:
    trn = json.load(f)

In [4]:
# Ensure relative imports work correctly
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from services.helper_methods import get_random_ivs, get_random_nature

In [5]:
from poke_env.teambuilder import TeambuilderPokemon
from poke_env.teambuilder import ConstantTeambuilder
from poke_env.player import RandomPlayer, Player
from poke_env.player.battle_order import (
    BattleOrder,
    DefaultBattleOrder,
    DoubleBattleOrder,
)

In [6]:
def get_dex_entry(species: str) -> dict:
    # TODO return dex entry as class
    return dex[species.lower()]

def get_random_gender(dex_entry):
    genders = list(dex_entry['genderRatio'].keys())
    weights = list(dex_entry['genderRatio'].values())
    return random.choices(genders, weights)[0]

def get_moves_from_dex_entry(dex_entry, learnedby: Literal['M', 'E', 'L'], v: int = 0):
    """Return the moves learned via `learnedby` (M: machine, E: egg, L: level-up).

    For level-up moves, only moves learned at or below level `v` are returned.
    """
    moves = []
    for move, sources in dex_entry['learnset'].items():
        matching = [m for m in sources if learnedby in m]
        if not matching:
            continue
        if learnedby == 'L':
            # Level-up codes look like '4L13': the level follows the first two characters.
            # int() replaces eval(), and <= includes moves learned exactly at level v.
            if any(int(m[2:]) <= v for m in matching):
                moves.append(move)
        else:
            moves.append(move)
    return moves
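
As a hedged illustration of the learnset codes that `get_moves_from_dex_entry` filters on, the sketch below splits a code such as `4L13` into its parts. It assumes the codes in `gen4_dex.json` follow the Pokémon Showdown convention (generation digit, source letter, optional level), which is what the `m[2:]` slicing above implies. `parse_learnset_code` is a hypothetical helper and not part of the notebook.

```python
from typing import Optional, Tuple


def parse_learnset_code(code: str) -> Tuple[int, str, Optional[int]]:
    """Split a learnset code into (generation, source, level).

    Assumption: single-digit generation, then a source letter,
    then a level that is only meaningful for level-up ('L') codes.
    """
    generation = int(code[0])
    source = code[1]
    level = int(code[2:]) if source == 'L' else None
    return generation, source, level


# Hypothetical codes, for illustration only
print(parse_learnset_code('4L13'))
print(parse_learnset_code('4M'))
```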

## But what is a Nuzlocke Challenge?

TODO describe that the training regimens will be designed with the goal of completing a Nuzlocke challenge

TODO explain the following rules:
- Fainted Pokémon are considered dead and may not be used in future battles
- Only the first Pokémon encountered in each area can be caught
  - Dupes clause: if the first Pokémon encountered is of a species already in the agent's possession, the agent can try again until it encounters a new species
  - By extension, if the encountered Pokémon is of a species that has already been caught, but that Pokémon has fainted, the agent can catch it
- No legendaries
- Wipe = reset
- No items in battle except for held items
- No overleveling the highest-level Pokémon of the next boss trainer
  - Boss trainers are the trainers that must be defeated in order to progress the story
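
The catch rules above (first encounter per area, plus the dupes clause) could be tracked roughly as follows. This is a minimal sketch, not the notebook's implementation: `NuzlockeState` and its methods are hypothetical names, and it assumes every permitted catch attempt succeeds.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class NuzlockeState:
    """Hypothetical bookkeeping for the catch rules."""
    # Species the agent owns that have not fainted
    alive_species: Set[str] = field(default_factory=set)
    # Areas whose single catch has already been used
    used_areas: Set[str] = field(default_factory=set)

    def register_encounter(self, area: str, species: str) -> bool:
        """Return True if the encountered Pokémon is caught."""
        if area in self.used_areas:
            return False
        if species in self.alive_species:
            # Dupes clause: the area is not used up, so the agent may try again
            return False
        self.used_areas.add(area)
        self.alive_species.add(species)
        return True

    def register_faint(self, species: str) -> None:
        # A fainted Pokémon no longer blocks catching the same species again
        self.alive_species.discard(species)


state = NuzlockeState()
state.register_encounter("Route 201", "starly")  # caught
state.register_encounter("Route 202", "starly")  # dupe, try again
state.register_encounter("Route 202", "shinx")   # caught
```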

## Reward shaping

TODO describe the following reward shaping:
- An agent's reward for winning should be proportional to the strength of the opponent's party relative to the agent's party.
  - The idea is that the agent should be rewarded more for winning against a relatively stronger opponent.
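
One possible reading of this reward shaping is sketched below. How party strength should be measured is still open, so the sum of levels used here is only a placeholder assumption, and `party_strength` and `win_reward` are hypothetical names.

```python
from typing import Sequence


def party_strength(levels: Sequence[int]) -> float:
    """Crude strength proxy (assumption): the sum of the party's levels."""
    return float(sum(levels))


def win_reward(
    agent_levels: Sequence[int],
    opponent_levels: Sequence[int],
    base_reward: float = 1.0,
) -> float:
    """Scale the win reward by opponent strength relative to agent strength."""
    agent = party_strength(agent_levels)
    opponent = party_strength(opponent_levels)
    if agent <= 0:
        raise ValueError("The agent's party must have positive strength")
    return base_reward * opponent / agent


# Beating a party twice as strong doubles the reward
print(win_reward([5], [10]))
```

A ratio keeps the reward at `base_reward` for evenly matched parties and shrinks it when the agent is overleveled, which fits the overleveling rule of the challenge.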

## Gauntlets
TODO describe that a gauntlet is a series of battles that the agent must complete, in order, without its team ever wiping out
- The agent should be rewarded more and more as it gets further into the current gauntlet
- If at any point the agent's team wipes out, the agent will be penalized fairly heavily and the gauntlet should be reset

TODO: describe that the agent first gets to explore each enemy in the gauntlet 100 times. This prevents the agent from learning a lot from the first enemy and then learning nothing from the rest of the enemies.
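
The two ideas above, a reward that grows with gauntlet progress and an exploration phase of 100 battles per enemy, might be sketched like this. The linear bonus, the penalty value, and the function names are all illustrative assumptions rather than settled design.

```python
from typing import Iterator


def gauntlet_reward(
    stage: int,
    wiped: bool,
    step_bonus: float = 1.0,
    wipe_penalty: float = -10.0,
) -> float:
    """Reward for finishing battle number `stage` (0-based) of the gauntlet."""
    if wiped:
        # A wipe is penalized heavily; the caller should then reset the gauntlet
        return wipe_penalty
    # Later stages pay more than earlier ones
    return step_bonus * (stage + 1)


def exploration_schedule(n_enemies: int, battles_per_enemy: int = 100) -> Iterator[int]:
    """Yield the enemy index for each exploration battle, one enemy at a time."""
    for enemy_idx in range(n_enemies):
        for _ in range(battles_per_enemy):
            yield enemy_idx


# With 3 enemies the agent fights 300 exploration battles before any full run
print(sum(1 for _ in exploration_schedule(3)))
```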

## Trainers setup

TODO: describe that the trainer class will be used to house data for the agent's opponent at a given point in the gauntlet. The agent itself will also be an instance of the trainer class so that it can house its own data.

In [7]:
class Trainer():
    def __init__(
        self,
        uid: int
    ):
        self.uid = uid
        self.name: str = ""
        self.party: ConstantTeambuilder = None
        # TODO add items to bag of trainer
        self.__parse_trainer_data__()

    def yield_team(self) -> str:
        return self.party.yield_team()

    def __parse_trainer_data__(self):
        data = trn[self.uid]
        team = [
            TeambuilderPokemon(
                nickname = p['species'].title(),
                species = p['species'],
                item = p['held_item'],
                ability = p['ability'],
                nature = p['nature'],
                moves = p['moves'],
                level= p['level'],
                evs = [ 1, 0, 0, 0, 0, 0 ],
            )
            for p in data['pokemon']
        ]
        self.party = ConstantTeambuilder(
            ConstantTeambuilder.join_team(team)
        )
        self.party._mons = team
        self.name = data['trainer_name']

## Encounter setup

TODO describe that this class will be used to determine which Pokémon the agent encounters in the wild.
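
To make the role of the per-time encounter rates concrete, here is a small standalone sketch of weighted sampling with `random.choices`, the same standard-library call that `LocationNode.get_random_encounter` relies on. The table reuses the Route 201 rates from this notebook; `sample_species` is a hypothetical helper and plain dicts stand in for `Encounter` objects.

```python
import random
from typing import Dict, List

# Route 201 rates as used elsewhere in this notebook
encounter_table: List[Dict] = [
    {"species": "Starly", "morning": .5, "day": .5, "night": .4},
    {"species": "Bidoof", "morning": .4, "day": .5, "night": .5},
    {"species": "Kricketot", "morning": .1, "day": .0, "night": .1},
]


def sample_species(table: List[Dict], time: str, rng: random.Random) -> str:
    """Pick a species with probability proportional to its rate at `time`."""
    weights = [row[time] for row in table]
    return rng.choices(table, weights=weights, k=1)[0]["species"]


rng = random.Random(0)
day_draws = {sample_species(encounter_table, "day", rng) for _ in range(200)}
# Kricketot has a day rate of 0, so it can never be drawn during the day
print("Kricketot" in day_draws)
```

Note that `random.choices` raises an error when the table is empty or all weights are zero, so locations without encounters need to be handled before sampling.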

In [8]:
class Encounter():
    def __init__(
        self, 
        species: str, 
        method: str, 
        levels: Tuple[int, int], 
        morning: float, 
        day: float, 
        night: float
    ):
        """
        :param species: The name of the Pokémon species.
        :param method: The method of encountering (e.g., 'Grass', 'Fishing Old Rod').
        :param levels: A tuple representing the minimum and maximum level range.
        :param morning: Encounter rate during the morning as a percentage.
        :param day: Encounter rate during the day as a percentage.
        :param night: Encounter rate during the night as a percentage.
        """
        self.species = species
        self.method = method
        self.levels = levels
        self.morning = morning
        self.day = day
        self.night = night
        
    def __str__(self):
        return self.species
    
    #region Dict methods
    def __getitem__(self, key):
        return getattr(self, key)

    def __setitem__(self, key, value):
        setattr(self, key, value)

    def __delitem__(self, key):
        delattr(self, key)

    def __contains__(self, key):
        return hasattr(self, key)

    def keys(self):
        return self.__dict__.keys()

    def values(self):
        return self.__dict__.values()

    def items(self):
        return self.__dict__.items()
    #endregion

## Location setup

TODO describe that the location nodes will form a graph-like structure that houses the encounters and trainers the agent will face. The graph will be traversed in a predetermined order to keep things simple, since agent-driven traversal would be worthy of its own experiment.

In [9]:
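class LocationNode:
    def __init__(
        self,
        name: str,
        trainers: Optional[List[int]] = None,
        encounters: Optional[List[Encounter]] = None,
    ):
        self.name = name
        # Default to None rather than [] so instances never share a mutable default list
        self.trainers = [
            Trainer(trainer_id)
            for trainer_id in (trainers or [])
        ]
        self.encounters = encounters if encounters is not None else []
        self.leads_to: List[LocationNode] = []
        self.visited: int = 0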

    def add_connection(self, other_location: 'LocationNode'):
        self.leads_to.append(other_location)
        other_location.leads_to.append(self)

    def get_random_encounter(self, time: Literal['morning', 'day', 'night']) -> TeambuilderPokemon:
        weights = [ e[time] for e in self.encounters ]
        encounter = random.choices(self.encounters, weights)[0]
        dex_entry = get_dex_entry(encounter.species)
        level = random.randint(*encounter.levels)
        
        return TeambuilderPokemon(
            nickname=encounter.species,
            species=encounter.species,
            level=level,
            ability=random.choice(dex_entry['abilities']),
            ivs=get_random_ivs(),
            nature=get_random_nature().title(),
            moves=get_moves_from_dex_entry(dex_entry, 'L', level),
            gender=get_random_gender(dex_entry),
            happiness=0,
        )

In [10]:
twinleaf_town = LocationNode(
    name="Twinleaf Town",
    encounters=[
        # TODO implement checking of availability of items such as Old Rod or moves such as surf
        # Encounter("Psyduck", "surfing", (20, 30), .9, .9, .9),
        # Encounter("Golduck", "surfing", (20, 40), .1, .1, .1),
        # Encounter("Magikarp", "oldrod", (3, 15), 1., 1., 1.),
    ],
)

route_201 = LocationNode(
    name="Route 201",
    trainers=[852],
    encounters=[
        Encounter("Starly", "grass", (2, 3), .5, .5, .4),
        Encounter("Bidoof", "grass", (2, 4), .4, .5, .5),
        Encounter("Kricketot", "grass", (3, 3), .1, .0, .1),
    ],
)
route_201.add_connection(twinleaf_town)

lake_verity = LocationNode(
    name="Lake Verity",
    encounters=[
        Encounter("Starly", "grass", (2, 3), .5, .5, .4),
        Encounter("Bidoof", "grass", (2, 4), .5, .5, .6),
        # Encounter("Psyduck", "surfing", (20, 30), .9, .9, .9),
        # Encounter("Golduck", "surfing", (20, 40), .1, .1, .1),
        # Encounter("Magikarp", "oldrod", (3, 15), .1, .1, .1),
        # Encounter("Goldeen", "goodrod", (15, 20), .4, .4, .4),
        # Encounter("Seaking", "goodrod", (25, 30), .6, .6, .6),
    ],
)
lake_verity.add_connection(route_201)

sandgem_town = LocationNode(
    name="Sandgem Town",
)
sandgem_town.add_connection(route_201)

route_202 = LocationNode(
    name="Route 202",
    trainers=[1, 2, 3],
    encounters=[
        Encounter("Starly", "grass", (2, 4), .2, .2, .1),
        Encounter("Bidoof", "grass", (2, 4), .4, .5, .5),
        Encounter("Kricketot", "grass", (3, 3), .1, .0, .0),
        Encounter("Kricketot", "grass", (4, 4), .0, .0, .1),
        Encounter("Shinx", "grass", (3, 4), .3, .3, .3),
    ],
)
route_202.add_connection(sandgem_town)

jubilife_city = LocationNode(
    name="Jubilife City",
)
jubilife_city.add_connection(route_202)

route_204 = LocationNode(
    name="Route 204",
    trainers=[10, 11, 12],
    encounters=[
        Encounter("Zubat", "grass", (3, 3), .0, .0, .1),
        Encounter("Wurmple", "grass", (4, 4), .1, .1, .0),
        Encounter("Starly", "grass", (4, 6), .25, .25, .25),
        Encounter("Bidoof", "grass", (4, 6), .25, .25, .25),
        Encounter("Kricketot", "grass", (3, 3), .1, .0, .0),
        Encounter("Kricketot", "grass", (4, 4), .0, .0, .1),
        Encounter("Shinx", "grass", (4, 5), .15, .15, .15),
        Encounter("Budew", "grass", (3, 5), .0, .25, .0),
        Encounter("Budew", "grass", (4, 5), .15, .0, .15),
    ],
)
route_204.add_connection(jubilife_city)

route_203 = LocationNode(
    name="Route 203",
    trainers=[249, 4, 322, 323, 355, 356],
    encounters=[
        Encounter("Zubat", "grass", (4, 4), .0, .0, .1),
        Encounter("Abra", "grass", (4, 5), .15, .15, .15),
        Encounter("Starly", "grass", (6, 7), .25, .25, .25),
        Encounter("Starly", "grass", (4, 4), .25, .25, .25),
        Encounter("Starly", "grass", (5, 5), .1, .1, .0),
        Encounter("Bidoof", "grass", (5, 7), .15, .15, .15),
        Encounter("Bidoof", "grass", (4, 4), .0, .1, .0),
        Encounter("Kricketot", "grass", (4, 4), .1, .0, .0),
        Encounter("Kricketot", "grass", (5, 5), .0, .0, .1),
        Encounter("Shinx", "grass", (4, 5), .25, .25, .25),
    ],
)
route_203.add_connection(jubilife_city)

oreburgh_gate = LocationNode(
    name="Oreburgh Gate",
    trainers=[265, 329],
    encounters=[
        Encounter("Zubat", "cave", (5, 8), .5, .5, .5),
        Encounter("Psyduck", "surfing", (5, 7), .35, .35, .35),
        Encounter("Geodude", "cave", (5, 7), .15, .15, .15),
    ],
)
oreburgh_gate.add_connection(route_203)

oreburgh_city = LocationNode(
    name="Oreburgh City",
    encounters=[
        # TODO add the trade encounter (abra for machop)
    ]
)
oreburgh_city.add_connection(oreburgh_gate)

oreburgh_mine = LocationNode(
    name="Oreburgh Mine",
    encounters=[
        Encounter("Zubat", "cave", (5, 7), .25, .25, .25),
        Encounter("Geodude", "cave", (4, 8), .65, .65, .65),
        Encounter("Onix", "cave", (7, 9), .1, .1, .1),
    ],
)
oreburgh_mine.add_connection(oreburgh_city)

route_207 = LocationNode(
    name="Route 207",
    encounters=[
        Encounter("Zubat", "grass", (5, 5), .0, .0, .1),
        Encounter("Machop", "grass", (5, 8), .0, .45, .0),
        Encounter("Machop", "grass", (6, 8), .35, .0, .35),
        Encounter("Geodude", "grass", (5, 7), .3, .3, .3),
        Encounter("Ponyta", "grass", (5, 7), .25, .25, .15),
        Encounter("Kricketot", "grass", (5, 5), .1, .0, .0),
        Encounter("Kricketot", "grass", (6, 6), .0, .0, .1),
    ],
)
route_207.add_connection(oreburgh_city)

oreburgh_gym = LocationNode(
    name="Oreburgh Gym",
    trainers=[244, 245, 246],
)
oreburgh_gym.add_connection(oreburgh_city)

In [11]:
all_locations = {
    "Twinleaf Town": twinleaf_town,
    "Route 201": route_201,
    "Lake Verity": lake_verity,
    "Sandgem Town": sandgem_town,
    "Route 202": route_202,
    "Jubilife City": jubilife_city,
    "Route 204": route_204,
    "Route 203": route_203,
    "Oreburgh Gate": oreburgh_gate,
    "Oreburgh City": oreburgh_city,
    "Oreburgh Mine": oreburgh_mine,
    "Route 207": route_207,
    "Oreburgh Gym": oreburgh_gym,
}

In [12]:
def get_location_by_name(name: str) -> LocationNode:
    # Reading a module-level name does not require a `global` declaration
    return all_locations[name]

## Gauntlet setup

In [13]:
class Gauntlet():
    def __init__(self, locations: List[str]):
        self.locations = {
            i: get_location_by_name(name)
            for i, name in enumerate(locations)
        }
        self._location_idx = 0
        self._trainer_idx = 0

    def get_encounter_at(
        self, 
        time: Literal['morning', 'day', 'night'] = 'day', 
        location: int | None = None
    ) -> TeambuilderPokemon | None:
        if location is None:
            location = self._location_idx
        
        if location >= len(self.locations):
            return None

        return self.locations[location].get_random_encounter(time)
    
    def get_trainer_at(
        self, 
        location: int | None = None,
        trainer: int | None = None
    ) -> Trainer | None:
        if location is None:
            location = self._location_idx
        if trainer is None:
            trainer = self._trainer_idx

        if location >= len(self.locations):
            return None
        if trainer >= len(self.locations[location].trainers):
            return None

        return self.locations[location].trainers[trainer]

    def move_to_next_location(self) -> LocationNode | None:
        self._location_idx += 1
        self._trainer_idx = 0

        if self._location_idx >= len(self.locations):
            self._location_idx = len(self.locations) - 1
            return None
        
        return self.locations[self._location_idx]

## Box setup

In [14]:
class BoxEntry():
    def __init__(
        self, 
        pokemon: TeambuilderPokemon, 
        has_fainted: bool = False
    ):
        self.pokemon = pokemon
        self.has_fainted = has_fainted

In [15]:
class PokemonBox():
    def __init__(self):
        self.pokemons: List[BoxEntry] = []

    def add_to_box(self, entry: BoxEntry):
        self.pokemons.append(entry)

    def exists_in_box(self, species: str) -> bool:
        # TODO implement checking of evolution line
        return any([
            p.pokemon.species == species
            for p in self.pokemons
        ])

## Adversary setup

TODO find a way for the agent's opponent to behave in a similar fashion to the in-game AI.

## Environment Setup

Please see the [defining a proper battle observation space](./defining_a_proper_battle_observation_space.ipynb) notebook for a detailed explanation about the observation space.

In [16]:
import numpy as np
from gymnasium.spaces import Space, Box

from poke_env.data import GenData
from poke_env.player import Gen4EnvSinglePlayer
from poke_env.environment.battle import AbstractBattle
from poke_env.environment.move import Move, EmptyMove
from poke_env.environment.pokemon_type import PokemonType
from poke_env.environment.move_category import MoveCategory
from poke_env.ps_client.account_configuration import AccountConfiguration
import uuid

```py
    # ...
    def __init__(
        self,
        opponent: Optional[Union[Player, str]],
        account_configuration: Optional[AccountConfiguration] = None,
        *,
        avatar: Optional[int] = None,
        battle_format: Optional[str] = None,
        log_level: Optional[int] = None,
        save_replays: Union[bool, str] = False,
        server_configuration: Optional[ServerConfiguration] = None,
        start_listening: bool = True,
        accept_open_team_sheet: Optional[bool] = False,
        start_timer_on_battle_start: bool = False,
        ping_interval: Optional[float] = 20.0,
        ping_timeout: Optional[float] = 20.0,
        team: Optional[Union[str, Teambuilder]] = None,
        start_challenging: bool = True,
    ):
```

In [17]:
def get_unique_account():
    username = str(uuid.uuid4()).replace('-', '')[0:18]
    print(f'Generated username: {username} ({len(username)})')

    return AccountConfiguration(
        username=username,
        password='some-very-secure-password'
    )

In [18]:
# Accounts are created once and reused for all battles
# agent_account = get_unique_account()
# openent_account = get_unique_account()

agent_account = AccountConfiguration(
    username='9d0496ec535d481e9d',
    password='some-very-secure-password'
)
openent_account = AccountConfiguration(
    username='52d55c0841574db9bf',
    password='some-very-secure-password'
)

In [19]:
from poke_env.ps_client.server_configuration import ServerConfiguration

server_config = ServerConfiguration(
    "ws://localhost:8000/showdown/websocket",
    "https://play.pokemonshowdown.com/action.php?",
)

In [20]:
class Gen4GauntletPlayer(Gen4EnvSinglePlayer):
    def __init__(
        self,
        locations: List[str],
        starter_dist: Tuple[float, float, float] = (.0, 1., .0)
    ):
        self._locations = locations
        self._starter_dist = starter_dist
        self._starter_species = ['turtwig', 'chimchar', 'piplup']

        # First set up all the custom attributes
        self.box = PokemonBox()
        self.gauntlet = Gauntlet(self._locations)
        self.agent_trainer = self._get_fresh_agent_trainer()

        # From the custom attributes, we can set up the opponent
        self._format = 'gen4anythinggoes'
        opponent = RandomPlayer(
            battle_format=self._format,
            account_configuration=openent_account,
            team=self.gauntlet.get_trainer_at().party
        )

        # Only now we can pass the data to the parent class
        super().__init__(
            battle_format=self._format,
            account_configuration=agent_account,
            team=self.agent_trainer.party,
            opponent=opponent,
            start_challenging=True,
            server_configuration=server_config
        )
        

    #region Helper methods for updating the gauntlet
    def _next_encounter_to_box(self):
        encounter = self.gauntlet.get_encounter_at()
        if encounter is not None:
            self.box.add_to_box(BoxEntry(encounter))

    def _next_gauntlet_challange(self):
        self.gauntlet.move_to_next_location()
        opponent = self.get_opponent()
        opponent.update_team(self.gauntlet.get_trainer_at().party)
        self.set_opponent(opponent)
    #endregion


    #region Helper methods for resetting the environment
    def _get_random_starter(self) -> str:
        return random.choices(self._starter_species, self._starter_dist)[0]

    def _get_fresh_agent_trainer(self) -> Trainer:
        starter_species = self._get_random_starter()
        starter_dex_entry = get_dex_entry(starter_species)
        starter = TeambuilderPokemon(
            nickname=starter_species.title(),
            species=starter_species,
            level=5,
            gender=get_random_gender(starter_dex_entry),
            ability=starter_dex_entry['abilities'][0],
            moves=[ 'scratch', 'leer' ],
            ivs=get_random_ivs(),
            evs=[1, 0, 0, 0, 0, 0],
            nature=get_random_nature().title(),
            happiness=0,
        )
        starter_team = ConstantTeambuilder(str(starter))
        starter_team._mons = [starter]

        if not self.box.exists_in_box(starter_species):
            self.box.add_to_box(BoxEntry(starter))

        agent_trainer = Trainer(0)
        agent_trainer.name = 'Lucas'
        agent_trainer.party = starter_team
        return agent_trainer
    #endregion


    def choose_move(self, battle: AbstractBattle):
        return 0

    def calc_reward(self, last_battle: AbstractBattle, current_battle: AbstractBattle) -> float:
        return 0
    
    def embed_battle(self, battle: AbstractBattle):
        # Party observations
        observations = []

        t1_pokemons = list(battle.team.values())
        t2_pokemons = list(battle.opponent_team.values())
        
        observations.append(t1_pokemons[0].current_hp_fraction)
        observations.append(t2_pokemons[0].current_hp_fraction)

        return np.float32(observations)

    def describe_embedding(self) -> Space:
        # Both observations are HP fractions, so they lie in [0, 1]
        low = np.array([0, 0], dtype=np.float32)
        high = np.array([1, 1], dtype=np.float32)
        return Box(low=low, high=high, dtype=np.float32)

In [21]:
gauntlet = [
    # "Twinleaf Town",
    "Route 201",
    # "Sandgem Town",
    "Lake Verity",
    "Route 202",
    # "Jubilife City",
    "Route 204",
    "Route 203",
    "Oreburgh Gate",
    # "Oreburgh City",
    "Oreburgh Mine",
    "Route 207",
    "Oreburgh Gym",
]

In [22]:
env = Gen4GauntletPlayer(
    locations=gauntlet,
)

In [23]:
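# Reset the environment to start a fresh battle.
# Note: reset() returns an (observation, info) tuple, so the first entry of `states` is a tuple.
state = env.reset()

# Run a single battle loop
terminated = False
truncated = False
rewards = []
states = [state]
# Stop on either termination or truncation so the loop cannot run past the end of an episode
while not (terminated or truncated):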
    # Get a random action from the action space
    action = env.action_space.sample()

    # Step through the environment using the chosen action
    next_state, reward, terminated, truncated, info = env.step(action)
    
    rewards.append(reward)
    states.append(next_state)

print(f"Rewards: {rewards}")
print(f"States: {states}")

Rewards: [0, 0, 0, 0, 0, 0]
States: [(array([1., 1.], dtype=float32), {}), array([0.8, 1. ], dtype=float32), array([0.6, 1. ], dtype=float32), array([0.4 , 0.67], dtype=float32), array([0.4 , 0.67], dtype=float32), array([0.2 , 0.43], dtype=float32), array([0.  , 0.43], dtype=float32)]


## Model setup

In [24]:
tensorboard_dir = os.path.abspath('./pokemon_showdown_agent')
# exist_ok avoids the separate existence check
os.makedirs(tensorboard_dir, exist_ok=True)

In [25]:
from stable_baselines3 import DQN

In [26]:
# model = DQN(
#     'MlpPolicy',
#     env,
#     verbose=1,
#     tensorboard_log=tensorboard_dir,
#     exploration_fraction=0.5
# )

In [27]:
# model.learn(
#     total_timesteps=10000,
#     tb_log_name='dqn_starter_battle',
# )

In [36]:
pokemon.describe()