# Modeling Gameplay Decisions in Pokémon Platinum using Markov Decision Processes

TODO: add title page elements

## Abstract

TODO: write an abstract

## Inspiration for this Study  

This study was undertaken as part of my decision theory course. Upon learning that the course would require a combination of applied and theoretical research within the context of decision theory, I was immediately drawn to the idea of creating a Markov decision process capable of playing *Pokémon Platinum*.  

The inspiration for this idea came from a video by Keeyan Ghoreshi ([link](https://www.youtube.com/watch?v=jNMWkD5VsZ8)), in which the creator analyzes how the game generates random numbers to predict every possible outcome among the 4,294,967,296 potential ways the game could unfold. By doing so, Ghoreshi identifies a deterministic sequence of inputs that will always beat the game. This approach, however, relies heavily on an in-depth understanding of the game's internal mechanics.  

In contrast, my approach will differ significantly by focusing on building a Markov decision process that does not require extensive reverse engineering of the game’s random number generation. Instead, it will aim to make decisions based on probabilistic modeling of state transitions and outcomes.

## A More Probabilistic Approach  

For each segment of the game, a probabilistic method will be employed to attempt to progress through it successfully. In the inspiration video, a combination of statistics and basic probability is used to identify outliers, such as the longest possible duration of a battle or the maximum number of encounters in a specific grass patch.  

My approach will lean more toward reinforcement learning principles, primarily utilizing Markov decision processes (MDPs). I find MDPs particularly compelling and plan to use them extensively to model and solve decision-making problems within the game. However, my methodology will remain flexible: if I encounter promising alternative approaches during my research, I will consider them and potentially incorporate them into my analysis.

### On Model Generalization  

Each model designed throughout this study will be made as general-purpose as possible. For instance, a single model will handle overworld exploration, rather than separate models being created for specific areas or tasks. This approach is intended to maximize efficiency and general applicability within the game's framework.

However, if creating a single general model proves impractical within the given timeframe, I will adopt a "divide and conquer" strategy. In this case, larger problems will be divided into smaller, more manageable subproblems, each addressed by its own specialized model. This fallback approach ensures progress remains achievable while maintaining focus on the overall objective.

### Experimental approach

TODO: describe the use of experiments, of which a summary of what was done and their results will be added to this document. Experiments will be left intact for people to read if desired.

## The Goal of This Study  

The primary goal of this study is to **create a probabilistic model that learns to play Pokémon Platinum**. This involves leveraging Markov decision processes or similar probabilistic frameworks to model and solve various challenges encountered in the game.  

### Minimum Viable Product (MVP)
The minimum objective is to develop a model capable of learning how to engage in battles and achieve victory. The precise definition of "winning" in this context will be refined as the study progresses, incorporating factors such as efficiency, success rate, and adaptability.  

### Extended Goals  
If time permits, the study will also explore the development of an exploratory model designed to navigate the overworld. This model would aim to traverse the game world efficiently and interact with various game elements, providing a more comprehensive framework for automated gameplay.  

By balancing these objectives, the study seeks to demonstrate the feasibility and effectiveness of probabilistic models in addressing the complex decision-making tasks inherent to Pokémon Platinum.

### Gradually Increasing the Model's Goal Difficulty  

The difficulty of the model's objectives will be increased incrementally as the agent demonstrates successful training and proficiency at simpler tasks. This step-by-step approach ensures that the model builds a solid foundation of skills before tackling more complex challenges.  

For example, initial tasks may focus on basic decision-making, such as selecting effective moves during battles or navigating straightforward overworld paths. Once the model achieves consistent success in these areas, the complexity of the tasks will gradually increase to include:  
- Optimizing strategies for tougher battles, such as those involving type advantages or multi-stage opponents.  
- Navigating more intricate overworld segments with branching paths or hazards.  
- Managing resources like healing items or determining when to retreat to a Pokémon Center.  

This progressive approach allows the model to adapt and improve over time, ensuring it is equipped to handle increasingly difficult scenarios. Additionally, this structure provides clear milestones for measuring the model’s success throughout its development.  

## Research Questions  

To complement the goal of creating a probabilistic model that plays Pokémon Platinum, I devised the following research questions to guide me toward a viable solution:  

1. **What are popular (both old and new) reinforcement learning approaches?**  
   - How do these approaches compare in terms of their strengths and weaknesses?  
   - Are there supporting methods, such as statistical techniques or others, that could enhance the effectiveness of reinforcement learning?  

2. **How can Markov decision processes be implemented in Python?**  
   - What tools, libraries, or frameworks are available to assist in programming Markov decision processes?  

3. **How can we efficiently simulate Pokémon Platinum?**  
   - What does "efficiency" entail in this context, particularly regarding time and computational complexity (and potentially space complexity)?  
   - What existing software products or tools are available for this purpose, and how can they be utilized or adapted?  

These questions aim to provide a structured framework for addressing the theoretical and practical challenges involved in creating a probabilistic model for gameplay. They span foundational research, technical implementation, and performance optimization, ensuring a comprehensive approach to the study.

## On source collection

TODO: Describe that this study relies heavily on community-driven documentation when it comes to research about the Pokémon game. These sources will be cross-referenced to verify their accuracy.

FEEDBACK: here it is mainly important that you use academic sources for the DETH mathematics; I will accept the Pokémon sources as they are.

## Domain understanding

TODO: write a short paragraph explaining what Pokémon is by giving a brief summary of its history and then explaining where this research links to Pokémon.

### Manga, Anime and (card) Game(s)

TODO: write how pokémon developed over time from a manga to whatever

### About the games

TODO: write how the traditional games developed, not going into detail about spin off games, starting at pokémon red and green up to black and white 2 (it continues but I have no experience with these games, and going into their details is not of added value to this domain understanding)

### A quick intro to the Gen. 4 games

TODO: write that from here on the domain understanding is tailored specifically towards generation 4, as that is what this study focuses on.

TODO: write a short intro explaining that, on starting the game, you go through some dialogue and choose a Pokémon, after which you have your first battle. Also describe that details about battling will be covered in the definitions chapter.

### A perfect information game

TODO: write that everything about the game is extensively documented online. Paired with that fact, as long as a person has read access to the game's state (stored in memory), they can retrieve information that is not displayed on screen (like hidden Pokémon stats), making Pokémon a perfect information game. This knowledge will at first not be utilized during training, as we want the agents to behave preferably like a normal human, who would not have this perfect information beforehand. But we will not exclude this knowledge entirely, as it might prove useful.

### What is a Pokémon?

When hearing the word Pokémon, people often have an intuitive definition of what it means. Usually they will think of their favorite Pokémon or of the most commonly known one, which is often regarded as being Pikachu.

TODO: insert image of Pikachu

*Figure 1: an image of Pikachu, often regarded as the most commonly known Pokémon as it is the mascot of the company*

A more formal, mathematical definition of what a Pokémon is will be required to perform studies on them. This definition will be provided later in this study.

## Some definitions before continuing

Throughout this research, I will refer to specific terms whose definitions can be found in this chapter.

### Markov Decision Processes

> A **Markov decision process** is a 4-tuple $(S, A, P_a, R_a)$, where:
> 
> - $S$ is a set of states called the **state space**. The state space may be discrete or continuous, like the set of real numbers.
> - $A$ is a set of actions called the **action space** (alternatively, $A_s$ is the set of actions available from state $s$). As with the state space, this set may be discrete or continuous.
> - $P_a(s, s')$ is, on an intuitive level, the **probability** that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t+1$.
> - $R_a(s, s')$ is the **immediate reward** (or expected immediate reward) received after transitioning from state $s$ to state $s'$, due to action $a$.
>
> A **policy function** $\pi$ is a (potentially probabilistic) mapping from the state space $S$ to the action space $A$.

<!-- In general, this probability transition is defined to satisfy:
> 
>   $$
>   \Pr(s_{t+1} \in S' \mid s_t = s, a_t = a) = \int_{S'} P_a(s, s') \, ds',
>   $$
> 
>   for every $S' \subseteq S$ measurable. In case the state space is discrete, the integral is intended with respect to the counting measure, simplifying as
> 
>   $$
>   P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a).
>   $$
> 
>   If $S \subseteq \mathbb{R}^d$, the integral is usually taken with respect to the Lebesgue measure.
> -->

Reference: [Wikipedia's definition of a Markov Decision Process](https://web.archive.org/web/20241114035948/https://en.wikipedia.org/wiki/Markov_decision_process). Definition matches AP-DETH course slides.
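
To make the definition more concrete, the following is a minimal sketch of a finite MDP in plain Python, solved with value iteration. The two states, the two actions, the probabilities, the reward and the discount factor `gamma` are all invented for illustration; they are not taken from the game or from the course material.

```python
from typing import Dict, List, Tuple

State = str
Action = str

# Hypothetical toy problem: a battle that is either ongoing or won.
STATES: List[State] = ["fighting", "won"]  # "won" is terminal
ACTIONS: List[Action] = ["attack", "wait"]

# P[(s, a)][s2] plays the role of P_a(s, s2); each inner dict sums to 1.
P: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("fighting", "attack"): {"won": 0.7, "fighting": 0.3},
    ("fighting", "wait"): {"fighting": 1.0},
}

# R[(s, a, s2)] plays the role of R_a(s, s2); missing entries mean reward 0.
R: Dict[Tuple[State, Action, State], float] = {
    ("fighting", "attack", "won"): 1.0,
}


def q_value(V: Dict[State, float], s: State, a: Action, gamma: float) -> float:
    """Expected return of taking action a in state s, then following V."""
    return sum(
        prob * (R.get((s, a, s2), 0.0) + gamma * V[s2])
        for s2, prob in P[(s, a)].items()
    )


def value_iteration(gamma: float = 0.9, sweeps: int = 100) -> Dict[State, float]:
    """Repeatedly apply the Bellman optimality update to every state."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            available = [a for a in ACTIONS if (s, a) in P]
            if available:  # terminal states keep value 0
                V[s] = max(q_value(V, s, a, gamma) for a in available)
    return V


def greedy_policy(V: Dict[State, float], gamma: float = 0.9) -> Dict[State, Action]:
    """Policy pi: for each non-terminal state, pick the best-scoring action."""
    return {
        s: max((a for a in ACTIONS if (s, a) in P), key=lambda a: q_value(V, s, a, gamma))
        for s in STATES
        if any((s, a) in P for a in ACTIONS)
    }


V = value_iteration()
policy = greedy_policy(V)
```

Storing $P_a$ and $R_a$ as dictionaries keeps the sketch close to the tuple notation above. A larger state space would probably call for arrays or a dedicated library instead.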

FEEDBACK: why don't you use the book?

### Strings

> A string is a sequence of characters.
> - TODO: find a formal mathematical definition of a string (list comprehension?)
>   - TODO: ensure the definition includes logical equivalence ($\equiv$) for strings
> - $labelencode$ is a function on any set of strings $s$, such that:
>   - $labelencode(s) = \{ x_1, x_2, ..., x_n \}$
>   - $n = |s|$
>   - $x_i \in \mathbb{N}$ for every $i \in \{1, ..., n\}$
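
As an illustration of $labelencode$, the sketch below assigns one natural number to each distinct string. Numbering the strings in sorted order and starting at 0 are assumptions made here for reproducibility; the definition above only requires $n$ natural numbers.

```python
from typing import Dict, Iterable


def labelencode(strings: Iterable[str]) -> Dict[str, int]:
    """Map every distinct string to a natural number (assumed: sorted order, from 0)."""
    return {s: i for i, s in enumerate(sorted(set(strings)))}


codes = labelencode({"fire", "water", "grass"})
print(codes)  # {'fire': 0, 'grass': 1, 'water': 2}
```

Returning a dictionary rather than a bare set of numbers keeps the link between each string and its code, which is likely to be needed when decoding model output.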

### Function on tuple with string elements to ordered set

> There exists a function $f(x)$, where $x$ is a tuple, which results in an ordered set of strings $y$ where:
> - $|y| = |\{ i: x_i \equiv string \}|$
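
One possible reading of this function in Python is shown below. The name `string_elements` is hypothetical, and a list stands in for the ordered set so that the length condition also holds when the tuple contains the same string twice.

```python
from typing import Any, List, Tuple


def string_elements(x: Tuple[Any, ...]) -> List[str]:
    """Return the string elements of x, in their original order."""
    return [element for element in x if isinstance(element, str)]


y = string_elements(("Ember", 40, "fire", 100))
print(y)  # ['Ember', 'fire']
```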

### Pokémon game definitions

The game itself, paired with online community-driven sources, provides all the necessary information to define Pokémon in a mathematical fashion.

#### Types

<!-- The reason for choosing a tuple and not a set is that, later on, being able to reference types by number becomes important during programming. Referencing a type by number means that order is important, hence the choice of a tuple over a set. -->
> **Types** $T$ is a constant 18-tuple containing the following strings
> - $T_{0}$  = none
> - $T_{1}$  = normal
> - $T_{2}$  = fire
> - $T_{3}$  = water
> - $T_{4}$  = electric
> - $T_{5}$  = grass
> - $T_{6}$  = ice
> - $T_{7}$  = fighting
> - $T_{8}$  = poison
> - $T_{9}$  = ground
> - $T_{10}$ = flying
> - $T_{11}$ = psychic
> - $T_{12}$ = bug
> - $T_{13}$ = rock
> - $T_{14}$ = ghost
> - $T_{15}$ = dragon
> - $T_{16}$ = dark
> - $T_{17}$ = steel
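
In Python, the tuple $T$ could be written down directly, as sketched below. The names `TYPES` and `TYPE_INDEX` are illustrative choices, not part of the study's code.

```python
# The type tuple T; the position in the tuple is the type's index.
TYPES = (
    "none", "normal", "fire", "water", "electric", "grass",
    "ice", "fighting", "poison", "ground", "flying", "psychic",
    "bug", "rock", "ghost", "dragon", "dark", "steel",
)

# Reverse lookup from a type name to its index in T.
TYPE_INDEX = {name: index for index, name in enumerate(TYPES)}

print(TYPES[2])             # fire
print(TYPE_INDEX["steel"])  # 17
```

Because a tuple is immutable, the indices cannot change by accident once other code relies on them.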

#### Moves

> A singular **move** $m$ is a 7-tuple $(name, type, category, power, accuracy, pp, effect)$, where:
> - $name$ is the name of the move as a string.
> - $type \in T$ is the type of the move.
> - $category \in \{ physical, special, status \}$ is the category of the move.
> - $power \in \mathbb{N}$ is the power of the move.
> - $accuracy \in \mathbb{N}$ is the accuracy of the move.
> - $pp \in \mathbb{N}$ is the power points of the move.
> - $effect$ is the effect of the move as a string.
>
> The set of moves $\mathbb{M}$ in Pokémon Platinum is constant and known beforehand. A list of all moves can be found at [Psypokes](http://www.psypokes.com/dpphgss/attacks.php).
>
> TODO: write that there is an empty move in $\mathbb{M}$
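
A move could be represented as a named tuple with the same seven fields, as in the sketch below. The class name `Move`, the set `CATEGORIES` and the wording of the effect string are assumptions; the example values for Ember should be cross-checked against the move list linked above.

```python
from typing import NamedTuple


class Move(NamedTuple):
    """The 7-tuple (name, type, category, power, accuracy, pp, effect)."""
    name: str
    type: str
    category: str
    power: int
    accuracy: int
    pp: int
    effect: str


CATEGORIES = {"physical", "special", "status"}

# Example values; verify against the linked move list before relying on them.
ember = Move("Ember", "fire", "special", 40, 100, 25, "may burn the target")

print(ember.type, ember.power)  # fire 40
```

A named tuple is still an ordinary tuple, so it matches the definition while allowing fields to be accessed by name.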

#### Abilities

> **Abilities** $\mathbb{A}$ is a set of all abilities. A list of all abilities can be found at [Psypokes](http://www.psypokes.com/dpphgss/abilities.php).
>
> TODO: research if abilities are constant or if they can change during the game  
> TODO: research different types of abilities and define functions that can be used to determine if an ability is of a certain type

#### Items

> **Items** $\mathbb{I}$ is a set of all items available in Pokémon Platinum. A list of all items can be found at [Serebii.net](https://www.serebii.net/platinum/items.shtml).
> 
> - item $x \in \mathbb{I}$ is a 3-tuple $(name, function, type)$, where:
>   - $name$ is the name of the item as a string.
>   - $function$ is the function $f$ of item $x$ such that, when used, $f$ either does or does not result in a state change.
>       - The item $type$ dictates the resulting change in state.
>   - $type$ is the type of the item.
>     - $type \in \{ healing \}$ dictates the change in state caused by item $x$
>     - 

FEEDBACK:
function(name) -> function that changes state <- a map in Python; this is enough, you do not need the type
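
The mapping suggested in the feedback could look like the sketch below: a dictionary from item name to a function that takes a state and returns a new state. The state layout (a dictionary with `hp` and `max_hp`) and the helper `heal` are hypothetical, and the healing amounts should be verified against the item list linked above.

```python
from typing import Callable, Dict

State = Dict[str, int]  # assumed minimal state: current and maximum hit points


def heal(amount: int) -> Callable[[State], State]:
    """Build a state-changing function that restores up to `amount` hit points."""
    def apply(state: State) -> State:
        new_state = dict(state)  # leave the input state untouched
        new_state["hp"] = min(state["max_hp"], state["hp"] + amount)
        return new_state
    return apply


# Item name -> function that changes the state; no separate item type needed.
ITEMS: Dict[str, Callable[[State], State]] = {
    "Potion": heal(20),
    "Super Potion": heal(50),
}

before = {"hp": 30, "max_hp": 45}
after = ITEMS["Potion"](before)
print(after)  # {'hp': 45, 'max_hp': 45}
```

Returning a new state instead of mutating the old one makes it easier to compare states before and after an action during training.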

#### Status Effects

> Status effects $\mathbb{E}$ is a 5-tuple $( poisoned, burned, paralyzed, asleep, frozen )$.
> 

#### Pokémon

> A **Pokémon** $p$ is a 14-tuple $(hp, atk, def, spatk, spdef, spd, ability, t_1, t_2, m_1, m_2, m_3, m_4, status)$, where:
> 
> - $hp, atk, def, spatk, spdef, spd \in \mathbb{N}$ are the Pokémon's stats, those being: hit points, physical attack power, physical defensive power, special attack power, special defensive power and speed respectively.
> - ability $x \in \mathbb{A}$
> - $t_1, t_2 \in T$ are the Pokémon's types. 
>   - $monotype$ is a function on $p$ that results in a boolean value.
>   - A Pokémon $p$ is considered to be $monotype(p) \iff p_{t_2} = T_0$ and dual type otherwise.
>   - $partialtype$ is a function on $p$ that results in a boolean value.
>   - A Pokémon $p$ is considered to be $partialtype(p, T_i) \iff p_{t_1} = T_i \lor p_{t_2} = T_i$.
> - $m_1, m_2, m_3, m_4 \in \mathbb{M}$ are the pokémon's moves.
> - $status \in \mathbb{E}$
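
The definition above, including $monotype$ and $partialtype$, can be sketched as a small data class. Several choices here are assumptions made for brevity: the four moves are grouped into one field, moves and abilities are referred to by name only, `"none"` stands in for an empty move slot and for the absence of a status effect, and the stat values are illustrative rather than taken from the game.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pokemon:
    hp: int
    atk: int
    def_: int  # trailing underscore because "def" is a Python keyword
    spatk: int
    spdef: int
    spd: int
    ability: str
    t1: str
    t2: str  # "none" (T_0) for a monotype Pokémon
    moves: Tuple[str, str, str, str]  # m_1..m_4, grouped for brevity
    status: str  # assumed placeholder "none" when no status effect applies


def monotype(p: Pokemon) -> bool:
    """True if and only if the second type is the placeholder type T_0."""
    return p.t2 == "none"


def partialtype(p: Pokemon, t: str) -> bool:
    """True if and only if either of the Pokémon's types equals t."""
    return p.t1 == t or p.t2 == t


# Illustrative values only.
example = Pokemon(20, 11, 10, 11, 10, 12, "Blaze", "fire", "none",
                  ("Scratch", "Leer", "none", "none"), "none")

print(monotype(example), partialtype(example, "fire"))  # True True
```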

#### Effectiveness
> Effectiveness of a move $m$ against a Pokémon $p$ is a function $E(m, p) \in \{0, 0.25, 0.5, 1, 2, 4\}$, where:
> - The value is dictated by the type chart $C$, which is an 18x18 matrix where $C_{i, j}$ is the effectiveness of type $T_i$ against type $T_j$.
>   - Here, $i$ is used for row indexing and $j$ for column indexing.
> - The chart is looked up for each of $p$'s two types, and the two values are multiplied together. For a monotype Pokémon this requires $C_{i, 0} = 1$ for every $i$, so that $T_0$ leaves the product unchanged.
> - For example:
>   - If $p_{t_1} = grass$, $p_{t_2} = ice$ and $m_{type} = fire$, then $E(m, p) = C_{m_{type}, p_{t_1}} \cdot C_{m_{type}, p_{t_2}} = C_{2, 5} \cdot C_{2, 6} = 2 \cdot 2 = 4$
>
> ![generation 4 type chart](https://img.pokemondb.net/images/typechart-gen2345.png)
>
> *Figure N: a visual representation of the generation 4 type chart as provided by [pokemondb.net](https://pokemondb.net/type/old)*
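
The lookup can be sketched in a few lines of Python. The chart below is deliberately partial: it holds only a handful of entries and treats every missing pair as neutral (1), which is an assumption of this sketch and not a property of the real chart. The sketch combines the two per-type values by multiplication, which is how the games handle dual types.

```python
from typing import Dict, Tuple

# Partial type chart: (attacking type, defending type) -> multiplier.
# Pairs that are not listed are treated as neutral (1.0) in this sketch.
CHART: Dict[Tuple[str, str], float] = {
    ("fire", "grass"): 2.0,
    ("fire", "ice"): 2.0,
    ("fire", "water"): 0.5,
    ("water", "fire"): 2.0,
    ("electric", "ground"): 0.0,
    ("normal", "ghost"): 0.0,
}


def effectiveness(move_type: str, t1: str, t2: str = "none") -> float:
    """Multiply the chart entries for both of the defender's types."""
    return CHART.get((move_type, t1), 1.0) * CHART.get((move_type, t2), 1.0)


print(effectiveness("fire", "grass", "ice"))        # 4.0
print(effectiveness("electric", "water", "ground")) # 0.0
```

A full implementation would replace the dictionary with the complete 18x18 matrix, indexed by the positions of the types in $T$.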

## ...

TODO: find clever title for this chapter in which I define my state space, action space and reward function

### State space

...
TODO: describe that the state space will be set up based on the information available to a player while battling

### Action space

...
TODO: describe that the action space for the battling agent will be derived from the fundamental action space (see notes)

## The battling agent

TODO: work out Timo's tip below in a proper paragraph  
For an agent to learn a strategy, it needs to be rewarded for a sequence of moves (temporal difference learning); otherwise it won't learn a strategy, only the best thing to do in the current state. To win, you need to perform a sequence of actions.
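
As a rough illustration of this tip, the sketch below uses tabular Q-learning, which is one temporal difference method among several and is chosen here only as an example. The states, actions, reward, learning rate `alpha` and discount factor `gamma` are all invented. The point is that the reward for winning flows back to the earlier setup action, even though that action earns no reward by itself.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Q[(state, action)] estimates the long-term return; unseen pairs start at 0.
Q: DefaultDict[Tuple[str, str], float] = defaultdict(float)


def q_update(state: str, action: str, reward: float, next_state: str,
             next_actions: List[str], alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One temporal difference (Q-learning) update for a single transition."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])


# Hypothetical two-step episode: set up first (no reward), then attack and win.
for _ in range(2):
    q_update("start", "setup", 0.0, "boosted", ["attack"])
    q_update("boosted", "attack", 1.0, "won", [])  # terminal: no next actions

print(Q[("boosted", "attack")])  # 0.75
print(Q[("start", "setup")])     # positive, although "setup" itself earns 0
```

After the first episode only the winning attack has a non-zero value. In the second episode the setup action inherits part of that value, which is the mechanism by which a sequence of actions gets learned.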

### Battling Environment

TODO: describe the fact that the environment essentially has 2 states it can be in, one where the model can make choices and one where the battle is taking place and the choice of the model + the opponent are being played out

### ...

THIS BLOCK CONTAINS A REDUNDANT CHAPTER
<!-- ### Overworld Locations for Agent Traversal -->
<!-- TODO: write a code block that parses this file, reads this block, parses the table below and, for each row, creates a markdown cell with a title based on the id and name of the environment, plus a Python cell below it with some default code -->
<!-- Environments cell GUID: 7f1582d2-2e3e-4526-b3a1-ff8cace214bb -->

<!-- TODO: describe the use of images from Bulbapedia paired with the interactive map by William Sullivan to create matrices representing the 'maze' that the agent needs to traverse at a given point in time -->

<!-- The table below contains ... TODO: describe what this table contains. -->
 
<!-- | ID   | Name        | Description |
|---   |---          |---          |
| 0001 | Player Room | Second floor of the player's house. It is also the room the player starts in on a new game. | -->

## Sources

https://www.youtube.com/watch?v=EqzghgtJ1Is

https://www.youtube.com/watch?v=jNMWkD5VsZ8

https://web.archive.org/web/20241114035948/https://en.wikipedia.org/wiki/Markov_decision_process

https://web.archive.org/web/20241128050046/https://en.wikipedia.org/wiki/Pok%C3%A9mon_(video_game_series)

https://pkmnmap4.web.app/

http://web.archive.org/web/20241118075715/https://bulbapedia.bulbagarden.net/wiki/Category:Platinum_locations

http://web.archive.org/web/20241126192248/https://bulbapedia.bulbagarden.net/wiki/Damage

https://github.com/smogon/damage-calc (web version https://calc.pokemonshowdown.com/?gen=4)

https://stable-baselines.readthedocs.io/en/master/