# Modeling Gameplay Decisions in Pokémon Platinum Using Markov Decision Processes

TODO: add title page elements

## Abstract

TODO: write an abstract

## Inspiration for this Study  

This study was undertaken as part of my decision theory course. Upon learning that the course would require a combination of applied and theoretical research within the context of decision theory, I was immediately drawn to the idea of creating a Markov decision process capable of playing *Pokémon Platinum*.  

The inspiration for this idea came from a video by Keeyan Ghoreshi ([link](https://www.youtube.com/watch?v=jNMWkD5VsZ8)), in which the creator analyzes how the game generates random numbers to predict every possible outcome among the 4,294,967,296 potential ways the game could unfold. By doing so, Ghoreshi identifies a deterministic sequence of inputs that will always beat the game. This approach, however, relies heavily on an in-depth understanding of the game's internal mechanics.  

In contrast, my approach focuses on building a Markov decision process that does not require extensive reverse engineering of the game's random number generation. Instead, it aims to make decisions based on probabilistic modeling of state transitions and outcomes.

## A More Probabilistic Approach  

For each segment of the game, a probabilistic method will be employed to attempt to progress through it successfully. In the inspiration video, a combination of statistics and basic probability is used to identify outliers, such as the longest possible duration of a battle or the maximum number of encounters in a specific grass patch.  

My approach will lean more toward reinforcement learning principles, primarily using Markov decision processes (MDPs) to model and solve decision-making problems within the game. The methodology will remain flexible, however: if I encounter promising alternative approaches during my research, I will consider incorporating them into my analysis.

## On Model Generalization  

Each model will be designed to be as general-purpose as possible. For instance, a single model will handle overworld exploration rather than creating separate models for specific areas or tasks. This approach is intended to maximize efficiency and general applicability within the game's framework.  

However, if creating a single general-purpose model proves impractical within the given timeframe, I will adopt a "divide and conquer" strategy. In this case, larger problems will be divided into smaller, more manageable subproblems, each addressed by its own specialized model. This fallback keeps progress achievable while maintaining focus on the overall objective.

## The Goal of This Study  

The primary goal of this study is to **create a probabilistic model that learns to play Pokémon Platinum**. This involves leveraging Markov decision processes or similar probabilistic frameworks to model and solve various challenges encountered in the game.  

### Minimum Viable Product (MVP)  
The minimum objective is to develop a model capable of learning how to engage in battles and achieve victory. The precise definition of "winning" in this context will be refined as the study progresses, incorporating factors such as efficiency, success rate, and adaptability.  

### Extended Goals  
If time permits, the study will also explore the development of an exploratory model designed to navigate the overworld. This model would aim to traverse the game world efficiently and interact with various game elements, providing a more comprehensive framework for automated gameplay.  

By balancing these objectives, the study seeks to demonstrate the feasibility and effectiveness of probabilistic models in addressing the complex decision-making tasks inherent to Pokémon Platinum.

## Gradually Increasing the Model's Goal Difficulty  

The difficulty of the model's objectives will be increased incrementally as the agent demonstrates successful training and proficiency at simpler tasks. This step-by-step approach ensures that the model builds a solid foundation of skills before tackling more complex challenges.  

For example, initial tasks may focus on basic decision-making, such as selecting effective moves during battles or navigating straightforward overworld paths. Once the model achieves consistent success in these areas, the complexity of the tasks will gradually increase to include:  
- Optimizing strategies for tougher battles, such as those involving type advantages or multi-stage opponents.  
- Navigating more intricate overworld segments with branching paths or hazards.  
- Managing resources like healing items or determining when to retreat to a Pokémon Center.  

This progressive approach allows the model to adapt and improve over time, ensuring it is equipped to handle increasingly difficult scenarios. Additionally, this structure provides clear milestones for measuring the model’s success throughout its development.  
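
The promotion rule described above can be sketched as follows. This is only an illustration: the `Stage` class, the task names, and the success-rate thresholds are hypothetical and not part of the study's design.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str         # hypothetical task label
    pass_rate: float  # success rate required before moving on (assumed value)


def next_stage(stages, current, recent_results):
    """Return the index of the stage to train on next.

    recent_results is a list of booleans (True = episode succeeded).
    The agent is promoted only when its recent success rate reaches
    the threshold of the current stage; otherwise it stays where it is.
    """
    if not recent_results:
        return current
    success_rate = sum(recent_results) / len(recent_results)
    if success_rate >= stages[current].pass_rate and current + 1 < len(stages):
        return current + 1
    return current


# Illustrative curriculum, ordered from simple to complex.
stages = [
    Stage("pick an effective move", 0.9),
    Stage("battle with type advantages", 0.8),
    Stage("manage healing items", 0.7),
]
```

Each stage's threshold doubles as a milestone: the episode at which the agent is first promoted gives a simple measure of progress.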

## Research Questions  

To complement the goal of creating a probabilistic model that plays Pokémon Platinum, I devised the following research questions to guide me toward a viable solution:  

1. **What are popular (both old and new) reinforcement learning approaches?**  
   - How do these approaches compare in terms of their strengths and weaknesses?  
   - Are there supporting methods, such as statistical techniques or others, that could enhance the effectiveness of reinforcement learning?  

2. **How can Markov decision processes be implemented in Python?**  
   - What tools, libraries, or frameworks are available to assist in programming Markov decision processes?  

3. **How can we efficiently simulate Pokémon Platinum?**  
   - What does "efficiency" entail in this context, particularly regarding time and computational complexity (and potentially space complexity)?  
   - What existing software products or tools are available for this purpose, and how can they be utilized or adapted?  

These questions aim to provide a structured framework for addressing the theoretical and practical challenges involved in creating a probabilistic model for gameplay. They span foundational research, technical implementation, and performance optimization, ensuring a comprehensive approach to the study.

## Standardizing Environments

TODO: describe the use of images from Bulbapedia paired with the interactive map by William Sullivan to create matrices representing the 'maze' that the agent needs to traverse at a given point in time
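
As a rough sketch of what such a matrix could look like, assume each tile is encoded as `0` (walkable) or `1` (blocked). The room layout below is invented for illustration and is not derived from the Bulbapedia images or the interactive map; `ROOM`, `MOVES`, and `step` are hypothetical names.

```python
# Hypothetical 4x5 room: 0 = walkable tile, 1 = blocked tile.
ROOM = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]

# Actions expressed as (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(grid, position, action):
    """Return the new (row, col); the agent stays put if the target is blocked."""
    row, col = position
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])
    if inside and grid[new_row][new_col] == 0:
        return (new_row, new_col)
    return position
```

With this encoding, the agent's position is the state and `step` acts as a deterministic transition function for the overworld.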

### Environment Setup Automation
<!-- Environments cell GUID: 7f1582d2-2e3e-4526-b3a1-ff8cace214bb -->

### Battling Environment

TODO: describe the standardization of battle navigation

<!-- TODO: write a code block that parses this file, reads this block, parses the table below and, for each row, creates a markdown cell with a title based on the ID and name of the environment, plus a Python cell below it with some default code -->
### Overworld Locations for Agent Traversal

The table below lists the overworld locations the agent needs to traverse.

| ID   | Name        | Description |
|---   |---          |---          |
| 0001 | Player Room | The first room the player spawns in. Also the room the player is returned to if their mom was the last source of healing for their Pokémon. |
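
A table in this format can be read into Python with the standard library alone. The sketch below covers only the parsing step; `parse_markdown_table` is a hypothetical helper, the heading format it prints is an assumption, and it does not handle cells that contain a literal `|`.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited markdown table into a list of row dicts."""

    def split_cells(line):
        # Drop the outer pipes, then trim whitespace around each cell.
        return [cell.strip() for cell in line.strip("|").split("|")]

    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = split_cells(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        rows.append(dict(zip(header, split_cells(line))))
    return rows


TABLE = """
| ID   | Name        | Description |
|---   |---          |---          |
| 0001 | Player Room | The very first room the player spawns in. |
"""

for row in parse_markdown_table(TABLE):
    # Title a generated markdown cell might use (format is an assumption).
    print(f"### {row['ID']} {row['Name']}")
```

Each resulting dictionary could then be used to generate one notebook cell per environment.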

## Defining the Battle Agent

TODO: work out Timo's tip below in a proper paragraph.
For an agent to learn a strategy, it needs to be rewarded for a set of sequential moves (temporal difference learning).
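
To make the idea concrete, here is a minimal sketch of the tabular TD(0) update, $V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]$. The three-turn "battle", the state names, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical three-turn battle: only the final turn is rewarded.
episode = [
    ("turn1", 0.0, "turn2"),
    ("turn2", 0.0, "turn3"),
    ("turn3", 1.0, "won"),
]

values = defaultdict(float)  # unseen states (including "won") start at 0
for _ in range(200):
    for state, reward, next_state in episode:
        td0_update(values, state, reward, next_state)
```

Although only the last turn yields a reward, repeated updates propagate value backwards, so earlier turns also receive credit: the estimates approach 1 for `turn3`, 0.9 for `turn2`, and 0.81 for `turn1`.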

### Definition of a Markov Decision Process

> A **Markov decision process** is a 4-tuple $(S, A, P_a, R_a)$, where:
> 
> - $S$ is a set of states called the **state space**. The state space may be discrete or continuous, like the set of real numbers.
> - $A$ is a set of actions called the **action space** (alternatively, $A_s$ is the set of actions available from state $s$). As for state, this set may be discrete or continuous.
> - $P_a(s, s')$ is, on an intuitive level, the **probability** that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t+1$. In general, this probability transition is defined to satisfy:
> 
>   $$
>   \Pr(s_{t+1} \in S' \mid s_t = s, a_t = a) = \int_{S'} P_a(s, s') \, ds',
>   $$
> 
>   for every $S' \subseteq S$ measurable. In case the state space is discrete, the integral is intended with respect to the counting measure, simplifying as
> 
>   $$
>   P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a).
>   $$
> 
>   If $S \subseteq \mathbb{R}^d$, the integral is usually taken with respect to the Lebesgue measure.
> 
> - $R_a(s, s')$ is the **immediate reward** (or expected immediate reward) received after transitioning from state $s$ to state $s'$, due to action $a$.
> 
> A policy function $\pi$ is a (potentially probabilistic) mapping from state space $S$ to action space $A$.

Reference: [Wikipedia's definition of a Markov Decision Process](https://web.archive.org/web/20241114035948/https://en.wikipedia.org/wiki/Markov_decision_process).
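
For a discrete state space, the tuple above maps directly onto plain Python data structures. The sketch below encodes a made-up battle MDP and solves it with value iteration, one standard way to compute a policy $\pi$ when $P_a$ and $R_a$ are known. All states, actions, probabilities, and rewards are invented for illustration; they do not reflect actual game mechanics.

```python
# Hypothetical battle MDP. P[s][a] is a list of
# (probability, next_state, reward) triples, combining P_a(s, s') and R_a(s, s').
P = {
    "healthy": {
        "attack": [(0.7, "won", 1.0), (0.3, "hurt", 0.0)],
        "heal": [(1.0, "healthy", 0.0)],
    },
    "hurt": {
        "attack": [(0.5, "won", 1.0), (0.5, "lost", -1.0)],
        "heal": [(0.8, "healthy", 0.0), (0.2, "lost", -1.0)],
    },
    "won": {},   # terminal states have no actions
    "lost": {},
}


def q_value(P, V, state, action, gamma):
    """Expected return of taking `action` in `state`, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Return (state values, greedy policy) for a finite MDP."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(q_value(P, V, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q_value(P, V, s, a, gamma))
        for s, actions in P.items()
        if actions
    }
    return V, policy


V, policy = value_iteration(P)
```

Under these made-up numbers, the resulting policy attacks when healthy and heals when hurt. In the actual game the transition probabilities are not known in advance, which is where learning-based methods come in.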


## Sources

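- Keeyan Ghoreshi, YouTube video: https://www.youtube.com/watch?v=jNMWkD5VsZ8
- Wikipedia, "Markov decision process" (archived 2024-11-14): https://web.archive.org/web/20241114035948/https://en.wikipedia.org/wiki/Markov_decision_process
- Interactive map: https://pkmnmap4.web.app/
- Bulbapedia, "Category:Platinum locations": https://bulbapedia.bulbagarden.net/wiki/Category:Platinum_locations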