# $$The\;Pac$$

## Names

- Edwin Ruiz
- Bradley Grace

## Abstract

Our project aims to develop an intelligent agent that plays the Atari game Pac-Man efficiently by making good decisions, using OpenAI Gym as the simulation platform. Our primary data will be the game states, which include all current in-game elements such as Pac-Man’s position, the ghosts, and the dots. The agent will use reinforcement learning algorithms to continuously improve its decision-making and game strategy. We will assess the agent's effectiveness by tracking its game scores, level completion rates, and how well it preserves lives during gameplay.

## Background

Pac-Man is widely known and ranks among the most iconic arcade games. It is also popular in computational research such as artificial intelligence, because the dynamic constraints of its environment make it an excellent testbed for reinforcement learning algorithms such as Deep Q-Networks (DQNs) and Q-learning. Both are well suited to environments like Pac-Man because they balance exploring new strategies against exploiting known paths, based on the feedback the environment returns $^{[1][2][3]}$. This also makes the game a useful educational tool for students who want to explore and experiment with different AI methods.

## Problem Statement

Our goal is to develop a reinforcement learning model that handles the complex decision-making required to win at Pac-Man. Beyond simply learning to play the game, we want to improve the agent's ability to make decisions in real time. Progress toward this goal is quantifiable through the game's scoring system, measurable through the agent's performance metrics, and reproducible through OpenAI Gym's simulation interface.

## Data

As noted above, our data will be the live states generated by the OpenAI Gym environment for Atari Pac-Man. The environment is documented at <https://gymnasium.farama.org/environments/atari/pacman/>. Each state is captured as a pixel array that shows Pac-Man’s current position, the ghosts, and the remaining dots. The key variables are the pixel data, the current game score, and the number of lives left. We anticipate preprocessing these pixel arrays to make them more suitable for model training.
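A minimal sketch of the kind of preprocessing we have in mind, assuming NumPy and the 210×160 RGB frames that Gymnasium's Atari environments return by default. The grayscale weights, the 2× downsampling, and the four-frame stack are illustrative choices common in DQN work, not settled decisions for this project, and `preprocess_frame` and `FrameStack` are hypothetical helper names. Gymnasium also provides an `AtariPreprocessing` wrapper that performs similar steps and may be preferable in practice.

```python
from collections import deque

import numpy as np


def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Convert one RGB frame (H, W, 3) to a downsampled, normalized grayscale array.

    Hypothetical helper for illustration.
    """
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = frame[..., :3] @ np.array([0.299, 0.587, 0.114])
    # Keep every second row and column: 210x160 -> 105x80.
    small = gray[::2, ::2]
    # Scale pixel values from [0, 255] to [0, 1] as float32 for the network.
    return (small / 255.0).astype(np.float32)


class FrameStack:
    """Keep the last k preprocessed frames as one state (hypothetical helper)."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame drops out automatically

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # At episode start, fill the stack with copies of the first frame.
        processed = preprocess_frame(frame)
        for _ in range(self.k):
            self.frames.append(processed)
        return np.stack(list(self.frames), axis=0)

    def step(self, frame: np.ndarray) -> np.ndarray:
        # Append the newest frame and return the stack, oldest first.
        self.frames.append(preprocess_frame(frame))
        return np.stack(list(self.frames), axis=0)


stack = FrameStack(k=4)
state = stack.reset(np.zeros((210, 160, 3), dtype=np.uint8))
print(state.shape)  # (4, 105, 80)
```

Stacking consecutive frames lets the network infer motion, such as the direction a ghost is moving, which a single still frame cannot show.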

## Proposed Solution

We plan to use Deep Q-Networks (DQNs) to train the Pac-Man agent. The agent will be trained iteratively, receiving rewards for increases in the game score and penalties for losing lives. Although we have not finalized the specific code, we will implement the training process in Python with TensorFlow. We may also train a simple Q-learning agent as a baseline for measuring the improvement the DQNs provide.
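The reward scheme described above (score gains as rewards, lost lives as penalties) could look like the following sketch. The function name `shaped_reward` and the `life_penalty` value of 100 are hypothetical. Gymnasium's Atari environments typically report remaining lives in the `info` dictionary returned by `env.step` under the key `"lives"`, but the function takes plain integers so it does not depend on that detail.

```python
def shaped_reward(score_delta: float, lives_before: int, lives_after: int,
                  life_penalty: float = 100.0) -> float:
    """Return the score gained this step minus a penalty for every life lost.

    Hypothetical helper; life_penalty is an illustrative value, not a tuned one.
    """
    lives_lost = max(0, lives_before - lives_after)  # ignore any life gains
    return score_delta - life_penalty * lives_lost


# A step that gains 10 points with no life lost, then a step that loses a life.
print(shaped_reward(10.0, 3, 3))  # 10.0
print(shaped_reward(0.0, 3, 2))   # -100.0
```

Because the penalty is subtracted from the score signal, its size controls how strongly the agent trades points for safety; a large penalty pushes the agent toward more cautious play.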

## Evaluation Metrics

- **Average Score per Game (performance):** $\bar{S} = \frac{1}{N}\sum_{i=1}^{N} S_i$, where $S_i$ is the total score of game $i$ and $N$ is the number of games played
- **Levels Completed (progress):** a simple count of levels cleared ($1, 2, \ldots, n$)
- **Survival Time (strategy effectiveness):** time in seconds from the start of a life until that life is lost
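A sketch of how these three metrics could be computed from per-game logs. The `EpisodeLog` record and `summarize` function are a hypothetical format, not code from this project. Converting agent steps to seconds assumes the emulator's 60 frames per second and a frame skip of 4, the default for Gymnasium's v5 Atari environments; both should be checked against the configuration actually used.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class EpisodeLog:
    """Per-game record (hypothetical logging format)."""
    total_score: float
    levels_completed: int
    # Agent steps survived for each life used during the game.
    steps_per_life: List[int] = field(default_factory=list)


def summarize(episodes: Sequence[EpisodeLog], frameskip: int = 4,
              fps: float = 60.0) -> dict:
    """Compute average score, total levels completed, and average survival time."""
    n_games = len(episodes)
    avg_score = sum(e.total_score for e in episodes) / n_games
    total_levels = sum(e.levels_completed for e in episodes)
    lives = [steps for e in episodes for steps in e.steps_per_life]
    # Each agent step covers `frameskip` emulator frames; frames / fps = seconds.
    avg_survival_s = sum(lives) * frameskip / fps / len(lives)
    return {
        "average_score": avg_score,
        "levels_completed": total_levels,
        "average_survival_seconds": avg_survival_s,
    }


logs = [EpisodeLog(100.0, 1, [150, 300]), EpisodeLog(300.0, 2, [450])]
print(summarize(logs))
# {'average_score': 200.0, 'levels_completed': 3, 'average_survival_seconds': 20.0}
```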

## Ethics & Privacy

**A. Data Collection**
- [x] A.1 Limit PII exposure: Our project involves no personal data, as it uses publicly available simulation environments via OpenAI Gym.

**B. Data Storage**
- [x] B.1 Data security: We will store training data and models in our GitHub repository, which will be made public after the project's completion.

**C. Analysis**
- [x] C.1 Dataset bias: We will monitor the agent's behavior for possible biases, which may also reveal flaws in the game that the agent learns to exploit.
- [x] C.2 Honest representation: We will present all results and performance metrics truthfully and accurately.
- [x] C.3 Auditability: We will ensure reproducibility through complete documentation.

**D. Modeling**
- [x] D.1 Explainability: All strategic decisions by the agent will be thoroughly explained.
- [x] D.2 Communicate limitations: We will clearly document any model limitations.

**E. Deployment**
- [x] E.1 Monitoring and evaluation: The model will be regularly monitored and updated.






# Results

Our project focused on developing an intelligent agent to navigate the Atari game Pac-Man efficiently using reinforcement learning algorithms. Throughout our research and development process, we gathered various metrics to evaluate the performance of different models. This section presents our findings, organized into primary and secondary points derived from our training data.

## Performance of Different Models


### Comparison of Models

We evaluated several models using different reinforcement learning strategies. Below is a summary of the convergence rate, stability, and training time for each model:

| Family | Model | Convergence Rate | Stability | Training Time (s) |
|---|---|---|---|---|
| DQN | DQN | 3 | 32.27 | 38579.28 |
| DQN | DQN Exploitation | 162 | 39.34 | 35380.35 |
| DQN | DQN Exploration | 36 | 52.72 | 20317.86 |
| DQN | DQN Hyperparameters | 11 | 42.19 | 31307.63 |
| Double DQN | Double DQN | 65 | 57.18 | 22409.99 |
| Q-Learning | Q-Learning | 60 | 28.44 | 450.69 |



### Subsection 1: Model Convergence

The convergence rate indicates how quickly each algorithm learned to navigate the Pac-Man environment. Notably, the DQN Exploitation model had the highest convergence rate (162), suggesting it learned game strategies more rapidly than the other models. The Q-Learning model also achieved a relatively high convergence rate (60), third behind DQN Exploitation and Double DQN (65), while having by far the shortest training time (450.69 seconds), highlighting its efficiency.

### Subsection 2: Model Stability

Model stability is crucial for consistent performance during gameplay. The Double DQN model exhibited the highest stability (57.18), indicating reliable decision-making. In contrast, the baseline DQN model had lower stability (32.27), suggesting more variable performance; only Q-Learning scored lower (28.44).
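Double DQN's stability is commonly attributed to how it computes its learning target. Standard DQN both selects and evaluates the next action with the target network, and taking that maximum tends to overestimate Q-values; Double DQN selects the action with the online network and evaluates it with the target network. The NumPy sketch below contrasts the two targets. It is illustrative only, not the code behind the results above, and the function names are hypothetical.

```python
import numpy as np


def dqn_targets(rewards, dones, next_q_target, gamma=0.99):
    """Standard DQN target: y = r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: y = r + gamma * (1 - done) * Q_target(s', argmax_a' Q_online(s', a'))."""
    best_actions = next_q_online.argmax(axis=1)        # select with the online net
    rows = np.arange(len(best_actions))
    evaluated = next_q_target[rows, best_actions]      # evaluate with the target net
    return rewards + gamma * (1.0 - dones) * evaluated


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # the second transition ends the episode
next_q_online = np.array([[1.0, 5.0], [2.0, 0.0]])
next_q_target = np.array([[3.0, 2.0], [4.0, 7.0]])

print(dqn_targets(rewards, dones, next_q_target, gamma=0.5))                        # [2.5 0. ]
print(double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.5))  # [2. 0.]
```

For the first transition, DQN bootstraps from the target network's largest value (3.0), while Double DQN uses the target network's value (2.0) for the action the online network prefers, producing a smaller target. For the terminal transition, both targets reduce to the reward alone.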

### Subsection 3: Training Time

Training time is an essential factor for practical implementation. Every DQN variant required between roughly 20,000 and 39,000 seconds of training, whereas the Q-Learning model finished in 450.69 seconds, about 45 times faster than the quickest DQN variant (DQN Exploration, 20317.86 seconds). Double DQN, despite its higher stability, required 22409.99 seconds, the second-shortest training time among the DQN-based models.

## Secondary Points: Insights from Different Models

### Subsection 4: Impact of Exploitation vs. Exploration

The DQN Exploitation and DQN Exploration models illustrate the trade-off between exploiting known strategies and exploring new ones. DQN Exploitation had a much higher convergence rate (162 vs. 36) but required longer training (35380.35 vs. 20317.86 seconds), while DQN Exploration was more stable (52.72 vs. 39.34).
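One common way to bias a DQN toward exploitation or exploration is the epsilon-greedy policy: the agent takes a random action with probability epsilon and its best-known action otherwise. The sketch below anneals epsilon linearly. The two schedules at the end (`exploit_eps`, `explore_eps`) are hypothetical settings for illustration, not the configurations behind the DQN Exploitation and DQN Exploration results.

```python
import random
from functools import partial
from typing import Sequence


def epsilon_at(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
               decay_steps: int = 100_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


# Hypothetical schedules: the exploitation-leaning one decays quickly to a low
# floor; the exploration-leaning one decays slowly to a higher floor.
exploit_eps = partial(epsilon_at, eps_end=0.01, decay_steps=50_000)
explore_eps = partial(epsilon_at, eps_end=0.10, decay_steps=500_000)

rng = random.Random(0)
print(round(exploit_eps(50_000), 3), round(explore_eps(50_000), 3))  # 0.01 0.91
print(select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))          # 1
```

At the same training step, the exploitation-leaning schedule is already acting almost entirely greedily while the exploration-leaning one still picks most actions at random, which is the kind of difference that can trade learning speed against the breadth of strategies tried.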

### Subsection 5: Hyperparameter Tuning

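The DQN Hyperparameters model illustrates the impact of hyperparameter tuning. Compared with the baseline DQN, it had a higher convergence rate (11 vs. 3), higher stability (42.19 vs. 32.27), and a shorter training time (31307.63 vs. 38579.28 seconds), although its convergence rate remained low relative to the other variants and its training time was still substantial. This suggests that tuning can deliver meaningful gains, though it does not remove the high training cost of DQN models.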
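A hyperparameter search can be organized as a grid over candidate settings, as in the sketch below. The `DQNConfig` field names and the candidate values are hypothetical examples of common DQN settings; they are not the values we tuned, and we do not claim the DQN Hyperparameters model was tuned with a full grid.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class DQNConfig:
    """One candidate setting (hypothetical field names)."""
    learning_rate: float
    gamma: float
    batch_size: int
    target_update_every: int  # agent steps between target-network syncs


def config_grid(learning_rates: Sequence[float], gammas: Sequence[float],
                batch_sizes: Sequence[int],
                target_updates: Sequence[int]) -> List[DQNConfig]:
    """Return every combination of the candidate values (full grid search)."""
    return [DQNConfig(lr, g, b, t)
            for lr, g, b, t in product(learning_rates, gammas,
                                       batch_sizes, target_updates)]


grid = config_grid([1e-4, 2.5e-4], [0.99], [32, 64], [1_000, 10_000])
print(len(grid))  # 8
print(grid[0])    # DQNConfig(learning_rate=0.0001, gamma=0.99, batch_size=32, target_update_every=1000)
```

Each configuration must be trained and evaluated separately, so the grid size multiplies total training time; with DQN runs already taking hours, this is one reason tuning is costly.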

### Subsection 6: Comparison with Baseline Q-Learning

Comparing the advanced models with the baseline Q-Learning model shows that, while Q-Learning was by far the most efficient in training time, it had the lowest stability of all models (28.44, versus 57.18 for Double DQN). This highlights the trade-off between simplicity and performance in reinforcement learning.
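One plausible reason for the large training-time gap is how lightweight tabular Q-learning is. The sketch below applies the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ to a lookup table, with no neural network and no gradient steps. It assumes states have already been reduced to hashable keys (for example, a tuple of hand-picked features); the class name and that state encoding are hypothetical, and this is not necessarily the representation our baseline used.

```python
from collections import defaultdict
from typing import Hashable, List


class QLearningAgent:
    """Tabular Q-learning over hashable state keys (hypothetical sketch)."""

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.99):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # Unseen states start with all-zero action values.
        self.q: defaultdict = defaultdict(lambda: [0.0] * n_actions)

    def greedy_action(self, state: Hashable) -> int:
        """Pick the action with the highest value in the table."""
        values: List[float] = self.q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable, done: bool) -> None:
        """Move Q(s, a) toward r + gamma * max_a' Q(s', a'), or just r at episode end."""
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


agent = QLearningAgent(n_actions=2, alpha=0.5, gamma=0.9)
agent.update("s0", 1, 10.0, "s1", done=False)
print(agent.q["s0"])  # [0.0, 5.0]
```

Each update touches a single table entry, which is why a tabular agent can train in minutes; the cost is that it cannot generalize across raw pixel states the way a DQN can.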

## Conclusion

In summary, our research demonstrated the effectiveness of various reinforcement learning models in navigating Pac-Man. The DQN Exploitation model showed rapid learning capabilities, while the Double DQN model provided high stability. The Q-Learning model, despite its simplicity, proved to be highly efficient in training time. These insights can guide future research and development in reinforcement learning applications for dynamic environments like Pac-Man.



# Citations

1. Doe, J. (2020, April 5). "How to train Ms. Pac-Man with reinforcement learning." *Medium*.
2. Mnih, V., et al. (2015). "Human-level control through deep reinforcement learning." *Nature*, 518, 529–533.
3. Silver, D., et al. (2016). "Mastering the game of Go with deep neural networks and tree search." *Nature*, 529, 484–489.