# COGS 188 - Project Proposal

# Names

- Austin Blanco
- Yifei Du
- Alvin Xiao

# Abstract 
This section should be short and clearly stated. It should be a single paragraph <200 words.  It should summarize: 
- what your goal/problem is
- what the data used represents and how they are measured
- what you will be doing with the data
- how performance/success will be measured

# Background

### Reinforcement Learning in Pac-Man
Reinforcement learning (RL) is a branch of machine learning in which an agent learns to perform actions in an environment in order to maximize a cumulative reward <a name="sutton"></a>[<sup>[4]</sup>](#suttonnote). The agent iteratively explores the environment and observes the outcomes of its actions as rewards and penalties. Over time, it refines a policy, which specifies the action to take in each state to maximize long-term return. Early RL methods, such as Q-learning and SARSA, rely on tabular representations of the action-value function Q(s, a), which estimates how desirable each action a is in a given state s. These approaches are effective in small domains but become impractical as the state space grows, because the table needs a separate entry for every state-action pair and most of those entries are rarely or never visited. This limitation motivated combining RL with deep learning, known as Deep Reinforcement Learning (DRL) <a name="arulkumaran"></a>[<sup>[1]</sup>](#Arulkumarannote). DRL uses neural networks to approximate the value function or the policy, allowing the agent to handle high-dimensional states <a name="mnih"></a>[<sup>[3]</sup>](#mnihnote).
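
The tabular update described above fits in a few lines. Below is a minimal sketch of one Q-learning step on a dictionary-backed Q-table. The following are illustrative assumptions, not choices made in this proposal:

- the state encoding (Pac-Man's grid coordinates);
- the action names;
- the default values of `alpha` (learning rate) and `gamma` (discount factor).

```python
from collections import defaultdict

# Assumed action set, matching the five moves listed for Pac-Man
ACTIONS = ["up", "down", "left", "right", "stay"]


def q_learning_update(Q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update to Q in place and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    On a terminal transition (done=True) the bootstrapped term is dropped.
    The alpha and gamma defaults are illustrative assumptions.
    """
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Q-table: one entry per (state, action) pair; unseen pairs default to 0.0
Q = defaultdict(float)

# Hypothetical transition: Pac-Man at (1, 1) moves right onto a pellet worth +10
print(q_learning_update(Q, (1, 1), "right", 10.0, (1, 2), done=False))  # 5.0
```

SARSA differs only in the target. It uses Q(s', a') for the action a' the agent actually takes next, instead of the maximum over actions. Note that the dictionary grows with every new state-action pair visited, which is exactly the scaling problem described above.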

Pac-Man is a classic arcade game set in a grid-based maze. The agent (Pac-Man) navigates the maze to collect pellets while avoiding, or sometimes chasing, four ghosts. This environment has several properties that make it a useful RL testbed:
- Discrete Action Space: Pac-Man chooses among five actions (up, down, left, right, stay).
- Reward Structure: The game provides frequent small rewards (pellets), high-value rewards (power pellets), and strong penalties (losing a life).
- Adversarial Dynamics: Ghosts actively pursue Pac-Man.

Because of these properties, Pac-Man is a well-suited environment for studying RL.
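
Here is a minimal sketch of a single environment step on a tiny hand-written maze, which shows the action space and reward structure concretely. Three things are assumptions made for illustration: the layout, the reward values in `REWARDS`, and the simplification that ghosts stay still. A real implementation would use an existing Pac-Man simulator and its own scoring.

```python
# Hypothetical 5x5 maze: '%' = wall, '.' = pellet, 'o' = power pellet, ' ' = empty
LAYOUT = [
    "%%%%%",
    "%.o.%",
    "%.%.%",
    "%. .%",
    "%%%%%",
]

# The five discrete actions as (row, col) offsets; "stay" keeps Pac-Man in place
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}

# Illustrative reward values (assumptions, not taken from the proposal)
REWARDS = {"time_step": -1, "pellet": 10, "power_pellet": 50, "death": -500}


def parse_layout(layout):
    """Return the sets of pellet and power-pellet coordinates in the layout."""
    cells = [(r, c, ch) for r, row in enumerate(layout) for c, ch in enumerate(row)]
    pellets = {(r, c) for r, c, ch in cells if ch == "."}
    power_pellets = {(r, c) for r, c, ch in cells if ch == "o"}
    return pellets, power_pellets


def step(pos, action, pellets, power_pellets, ghosts):
    """Apply one action and return (new_pos, reward, done).

    Eaten pellets are removed from the given sets in place. Ghosts are a fixed
    set of positions here (a simplifying assumption); real ghosts move each step.
    """
    dr, dc = ACTIONS[action]
    new_pos = (pos[0] + dr, pos[1] + dc)
    if LAYOUT[new_pos[0]][new_pos[1]] == "%":  # walls block movement
        new_pos = pos
    reward = REWARDS["time_step"]  # small per-step cost encourages efficiency
    if new_pos in ghosts:  # touching a ghost costs a life and ends the episode
        return new_pos, reward + REWARDS["death"], True
    if new_pos in pellets:
        pellets.remove(new_pos)
        reward += REWARDS["pellet"]
    elif new_pos in power_pellets:
        power_pellets.remove(new_pos)
        reward += REWARDS["power_pellet"]
    done = not pellets and not power_pellets  # episode ends when the maze is cleared
    return new_pos, reward, done


pellets, power_pellets = parse_layout(LAYOUT)
print(step((3, 2), "left", pellets, power_pellets, ghosts={(1, 3)}))  # ((3, 1), 9, False)
```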

### Prior Work

Pac-Man has long been a popular benchmark for artificial intelligence (AI) research. It was initially explored through rule-based or search-based methods (e.g., depth-first search and A*) and later through classical reinforcement learning (RL). Early RL applications often used tabular Q-learning on simplified mazes, where the state representation included Pac-Man’s coordinates, ghost positions, and pellet locations. However, this approach suffered from a combinatorial explosion in larger or more complex layouts. To reduce dimensionality, some researchers used hand-engineered features, such as “ghost proximity” or “pellet distance,” instead of storing a value estimate for every possible state. Around the same time, Ms. Pac-Man emerged as a competition environment <a name="lucas"></a>[<sup>[2]</sup>](#lucasnote). The competition prompted new strategies such as neuroevolution and hybrid RL techniques, which further demonstrated the need for more scalable and generalizable methods.

As deep learning gained traction, Deep Q-Networks (DQN) were introduced and reached human-level performance on many Atari 2600 games [<sup>[3]</sup>](#mnihnote). The same approach was applied to Pac-Man: deep RL agents learn to approximate Q-values from raw game frames using convolutional neural networks, bypassing manual feature engineering. DQN variants such as Double DQN and Prioritized Experience Replay, along with policy-gradient methods, further improved stability and performance.
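
The difference between the original DQN target and the Double DQN target is small enough to show directly. The sketch below computes both targets for a single transition. The lists of Q-values are hypothetical stand-ins for the outputs of the online and target networks (no neural network is trained here), and `gamma` is an assumed discount factor. In a full agent, these targets would be computed for a minibatch drawn from an experience replay buffer and used as regression targets for the online network.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN: r + gamma * max_a' Q_target(s', a'); the target network both selects and evaluates a'."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    return reward + gamma * next_q_target[argmax(next_q_online)]


# Hypothetical Q-value estimates for the next state over the five Pac-Man actions
next_q_online = [1.0, 3.0, 2.0, 0.0, 0.0]
next_q_target = [4.0, 2.0, 5.0, 0.0, 0.0]
print(dqn_target(10.0, next_q_target, done=False, gamma=0.5))  # 12.5
print(double_dqn_target(10.0, next_q_online, next_q_target, done=False, gamma=0.5))  # 11.0
```

The standard target uses the same noisy estimates both to pick the action and to score it, so it tends to overestimate values. Decoupling selection from evaluation, as Double DQN does, reduces that bias.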

### Challenges

Although Pac-Man is simpler than real-world environments, it still poses several challenges:
- Exponential State Space: The number of possible states grows quickly with the maze size, the positions of Pac-Man and the ghosts, the number of ghosts, and the remaining pellets. Each additional ghost multiplies the count by the number of open cells, and each pellet doubles it, since it can be either eaten or remaining.
- Sparse Rewards: Pellets give frequent feedback, but the most valuable rewards, such as eating ghosts after a power pellet, are rare and delayed, so the strategies that earn them are hard to learn.
- Exploration vs. Exploitation: An agent may settle on safe, familiar paths, but effective policies sometimes require taking risks, such as heading for a power pellet while ghosts are nearby to unlock higher-value rewards.
- Computational Cost: DRL can be computationally expensive. It may require careful hyperparameter tuning and long training runs to converge to a good policy [<sup>[1]</sup>](#Arulkumarannote).
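
Two of these challenges can be made concrete with a little arithmetic and a standard exploration rule:

- The first function gives a rough upper bound on the number of distinct states, counting only positions and pellets. It ignores ghost modes, power-pellet timers, and whether a configuration is reachable.
- The remaining functions implement epsilon-greedy action selection with a linearly decaying epsilon, one common way to trade exploration against exploitation.

The maze size and the schedule parameters are hypothetical, not tuned values.

```python
import random


def state_space_upper_bound(open_cells, num_ghosts, num_pellets):
    """Pac-Man and each ghost can occupy any open cell; each pellet is eaten or not."""
    return open_cells ** (1 + num_ghosts) * 2 ** num_pellets


def linear_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
    """Exploration rate falling linearly from `start` to `end`, then held at `end` (defaults are assumptions)."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# A small hypothetical maze: 20 open cells, 2 ghosts, 10 pellets
print(state_space_upper_bound(20, 2, 10))  # 8192000
print(epsilon_greedy([0.0, 2.5, 1.0, -1.0, 0.0], linear_epsilon(20_000), rng=random.Random(0)))
```

Even this toy maze allows millions of position and pellet combinations. In a full-size maze with a few hundred pellets, the `2 ** num_pellets` factor alone rules out a table.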

# Problem Statement

Clearly describe the problem that you are solving. Avoid ambiguous words. The problem described should be well defined and should have at least one ML-relevant potential solution. Additionally, describe the problem thoroughly such that it is clear that the problem is quantifiable (the problem can be expressed in mathematical or logical terms), measurable (the problem can be measured by some metric and clearly observed), and replicable (the problem can be reproduced and occurs more than once).

# Proposed Solution

In this section, clearly describe a solution to the problem. The solution should be applicable to the project domain and appropriate for the dataset(s) or input(s) given. Provide enough detail (e.g., algorithmic description and/or theoretical properties) to convince us that your solution is applicable. Why might your solution work? Make sure to describe how the solution will be tested.  

If you know details already, describe how (e.g., library used, function calls) you plan to implement the solution in a way that is reproducible.

If it is appropriate to the problem statement, describe a benchmark model against which your solution will be compared. 

# Evaluation Metrics

Propose at least one evaluation metric that can be used to quantify the performance of both the benchmark model and the solution model. The evaluation metric(s) you propose should be appropriate given the context of the data, the problem statement, and the intended solution. Describe how the evaluation metric(s) are derived and provide an example of their mathematical representations (if applicable). Complex evaluation metrics should be clearly defined and quantifiable (can be expressed in mathematical or logical terms).

# Ethics & Privacy

If your project has obvious potential concerns with ethics or data privacy, discuss that here. Almost every ML project put into production can have ethical implications if you use your imagination. Use your imagination. Get creative!

Even if you can't come up with an obvious ethical concern that should be addressed, you should know that a large number of ML projects that go into production have unintended consequences and ethical problems once in production. How will your team address these issues?

Consider a tool to help you address the potential issues such as https://deon.drivendata.org

# Team Expectations 

Put things here that cement how you will interact/communicate as a team, how you will handle conflict and difficulty, how you will handle making decisions and setting goals/schedule, how much work you expect from each other, how you will handle deadlines, etc...
* *Team Expectation 1*
* *Team Expectation 2*
* *Team Expectation 3*
* ...

# Project Timeline Proposal

Replace this with something meaningful that is appropriate for your needs. It doesn't have to be something that fits this format.  It doesn't have to be set in stone... "no battle plan survives contact with the enemy". But you need a battle plan nonetheless, and you need to keep it updated so you understand what you are trying to accomplish, who's responsible for what, and what the expected due dates are for each item.

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/20  |  1 PM |  Brainstorm topics/questions (all)  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 1/26  |  10 AM |  Do background research on topic (Pelé) | Discuss ideal dataset(s) and ethics; draft project proposal | 
| 2/1  | 10 AM  | Edit, finalize, and submit proposal; Search for datasets (Beckenbaur)  | Discuss wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 2/14  | 6 PM  | Import & wrangle data, do some EDA (Maradonna) | Review/edit wrangling/EDA; Discuss analysis plan   |
| 2/23  | 12 PM  | Finalize wrangling/EDA; Begin programming for project (Cruyff) | Discuss/edit project code; Complete project |
| 3/13  | 12 PM  | Complete analysis; Draft results/conclusion/discussion (Carlos)| Discuss/edit full project |
| 3/19  | Before 11:59 PM  | NA | Turn in Final Project  |

# Footnotes
<a name="Arulkumarannote"></a>1: Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A Brief Survey of Deep Reinforcement Learning. IEEE Signal Processing Magazine, 34(6), 26–38.<br> 
<a name="lucasnote"></a>2: Lucas, S. M. (2007). Ms. Pac-Man competition. IEEE Symposium on Computational Intelligence and Games, 158–159. <br>
<a name="mnihnote"></a>3: Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.<br>
<a name="suttonnote"></a>4: Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.


