# COGS 188 - Project Proposal

# Names
- Aydin Tabatabai
- Sia Khorsand
- Caleb Galdston
- Aden Harris

# Abstract 

Our goal is to train multiple AI agents to play volleyball using various reinforcement learning techniques. We are still deciding between a 3D volleyball environment built with Unity's ML-Agents Toolkit and a 2D environment built with Gymnasium (the maintained fork of OpenAI's Gym). If we use a 2D environment, we will model the game as a Markov Decision Process and will likely attempt multiple learning techniques such as Dynamic Programming, Monte Carlo methods, and potentially neural-network methods such as Deep Q-Networks. Our main measure of success will be how long it takes the agents to learn to play a game of volleyball. We are also considering having agents trained with different reinforcement learning algorithms compete against each other to see which one performs better. We are not using a specific dataset; instead, we will keep logs of the states, actions, and rewards that the agents encounter to better understand how the methods compare to one another.

# Background

Reinforcement learning has shown great promise in training agents to perform complex tasks, including playing games. Deep Q-learning, which uses neural networks to approximate Q-values, has been particularly successful in game environments. The appeal of using these methods for sports games lies in their ability to learn strong strategies in continuous environments. Additionally, with well-known projects like Google Research Football demonstrating multi-agent reinforcement learning in team sports <a name="Football"></a><sup>[1]</sup>, the application of RL to sports simulations has gained a lot of popularity.

In the realm of volleyball specifically, there have been projects to create AI agents capable of playing the game. For instance, the "vollAIball" project uses Unity's ML-Agents toolkit to train autonomous agents in a 3D volleyball environment<a name="VollAIball"></a><sup>[2]</sup>. This builds upon the broader trend of applying RL to sports, where continuous state spaces and multiple agents have to be considered<a name="MDP"></a><sup>[3]</sup>. The implementation builds upon RL concepts like Markov Decision Processes and requires the agents to make decisions based on incomplete information and to anticipate opponent actions. The MDP formalism provides a framework for modeling these complex decision scenarios in sports<a name="MDPs"></a><sup>[4]</sup>.

Our project extends this body of work by comparing different approaches in a volleyball environment, using RL methods and potentially other types of algorithms. We'll employ classic approaches like DP, Monte Carlo methods, and Q-learning, and may also experiment with nature-inspired algorithms such as genetic algorithms and Particle Swarm Optimization. This comparison could help us further understand the strengths and weaknesses of strategic decision-making algorithms versus movement-optimization algorithms in each aspect of the game of volleyball.


# Problem Statement

How well do different reinforcement learning algorithms compare when playing a zero-sum game like volleyball? Is it possible to optimize learning for a game with only a few rules to follow?

# Data

The data that we will use in our project will be generated by the agents that we implement. Effectively, our project will generate its own dataset rather than using a pre-existing one. The logged data will include, but is not limited to:


State observations including:

- Agent rotation (Y-axis)
- 3D vector from agent to ball
- Distance to ball
- Agent velocity (X, Y, Z axes)
- Ball velocity (X, Y, Z axes)

Actions taken by agents:

- Movement commands (forward/backward, left/right)
- Rotation commands (left/right)
- Jump commands (jump/no jump)

Data collection will be ongoing throughout training, with each training run generating state, action, and reward vectors that help the agents improve their policies and action-value estimates. These logged vectors will be our dataset.
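
As a rough illustration of what one logged record could look like, here is a minimal sketch. The class names, field names, and the JSON Lines output format are our own placeholders rather than a fixed schema. The 11-value state assumes the 3D observations listed above (1 rotation + 3 vector-to-ball + 1 distance + 3 agent velocity + 3 ball velocity), and the 4-value action assumes one discrete choice each for forward/backward, left/right, rotation, and jump.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Transition:
    """One logged environment step (illustrative field names)."""
    episode: int
    step: int
    state: List[float]   # assumed 11 values, see the observation list
    action: List[int]    # assumed 4 discrete branches
    reward: float
    done: bool


class TransitionLog:
    """In-memory log that can be serialized to JSON Lines."""

    def __init__(self) -> None:
        self.records: List[Transition] = []

    def add(self, transition: Transition) -> None:
        self.records.append(transition)

    def to_jsonl(self) -> str:
        # One JSON object per line keeps large logs easy to stream.
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


log = TransitionLog()
log.add(Transition(episode=0, step=0, state=[0.0] * 11,
                   action=[1, 0, 0, 0], reward=0.0, done=False))
```

Keeping one record per step would let us recompute any per-episode statistic later without rerunning training.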

# Proposed Solution

Our goal is to train AI agents to play a full volleyball game using reinforcement learning. We will test multiple RL approaches and compare their effectiveness in a simulated volleyball environment. We are exploring both a 3D volleyball environment with Unity's ML-Agents Toolkit and a 2D option with Gymnasium (the maintained fork of OpenAI's Gym). The choice of environment will determine the learning techniques we use. If we go with a 2D setting, we plan to model the game as a Markov Decision Process and test methods like Dynamic Programming, Monte Carlo methods, and Deep Q-Networks to find the best approach. After further exploration and experimentation, we may also add nature-inspired algorithms as a point of comparison.
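
To make the value-based family of methods concrete, the sketch below shows a tabular Q-learning update with epsilon-greedy action selection. It is only an illustration under assumptions: it presumes the state has been discretized into a hashable value, and the function names, `alpha`, `gamma`, and the toy state and action indices are our own placeholders. A Deep Q-Network would replace the table with a neural network.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Move Q(s, a) toward the one-step bootstrapped target."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)]
                                     for a in range(n_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# Toy usage: unseen (state, action) pairs default to a value of 0.0.
q = defaultdict(float)
q_learning_update(q, state=0, action=1, reward=1.0,
                  next_state=1, done=True, n_actions=3)
greedy_action = epsilon_greedy(q, 0, 3, epsilon=0.0, rng=random.Random(0))
```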

We will evaluate the success of our models by measuring how efficiently they learn to play. This includes tracking their cumulative rewards, improvement over time, and overall win/loss ratios. Another key aspect of our project is comparing different RL algorithms by having agents trained under each approach compete against one another. This will help us determine which method is most effective for teaching AI how to play volleyball. Our agents will undergo extensive training through self-play and adversarial matchups, allowing them to refine their strategies over a multitude of games.
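
One possible way to organize the head-to-head comparison is a round-robin tournament over the trained agents. The sketch below is hypothetical: `play_match` stands in for whatever function ends up running a full game in our environment, and we assume it returns a positive number when the first agent wins. The agent labels are placeholders.

```python
import itertools
from typing import Callable, Dict, List, Tuple


def round_robin(agents: List[str],
                play_match: Callable[[str, str], int],
                n_games: int) -> Dict[Tuple[str, str], float]:
    """Return the first agent's win rate for every unordered pair."""
    results: Dict[Tuple[str, str], float] = {}
    for a, b in itertools.combinations(agents, 2):
        wins_a = sum(1 for _ in range(n_games) if play_match(a, b) > 0)
        results[(a, b)] = wins_a / n_games
    return results


def stub_match(first: str, second: str) -> int:
    """Deterministic placeholder, not a real game: alphabetical order wins."""
    return 1 if first < second else -1


table = round_robin(["dqn", "mc", "q_learning"], stub_match, n_games=10)
```

In practice we would probably also swap court sides between games so that neither agent benefits from a side-specific advantage.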

Instead of using a pre-existing dataset, our project will generate its own data through recorded states, actions, and rewards. This will allow us to analyze how different algorithms adapt to the game and how each method influences decision-making. Through these experiments, we aim to gain insights into the best reinforcement learning approaches for training AI agents in multi-agent sports environments.

# Evaluation Metrics



Here are some metrics that we may use to evaluate a given agent's performance in the volleyball environment (2D or 3D):
1. Cumulative Reward & Win Rate
   - Primary metric: average score over 1000 evaluation episodes
   - Win/loss ratio when agents compete against each other
   - Standard deviation of scores to measure consistency

2. Learning Efficiency
   - Number of timesteps required to achieve a positive average score (over 1000 episodes)
   - Training stability measured by the variance in episode rewards over time
   - Sample efficiency: performance improvement per million timesteps of training

3. Comparative Performance
   - Head-to-head performance between different algorithms
   - Performance against the baseline agent provided in SlimeVolleyGym
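
The metrics above could be computed from logged episode results roughly as follows. This is a sketch under assumptions: the function names are ours, scores are assumed to be one number per episode, a win is assumed to be any positive outcome, and the "positive average score" criterion is interpreted as a rolling mean over the most recent `window` episodes.

```python
import statistics
from typing import Optional, Sequence, Tuple


def summarize_scores(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of evaluation scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def win_rate(outcomes: Sequence[float]) -> float:
    """Fraction of games won, counting a positive outcome as a win."""
    return sum(1 for o in outcomes if o > 0) / len(outcomes)


def timesteps_to_positive_average(scores: Sequence[float],
                                  lengths: Sequence[int],
                                  window: int = 1000) -> Optional[int]:
    """Timesteps used when the rolling mean score first exceeds zero."""
    total_steps = 0
    running_sum = 0.0
    for i, (score, length) in enumerate(zip(scores, lengths)):
        total_steps += length
        running_sum += score
        if i >= window:
            # Drop the episode that just left the rolling window.
            running_sum -= scores[i - window]
        if i + 1 >= window and running_sum / window > 0:
            return total_steps
    return None  # the threshold was never reached


mean_score, std_score = summarize_scores([1.0, 2.0, 3.0])
steps_needed = timesteps_to_positive_average(
    [-1, -1, 1, 1, 1], [10] * 5, window=2)
```

Sample efficiency could then be reported as the change in mean evaluation score divided by the number of training timesteps (in millions) between two checkpoints.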

# Ethics & Privacy

Ensuring ethical AI development is critical, and while our project does not involve personal data, we must consider potential biases, unintended consequences, and responsible AI usage.

**1. Bias and Fairness:** Reinforcement learning models can sometimes develop biases based on how they are trained. To avoid this, we’ll test them in different conditions to make sure no single strategy has an unfair advantage. Additionally, we will analyze performance across different training scenarios to identify and mitigate any emerging biases.

**2. Transparency and Interpretability:** AI decision-making can sometimes be difficult to understand. To improve transparency, we will document agent behaviors, provide visualizations of their learning process, and conduct detailed evaluations on how different factors influence agent decisions. This will help ensure that the AI’s reasoning can be explained and justified.

**3. Unintended Advantages:** AI agents might come up with unexpected ways to win, like taking advantage of flaws in the simulation instead of actually playing well. We'll closely monitor training to make sure they develop fair and realistic volleyball skills. Additionally, we will pay close attention to agent decisions and gameplay to ensure that the strategies developed align with human expectations.

By addressing these ethical considerations, we aim to develop a fair, responsible, and transparent AI system that contributes positively to the reinforcement learning research community.

# Team Expectations 


* *Clear Communications and Weekly Meetings*
* *Accountability in terms of responsibilities and meeting deadlines*
* *Fair Work Distribution*
* *Teamwork Etiquette and Transparency*
* *Collaborative and Constructive Group Environment*

# Project Timeline Proposal



| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/14  |  4 PM |  Discuss Project scope | Research different methods  | 
| 1/21  |  11 AM | Environment Setup and Decide on Agent Assignment  | Decide between 2D/3D | 
| 2/28  | 11 AM  | Basic Agent Implementations | Basic Movement Controls |
| 3/7  | 11 PM  | RL Algorithm Implementations (rough)| Discuss Algorithms |
| 3/15-20 | 11 AM  | Edit Project Code/Debug | Discuss/edit project code, render videos, and complete project |
| 3/15-20  | 12 PM  | Turn-in Project | Package Project for Github/Resumes|

# Footnotes
<a name="Football"></a>1.[^](#Football): Alechina, N, et al. Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: The Past, Present, and Future. 2024. https://www.ifaamas.org/Proceedings/aamas2024/pdfs/p1772.pdf<br> 
<a name="VollAIball"></a>2.[^](#VollAIball): gianfrancodemarco. “GitHub - Gianfrancodemarco/VollAIball: Agents Learning Volleyball in a Unity3D Environment, Using Reinforcement Learning and with a Prolog Narrator.” GitHub, 2023, github.com/gianfrancodemarco/vollAIball. Accessed 15 Feb. 2025.<br>
<a name="MDP"></a>3.[^](#MDP): Ye, Andre. “Markov Decision Process in Reinforcement Learning: Everything You Need to Know.” Neptune.ai, 1 Dec. 2020, neptune.ai/blog/markov-decision-process-in-reinforcement-learning.<br>
<a name="MDPs"></a>4.[^](#MDPs): Banerjee, Somnath. “Real World Applications of Markov Decision Process (MDP) | towards Data Science.” Towards Data Science, 9 Jan. 2021, towardsdatascience.com/real-world-applications-of-markov-decision-process-mdp-a39685546026/. Accessed 15 Feb. 2025.
