A Predator-Prey-Grass gridworld offering multi-objective and multi-agent environments with dynamic removal and spawning of partially observing agents, built on Farama's PettingZoo and MOMAland.
so_predpreygrass_v0.py: A single-objective multi-agent reinforcement learning (MARL) environment, trained and evaluated using Proximal Policy Optimization (PPO). The learning agents, Predators (red) and Prey (blue), expend energy by moving around and replenish it by eating. Prey eat Grass (green), and Predators eat Prey when they end up on the same grid cell. In the base case, for simplicity, an agent obtains all the energy of the Prey or Grass it eats. Predators die of starvation when their energy reaches zero; Prey die either of starvation or by being eaten by a Predator. A learning agent reproduces asexually when eating raises its energy level above a certain threshold. Learning agents learn to execute movement actions based on their partial observations of the environment (transparent red and blue squares, respectively) in order to maximize cumulative reward. The single-objective rewards (for stepping, eating, dying and reproducing) are naively summed and can be adjusted in the environment configuration file.
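The energy rules described above can be sketched in a few lines. This is a minimal illustration and not the repository's implementation: the names `Agent`, `step_energy`, `MOVE_COST` and `REPRODUCTION_THRESHOLD`, the numeric values, and the choice to split energy evenly between parent and offspring are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical constants for illustration only; the real values are set in
# the environment configuration file.
MOVE_COST = 0.1                # energy spent per step
REPRODUCTION_THRESHOLD = 10.0  # energy above which an agent reproduces


@dataclass
class Agent:
    kind: str       # "predator" or "prey"
    energy: float
    alive: bool = True


def step_energy(agent, eaten_energy=0.0):
    """Apply one step of energy bookkeeping; return an offspring or None."""
    # Moving costs energy; in the base case all eaten energy is transferred.
    agent.energy += eaten_energy - MOVE_COST
    if agent.energy <= 0:
        agent.alive = False  # starvation
        return None
    if agent.energy > REPRODUCTION_THRESHOLD:
        # Assumption: the parent hands half of its energy to the offspring.
        agent.energy /= 2
        return Agent(kind=agent.kind, energy=agent.energy)
    return None


# Usage: a Prey eating Grass crosses the threshold and spawns a new Prey.
prey = Agent("prey", energy=9.5)
offspring = step_energy(prey, eaten_energy=2.0)
```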
mo_predpreygrass_v0.py: A multi-objective multi-agent reinforcement learning (MOMARL) environment. The environment has two objectives:
- maximize cumulative rewards for reproduction of Predator agents
- maximize cumulative rewards for reproduction of Prey agents.
The rewards returned by the environment are stored in a two-dimensional vector, conforming to Farama's MOMAland framework, which follows the standard PettingZoo API. This environment generalizes the single-objective version described above: it goes beyond naively summing rewards and allows predefined (possibly non-linear) utility functions to be implemented for every separate learning agent.
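To illustrate what a per-agent utility function over a two-dimensional reward vector might look like, here is a small sketch. Plain tuples stand in for the reward vectors, and the component order (Predator reproduction, Prey reproduction), the agent names and all function names are assumptions made for the example, not part of the environment's API.

```python
import math


def linear_utility(reward_vec, weights=(1.0, 1.0)):
    """Weighted sum; with unit weights this is the naive summation."""
    return sum(w * r for w, r in zip(weights, reward_vec))


def concave_utility(reward_vec):
    """Example non-linear utility with diminishing returns per objective."""
    return sum(math.sqrt(max(r, 0.0)) for r in reward_vec)


# Hypothetical per-agent utilities: each learning agent may scalarize the
# reward vector in its own way.
utilities = {
    "predator_0": lambda v: linear_utility(v, weights=(1.0, 0.0)),
    "prey_0": lambda v: linear_utility(v, weights=(0.0, 1.0)),
}


def scalarize(rewards_by_agent):
    """Map each agent's reward vector to a scalar using that agent's utility."""
    return {agent: utilities[agent](vec) for agent, vec in rewards_by_agent.items()}
```

Swapping `linear_utility` for `concave_utility` in the `utilities` mapping changes how an agent trades off the two objectives without touching the environment itself.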
Training the single-objective environment so_predpreygrass_v0.py with the PPO algorithm is an example of how elaborate behaviors can emerge from simple rules in agent-based models. In the MARL example displayed above, learning agents obtain rewards solely through reproduction; all other reward options are set to zero in the environment configuration. Despite this relatively sparse reward structure, maximizing these rewards results in elaborate emergent behaviors such as:
- Predators hunting Prey
- Prey finding and eating grass
- Predators hovering around grass to catch Prey
- Prey trying to escape Predators
Moreover, these learned behaviors lead to more complex emergent dynamics at the ecosystem level. Over time, the trained agents display a classic Lotka–Volterra pattern.
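For readers unfamiliar with the term, the classic Lotka–Volterra model couples a prey density x and a predator density y through dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y, which yields oscillating populations. The sketch below integrates these equations with a simple forward Euler step. It is independent of the trained agents, and the parameter values are arbitrary choices for illustration.

```python
def simulate_lotka_volterra(prey0=10.0, pred0=5.0, alpha=1.0, beta=0.1,
                            delta=0.075, gamma=1.5, dt=0.001, steps=20_000):
    """Integrate the Lotka-Volterra equations with forward Euler.

    dx/dt = alpha*x - beta*x*y   (prey grow, and are eaten by predators)
    dy/dt = delta*x*y - gamma*y  (predators grow by eating, and die off)
    """
    x, y = prey0, pred0
    prey, predators = [x], [y]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
        prey.append(x)
        predators.append(y)
    return prey, predators


# Usage: both series cycle around the equilibrium (gamma/delta, alpha/beta).
prey_series, predator_series = simulate_lotka_volterra()
```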
More emergent behavior and findings are described on our website.
Editor used: Visual Studio Code 1.93.1 on Linux Mint 21.3 Cinnamon
- Clone the repository:
git clone https://github.com/doesburg11/PredPreyGrass.git
- Open Visual Studio Code and execute:
- Press ctrl+shift+p
- Type and choose: "Python: Create Environment..."
- Choose environment: Conda
- Choose interpreter: Python 3.11.7
- Open a new terminal
- Install dependencies:
pip install -r requirements.txt
- If encountering "ERROR: Failed building wheel for box2d-py," run:
conda install swig
and
pip install box2d box2d-kengz
- Alternative 1:
pip install wheel setuptools pip --upgrade
pip install swig
pip install gymnasium[box2d]
- Alternative 2: a workaround is to copy Box2d files from assets/box2d to the site-packages directory.
- If facing "libGL error: failed to load driver: swrast," execute:
conda install -c conda-forge gcc=12.1.0
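After installation, a quick way to confirm that the key packages are importable is a small check script such as the sketch below. The module names in `REQUIRED` are assumptions about what requirements.txt installs (for example, box2d-py is normally imported as `Box2D`); adjust them as needed.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed top-level module names; adjust to match requirements.txt.
REQUIRED = ["pettingzoo", "gymnasium", "Box2D"]

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```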
In Visual Studio Code run:
predpreygrass/optimizations/so_predpreygrass_v0/evaluation/so_simple_aec_random_policy.py
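As a rough idea of what a random-policy run over a PettingZoo AEC environment involves, the sketch below shows the usual agent-iteration loop. It does not reproduce the repository's script: `ToyAECEnv` and `ToySpace` are hypothetical stand-ins that implement only the calls used by the loop (`reset`, `agent_iter`, `last`, `action_space`, `step`), so the example runs without any dependencies.

```python
import random


class ToySpace:
    """Hypothetical stand-in for a discrete action space with a sample() method."""

    def __init__(self, n, rng):
        self.n, self.rng = n, rng

    def sample(self):
        return self.rng.randrange(self.n)


class ToyAECEnv:
    """Hypothetical stand-in exposing only the AEC calls used by the loop below."""

    def __init__(self, agents=("predator_0", "prey_0"), max_cycles=3):
        self.possible_agents = list(agents)
        self.max_cycles = max_cycles

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        self.agents = list(self.possible_agents)
        self._turns = {a: 0 for a in self.agents}
        self._rewards = {a: 0.0 for a in self.agents}

    def action_space(self, agent):
        return ToySpace(5, self._rng)

    def agent_iter(self):
        while self.agents:
            yield self.agents[0]

    def last(self):
        agent = self.agents[0]
        truncated = self._turns[agent] >= self.max_cycles
        return None, self._rewards[agent], False, truncated, {}

    def step(self, action):
        agent = self.agents.pop(0)
        if action is None:  # a finished agent is removed from the env
            return
        self._turns[agent] += 1
        self._rewards[agent] = 1.0 if action == 0 else 0.0
        self.agents.append(agent)


def run_random_policy(env, seed=0):
    """Drive an AEC-style env with uniformly random actions; return summed rewards."""
    env.reset(seed=seed)
    totals = {}
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        totals[agent] = totals.get(agent, 0.0) + reward
        # Finished agents are stepped with None, following the AEC convention.
        action = None if termination or truncation else env.action_space(agent).sample()
        env.step(action)
    return totals


# Usage with the toy stand-in; a real AEC environment would be passed in instead.
totals = run_random_policy(ToyAECEnv(), seed=1)
```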
Adjust parameters accordingly in:
predpreygrass/envs/_so_predpreygrass_v0/config/so_config_predpreygrass.py
In Visual Studio Code run:
predpreygrass/optimizations/so_predpreygrass_v0/training/so_predpreygrass_v0_train_ppo.py
To evaluate and visualize after training follow instructions in:
predpreygrass/optimizations/so_predpreygrass_v0/evaluation/so_evaluate_ppo_from_file.py
Batch training and evaluating in one go:
predpreygrass/optimizations/so_predpreygrass_v0/evaluation/so_parameter_variation_train_ppo_and_evaluate.py
- Terry, J and Black, Benjamin and Grammel, Nathaniel and Jayakumar, Mario and Hari, Ananth and Sullivan, Ryan and Santos, Luis S and Dieffendahl, Clemens and Horsch, Caroline and Perez-Vicente, Rodrigo and others. PettingZoo: Gym for multi-agent reinforcement learning. 2021-2024
- Paper Collection of Multi-Agent Reinforcement Learning (MARL)
- MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning (MOMARL). Florian Felten and Umut Ucak and Hicham Azmani and Gao Peng and Willem Röpke and Hendrik Baier and Patrick Mannion and Diederik M. Roijers and Jordan K. Terry and El-Ghazali Talbi and Grégoire Danoy and Ann Nowé and Roxana Rădulescu
- Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey
- A Practical Guide to Multi-Objective Reinforcement Learning and Planning
- Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer