Pulsar is a research framework for training intelligent agents via reinforcement learning to compete in the ICRA RoboMaster AI Challenge. The framework is divided into three major parts: rmleague for the training scheme, a distributed implementation of Truly-PPO as the reinforcement learning algorithm, and a physics simulation of the game environment.
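At a high level, the three parts interact in a league-style self-play loop. The sketch below is illustrative only: the names `League`, `train`, `make_sim`, and `learner` are hypothetical stand-ins for rmleague, the physics simulation, and the Truly-PPO learner, not Pulsar's actual API.

```python
import copy
import random


class League:
    """Hypothetical stand-in for rmleague: holds frozen snapshots of
    past policies and samples opponents for self-play matches."""

    def __init__(self):
        self.snapshots = []

    def add_snapshot(self, policy):
        # Deep-copy so later training updates do not mutate past opponents.
        self.snapshots.append(copy.deepcopy(policy))

    def sample_opponent(self):
        return random.choice(self.snapshots)


def train(policy, learner, make_sim, generations=10, matches_per_gen=100):
    """Illustrative league loop: play matches in the physics simulation
    against sampled past selves, update the policy with the RL learner
    (Truly-PPO in Pulsar), then snapshot the result back into the league."""
    league = League()
    league.add_snapshot(policy)
    for _ in range(generations):
        for _ in range(matches_per_gen):
            opponent = league.sample_opponent()
            trajectory = make_sim().play(policy, opponent)  # placeholder sim API
            learner.update(policy, trajectory)              # placeholder learner API
        league.add_snapshot(policy)
```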
Below is a list of the multi-agent reinforcement learning algorithms tested:
- Truly-PPO
- Hierarchical Critics Assignment for Multi-agent Reinforcement Learning
- Learning with Opponent-Learning Awareness
- Deep Multi-Agent Reinforcement Learning with Relevance Graphs
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
- SARSA
- Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning
- QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
- Graph Convolutional Reinforcement Learning
- Dueling Q-learning
- Deep Deterministic Policy Gradient (DDPG)
- Counterfactual Multi-Agent Policy Gradients
- QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
After testing each algorithm against a common benchmark, we concluded that Truly-PPO achieves the best asymptotic performance; it was therefore chosen for further development.
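For reference, here is a minimal NumPy sketch of the rollback surrogate (the PPO-RB variant) proposed in the Truly-PPO paper. The function and hyperparameter names (`ppo_rollback_surrogate`, `eps`, `alpha`) are illustrative rather than Pulsar's actual API, and the paper also proposes KL-triggered variants (TR-PPO, TR-PPO-RB) that may differ from the exact form used here.

```python
import numpy as np


def ppo_rollback_surrogate(ratio, advantage, eps=0.2, alpha=0.3):
    """Rollback surrogate from Truly-PPO (PPO-RB variant, illustrative).

    Inside the trust region |ratio - 1| <= eps this is the plain
    importance-weighted surrogate ratio * advantage. Beyond it, the
    slope of the objective flips sign (scaled by alpha), so gradient
    ascent actively pushes the ratio back toward the region instead of
    receiving zero gradient as with vanilla PPO clipping.
    """
    ratio = np.asarray(ratio, dtype=np.float64)
    advantage = np.asarray(advantage, dtype=np.float64)

    plain = ratio * advantage

    # Boundary of the clipping range on the "improving" side.
    bound = np.where(advantage >= 0.0, 1.0 + eps, 1.0 - eps)

    # Rollback branch: continuous at ratio = bound, negative slope beyond it.
    rollback = -alpha * ratio * advantage + (1.0 + alpha) * bound * advantage

    # The rollback only triggers when the ratio overshoots in the improving
    # direction; the pessimistic side is left untouched, as in PPO's min().
    overshoot = np.where(advantage >= 0.0, ratio > 1.0 + eps, ratio < 1.0 - eps)

    surrogate = np.where(overshoot, rollback, plain)
    return surrogate.mean()  # maximize this (or minimize its negation)
```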
- The network architecture and training method are heavily inspired by DeepMind's AlphaStar:
Vinyals, O., Babuschkin, I., Czarnecki, W. M. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350-354 (2019). doi:10.1038/s41586-019-1724-z
- Training algorithm:
Wang, Y., He, H., Wen, C. & Tan, X. Truly Proximal Policy Optimization. arXiv:1903.07940 (2019).