Implemented based on Tianshou and PettingZoo.
Tested on the PistonBall environment:
- test/test_mappo.py
- test/test_qmix.py
Comm-free baselines
- IDDPG
- MADDPG
- ISAC
- MASAC
- PPO
- MAPPO
- IQL
- QMIX
- QMIX-Attention
Comm baselines
Taxonomy of training schemes:

```mermaid
flowchart TD
    subgraph CTDE [CTDE]
        subgraph p [Parameter]
            IP([Individual Parameter])
            PS([Parameter Sharing])
            IPGI([Individual Parameter with Global Information])
        end
        subgraph c [Critic]
            IC([Individual Critic])
            JC([Joint Critic])
        end
    end
    subgraph FD [Fully Decentralized]
    end
```
We recommend *A Survey of Multi-Agent Reinforcement Learning with Communication* for a detailed taxonomy.
| Types | Sub-types |
| --- | --- |
| Fully Decentralized | |
| CTDE | Individual Parameter |
| | Parameter Sharing |
| | Individual Parameter with Global Information |
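
Read as code, these sub-types differ only in how parameters and inputs are arranged. Below is a minimal PyTorch sketch (not code from this repo; all dimensions are illustrative) of the three CTDE sub-types:

```python
import torch.nn as nn

# Illustrative sizes only; none of these numbers come from the repo.
obs_dim, global_dim, act_dim, agent_num = 8, 16, 4, 3

# Individual Parameter: every agent owns its own network.
individual = [nn.Linear(obs_dim, act_dim) for _ in range(agent_num)]

# Parameter Sharing: one network is reused by all agents, so every agent's
# data updates the same weights.
shared_net = nn.Linear(obs_dim, act_dim)
parameter_sharing = [shared_net] * agent_num

# Individual Parameter with Global Information: each agent keeps its own
# network, but its input is augmented with shared global information
# (e.g., the global state).
with_global_info = [
    nn.Linear(obs_dim + global_dim, act_dim) for _ in range(agent_num)
]
```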
The figure below refers to https://colab.research.google.com/drive/1MhzYXtUEfnRrlAVSB3SR83r0HA5wds2i?usp=sharing.
```mermaid
flowchart
    subgraph mapolicy [MA-policy]
        p1((Agent1)) --action--> m([Manager]) -->|obs or messages| p1
        p2((Agent2)) --action--> m([Manager]) -->|obs or messages| p2
        p3((Agent3)) --action--> m([Manager]) -->|obs or messages| p3
    end
    subgraph collector [Collector]
        VE(VecEnv) ==Transition==> mapolicy ==Action==> VE;
    end
    subgraph alg [Algorithm]
        collector ==Data==> B[(Buffer)] ==Sample==> T{Trainer} ==>|Processed Sample| mapolicy ==Info==> T
        T ==Info==> L{{Logger}}
    end
```
An algorithm corresponds to:
- An MA-policy: the interaction among agents, such as communication (a minimal sketch follows this list)
- A Buffer: what to store
- A Trainer: the update rule, although it is actually implemented in each agent's policy
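
The sketch below illustrates how an MA-policy could let a Manager mediate the agents' inputs, matching the flowchart above; `Manager`, `MAPolicy`, and the toy policies are hypothetical names for illustration, not this repo's actual classes:

```python
from typing import Any, Callable, Dict


class Manager:
    """Routes observations (and, for comm methods, messages) to each agent."""

    def dispatch(self, obs: Dict[str, Any]) -> Dict[str, Any]:
        # comm-free case: forward each agent's own observation unchanged
        return obs


class MAPolicy:
    """Holds one policy per agent and lets the Manager mediate their inputs."""

    def __init__(self, agent_policies: Dict[str, Callable], manager: Manager):
        self.agent_policies = agent_policies
        self.manager = manager

    def forward(self, obs: Dict[str, Any]) -> Dict[str, Any]:
        per_agent_inputs = self.manager.dispatch(obs)
        # every agent acts on whatever the Manager handed to it
        return {
            name: policy(per_agent_inputs[name])
            for name, policy in self.agent_policies.items()
        }


# Toy usage with two trivial "policies".
ma = MAPolicy({"agent_1": lambda o: o + 1, "agent_2": lambda o: o * 2}, Manager())
print(ma.forward({"agent_1": 1, "agent_2": 2}))  # {'agent_1': 2, 'agent_2': 4}
```

A communication method would override `dispatch` to inject messages; the Trainer then updates each agent's own policy from the processed samples.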
To facilitate the sub-types of the CTDE scheme, the buffer should account for the agent dimension (see the sketch below).
- For the single-env case, implement `MAReplayBuffer` as a `ReplayBufferManager` with `agent_num` sub-buffers.
- For the vectorized-env case, implement `VectorMAReplayBuffer` as a `ReplayBufferManager` with `agent_num * env_num` sub-buffers.
For both cases, the buffer should contain sub-buffers for shared global information, e.g., the state.
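
A minimal sketch of this layout, assuming upstream Tianshou's `ReplayBuffer`/`ReplayBufferManager` API; the `MAReplayBuffer` and `VectorMAReplayBuffer` classes here are hypothetical illustrations, not the repo's final implementation (extra sub-buffers for shared global information could be appended to the same manager):

```python
from tianshou.data import ReplayBuffer, ReplayBufferManager


class MAReplayBuffer(ReplayBufferManager):
    """Single-env case: one sub-buffer per agent."""

    def __init__(self, total_size: int, agent_num: int, **kwargs):
        assert total_size % agent_num == 0
        sub_size = total_size // agent_num
        super().__init__([ReplayBuffer(sub_size, **kwargs) for _ in range(agent_num)])
        self.agent_num = agent_num


class VectorMAReplayBuffer(ReplayBufferManager):
    """Vectorized-env case: agent_num * env_num sub-buffers."""

    def __init__(self, total_size: int, agent_num: int, env_num: int, **kwargs):
        buffer_num = agent_num * env_num
        assert total_size % buffer_num == 0
        sub_size = total_size // buffer_num
        super().__init__([ReplayBuffer(sub_size, **kwargs) for _ in range(buffer_num)])
        self.agent_num, self.env_num = agent_num, env_num
```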
Act as a vectorized env with `agent_num` envs. `AsyncCollector`, with `self._ready_env_ids` initialized as `[1 0 ...]` and `self.data` initialized to length `env_num`, seems suitable, accompanied by `env_id` in the returned info.
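
A small sketch of the bookkeeping described above, using plain NumPy; this reflects assumptions about the design notes, not the repo's implementation:

```python
import numpy as np

agent_num = 3  # illustrative

# self._ready_env_ids initialized as [1 0 ...]: only the first
# pseudo-env (agent) is ready right after reset.
ready_env_ids = np.array([1] + [0] * (agent_num - 1), dtype=bool)

# self.data initialized to length env_num: one slot per pseudo-env/agent.
data_slots = [None] * agent_num

# Each step would return an info dict carrying env_id, so transitions can
# be routed back to the right slot.
infos = [{"env_id": int(i)} for i in np.flatnonzero(ready_env_ids)]
print(ready_env_ids, data_slots, infos)
```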
Maintain the centralized part inside the class.
```bash
sudo apt install swig -y
pip install 'pettingzoo[all]'
# Install my version of tianshou
git clone https://github.com/Leo-xh/tianshou.git
cd tianshou
pip install -e .
```
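
As a quick sanity check after installation, the PistonBall env can be wrapped for Tianshou; this is a sketch, and the PettingZoo version suffix (`pistonball_v6` here) as well as the wrapper's attribute names may differ across releases:

```python
from pettingzoo.butterfly import pistonball_v6
from tianshou.env import PettingZooEnv

# Wrap the PettingZoo AEC env so Tianshou's policies/collectors can drive it.
env = PettingZooEnv(pistonball_v6.env(continuous=False))
print(env.agents)  # e.g. ['piston_0', 'piston_1', ...]
```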