An algorithm that applies SAC (Soft Actor-Critic) to QMIX for multi-agent reinforcement learning. Watch the demo here.
- SMAC
- PyTorch (GPU support recommended for training)
- TensorBoard
- StarCraft II

For the installation of SMAC and StarCraft II, refer to the SMAC repository.
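A minimal installation sketch, assuming a pip-based setup (the exact PyTorch build depends on your CUDA version; see the SMAC repository for the StarCraft II setup):

```
# PyTorch and TensorBoard (pick the PyTorch build matching your CUDA version)
pip install torch tensorboard

# SMAC, installed from its GitHub repository
pip install git+https://github.com/oxwhirl/smac.git
```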
Train a model with the following command:

```
python main.py
```

Configurations and parameters of the training are specified in `config.json`. Models will be saved at `./models`.
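Since TensorBoard is listed as a dependency, training progress can presumably be monitored with it; the log directory below is a placeholder, not a path confirmed by this repository:

```
tensorboard --logdir <log_dir>
```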
Test a trained model with the following command:

```
python test_model.py
```

Configurations and parameters of the testing are specified in `test_config.json`. Match the `run_name` items in `config.json` and `test_config.json`.
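For example, if training was launched with the entry below in `config.json`, the same value must appear in `test_config.json` (illustrative excerpt only; the value is a placeholder and all other keys are omitted):

```json
{
  "run_name": "example_run"
}
```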
Note that a_i is equivalent to \mu_i and s_i is equivalent to o_i in the architecture diagram above.
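For reference, a generic QMIX-style monotonic mixing network can be sketched in PyTorch as below. This is an illustrative sketch of the standard QMIX mixer, not the code used in this repository; the class and parameter names (`QMixMixer`, `embed_dim`, etc.) are assumptions.

```python
import torch
import torch.nn as nn


class QMixMixer(nn.Module):
    """Generic QMIX-style monotonic mixing network (illustrative only).

    Hypernetworks conditioned on the global state produce non-negative
    weights, so the mixed value is monotonic in each agent's value.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_values: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_values: (batch, n_agents), state: (batch, state_dim)
        bs = agent_values.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_values.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        total = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return total.view(bs, 1)
```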
Training objective: policies that maximize
Q-values computed by the networks:
Individual state-value functions:
Total state values (alpha is the entropy temperature):
Q-values expressed with the Bellman equation:
Critic networks update: minimize
Actor networks update: maximize
Entropy temperatures update: minimize
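As a rough guide to the quantities named above, the standard discrete-action SAC formulation combined with a QMIX-style mixer can be sketched as follows. This is a sketch under those assumptions, not the repository's exact equations; symbols such as f_mix, the target value \bar{V}_{tot}, and the target entropy \bar{\mathcal{H}} are assumptions.

```latex
% Sketch assuming discrete-action SAC per agent plus a QMIX-style mixer;
% not the repository's exact notation.
\begin{align*}
% Training objective: maximize return plus entropy
J(\pi) &= \mathbb{E}\Big[\textstyle\sum_t r_t + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big] \\
% Q-values computed by the networks: per-agent critics mixed into a joint value
Q_{tot}(s, \mathbf{a}) &= f_{\mathrm{mix}}\big(Q_1(o_1, a_1), \dots, Q_n(o_n, a_n); s\big) \\
% Individual state-value functions (soft values under the discrete policy)
V_i(o_i) &= \sum_{a_i} \pi_i(a_i \mid o_i)\,\big[Q_i(o_i, a_i) - \alpha \log \pi_i(a_i \mid o_i)\big] \\
% Total state value via the mixing network
V_{tot}(s) &= f_{\mathrm{mix}}\big(V_1(o_1), \dots, V_n(o_n); s\big) \\
% Q-values expressed with the Bellman equation
Q_{tot}(s, \mathbf{a}) &= r + \gamma\, \mathbb{E}\big[V_{tot}(s')\big] \\
% Critic update: minimize the squared Bellman error (bar denotes target networks)
J_Q &= \mathbb{E}\Big[\big(Q_{tot}(s, \mathbf{a}) - r - \gamma\, \bar{V}_{tot}(s')\big)^2\Big] \\
% Actor update: maximize the per-agent soft value
J_{\pi_i} &= \mathbb{E}\Big[\sum_{a_i} \pi_i(a_i \mid o_i)\,\big[Q_i(o_i, a_i) - \alpha \log \pi_i(a_i \mid o_i)\big]\Big] \\
% Temperature update: minimize, pushing policy entropy toward the target
J(\alpha) &= \mathbb{E}\big[-\alpha\,\big(\log \pi_i(a_i \mid o_i) + \bar{\mathcal{H}}\big)\big]
\end{align*}
```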
Note that the results for the other algorithms are taken from the SMAC paper; the evaluation method is therefore kept the same as in the SMAC paper (StarCraft II version: SC2.4.6.2.69232).
(Win rate %, mean of 5 independent runs)
Scenario | IQL | VDN | QMIX | SAC-QMIX |
---|---|---|---|---|
2s_vs_1sc | 100 | 100 | 100 | 100 |
2s3z | 75 | 97 | 99 | 100 |
3s5z | 10 | 84 | 97 | 97 |
1c3s5z | 21 | 91 | 97 | 100 |
10m_vs_11m | 34 | 97 | 97 | 100 |
2c_vs_64zg | 7 | 21 | 58 | 56 |
bane_vs_bane | 99 | 94 | 85 | 100 |
5m_vs_6m | 49 | 70 | 70 | 90 |
3s_vs_5z | 45 | 91 | 87 | 100 |
3s5z_vs_3s6z | 0 | 2 | 2 | 85 |
6h_vs_8z | 0 | 0 | 3 | 82 |
27m_vs_30m | 0 | 0 | 49 | 100 |
MMM2 | 0 | 1 | 69 | 95 |
corridor | 0 | 0 | 1 | 0 |