PKU-RL/DPO

DPO

Note

The implementation of DPO can be found in this codebase.

Installation

How to run

python3 on-policy-main/train_smac.py --map_name 2s3z --use_eval \
    --penalty_method True --dtar_kl 0.02 \
    --experiment_name dtar_0.02_V_penalty_2M --num_env_steps 2000000 \
    --group_name dpo --seed 1 --multi_rollout True --n_rollout_threads 1
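The `--penalty_method True --dtar_kl 0.02` flags suggest a KL-penalty surrogate with a target KL of 0.02. As a rough illustration only (not code from this repository), the sketch below shows a PPO-penalty-style adaptive coefficient update; the names `beta`, `update_beta`, and the 1.5x/2x thresholds are assumptions:

```python
# Sketch of an adaptive KL-penalty coefficient, PPO-penalty style.
# Illustrates what --penalty_method / --dtar_kl plausibly control;
# this is NOT taken from the DPO codebase.

def update_beta(beta: float, observed_kl: float, dtar_kl: float) -> float:
    """Grow the penalty weight when KL overshoots the target, shrink it when under."""
    if observed_kl > 1.5 * dtar_kl:
        return beta * 2.0
    if observed_kl < dtar_kl / 1.5:
        return beta / 2.0
    return beta

def penalized_objective(surrogate: float, observed_kl: float, beta: float) -> float:
    """Policy objective: surrogate advantage term minus the weighted KL penalty."""
    return surrogate - beta * observed_kl

if __name__ == "__main__":
    beta, dtar_kl = 1.0, 0.02
    # Two overshoots double beta twice; one undershoot halves it once.
    for kl in (0.05, 0.05, 0.001):
        beta = update_beta(beta, kl, dtar_kl)
    print(beta)
```

With `--dtar_kl 0.02`, the penalty weight would tighten whenever the measured policy KL drifts well above 0.02 and relax when it falls well below, keeping updates near the target divergence.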

Results

Here, we provide results in three different SMAC scenarios (2s3z, 8m, 3s5z) using default hyperparameters.

Citation

If you use this code, please cite our paper.

Kefan Su and Zongqing Lu. A Fully Decentralized Surrogate for Multi-Agent Policy Optimization. TMLR, 2024

@article{DPO,
  title={A Fully Decentralized Surrogate for Multi-Agent Policy Optimization},
  author={Su, Kefan and Lu, Zongqing},
  journal={Transactions on Machine Learning Research},
  year={2024}
}
