This repository contains an implementation of our paper:
Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, and Byoung-Tak Zhang
Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
This implementation makes use of TensorFlow 2.2.0 and Gym 0.17.2.
The RL loop is a variant of the SAC algorithm from stable-baselines, adapted to our specific purposes of implementing a regularized actor-critic and modeling Gaussian policies (a minimal sketch of such a policy head is given below).
Additionally, for the demonstration code in the demo_julia
directory, we used Julia 1.7 for training and visualization.
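Since the policy is Gaussian as in SAC, the following is a minimal TensorFlow 2.x sketch of a tanh-squashed diagonal Gaussian policy head. It is only an illustration under our own naming; the repository's actual policy and critic classes may be structured differently.

import math
import tensorflow as tf

class GaussianPolicy(tf.keras.Model):
    """Minimal tanh-squashed diagonal Gaussian policy head (illustrative only;
    the repository's actual policy implementation may differ)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.body = tf.keras.Sequential([
            tf.keras.layers.Dense(hidden, activation="relu", input_shape=(obs_dim,)),
            tf.keras.layers.Dense(hidden, activation="relu"),
        ])
        self.mu = tf.keras.layers.Dense(act_dim)
        self.log_std = tf.keras.layers.Dense(act_dim)

    def call(self, obs):
        h = self.body(obs)
        mu = self.mu(h)
        log_std = tf.clip_by_value(self.log_std(h), -20.0, 2.0)
        std = tf.exp(log_std)
        pre_tanh = mu + std * tf.random.normal(tf.shape(mu))  # reparameterized sample
        action = tf.tanh(pre_tanh)                             # squash to [-1, 1]
        # Gaussian log-density plus the tanh change-of-variables correction
        log_prob = tf.reduce_sum(
            -0.5 * (((pre_tanh - mu) / std) ** 2 + 2.0 * log_std + math.log(2.0 * math.pi))
            - tf.math.log(1.0 - action ** 2 + 1e-6),
            axis=-1,
        )
        return action, log_prob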
To run the MuJoCo experiments, place optimal expert trajectory files in the data directory (named f'data/trj{num_demo}.{env_name}.npz'); they are then loaded into memory by the data/trj.py file.
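As a rough illustration of this layout, the snippet below packs expert rollouts into an .npz file with the expected name. The array keys (obs, acs) are placeholders we chose, so consult data/trj.py for the key names the loader actually expects.

import numpy as np

def save_expert_trajectories(trajectories, env_name, num_demo):
    """Pack expert rollouts into data/trj{num_demo}.{env_name}.npz.

    `trajectories` is assumed to be a list of dicts with per-step arrays;
    the key names below are placeholders, not necessarily the ones
    data/trj.py expects.
    """
    obs = np.concatenate([t["observations"] for t in trajectories], axis=0)
    acs = np.concatenate([t["actions"] for t in trajectories], axis=0)
    np.savez(f"data/trj{num_demo}.{env_name}.npz", obs=obs, acs=acs)

# e.g. save_expert_trajectories(trajs, "Hopper-v3", 100) writes data/trj100.Hopper-v3.npz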
Run MD-AIRL (Tsallis entropy, q=2) on MuJoCo
python3 run.py --env_id Hopper-v3 --policy gaussian --alg mdirl --q 2 --k 1e-2 --k2 1. --alphaT 20. --num_demo 100 --gamma 0.99 --gp_coeff 1e-4 --burnin_steps 10000 --save_intvl 100000 --seed 0 --num_steps 1000000
Run MD-AIRL (Tsallis entropy, q=2) on a discrete environment (a.k.a. multi-armed bandits)
python3 run_discrete.py --alg mdirl --alpha 1. --alphaT 2. --reg_type tsallis --batch_size 4 --num_action 4 --expert_type set1 --seed 0
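For intuition about what the bandit demo optimizes, here is a generic mirror-descent policy step on a discrete action set. It uses the Shannon-entropy mirror map (the multiplicative-weights update) purely as an illustration; the command above uses the Tsallis regularizer, and none of the names below come from the repository.

import numpy as np

def mirror_descent_step(policy, reward, step_size):
    """One mirror-descent update of a categorical policy on a bandit.

    With the negative Shannon entropy as the mirror map, the update is the
    multiplicative-weights rule pi_{t+1}(a) proportional to pi_t(a) * exp(eta * r(a)).
    This only illustrates the mirror-descent idea, not the Tsallis (q=2)
    regularizer used by the command above.
    """
    logits = np.log(policy) + step_size * reward
    logits -= logits.max()                 # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# toy usage: 4 actions, reward estimates favoring action 2
pi = np.full(4, 0.25)
r = np.array([0.1, 0.2, 0.9, 0.3])
for _ in range(50):
    pi = mirror_descent_step(pi, r, step_size=0.5)
print(pi)  # probability mass concentrates on the highest-reward arm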
Run MD-AIRL on noisy MuJoCo
python3 run_noisy.py --env_id Hopper-v3 --policy gaussian --alg mdirl --q 2 --noise_lvl 1e-2 --k 1e-2 --k2 1. --alphaT 20. --num_demo 100 --gamma 0.99 --gp_coeff 1e-4 --burnin_steps 10000 --save_intvl 100000 --seed 0 --num_steps 1000000
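One plausible reading of --noise_lvl is additive Gaussian noise on observations; the wrapper below sketches how such noise could be injected into a Gym environment. This is only an assumption about run_noisy.py, not a description of its actual mechanism, so verify against the script.

import gym
import numpy as np

class GaussianObsNoise(gym.ObservationWrapper):
    """Adds zero-mean Gaussian noise to observations.

    Purely illustrative: whether run_noisy.py perturbs observations, actions,
    or demonstrations, and how noise_lvl is applied, should be checked
    against the script itself.
    """
    def __init__(self, env, noise_lvl=1e-2):
        super().__init__(env)
        self.noise_lvl = noise_lvl

    def observation(self, obs):
        return obs + self.noise_lvl * np.random.randn(*obs.shape)

# env = GaussianObsNoise(gym.make("Hopper-v3"), noise_lvl=1e-2)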
@inproceedings{han2022robust,
author = {Han, Dong-Sig and Kim, Hyunseo and Lee, Hyundo and Ryu, Je-Hwan and Zhang, Byoung-Tak},
title = {Robust Imitation via Mirror Descent Inverse Reinforcement Learning},
booktitle = {Advances in Neural Information Processing Systems},
pages = {30031--30043},
volume = {35},
year = {2022}
}