Implementation of DCIL-II built on XPAG, a JAX-based library.
- Clone DCIL repo,
git clone https://github.com/AlexandreChenu/DCIL_XPAG.git
- Create the virtual environment dcil_env from environment.yml,
If you want to use MuJoCo environments (Fetch + Humanoid locomotion & standup):
cd DCIL_XPAG
conda env create --name dcil_env --file environment.yml
If you want to use Cassie environments, check this Repo for installation instructions.
If you want to use PyBullet environments:
cd DCIL_XPAG
conda env create --name dcil_env_pybullet --file environment_pybullet.yml
- Clone + install XPAG (+ Jax),
git clone https://github.com/perrin-isir/xpag.git
cd xpag
git checkout 9ef7dd74b74fc71cee83c6a476adfebe4b977814
pip install -e .
Check this Repo for instructions.
- Install physics simulators,
- Clone + install maze or humanoid environments
git clone https://github.com/AlexandreChenu/gmaze_dcil.git
OR
git clone https://github.com/AlexandreChenu/ghumanoid_dcil.git
and
cd <env_directory>
pip install -e .
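After the installs above, a quick check can confirm that the packages are importable from the active conda environment. This is a minimal sketch: the module names `jax` and `xpag` are assumed to match the installed packages, and the helper `missing_modules` is hypothetical (it is not part of this repository).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; add the environment package you installed.
    missing = missing_modules(["jax", "xpag"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated environment; any name it reports points to an install step that needs to be repeated.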
python test_DCIL_variant_XPAG_v4.py --demo_path ./demos/dubins_convert/1.demo --save_path /path/to/save/path
python test_DCIL_variant_XPAG_humanoid_v4.py --demo_path ./demos/humanoid_convert/1.demo --save_path <path_to_results_directory> --eps_state 0.5 --value_clipping 1
(learns sequential goal reaching in fewer than 1M training steps)
python test_DCIL_variant_XPAG_cassie_v5.py --demo_path ./demos/cassie_convert/1.demo --save_path <path_to_results_directory> --eps_state 0.5 --value_clipping 1
python test_DCIL_variant_XPAG_humanoid_walk_PB_v4.py --demo_path <path_to_this_directory>/demos/humanoid_PB_walk/ --save_path <path_to_results_directory> --eps_state 0.2 --value_clipping 1
(Not working at the moment: the code runs, but no skill is learned.)
NOTE: PyBullet installation requires python==3.8
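The commands above share the same flags (`--demo_path`, `--save_path`, and optionally `--eps_state` and `--value_clipping`), so they can also be assembled from Python, for example to launch several runs in a loop. The sketch below is illustrative only: `build_command` and `launch` are hypothetical helpers, not part of this repository, and the `results` directory is a placeholder.

```python
import subprocess
import sys


def build_command(script, demo_path, save_path, eps_state=None, value_clipping=None):
    """Assemble the argument list for one experiment script."""
    cmd = [sys.executable, script, "--demo_path", demo_path, "--save_path", save_path]
    # Optional flags are appended only when a value is given.
    if eps_state is not None:
        cmd += ["--eps_state", str(eps_state)]
    if value_clipping is not None:
        cmd += ["--value_clipping", str(value_clipping)]
    return cmd


def launch(cmd):
    """Run the command and raise if the script exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command(
        "test_DCIL_variant_XPAG_humanoid_v4.py",
        "./demos/humanoid_convert/1.demo",
        "results",  # placeholder save path
        eps_state=0.5,
        value_clipping=1,
    )
    print(" ".join(cmd))  # call launch(cmd) to actually start training
```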
- trajs_it_- : training rollouts + skill-chaining evaluation + success goals sets
- value_skill_-it- : value over the x-y position of the skill's starting state, for different orientations
- transitions_- : sampled training transitions + segment between true desired goal and relabelled desired goal
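To browse a results directory, the output files listed above can be grouped by their name prefix. This is a rough sketch under stated assumptions: it assumes the files are written directly under the save path and that their names start with `trajs_it_`, `value_skill_`, or `transitions_`. The helper `group_outputs` is hypothetical and not part of this repository.

```python
from collections import defaultdict
from pathlib import Path

# Prefixes taken from the output list above; anything after the prefix is ignored.
PREFIXES = ("trajs_it_", "value_skill_", "transitions_")


def group_outputs(save_path):
    """Map each known prefix to the sorted file names that start with it."""
    groups = defaultdict(list)
    for path in sorted(Path(save_path).iterdir()):
        for prefix in PREFIXES:
            if path.name.startswith(prefix):
                groups[prefix].append(path.name)
                break
    return dict(groups)
```

For example, `group_outputs("<path_to_results_directory>")` returns a dictionary keyed by prefix, which makes it easy to check that every kind of output was produced.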