Minusadd/Meta-learning-parameterized-skills

Meta-learning-parameterized-skills

Source code for Meta-learning Parameterized Skills (ICML 2023).

We propose a parameterized skill-learning algorithm that learns transferable parameterized skills and synthesizes them into a new action space that supports efficient learning in long-horizon tasks. The algorithm combines off-policy Meta-RL with a trajectory-centric smoothness term to learn a set of parameterized skills. The agent then uses these learned skills to construct a three-level hierarchical framework that models a Temporally-extended Parameterized Action Markov Decision Process. Empirically, the proposed algorithm enables an agent to solve a set of difficult long-horizon tasks (obstacle course and robot manipulation).

The repo is built upon PEARL.

We use pytorch-softdtw-cuda for the DTW implementation.
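For readers unfamiliar with DTW (dynamic time warping), the following is a minimal pure-Python sketch of the classic hard-min recurrence on 1-D sequences. It is only an illustration and is not the code used in this repo: pytorch-softdtw-cuda implements soft-DTW, which replaces the hard minimum with a smooth soft-minimum so the cost is differentiable. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (illustrative only)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Two trajectories that differ only in timing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, get a cost of zero under this measure, which is why DTW is a natural choice for comparing trajectories of different lengths.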

Requirements: PyTorch (we use version 1.7.1, but later versions should also work), MuJoCo (mujoco210), Gym, etc.

  • Install the modified metaworld:
$ cd metaworld
$ pip install -e .

To learn the parameterized skills, run python launch_experiment_ml1.py ./configs/[ENV_NAME].json for each of the coffee tasks. By default the code uses the GPU; to use the CPU instead, set use_gpu=False in the appropriate config file. Output files are written to ./output/[ENV]/[EXP NAME], where the experiment name is uniquely generated based on the date. The file progress.csv contains statistics logged over the course of training.
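If you prefer to switch a config to CPU programmatically rather than by hand, a small helper along these lines may work. It is a sketch under assumptions: the exact location of `use_gpu` inside the JSON is not documented here, so the helper searches the whole file for any existing `use_gpu` key and sets it to false. The function names `set_key_everywhere` and `disable_gpu` are hypothetical and not part of the repo.

```python
import json


def set_key_everywhere(node, key, value):
    """Set `key` to `value` wherever it already exists; return the number of hits."""
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key_everywhere(item, key, value)
    return changed


def disable_gpu(config_path):
    """Rewrite a JSON config in place so that every `use_gpu` entry is false."""
    with open(config_path) as f:
        config = json.load(f)
    hits = set_key_everywhere(config, "use_gpu", False)
    if hits == 0:
        # Assumption: the key is already present somewhere in the config file.
        raise KeyError("no 'use_gpu' key found in " + config_path)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return hits
```

Calling `disable_gpu` with the path to one of the files under ./configs would update that file in place; keep a backup if the original formatting of the file matters to you.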

Once all the coffee-related parameterized skills have been learned, run python eval_act_boundcoffee.py ./output/[ENV]/[EXP NAME].json ./output/[ENV] --num_trajs 50 to get the bounds of the newly learned parameterized action spaces.
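As a rough illustration of what "bounds" of a parameterized action space can mean, the sketch below computes a per-dimension minimum and maximum over a collection of skill-parameter vectors. This is an assumption about the general idea only; it is not taken from eval_act_boundcoffee.py, and the function name `action_bounds` and the sample values are hypothetical.

```python
def action_bounds(samples):
    """Return (low, high) lists with the per-dimension min and max of `samples`."""
    if not samples:
        raise ValueError("need at least one parameter vector")
    dim = len(samples[0])
    if any(len(s) != dim for s in samples):
        raise ValueError("all parameter vectors must have the same length")
    low = [min(s[d] for s in samples) for d in range(dim)]
    high = [max(s[d] for s in samples) for d in range(dim)]
    return low, high


# Hypothetical skill parameters collected from three evaluation trajectories.
low, high = action_bounds([[0.1, -1.0], [0.5, 0.2], [-0.3, 0.0]])
```

A higher-level policy could then restrict its outputs to the box between `low` and `high`, so that it only requests skill parameters in the range covered during evaluation.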

Then, to learn the high-level and mid-level control policies using the learned parameterized skills, edit ./configs/coffee-full.json so that it points to your own paths for the learned skill-conditioned policies and the action-space bounds, and run: python launch_experiment_coffeeplanning.py ./configs/coffee-full.json.

To evaluate the overall policies, run: python sim_policy_coffeeplanning.py ./output/[ENV]/[EXP NAME].json ./output/[ENV] --num_trajs 100. Add --video to save a video.
