albertcity/OPA

This is a public implementation of the paper "Online Prototype Alignment for Few-shot Policy Transfer" (OPA), which was accepted at ICML 2023.

The main PPO implementation is based on the tianshou library, which is also included in this repository. However, the logger is borrowed from stable-baselines3, which is not included; you should either install stable-baselines3 or disable the logger first.
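If you keep the logger, the dependency can be installed in the usual way (no specific version is pinned in this repository, so this is just the standard install command):

pip install stable-baselines3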

To reproduce the results of OPA on Hunter-Z3C3, you should:

  • train a policy $\pi_{task}$ to solve the task in the source domain while saving the collected trajectories:
python ppo_main.py env_kwargs.spawn_args=Z3C3 logdir=log/policy/Z3C3 seed=12345 save_hist=1 use_vec_traj_to_save=1
  • train the discriminator (i.e. the inference model $q_\theta$) using the saved trajectories:
python train_disc.py data_dir=log/policy/Z3C3 logdir=log/disc/Z3C3
  • train the exploration policy $\pi_{exp}$ using $q_\theta$ (the full three-step pipeline is summarized in the sketch after this list):
python ppo_expl_main.py env_kwargs.spawn_args=Z3C3 load_task_pol_dir=log/policy/Z3C3/s12345/all_model.pth load_disc_dir=log/disc/Z3C3/disc.pt logdir=log/expl/Z3C3
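For convenience, the same three steps can also be chained in a small shell script. The SPAWN and SEED variables below are only for illustration, and the s${SEED} subdirectory naming is inferred from the path used in the last command above (seed=12345 produces log/policy/Z3C3/s12345):

SPAWN=Z3C3
SEED=12345
# 1. task policy + trajectory saving
python ppo_main.py env_kwargs.spawn_args=$SPAWN logdir=log/policy/$SPAWN seed=$SEED save_hist=1 use_vec_traj_to_save=1
# 2. inference model (discriminator)
python train_disc.py data_dir=log/policy/$SPAWN logdir=log/disc/$SPAWN
# 3. exploration policy
python ppo_expl_main.py env_kwargs.spawn_args=$SPAWN load_task_pol_dir=log/policy/$SPAWN/s$SEED/all_model.pth load_disc_dir=log/disc/$SPAWN/disc.pt logdir=log/expl/$SPAWN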

For convenience, we also provide some pretrained models in the pretrain_models folder, including:

  • pretrain_models/percept: the novelty detection model ($\Psi_{unseen}$, $f_{ND}$) for Hunter.
  • pretrain_models/policy/Z3C3/s4801973: $\pi_{task}$ for Hunter-Z3C3.
  • pretrain_models/disc/Z3C3: $q_\theta$ for Hunter-Z3C3.
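With these checkpoints you can in principle skip steps 1 and 2 and train $\pi_{exp}$ directly against the pretrained models. The command below is only a sketch: it points the step-3 script at the pretrain_models paths and assumes the checkpoint file names inside those folders match the ones used above (all_model.pth and disc.pt):

python ppo_expl_main.py env_kwargs.spawn_args=Z3C3 load_task_pol_dir=pretrain_models/policy/Z3C3/s4801973/all_model.pth load_disc_dir=pretrain_models/disc/Z3C3/disc.pt logdir=log/expl/Z3C3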

Feel free to open an issue to get in touch with us.
