Skip to content
Code Released for NeurIPS 2018 paper: Synthesized Policies for Transfer and Adaptation across Tasks and Environments
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Synthesized Policies (SynPo)

This repository implements the SynPo algorithms presented in:

  • Hu, Hexiang, Liyu Chen, Boqing Gong, and Fei Sha. "Synthesized Policies for Transfer and Adaptation across Tasks and Environments." In Advances in Neural Information Processing Systems, pp. 1176-1185. 2018.


  • Python 3+
  • Numpy 1.10.0+
  • Pytorch 0.4.0

Please see requirements.txt for complete details


We release our source code based on the gridworld environment in this repo. The usage is listed as following:

usage: [-h] [--gpu_id GPU_ID] [--batch_size BATCH_SIZE]
                          [--weight WEIGHT] [--scene SCENE] [--task TASK]
                          [--embedding_dim EMBEDDING_DIM]
                          [--scene_embedding_dim SCENE_EMBEDDING_DIM]
                          [--task_embedding_dim TASK_EMBEDDING_DIM]
                          [--num_obj_types NUM_OBJ_TYPES]
                          [--task_length TASK_LENGTH]
                          [--update_interval UPDATE_INTERVAL]
                          [--scene_num SCENE_NUM] [--task_num TASK_NUM]
                          [--reward_prediction REWARD_PREDICTION]
                          [--scene_disentanglement SCENE_DISENTANGLEMENT]
                          [--task_disentanglement TASK_DISENTANGLEMENT]
                          --split_filepath SPLIT_FILEPATH [--lr LR] [--wd]
                          [--mode {cloning}] [--network {mlp,mtl,synpo}]
                          [--postfix POSTFIX] [--repeat REPEAT] [--evaluate]
                          [--visualize] [--random_seed RANDOM_SEED]
                          [--logger_name LOGGER_NAME] [--norm]

optional arguments:
  -h, --help            show this help message and exit
  --gpu_id GPU_ID
  --batch_size BATCH_SIZE
  --weight WEIGHT
  --scene SCENE
  --task TASK
  --embedding_dim EMBEDDING_DIM
  --scene_embedding_dim SCENE_EMBEDDING_DIM
  --task_embedding_dim TASK_EMBEDDING_DIM
  --num_obj_types NUM_OBJ_TYPES
  --task_length TASK_LENGTH
  --update_interval UPDATE_INTERVAL
  --scene_num SCENE_NUM
  --task_num TASK_NUM
  --reward_prediction REWARD_PREDICTION
                        loss weight of reward prediction objective
  --scene_disentanglement SCENE_DISENTANGLEMENT
                        loss weight of scene disentanglement prediction
  --task_disentanglement TASK_DISENTANGLEMENT
                        loss weight of task disentanglement prediction
  --split_filepath SPLIT_FILEPATH
                        train/test split filepath
  --lr LR               base learning rate
  --wd                  enable weight decay
  --mode {cloning}      training mode [only behavior cloing available for now]
  --network {mlp,mtl,synpo}
                        select model architecture
  --postfix POSTFIX     postfix to the log file
  --repeat REPEAT       number of test run
  --evaluate            evaluation mode
  --visualize           visualize policy [only in evaluation mode]
  --random_seed RANDOM_SEED
                        random seed value
  --logger_name LOGGER_NAME
                        logger name format [must have for slots to fill]
  --norm                whether normalize the scene/task embedding


If you are using any resources within this repo for your research, please cite:

  title={Synthesized Policies for Transfer and Adaptation across Tasks and Environments},
  author={Hu, Hexiang and Chen, Liyu and Gong, Boqing and Sha, Fei},
  booktitle={Advances in Neural Information Processing Systems},


Part of the source code is modified based on the pytorch DeepRL repo. We thank the original author for open source their implementation.


SynPo is MIT licensed, as found in the LICENSE file.

You can’t perform that action at this time.