This is the code for experiments in the paper Single Episode Policy Transfer in Reinforcement Learning, published in ICLR 2020. Ablations and baselines are included.
- Python 3.6
- TensorFlow 1.10.1
- alg/ - Implementation of SEPT, ablations, and baselines.
- alg/configs/ - A collection of JSON config files, one for each method on each domain.
- results/ - Results of training and testing are stored in subfolders here. Each independent training run creates a subfolder that contains the final TensorFlow model, performance log file, and timing. All test runs for a method on a domain are stored in a single aggregate subfolder. For example, 5 parallel independent training runs may produce results/hiv_sept_1, ..., results/hiv_sept_5, and test results will be stored in results/hiv_sept.
- hip-mdp-public - Contains code for environments used in Killian et al. 2017. We provide new top-level scripts, one for training and one for testing the saved models with the single test episode constraint.
The experimental domain is selected by config['main']['domain'] in the JSON config files.
- 2D - source located in hip-mdp-public/grid_simulator/grid.py
- acrobot - source located in hip-mdp-public/acrobot_simulator/acrobot_py3.py
- hiv - source located in hip-mdp-public/hiv_simulator/hiv.py
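As a rough illustration of how the domain setting relates to the simulator sources listed above, the sketch below reads config['main']['domain'] from a parsed config and looks up the matching file. The helper `domain_source` and the `DOMAIN_SOURCES` table are hypothetical names introduced here for illustration; they are not part of the repository, and the training scripts may resolve the domain differently.

```python
import json

# Simulator paths copied from the domain list above, keyed by the
# value expected in config['main']['domain'].
DOMAIN_SOURCES = {
    "2D": "hip-mdp-public/grid_simulator/grid.py",
    "acrobot": "hip-mdp-public/acrobot_simulator/acrobot_py3.py",
    "hiv": "hip-mdp-public/hiv_simulator/hiv.py",
}


def domain_source(config):
    """Return (domain, simulator source path) for a parsed config dict.

    Hypothetical helper: it only illustrates the lookup and is not
    how the repository itself dispatches on the domain.
    """
    domain = config["main"]["domain"]
    if domain not in DOMAIN_SOURCES:
        raise ValueError("Unknown domain: {!r}".format(domain))
    return domain, DOMAIN_SOURCES[domain]


# Inline stand-in for a file such as alg/configs/config_2d_sept.json.
# A real config contains many more keys than this.
config = json.loads('{"main": {"domain": "2D"}}')
print(domain_source(config))
```

To inspect a real config instead, replace the inline string with `json.load(open(path))` for the config file of interest.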
- Check general settings in alg/configs/config_2d_sept.json. E.g.
  - "domain" : "2D"
  - "N_seeds" : 20
  - "dir_name" : "2D_sept"
- cd into the alg folder.
- Execute training script:

      python train_multiprocess.py configs/config_2d_sept.json

- Periodic logging and final model are stored in results/2D_sept_<int>, where <int> ranges from dir_idx_start to N_seeds (see the configs).
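To check which training runs have produced output, the expected folder names can be enumerated from the config values. The sketch below is only illustrative: the function names are hypothetical, and it assumes the index range is inclusive at both ends, which matches the results/hiv_sept_1, ..., results/hiv_sept_5 example for 5 runs but is not confirmed by the training script.

```python
import os


def training_run_dirs(dir_name, dir_idx_start, n_seeds, results_root="results"):
    """List the expected per-run result folders.

    Assumption: indices run from dir_idx_start to n_seeds inclusive.
    """
    return [
        "{}/{}_{}".format(results_root, dir_name, idx)
        for idx in range(dir_idx_start, n_seeds + 1)
    ]


def missing_run_dirs(run_dirs):
    """Return the expected folders that do not exist on disk yet."""
    return [d for d in run_dirs if not os.path.isdir(d)]


# Values taken from the example settings above; dir_idx_start=1 is assumed.
expected = training_run_dirs("2D_sept", dir_idx_start=1, n_seeds=20)
print("expected runs:", len(expected))
print("not found yet:", len(missing_run_dirs(expected)))
```

Run it from the directory that contains results/ so the relative paths resolve.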
- Keep the same settings in alg/configs/config_2d_sept.json as those used for training.
- cd into the alg folder.
- Execute test script:

      python test_multiprocess.py configs/config_2d_sept.json

- Results will be stored in test_<int>.csv and test_time_<int>.pkl in results/2D_sept/
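Once testing has finished, the per-run files can be collected for inspection. The sketch below is a minimal, hypothetical loader that relies only on the file naming described above; it makes no assumption about the CSV columns or about what object each pickle holds, so rows are returned as raw lists of strings and timings as whatever was pickled. Only unpickle files you produced yourself, since pickle can execute arbitrary code on load.

```python
import csv
import glob
import os
import pickle


def load_test_results(results_dir):
    """Collect test_<int>.csv rows and test_time_<int>.pkl objects.

    Returns two dicts keyed by file name. Hypothetical helper, not
    part of the repository.
    """
    rows_by_file = {}
    # "test_[0-9]*.csv" requires a digit right after "test_", so it
    # cannot pick up any "test_time_..." file by accident.
    for path in sorted(glob.glob(os.path.join(results_dir, "test_[0-9]*.csv"))):
        with open(path, newline="") as f:
            rows_by_file[os.path.basename(path)] = list(csv.reader(f))

    timings = {}
    for path in sorted(glob.glob(os.path.join(results_dir, "test_time_[0-9]*.pkl"))):
        with open(path, "rb") as f:
            timings[os.path.basename(path)] = pickle.load(f)

    return rows_by_file, timings


rows, timings = load_test_results("results/2D_sept")
print("csv files:", len(rows), "timing files:", len(timings))
```

If the directory does not exist yet, both dicts are simply empty.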
@inproceedings{yang2019single,
title={Single Episode Policy Transfer in Reinforcement Learning},
author={Yang, Jiachen and Petersen, Brenden and Zha, Hongyuan and Faissol, Daniel},
booktitle={International Conference on Learning Representations},
year={2020}
}
SEPT is distributed under the terms of the BSD-3 license. All new contributions must be made under this license.
See LICENSE and NOTICE for details.
SPDX-License-Identifier: BSD-3
LLNL-CODE-805017