The source code to replicate our NeurIPS 2019 paper (arXiv preprint). A demo video is available.
- Install anaconda
- Install MuJoCo (mjpro-150 is required)
- Create conda environment containing required packages (e.g., gym):
conda env create -f robustoption20190919.yml
- Replace the original "gym" directory installed in the conda environment with our version (the "gym" directory in our package).
- Install the "gym-extensions" included in our package:
cd gym_extensions
python setup.py install
- Edit the "main" function in "LearningMoreRobustOption/run_mujodo.py" according to your experimental setup (e.g., task environment and learning method).
For example, to learn options for "HalfCheetah-disc", set the default value of the "--env" argument as:
parser.add_argument('--env', help='environment ID', default='HalfCheetah-Random-Params-discrete-v1')
You can select an option-learning method by editing the default value of the "--method" argument. For example, to use OC3, edit the source code as:
parser.add_argument('--method', help='Method name:' + str(METHODS), type=str, default="CVaR")
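Putting the two arguments together, the relevant part of the "main" function presumably looks like the minimal sketch below. The argument names and defaults are taken from the snippets above; the METHODS list shown here is a placeholder, since its actual contents are defined in run_mujodo.py.

```python
import argparse

# Placeholder list of method names; the real METHODS list lives in run_mujodo.py.
METHODS = ["PPOC", "CVaR"]

def parse_args(argv=None):
    """Parse the experiment-selection arguments described above."""
    parser = argparse.ArgumentParser()
    # Defaults select the HalfCheetah task and the OC3 ("CVaR") method,
    # matching the examples in this README.
    parser.add_argument('--env', help='environment ID',
                        default='HalfCheetah-Random-Params-discrete-v1')
    parser.add_argument('--method', help='Method name:' + str(METHODS),
                        type=str, default="CVaR")
    return parser.parse_args(argv)
```

Editing the defaults, as the instructions above describe, is equivalent to passing `--env` and `--method` on the command line.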
- Run a script to conduct option learning:
sh runexp.sh
- Run a script to select learned options to be tested:
python GeneratebestpolTextMaxAverageReturnwithCVaRThreth.py
or
python GeneratebestpolTextMaxAverageReturn.py
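The two selection scripts differ in their criterion: one picks the learned policy with the highest average return outright, the other additionally requires a CVaR threshold to be met. A minimal sketch of such a selection rule follows; the function and variable names are hypothetical and are not the ones used in the scripts.

```python
def select_best_policy(results, cvar_threshold=None):
    """Return the policy id with the highest average return.

    results: dict mapping policy id -> (average_return, cvar_score).
    If cvar_threshold is given, only policies whose CVaR score meets the
    threshold are considered (mimicking the "...withCVaRThreth" variant).
    Returns None if no policy qualifies.
    """
    candidates = {
        pid: (avg, cvar) for pid, (avg, cvar) in results.items()
        if cvar_threshold is None or cvar >= cvar_threshold
    }
    if not candidates:
        return None
    # Among the qualifying policies, pick the one with the highest average return.
    return max(candidates, key=lambda pid: candidates[pid][0])
```

For example, with `{"pol_a": (120.0, 40.0), "pol_b": (150.0, 10.0)}` and a CVaR threshold of 30.0, "pol_b" is filtered out and "pol_a" is selected despite its lower average return.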
- Run a script to conduct test:
sh run_test_w_best_cvar_pol.sh
- Run a result summarizer.
The summary of CVaR scores can be obtained with:
python EvalAverageCVaR.py
and the summary of average returns with:
python EvalAverageReturn.py
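For reference, CVaR (conditional value at risk) at level alpha is the mean of the worst alpha-fraction of episode returns. The exact computation in EvalAverageCVaR.py may differ; the sketch below is a standard empirical estimate, not the script's implementation.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns (lower tail).

    returns: 1-D array-like of episode returns.
    alpha: tail probability in (0, 1].
    """
    r = np.sort(np.asarray(returns, dtype=float))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(r))))       # number of tail samples to average
    return r[:k].mean()
```

A higher CVaR indicates better worst-case performance, which is the robustness criterion the option-selection step above targets.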
** This repository is based on PPOC, gym, and gym-extensions. **
The source code is still being refactored.