The source code to replicate our NeurIPS 2019 paper (arXiv preprint). A demo video is available.
- Install anaconda
- Install MuJoCo (mjpro-150 is required)
- Create conda environment containing required packages (e.g., gym):
conda env create -f robustoption20190919.yml
- Replace the original "gym" directory installed in the conda environment with our version (the "gym" directory in our package).
- Install the "gym-extensions" included in our package:
cd gym_extensions
python setup.py install
- Edit the "main" function in "LearningMoreRobustOption/run_mujodo.py" according to your experimental setup (e.g., task environment and learning method).
For example, to learn options for "HalfCheetah-disc", set the default value of the "--env" argument as:
parser.add_argument('--env', help='environment ID', default='HalfCheetah-Random-Params-discrete-v1')
You can select an option-learning method by editing the default value of the "--method" argument. For example, to use OC3, edit the source code as:
parser.add_argument('--method', help='Method name:' + str(METHODS), type=str, default="CVaR")
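Putting the two arguments together, the relevant part of the "main" function presumably looks like the minimal sketch below. The argument names and defaults are taken from the snippets above; the METHODS list shown here is a placeholder, since its actual contents are defined in run_mujodo.py.

```python
import argparse

# Placeholder list of method names; the real METHODS list lives in run_mujodo.py.
METHODS = ["PPOC", "CVaR"]

def parse_args(argv=None):
    """Parse the experiment-selection arguments described above."""
    parser = argparse.ArgumentParser()
    # Defaults select the HalfCheetah task and the OC3 ("CVaR") method,
    # matching the examples in this README.
    parser.add_argument('--env', help='environment ID',
                        default='HalfCheetah-Random-Params-discrete-v1')
    parser.add_argument('--method', help='Method name:' + str(METHODS),
                        type=str, default="CVaR")
    return parser.parse_args(argv)
```

Editing the defaults, as the instructions above describe, is equivalent to passing `--env` and `--method` on the command line.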
- Run a script to conduct option learning:
sh runexp.sh
- Run a script to select learned options to be tested:
python GeneratebestpolTextMaxAverageReturnwithCVaRThreth.py
or
python GeneratebestpolTextMaxAverageReturn.py
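The two selection scripts differ in their criterion: one picks the learned policy with the highest average return outright, the other additionally requires a CVaR threshold to be met. A minimal sketch of such a selection rule follows; the function and variable names are hypothetical and are not the ones used in the scripts.

```python
def select_best_policy(results, cvar_threshold=None):
    """Return the policy id with the highest average return.

    results: dict mapping policy id -> (average_return, cvar_score).
    If cvar_threshold is given, only policies whose CVaR score meets the
    threshold are considered (mimicking the "...withCVaRThreth" variant).
    Returns None if no policy qualifies.
    """
    candidates = {
        pid: (avg, cvar) for pid, (avg, cvar) in results.items()
        if cvar_threshold is None or cvar >= cvar_threshold
    }
    if not candidates:
        return None
    # Among the qualifying policies, pick the one with the highest average return.
    return max(candidates, key=lambda pid: candidates[pid][0])
```

For example, with `{"pol_a": (120.0, 40.0), "pol_b": (150.0, 10.0)}` and a CVaR threshold of 30.0, "pol_b" is filtered out and "pol_a" is selected despite its lower average return.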
- Run a script to conduct test:
sh run_test_w_best_cvar_pol.sh
- Run a result summarizer.
The summary of CVaR scores can be obtained with:
python EvalAverageCVaR.py
and the summary of average returns with:
python EvalAverageReturn.py
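For reference, CVaR (conditional value at risk) at level alpha is the mean of the worst alpha-fraction of episode returns. The exact computation in EvalAverageCVaR.py may differ; the sketch below is a standard empirical estimate, not the script's implementation.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns (lower tail).

    returns: 1-D array-like of episode returns.
    alpha: tail probability in (0, 1].
    """
    r = np.sort(np.asarray(returns, dtype=float))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(r))))       # number of tail samples to average
    return r[:k].mean()
```

A higher CVaR indicates better worst-case performance, which is the robustness criterion the option-selection step above targets.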
** This repository is based on PPOC, gym, and gym-extensions. **
The source code is still being refactored.