Accompanying code for the paper "Learning and Planning in Average-Reward Markov Decision Processes" by Yi Wan*, Abhishek Naik*, and Rich Sutton.
- `agents/` contains all the algorithms.
- `environments/` contains all the environments.
- `config_files/` contains sample configuration files for various experiments.
- `experiments.py` contains methods to run different kinds of experiments, e.g., prediction and control.
- `run_exp.py` runs an experiment based on the command-line arguments outlined below.
A typical experiment looks like:

```shell
python run_exp.py --exp run_exp_learning_control_no_eval --config-file config_files/control_AccessControl_diff-q.json --output-folder results/control/AccessControl
```
where:

- `--exp`: the experiment to be run. For prediction and control, this will generally be `run_exp_learning_prediction` or `run_exp_learning_control_no_eval`; check `experiments.py` for full documentation and use cases.
- `--config-file`: the file with all the experiment configurations.
- `--output-folder`: the location where all the result logs will be stored.
Optional parameters for deploying experiments at scale:

- `--cfg-start`: the start index of the list of configurations for this script.
- `--cfg-end`: the end index of the list of configurations for this script (refer to `utils/sweeper.py` for more details).
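For instance, a large sweep can be launched in fixed-size chunks by computing `--cfg-start`/`--cfg-end` pairs in a small shell loop. The sketch below is hypothetical: it assumes a sweep of 100 configurations split into chunks of 20, and only prints the launch commands rather than running them; the real configuration count depends on the config file.

```shell
# Hypothetical sweep size and chunk size -- adjust both to the
# actual number of configurations in your config file.
TOTAL=100
CHUNK=20
for START in $(seq 0 "$CHUNK" $((TOTAL - 1))); do
  END=$((START + CHUNK))
  # Print (rather than run) one launch command per chunk; pipe
  # these lines to your cluster's job-submission tool as needed.
  echo "python run_exp.py --exp run_exp_learning_control_no_eval" \
       "--config-file config_files/control_AccessControl_diff-q.json" \
       "--output-folder results/control/AccessControl" \
       "--cfg-start $START --cfg-end $END"
done
```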
Check out the Jupyter notebook `learning_planning_exps.ipynb` for sample experiments and the plots reported in the paper.
Requirements:

- `python3` (tested with 3.7.6)
- `numpy` (tested with 1.18.1)
- `tqdm` (tested with 4.40.2)
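One way to install the Python dependencies (a sketch: the pinned versions above are what the code was tested with, but unpinned installs of recent versions will typically work as well):

```shell
# Install the two Python dependencies; the list above gives the
# versions the code was tested with (numpy 1.18.1, tqdm 4.40.2).
python3 -m pip install numpy tqdm
```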