Reinforcement Learning: N-step Bootstrapping in Actor Critic Methods

This repository contains the code to run all our experiments.

Davide Barbieri, Jan Schutte, Hinrik Snær Guðmundsson, Zi Long Zhu

Requirements

Use the supplied environment.yml file for conda or the requirements.txt file for pip to install the Python dependencies.

Python version should be 3.7 or higher.

Training

The entry point of the training code is the run.py file; its arguments specify which experiments to run. These arguments can be supplied directly on the command line, but it is strongly advised to use one of the config files found in the config/ folder. To get a description of all parameters, run:

python run.py -h

Running an experiment generates results files, which are written to the results/ folder by default. These files are required for generating plots and calculating metrics.

Examples

Learn CartPole environment with REINFORCE:

python run.py --load_from config/examples/cartpole/test_actor_reinforce.json

Acrobot environment with GAE:

python run.py --load_from config/examples/acrobot/test_GAE.json

Plotting

The entry point for plotting is the plots.py file, which takes the results files generated by run.py. It can generate two plots: episode versus return and episode versus episode length. For more information on the arguments, run python plots.py -h.

Examples

If you ran both examples from the Training section, you can generate an episode return plot:

python plots.py --results_files \
    results/exp_testgae_<timestamp>/output_1.json \
    results/exp_test_actor_REINFORCE_<timestamp>/output_1.json \
    --labels "Acrobot GAE" "CartPole REINFORCE" \
    --plot e_return  \
    --title "my title" \
    --show

If you ran only a single experiment, only specify that results file and one label.

Analyse

The analyse.py file contains functions for computing the mean and standard deviation over multiple runs of an experiment. The 'aggregate' function combines multiple results files into a single one that holds the mean and std in place of the original values. The 'auc' function computes the 'Area under Curve' metric and the asymptote of results files.

Examples

Aggregate all files that match glob pattern output_*.json:

python analyse.py --input_name "results/Acrobot_AE/output_*.json" \
    --output_name aggregate_acrobot_ae.json  \
    --function aggregate \
    --targets return

Calculate AUC for results files (can also be a glob pattern):

python analyse.py --input_name "results/Acrobot_AE/output_1.json" \
    --output_name metrics.json  \
    --function auc \
    --targets return
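
To make the two metrics concrete, here is a minimal sketch of what aggregating runs and computing an area under the curve could look like. It is not the code in analyse.py: the function names, the plain list-of-lists input, the population standard deviation, the unit episode spacing, and the "mean of the last few episodes" definition of the asymptote are all assumptions made for illustration. The real results file layout is not described here, so the sketch works on in-memory lists only.

```python
from statistics import mean, pstdev


def aggregate(runs):
    """Combine per-episode values from several runs into mean and std curves.

    `runs` is a list of equally long lists, one list per run
    (hypothetical layout, not the repository's results format).
    """
    if len({len(run) for run in runs}) != 1:
        raise ValueError("all runs must have the same number of episodes")
    per_episode = list(zip(*runs))  # regroup values by episode index
    return {
        "mean": [mean(values) for values in per_episode],
        "std": [pstdev(values) for values in per_episode],
    }


def area_under_curve(values):
    """Trapezoidal area under an episode-indexed curve, assuming unit spacing."""
    return sum((a + b) / 2 for a, b in zip(values, values[1:]))


def asymptote(values, last_n=10):
    """Mean of the final `last_n` values (one simple proxy, an assumption here)."""
    return mean(values[-last_n:])


# Two made-up runs of three episodes each.
runs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
summary = aggregate(runs)
```

With a sketch like this, a higher area under the mean return curve indicates faster learning over the whole run, while the asymptote only reflects where the curve ends up.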
