Baseline Techniques for REINFORCE Algorithm in Policy Based Reinforcement Learning

Four techniques are compared on the CartPole environment:

  1. REINFORCE with whitened returns (a minimal sketch of this technique follows the list)
  2. REINFORCE with a learned value function as baseline
  3. Self-critic with greedy rollout
  4. Self-critic with sampled rollout
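
As a rough sketch of the first technique: whitened returns standardize the discounted returns of an episode to zero mean and unit variance before using them to weight the policy gradient. The snippet below is an illustrative PyTorch-style sketch, not the code from this repository; the function and variable names are assumptions.

```python
import torch

def reinforce_loss_whitened(log_probs, returns, eps=1e-8):
    """Illustrative REINFORCE loss with whitened returns.

    log_probs: log pi(a_t | s_t) for the actions taken in one episode
    returns:   discounted returns G_t for the same time steps
    """
    # Whitening: subtract the mean and divide by the standard deviation,
    # which acts as a simple variance-reducing baseline.
    whitened = (returns - returns.mean()) / (returns.std() + eps)
    # Policy-gradient objective: maximise E[log pi * G_t], so minimise its negative.
    return -(log_probs * whitened).sum()
```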

Usage

The experiments reported in the blog can be reproduced by executing gridsearch.py, which provides one function per approach for running a grid search: REINFORCE, REINFORCE with a learned baseline, and the self-critic approach. The hyperparameters used, as described in the blog, are listed below (a minimal sketch of how such a grid can be iterated follows the list):

  • REINFORCE: learning rates 4e-5, 1e-4, 4e-4, 1e-3
  • REINFORCE with learned baseline: learning rates 2e-4, 5e-4, 1e-3; alpha 0.1, 0.25, 0.5, 0.75, 0.9
  • REINFORCE self-critic: learning rates 2e-3, 5e-3, 1e-2; number of beams 1, 2, 4 (for greedy, only 1); log basis 2, 3, 4
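
As a minimal sketch of how such a grid can be iterated (this is not the actual contents of gridsearch.py; the training call is only indicated by a comment):

```python
from itertools import product

# Hyperparameter grid for REINFORCE with a learned baseline,
# using the values listed above.
learning_rates = [2e-4, 5e-4, 1e-3]
alphas = [0.1, 0.25, 0.5, 0.75, 0.9]

for lr, alpha in product(learning_rates, alphas):
    # In gridsearch.py, a training run would be launched here with these
    # settings; printing the combination stands in for that call.
    print(f"grid-search run: lr={lr}, alpha={alpha}")
```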

The experiments were performed on an NVIDIA GeForce 1080 Ti. Note that using a different GPU can lead to slightly different results due to differences in the device's random number generation.

The results of the grid search are stored in the directory data_gridsearch, where each subdirectory is by default named after the date and the algorithm. We export the results of each run (episode length over time, number of interactions, and number of update steps) as a compressed npz file, together with information about the experiment (i.e. which hyperparameters were used for each run). For plotting, plotting.py takes these two files as input and generates the plots; a hedged example of loading such a file is shown below.
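
As an illustration of how such a compressed npz file might be read back for inspection or plotting (the file path and the array name episode_lengths are assumptions; inspect data.files for the actual keys exported by the repository):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical path to one exported run; the real layout under
# data_gridsearch depends on the date and algorithm of the run.
data = np.load("data_gridsearch/example_run.npz")
print(data.files)  # list the arrays actually stored in the file

# 'episode_lengths' is an assumed key for the episode length over time.
plt.plot(data["episode_lengths"])
plt.xlabel("Episode")
plt.ylabel("Episode length")
plt.title("CartPole episode length over training")
plt.show()
```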
