Policy Gradient Methods

PyTorch implementation of policy gradient methods.

NOTE This repository is still a work in progress! As I continue to break things down into modular and reusable parts, things might break. However, I will try to ensure that the test cases in ./tests/ keep passing.

Installation

This library only works with Python 3.5+; if you are using Python 2.7 you should upgrade immediately. The requirements are listed in requirements.txt. To install the library, use pip:

pip install -e .

The -e flag installs the library in development (editable) mode. You can then check that it works by opening up Python and typing:

import pg_methods
print(pg_methods.__version__) # should print 0

Tests

There are tests for components in this library under ./tests/. You can run them by executing python -m pytest ./tests --verbose.

Philosophy

There are a few good reinforcement learning algorithm implementations in PyTorch, and many more in TensorFlow, Theano, and Keras. The main thing lacking in the PyTorch implementations is extensibility/modularity. Sure, I would love to run this one algorithm on every environment ever, but sometimes it's just the little parts that are useful: a good utility to calculate discounted future returns with masks, or the REINFORCE objective itself. Maybe you want to try a new kind of baseline? The goal of this library is to let you do all of these things, sort of like LEGO. Arguably, having the components to build new algorithms is more important than having one long script per algorithm. This is one thing I find frustrating with baselines: all the algorithms live in their own folders, with only marginal code sharing. I've already used some pieces from here (an older version of pg_methods) in some of my projects (soon to be released).
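
To give a flavour of the kind of "little part" I mean, here is a minimal sketch of a masked discounted-return utility. It is illustrative only: the function name and signature below are made up for this README and are not the library's actual API.

import torch

def discounted_returns(rewards, masks, gamma=0.99):
    # rewards: tensor of shape (T, n_workers) with per-step rewards
    # masks:   same shape, 0 where an episode has terminated, 1 otherwise
    returns = torch.zeros_like(rewards)
    running = torch.zeros_like(rewards[0])
    # walk backwards through time, resetting the running return wherever an episode ended
    for t in reversed(range(rewards.size(0))):
        running = rewards[t] + gamma * running * masks[t]
        returns[t] = running
    return returns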

For an overview of how the code is organized, see ./pg_methods/README.md.

Algorithms Implemented

  1. Vanilla Policy Gradient (pg_methods.algorithms.VanillaPolicyGradient)
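
For reference, here is a minimal sketch of the REINFORCE-style loss that vanilla policy gradient minimizes (with a baseline subtracted to reduce variance). It is illustrative only and is not the code inside pg_methods.algorithms.VanillaPolicyGradient.

def vanilla_policy_gradient_loss(log_probs, returns, baseline_values):
    # log_probs, returns and baseline_values are tensors of shape (T, n_workers)
    advantages = returns - baseline_values
    # minimizing this loss performs gradient ascent on the expected return
    return -(log_probs * advantages.detach()).mean()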

To be implemented (contributions welcome!)

  • Synchronous Advantage Actor Critic
  • Asynchronous Advantage Actor Critic
  • Natural Policy Gradient
  • Trust Region Policy Optimization
  • Proximal Policy Optimization

etc.

Other opportunities to contribute

See the projects tab. New objectives, baselines, optimizers, and replay memories are all good contributions!

It would also be cool to have a large-scale benchmarking script so that we can run all the algorithms and see how they perform on different Gym environments.
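
A hypothetical shape such a script could take, reusing the interfaces shown in the Example section below (the environment list and settings here are placeholders, not a tested benchmark):

import torch
import torch.nn as nn

from pg_methods import interfaces
from pg_methods.algorithms.REINFORCE import VanillaPolicyGradient
from pg_methods.utils import experiment

# train the same algorithm on a few Gym environments and compare average rewards
for env_name in ['CartPole-v0', 'Acrobot-v1']:
    env = interfaces.make_parallelized_gym_env(env_name, seed=0, n_workers=2)
    fn_approximator, policy = experiment.setup_policy(env, hidden_non_linearity=nn.ReLU, hidden_sizes=[16, 16])
    optimizer = torch.optim.SGD(fn_approximator.parameters(), lr=0.01)
    algorithm = VanillaPolicyGradient(env, policy, optimizer, gamma=0.99)
    rewards, losses = algorithm.run(1000, verbose=False)
    print(env_name, 'average reward:', rewards.mean())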

Some performance graphs (soon to improve)

(figure: rewards plots)

I'm working on getting Roboschool installed on the Compute Canada clusters so I can run longer experiments. To install Roboschool on your local machine, you can try this script.

(figure: rewards on Roboschool)

Example

Here is an example script showing how to get started with the VanillaPolicyGradient algorithm. We expect other algorithms to have similar interfaces.

import torch
import torch.nn as nn

from pg_methods import interfaces
from pg_methods.algorithms.REINFORCE import VanillaPolicyGradient
from pg_methods.baselines import FunctionApproximatorBaseline
from pg_methods.networks import MLP_factory  # assumed location of MLP_factory used below
from pg_methods.utils import experiment

# create a parallelized CartPole environment with 2 worker processes
env = interfaces.make_parallelized_gym_env('CartPole-v0', seed=4, n_workers=2)

# set up experiment logging
experiment_logger = experiment.Experiment({'algorithm_name': 'VPG'}, './')
experiment_logger.start()

# build the policy network and its optimizer
fn_approximator, policy = experiment.setup_policy(env, hidden_non_linearity=nn.ReLU, hidden_sizes=[16, 16])
optimizer = torch.optim.SGD(fn_approximator.parameters(), lr=0.01)

# setting up a baseline function
baseline_approximator = MLP_factory(env.observation_space_info['shape'][0],
                                    [16, 16],
                                    output_size=1,
                                    hidden_non_linearity=nn.ReLU)
baseline_optimizer = torch.optim.SGD(baseline_approximator.parameters(), lr=0.01)
baseline = FunctionApproximatorBaseline(baseline_approximator, baseline_optimizer)

algorithm = VanillaPolicyGradient(env, policy, optimizer, gamma=0.99, baseline=baseline)

rewards, losses = algorithm.run(1000, verbose=True)

experiment_logger.log_data('rewards', rewards.tolist())
experiment_logger.save()

More example scripts can be seen in ./experiments/
