
PPO

Proximal Policy Optimization implementation with Tensorflow.

https://arxiv.org/pdf/1707.06347.pdf
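At the heart of the algorithm is the clipped surrogate objective from the paper. The sketch below shows what that loss looks like in TensorFlow; it is a minimal illustration, and the function and tensor names are assumptions rather than code from this repository.

import tensorflow as tf

def clipped_surrogate_loss(log_prob, old_log_prob, advantage, epsilon=0.2):
    # probability ratio r_t = pi_theta(a|s) / pi_theta_old(a|s),
    # computed in log space for numerical stability
    ratio = tf.exp(log_prob - old_log_prob)
    clipped_ratio = tf.clip_by_value(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # PPO maximizes the elementwise minimum of the clipped and unclipped
    # objectives, so the training loss is its negation
    surrogate = tf.minimum(ratio * advantage, clipped_ratio * advantage)
    return -tf.reduce_mean(surrogate)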

This repository has been substantially updated since commit a4fbd383f0f89ce2d881a8b78d6b8a03294e5c7c. The new PPO requires an additional dependency, rlsaber, which is my utility library shared across different algorithm implementations.

Parts of the design follow OpenAI Baselines, but unlike Baselines, I use standard TensorFlow packages as much as possible, which makes the code easier to read.

In addition, this PPO automatically switches between continuous and discrete action spaces depending on the environment. To change hyperparameters, edit atari_constants.py or box_constants.py; the appropriate file is also loaded based on the environment (see the sketch below).
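The snippet below is a rough illustration of how that dispatch could work, picking the constants module based on the environment's action space. The file names match those above, but the selection logic is an assumption, not this repository's actual code.

import gym

def load_constants(env_id):
    # Box action spaces mean continuous control; anything else
    # (e.g. Discrete) falls back to the Atari settings
    env = gym.make(env_id)
    if isinstance(env.action_space, gym.spaces.Box):
        import box_constants as constants
    else:
        import atari_constants as constants
    return constants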

requirements

  • Python3

dependencies

  • TensorFlow
  • rlsaber
  • OpenAI Gym

usage

training

$ python train.py [--env env-id] [--render] [--logdir log-name]

example

$ python train.py --env BreakoutNoFrameskip-v4 --logdir breakout

playing

$ python train.py --demo --load results/path-to-model [--env env-id] [--render]

example

$ python train.py --demo --load results/breakout/model.ckpt-xxxx --env BreakoutNoFrameskip-v4 --render

performance examples

Pendulum-v0

[performance plot for Pendulum-v0]

BreakoutNoFrameskip-v4

[performance plot for BreakoutNoFrameskip-v4]

implementation

This implementation is inspired by other open-source projects, notably OpenAI Baselines.

License

This repository is MIT-licensed.
