trpo

Trust Region Policy Optimization (TRPO), based on Gym and TensorFlow.

There are three versions of TRPO: one for discrete action spaces such as MountainCar, one for discrete action-space tasks with image input such as Atari games, and one for continuous action spaces such as Pendulum.

The environments are based on OpenAI Gym.

Parts of the code are adapted from rllab.

Code structure

baseline: baseline function estimation. Note that baseline_tensorflow.py currently has problems and cannot be used.
checkpoint: folder that stores model files. Do not delete it, or errors will occur.
distribution: distribution base class, used to compute probabilities under a distribution, for example a Gaussian.
logger: contains a Logger class that writes log data to .csv files.
log: stores log files.
main.py: main file; run it to start training or testing.
agent.py: the agent.
environment.py: the environment.
krylov.py: implementations of math routines: conjugate gradient and Hessian matrix computation.
parameters.py: configuration file.
utils.py: implementations of basic functions: getFlat, setFlat, lineaSearch.
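
To illustrate the conjugate gradient routine mentioned for krylov.py, here is a minimal NumPy sketch. It is not the code in this repository: the function name `conjugate_gradient`, the `hvp` callback, and the toy matrix are assumptions made for illustration. In TRPO the solver is typically given a function that returns a Hessian-vector (or Fisher-vector) product, so the matrix itself is never built.

```python
import numpy as np


def conjugate_gradient(hvp, b, iters=10, tol=1e-10):
    """Approximately solve A x = b, where hvp(v) returns A @ v.

    A must be symmetric positive definite; it is never formed explicitly.
    (Hypothetical helper for illustration, not the repository's API.)
    """
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x starts at zero)
    p = r.copy()          # current search direction
    rs_old = r @ r
    for _ in range(iters):
        if rs_old < tol:  # residual is small enough, stop early
            break
        Ap = hvp(p)
        alpha = rs_old / (p @ Ap)      # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next conjugate direction
        rs_old = rs_new
    return x


# Toy stand-in for the Hessian (illustrative values only).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
step_dir = conjugate_gradient(lambda v: A @ v, g)
```

Passing a product callback instead of a matrix keeps memory use linear in the number of policy parameters, which is the usual reason for choosing conjugate gradient in this setting.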

AirSmithX/trpo
