forked from jjkke88/trpo

Trust region policy optimization based on gym and TensorFlow

AirSmithX/trpo

 
 

trpo

Trust region policy optimization based on gym and TensorFlow.

There are three versions of TRPO: one for discrete action spaces such as MountainCar, one for discrete action space tasks that take images as input, such as Atari games, and one for continuous action spaces such as Pendulum.

The environment is based on OpenAI Gym.

Parts of the code refer to rllab.

Code structure

  • baseline: estimation of the baseline function. Note that baseline_tensorflow.py has some problems and cannot be used for now.
  • checkpoint: folder for storing model files. It must not be deleted, or errors will occur.
  • distribution: distribution base class, used to calculate probabilities under a distribution, for example Gaussian.
  • logger: contains a Logger class for logging data to a .csv file
  • log: stores log files
  • main.py: main file; running it starts training or testing
  • agent.py: agent
  • environment.py: environment
  • krylov.py: implementations of some math methods: the conjugate gradient method and Hessian matrix calculation
  • utils.py: implementations of some basic functions: getFlat, setFlat, lineaSearch
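
For readers unfamiliar with the numerical routines named in the list above, here is a minimal pure-Python sketch of a conjugate gradient solver and a backtracking line search. It is an illustration only, not the code in krylov.py or utils.py: the function names `conjugate_gradient` and `backtracking_line_search`, their parameters, and the toy 2x2 system are all assumptions made for this example. In TRPO the matrix-vector product would typically come from the Hessian of the KL divergence, and the line search usually also checks the KL constraint, which this simplified version omits.

```python
# Hypothetical sketch: names and signatures are illustrative, not the
# repository's actual functions.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A,
    given only a function matvec(v) that returns A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x starting at zero
    p = list(b)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:  # residual is small enough, stop early
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def backtracking_line_search(f, x, full_step, max_backtracks=10):
    """Halve the step until f decreases; return x unchanged if it never does."""
    f_old = f(x)
    for k in range(max_backtracks):
        frac = 0.5 ** k
        x_new = [xi + frac * si for xi, si in zip(x, full_step)]
        if f(x_new) < f_old:
            return x_new
    return x


# Toy usage: solve a 2x2 system without ever inverting A.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(lambda v: [dot(row, v) for row in A], b)
```

The solver only needs matrix-vector products, which is why this family of methods suits TRPO: the full Hessian never has to be formed or inverted.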
