Trust Region Policy Optimization (TRPO) based on Gym and TensorFlow
There are three versions of TRPO: one for tasks with a discrete action space such as MountainCar, one for discrete-action tasks with images as input such as Atari games, and one for continuous action spaces such as Pendulum.
The environments are based on OpenAI Gym.
Part of the code is adapted from rllab.
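
A TRPO policy for a discrete action space typically outputs a categorical distribution over the actions; the image-input version would usually only change the network that produces the logits. The NumPy sketch below assumes such a softmax (categorical) policy and shows how a batch of logits becomes sampled actions and log-probabilities. The names softmax, sample_discrete_actions and categorical_log_prob are hypothetical and are not functions from this repository.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def sample_discrete_actions(logits, rng):
    # Draw one action index per row (one row per state) from the categorical policy.
    probs = softmax(logits)
    return np.array([rng.choice(len(p), p=p) for p in probs])

def categorical_log_prob(logits, actions):
    # log pi(a | s) for integer actions, one action per row of logits.
    probs = softmax(logits)
    return np.log(probs[np.arange(len(actions)), actions])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical logits for 2 states and 3 actions (MountainCar-v0 has 3 actions).
    logits = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, -2.0]])
    actions = sample_discrete_actions(logits, rng)
    print(actions, categorical_log_prob(logits, actions))
```

The log-probabilities of the sampled actions are what the TRPO surrogate objective is built from, via the probability ratio between the new and old policies.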
- baseline: estimation of the baseline function. Note that baseline_tensorflow.py currently has problems and cannot be used.
- checkpoint: folder that stores model files. Do not delete it, otherwise errors will occur.
- distribution: base class for distributions, used to calculate probabilities under distributions such as the Gaussian.
- logger: contains a Logger class that logs data to a .csv file
- log: stores log files
- main.py: main file; run it to start training or testing
- agent.py: agent
- environment.py: environment
- krylov.py: implementation of some math methods: the conjugate gradient method and calculation of the Hessian matrix
- parameters.py: config file
- utils.py: implementation of some basic functions: getFlat, setFlat, lineaSearch
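
The baseline entry does not say which estimator is used. As a hedged sketch, assuming a linear feature baseline in the style of rllab's LinearFeatureBaseline (observations, their squares, a cubic in the time step, and a bias, fitted to discounted returns by ridge regression), the value estimate could be computed as follows. The names discounted_returns, baseline_features, fit_linear_baseline and predict_baseline are hypothetical and are not functions from this repository.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    # Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one trajectory.
    out = np.zeros(len(rewards))
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        out[i] = running
    return out

def baseline_features(obs):
    # Assumed feature set (modeled on rllab's linear feature baseline):
    # observations, their squares, a cubic in the scaled time step, and a bias.
    # obs has shape (T, obs_dim) for a single trajectory.
    t = np.arange(len(obs)).reshape(-1, 1) / 100.0
    return np.concatenate([obs, obs ** 2, t, t ** 2, t ** 3, np.ones_like(t)], axis=1)

def fit_linear_baseline(obs_list, returns_list, reg=1e-5):
    # Ridge regression of the empirical returns onto the features of all trajectories.
    X = np.concatenate([baseline_features(o) for o in obs_list])
    y = np.concatenate(returns_list)
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def predict_baseline(weights, obs):
    # Baseline value for every time step of one trajectory.
    return baseline_features(obs) @ weights
```

The advantage estimate for each time step would then be the discounted return minus the predicted baseline, which reduces the variance of the policy gradient without biasing it.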
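
For the continuous (Pendulum) version, the Gaussian mentioned in the distribution entry is usually a diagonal Gaussian whose mean comes from the policy network. The sketch below assumes a diagonal Gaussian parameterized by its mean and log standard deviation and computes the two quantities TRPO needs from such a class: the log-likelihood of sampled actions and the KL divergence between the old and new policies. The names gaussian_log_likelihood and gaussian_kl are hypothetical and do not reflect the interface of the repository's distribution base class.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, log_std):
    # Sum over the last axis of per-dimension log N(x | mean, exp(log_std)^2).
    z = (x - mean) / np.exp(log_std)
    return np.sum(-0.5 * z ** 2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)

def gaussian_kl(mean_old, log_std_old, mean_new, log_std_new):
    # KL(old || new) between diagonal Gaussians, summed over action dimensions:
    # log(s_new / s_old) + (s_old^2 + (m_old - m_new)^2) / (2 s_new^2) - 1/2
    var_old = np.exp(2.0 * log_std_old)
    var_new = np.exp(2.0 * log_std_new)
    return np.sum(
        log_std_new - log_std_old
        + (var_old + (mean_old - mean_new) ** 2) / (2.0 * var_new)
        - 0.5,
        axis=-1,
    )
```

Parameterizing by log standard deviation keeps the standard deviation positive without any explicit constraint during optimization.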
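
A minimal sketch of a logger that appends one row of scalar metrics per training iteration to a .csv file, using only Python's standard csv module. CsvLogger and its log method are hypothetical and may not match the interface of the repository's Logger class.

```python
import csv

class CsvLogger:
    """Hypothetical sketch: append one row of scalar metrics per iteration to a .csv file."""

    def __init__(self, path, fieldnames):
        self.path = path
        self.fieldnames = list(fieldnames)
        # Create (or truncate) the file and write the header row once.
        with open(self.path, "w", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writeheader()

    def log(self, **row):
        # Append a single row; keys missing from row are written as empty cells,
        # and keys not listed in fieldnames raise ValueError.
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(row)
```

For example, CsvLogger("log/progress.csv", ["iteration", "average_return"]) followed by logger.log(iteration=0, average_return=-200.0) would add one data row; the path and field names here are illustrative only.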
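
The conjugate gradient method listed for krylov.py solves a linear system A x = b using only matrix-vector products. In TRPO it is commonly applied with A being the Hessian of the average KL divergence and b the policy gradient, so the search direction can be found without forming or inverting A explicitly. The plain NumPy sketch below works under that assumption; conjugate_gradient and its mvp argument are hypothetical names, and the repository's implementation may differ.

```python
import numpy as np

def conjugate_gradient(mvp, b, iters=10, tol=1e-10):
    """Approximately solve A x = b given only mvp(v) = A v.

    A is assumed symmetric positive definite (e.g. the KL Hessian in TRPO).
    """
    x = np.zeros_like(b)
    r = b.copy()   # residual b - A x, with the initial guess x = 0
    p = r.copy()   # current search direction
    rr = r @ r
    for _ in range(iters):
        Ap = mvp(p)
        alpha = rr / (p @ Ap)      # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if rr_new < tol:           # stop once the squared residual is small
            break
        p = r + (rr_new / rr) * p  # next direction, A-conjugate to the previous ones
        rr = rr_new
    return x
```

In a TRPO update, mvp would typically be a Hessian-vector product function, and the returned direction would then be rescaled so that the step satisfies the KL trust-region constraint.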
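
getFlat and setFlat presumably convert between a list of parameter tensors and a single flat vector, the form that conjugate gradient works with, and lineaSearch presumably performs TRPO's backtracking step-size search. The NumPy sketch below illustrates these ideas under those assumptions. get_flat, set_flat and backtracking_line_search are hypothetical names that operate on NumPy arrays rather than TensorFlow variables, and the line search only checks the improvement of the loss; many TRPO implementations also check the KL constraint at this point.

```python
import numpy as np

def get_flat(params):
    # Concatenate a list of arrays into one 1-D parameter vector.
    return np.concatenate([p.ravel() for p in params])

def set_flat(params, flat):
    # Write a 1-D vector back into the arrays in place, respecting their shapes.
    offset = 0
    for p in params:
        n = p.size
        p[...] = flat[offset:offset + n].reshape(p.shape)
        offset += n

def backtracking_line_search(f, x, fullstep, expected_improve_rate,
                             max_backtracks=10, accept_ratio=0.1):
    # f is the loss to minimize; expected_improve_rate is the first-order
    # predicted decrease for the full step, i.e. -grad(f) . fullstep.
    fval = f(x)
    for stepfrac in 0.5 ** np.arange(max_backtracks):
        xnew = x + stepfrac * fullstep
        actual_improve = fval - f(xnew)
        expected_improve = expected_improve_rate * stepfrac
        # Accept the first (largest) step whose real improvement is positive and
        # at least a fraction accept_ratio of the linear prediction.
        if actual_improve > 0 and actual_improve / expected_improve > accept_ratio:
            return True, xnew
    return False, x
```

For example, with f(v) = sum(v ** 2), x = [1.0] and fullstep = [-2.0], the full step gives no improvement, so the search halves it and accepts x = [0.0].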