
PyTorch implementation of TRPO

Try my implementation of PPO (a newer and better-performing variant of TRPO), unless you need TRPO for some specific reason.

This is a PyTorch implementation of "Trust Region Policy Optimization (TRPO)".

The code is mostly ported from the original implementation by John Schulman and forked from here. In contrast to another PyTorch implementation of TRPO, this implementation uses an exact Hessian-vector product instead of a finite-differences approximation.
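For illustration, here is a minimal sketch (not this repo's exact code) of how an exact Hessian-vector product can be computed in PyTorch via double backpropagation (the Pearlmutter trick), as used in TRPO's conjugate-gradient step; the function name and damping default are placeholders:

```python
import torch

def hessian_vector_product(loss, params, vector, damping=1e-2):
    """Compute (H + damping * I) @ vector, where H is the Hessian of `loss`
    w.r.t. `params`, without ever forming H explicitly."""
    # First backward pass: gradient of the loss, kept in the graph
    # so it can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Dot the gradient with the fixed vector, then differentiate again:
    # d/dtheta (g . v) = H v.
    grad_vector_dot = (flat_grad * vector).sum()
    hvp = torch.autograd.grad(grad_vector_dot, params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
    return flat_hvp + damping * vector
```

In TRPO, `loss` would typically be the mean KL divergence between the old and new policies; a finite-differences scheme would instead perturb the parameters and re-evaluate the gradient, which is noisier.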

Contributions

We present a new optimization method, "Adaptive Regularized Cubics using L-SR1 Hessian approximations", and compare it against TRPO, forked from here. All implementations are in PyTorch.
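As background, a minimal sketch of the classical SR1 quasi-Newton update that underlies L-SR1 Hessian approximations (this is standard textbook material, not the paper's implementation): `B` is the current Hessian approximation, `s` the parameter step, and `y` the change in gradients.

```python
import torch

def sr1_update(B, s, y, eps=1e-8):
    """Return the SR1 update B + (r r^T) / (r^T s), with r = y - B s.
    The update is skipped when the denominator is near zero, which is
    the standard safeguard for SR1."""
    r = y - B @ s
    denom = r @ s
    if denom.abs() <= eps * r.norm() * s.norm():
        return B  # skip: update would be numerically unstable
    return B + torch.outer(r, r) / denom
```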

Usage

```bash
python main.py --env-name "Reacher-v1"
```

Recommended hyperparameters

InvertedPendulum-v1: 5000

Reacher-v1, InvertedDoublePendulum-v1: 15000

HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000

Ant-v1, Humanoid-v1: 50000
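These values appear to be per-update batch sizes. Assuming the batch-size flag from the original implementation (check `main.py` for the exact argument name), a run might look like:

```bash
python main.py --env-name "HalfCheetah-v1" --batch-size 25000
```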

Results

Results are more or less similar to those of the original code. Plots coming soon.
