
PyTorch implementation of reinforcement learning algorithms

This repository contains:

  1. policy gradient methods (TRPO, PPO, A2C)
  2. Generative Adversarial Imitation Learning (GAIL)

Important notes

  • The code now works for PyTorch 0.4. For PyTorch 0.3, please check out the 0.3 branch.
  • To run MuJoCo environments, first install mujoco-py and gym.
  • If you have a GPU, I recommend setting OMP_NUM_THREADS to 1, since PyTorch spawns additional threads during computation that can severely degrade multiprocessing performance. The problem is worst on Linux, where multiprocessing can become even slower than a single thread:
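For example, set the variable in the shell before launching training (OMP_NUM_THREADS is the standard OpenMP environment variable that PyTorch respects):

```shell
# Limit PyTorch's intra-op threading so it does not contend
# with the sampler processes for CPU cores
export OMP_NUM_THREADS=1
```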


Features

  • Supports CUDA (about 10x faster than the CPU implementation).
  • Supports discrete and continuous action spaces.
  • Supports multiprocessing, so the agent can collect samples in multiple environments simultaneously (about 8x faster than a single thread).
  • Fast Fisher-vector product calculation. For this part, Ankur kindly wrote a blog post explaining the implementation details.
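The idea behind the fast Fisher-vector product can be sketched as follows (an illustrative example, not this repo's exact code; the function name is mine). TRPO's conjugate-gradient step only ever needs Fisher-matrix-times-vector products, and since the Fisher matrix equals the Hessian of the KL divergence, each product can be computed with two backward passes instead of forming the matrix:

```python
import torch

def hessian_vector_product(scalar_fn, params, vec):
    """Compute H(params) @ vec via double backprop (Pearlmutter's trick).

    In TRPO, scalar_fn would be the mean KL divergence between the old and
    new policies; its Hessian is the Fisher information matrix, so this
    yields Fisher-vector products without materializing the full matrix.
    """
    out = scalar_fn(params)
    grad, = torch.autograd.grad(out, params, create_graph=True)
    grad_dot_vec = torch.dot(grad, vec)
    hvp, = torch.autograd.grad(grad_dot_vec, params)
    return hvp

# Toy check on the quadratic 0.5 * x^T A x, whose Hessian is exactly A
A = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
x = torch.tensor([1.0, -1.0], requires_grad=True)
v = torch.tensor([0.5, 2.0])
print(hessian_vector_product(lambda p: 0.5 * p @ A @ p, x, v))  # equals A @ v
```

For a quadratic this reproduces `A @ v` exactly; for the KL divergence the same two-pass recipe gives the Fisher-vector product each conjugate-gradient iteration needs.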

Policy gradient methods


  • `python examples/ --env-name Hopper-v2`


Generative Adversarial Imitation Learning (GAIL)

To save an expert trajectory

  • `python gail/ --model-path assets/expert_traj/Hopper-v2_ppo.p`

To do imitation learning

  • `python gail/ --env-name Hopper-v2 --expert-traj-path assets/expert_traj/Hopper-v2_expert_traj.p`
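At its core, GAIL's training loop alternates a discriminator update with a policy update. The sketch below shows one discriminator step under assumed names and shapes (random placeholder batches, one common labeling convention); it is not this repo's exact code:

```python
import torch
import torch.nn as nn

# Discriminator D learns to separate expert (state, action) pairs from
# policy (state, action) pairs; the policy is then trained (e.g. with PPO)
# on the surrogate reward -log D(s, a) instead of an environment reward.
disc = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)

expert_sa = torch.randn(64, 4)  # placeholder expert state-action batch
policy_sa = torch.randn(64, 4)  # placeholder policy state-action batch

logits_e = disc(expert_sa)
logits_p = disc(policy_sa)
# One convention: label expert pairs 0 and policy pairs 1
loss = (bce(logits_e, torch.zeros_like(logits_e))
        + bce(logits_p, torch.ones_like(logits_p)))
opt.zero_grad()
loss.backward()
opt.step()

# Surrogate reward for the policy: large when D scores a pair as expert-like
reward = -torch.log(torch.sigmoid(logits_p) + 1e-8).detach()
```

In practice the expert batch comes from the saved trajectory file above, and the policy batch from fresh environment rollouts each iteration.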