Skip to content

ShangtongZhang/DeepRL

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
December 14, 2020 15:49
July 18, 2020 15:07
January 14, 2022 12:12
July 17, 2020 09:43
April 2, 2019 15:40
October 18, 2022 11:26
July 18, 2020 15:07
July 16, 2020 21:46
July 17, 2020 09:43
July 18, 2020 15:07
July 18, 2020 15:07

DeepRL

@@ I am looking for self-motivated students interested in RL at different levels! @@
@@ Visit https://shangtongzhang.github.io/people/ for more details. @@

If you have any question or want to report a bug, please open an issue instead of emailing me directly.

Modularized implementation of popular deep RL algorithms in PyTorch.
Easy switch between toy tasks and challenging games.

Implemented algorithms:

The DQN agent, as well as C51 and QR-DQN, has an asynchronous actor for data generation and an asynchronous replay buffer for transferring data to GPU. Using 1 RTX 2080 Ti and 3 threads, the DQN agent runs for 10M steps (40M frames, 2.5M gradient updates) for Breakout within 6 hours.

Dependency

  • PyTorch v1.5.1
  • See Dockerfile and requirements.txt for more details

Usage

examples.py contains examples for all the implemented algorithms.
Dockerfile contains the environment for generating the curves below.
Please use this bibtex if you want to cite this repo

@misc{deeprl,
  author = {Zhang, Shangtong},
  title = {Modularized Implementation of Deep RL Algorithms in PyTorch},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/ShangtongZhang/DeepRL}},
}

Curves (commit 9e811e)

BreakoutNoFrameskip-v4 (1 run)

Loading...

Mujoco

  • DDPG/TD3 evaluation performance. Loading... (5 runs, mean + standard error)

  • PPO online performance. Loading... (5 runs, mean + standard error, smoothed by a window of size 10)

References

Code of My Papers

They are located in other branches of this repo and seem to be good examples for using this codebase.