LihaoR/d4pg

d4pg

This is a simple implementation of D4PG.

It combines a distributional value function with asynchronous training.

I use QR-DQN's quantile value distribution with an MSE loss in place of C51's categorical distribution. More information about QR-DQN is in my other repository, c51-qr-dqn.
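The quantile-based critic loss described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code: the function names `quantile_mse_loss` and `q_value`, the array shapes, and the choice to pair quantiles by index are all assumptions made for the example.

```python
import numpy as np


def quantile_mse_loss(pred_quantiles, next_quantiles, rewards, dones, gamma=0.99):
    """MSE between predicted quantiles and a Bellman target built from quantiles.

    Hypothetical helper for illustration only.
    pred_quantiles: (batch, n_quantiles) critic output for (s, a)
    next_quantiles: (batch, n_quantiles) target-critic output for (s', target_actor(s'))
    rewards, dones: (batch,)
    """
    pred = np.asarray(pred_quantiles, dtype=np.float64)
    nxt = np.asarray(next_quantiles, dtype=np.float64)
    r = np.asarray(rewards, dtype=np.float64)[:, None]
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)[:, None]

    # Each quantile is shifted and scaled by the same Bellman backup.
    target = r + gamma * not_done * nxt

    # Assumption: quantiles are matched by index and penalized with plain MSE,
    # instead of the quantile Huber loss used in the QR-DQN paper.
    return float(np.mean((target - pred) ** 2))


def q_value(quantiles):
    """Scalar Q estimate: the mean of the quantile values."""
    return np.mean(np.asarray(quantiles, dtype=np.float64), axis=1)
```

In a DDPG-style actor update, the actor would then be trained to maximize `q_value` of the critic's quantile output, which is one common way to turn a value distribution back into a scalar objective.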

In my experiments on continuous control, the distributional value function did not perform better than vanilla DDPG.

The original code is from morvan.

To do

Prioritized experience replay and n-step return. After several experiments on Pendulum, I think multi-step return may not be better than one-step return.
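For reference, an n-step return target could be computed along these lines. Since n-step return is still on the to-do list, this is only a sketch of the general technique: the function name `n_step_return` and its parameters are hypothetical and do not come from this repository.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, done=False):
    """Discounted n-step return with a bootstrapped tail.

    Hypothetical helper for illustration only.
    rewards: the n rewards r_t, ..., r_{t+n-1} collected along the trajectory
    bootstrap_value: critic estimate for the state reached after n steps
    done: if True, the episode ended within the n steps and the tail is dropped
    """
    # Start from the bootstrapped value (or zero at episode end) and fold the
    # rewards in from last to first: G = r + gamma * G.
    g = 0.0 if done else float(bootstrap_value)
    for r in reversed(rewards):
        g = float(r) + gamma * g
    return g
```

With a single reward this reduces to the usual one-step target `r + gamma * bootstrap_value`, so the same function can be used to compare one-step and multi-step returns.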

paper

Distributed Distributional Deterministic Policy Gradients
