d4pg

This is a simple implementation of D4PG.

Distributional value function and asynchronous training

I use QR-DQN's quantile value distribution, trained with an MSE loss, in place of the C51 categorical distribution. More information about QR-DQN is in my other repository, c51-qr-dqn.
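
QR-DQN represents the return distribution Z(s, a) with N quantile values at fixed midpoints τ̂_i = (2i + 1) / (2N), i = 0..N-1, and trains them with a quantile regression loss over all pairs of predicted and target quantiles. Below is a minimal NumPy sketch of that loss as defined in QR-DQN (quantile Huber loss). The README says this repository uses an MSE loss but does not say exactly how, so the `kappa=None` branch, which swaps the Huber term for a plain squared error while keeping the asymmetric quantile weights, is only one hypothetical reading. Note that this squared-error variant estimates expectiles rather than quantiles. The function names are illustrative, not taken from the code.

```python
import numpy as np


def quantile_midpoints(n_quantiles):
    # tau_hat_i = (2i + 1) / (2N) for i = 0..N-1, as in QR-DQN
    return (2 * np.arange(n_quantiles) + 1) / (2.0 * n_quantiles)


def quantile_loss(pred, target, kappa=1.0):
    """Quantile regression loss between predicted and target quantiles.

    pred:   (batch, N)  predicted quantile values theta_i(s, a)
    target: (batch, N') target quantile samples (treated as constants)
    kappa:  Huber threshold; kappa=None uses a plain squared error instead
            (hypothetical reading of "MSE loss", not confirmed from the repo)
    """
    n = pred.shape[1]
    tau = quantile_midpoints(n)                      # (N,)
    # Pairwise errors u_ij = target_j - pred_i, shape (batch, N, N')
    u = target[:, None, :] - pred[:, :, None]
    if kappa is None:
        elementwise = 0.5 * u ** 2
    else:
        abs_u = np.abs(u)
        elementwise = np.where(abs_u <= kappa,
                               0.5 * u ** 2,
                               kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u_ij < 0}|
    weight = np.abs(tau[None, :, None] - (u < 0).astype(np.float64))
    # Mean over target samples j, sum over predicted quantiles i, mean over batch
    return (weight * elementwise).mean(axis=2).sum(axis=1).mean()
```

In a real critic, `pred` would come from the online network and `target` from the target network, with gradients flowing only through `pred`.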

In my experiments on continuous control tasks, the distributional value function did not perform better than vanilla DDPG.
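
To see how a quantile critic relates to vanilla DDPG: a D4PG-style critic applies the Bellman update to every quantile of the target critic evaluated at (s', μ'(s')), while the actor is updated with the critic's expected value, which for equally weighted quantiles is their mean. Because the target is linear in the next-state values, the mean of the quantile target equals the DDPG target computed from the mean, so the quantity the actor follows is the same expected return in both methods; they differ in how the critic is trained. The NumPy sketch below assumes 1-step returns, and its function names are hypothetical. It illustrates this relationship only and does not explain the experimental result above.

```python
import numpy as np


def quantile_td_target(reward, done, next_quantiles, gamma=0.99):
    """Distributional Bellman target for a quantile critic (1-step).

    reward:         (batch,)
    done:           (batch,) 1.0 if the episode terminated, else 0.0
    next_quantiles: (batch, N) quantiles from the target critic at (s', mu'(s')),
                    where mu' is the target actor
    """
    return reward[:, None] + gamma * (1.0 - done[:, None]) * next_quantiles


def expected_q(quantiles):
    # The mean of equally weighted quantiles estimates Q(s, a); this scalar
    # plays the role of DDPG's critic output in the actor update.
    return quantiles.mean(axis=1)


def ddpg_td_target(reward, done, next_q, gamma=0.99):
    # Vanilla DDPG target for a scalar critic, shown for comparison
    return reward + gamma * (1.0 - done) * next_q
```

For example, `expected_q(quantile_td_target(r, d, z_next))` and `ddpg_td_target(r, d, expected_q(z_next))` agree for any batch.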

The original code is from Morvan.

To do

Prioritized experience replay and n-step returns. After several experiments on Pendulum, I suspect that multi-step returns may not be better than one-step returns there.
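
Both To do items can be prototyped independently of the networks. The sketch below is a minimal illustration under stated assumptions, not code from this repository: the hypothetical `NStepAccumulator` turns 1-step transitions into n-step transitions with return R = Σ_{k<n} γ^k r_{t+k}, leaving the learner to bootstrap with γ^n times the target critic at the last state when the episode has not ended. The hypothetical `sample_prioritized` implements proportional prioritization, sampling index i with probability p_i^α / Σ_j p_j^α and returning importance-sampling weights (N · P(i))^(-β) normalized by their maximum. The α and β defaults are illustrative values.

```python
import numpy as np
from collections import deque


class NStepAccumulator:
    """Converts 1-step transitions into n-step ones (hypothetical helper)."""

    def __init__(self, n, gamma):
        self.n, self.gamma = n, gamma
        self.window = deque()

    def push(self, s, a, r, s_next, done):
        """Add one transition; return a list of finished n-step transitions."""
        self.window.append((s, a, r, s_next, done))
        out = []
        if len(self.window) == self.n:
            out.append(self._emit())
            self.window.popleft()
        if done:  # at episode end, flush the remaining shorter windows
            while self.window:
                out.append(self._emit())
                self.window.popleft()
        return out

    def _emit(self):
        s, a = self.window[0][0], self.window[0][1]
        ret = sum((self.gamma ** k) * t[2] for k, t in enumerate(self.window))
        _, _, _, s_last, done_last = self.window[-1]
        # The learner should bootstrap with gamma**steps * Q'(s_last, mu'(s_last))
        # only when done_last is False.
        return s, a, ret, s_last, done_last, len(self.window)


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritized sampling with importance-sampling weights."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()  # scale so the largest weight is 1
    return idx, weights
```

A flat probability array keeps the sketch short; a replay buffer of realistic size would normally use a sum tree so that sampling and priority updates cost O(log N).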

paper

Distributed Distributional Deterministic Policy Gradients

Files

async-ddpg.py
atari_wrappers.py
d4pg.py
new_d4pg.py

Repository: LihaoR/d4pg
