Distributional value and asynchronous training.
I use QR-DQN's quantile value distribution, trained with an MSE loss, in place of C51-DQN's categorical distribution. More information about QR-DQN is in my other project, c51-qr-dqn.
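Below is a minimal NumPy sketch of a quantile critic target. It assumes N fixed quantile atoms per state-action pair, as in QR-DQN. The names `quantile_midpoints`, `target_quantiles`, `mse_quantile_loss`, `N_QUANTILES` and `GAMMA` are hypothetical and do not come from the original code. Reading "MSE loss" as an element-wise squared error between predicted quantile i and target quantile i is also an assumption. QR-DQN's own quantile Huber loss is included only for comparison.

```python
import numpy as np

# Hypothetical settings, not taken from the original code.
N_QUANTILES = 5
GAMMA = 0.9


def quantile_midpoints(n):
    """tau_hat_i = (2i + 1) / (2n): the quantile midpoints used by QR-DQN."""
    return (2 * np.arange(n) + 1) / (2.0 * n)


def target_quantiles(reward, done, next_quantiles, gamma=GAMMA):
    """Distributional Bellman target: shift and scale every next-state quantile.

    reward, done:    shape (batch,)
    next_quantiles:  shape (batch, n), e.g. the target critic at (s', mu'(s'))
    """
    return reward[:, None] + gamma * (1.0 - done[:, None]) * next_quantiles


def mse_quantile_loss(pred, target):
    """Assumed MSE variant: quantile i of pred is regressed onto quantile i of target."""
    return np.mean((pred - target) ** 2)


def quantile_huber_loss(pred, target, kappa=1.0):
    """QR-DQN's original loss, for comparison. pred, target: shape (batch, n)."""
    n = pred.shape[1]
    tau = quantile_midpoints(n)                      # (n,)
    u = target[:, None, :] - pred[:, :, None]        # (batch, i = pred, j = target)
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| for each predicted quantile i.
    weight = np.abs(tau[None, :, None] - (u < 0).astype(float))
    # Mean over target quantiles j, sum over predicted quantiles i, mean over batch.
    return np.mean(np.sum(np.mean(weight * huber / kappa, axis=2), axis=1))
```

In a DDPG-style critic, `next_quantiles` would come from the target critic evaluated at the target actor's action. The scalar Q-value that the actor maximizes is then the mean of the predicted quantiles.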
In my experiments on continuous control, the distributional value function did not perform better than vanilla DDPG.
The original code is from Morvan.
The code also uses prioritized experience replay and n-step returns. After several experiments on Pendulum, I think multi-step returns may not be better than one-step returns.
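The following is a minimal sketch of how one-step transitions can be folded into n-step transitions before they are stored in replay. `NStepBuffer` and `td_target` are hypothetical helpers, not the original code. The sketch assumes that `done` marks a true episode end, after which no bootstrapping is done. The stored return is r_t + γ r_{t+1} + ... + γ^(k-1) r_{t+k-1}, where k is at most n.

```python
from collections import deque

GAMMA = 0.9  # hypothetical discount factor


class NStepBuffer:
    """Hypothetical helper that turns 1-step transitions into n-step transitions."""

    def __init__(self, n, gamma=GAMMA):
        self.n = n
        self.gamma = gamma
        self.window = deque()

    def push(self, s, a, r, s_next, done):
        """Add one transition and return the n-step transitions that are now complete."""
        self.window.append((s, a, r, s_next, done))
        out = []
        if done:
            # Episode over: every remaining start state gets a truncated return.
            while self.window:
                out.append(self._make())
                self.window.popleft()
        elif len(self.window) == self.n:
            out.append(self._make())
            self.window.popleft()
        return out

    def _make(self):
        s, a = self.window[0][0], self.window[0][1]
        # Discounted sum of the rewards in the window.
        ret = sum((self.gamma ** k) * t[2] for k, t in enumerate(self.window))
        s_n, done_n = self.window[-1][3], self.window[-1][4]
        return (s, a, ret, s_n, done_n, len(self.window))


def td_target(ret, steps, done, q_next, gamma=GAMMA):
    """n-step TD target: ret + gamma^steps * Q'(s_{t+steps}), unless the episode ended."""
    return ret + (0.0 if done else (gamma ** steps) * q_next)
```

With `n=1` this reduces to the ordinary one-step transition, so the two settings can be compared by changing a single parameter.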