Hi,
Furthermore, may I ask whether the gradients of the policy and value are backpropagated into the world model? If I understand your code correctly, they are not, as this line suggests. However, your paper mentions in several places that value and policy gradients are propagated back through the dynamics, so I am afraid I am misunderstanding the code. Please let me know if I made any mistakes.
Best,
Sherwin
The value loss is not propagated through multi-step predictions because we stop the gradients around the value targets as usual, turning it into a per-step loss.
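As a rough illustration, here is what that per-step loss looks like in a JAX-style sketch; `value_fn`, the parameter layout, and the argument names are my own placeholders, not the repository's actual code:

```python
import jax
import jax.numpy as jnp

def value_loss(value_fn, critic_params, states, lambda_returns):
    # Treat the multi-step lambda-return targets as constants, so the
    # loss never backpropagates through the predictions that built them.
    targets = jax.lax.stop_gradient(lambda_returns)
    values = value_fn(critic_params, states)
    # Plain per-step regression of the value function onto the targets.
    return 0.5 * jnp.mean((values - targets) ** 2)
```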
The actor loss is the negative of the lambda returns. This is backpropagated through imagined sequences of multiple states.
More precisely, the gradient flows from the predicted value of a future state through the neural-network value function, through the sequence of earlier states, through the sampled action, and into the actor.
The stop gradient you're pointing to makes sure it doesn't flow further. In other words, we only consider how the current action influences future states and their values, but not how the current action influences future actions.
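To make that gradient path concrete, here is a minimal JAX-style sketch of the imagined rollout, assuming hypothetical `actor_fn`, `dynamics_fn`, and `value_fn` callables and a `params` dictionary; it illustrates the idea rather than reproducing the repository's code:

```python
import jax
import jax.numpy as jnp

def actor_loss(actor_fn, dynamics_fn, value_fn, params, start_state,
               horizon, rng):
    state = start_state
    values = []
    for _ in range(horizon):
        rng, key = jax.random.split(rng)
        # Detaching the state here plays the role of the stop gradient
        # in question: gradients from future values still reach the
        # actor along
        #   value -> later states -> dynamics -> sampled action -> actor,
        # but they cannot continue from this action's policy input back
        # into earlier actions.
        action = actor_fn(params['actor'], jax.lax.stop_gradient(state), key)
        state = dynamics_fn(params['model'], state, action)
        values.append(value_fn(params['critic'], state))
    # Maximize predicted values by minimizing their negative; a faithful
    # version would use lambda-returns computed over these values.
    return -jnp.mean(jnp.stack(values))
```

Differentiating this loss with respect to `params['actor']` then yields exactly the path described above.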
Hope this helps. Regarding pcont, please reply in the previous ticket on that topic so that the discussion stays organized and easier for others to follow.
Thanks for your explanation. I see now that the gradient of the actor comes directly from the predicted values, which differs from traditional policy gradient methods. That makes sense now. By the way, I've moved my question about pcont to the previous issue.