Trust Region Updates #12

Closed · random-user-x opened this issue Jul 4, 2018 · 2 comments

random-user-x (Contributor) commented Jul 4, 2018

Hello @Kaixhin,

ACER/train.py, line 97 in f22b07c:

trust_loss += (param * z_star_p).sum()

I think we should freeze the value of z_star_p by calling z_star_p.detach().

From the ACER paper: "In the second stage, we take advantage of back-propagation. Specifically, the updated gradient with respect to φ_θ, that is z*, is back-propagated through the network to compute the derivatives with respect to the parameters."
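A minimal sketch of what I mean, in case it helps (illustrative names only; trust_region_loss and the toy Linear model below are mine, not the repo's actual code):

import torch
import torch.nn as nn

def trust_region_loss(model, z_star):
    # Surrogate loss whose gradient w.r.t. each parameter is exactly z_star_p.
    trust_loss = 0
    for param, z_star_p in zip(model.parameters(), z_star):
        # detach() freezes z_star_p, so backward() treats it as a constant:
        # d(trust_loss)/d(param) = z_star_p, with no gradient leaking back
        # through whatever graph produced z_star_p.
        trust_loss += (param * z_star_p.detach()).sum()
    return trust_loss

# Toy check: the gradients delivered to the parameters match z_star.
model = nn.Linear(4, 2)
z_star = [torch.ones_like(p) for p in model.parameters()]
trust_region_loss(model, z_star).backward()
assert all(torch.equal(p.grad, z) for p, z in zip(model.parameters(), z_star))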

Please let me know what you think.

random-user-x (Contributor, Author) commented

#13

Kaixhin (Owner) commented Jul 7, 2018

Ah yes, the gradients are probably leaking through z_star_p; I think you are right on this one.

Kaixhin closed this as completed Jul 7, 2018