Trust Region Updates #12

Closed · random-user-x opened this issue Jul 4, 2018 · 2 comments

random-user-x (Contributor) commented Jul 4, 2018

Hello @Kaixhin,

ACER/train.py, line 97 in f22b07c:

trust_loss += (param * z_star_p).sum()

I think we should freeze the value of z_star_p by calling z_star_p.detach().

From the ACER paper: "In the second stage, we take advantage of back-propagation. Specifically, the updated gradient with respect to φ_θ, that is z*, is back-propagated through the network to compute the derivatives with respect to the parameters."
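A minimal sketch of what I mean, in case it helps (illustrative names only; trust_region_loss and the toy Linear model below are mine, not the repo's actual code):

import torch
import torch.nn as nn

def trust_region_loss(model, z_star):
    # Surrogate loss whose gradient w.r.t. each parameter is exactly z_star_p.
    trust_loss = 0
    for param, z_star_p in zip(model.parameters(), z_star):
        # detach() freezes z_star_p, so backward() treats it as a constant:
        # d(trust_loss)/d(param) = z_star_p, with no gradient leaking back
        # through whatever graph produced z_star_p.
        trust_loss += (param * z_star_p.detach()).sum()
    return trust_loss

# Toy check: the gradients delivered to the parameters match z_star.
model = nn.Linear(4, 2)
z_star = [torch.ones_like(p) for p in model.parameters()]
trust_region_loss(model, z_star).backward()
assert all(torch.equal(p.grad, z) for p, z in zip(model.parameters(), z_star))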

Please let me know what you think.

random-user-x (Contributor, Author) commented

#13

Kaixhin (Owner) commented Jul 7, 2018

Ah yes, the gradients are probably leaking through z_star_p; I think you are right on this one.

Kaixhin closed this as completed Jul 7, 2018