Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doubts #11

Closed
random-user-x opened this issue Jul 1, 2018 · 7 comments
Closed

Doubts #11

random-user-x opened this issue Jul 1, 2018 · 7 comments

Comments

@random-user-x
Copy link
Contributor

Could you please let me know why there is a negative sign. I think that since we have already defined kl-divergence in the step before, we do not need a negative sign here. Please let me know how do you see this.

ACER/train.py

Line 81 in f22b07c

(-kl).backward(retain_graph=True)

@Kaixhin
Copy link
Owner

Kaixhin commented Jul 1, 2018

The trust region function should be a translation of the one from ChainerRL, apart from the fact that the trust region involves some parameters, so if you think it should be the other way around then you should raise an issue there to let them know too.

@random-user-x
Copy link
Contributor Author

I think since we have already defined the kl divergence before, we do not really need the negative sign. I am not sure why there is a negative sign.

Please let me know your views. Do you think it should be the other way round?

@Kaixhin
Copy link
Owner

Kaixhin commented Jul 1, 2018

I'm not sure either, I just ported the code from Chainer for this part.

@random-user-x
Copy link
Contributor Author

Let me know the difference in your implementation and OpenAI baselines.

@Kaixhin
Copy link
Owner

Kaixhin commented Jul 1, 2018

I've not compared the two at all, and this is very low priority for me at the moment.

@random-user-x
Copy link
Contributor Author

@Kaixhin, thank you for your input. I will look into the chainerrl and OpenAI baselines to get more insight about the implementation. I will just send a PR if needed.

Thanks

@random-user-x
Copy link
Contributor Author

I am closing this issue because the current implementation seems to be correct. However, I think we need to detach the z_star_p for better results. Please refer to #13 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants