Doubts #11

random-user-x · 2018-07-01T06:58:37Z

Could you please explain why there is a negative sign here? Since the KL divergence is already defined in the previous step, I don't think a negative sign is needed. Please let me know your view.

ACER/train.py

Line 81 in f22b07c

(-kl).backward(retain_graph=True)

Kaixhin · 2018-07-01T09:02:01Z

The trust region function should be a translation of the one from ChainerRL, apart from the fact that the trust region involves some parameters. If you think the sign should be the other way around, you should raise an issue there as well to let them know.

random-user-x · 2018-07-01T09:11:47Z

I think that since the KL divergence is already defined beforehand, the negative sign is not really needed. I am not sure why it is there.

Please let me know your view. Do you think it should be the other way round?

Kaixhin · 2018-07-01T09:21:08Z

I'm not sure either; I just ported this part of the code from Chainer.

random-user-x · 2018-07-01T09:37:59Z

Could you let me know the differences between your implementation and OpenAI Baselines?

Kaixhin · 2018-07-01T10:00:35Z

I've not compared the two at all, and this is very low priority for me at the moment.

random-user-x · 2018-07-01T10:11:01Z

@Kaixhin, thank you for your input. I will look into ChainerRL and OpenAI Baselines to get more insight into the implementation, and I will send a PR if needed.

Thanks

random-user-x · 2018-07-04T09:28:48Z

I am closing this issue because the current implementation seems to be correct. However, I think we need to detach z_star_p for better results. Please refer to #13.
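
The thread does not spell out why the sign is correct, so the following is only one possible reading, not a statement about what train.py or ChainerRL actually do. It assumes that the gradient g passed to the trust region step is the gradient of a loss to be minimized (the negated objective), in which case the KL gradient k has to be negated as well for the update z = g - max(0, (k.g - delta) / ||k||^2) k to describe the same constraint. The function name trust_region and all numbers below are hypothetical and chosen purely for illustration.

```python
def trust_region(g, k, delta):
    """Return z = g - max(0, (k.g - delta) / ||k||^2) * k for plain lists."""
    kg = sum(ki * gi for ki, gi in zip(k, g))
    kk = sum(ki * ki for ki in k)
    factor = max(0.0, (kg - delta) / kk) if kk > 0 else 0.0
    return [gi - factor * ki for gi, ki in zip(g, k)]


# Hypothetical values, not taken from the repository.
g_objective = [2.0, -1.0]  # gradient of an objective to be maximized
k_kl = [1.0, 0.5]          # gradient of the KL divergence
delta = 0.5                # trust region threshold

# Ascent form: maximize the objective, use k as is.
z_ascent = trust_region(g_objective, k_kl, delta)

# Descent form: minimize loss = -objective, so g flips sign.
g_loss = [-x for x in g_objective]

# With k negated as well (the analogue of (-kl).backward under this
# assumption), the result is the exact negation of the ascent direction.
z_descent = trust_region(g_loss, [-x for x in k_kl], delta)

# Without negating k, k.g becomes negative here, the max(0, ...) term
# vanishes, and the gradient passes through unconstrained.
z_unsigned = trust_region(g_loss, k_kl, delta)

print(z_ascent, z_descent, z_unsigned)
```

Under these assumptions, z_descent is the mirror image of z_ascent, while z_unsigned equals g_loss, meaning the constraint would silently stop applying in this example if the sign were dropped.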

Kaixhin referenced this issue Jul 1, 2018

Implement efficient trust region

4326295

Closes #1

random-user-x closed this as completed Jul 4, 2018
