Transition Policy Gradients #10

Open
gyh75520 opened this issue Sep 25, 2018 · 1 comment
@gyh75520

From the paper:
"... in fact the proper form for the transition policy gradient arrived at in eqn. 10."

From the code:
manager_loss = -tf.reduce_sum((self.r-cutoff_vf_manager)*dcos)

Why not implement eqn. 10 directly?
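
For context, here is roughly what that quoted line computes. This is only a minimal sketch with assumed tensor names (`s_diff`, `goals`, `returns`, `manager_value` are placeholders, not the repository's exact variables):

```python
import tensorflow as tf

def manager_loss(s_diff, goals, returns, manager_value):
    """Sketch of the eqn.-7-style manager objective: advantage-weighted cosine
    similarity between the realised state change s_{t+c} - s_t and the goal g_t.
    Tensor names are illustrative placeholders."""
    # d_cos(s_{t+c} - s_t, g_t): cosine similarity per time step
    dcos = tf.reduce_sum(
        tf.nn.l2_normalize(s_diff, axis=-1) * tf.nn.l2_normalize(goals, axis=-1),
        axis=-1)
    # manager advantage A^M_t = R_t - V^M(x_t); the value baseline is cut off
    # from the gradient (presumably what "cutoff_vf_manager" refers to above)
    advantage = returns - tf.stop_gradient(manager_value)
    # minimise the negative, i.e. ascend the advantage-weighted cosine similarity
    return -tf.reduce_sum(advantage * dcos)
```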

@biggzlar

biggzlar commented Jan 30, 2019

Because the simpler, heuristic form in eqn. 7 is in fact the proper form of the more complex and (probably) less robust eqn. 10. Eqn. 10 is the gradient of a policy over states, whereas eqn. 7 works with directions in state space.
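
For reference, the two forms as I remember them from the paper (written from memory, so the notation may differ slightly from the published version):

```latex
% Eqn. 7 -- heuristic manager update actually used: advantage-weighted cosine
% similarity between the realised state change and the goal direction.
\[
  \nabla g_t \;=\; A^{M}_{t}\,\nabla_{\theta}\, d_{\cos}\!\bigl(s_{t+c}-s_{t},\, g_{t}(\theta)\bigr),
  \qquad A^{M}_{t} \;=\; R_t - V^{M}_{t}(x_t,\theta).
\]

% Eqn. 10 -- transition policy gradient: the gradient of a policy over end
% states s_{t+c}, rather than over directions in state space.
\[
  \nabla_{\theta}\,\pi^{\mathrm{TP}}_{t} \;=\;
  \mathbb{E}\bigl[(R_t - V(x_t))\,\nabla_{\theta}\log p(s_{t+c}\mid s_t,\theta)\bigr].
\]
```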

Here's an intuition: we tell the agent to find a real-world address. Eqn. 10 suggests intermediate addresses to help the agent find the final one, and the agent is rewarded every time it reaches one of the suggested addresses. Eqn. 7 instead suggests directions towards intermediate addresses, and the agent is rewarded as soon as it moves in that direction (so if the agent acts well, it is rewarded all the time, rather than sparsely).
