
Question about action selection #12

Closed
rajcscw opened this issue Feb 25, 2018 · 2 comments

rajcscw commented Feb 25, 2018

In the case of policy gradients, we approximate a softmax policy and sample actions stochastically according to its probabilities.

What about ES with a discrete action space? Does the method follow a greedy policy or a softmax policy? From the code, it looks like a greedy policy; is that the intended behavior?
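
To make the distinction concrete, here is a minimal sketch (illustrative, not taken from this repository) of the two selection rules applied to the same logits:

```python
import torch

logits = torch.tensor([2.0, 0.5, 1.0])  # network output for 3 discrete actions

# Policy-gradient style: sample stochastically from the softmax distribution.
probs = torch.softmax(logits, dim=-1)
stochastic_action = torch.multinomial(probs, num_samples=1).item()

# Greedy style (what the ES code appears to do): always take the argmax.
greedy_action = torch.argmax(logits).item()
```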

@atgambardella
Copy link
Owner

It isn't necessary to use a stochastic policy during training for ES, as we don't need to take noisy actions to explore; the exploration is done solely through noise on the weights.
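
For concreteness, here is a minimal sketch of that idea, assuming vanilla ES in the style of Salimans et al., "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" (2017). This is illustrative rather than the repository's actual training loop; `es_step`, `evaluate`, and all parameters are hypothetical names:

```python
import torch

def es_step(theta, evaluate, pop_size=50, sigma=0.05, lr=0.01):
    """One generation of vanilla ES. `evaluate` runs a full episode with
    the given flat parameter vector and returns the episode return; the
    rollout itself can pick actions greedily, since exploration comes
    entirely from the Gaussian noise added to the weights below."""
    epsilons, returns = [], []
    for _ in range(pop_size):
        eps = torch.randn_like(theta)  # weight-space noise
        returns.append(evaluate(theta + sigma * eps))
        epsilons.append(eps)
    returns = torch.tensor(returns)
    # Standardize returns and step theta along the estimated gradient.
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = sum(a * e for a, e in zip(advantages, epsilons)) / (pop_size * sigma)
    return theta + lr * grad

# Toy usage: maximize -||theta||^2, whose optimum is theta = 0.
theta = torch.randn(10)
for _ in range(200):
    theta = es_step(theta, lambda t: -t.pow(2).sum().item())
```

The rollout inside `evaluate` can act greedily (argmax) at every step, because the Gaussian perturbations of `theta` across the population already provide the exploration that noisy actions provide in policy gradients.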

rajcscw (Author) commented Mar 4, 2018

Thanks for clearing this up. I referred to the paper again, and it seems they use a deterministic policy, as you said.
