ExplorationPolicies don't work with stepthrough #541
I'm not sure about the history of the `ExplorationPolicy` interface. Most of the simulators call `action(policy, s)` with just the policy and the current state. From the documentation for the exploration policies, they only implement `action(exploration_policy, on_policy, k, s)`, which additionally takes the on-policy and the exploration schedule step `k`, so the two-argument method that the simulators need is never defined for them.
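To illustrate the mismatch, here is a self-contained sketch (the type and method names mirror my reading of the POMDPs.jl interfaces, but everything below is a mock, not the actual package code):

```julia
# Sketch of the signature mismatch (hypothetical mock types, not POMDPs.jl itself).
# Simulators invoke the two-argument form action(policy, s), while the
# exploration-policy interface defines only the four-argument form
# action(exploration_policy, on_policy, k, s), so the two-argument call
# on an exploration policy raises a MethodError.

abstract type Policy end
abstract type ExplorationPolicy <: Policy end

struct Greedy <: Policy end

struct EpsGreedy <: ExplorationPolicy
    eps::Float64   # exploration probability
end

# Ordinary policies implement the two-argument method simulators call:
action(p::Greedy, s) = :greedy_action

# Exploration policies implement only the four-argument method:
action(p::EpsGreedy, on_policy::Policy, k, s) =
    rand() < p.eps ? :random_action : action(on_policy, s)

# action(EpsGreedy(0.1), :s)  # would throw MethodError: no 2-arg method
```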
Based on the current documentation, this behavior is expected. However, there is probably a good argument for redefining how we construct the exploration policies so that the on-policy is stored in the policy struct itself. Since I am not familiar with the background of the development here, I am not confident about secondary issues; it would be a breaking change, since we would be redefining the structs of those policies.
Also, see #497.
Yeah, the exploration policy interface was designed for reinforcement learning solvers where the exploration should be decayed, but it is not really a standalone `Policy`. If you just want an epsilon-greedy policy for a rollout, I'd recommend:
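The code in this recommendation was lost in this capture. A plausible sketch of the idea is a plain policy with a fixed epsilon that exposes the state-only call simulators expect; everything below (the `FixedEpsGreedy` struct and `act` function) is illustrative, not the actual POMDPTools API:

```julia
using Random

# Illustrative stand-alone sketch: a fixed-epsilon greedy rollout policy
# that needs only the state, so simulators calling action(policy, s) work.
struct FixedEpsGreedy{P,A}
    eps::Float64            # exploration probability (not decayed)
    greedy::P               # callable: s -> greedy action
    actions::Vector{A}      # action space to sample from when exploring
    rng::AbstractRNG
end

function act(p::FixedEpsGreedy, s)
    if rand(p.rng) < p.eps
        return rand(p.rng, p.actions)   # explore: uniform random action
    else
        return p.greedy(s)              # exploit: follow the greedy policy
    end
end
```

In POMDPs.jl terms this would subtype `Policy` and implement the two-argument `action` method; with `eps = 0` it degenerates to the wrapped greedy policy.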
Closing. Please continue the discussion at #497.
I'm trying to sample beliefs using the implemented exploration policies (`SoftmaxPolicy` and `EpsGreedyPolicy`), but they don't work with `stepthrough` or the other simulator techniques that I've tried.

Steps to recreate:
Error: