-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use different rollout depth than the tree construction depth. #88
Comments
@BoZenKhaa Thanks a bunch for thinking through this. I actually think we should change it to function like it does in the alg4dm book: https://algorithmsbook.com/files/dm.pdf#%5B%7B%22num%22%3A153%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C59.776%2C238.344%2Cnull%5D That is, the value estimates are by default infinite-horizon. We can then add an option to the RolloutEstimator for the maximum depth. This would be a breaking change, but we can increment the version number appropriately. What do you think of that? |
I will have a look at the relevant section in the book, but the way I learned about MCTS, rollouts to the terminal state are the "default" for me, so the current behavior was a surprise to me. I think your proposal sounds good. If the infinite rollout is to be a default, then there is no way to do it without break. Maybe the default rollout could also have some epsilon defined for MDPs with infinite horizon? |
Closed with #91 |
Currently, when using the rollout estimator, the rollouts end at the same depth as the MCTS tree construction. This can be solved by redefining the rollouts in e.g. this way (for rollouts terminating in terminal states):
Providing custom policy for the
RolloutEstimator
in theestimate_value
parameter does not change the depth, since the depth is handled in the rollout.To me, this behavior seems somewhat unintuitive, and the solution is undocumented as far as I can tell.
I think there are few options I can think of to address this:
MCTSSolver
, such asrollout_depth
, that will be passed to theestimate_value
instead of thedepth
variable. For backward compatibility, this could be a function, defaulting todepth->depth
.domain_knowledge.jl
that handle the different depth.Overall, the option 1. is the easiest, option 2. seems hacky, since not all estimators are rollouts. Option 3. makes most sense to me, maybe with some small rewrite to make the new methods play well with the rest of the
domain_knowledge.jl
.I can try to prepare the PR for either option, but after seeing #38, I thought there might be an even better way of handling this?
The text was updated successfully, but these errors were encountered: