Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use different rollout depth than the tree construction depth. #88

Closed
BoZenKhaa opened this issue Feb 3, 2022 · 3 comments
Closed

Use different rollout depth than the tree construction depth. #88

BoZenKhaa opened this issue Feb 3, 2022 · 3 comments

Comments

@BoZenKhaa
Copy link
Contributor

BoZenKhaa commented Feb 3, 2022

Currently, when using the rollout estimator, the rollouts end at the same depth as the MCTS tree construction. This can be solved by redefining the rollouts in e.g. this way (for rollouts terminating in terminal states):

function MCTS.rollout(estimator::MCTS.SolvedRolloutEstimator, mdp::MDP, s, d::Int)
    sim = RolloutSimulator(;estimator.rng, eps=nothing, max_steps=nothing)
    POMDPs.simulate(sim, mdp, estimator.policy, s)
end

Providing custom policy for the RolloutEstimator in the estimate_value parameter does not change the depth, since the depth is handled in the rollout.

To me, this behavior seems somewhat unintuitive, and the solution is undocumented as far as I can tell.

I think there are few options I can think of to address this:

  1. Add this example to the docs and to the domain knowledge notebook.
  2. Add new parameter to the MCTSSolver, such as rollout_depth, that will be passed to the estimate_value instead of the depth variable. For backward compatibility, this could be a function, defaulting to depth->depth.
  3. Provide new methods for the domain_knowledge.jl that handle the different depth.

Overall, the option 1. is the easiest, option 2. seems hacky, since not all estimators are rollouts. Option 3. makes most sense to me, maybe with some small rewrite to make the new methods play well with the rest of the domain_knowledge.jl.

I can try to prepare the PR for either option, but after seeing #38, I thought there might be an even better way of handling this?

@zsunberg
Copy link
Member

@BoZenKhaa Thanks a bunch for thinking through this.

I actually think we should change it to function like it does in the alg4dm book: https://algorithmsbook.com/files/dm.pdf#%5B%7B%22num%22%3A153%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C59.776%2C238.344%2Cnull%5D

That is, the value estimates are by default infinite-horizon. We can then add an option to the RolloutEstimator for the maximum depth.

This would be a breaking change, but we can increment the version number appropriately. What do you think of that?

@BoZenKhaa
Copy link
Contributor Author

I will have a look at the relevant section in the book, but the way I learned about MCTS, rollouts to the terminal state are the "default" for me, so the current behavior was a surprise to me.

I think your proposal sounds good. If the infinite rollout is to be a default, then there is no way to do it without break. Maybe the default rollout could also have some epsilon defined for MDPs with infinite horizon?

@zsunberg
Copy link
Member

Closed with #91

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants