
ExplorationPolicies don't work with stepthrough #541

Closed
FlyingWorkshop opened this issue Mar 6, 2024 · 4 comments

@FlyingWorkshop (Member)
I'm trying to sample beliefs using the implemented exploration policies (SoftmaxPolicy and EpsGreedyPolicy), but they don't work with stepthrough or the other simulator techniques I've tried.

Steps to recreate:

using POMDPs
using POMDPModels
using POMDPTools

pomdp = TigerPOMDP()
policy = EpsGreedyPolicy(pomdp, 0.05)
beliefs = [b for b in stepthrough(pomdp, policy, DiscreteUpdater(pomdp), "b", max_steps=20)]

Error:

ERROR: MethodError: no method matching action(::EpsGreedyPolicy{POMDPTools.Policies.var"#20#21"{…}, Random.TaskLocalRNG, TigerPOMDP}, ::DiscreteBelief{TigerPOMDP, Bool})

Closest candidates are:
  action(::Starve, ::Any)
   @ POMDPModels ~/.julia/packages/POMDPModels/eZX2K/src/CryingBabies.jl:65
  action(::FeedWhenCrying, ::Any)
   @ POMDPModels ~/.julia/packages/POMDPModels/eZX2K/src/CryingBabies.jl:85
  action(::AlwaysFeed, ::Any)
   @ POMDPModels ~/.julia/packages/POMDPModels/eZX2K/src/CryingBabies.jl:69
  ...

Stacktrace:
 [1] action_info(p::EpsGreedyPolicy{…}, x::DiscreteBelief{…})
   @ POMDPTools.ModelTools ~/.julia/packages/POMDPTools/7Rekv/src/ModelTools/info.jl:12
 [2] iterate
   @ ~/.julia/packages/POMDPTools/7Rekv/src/Simulators/stepthrough.jl:91 [inlined]
 [3] iterate
   @ ~/.julia/packages/POMDPTools/7Rekv/src/Simulators/stepthrough.jl:85 [inlined]
 [4] iterate
   @ ./generator.jl:44 [inlined]
 [5] grow_to!
   @ ./array.jl:907 [inlined]
 [6] collect(itr::Base.Generator{POMDPTools.Simulators.POMDPSimIterator{…}, typeof(identity)})
   @ Base ./array.jl:831
 [7] top-level scope
   @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
@dylan-asmar (Member) commented Mar 7, 2024

I'm not sure about the history of the ExplorationPolicy abstract type, but it doesn't appear to be designed to work with the built-in simulators like stepthrough.

Most of the simulators call action_info(policy, state) to get the action (note: by default, action_info calls action(policy, state) and returns nothing for the info).

From the documentation for the ExplorationPolicy type:

Sampling from an exploration policy is done using action(exploration_policy, on_policy, k, state), where k is used to determine the exploration parameter.
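For concreteness, here is a minimal sketch of that documented four-argument call (RandomPolicy is used here as a stand-in for the on-policy, and k = 1 is just an illustrative schedule step; any Policy and k would do):

using POMDPs
using POMDPModels
using POMDPTools

pomdp = TigerPOMDP()
exploration = EpsGreedyPolicy(pomdp, 0.05)
on_policy = RandomPolicy(pomdp)        # stand-in for whatever policy you want to exploit
s = rand(initialstate(pomdp))
a = action(exploration, on_policy, 1, s)  # four-argument form; works outside the simulators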

Based on the current documentation, this behavior is expected. However, there is probably a good argument for redefining how we construct the exploration policies so that on_policy and k are part of the struct; then we could define action(policy::ExplorationPolicy, state) appropriately, as sketched below.

Since I am not familiar with the background of the development here, I am not confident about secondary issues; it would be a breaking change because we would be redefining the structs of those policies.
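As a hypothetical illustration of that idea (not the current API; WrappedExplorationPolicy and its fields are invented names), a wrapper along these lines would let the existing exploration policies satisfy the standard Policy interface:

using POMDPs
using POMDPTools

# Hypothetical wrapper: folds the on_policy and k arguments into the struct
# so the standard action(policy, state) interface applies.
struct WrappedExplorationPolicy{E, P} <: POMDPs.Policy
    exploration::E   # e.g. an EpsGreedyPolicy
    on_policy::P     # the policy to exploit when not exploring
    k::Int           # exploration-schedule parameter
end

POMDPs.action(p::WrappedExplorationPolicy, s) = action(p.exploration, p.on_policy, p.k, s)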

@dylan-asmar (Member)

Also, reference #497

@zsunberg (Member)

Yeah, the exploration policy interface was designed for reinforcement learning solvers where the exploration should be decayed, but it is not really a Policy. I would not object to a re-design of that interface.

If you just want an epsilon-greedy policy for a rollout, I'd recommend:

using POMDPs

# Epsilon-greedy wrapper that implements the standard Policy interface,
# so it works with stepthrough and the other built-in simulators.
struct MyEpsGreedy{M, P} <: POMDPs.Policy
    pomdp::M
    original_policy::P
    epsilon::Float64
end

function POMDPs.action(p::MyEpsGreedy, s)
    if rand() < p.epsilon
        return rand(actions(p.pomdp))        # explore: uniform random action
    else
        return action(p.original_policy, s)  # exploit the wrapped policy
    end
end

policy = MyEpsGreedy(pomdp, original_policy, 0.05)
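For completeness, a sketch of how this wrapper slots into the stepthrough call from the original report (original_policy here is assumed to be any POMDPs.Policy; RandomPolicy is just a placeholder for a solved policy):

using POMDPModels
using POMDPTools

pomdp = TigerPOMDP()
original_policy = RandomPolicy(pomdp)  # assumed stand-in; substitute your solved policy
policy = MyEpsGreedy(pomdp, original_policy, 0.05)
beliefs = [b for b in stepthrough(pomdp, policy, DiscreteUpdater(pomdp), "b", max_steps=20)]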

@dylan-asmar (Member)

Closing. Please continue the discussion at #497.
