
ExplorationPolicies don't work with stepthrough #541

Closed
FlyingWorkshop opened this issue Mar 6, 2024 · 4 comments

@FlyingWorkshop (Member)
I'm trying to sample beliefs using the implemented exploration policies (SoftmaxPolicy and EpsGreedyPolicy), but they don't work with stepthrough or the other simulator techniques I've tried.

Steps to recreate:

using POMDPs
using POMDPModels
using POMDPTools

pomdp = TigerPOMDP()
policy = EpsGreedyPolicy(pomdp, 0.05)
beliefs = [b for b in stepthrough(pomdp, policy, DiscreteUpdater(pomdp), "b", max_steps=20)]

Error:

ERROR: MethodError: no method matching action(::EpsGreedyPolicy{POMDPTools.Policies.var"#20#21"{…}, Random.TaskLocalRNG, TigerPOMDP}, ::DiscreteBelief{TigerPOMDP, Bool})

Closest candidates are:
  action(::Starve, ::Any)
   @ POMDPModels ~/.julia/packages/POMDPModels/eZX2K/src/CryingBabies.jl:65
  action(::FeedWhenCrying, ::Any)
   @ POMDPModels ~/.julia/packages/POMDPModels/eZX2K/src/CryingBabies.jl:85
  action(::AlwaysFeed, ::Any)
   @ POMDPModels ~/.julia/packages/POMDPModels/eZX2K/src/CryingBabies.jl:69
  ...

Stacktrace:
 [1] action_info(p::EpsGreedyPolicy{…}, x::DiscreteBelief{…})
   @ POMDPTools.ModelTools ~/.julia/packages/POMDPTools/7Rekv/src/ModelTools/info.jl:12
 [2] iterate
   @ ~/.julia/packages/POMDPTools/7Rekv/src/Simulators/stepthrough.jl:91 [inlined]
 [3] iterate
   @ ~/.julia/packages/POMDPTools/7Rekv/src/Simulators/stepthrough.jl:85 [inlined]
 [4] iterate
   @ ./generator.jl:44 [inlined]
 [5] grow_to!
   @ ./array.jl:907 [inlined]
 [6] collect(itr::Base.Generator{POMDPTools.Simulators.POMDPSimIterator{…}, typeof(identity)})
   @ Base ./array.jl:831
 [7] top-level scope
   @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.
@dylan-asmar (Member) commented Mar 7, 2024

I'm not sure about the history of the ExplorationPolicy abstract type, but it doesn't appear to be designed to work with the built-in simulators like stepthrough.

Most of the simulators call action_info(policy, state) to get the action (note: by default, action_info calls action(policy, state) and returns nothing for the info).

From the documentation for the ExplorationPolicy type:

Sampling from an exploration policy is done using action(exploration_policy, on_policy, k, state), where k is used to determine the exploration parameter.
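For concreteness, here is a minimal sketch of that documented four-argument call (RandomPolicy is used here as a stand-in for the on-policy, and k = 1 is just an illustrative schedule step; any Policy and k would do):

using POMDPs
using POMDPModels
using POMDPTools

pomdp = TigerPOMDP()
exploration = EpsGreedyPolicy(pomdp, 0.05)
on_policy = RandomPolicy(pomdp)        # stand-in for whatever policy you want to exploit
s = rand(initialstate(pomdp))
a = action(exploration, on_policy, 1, s)  # four-argument form; works outside the simulators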

Based on the current documentation, this behavior is expected. However, there is probably a good argument for redefining how we construct the exploration policies so that on_policy and k are part of the struct; then we could define action(policy::ExplorationPolicy, state) appropriately, as sketched below.

Since I am not familiar with the background of the development here, I am not confident about secondary issues; it would be a breaking change because we would be redefining the structs of those policies.
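As a hypothetical illustration of that idea (not the current API; WrappedExplorationPolicy and its fields are invented names), a wrapper along these lines would let the existing exploration policies satisfy the standard Policy interface:

using POMDPs
using POMDPTools

# Hypothetical wrapper: folds the on_policy and k arguments into the struct
# so the standard action(policy, state) interface applies.
struct WrappedExplorationPolicy{E, P} <: POMDPs.Policy
    exploration::E   # e.g. an EpsGreedyPolicy
    on_policy::P     # the policy to exploit when not exploring
    k::Int           # exploration-schedule parameter
end

POMDPs.action(p::WrappedExplorationPolicy, s) = action(p.exploration, p.on_policy, p.k, s)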

@dylan-asmar (Member)

Also, reference #497

@zsunberg (Member)

Yeah, the exploration policy interface was designed for reinforcement learning solvers where the exploration should be decayed, but it is not really a Policy. I would not object to a re-design of that interface.

If you just want an epsilon-greedy policy for a rollout, I'd recommend:

using POMDPs

# Epsilon-greedy wrapper that implements the standard Policy interface,
# so it works with stepthrough and the other built-in simulators.
struct MyEpsGreedy{M, P} <: POMDPs.Policy
    pomdp::M
    original_policy::P
    epsilon::Float64
end

function POMDPs.action(p::MyEpsGreedy, s)
    if rand() < p.epsilon
        return rand(actions(p.pomdp))        # explore: uniform random action
    else
        return action(p.original_policy, s)  # exploit the wrapped policy
    end
end

policy = MyEpsGreedy(pomdp, original_policy, 0.05)
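For completeness, a sketch of how this wrapper slots into the stepthrough call from the original report (original_policy here is assumed to be any POMDPs.Policy; RandomPolicy is just a placeholder for a solved policy):

using POMDPModels
using POMDPTools

pomdp = TigerPOMDP()
original_policy = RandomPolicy(pomdp)  # assumed stand-in; substitute your solved policy
policy = MyEpsGreedy(pomdp, original_policy, 0.05)
beliefs = [b for b in stepthrough(pomdp, policy, DiscreteUpdater(pomdp), "b", max_steps=20)]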

@dylan-asmar (Member)

Closing. Please continue the discussion at #497.
