alphapairs and alphavectors
zsunberg committed Jun 5, 2019
1 parent 090f1a2 commit 45e59c8
Showing 4 changed files with 15 additions and 7 deletions.
5 changes: 3 additions & 2 deletions docs/src/alpha_vector.md
@@ -1,12 +1,13 @@
# Alpha Vector Policy

Represents a policy with a set of alpha vectors (See `AlphaVectorPolicy` constructor docstring). In addition to finding the optimal action with `action`, pairs mapping each alpha vector to an action can be iterated through with [`alpha_actions`](@ref).
Represents a policy with a set of alpha vectors (See `AlphaVectorPolicy` constructor docstring). In addition to finding the optimal action with `action`, the alpha vectors can be accessed with [`alphavectors`](@ref) or [`alphapairs`](@ref).

Determining the estimated value and optimal action depends on calculating the dot product between alpha vectors and a belief vector. [`POMDPPolicies.beliefvec(pomdp, b)`](@ref) is used to create this vector and can be overridden for new belief types for efficiency.


```@docs
AlphaVectorPolicy
alpha_actions
alphavectors
alphapairs
POMDPPolicies.beliefvec
```
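For context, a quick usage sketch of the two new accessors. The construction below follows the test file in this commit; the model and alpha-vector values are illustrative only:

```julia
using POMDPModels, POMDPPolicies

pomdp = BabyPOMDP()
# one alpha vector per column, matching the matrix constructor form
alphas = [-16.0629 -19.4557; -36.5093 -29.4557]
policy = AlphaVectorPolicy(pomdp, alphas)

# iterate alpha vector => action pairs
for (alpha, a) in alphapairs(policy)
    println(alpha, " => ", a)
end

# or get just the vectors
vs = alphavectors(policy)
```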
3 changes: 2 additions & 1 deletion src/POMDPPolicies.jl
@@ -25,7 +25,8 @@ export

export
AlphaVectorPolicy,
alpha_actions
alphavectors,
alphapairs

include("alpha_vector.jl")

11 changes: 8 additions & 3 deletions src/alpha_vector.jl
@@ -11,7 +11,7 @@ Construct a policy from alpha vectors.
Represents a policy with a set of alpha vectors.
Use `action` to get the best action for a belief, an `alpha_actions` to iterate through alpha-vector => action pairs.
Use `action` to get the best action for a belief, and `alphavectors` and `alphapairs` to access the alpha vectors and alpha vector => action pairs.
# Fields
- `pomdp::P` the POMDP problem
@@ -43,9 +43,14 @@ end
updater(p::AlphaVectorPolicy) = DiscreteUpdater(p.pomdp)

"""
Return an iterator of alpha-vector => action pairs in the policy.
Return an iterator of alpha vector-action pairs in the policy.
"""
alpha_actions(p::AlphaVectorPolicy) = (p.alphas[i]=>p.action_map[i] for i in 1:length(p.alphas))
alphapairs(p::AlphaVectorPolicy) = (p.alphas[i]=>p.action_map[i] for i in 1:length(p.alphas))

"""
Return the alpha vectors.
"""
alphavectors(p::AlphaVectorPolicy) = p.alphas

# The three functions below rely on beliefvec being implemented for the belief type
# Implementations of beliefvec are below
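The dot-product computation described in the docs change above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's exact implementation, and `sketch_value` is a hypothetical name:

```julia
using LinearAlgebra
using POMDPPolicies

# value of a belief = max over alpha vectors of alpha ⋅ beliefvec(pomdp, b)
function sketch_value(p::AlphaVectorPolicy, b)
    bvec = POMDPPolicies.beliefvec(p.pomdp, b)   # convert the belief to a vector
    return maximum(dot(alpha, bvec) for alpha in p.alphas)
end
```

Overriding `POMDPPolicies.beliefvec` for a custom belief type changes only the conversion step; the maximization over dot products stays the same.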
3 changes: 2 additions & 1 deletion test/test_alpha_policy.jl
@@ -9,7 +9,8 @@ let
alphas = [ -16.0629 -19.4557; -36.5093 -29.4557]
policy = AlphaVectorPolicy(pomdp, alphas)

@test collect(alpha_actions(policy)) == [[-16.0629, -36.5093]=>false, [-19.4557, -29.4557]=>true]
@test Set(alphapairs(policy)) == Set([[-16.0629, -36.5093]=>false, [-19.4557, -29.4557]=>true])
@test Set(alphavectors(policy)) == Set([[-16.0629, -36.5093], [-19.4557, -29.4557]])

# initial belief is 100% confidence in baby not being hungry
@test isapprox(value(policy, b0), -16.0629)
