
POMDPs v0.8 Compatibility #26

Merged
merged 10 commits on Sep 20, 2019
6 changes: 4 additions & 2 deletions Project.toml
@@ -1,7 +1,7 @@
name = "POMDPModelTools"
uuid = "08074719-1b2a-587c-a292-00f91cc44415"
authors = ["JuliaPOMDP Contributors"]
version = "0.1.7"
version = "0.2.0"

[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
@@ -14,14 +14,16 @@ UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228"

[compat]
Distributions = ">= 0.17"
POMDPs = "0.7.3, 0.8.0"
julia = "1"

[extras]
BeliefUpdaters = "8bb6e9a1-7d73-552c-a44a-e5dc5634aac4"
POMDPModels = "355abbd5-f08e-5560-ac9e-8b5f2592a0ca"
POMDPPolicies = "182e52fb-cfd0-5e46-8c26-fd0667c990f4"
POMDPSimulators = "e0d0a172-29c6-5d4e-96d0-f262df5d01fd"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "POMDPModels", "POMDPSimulators", "POMDPPolicies", "Pkg"]
test = ["Test", "POMDPModels", "POMDPSimulators", "POMDPPolicies", "BeliefUpdaters", "Pkg"]
10 changes: 3 additions & 7 deletions docs/make.jl
@@ -1,17 +1,13 @@
push!(LOAD_PATH, "../src/")

using Documenter, POMDPModelTools

makedocs(
modules = [POMDPModelTools],
format = :html,
format = Documenter.HTML(),
sitename = "POMDPModelTools.jl"
)

deploydocs(
repo = "github.com/JuliaPOMDP/POMDPModelTools.jl.git",
julia = "1.0",
osname = "linux",
target = "build",
deps = nothing,
make = nothing
)

23 changes: 0 additions & 23 deletions docs/mkdocs.yml

This file was deleted.

2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -1,6 +1,6 @@
# About

POMDPModelTools is a collection of interface extensions and tools to make writing models and solvers for [POMDPs.jl](github.com/JuliaPOMDP/POMDPs.jl) easier.
POMDPModelTools is a collection of interface extensions and tools to make writing models and solvers for [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl) easier.

```@contents
```
10 changes: 8 additions & 2 deletions docs/src/interface_extensions.md
@@ -35,9 +35,15 @@ ordered_observations

Functions in POMDPs.jl often generate useful information besides the belief, state, action, etc. This information can help with debugging or with understanding the behavior of a solver, updater, or problem. The info interface provides a standard way for problems, policies, solvers, or updaters to output this information. The recording simulators from [POMDPSimulators.jl](https://github.com/JuliaPOMDP/POMDPSimulators.jl) automatically record this information.

To specify info for a problem (in POMDPs v0.8 and above), one should modify the problem's DDN with the `add_infonode` function, then return the info in `gen`. There is an example of this pattern in the docstring below:

```@docs
add_infonode
```

To specify info from policies, solvers, or updaters, implement the following functions:

```@docs
generate_sri
generate_sori
action_info
solve_info
update_info
Expand Down
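The `*_info` convention documented above can be sketched in plain Julia. Everything below is illustrative: `LookupPolicy` and its `action` method are hypothetical stand-ins, not part of POMDPModelTools; the point is only that an info-returning function returns the same result as its plain counterpart plus one extra info object.

```julia
# Hypothetical policy type used only to illustrate the info convention.
struct LookupPolicy
    table::Dict{Int,Int}
end

# plain version: just the action
action(p::LookupPolicy, s) = get(p.table, s, 0)

# info-returning variant: same action, plus diagnostics (a NamedTuple here)
function action_info(p::LookupPolicy, s)
    a = action(p, s)
    return a, (state=s, known=haskey(p.table, s))
end

p = LookupPolicy(Dict(1 => 2))
a, info = action_info(p, 1)   # a == 2, info.known == true
```

A recording simulator can then store `info` at each step without the policy's plain `action` interface changing.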
13 changes: 7 additions & 6 deletions src/POMDPModelTools.jl
@@ -6,9 +6,9 @@ using LinearAlgebra
using SparseArrays
using UnicodePlots

import POMDPs: actions, n_actions, actionindex
import POMDPs: states, n_states, stateindex
import POMDPs: observations, n_observations, obsindex
import POMDPs: actions, actionindex
import POMDPs: states, stateindex
import POMDPs: observations, obsindex
import POMDPs: sampletype, generate_sr, initialstate, isterminal, discount
import POMDPs: implemented
import Distributions: pdf, mode, mean, support
@@ -22,11 +22,12 @@ include("visualization.jl")

# info interface
export
generate_sri,
generate_sori,
add_infonode,
action_info,
solve_info,
update_info
update_info,
generate_sri,
generate_sori
include("info.jl")

export
Expand Down
5 changes: 0 additions & 5 deletions src/distributions/bool.jl
@@ -23,11 +23,6 @@ function Base.iterate(d::BoolDistribution, state::Bool)
end

support(d::BoolDistribution) = [true, false]

==(d1::BoolDistribution, d2::BoolDistribution) = d1.p == d2.p

Base.hash(d::BoolDistribution) = hash(d.p)

Base.length(d::BoolDistribution) = 2

Base.show(io::IO, m::MIME"text/plain", d::BoolDistribution) = showdistribution(io, m, d, title="BoolDistribution")
1 change: 1 addition & 0 deletions src/distributions/deterministic.jl
@@ -13,6 +13,7 @@ rand(rng::AbstractRNG, d::Deterministic) = d.val
rand(d::Deterministic) = d.val
support(d::Deterministic) = (d.val,)
sampletype(::Type{Deterministic{T}}) where T = T
Random.gentype(::Type{Deterministic{T}}) where T = T
pdf(d::Deterministic, x) = convert(Float64, x == d.val)
mode(d::Deterministic) = d.val
mean(d::Deterministic{N}) where N<:Number = d.val / 1 # / 1 is to make this return a similar type to Statistics.mean
Expand Down
1 change: 1 addition & 0 deletions src/distributions/sparse_cat.jl
@@ -90,6 +90,7 @@ end
Base.length(d::SparseCat) = min(length(d.vals), length(d.probs))
Base.eltype(D::Type{SparseCat{V,P}}) where {V, P} = Pair{eltype(V), eltype(P)}
sampletype(D::Type{SparseCat{V,P}}) where {V, P} = eltype(V)
Random.gentype(D::Type{SparseCat{V,P}}) where {V, P} = eltype(V)

function mean(d::SparseCat)
vsum = zero(eltype(d.vals))
Expand Down
2 changes: 2 additions & 0 deletions src/distributions/uniform.jl
@@ -24,6 +24,7 @@ end

support(d::Uniform) = d.set
sampletype(::Type{Uniform{T}}) where T = eltype(T)
Random.gentype(::Type{Uniform{T}}) where T = eltype(T)

function pdf(d::Uniform, s)
if s in d.set
Expand All @@ -49,6 +50,7 @@ end
pdf(d::UnsafeUniform, s) = 1.0/length(d.collection)
support(d::UnsafeUniform) = d.collection
sampletype(::Type{UnsafeUniform{T}}) where T = eltype(T)
Random.gentype(::Type{UnsafeUniform{T}}) where T = eltype(T)

# Common Implementations

Expand Down
54 changes: 37 additions & 17 deletions src/fully_observable_pomdp.jl
@@ -3,24 +3,33 @@

Turn `MDP` `mdp` into a `POMDP` where the observations are the states of the MDP.
"""
struct FullyObservablePOMDP{S, A} <: POMDP{S,A,S}
mdp::MDP{S, A}
struct FullyObservablePOMDP{M,S,A} <: POMDP{S,A,S}
mdp::M
end

function FullyObservablePOMDP(m::MDP)
return FullyObservablePOMDP{typeof(m), statetype(m), actiontype(m)}(m)
end

mdptype(::Type{FullyObservablePOMDP{M,S,A}}) where {M,S,A} = M

function POMDPs.DDNStructure(::Type{M}) where M <: FullyObservablePOMDP
MM = mdptype(M)
add_obsnode(DDNStructure(MM))
end

add_obsnode(ddn) = add_node(ddn, :o, FunctionDDNNode((m,sp)->sp), (:sp,)) # for ::DDNStructure, but DDNStructure is not yet declared in POMDPs v0.7.3

POMDPs.observations(pomdp::FullyObservablePOMDP) = states(pomdp.mdp)
POMDPs.n_observations(pomdp::FullyObservablePOMDP) = n_states(pomdp.mdp)
POMDPs.obsindex(pomdp::FullyObservablePOMDP{S, A}, o::S) where {S, A} = stateindex(pomdp.mdp, o)

POMDPs.convert_o(T::Type{V}, o, pomdp::FullyObservablePOMDP) where {V<:AbstractArray} = convert_s(T, o, pomdp.mdp)
POMDPs.convert_o(T::Type{S}, vec::V, pomdp::FullyObservablePOMDP) where {S,V<:AbstractArray} = convert_s(T, vec, pomdp.mdp)

POMDPs.gen(::DDNNode{:o}, m::FullyObservablePOMDP, sp, rng) = sp

function POMDPs.generate_o(pomdp::FullyObservablePOMDP, s, a, rng::AbstractRNG)
return s
end

function POMDPs.observation(pomdp::FullyObservablePOMDP, s, a)
return Deterministic(s)
function POMDPs.observation(pomdp::FullyObservablePOMDP, a, sp)
return Deterministic(sp)
end

function POMDPs.observation(pomdp::FullyObservablePOMDP, s, a, sp)
Expand All @@ -31,19 +40,30 @@ end

POMDPs.states(pomdp::FullyObservablePOMDP) = states(pomdp.mdp)
POMDPs.actions(pomdp::FullyObservablePOMDP) = actions(pomdp.mdp)
POMDPs.transition(pomdp::FullyObservablePOMDP{S,A}, s::S, a::A) where {S,A} = transition(pomdp.mdp, s, a)
POMDPs.transition(pomdp::FullyObservablePOMDP, s, a) = transition(pomdp.mdp, s, a)
POMDPs.initialstate_distribution(pomdp::FullyObservablePOMDP) = initialstate_distribution(pomdp.mdp)
POMDPs.initialstate(pomdp::FullyObservablePOMDP, rng::AbstractRNG) = initialstate(pomdp.mdp, rng)
POMDPs.generate_s(pomdp::FullyObservablePOMDP, s, a, rng::AbstractRNG) = generate_s(pomdp.mdp, s, a, rng)
POMDPs.generate_sr(pomdp::FullyObservablePOMDP, s, a, rng::AbstractRNG) = generate_sr(pomdp.mdp, s, a, rng)
POMDPs.reward(pomdp::FullyObservablePOMDP{S, A}, s::S, a::A) where {S,A} = reward(pomdp.mdp, s, a)
POMDPs.isterminal(pomdp::FullyObservablePOMDP, s) = isterminal(pomdp.mdp, s)
POMDPs.discount(pomdp::FullyObservablePOMDP) = discount(pomdp.mdp)
POMDPs.n_states(pomdp::FullyObservablePOMDP) = n_states(pomdp.mdp)
POMDPs.n_actions(pomdp::FullyObservablePOMDP) = n_actions(pomdp.mdp)
POMDPs.stateindex(pomdp::FullyObservablePOMDP{S,A}, s::S) where {S,A} = stateindex(pomdp.mdp, s)
POMDPs.actionindex(pomdp::FullyObservablePOMDP{S, A}, a::A) where {S,A} = actionindex(pomdp.mdp, a)
POMDPs.stateindex(pomdp::FullyObservablePOMDP, s) = stateindex(pomdp.mdp, s)
POMDPs.actionindex(pomdp::FullyObservablePOMDP, a) = actionindex(pomdp.mdp, a)
POMDPs.convert_s(T::Type{V}, s, pomdp::FullyObservablePOMDP) where V<:AbstractArray = convert_s(T, s, pomdp.mdp)
POMDPs.convert_s(T::Type{S}, vec::V, pomdp::FullyObservablePOMDP) where {S,V<:AbstractArray} = convert_s(T, vec, pomdp.mdp)
POMDPs.convert_a(T::Type{V}, a, pomdp::FullyObservablePOMDP) where V<:AbstractArray = convert_a(T, a, pomdp.mdp)
POMDPs.convert_a(T::Type{A}, vec::V, pomdp::FullyObservablePOMDP) where {A,V<:AbstractArray} = convert_a(T, vec, pomdp.mdp)

POMDPs.gen(d::DDNNode, m::FullyObservablePOMDP, args...) = gen(d, m.mdp, args...)
POMDPs.gen(m::FullyObservablePOMDP, s, a, rng) = gen(m.mdp, s, a, rng)
POMDPs.reward(pomdp::FullyObservablePOMDP, s, a) = reward(pomdp.mdp, s, a)

# deprecated in POMDPs v0.8
add_obsnode(ddn::POMDPs.DDNStructureV7{(:s,:a,:sp,:r)}) = POMDPs.DDNStructureV7{(:s,:a,:sp,:o,:r)}()
add_obsnode(ddn::POMDPs.DDNStructureV7) = error("FullyObservablePOMDP only supports MDPs with the standard DDN Structure (DDNStructureV7{(:s,:a,:sp,:r)}) with POMDPs v0.7.")

POMDPs.generate_s(pomdp::FullyObservablePOMDP, s, a, rng::AbstractRNG) = generate_s(pomdp.mdp, s, a, rng)
POMDPs.generate_sr(pomdp::FullyObservablePOMDP, s, a, rng::AbstractRNG) = generate_sr(pomdp.mdp, s, a, rng)
POMDPs.n_actions(pomdp::FullyObservablePOMDP) = n_actions(pomdp.mdp)
POMDPs.n_states(pomdp::FullyObservablePOMDP) = n_states(pomdp.mdp)
function POMDPs.generate_o(pomdp::FullyObservablePOMDP, s, rng::AbstractRNG)
return s
end
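The wrapper's defining property (observations coincide with states) can be sketched with minimal stand-in types. `Det` below mimics the package's `Deterministic` distribution and is not the actual type; the real wrapper delegates everything else to the underlying MDP as shown in the diff above.

```julia
# Minimal stand-in for a deterministic distribution (mimics Deterministic).
struct Det{T}
    val::T
end
pdf(d::Det, x) = x == d.val ? 1.0 : 0.0

# Fully observable: the observation node simply passes through the new state sp.
observation(sp) = Det(sp)

# The true state is always observed with probability 1.
p_same  = pdf(observation(3), 3)   # 1.0
p_other = pdf(observation(3), 4)   # 0.0
```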
11 changes: 8 additions & 3 deletions src/generative_belief_mdp.jl
@@ -14,15 +14,20 @@ function GenerativeBeliefMDP(pomdp::P, up::U) where {P<:POMDP, U<:Updater}
GenerativeBeliefMDP{P, U, typeof(b0), actiontype(pomdp)}(pomdp, up)
end

function generate_sr(bmdp::GenerativeBeliefMDP, b, a, rng::AbstractRNG)
function POMDPs.gen(bmdp::GenerativeBeliefMDP, b, a, rng::AbstractRNG)
s = rand(rng, b)
if isterminal(bmdp.pomdp, s)
bp = gbmdp_handle_terminal(bmdp.pomdp, bmdp.updater, b, s, a, rng::AbstractRNG)::typeof(b)
        return (sp=bp, r=0.0) # NamedTuple so gen's return type matches the non-terminal branch
end
sp, o, r = generate_sor(bmdp.pomdp, s, a, rng) # maybe this should have been generate_or?
sp, o, r = gen(DDNOut(:sp,:o,:r), bmdp.pomdp, s, a, rng) # maybe this should have been generate_or?
bp = update(bmdp.updater, b, a, o)
return bp, r
return (sp=bp, r=r)
end

function generate_sr(bmdp::GenerativeBeliefMDP, b, a, rng::AbstractRNG)
x = gen(bmdp, b, a, rng)
return x.sp, x.r
end

function initialstate(bmdp::GenerativeBeliefMDP, rng::AbstractRNG)
Expand Down
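The step performed by `gen` above (sample a state from the belief, simulate the POMDP, then update the belief on the sampled observation) can be sketched standalone. The model below is a hard-coded two-state example with illustrative names, not the POMDPs.jl API, and the exact discrete Bayes filter stands in for the generic `update` call.

```julia
using Random

# T[s, sp]: transition probabilities; Z[sp, o]: observation probabilities
const T = [0.9 0.1; 0.1 0.9]
const Z = [0.8 0.2; 0.2 0.8]

draw(rng, p) = rand(rng) < p[1] ? 1 : 2   # sample an index from a 2-vector

# One belief-MDP step: the "state" of the belief MDP is a probability
# vector b over POMDP states.
function belief_step(rng, b)
    s  = draw(rng, b)            # sample a state from the current belief
    sp = draw(rng, T[s, :])      # simulate the hidden transition
    o  = draw(rng, Z[sp, :])     # simulate the observation
    bp = (T' * b) .* Z[:, o]     # Bayes filter: predict, then correct
    return bp ./ sum(bp)         # normalized next belief
end

rng = MersenneTwister(1)
bp = belief_step(rng, [0.5, 0.5])
```

The terminal-state special case in the diff exists because a terminal state produces no informative observation, so the belief update has to be handled separately (`gbmdp_handle_terminal`).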
93 changes: 75 additions & 18 deletions src/info.jl
@@ -1,24 +1,6 @@
# functions for passing out info from simulations, similar to the info return from openai gym
# maintained by @zsunberg

"""
Return a tuple containing the next state and reward and information (usually a `NamedTuple`, `Dict` or `nothing`) from that step.

By default, returns `nothing` as info.
"""
function generate_sri(p::MDP, s, a, rng::AbstractRNG)
return generate_sr(p, s, a, rng)..., nothing
end

"""
Return a tuple containing the next state, observation, and reward and information (usually a `NamedTuple`, `Dict` or `nothing`) from that step.

By default, returns `nothing` as info.
"""
function generate_sori(p::POMDP, s, a, rng::AbstractRNG)
return generate_sor(p, s, a, rng)..., nothing
end

"""
a, ai = action_info(policy, x)

@@ -51,3 +33,78 @@ By default, returns `nothing` as info.
function update_info(up::Updater, b, a, o)
return update(up, b, a, o), nothing
end

# once POMDPs v0.8 is released, this should be a jldoctest
"""
add_infonode(ddn::DDNStructure)

Create a new DDNStructure object with a new node labeled :info for returning miscellaneous information about a simulation step.

Typically, the object in info is associative (i.e. a `Dict` or `NamedTuple`) with keys corresponding to different pieces of information.

# Example (using POMDPs v0.8)

```julia
using POMDPs, POMDPModelTools, POMDPPolicies, POMDPSimulators, Random

struct MyMDP <: MDP{Int, Int} end

# add the info node to the DDN
POMDPs.DDNStructure(::Type{MyMDP}) = mdp_ddn() |> add_infonode

# the dynamics involve two random numbers - here we record the values for each in info
function POMDPs.gen(m::MyMDP, s, a, rng)
r1 = rand(rng)
r2 = randn(rng)
return (sp=s+a+r1+r2, r=s^2, info=(r1=r1, r2=r2))
end

m = MyMDP()
@show nodenames(DDNStructure(m))
p = FunctionPolicy(s->1)
for (s,info) in stepthrough(m, p, 1, "s,info", max_steps=5, rng=MersenneTwister(2))
@show s
@show info
end
```
"""
function add_infonode(ddn) # for ::DDNStructure, but it is not declared in v0.7.3, so there is no annotation
add_node(ddn, :info, ConstantDDNNode(nothing), nodenames(ddn))
end

function add_infonode(ddn::POMDPs.DDNStructureV7{nodenames}) where nodenames
return POMDPs.DDNStructureV7{(nodenames..., :info)}()
end

###############################################################
# Note all generate functions will be deprecated in POMDPs v0.8
###############################################################


if DDNStructure(MDP) isa POMDPs.DDNStructureV7
Review thread on this line:

Contributor Author: shouldn't we check on the POMDPs.jl version instead?

Member: That's what I initially thought, but it didn't seem like there was a good way to do that, so this seems like a reliable enough proxy.

Contributor Author: something like `if Pkg.installed()["POMDPs"] < v"0.8.0"`?

Member: I don't know how to reply to your comment on this below. The problem with `Pkg.installed()` is that it can take a really long time (or at least it could in the past).

Member: and it adds Pkg as a dependency (which is probably fine)
"""
Return a tuple containing the next state and reward and information (usually a `NamedTuple`, `Dict` or `nothing`) from that step.

By default, returns `nothing` as info.
"""
function generate_sri(p::MDP, s, a, rng::AbstractRNG)
return generate_sr(p, s, a, rng)..., nothing
end

"""
Return a tuple containing the next state, observation, and reward and information (usually a `NamedTuple`, `Dict` or `nothing`) from that step.

By default, returns `nothing` as info.
"""
function generate_sori(p::POMDP, s, a, rng::AbstractRNG)
return generate_sor(p, s, a, rng)..., nothing
end

POMDPs.gen(::DDNOut{(:sp,:o,:r,:i)}, m, s, a, rng) = generate_sori(m, s, a, rng)
POMDPs.gen(::DDNOut{(:sp,:o,:r,:info)}, m, s, a, rng) = generate_sori(m, s, a, rng)
POMDPs.gen(::DDNOut{(:sp,:r,:i)}, m, s, a, rng) = generate_sri(m, s, a, rng)
POMDPs.gen(::DDNOut{(:sp,:r,:info)}, m, s, a, rng) = generate_sri(m, s, a, rng)
else
@deprecate generate_sri(args...) gen(DDNOut(:sp,:r,:info), args...)
@deprecate generate_sori(args...) gen(DDNOut(:sp,:o,:r,:info), args...)
end