# Defining a POMDP with the Generative Interface

In this tutorial we will define a version of the Crying Baby Problem [1] with the generative interface of POMDPs.jl. To find out more about the generative interface, please see this section of the POMDPs.jl documentation: [Generative POMDP Interface](http://juliapomdp.github.io/POMDPs.jl/latest/generative.html#generative_doc-1).

[1] Kochenderfer, Mykel J. Decision Making Under Uncertainty: Theory and Application. MIT Press, 2015

In [1]:
using POMDPs
using Random # for AbstractRNG
using POMDPModelTools # for Deterministic

## Model type

We begin by implementing the type that represents the model itself. First, some notes about the Crying Baby POMDP:

- The *state* is a Boolean value representing whether the baby is hungry.
- The *action* is a Boolean indicating whether the agent feeds the baby.
- The *observation* is a Boolean indicating whether the baby is crying.
- When the baby is hungry, it cries with high probability; when it is full, it cries with low probability.
- When the baby is fed, it immediately becomes full.
- At each step, a baby that is full and not fed has a small probability of becoming hungry.

The `BabyPOMDP` model type is a subtype of the [`POMDP`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.POMDP) abstract type. `POMDP` has three type parameters that indicate the types used to represent the state, action, and observation, respectively; all three are `Bool` in this case. The type also stores the parameters used in the reward, transition, and observation definitions, along with the discount factor.

In [2]:
struct BabyPOMDP <: POMDP{Bool, Bool, Bool}
    r_feed::Float64
    r_hungry::Float64
    p_become_hungry::Float64
    p_cry_when_hungry::Float64
    p_cry_when_not_hungry::Float64
    discount::Float64   
end

BabyPOMDP() = BabyPOMDP(-5., -10., 0.1, 0.8, 0.1, 0.9);

## State transitions

Next we describe the *state transitions*. The [`generate_s`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.generate_s) function controls how the state transitions by generating a state for the next time step given the current state (`s`) and action (`a`).

The `rng` argument is a random number generator that should be used to generate all random numbers within the function for the sake of reproducibility. To find out more about random number generation in POMDPs.jl, consult the documentation [TODO: add link].

For the crying baby problem, if the baby is fed (`a` = `true`), the baby always becomes full (the next state is `false`). Otherwise, if the baby is already hungry, it remains hungry. If not, a random number between 0 and 1 is generated using `rng`, and if this number is less than the probability of becoming hungry, the next state is hungry.

In [3]:
function POMDPs.generate_s(p::BabyPOMDP, s::Bool, a::Bool, rng::AbstractRNG)
    if a # feed
        return false
    elseif s # hungry
        return true
    else # not hungry
        return rand(rng) < p.p_become_hungry
    end
end
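
As a quick sanity check of this logic, the following sketch re-implements the three branches in a standalone function and estimates the transition frequency by sampling. The names `next_hungry` and `estimate_become_hungry` are hypothetical stand-ins introduced here so that the block runs without POMDPs.jl; they are not part of the package.

```julia
using Random # for AbstractRNG and MersenneTwister

# Hypothetical standalone copy of the transition logic (not part of POMDPs.jl).
function next_hungry(s::Bool, a::Bool, p_become_hungry::Float64, rng::AbstractRNG)
    if a        # feeding always makes the baby full
        return false
    elseif s    # an unfed hungry baby stays hungry
        return true
    else        # a full baby becomes hungry with probability p_become_hungry
        return rand(rng) < p_become_hungry
    end
end

# Fraction of sampled steps in which a full, unfed baby becomes hungry.
function estimate_become_hungry(p::Float64, n::Int, rng::AbstractRNG)
    return count(_ -> next_hungry(false, false, p, rng), 1:n) / n
end

rng = MersenneTwister(1) # fixed seed so repeated runs draw the same samples
freq = estimate_become_hungry(0.1, 100_000, rng)
println("estimated P(hungry | full, not fed) = ", freq)
```

With enough samples the estimate should land close to `p_become_hungry`, and because every random number comes from the `rng` argument, rerunning with the same seed reproduces the same estimate.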

## Observations

The next step is to define how observations are generated. The [`generate_o`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.generate_o) function returns an observation given a starting state `s`, action `a`, and ending state `sp`.

Note that the observation usually depends on `sp` rather than `s`, so that it reflects the most recent information about the state, but it can use information from `s`, `a`, and `sp`.

In the crying baby problem, a random number generated with `rng` determines the observation: the baby is observed crying (`true`) with high probability if it is hungry (`sp` = `true`) and with lower probability if it is not hungry.

In [4]:
function POMDPs.generate_o(p::BabyPOMDP, s::Bool, a::Bool, sp::Bool, rng::AbstractRNG)
    if sp # hungry
        return rand(rng) < p.p_cry_when_hungry
    else # not hungry
        return rand(rng) < p.p_cry_when_not_hungry
    end
end
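
To see how noisy the observation is, the sketch below samples the same logic many times for each value of `sp` and prints the empirical crying rates. The function `cries` is a hypothetical standalone copy written for this illustration, and the probabilities 0.8 and 0.1 are the defaults from `BabyPOMDP()`.

```julia
using Random # for AbstractRNG and MersenneTwister

# Hypothetical standalone copy of the observation logic (not part of POMDPs.jl).
function cries(sp::Bool, p_cry_when_hungry::Float64,
               p_cry_when_not_hungry::Float64, rng::AbstractRNG)
    p_cry = sp ? p_cry_when_hungry : p_cry_when_not_hungry
    return rand(rng) < p_cry
end

rng = MersenneTwister(1) # fixed seed for reproducibility
n = 100_000

# Empirical crying rates for a hungry baby and for a full baby.
cry_rate_hungry = count(_ -> cries(true, 0.8, 0.1, rng), 1:n) / n
cry_rate_full = count(_ -> cries(false, 0.8, 0.1, rng), 1:n) / n

println("P(cry | hungry)     ≈ ", cry_rate_hungry)
println("P(cry | not hungry) ≈ ", cry_rate_full)
```

Neither rate is 0 or 1, so a single observation never reveals the state with certainty; this is what makes the problem partially observable.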

## Rewards

Rewards are specified with the [`reward`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.reward) function. It returns the immediate expected reward for a step starting in state `s` when action `a` is taken. Note that the reward function may also depend on the resulting state, `sp`.

For this problem there is a negative reward when the baby is hungry (`s` = `true`) and an additional smaller negative reward when the baby is fed (`a` = `true`).

In [5]:
POMDPs.reward(p::BabyPOMDP, s::Bool, a::Bool) = s*p.r_hungry + a*p.r_feed
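
The expression relies on Julia treating `true` as 1 and `false` as 0 in arithmetic. The following sketch enumerates all four state-action pairs with the default values from `BabyPOMDP()`; `baby_reward` is a hypothetical standalone function used only for this illustration.

```julia
# Hypothetical standalone copy of the reward expression, using the default
# values r_hungry = -10.0 and r_feed = -5.0 from BabyPOMDP().
baby_reward(s::Bool, a::Bool; r_hungry=-10.0, r_feed=-5.0) = s*r_hungry + a*r_feed

# Print the reward for every combination of state and action.
for s in (false, true), a in (false, true)
    println("s = ", s, ", a = ", a, " => r = ", baby_reward(s, a))
end
```

Feeding a hungry baby incurs both penalties (-15.0), while leaving a full baby alone incurs none. The zero in that last case may print as `-0.0`, which compares equal to `0.0`.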

## Initial state distribution

The final task for describing the POMDP is specifying the initial state distribution, which can be accomplished by implementing a method of [`initialstate_distribution`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.initialstate_distribution).

For this example, the baby always starts in the "not hungry" state (`false`). This is implemented using the `Deterministic` distribution [TODO: add link] from POMDPModelTools. The distribution returned by `initialstate_distribution` can be any object that implements all or part of the POMDPs.jl distribution interface [TODO: add link].

Alternatively, one may implement [`initialstate(m::BabyPOMDP, rng::AbstractRNG)`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.initialstate) to generate initial states directly. However, many solvers expect at least a sampleable model of the initial state distribution, so `initialstate_distribution` is preferred.

In [6]:
POMDPs.initialstate_distribution(m::BabyPOMDP) = Deterministic(false)
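
To give a rough idea of what "implementing part of the distribution interface" can look like, here is a minimal sketch of a custom single-value distribution. It assumes that the sampling part of the interface is a `rand(rng, d)` method; the type `AlwaysValue` is hypothetical, stands in for `Deterministic`, and is not part of POMDPModelTools.

```julia
using Random # for AbstractRNG and MersenneTwister

# Hypothetical distribution type that always yields the same value.
struct AlwaysValue{T}
    val::T
end

# Sampling ignores the RNG because there is only one possible outcome.
# Assumption: rand(rng, d) is the sampling method a solver would call.
Base.rand(rng::AbstractRNG, d::AlwaysValue) = d.val

d = AlwaysValue(false) # the baby always starts "not hungry"

# Different seeds still give the same sample.
samples = [rand(MersenneTwister(seed), d) for seed in 1:5]
println(samples)
```

A solver that needs more than sampling (for example, probabilities of individual states) would require additional methods, so using the ready-made `Deterministic` is the simpler choice here.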

## Other methods

Certain solvers may require other methods to be implemented. To see which methods are required for a particular solver, check the solver requirements [TODO: add link].

[`actions`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.actions) is a particularly common requirement. For this crying baby case, since the action is a Boolean variable, a default implementation of the action space is provided when POMDPModelTools is imported [TODO: add link].

## Seeing the model in action

The generative POMDP model can now be simulated using the `stepthrough` function in the cell below. For more information on running simulations, see the simulation example [TODO: add link].

A generative model is all that is required for most online POMDP methods and many offline ones. To find out more about solving POMDPs, please see the online solution and offline solution tutorials [TODO: add links].

In [7]:
using POMDPSimulators
using POMDPPolicies

m = BabyPOMDP()

# policy that maps every input to a feed (true) action
policy = FunctionPolicy(o->true)

for (s, a, r) in stepthrough(m, policy, "s,a,r", max_steps=10)
    @show s
    @show a
    @show r
    println()
end

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0
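
Because the always-feed policy keeps the baby full, the output above never shows the hungry state. The sketch below is a hand-written loop, not `stepthrough`, that applies one fixed action at every step using the default `BabyPOMDP()` parameters. The function `rollout_fixed_action` is hypothetical and exists only to show what happens when the baby is never fed.

```julia
using Random # for AbstractRNG and MersenneTwister

# Hypothetical hand-rolled simulation loop; it mirrors the generate_s and
# reward logic of this tutorial but does not use POMDPSimulators.
function rollout_fixed_action(a::Bool, max_steps::Int, rng::AbstractRNG;
                              r_feed=-5.0, r_hungry=-10.0, p_become_hungry=0.1)
    s = false # the baby starts "not hungry"
    history = Tuple{Bool,Bool,Float64}[]
    for _ in 1:max_steps
        r = s*r_hungry + a*r_feed
        push!(history, (s, a, r))
        # same three branches as generate_s
        s = a ? false : (s ? true : (rand(rng) < p_become_hungry))
    end
    return history
end

# Never feed the baby and print each (state, action, reward) triple.
for (s, a, r) in rollout_fixed_action(false, 10, MersenneTwister(1))
    println("s = ", s, ", a = ", a, ", r = ", r)
end
```

Under this action the state can only move from `false` to `true`: once the baby becomes hungry it stays hungry, and every later step receives the `r_hungry` penalty.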

