# Defining a POMDP with the Generative Interface

In this tutorial we will define a version of the Crying Baby Problem [1] with the generative interface of POMDPs.jl. To find out more about the generative interface, please see this section of the POMDPs.jl documentation: [Generative POMDP Interface](http://juliapomdp.github.io/POMDPs.jl/latest/generative.html#generative_doc-1).

[1] Kochenderfer, Mykel J. Decision Making Under Uncertainty: Theory and Application. MIT Press, 2015

```julia
using POMDPs
using Random # for AbstractRNG
using POMDPModelTools # for Deterministic
```

## Model type

We begin by implementing the type that represents the model itself. First, some notes about the Crying Baby POMDP:

- The *state* is a Boolean value representing whether the baby is hungry
- The *action* is a Boolean indicating whether the agent is feeding the baby
- The *observation* is a Boolean indicating whether the baby is crying
- When the baby is hungry, it cries with high probability; when it is full, it cries with low probability
- When the baby is fed, it immediately becomes full.
- At each step, the baby has a small probability of becoming hungry.

A [version of this problem implemented with the explicit interface](https://github.com/JuliaPOMDP/POMDPModels.jl/blob/master/src/CryingBabies.jl) can be found in the [POMDPModels repository](https://github.com/JuliaPOMDP/POMDPModels.jl).

The `BabyPOMDP` model type is a subtype of the [`POMDP`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.POMDP) abstract type. `POMDP` has three type parameters, which indicate the types used to represent the state, action, and observation respectively; all three are `Bool` in this case. The struct also has fields holding the numerical parameters used in the reward, transition, and observation definitions, along with a discount factor.

```julia
struct BabyPOMDP <: POMDP{Bool, Bool, Bool}
    r_feed::Float64
    r_hungry::Float64
    p_become_hungry::Float64
    p_cry_when_hungry::Float64
    p_cry_when_not_hungry::Float64
    discount::Float64
end

BabyPOMDP() = BabyPOMDP(-5., -10., 0.1, 0.8, 0.1, 0.9);
```

# Defining `gen`

The complete generative model for the `BabyPOMDP` can be defined by implementing the method
```julia
gen(m::BabyPOMDP, s, a, rng)
```
where `s` is the current state, `a` is the action, and `rng` is a [random number generator](https://juliapomdp.github.io/POMDPs.jl/latest/generative/#Random-number-generators).
This function should return a [`NamedTuple`](https://docs.julialang.org/en/v1/manual/types/index.html#Named-Tuple-Types-1) containing the next state (key `sp`, mnemonic "s prime"), observation (key `o`), and reward (key `r`).

---

Note:
It is also possible to implement separate generative models for the transition, observation, and reward (or, more generally, for each node in the [dynamic decision network](https://juliapomdp.github.io/POMDPs.jl/latest/ddns)) with the methods
```julia
gen(::DDNNode{:sp}, m::BabyPOMDP, s, a, rng)
gen(::DDNNode{:o}, m::BabyPOMDP, s, a, sp, rng)
reward(m::BabyPOMDP, s, a, sp, o)
```
but this will not be covered in the present tutorial. See the [generative model section of the POMDPs.jl documentation](https://juliapomdp.github.io/POMDPs.jl/latest/generative/) for more information.

---

First we will qualitatively describe the transition, observation, and reward models for the Baby POMDP, then provide the complete implementation below.

## State transition model

For the crying baby problem, if the baby is fed (`a` = `true`), the baby always becomes full (the next state is `false`). Otherwise, if the baby is already hungry, it remains hungry. If it is not hungry, a random number between 0 and 1 is generated using `rng`; if this number is less than the probability of becoming hungry, the next state is hungry, and otherwise the baby stays full.
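The same rule can be written as a conditional probability. The notation here is an assumption introduced for illustration: $s'$ denotes the next state and $p_{\text{become hungry}}$ denotes the `p_become_hungry` field (0.1 in the default constructor).

$$
P(s' = \text{hungry} \mid s, a) =
\begin{cases}
0 & \text{if } a = \text{feed}, \\
1 & \text{if } a = \text{no feed and } s = \text{hungry}, \\
p_{\text{become hungry}} & \text{if } a = \text{no feed and } s = \text{not hungry}.
\end{cases}
$$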

## Observation model

The observation model in the crying baby problem is implemented as follows: a random number is drawn with `rng`, and a crying observation (`true`) is produced with high probability if the baby is hungry in the next state (`sp` = `true`) and with lower probability if it is not.
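In probability notation (again an illustrative assumption, with $s'$ as the next state and the two probabilities standing for the `p_cry_when_hungry` and `p_cry_when_not_hungry` fields, 0.8 and 0.1 by default), the observation depends only on the next state:

$$
P(o = \text{crying} \mid s') =
\begin{cases}
p_{\text{cry when hungry}} & \text{if } s' = \text{hungry}, \\
p_{\text{cry when not hungry}} & \text{if } s' = \text{not hungry}.
\end{cases}
$$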

## Reward model

For this problem there is a negative reward when the baby is hungry (`s` = `true`) and an additional smaller negative reward when the baby is fed (`a` = `true`).
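As a worked example, the reward can be written with indicator functions, where $r_{\text{hungry}}$ and $r_{\text{feed}}$ stand for the `r_hungry` and `r_feed` fields. The notation is an assumption for illustration; the numbers use the default values of $-10$ and $-5$.

$$
r(s, a) = r_{\text{hungry}} \, \mathbb{1}[s = \text{hungry}] + r_{\text{feed}} \, \mathbb{1}[a = \text{feed}]
$$

$$
\begin{aligned}
r(\text{hungry}, \text{feed}) &= -10 - 5 = -15, \\
r(\text{hungry}, \text{no feed}) &= -10, \\
r(\text{not hungry}, \text{feed}) &= -5, \\
r(\text{not hungry}, \text{no feed}) &= 0.
\end{aligned}
$$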

```julia
function POMDPs.gen(m::BabyPOMDP, s, a, rng)
    # transition model
    if a # feed
        sp = false
    elseif s # hungry
        sp = true
    else # not hungry
        sp = rand(rng) < m.p_become_hungry
    end

    # observation model
    if sp # hungry
        o = rand(rng) < m.p_cry_when_hungry
    else # not hungry
        o = rand(rng) < m.p_cry_when_not_hungry
    end

    # reward model
    r = s*m.r_hungry + a*m.r_feed

    # create and return a NamedTuple
    return (sp=sp, o=o, r=r)
end
```
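One way to sanity-check a generative model like this is to sample it many times and compare the empirical frequencies with the model parameters. The sketch below is a dependency-free mirror of the logic above, not part of the POMDPs.jl API: the names `params` and `baby_step` are hypothetical, and the parameter values are assumed to match the `BabyPOMDP()` defaults.

```julia
using Random      # MersenneTwister, AbstractRNG
using Statistics  # mean

# Hypothetical stand-in for the model parameters (BabyPOMDP() defaults).
params = (r_feed=-5.0, r_hungry=-10.0, p_become_hungry=0.1,
          p_cry_when_hungry=0.8, p_cry_when_not_hungry=0.1)

# Hypothetical mirror of the generative step, using only the standard library.
function baby_step(p, s::Bool, a::Bool, rng::AbstractRNG)
    # transition: feeding always yields "not hungry"; a hungry baby stays hungry
    sp = a ? false : (s ? true : rand(rng) < p.p_become_hungry)
    # observation: crying probability depends on the next state
    o = rand(rng) < (sp ? p.p_cry_when_hungry : p.p_cry_when_not_hungry)
    # reward: depends on the current state and the action
    r = s*p.r_hungry + a*p.r_feed
    return (sp=sp, o=o, r=r)
end

rng = MersenneTwister(1)
n = 100_000

# Start from "not hungry" and do not feed, so the transition is stochastic.
samples = [baby_step(params, false, false, rng) for _ in 1:n]

# Fraction of samples in which the baby became hungry.
freq_hungry = mean(x.sp for x in samples)

# Fraction of crying observations among samples where the baby became hungry.
freq_cry_given_hungry = mean(x.o for x in samples if x.sp)

@show freq_hungry
@show freq_cry_given_hungry
```

With this many samples, `freq_hungry` should land close to `p_become_hungry` and `freq_cry_given_hungry` close to `p_cry_when_hungry`. A large discrepancy would suggest a bug in the transition or observation logic.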

## Initial state distribution

The final task for describing the POMDP is specifying the initial state distribution, which can be accomplished by implementing a method of [`initialstate_distribution`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.initialstate_distribution).

For this example, the baby always starts in the "not hungry" state (`false`). This is implemented using the [`Deterministic` distribution from POMDPModelTools](https://juliapomdp.github.io/POMDPModelTools.jl/latest/distributions.html#Deterministic-1). The distribution returned by `initialstate_distribution` can be any object that implements all or part of the [POMDPs.jl distribution interface](http://juliapomdp.github.io/POMDPs.jl/latest/interfaces.html#Distributions-1).

Alternatively, one may implement [`initialstate(m::BabyPOMDP, rng::AbstractRNG)`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.initialstate) to generate initial states directly. However, many solvers expect at least a sampleable model of the initial state distribution, so `initialstate_distribution` is preferred.

```julia
POMDPs.initialstate_distribution(m::BabyPOMDP) = Deterministic(false)
```
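To illustrate what a `Deterministic` distribution provides, the sketch below samples from one and queries its probability mass. It assumes the POMDPs.jl and POMDPModelTools versions used in this tutorial, where `rand(rng, d)` and `pdf(d, x)` are part of the distribution interface; the variable names are illustrative.

```julia
using POMDPs            # pdf
using POMDPModelTools   # Deterministic
using Random            # MersenneTwister

d = Deterministic(false)    # all probability mass on "not hungry"
rng = MersenneTwister(7)

# Every sample equals the single supported value.
samples = [rand(rng, d) for _ in 1:5]
@show samples

# The probability mass is 1 on `false` and 0 on `true`.
@show pdf(d, false)
@show pdf(d, true)
```

Because the distribution is sampleable and has an explicit `pdf`, it can serve both simulators, which only need samples, and belief updaters, which may need probabilities.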

## Other methods

Certain solvers may require other methods to be implemented. To see which methods are required for a particular solver, [check the solver requirements](http://juliapomdp.github.io/POMDPs.jl/latest/requirements.html#requirements-1) with [@requirements_info](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.@requirements_info).

[`actions`](http://juliapomdp.github.io/POMDPs.jl/latest/api.html#POMDPs.actions) is a particularly common requirement. For this crying baby case, since the action is a Boolean variable, a [default implementation of the action space is provided when POMDPModelTools is imported](https://juliapomdp.github.io/POMDPModelTools.jl/latest/convenience.html#Convenience-1).

## Seeing the model in action

The generative POMDP model can now be simulated using the `stepthrough` function in the cell below. For more information on running simulations, see the [simulation example](Running-Simulations.ipynb).

A generative model is all that is required for most online POMDP methods and many offline ones. To find out more about solving POMDPs, please see the [online solution](Using-an-Online-Solver.ipynb) and [offline solution](Using-an-Offline-Solver.ipynb) tutorials.

```julia
using POMDPSimulators
using POMDPPolicies

m = BabyPOMDP()

# policy that maps every input to a feed (true) action
policy = FunctionPolicy(o->true)

for (s, a, r) in stepthrough(m, policy, "s,a,r", max_steps=10)
    @show s
    @show a
    @show r
    println()
end
```

```
s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0

s = false
a = true
r = -5.0
```
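As a worked check of this output: every step yields a reward of -5, so if the ten rewards are discounted by the model's `discount` field (0.9 in the default constructor), the discounted return is a finite geometric sum. The snippet below is a minimal sketch in base Julia. The variable names are illustrative, and it assumes the first step is discounted with exponent 0.

```julia
γ = 0.9                     # default `discount` value of BabyPOMDP()
rewards = fill(-5.0, 10)    # the ten rewards printed by the simulation

# Discounted return: sum over t of γ^(t-1) * r_t
G = sum(γ^(t - 1) * r for (t, r) in enumerate(rewards))

# Closed form of the same geometric sum, for comparison
G_closed = -5.0 * (1 - γ^10) / (1 - γ)

println(G)
println(G_closed)
```

Both values are approximately -32.57, the total discounted cost of feeding the baby at every one of the ten steps.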

