# Defining a POMDP with the Explicit Interface

In this tutorial we will define a version of the Tiger POMDP example [1] with the explicit interface of POMDPs.jl. To find out more about the explicit interface, please see this section of the POMDPs.jl documentation: [Explicit POMDP Interface](http://juliapomdp.github.io/POMDPs.jl/latest/explicit.html).

[1] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and Acting in Partially Observable Stochastic Domains", *Artificial Intelligence*, 1998.

In [1]:
using POMDPs
using POMDPModelTools

## Model type

In the tiger POMDP, the agent is tasked with escaping from a room. There are two doors leading out of the room. Behind one of the doors is a tiger, and behind the other is sweet, sweet freedom. If the agent opens the door and finds the tiger, it gets eaten (and receives a reward of -100). If the agent opens the other door, it escapes and receives a reward of 10. The agent can also listen. Listening gives a noisy measurement of which door the tiger is hiding behind. Listening gives the agent the correct location of the tiger 85% of the time. The agent receives a reward of -1 for listening.

The POMDP model of the problem is as follows:
- The *state* is a Boolean indicating whether the tiger is behind the left door (true) or behind the right door (false).
- The *action* is a Symbol: listening, opening the left door, or opening the right door.
- The *observation* is a Boolean indicating whether the agent hears the tiger on the left (true) or on the right (false).
- Each time the agent listens, it hears the tiger at the correct location with probability 0.85.
- Once the agent opens a door, it receives a reward and the problem resets: the tiger is again placed behind one of the two doors with equal probability.

The `TigerPOMDP` model type is a subtype of `POMDP`. It is parameterized by the types used to represent the state, action, and observation, in that order. The type also stores the parameters used in the reward and observation definitions, as well as the discount factor.

In [2]:
struct TigerPOMDP <: POMDP{Bool, Symbol, Bool} # POMDP{State, Action, Observation}
    r_listen::Float64 # reward for listening (default -1)
    r_findtiger::Float64 # reward for finding the tiger (default -100)
    r_escapetiger::Float64 # reward for escaping (default 10)
    p_listen_correctly::Float64 # prob of correctly listening (default 0.85)
    discount_factor::Float64 # discount
end

TigerPOMDP() = TigerPOMDP(-1., -100., 10., 0.85, 0.95)

TigerPOMDP

## States

We define our state with a Boolean that indicates whether or not the tiger is hiding behind the left door. If the state is `true`, the tiger is behind the left door. If it is `false`, the tiger is behind the right door.

We must implement the `states` function, which returns the state space, and `n_states`, which returns the number of states. We should also define a `stateindex` function that returns the integer index of state `s`.
For simple state types like `Int` or `Bool`, a default implementation of `stateindex` is provided as a convenience in [POMDPModelTools.jl](https://github.com/JuliaPOMDP/POMDPModelTools.jl).

In [3]:
POMDPs.states(pomdp::TigerPOMDP) = [true, false]
POMDPs.n_states(pomdp::TigerPOMDP) = 2
POMDPs.stateindex(pomdp::TigerPOMDP, s::Bool) = s ? 1 : 2 ;
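
Solvers that store values in arrays typically assume that `stateindex` agrees with the order of the vector returned by `states`. A quick sanity check is sketched below in plain Julia. The names `state_list` and `state_idx` are hypothetical stand-ins for `states(pomdp)` and `stateindex(pomdp, s)`, used so that the snippet runs without any package.

```julia
# Stand-ins for `states(pomdp)` and `stateindex(pomdp, s)`; hypothetical names.
state_list = [true, false]
state_idx(s::Bool) = s ? 1 : 2

# Each state should map to its own position in the state list.
consistent = all(state_idx(s) == i for (i, s) in enumerate(state_list))
println("indices consistent with ordering: ", consistent)
```

The same kind of check can be applied to the action and observation indices.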

## Actions

There are three actions in our problem. Once again, we represent the action space as an array of the actions in our problem. The `actions` function serves a similar purpose to the `states` function above. Since the action space is discrete, we can define the `actionindex` function, which associates an integer index with each action.

In [4]:
POMDPs.actions(pomdp::TigerPOMDP) = [:open_left, :open_right, :listen]
POMDPs.n_actions(pomdp::TigerPOMDP) = 3
function POMDPs.actionindex(pomdp::TigerPOMDP, a::Symbol)
    if a==:open_left
        return 1
    elseif a==:open_right
        return 2
    elseif a==:listen
        return 3
    end
    error("invalid TigerPOMDP action: $a")
end;

**Note:** If the available actions depend on the state, you must additionally implement the function `actions(pomdp, s)`.

## State transitions

Now that the states and actions are defined, the transition distribution can be specified. We can do so by implementing the `transition` function. It takes as input the pomdp model, a state and an action and returns a distribution over the next states. 

We first need a data type to represent the transition distribution. We must be able to sample a state from this object using `rand` and to query the probability mass of a given state using `pdf`. You can create your own type if needed. POMDPModelTools provides useful [distribution types](https://juliapomdp.github.io/POMDPModelTools.jl/latest/distributions.html).
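
As a rough illustration of what a custom distribution type needs, here is a minimal sketch in plain Julia. The type name `MyBoolDist` is hypothetical, and `pdf` is defined as a standalone function so that the snippet runs without any package; in a real model this method would presumably extend `POMDPs.pdf` instead.

```julia
using Random

# Hypothetical distribution over Bool states; `p` is the probability of `true`.
struct MyBoolDist
    p::Float64
end

# Sample a state, so that a simulator can draw next states from the distribution.
Base.rand(rng::AbstractRNG, d::MyBoolDist) = rand(rng) < d.p

# Probability mass of state `s` (standalone here; an assumption, see lead-in).
pdf(d::MyBoolDist, s::Bool) = s ? d.p : 1.0 - d.p

d = MyBoolDist(0.85)
rng = MersenneTwister(1)
samples = [rand(rng, d) for _ in 1:10_000]

println("pdf(true)  = ", pdf(d, true))
println("pdf(false) = ", pdf(d, false))
println("empirical frequency of true = ", count(samples) / length(samples))
```

For the tiger problem this is not needed, since a ready-made Boolean distribution is available.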

In the tiger problem, since there are two states, we can represent our distribution with one parameter corresponding to the probability of being in state `true` (tiger on the left). This can be done using the `BoolDistribution` type provided by POMDPModelTools. The transition model is as follows:
- While the agent listens, the tiger stays on the same side.
- Once the agent opens a door, the problem resets: the tiger is placed behind one of the two doors with equal probability.

In [5]:
function POMDPs.transition(pomdp::TigerPOMDP, s::Bool, a::Symbol)
    if a == :open_left || a == :open_right
        # problem resets
        return BoolDistribution(0.5) 
    elseif s
        # tiger on the left stays on the left 
        return BoolDistribution(1.0)
    else
        # tiger on the right stays on the right
        return BoolDistribution(0.0)
    end
end

## Observations

In the tiger problem there are two possible observations: hearing the tiger on the left or on the right. We represent them with a Boolean. As with states and actions, we must implement `observations`, `n_observations`, and `obsindex`.

**Note:** `obsindex` for Boolean observations is provided as a convenience function by POMDPModelTools.

In [6]:
POMDPs.observations(pomdp::TigerPOMDP) = [true, false]
POMDPs.n_observations(pomdp::TigerPOMDP) = 2

The observation model captures the uncertainty in the agent's listening ability. When we listen, we receive a noisy measurement of the tiger's location. 
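To implement the observation model with the explicit interface, we must implement the `observation` function, which returns a distribution over observations. Since observations are also represented by Booleans, we can again use the `BoolDistribution` type. When the agent opens a door instead of listening, the observation carries no information, so both observations are equally likely.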

In [7]:
function POMDPs.observation(pomdp::TigerPOMDP, a::Symbol, s::Bool)
    pc = pomdp.p_listen_correctly
    if a == :listen 
        if s 
            return BoolDistribution(pc)
        else
            return BoolDistribution(1 - pc)
        end
    else
        return BoolDistribution(0.5)
    end
end

## Rewards

The reward model captures the immediate objectives of the agent. It receives a large negative reward for opening the door with the tiger behind it (-100), a positive reward for opening the other door (+10), and a small penalty for listening (-1).

In [8]:
# reward model
function POMDPs.reward(pomdp::TigerPOMDP, s::Bool, a::Symbol)
    if a == :listen
        return pomdp.r_listen
    elseif a == :open_left
        # s == true means the tiger is behind the left door
        return s ? pomdp.r_findtiger : pomdp.r_escapetiger
    elseif a == :open_right
        return s ? pomdp.r_escapetiger : pomdp.r_findtiger
    end
    return 0.0
end
POMDPs.reward(pomdp::TigerPOMDP, s::Bool, a::Symbol, sp::Bool) = reward(pomdp, s, a);
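
To see why listening matters, it can help to work out the expected immediate reward of each action under a belief. The following is a sketch using the default reward values above, where $b$ denotes the probability that the tiger is behind the left door (a symbol introduced here for illustration). It considers only the immediate reward and ignores future value.

$$
\begin{aligned}
\mathbb{E}[r \mid b, \texttt{listen}] &= -1 \\
\mathbb{E}[r \mid b, \texttt{open\_left}] &= -100\,b + 10\,(1 - b) = 10 - 110\,b \\
\mathbb{E}[r \mid b, \texttt{open\_right}] &= 10\,b - 100\,(1 - b) = 110\,b - 100
\end{aligned}
$$

At $b = 0.5$, opening either door has an expected immediate reward of $-45$, far worse than the $-1$ cost of listening. Opening the left door beats listening in immediate reward only when $b < 0.1$, and opening the right door only when $b > 0.9$.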

## Belief

In POMDPs, we often represent our estimate of the current state with a belief, a distribution over states. Since we have two possible states, we can use the convenient `BoolDistribution` one more time.

Implementing beliefs and their updaters can be tricky. Luckily, our solvers abstract away the belief updating. All you need to do is define a function that returns an initial distribution over states. 

In [9]:
POMDPs.initialstate_distribution(pomdp::TigerPOMDP) = BoolDistribution(0.5)

To learn more about beliefs and belief updating, you may look at [BeliefUpdaters.jl](https://github.com/JuliaPOMDP/BeliefUpdaters.jl), which provides a collection of belief representations and belief updaters.
**Note:** It is also possible to create your own belief representation and update schemes. 
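
To give a feel for what a belief updater does, here is a minimal sketch of a Bayes update for the `:listen` action, written in plain Julia without any package. The function names `obs_prob` and `update_listen` are hypothetical, and the belief is represented by a single number, the probability that the tiger is on the left. This is an illustration under those assumptions, not the updater used by any particular solver.

```julia
# Probability of observation `o` (true = heard on the left) given the tiger's
# location `s` (true = left) when listening; `pc` defaults to the tutorial's 0.85.
obs_prob(o::Bool, s::Bool, pc::Float64=0.85) = (o == s) ? pc : 1.0 - pc

# Bayes update of b = P(tiger on the left) after listening and observing `o`.
# The tiger does not move while the agent listens, so no prediction step is needed.
function update_listen(b::Float64, o::Bool)
    left  = obs_prob(o, true)  * b          # unnormalized mass for "tiger left"
    right = obs_prob(o, false) * (1.0 - b)  # unnormalized mass for "tiger right"
    return left / (left + right)
end

b0 = 0.5                        # uniform initial belief
b1 = update_listen(b0, true)    # heard the tiger on the left once
b2 = update_listen(b1, true)    # heard it on the left a second time
b3 = update_listen(b2, false)   # then heard it on the right
println((b1, b2, b3))
```

Two consistent observations push the belief well past what a single one gives, and a contradicting observation cancels one of them out.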

## Miscellaneous Functions

Let's define the `discount` function.

In [10]:
POMDPs.discount(pomdp::TigerPOMDP) = pomdp.discount_factor

## Seeing the model in action

We have now implemented all the functions necessary to solve or simulate the Tiger POMDP with the explicit interface!

To learn how to solve POMDPs offline: TODO[link to tutorial]

To learn how to solve POMDPs online: TODO[link_to_tutorial]

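We can run a simulation of our model using the `stepthrough` function, which returns an iterator over the steps of the simulation. Here we request the state, action, and reward at each step.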

For more information on running simulations, see the simulation example [TODO: add link].

In [12]:
using POMDPSimulators
using POMDPPolicies

m = TigerPOMDP()

# policy that takes a random action
policy = RandomPolicy(m)

for (s, a, r) in stepthrough(m, policy, "s,a,r", max_steps=10)
    @show s
    @show a
    @show r
    println()
end

s = true
a = :listen
r = -1.0

s = true
a = :open_right
r = 10.0

s = false
a = :open_left
r = 10.0

s = true
a = :listen
r = -1.0

s = true
a = :listen
r = -1.0

s = true
a = :listen
r = -1.0

s = true
a = :listen
r = -1.0

s = true
a = :open_right
r = 10.0

s = false
a = :listen
r = -1.0

s = false
a = :listen
r = -1.0
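
To connect the output with the `discount` function defined earlier, the sketch below computes the discounted return of the reward sequence printed above, in plain Julia. The variable names are hypothetical, and because the policy is random, the rewards (and therefore the return) will differ from run to run.

```julia
# Rewards copied from the simulation output above (one random run).
rewards = [-1.0, 10.0, 10.0, -1.0, -1.0, -1.0, -1.0, 10.0, -1.0, -1.0]
discount_factor = 0.95  # default discount in TigerPOMDP()

# Discounted return: sum over steps t = 1, 2, ... of discount^(t-1) * r_t.
discounted_return = sum(discount_factor^(t - 1) * r for (t, r) in enumerate(rewards))

println("undiscounted sum  = ", sum(rewards))
println("discounted return = ", discounted_return)
```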

