# Minimal Example: Light-Dark 1D

This is designed to be a minimal example to get POMCP running. The problem is a one-dimensional light-dark problem. The state is the position, which is initially unknown. The agent can move left (a=-1), move right (a=1), or terminate the problem (a=0). If the state is in (-1, 1) when the problem is terminated, a reward is given; otherwise there is a penalty. Observations are noisy measurements of position.

```
   -3-2-1 0 1 2 3
...| | | | | | | | ...
          G   S
```

The state is a real number (`Float64`), while actions and observations are integers. Measurements are most accurate at x = 5 (see the noise function below) and have uniformly distributed noise.
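
To make the accuracy claim concrete, the following sketch tabulates the half-width of the observation noise at a few positions. The `noise` function is copied from the problem definition further down; the sample positions are arbitrary and chosen only for illustration.

```julia
# Copied from the problem definition in this document: the half-width of the
# uniform integer observation noise at position x.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Sample positions chosen only for illustration.
for x in (5.0, 2.0, 0.0, -3.0)
    n = noise(x)
    println("x = $x: observation noise drawn uniformly from -$n:$n")
end
```

Even at x = 5 the half-width is 1, so observations are never exact, and at the goal (x = 0) the half-width grows to 4.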

## Problem Definition

First, we will define the problem with the generative interface. This is all that is needed to use the POMCP planner.

In [48]:
using POMDPs
using Distributions: Normal
using Random
import POMDPs: initialstate, actions, gen, discount, isterminal
Random.seed!(1);

In [49]:
# Type parameters of POMDP: state (Float64 position), action (Int), observation (Int)
mutable struct LightDark1D <: POMDPs.POMDP{Float64,Int,Int}
    discount_factor::Float64
    correct_r::Float64
    incorrect_r::Float64
    step_size::Int
    movement_cost::Float64
end
LightDark1D() = LightDark1D(0.9, 10, -10, 1, 0)
discount(p::LightDark1D) = p.discount_factor
isterminal(::LightDark1D, s::Float64) = isnan(s);

In [50]:
# Half-width of the uniform observation noise; smallest (1) at x = 5
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

function gen(m::LightDark1D, s::Float64, a::Int, rng::AbstractRNG)
    # generate next state
    sp = iszero(a) ? NaN : s+a
    # generate observation
    if isnan(sp)
        o = 0
    else
        n = noise(sp)
        o = round(Int, sp) + rand(rng, -n:n)
    end
    # generate reward
    r = iszero(a) ? (abs(s) < 1 ? m.correct_r : m.incorrect_r) : 0.0
    
    return (sp=sp, o=o, r=r)
end;

In [51]:
actions(::LightDark1D) = [-1, 0, 1] # Left Stop Right

initialstate(pomdp::LightDark1D) = Normal(2.0, 3.0);

## Using the POMCP Planner

We can now use the POMCP planner to choose the best action for a given belief.

In [52]:
using BasicPOMCP
using POMDPSimulators

In [53]:
solver = POMCPSolver(tree_queries=10000, c=10)
pomdp = LightDark1D()
planner = solve(solver, pomdp);

In [54]:
b = initialstate(pomdp)
a = action(planner, b)
println("""
    POMCP Recommends action $a for belief $b.

    (this may be a bad choice because the POMCP Parameters are set to their defaults.)
""")

    POMCP Recommends action -1 for belief Normal{Float64}(μ=2.0, σ=3.0).

    (this may be a bad choice because the POMCP Parameters are set to their defaults.)



### Simulations

We can also use the planner in a simulation:

In [55]:
for (s,a,r,sp,o) in stepthrough(pomdp, planner, "s,a,r,sp,o")
    @show (s,a,r,sp,o)
end

(s, a, r, sp, o) = (1.1450149205203342, -1, 0.0, 0.14501492052033416, -3)
(s, a, r, sp, o) = (0.14501492052033416, 0, 10.0, NaN, 0)
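
As a rough check on what this trajectory is worth, the discounted return can be computed by hand from the rewards printed above. This is a minimal sketch: the reward list is copied from the output, and the discount factor is the 0.9 set in the `LightDark1D()` constructor.

```julia
# Rewards copied from the two-step simulation output above.
rewards = [0.0, 10.0]
γ = 0.9  # discount factor from the LightDark1D() default constructor

# Discounted return: sum over t of γ^t * r_t, with t starting at 0.
discounted_return = sum(γ^(t - 1) * r for (t, r) in enumerate(rewards))
println(discounted_return)
```

Because the reward of 10 arrives on the second step, it is discounted once, which is why stopping early in the goal region is worth more than stopping there later.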


## Reliable Belief Updates

By default, if the POMDP does not have an explicit observation model implemented (`POMDPs.observation()` or `POMDPModelTools.obs_weight()`), POMCP will attempt to use the unweighted rejection particle filter defined here: https://github.com/JuliaPOMDP/ParticleFilters.jl/blob/master/src/unweighted.jl. Our `LightDark1D` POMDP has a small enough observation space for that to work, but in most cases we will need a weighted particle filter, which requires defining the observation distribution.
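
The general idea behind an unweighted rejection update can be sketched in plain Julia. This is not the ParticleFilters.jl implementation: `step_model` and `rejection_update` are hypothetical names, the model simply mirrors the `gen` function above for the two movement actions, and the particle counts are arbitrary.

```julia
using Random

# Same noise model as in the problem definition above.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Hypothetical stand-in for `gen` (movement actions only): returns the next
# state and a sampled observation.
function step_model(s::Float64, a::Int, rng::AbstractRNG)
    sp = s + a
    n = noise(sp)
    return sp, round(Int, sp) + rand(rng, -n:n)
end

# Unweighted rejection update (illustrative): propagate randomly chosen particles
# and keep only those whose simulated observation equals the real observation o.
function rejection_update(particles::Vector{Float64}, a::Int, o::Int, rng::AbstractRNG;
                          n_keep::Int=100, max_tries::Int=100_000)
    kept = Float64[]
    for _ in 1:max_tries
        length(kept) >= n_keep && break
        sp, o_sim = step_model(rand(rng, particles), a, rng)
        o_sim == o && push!(kept, sp)
    end
    return kept
end

rng = MersenneTwister(1)
prior = 2.0 .+ 3.0 .* randn(rng, 1000)   # samples from the Normal(2.0, 3.0) initial belief
posterior = rejection_update(prior, -1, 1, rng)   # action -1, observation 1
println(length(posterior), " particles kept")
```

With a large or continuous observation space, an exact match between simulated and real observations becomes rare and most simulated particles are rejected.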

## Enabling Weighted Belief Updates

For the particle filter to re-weight the particles, we need to define the observation distribution.

In [56]:
struct LDObsDist
    x::Int
    noise::Int
end

function POMDPs.pdf(d::LDObsDist, x::Int)
    if abs(x-d.x) <= d.noise
        return 1/(2*d.noise+1)
    else
        return 0.0
    end
end

function POMDPs.observation(p::LightDark1D, a::Int, sp::Float64)
    if isnan(sp)
        return LDObsDist(0, 0)
    else
        return LDObsDist(round(Int, sp), noise(sp))
    end
end
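
How such a density is used for re-weighting can be sketched in plain Julia. This is not the ParticleFilters.jl implementation: `obs_pdf` and `reweight` are hypothetical helpers that mirror `pdf(observation(pomdp, a, sp), o)` for non-terminal states, and the particle values are arbitrary.

```julia
# Same noise model as in the problem definition above.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Hypothetical helper mirroring the LDObsDist density for a non-terminal state sp.
function obs_pdf(sp::Float64, o::Int)
    n = noise(sp)
    return abs(o - round(Int, sp)) <= n ? 1/(2n + 1) : 0.0
end

# Weight each propagated particle by the likelihood of the observation, then normalize.
function reweight(particles::Vector{Float64}, o::Int)
    w = [obs_pdf(sp, o) for sp in particles]
    total = sum(w)
    total > 0 || error("observation has zero likelihood under every particle")
    return w ./ total
end

particles = [-1.0, 0.4, 2.0, 6.0]   # arbitrary propagated particles for illustration
weights = reweight(particles, 1)     # observation o = 1
println(weights)
```

The particle at 6.0 receives zero weight because the noise is narrow near x = 5 and an observation of 1 is impossible there. A resampling step would then draw new particles in proportion to these weights instead of discarding every particle that fails an exact match.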

Now we can run a simulation with a particle filter.

In [57]:
using ParticleFilters

In [58]:
pf = BootstrapFilter(pomdp, 1000) # named `pf` to avoid shadowing `Base.filter`
for (s,a,r,sp,o) in stepthrough(pomdp, planner, pf, "s,a,r,sp,o")
    @show (s,a,r,sp,o)
end

(s, a, r, sp, o) = (7.6069078240874, -1, 0.0, 6.6069078240874, 5)
(s, a, r, sp, o) = (6.6069078240874, -1, 0.0, 5.6069078240874, 7)
(s, a, r, sp, o) = (5.6069078240874, -1, 0.0, 4.6069078240874, 5)
(s, a, r, sp, o) = (4.6069078240874, -1, 0.0, 3.6069078240873997, 5)
(s, a, r, sp, o) = (3.6069078240873997, -1, 0.0, 2.6069078240873997, 3)
(s, a, r, sp, o) = (2.6069078240873997, -1, 0.0, 1.6069078240873997, 3)
(s, a, r, sp, o) = (1.6069078240873997, -1, 0.0, 0.6069078240873997, 0)
(s, a, r, sp, o) = (0.6069078240873997, -1, 0.0, -0.39309217591260026, 3)
(s, a, r, sp, o) = (-0.39309217591260026, 0, 10.0, NaN, 0)
