# Minimal Example: Light-Dark 1D

This is designed to be a minimal example to get POMCP running. The problem is a one-dimensional light-dark problem. The state is the agent's position, which is initially unknown. The agent can move left (a=-1) or right (a=1), or it can choose to end the problem (a=0). If the state is in (-1, 1) when the problem ends, a reward is given (10 by default); otherwise there is a penalty (-10 by default). Observations are noisy measurements of position.

```
   -3-2-1 0 1 2 3
...| | | | | | | | ...
          G   S
```

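The state is a real number (`Float64`) and observations are integers. In the diagram, G marks the goal at 0 and S marks 2, the mean of the initial state distribution. Measurements are most accurate at x = 5 (see the `noise` function below), and their noise is uniformly distributed.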
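To make the noise model concrete, here is a small standalone sketch that reuses the same `noise` rule and samples a few measurements at different positions. It needs only Julia's `Random` standard library; `sample_measurement` is a hypothetical helper for illustration that mirrors what `generate_o` does below for non-terminal states.

```julia
using Random

# Same half-width rule as the `noise` function in the problem definition:
# the window is narrowest (1) near x = 5 and widens with distance from 5.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Hypothetical helper: one noisy integer measurement of position `x`.
# Round to the nearest integer, then add an offset drawn uniformly from -n:n.
sample_measurement(rng::AbstractRNG, x::Float64) =
    round(Int, x) + rand(rng, -noise(x):noise(x))

rng = MersenneTwister(1)
for x in (5.0, 2.0, 0.0)
    n = noise(x)
    obs = [sample_measurement(rng, x) for _ in 1:5]
    println("x = $x: half-width $n, window $(round(Int, x) - n):$(round(Int, x) + n), samples $obs")
end
```

Near x = 5 the readings stay within one unit of the rounded position, while at the start region (x = 2) they can be off by up to three; this is the trade-off between gathering information and heading for the goal that the planner has to reason about.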

## Problem Definition

First, we define the problem with the generative interface; this is all that is needed to use the POMCP planner.

```julia
using POMDPs
using Distributions # for Normal
using Random
import POMDPs: initialstate_distribution, actions, n_actions, reward, generate_o, generate_s, discount
Random.seed!(1);
```

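```julia
# State: position (Float64, NaN once terminated); action: Int; observation: Int
mutable struct LightDark1D <: POMDPs.POMDP{Float64,Int,Int}
    discount_factor::Float64
    correct_r::Float64      # reward for stopping inside (-1, 1)
    incorrect_r::Float64    # penalty for stopping anywhere else
    step_size::Int
    movement_cost::Float64
end
LightDark1D() = LightDark1D(0.9, 10.0, -10.0, 1, 0.0)
discount(p::LightDark1D) = p.discount_factor
# Qualified name so this extends POMDPs.isterminal rather than defining a new function.
# NaN marks the terminal state reached after the stop action.
POMDPs.isterminal(::LightDark1D, s::Float64) = isnan(s);
```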

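```julia
# Half-width of the uniform measurement-noise window; smallest (1) at x = 5.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

function generate_o(p::LightDark1D, s::Float64, a::Int, sp::Float64, rng::AbstractRNG)
    if isnan(sp)
        return 0  # fixed observation once the episode has ended
    else
        n = noise(sp)
        # Round the position and add integer noise drawn uniformly from -n:n.
        return round(Int, sp) + rand(rng, -n:n)
    end
end

function generate_s(p::LightDark1D, s::Float64, a::Int, rng::AbstractRNG)
    if a == 0
        return NaN  # stop action: move to the terminal state
    else
        return s + a*p.step_size
    end
end

function reward(p::LightDark1D, s::Float64, a::Int, sp::Float64)
    if a == 0
        # Stopping is rewarded only if the agent is inside (-1, 1).
        if abs(s) < 1
            return p.correct_r
        else
            return p.incorrect_r
        end
    else
        return -p.movement_cost
    end
end;
```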

```julia
actions(::LightDark1D) = [-1, 0, 1] # Left Stop Right
n_actions(::LightDark1D) = 3

# Initial belief: mean 2 (S in the diagram), standard deviation 3.
function initialstate_distribution(pomdp::LightDark1D)
    return Normal(2.0, 3.0)
end;
```
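The three generative functions compose into a single simulated step. The sketch below is a standalone re-implementation for illustration only: the hypothetical `gen_step` is not part of POMDPs.jl, and it hard-codes the default `LightDark1D()` parameters (step size 1, no movement cost). It walks one episode that moves left twice from 2.0 and then stops.

```julia
using Random

noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Hypothetical helper: sample (sp, o, r) for one step, chaining the transition,
# observation and reward rules of LightDark1D with default parameters.
function gen_step(s::Float64, a::Int, rng::AbstractRNG; correct_r=10.0, incorrect_r=-10.0)
    # Transition: action 0 ends the episode (NaN marks the terminal state).
    sp = a == 0 ? NaN : s + a
    # Observation: fixed 0 after termination, otherwise a noisy integer reading.
    o = isnan(sp) ? 0 : round(Int, sp) + rand(rng, -noise(sp):noise(sp))
    # Reward: only stopping is rewarded or penalized, based on the current state s.
    r = a == 0 ? (abs(s) < 1 ? correct_r : incorrect_r) : 0.0
    return sp, o, r
end

let s = 2.0, rng = MersenneTwister(1)
    for a in (-1, -1, 0)      # move left twice, then stop
        sp, o, r = gen_step(s, a, rng)
        println("s = $s, a = $a -> sp = $sp, o = $o, r = $r")
        s = sp
    end
end
```

Note that the reward depends on `s`, the position *before* stopping: stopping at 0.0 earns 10, while stopping at 1.0 is penalized because |1| < 1 is false.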

## Using the POMCP Planner

We can now use the POMCP planner to choose an action for a given belief about the state.

```julia
using BasicPOMCP
using POMDPSimulators
```

```julia
# tree_queries: number of search iterations per call to `action`
# c: UCB exploration constant
solver = POMCPSolver(tree_queries=10000, c=10)
pomdp = LightDark1D()
planner = solve(solver, pomdp);
```

```julia
b = initialstate_distribution(pomdp)
a = action(planner, b)
println("""
    POMCP recommends action $a for belief $b.

    (This may be a poor choice because most POMCP parameters are left at their defaults.)
""")
```

### Simulations

We can also use the planner in a simulation:

```julia
for (s,a,r,sp,o) in stepthrough(pomdp, planner, "sarspo")
    @show (s,a,r,sp,o)
end
```
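Each step of the simulation produces a reward `r`, and a discounted POMDP scores the whole episode by weighting the reward at step t by the discount factor raised to t - 1 (0.9 by default for `LightDark1D`). A minimal sketch of that sum, using a hypothetical `discounted_return` helper rather than any POMDPSimulators function:

```julia
# Hypothetical helper: discounted sum of a reward sequence.
# The reward at step t (1-based) is weighted by discount^(t - 1).
function discounted_return(rewards::AbstractVector{<:Real}; discount::Real=0.9)
    total = 0.0
    for (t, r) in enumerate(rewards)
        total += discount^(t - 1) * r   # step 1 is undiscounted
    end
    return total
end

# Move left twice from 2.0, then stop inside (-1, 1):
println(discounted_return([0.0, 0.0, 10.0]))  # ≈ 8.1 (10 discounted twice by 0.9)
```

Stopping one step later would give 10 * 0.9^3 = 7.29, so the discount favors reaching the goal quickly.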

## Reliable Belief Updates

By default, if the POMDP does not implement an explicit observation model (`POMDPs.observation()` or `ParticleFilters.obs_weight()`), POMCP will attempt to use the unweighted rejection particle filter defined here: https://github.com/JuliaPOMDP/ParticleFilters.jl/blob/master/src/unweighted.jl. This filter keeps a propagated particle only when the observation simulated from it exactly matches the real observation, so it works only when observations take a small number of values. Our `LightDark1D` POMDP has a small enough observation space for this, but in most cases we will need a weighted particle filter, which requires defining the observation distribution.
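The rejection idea can be sketched in a few lines of plain Julia. This is a simplified illustration of the technique, not the ParticleFilters.jl implementation: `rejection_update` and `sample_obs` are hypothetical helpers, the prior particles are drawn from the same Normal(2, 3) belief, only move actions are handled, and the loop gives up after `max_tries` proposals.

```julia
using Random

noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)
sample_obs(rng, sp) = round(Int, sp) + rand(rng, -noise(sp):noise(sp))

# Hypothetical rejection update: propagate a sampled particle, simulate an
# observation, and keep the new particle only if it matches `o` exactly.
function rejection_update(rng::AbstractRNG, particles::Vector{Float64}, a::Int, o::Int;
                          n::Int = length(particles), max_tries::Int = 100_000)
    newparticles = Float64[]
    tries = 0
    while length(newparticles) < n && tries < max_tries
        tries += 1
        s = rand(rng, particles)          # sample from the current belief
        sp = s + a                        # LightDark1D transition for a move action
        if sample_obs(rng, sp) == o       # reject unless observations match exactly
            push!(newparticles, sp)
        end
    end
    return newparticles, tries
end

rng = MersenneTwister(1)
prior = 2.0 .+ 3.0 .* randn(rng, 1000)   # particles from Normal(2, 3)
post, tries = rejection_update(rng, prior, -1, 1)
println("kept $(length(post)) particles after $tries proposals")
```

A proposal at position sp survives with probability at most 1/(2 noise(sp) + 1), and only if o lies inside its noise window. With a continuous observation an exact match would almost never happen, which is why a weighted filter is needed in general.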

## Enabling Weighted Belief Updates

In order for the particle filter to re-weight the particles, we need to define the observation distribution.

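```julia
# Distribution of the integer measurement given the true position:
# uniform over round(sp) - noise(sp) : round(sp) + noise(sp).
struct LDObsDist
    x::Int
    noise::Int
end

# Qualified names so these extend the POMDPs functions used by the particle filter.
function POMDPs.pdf(d::LDObsDist, x::Int)
    if abs(x - d.x) <= d.noise
        return 1/(2*d.noise + 1)
    else
        return 0.0
    end
end

function POMDPs.observation(p::LightDark1D, a::Int, sp::Float64)
    if isnan(sp)
        return LDObsDist(0, 0)  # terminal state always yields observation 0
    else
        return LDObsDist(round(Int, sp), noise(sp))
    end
end
```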
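With `pdf` and `observation` defined, a particle filter can weight each propagated particle by how likely it makes the actual observation, instead of requiring an exact match. The standalone sketch below illustrates sequential importance resampling (SIR) with the same likelihood; `obs_likelihood` and `sir_update` are hypothetical helpers, and the resampling step is plain multinomial resampling, which may differ from what `SIRParticleFilter` does internally.

```julia
using Random

noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Same likelihood as the LDObsDist pdf for a non-terminal sp:
# uniform over the integer window round(sp) ± noise(sp), zero elsewhere.
obs_likelihood(sp::Float64, o::Int) =
    abs(o - round(Int, sp)) <= noise(sp) ? 1 / (2 * noise(sp) + 1) : 0.0

# Hypothetical SIR update: propagate, weight by the observation likelihood,
# then resample in proportion to the weights.
function sir_update(rng::AbstractRNG, particles::Vector{Float64}, a::Int, o::Int)
    propagated = particles .+ a                     # move action only
    w = [obs_likelihood(sp, o) for sp in propagated]
    sum(w) > 0 || error("all particles have zero weight; the belief cannot explain o = $o")
    cdf = cumsum(w)                                 # unnormalized weight CDF
    # Multinomial resampling: invert the CDF at uniform draws in [0, total).
    return [propagated[searchsortedfirst(cdf, rand(rng) * cdf[end])] for _ in eachindex(particles)]
end

rng = MersenneTwister(1)
prior = 2.0 .+ 3.0 .* randn(rng, 1000)   # particles from Normal(2, 3)
post = sir_update(rng, prior, -1, 1)
println("posterior mean: ", sum(post) / length(post))
```

Particles whose noise window does not contain o get zero weight and are never resampled, so every posterior particle is consistent with the observation.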

Now we can run a simulation with a particle filter.

```julia
using ParticleFilters
```

```julia
# Named `pf` rather than `filter` to avoid shadowing Base.filter.
pf = SIRParticleFilter(pomdp, 1000)  # 1000 particles
for (s,a,r,sp,o) in stepthrough(pomdp, planner, pf, "sarspo")
    @show (s,a,r,sp,o)
end
```