# Minimal Example: Light-Dark 1D

This is designed to be a minimal example to get POMCP running. The problem is a one-dimensional light-dark problem. The goal is to be near 0. Observations are noisy measurements of position.

```
   -3-2-1 0 1 2 3
...| | | | | | | | ...
          G   S
```

The state is an integer. Measurements are most accurate at x = 5 (see the noise function below), and have uniformly distributed noise.
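To make the noise model concrete, the following standalone sketch copies the `noise` definition from the problem definition and evaluates it at a few positions. The helper `obs_range` is a hypothetical name introduced here for illustration only; it is not part of POMDPs.jl.

```julia
# Copied from the problem definition: half-width of the uniform observation noise.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Hypothetical helper: the range of observations that can occur at position x.
obs_range(x) = (x - noise(x)):(x + noise(x))

# Print the noise half-width and the possible observations at a few positions.
for x in (0, 2, 5)
    println("x = ", x, ": noise = ", noise(x), ", observations in ", obs_range(x))
end
```

At x = 5 an observation is within 1 of the true position, while at the goal (x = 0) it can be off by up to 4. The half-width is never smaller than 1, so no position is observed exactly.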

## Problem Definition

First, we define the problem with the generative interface; this is all that is needed to use the POMCP planner.

In [1]:
importall POMDPs

In [2]:
immutable LightDark1DState
    status::Int
    x::Int
end
Base.:(==)(s1::LightDark1DState, s2::LightDark1DState) = (s1.status == s2.status) && (s1.x == s2.x)
Base.hash(s::LightDark1DState, h::UInt64=zero(UInt64)) = hash(s.status, hash(s.x, h));

In [3]:
type LightDark1D <: POMDPs.POMDP{LightDark1DState,Int,Int}
    discount_factor::Float64
    correct_r::Float64
    incorrect_r::Float64
    step_size::Int
    movement_cost::Float64
end
LightDark1D() = LightDark1D(0.9, 10, -10, 1, 0)
discount(p::LightDark1D) = p.discount_factor
isterminal(::LightDark1D, s::LightDark1DState) = s.status < 0;

In [4]:
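# Half-width of the uniform observation noise: smallest (1) for x in 4:6, growing with distance from x = 5
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)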

function generate_o(p::LightDark1D, s::LightDark1DState, a::Int, sp::LightDark1DState, rng::AbstractRNG)
    n = noise(sp.x)
    return sp.x + rand(rng, -n:n)
end

function generate_s(p::LightDark1D, s::LightDark1DState, a::Int, rng::AbstractRNG)
    if s.status < 0                  # Terminal state
        return s
    end
    if a == 0                   # Enter
        sprime = LightDark1DState(-1, s.x)
    else
        sprime = LightDark1DState(s.status, s.x+a)
    end
    return sprime
end

function reward(p::LightDark1D, s::LightDark1DState, a::Int, sp::LightDark1DState)
    if s.status < 0
        return 0.0
    end
    if a == 0
        if abs(s.x) < 1
            return p.correct_r
        else
            return p.incorrect_r
        end
    else
        return 0.0
    end 
end;

In [5]:
actions(::LightDark1D) = [-1, 0, 1] # Left Stop Right
n_actions(::LightDark1D) = 3

immutable LDNormalStateDist
    mean::Float64
    std::Float64
end
function Base.rand(rng::AbstractRNG, d::LDNormalStateDist)
    return LightDark1DState(0, round(Int, d.mean + randn(rng)*d.std))
end
function initial_state_distribution(pomdp::LightDark1D)
    return LDNormalStateDist(2, 3)
end;

## Using the POMCP Planner

We can now use the POMCP planner to choose the best action for a given belief.

In [6]:
using BasicPOMCP
using POMDPToolbox

[1m[36mINFO: [39m[22m[36mRecompiling stale cache file /home/zach/.julia/lib/v0.6/POMDPToolbox.ji for module POMDPToolbox.
[39m[1m[36mINFO: [39m[22m[36mRecompiling stale cache file /home/zach/.julia/lib/v0.6/ParticleFilters.ji for module ParticleFilters.
[39m[1m[36mINFO: [39m[22m[36mRecompiling stale cache file /home/zach/.julia/lib/v0.6/MCTS.ji for module MCTS.
[39m

In [7]:
solver = POMCPSolver()
pomdp = LightDark1D()
planner = solve(solver, pomdp);

In [8]:
b = initial_state_distribution(pomdp)
a = action(planner, b)
println("""
    POMCP Recommends action $a for belief $b.

    (this may be a bad choice because the POMCP Parameters are set to their defaults.)
""")

    POMCP Recommends action 1 for belief LDNormalStateDist(2.0, 3.0).

    (this may be a bad choice because the POMCP Parameters are set to their defaults.)



## Running in Simulation

Unfortunately, this model will not yet work in simulation. Because we have not defined the observation probability distribution, the weighted particle filter used for belief updates cannot re-weight its particles.

In [9]:
try
    simulate(RolloutSimulator(), pomdp, planner)
catch ex
    print(STDERR, "ERROR: ")
    showerror(STDERR, ex)
end

ERROR:  LightDark1D is not compatible with the default belief updater for POMCP.

    The default belief updater for a POMCPSolver is the `SIRParticleFilter` from ParticleFilters.jl. However this requires `ParticleFilters.obs_weight(::LightDark1D, ::LightDark1DState, ::Int64, ::LightDark1DState, ::Int64)`, and this was not implemented. You can still use the POMCPSolver without this, but simulation with this updater will probably fail. See the documentation for `ParticleFilters.obs_weight` for more details.


## Enabling Belief Updates

In order for the particle filter to re-weight the particles, we need to define the observation distribution.

In [10]:
immutable LDObsDist
    x::Int
    noise::Int
end

function pdf(d::LDObsDist, x::Int)
    if abs(x-d.x) <= d.noise
        return 1/(2*d.noise+1)
    else
        return 0.0
    end
end

function observation(p::LightDark1D, a::Int, sp::LightDark1DState)
    return LDObsDist(sp.x, noise(sp.x))
end

observation (generic function with 4 methods)
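The re-weighting step can be illustrated without any packages. The sketch below is a simplified stand-in, not the ParticleFilters.jl implementation: `obs_pdf` and `reweight` are hypothetical names, and `obs_pdf` restates the uniform density of `LDObsDist` as a plain function of the particle position and the observation.

```julia
# Copied from the problem definition: half-width of the uniform observation noise.
noise(x) = ceil(Int, abs(x - 5)/sqrt(2) + 1e-2)

# Probability of observing `o` when the true position is `x`
# (the same uniform distribution as LDObsDist, written as a plain function).
function obs_pdf(x::Int, o::Int)
    n = noise(x)
    return abs(o - x) <= n ? 1/(2n + 1) : 0.0
end

# Hypothetical helper: normalized weights for a set of particle positions.
# Assumes at least one particle is consistent with `o`; otherwise the sum is zero.
function reweight(particles::Vector{Int}, o::Int)
    w = [obs_pdf(x, o) for x in particles]
    return w ./ sum(w)
end

particles = [0, 2, 4, 6]
weights = reweight(particles, 5)   # weights after observing o = 5
```

With the observation o = 5, the particle at x = 0 gets weight 0 because 5 lies outside its observation range of -4:4. The particles at x = 4 and x = 6 get equal weight and dominate the one at x = 2, whose wider noise spreads its probability over more observations.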

Now we can run a simulation with a particle filter.

In [11]:
using ParticleFilters

In [12]:
filter = SIRParticleFilter(pomdp, 1000)
simulate(RolloutSimulator(), pomdp, planner, filter)

2.058911320946491
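The returned value is the discounted sum of rewards for one simulated trajectory. In this model the only nonzero reward comes from the stop action, so the return has the form discount^t times the stop reward. As a hedged reading of the number above, the sketch below checks that it is consistent with a correct stop (reward 10) after 15 earlier steps at the default discount of 0.9. The function `discounted_stop_return` is a hypothetical helper, and because the simulation is random, other runs will give different values.

```julia
# Hypothetical helper: return of a trajectory whose only nonzero reward `r`
# is received at step `t` (counting from 0), discounted by `discount` per step.
discounted_stop_return(r, discount, t) = discount^t * r

# Correct stop (reward 10) after 15 earlier steps with discount 0.9.
value = discounted_stop_return(10.0, 0.9, 15)
println(value)
```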