# Defining a Tabular POMDP

In this tutorial we will define a version of the Tiger POMDP example [1] with a tabular representation. 

To find out more about the explicit interface, please see this section of the POMDPs.jl documentation: [Explicit POMDP Interface](http://juliapomdp.github.io/POMDPs.jl/latest/explicit.html).

For a more detailed problem description, and an alternative implementation using a functional form, see [Defining-a-POMDP-with-the-Explicit-Interface](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Defining-a-POMDP-with-the-Explicit-Interface.ipynb).

[1] L. Pack Kaelbling, M. L. Littman, A. R. Cassandra, "Planning and Acting in Partially Observable Stochastic Domains", Artificial Intelligence, 1998.

```julia
using POMDPs
using POMDPModels
```

We will use the `TabularPOMDP` model provided by the package [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl). 
The states, actions and observations are represented by integers.

The Tiger POMDP consists of 2 states, 3 actions, and 2 observations. The correspondence with the problem definition is as follows:

|integer | state
|--------|--------
|1       | tiger on the left
|2       | tiger on the right

|integer | observation
|--------|--------
|1       | hear the tiger on the left
|2       | hear the tiger on the right

|integer | action
|--------|--------
|1       | listen
|2       | open left door
|3       | open right door


## Transition Matrix

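The transition array has size $2\times3\times2$ and is indexed as `T[s', a, s]` $= p(s' \mid s, a)$. Listening leaves the tiger where it is (an identity column), while opening either door resets the problem, placing the tiger behind each door with probability 0.5.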

```julia
T = zeros(2,3,2) # |S|x|A|x|S|, T[s', a, s] = p(s'|a,s)
T[:,:,1] = [1. 0.5 0.5; 
            0. 0.5 0.5]
T[:,:,2] = [0. 0.5 0.5; 
            1. 0.5 0.5]
```

```
2×3 Array{Float64,2}:
 0.0  0.5  0.5
 1.0  0.5  0.5
```
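Each column `T[:, a, s]` is a probability distribution over next states, so it should sum to 1. A quick sanity check along these lines can catch typos before the model is built. This is a sketch in plain Base Julia, not a check that `TabularPOMDP` itself is known to perform:

```julia
# Rebuild the transition array from the cell above: T[s', a, s] = p(s' | s, a)
T = zeros(2, 3, 2)
T[:, :, 1] = [1.0 0.5 0.5;
              0.0 0.5 0.5]
T[:, :, 2] = [0.0 0.5 0.5;
              1.0 0.5 0.5]

# Summing over the first dimension (s') gives a 1×3×2 array of column totals;
# every entry should be 1 for T to be a valid transition model.
column_sums = sum(T, dims=1)
@assert all(isapprox.(column_sums, 1.0))
```

The same check applies unchanged to the observation array, whose columns `O[:, a, s']` are distributions over observations.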

## Observation Matrix

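The observation array also has size $2\times3\times2$ and is indexed as `O[o, a, s']` $= p(o \mid a, s')$, where $s'$ is the state reached after taking action $a$. When listening, the agent hears the tiger on the correct side with probability 0.85; opening a door yields an uninformative observation (probability 0.5 for each side).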

```julia
O = zeros(2,3,2) # |O|x|A|x|S|, O[o, a, s'] = p(o|a,s')
O[:,:,1] = [0.85 0.5 0.5; 
            0.15 0.5 0.5]
O[:,:,2] = [0.15 0.5 0.5; 
            0.85 0.5 0.5]
```

```
2×3 Array{Float64,2}:
 0.15  0.5  0.5
 0.85  0.5  0.5
```
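To see how `T` and `O` work together, here is a minimal hand-written Bayes filter. It is a sketch for illustration, not the belief updater used by POMDPs.jl solvers, and `update_belief` is a hypothetical helper name. Action 1 is used because its column of `T` leaves the state unchanged and its column of `O` is the only informative one, which is the listen action of the Tiger problem:

```julia
# Arrays from the cells above: T[s', a, s] = p(s' | s, a), O[o, a, s'] = p(o | a, s')
T = zeros(2, 3, 2)
T[:, :, 1] = [1.0 0.5 0.5;
              0.0 0.5 0.5]
T[:, :, 2] = [0.0 0.5 0.5;
              1.0 0.5 0.5]
O = zeros(2, 3, 2)
O[:, :, 1] = [0.85 0.5 0.5;
              0.15 0.5 0.5]
O[:, :, 2] = [0.15 0.5 0.5;
              0.85 0.5 0.5]

# Hypothetical helper: Bayes-filter update of belief b after action a and observation o
function update_belief(b, a, o, T, O)
    predicted = T[:, a, :] * b               # p(s') = Σ_s T[s', a, s] b[s]
    unnormalized = O[o, a, :] .* predicted   # weight by the likelihood p(o | a, s')
    return unnormalized ./ sum(unnormalized)
end

b0 = [0.5, 0.5]                      # uniform prior over tiger positions
b1 = update_belief(b0, 1, 1, T, O)   # listen, hear the tiger on the left
b2 = update_belief(b1, 1, 1, T, O)   # listen again, hear it on the left again
```

Starting from a uniform belief, one "hear left" observation gives `[0.85, 0.15]`, and a second one gives roughly `[0.970, 0.030]` (exactly $0.7225 / 0.745$ for the first entry).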

## Reward Matrix

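The reward matrix has size $2\times3$ ($|S|\times|A|$) and is indexed as `R[s, a]`. Listening costs 1, opening the door that hides the tiger costs 100, and opening the other door earns 10.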

```julia
R = [-1. -100. 10.; 
     -1. 10. -100.] # |S|x|A| state-action pair rewards
```

```
2×3 Array{Float64,2}:
 -1.0  -100.0    10.0
 -1.0    10.0  -100.0
```
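Combined with a belief $b$, `R` gives the expected immediate reward of each action, $\sum_s b(s) R(s, a)$. The sketch below is plain Julia (`expected_reward` is a hypothetical name) and shows why a single observation is not enough to justify opening a door:

```julia
R = [-1.0 -100.0   10.0;
     -1.0   10.0 -100.0]   # R[s, a]

# Expected immediate reward of every action under belief b: Σ_s b[s] R[s, a]
expected_reward(b, R) = R' * b

expected_reward([0.5, 0.5], R)    # [-1.0, -45.0, -45.0]
expected_reward([0.85, 0.15], R)  # ≈ [-1.0, -83.5, -6.5]
```

Even after hearing the tiger on the left once, opening the right door has a negative expected value (-6.5), which is worse than paying the listening cost of 1.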

## Model 

We can now instantiate a `TabularPOMDP` object. 

In addition to the transition, observation, and reward matrices, the constructor also takes a discount factor.

```julia
discount = 0.95

pomdp = TabularPOMDP(T, R, O, discount);
```
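The discount factor weights a reward received $t$ steps in the future by $\gamma^{t}$. One way to see how `T`, `R`, and $\gamma$ combine is value iteration on the underlying fully observable MDP, which ignores the observations entirely. This is a hand-written sketch, not one of the POMDPs.jl solvers, and `mdp_value_iteration` is a hypothetical name. Because it assumes the tiger's position is known, its values upper-bound what any POMDP policy can achieve:

```julia
T = zeros(2, 3, 2)                  # T[s', a, s] as in the cells above
T[:, :, 1] = [1.0 0.5 0.5;
              0.0 0.5 0.5]
T[:, :, 2] = [0.0 0.5 0.5;
              1.0 0.5 0.5]
R = [-1.0 -100.0   10.0;
     -1.0   10.0 -100.0]            # R[s, a]
γ = 0.95

# Hypothetical helper: V(s) = max_a [ R[s, a] + γ Σ_{s'} T[s', a, s] V(s') ]
function mdp_value_iteration(T, R, γ; tol=1e-10, max_iter=10_000)
    nS, nA = size(R)
    V = zeros(nS)
    for _ in 1:max_iter
        # Bellman backup: best one-step reward plus discounted expected next value
        Vnew = [maximum(R[s, a] + γ * sum(T[:, a, s] .* V) for a in 1:nA) for s in 1:nS]
        if maximum(abs.(Vnew .- V)) < tol
            return Vnew
        end
        V = Vnew
    end
    return V
end

V = mdp_value_iteration(T, R, γ)
```

With the tiger's location known, the best policy opens the other door at every step and earns 10 each time, so $V = 10 / (1 - 0.95) = 200$ in both states.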

## Seeing the model in action

When using the `TabularPOMDP` representation, all the functions from the explicit interface are automatically implemented and the model is ready to be used for simulation or solving. 
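Conceptually, a simulated step with this model draws the next state from `T[:, a, s]`, an observation from `O[:, a, s']`, and collects `R[s, a]`. The sketch below does this directly on the arrays using only the `Random` standard library. It illustrates the generative semantics under that assumption; it is not the code POMDPModels.jl or POMDPSimulators.jl actually run, and `sample_index` and `step_sketch` are hypothetical names:

```julia
using Random

# Same arrays as in the cells above
T = zeros(2, 3, 2)
T[:, :, 1] = [1.0 0.5 0.5; 0.0 0.5 0.5]
T[:, :, 2] = [0.0 0.5 0.5; 1.0 0.5 0.5]
O = zeros(2, 3, 2)
O[:, :, 1] = [0.85 0.5 0.5; 0.15 0.5 0.5]
O[:, :, 2] = [0.15 0.5 0.5; 0.85 0.5 0.5]
R = [-1.0 -100.0 10.0; -1.0 10.0 -100.0]

# Hypothetical helper: draw index i with probability p[i]
function sample_index(rng::AbstractRNG, p::AbstractVector)
    u = rand(rng)                       # uniform in [0, 1)
    idx = findfirst(>(u), cumsum(p))    # first index whose cumulative mass exceeds u
    return something(idx, length(p))    # guard against round-off in the last entry
end

# Hypothetical helper: one generative step (s, a) -> (s', o, r) on the raw arrays
function step_sketch(rng::AbstractRNG, T, O, R, s::Int, a::Int)
    sp = sample_index(rng, T[:, a, s])  # next state ~ T[:, a, s]
    o = sample_index(rng, O[:, a, sp])  # observation ~ O[:, a, s']
    r = R[s, a]                         # reward depends on (s, a) only
    return sp, o, r
end

rng = MersenneTwister(2024)
sp, o, r = step_sketch(rng, T, O, R, 1, 1)
```

Listening (action 1) from state 1 always returns `sp == 1` and `r == -1.0`; the observation is 1 with probability 0.85.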

To learn how to solve POMDPs offline: TODO[link to tutorial]

To learn how to solve POMDPs online: TODO[link_to_tutorial]

We can run a simulation of our model using the `stepthrough` function.

For more information on running simulations, see the [simulation tutorial](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb).

```julia
using POMDPSimulators
using POMDPPolicies

# policy that takes a random action
policy = RandomPolicy(pomdp)

# step through at most 10 steps, yielding the state, action, and reward of each
for (s, a, r) in stepthrough(pomdp, policy, "s,a,r", max_steps=10)
    @show s
    @show a
    @show r
    println()
end
```

```
s = 2
a = 2
r = 10.0

s = 1
a = 3
r = 10.0

s = 2
a = 2
r = 10.0

s = 1
a = 1
r = -1.0

s = 1
a = 3
r = 10.0

s = 2
a = 3
r = -100.0

s = 1
a = 1
r = -1.0

s = 1
a = 1
r = -1.0

s = 1
a = 1
r = -1.0

s = 1
a = 1
r = -1.0
```
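The trajectory above comes from a random policy, so a rerun will print different values. To illustrate how the discount factor turns such a reward sequence into a single number, the sketch below computes the discounted return $\sum_{t=1}^{n} \gamma^{t-1} r_t$ of the rewards printed above with $\gamma = 0.95$ (`discounted_return` is a hypothetical helper, not a POMDPSimulators function):

```julia
# Rewards printed by the simulation above, in order
rewards = [10.0, 10.0, 10.0, -1.0, 10.0, -100.0, -1.0, -1.0, -1.0, -1.0]
γ = 0.95

# Discounted return: Σ_t γ^(t-1) r_t, with the first reward undiscounted
discounted_return(rs, γ) = sum(γ^(t - 1) * r for (t, r) in enumerate(rs))

G = discounted_return(rewards, γ)   # ≈ -44.29
```

The undiscounted sum of the same rewards is -65; discounting reduces the weight of the -100 penalty at step 6 to $0.95^5 \approx 0.774$.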

