## Using an Offline Solver
In this section, we walk through using an offline solver to compute a policy (typically an approximately optimal one) for a POMDP. For more details on POMDPs and offline solvers, see Chapter 6 of the DMU textbook [1]. The solver we will use is FIB, or Fast Informed Bound [2], implemented in [FIB.jl](https://github.com/JuliaPOMDP/FIB.jl). We will also compare it against a policy that chooses actions at random.

[1] Kochenderfer, Mykel J. *Decision Making Under Uncertainty: Theory and Application*. MIT Press, 2015.

[2] Smith, Trey, and Reid Simmons. "Point-Based POMDP Algorithms: Improved Analysis and Implementation." In *Conference on Uncertainty in Artificial Intelligence*. AUAI Press, 2005.
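For reference, FIB keeps one alpha vector $\alpha_a$ per action $a$ and repeatedly applies the update below until the vectors stop changing. Here $R$ is the reward function, $T$ the transition model, $O$ the observation model, and $\gamma$ the discount factor. This is the standard form of the Fast Informed Bound update; FIB.jl's stopping rule (iteration cap and tolerance) is a detail of that package.

$$
\alpha_a^{(k+1)}(s) = R(s,a) + \gamma \sum_{o} \max_{a'} \sum_{s'} O(o \mid s', a)\, T(s' \mid s, a)\, \alpha_{a'}^{(k)}(s')
$$

Placing the maximization over $a'$ inside the sum over observations is what makes the result an upper bound on the optimal value function, and a tighter one than the QMDP bound.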

### POMDP Model
For this example we will use the Tiger POMDP (defined in [POMDPModels](https://github.com/JuliaPOMDP/POMDPModels.jl)) which is an instance of an explicit POMDP. Please see this [notebook](Defining-a-POMDP-with-the-Explicit-Interface.ipynb) for how to define a POMDP with the explicit interface.
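Because the Tiger POMDP is explicit, its spaces and rewards can be queried directly through the POMDPs.jl interface functions `states`, `actions`, `observations`, `discount`, and `reward`. The snippet below is a minimal sketch assuming the same package versions as the rest of this notebook; the exact container types these functions return (and how states and actions are labeled) depend on the POMDPModels version.

```julia
using POMDPs
using POMDPModels

pomdp = TigerPOMDP()

# Enumerate the finite state, action, and observation spaces
S = collect(states(pomdp))
A = collect(actions(pomdp))
Z = collect(observations(pomdp))
@show S A Z
@show discount(pomdp)

# Immediate reward for every state-action pair
for s in S, a in A
    println("R($s, $a) = ", reward(pomdp, s, a))
end
```

Iterating over the spaces like this only works because they are small and finite, which is exactly what offline solvers such as FIB rely on.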

```julia
using POMDPs
using POMDPModels   # for the problem
using FIB           # for the solver
using POMDPPolicies # for creating a random policy
```

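```julia
# Define the POMDP problem with default parameters
pomdp = TigerPOMDP()
```

```
TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95)
```

The printed fields are, in order, the reward for listening (-1), the reward for opening the door with the tiger behind it (-100), the reward for opening the other door (10), the probability of hearing the tiger on the correct side (0.85), and the discount factor (0.95).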

```julia
# Define the FIB solver with default parameters:
# at most 100 iterations, convergence tolerance 0.001, verbose output off
solver = FIBSolver()
```

```
FIBSolver(100, 0.001, false)
```

### Solving the POMDP
Solving is as simple as calling `solve` with the solver and the POMDP. The resulting policy is an `AlphaVectorPolicy`, defined [here](https://github.com/JuliaPOMDP/POMDPPolicies.jl/blob/master/src/alpha_vector.jl) in `POMDPPolicies`. We also create the baseline random policy.
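To get a feel for what `solve` computes, here is a from-scratch FIB iteration on a hand-coded Tiger model in plain Julia. It is an illustrative sketch, not FIB.jl's implementation, and the state/action/observation encoding (and the helper name `fib`) are hypothetical choices for this example. Iterated to convergence, the listen entry settles at 8.5 / 0.0975 ≈ 87.18 in both states; the solver output in the next cell (86.66) is slightly lower, which is consistent with `FIBSolver()` stopping at its default cap of 100 iterations.

```julia
# Hypothetical encoding for this sketch:
#   states:       1 = tiger behind door 1, 2 = tiger behind door 2
#   actions:      1 = listen, 2 = open door 1, 3 = open door 2
#   observations: 1 = hear tiger at door 1, 2 = hear tiger at door 2
γ = 0.95
p = 0.85      # probability of hearing the tiger on the correct side
q = 1 - p

# Reward R[s, a]
R = [-1.0 -100.0   10.0;
     -1.0   10.0 -100.0]

# Transition T[s′, s, a]: listening keeps the state, opening a door resets it uniformly
T = zeros(2, 2, 3)
T[:, :, 1] = [1.0 0.0; 0.0 1.0]
T[:, :, 2] .= 0.5
T[:, :, 3] .= 0.5

# Observation O[o, s′, a]: listening is informative, opening a door is not
O = zeros(2, 2, 3)
O[:, :, 1] = [p q; q p]
O[:, :, 2] .= 0.5
O[:, :, 3] .= 0.5

# Fast Informed Bound iteration; column a of the result is the alpha vector of action a
function fib(R, T, O, γ; max_iter = 10_000, tol = 1e-10)
    nS, nA = size(R)
    nO = size(O, 1)
    α = zeros(nS, nA)
    for _ in 1:max_iter
        α_new = similar(α)
        for a in 1:nA, s in 1:nS
            future = 0.0
            for o in 1:nO
                # max over next actions of Σ_s′ O(o | s′, a) T(s′ | s, a) α_a′(s′)
                future += maximum(
                    sum(O[o, sp, a] * T[sp, s, a] * α[sp, ap] for sp in 1:nS)
                    for ap in 1:nA)
            end
            α_new[s, a] = R[s, a] + γ * future
        end
        converged = maximum(abs.(α_new .- α)) < tol
        α = α_new
        converged && break
    end
    return α
end

α = fib(R, T, O, γ)
display(α)
```

Both door-opening vectors differ by exactly 110 (= 10 - (-100)) between the two states, because opening either door leads to the same uniform reset and uninformative observation.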

```julia
# Solve the problem offline; the resulting FIB policy is an AlphaVectorPolicy
fib_policy = solve(solver, pomdp)
```

```
AlphaVectorPolicy{TigerPOMDP,Int64}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Array{Float64,1}[[86.6633, 86.6633], [92.271, -17.729], [-17.729, 92.271]], [0, 1, 2])
```
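An alpha-vector policy chooses, for a belief $b$, the action whose alpha vector has the largest inner product with $b$. The sketch below applies that rule by hand to the three alpha vectors printed above; it is an illustration rather than POMDPPolicies' code, the helper `best_action` is hypothetical, and it assumes beliefs are written in the same state order the solver used. Action 0's vector is flat (86.66 in both states), consistent with listening, while action 1 only wins once $110p - 17.729 > 86.6633$, i.e. when the belief on the first state exceeds roughly 0.949.

```julia
using LinearAlgebra  # for dot

# Alpha vectors and their actions, copied from the FIB output above
alphas = [[86.6633, 86.6633], [92.271, -17.729], [-17.729, 92.271]]
alpha_actions = [0, 1, 2]

# Pick the action whose alpha vector has the largest inner product with the belief;
# that maximum is also the policy's value estimate at this belief.
function best_action(alphas, alpha_actions, b)
    scores = [dot(α, b) for α in alphas]
    i = argmax(scores)
    return alpha_actions[i], scores[i]
end

for b in ([0.5, 0.5], [0.9, 0.1], [0.99, 0.01], [0.01, 0.99])
    a, v = best_action(alphas, alpha_actions, b)
    println("belief = $b  ->  action $a (value ≈ $(round(v, digits = 2)))")
end
```

Even at a 0.9 belief the policy keeps listening; it commits to a door only when it is quite certain, which reflects the large -100 penalty.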

```julia
# Create a policy for the Tiger POMDP that chooses actions uniformly at random
rand_policy = RandomPolicy(pomdp);
```

### Benchmarking the Solver Policy
We now compare the performance of the FIB policy against that of the random policy on the Tiger POMDP. Since we only need the discounted reward, we can use the rollout simulator defined in [POMDPSimulators](https://github.com/JuliaPOMDP/POMDPSimulators.jl). Check out this [notebook](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb) for ways to use the other simulators as well.

We also need a way to update the belief state of the Tiger POMDP after taking an action and receiving an observation; more information is available in [BeliefUpdaters](https://github.com/JuliaPOMDP/BeliefUpdaters.jl). The default updater for an `AlphaVectorPolicy` is the `DiscreteUpdater`, which is used when we do not pass an updater argument to `simulate`. Finally, we compare the discounted rewards: in this 10-step rollout the FIB policy earns far more than the random policy. Keep in mind that a single rollout per policy is a noisy estimate; averaging over many rollouts gives a more reliable comparison.
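A discrete belief updater maintains a probability vector over the finite states with a Bayes filter: $b'(s') \propto O(o \mid s', a) \sum_s T(s' \mid s, a)\, b(s)$. Below is a minimal from-scratch sketch of that rule for the Tiger listen action in plain Julia. It is not BeliefUpdaters' implementation, and the names `update_belief`, `T_listen`, and `O_listen` (with index 1 meaning door 1 and index 2 meaning door 2) are hypothetical.

```julia
# Bayes filter: predict through the transition model, then weight by the
# likelihood of the received observation and renormalize.
function update_belief(b, T, O, o)
    predicted = T * b                    # Σ_s T[s′, s] b[s]
    unnormalized = O[o, :] .* predicted  # O[o, s′] * predicted[s′]
    return unnormalized ./ sum(unnormalized)
end

p = 0.85                        # probability of hearing the tiger on the correct side
q = 1 - p
T_listen = [1.0 0.0; 0.0 1.0]   # listening does not move the tiger
O_listen = [p q; q p]           # O_listen[o, s′]

b0 = [0.5, 0.5]                             # uniform initial belief
b1 = update_belief(b0, T_listen, O_listen, 1)  # hear tiger at door 1
b2 = update_belief(b1, T_listen, O_listen, 1)  # hear tiger at door 1 again
b3 = update_belief(b2, T_listen, O_listen, 2)  # hear tiger at door 2
@show b1 b2 b3
```

Two agreeing observations push the belief on door 1 to about 0.97 (0.7225 / 0.745), and a contradicting third observation brings it back to 0.85.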

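```julia
# Create and run the rollout simulator (at most 10 steps per simulation)
using POMDPSimulators
rollout_sim = RolloutSimulator(max_steps=10);
# With a RolloutSimulator, `simulate` returns the total discounted reward,
# not a full history. No updater is passed, so each policy's default is used.
history_fib = simulate(rollout_sim, pomdp, fib_policy);
history_rand = simulate(rollout_sim, pomdp, rand_policy);
```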

```julia
@show history_fib;
@show history_rand;
```

```
history_fib = 10.413829097267579
history_rand = -202.2969902185546
```
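One rollout per policy is a noisy comparison. A minimal sketch of a more reliable one averages the discounted reward over many independent rollouts and reports a standard error. It assumes the same package versions as above and that `RolloutSimulator` accepts `rng` and `max_steps` keyword arguments; the helper `mean_return` and the choice of `n = 1000` rollouts are illustrative, not part of the original notebook.

```julia
using POMDPs, POMDPModels, FIB, POMDPPolicies, POMDPSimulators
using Random, Statistics

pomdp = TigerPOMDP()
fib_policy = solve(FIBSolver(), pomdp)
rand_policy = RandomPolicy(pomdp)

# Average the discounted reward over n independent rollouts of at most max_steps steps.
# Each rollout's simulator gets its own seeded RNG, so sampled states and observations
# vary across rollouts (the random policy still draws actions from its own RNG).
function mean_return(pomdp, policy; n = 1000, max_steps = 10)
    returns = [simulate(RolloutSimulator(rng = MersenneTwister(i), max_steps = max_steps),
                        pomdp, policy)
               for i in 1:n]
    return mean(returns), std(returns) / sqrt(n)
end

fib_mean, fib_se = mean_return(pomdp, fib_policy)
rand_mean, rand_se = mean_return(pomdp, rand_policy)
@show fib_mean fib_se rand_mean rand_se
```

Reporting the standard error alongside each mean makes it clear whether the gap between the two policies is larger than the simulation noise.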
