## Using an Online Solver
In this section, we walk through using an online solver, which plans at runtime and computes each action as it is needed, without an offline pre-processing phase. For more details on POMDPs and offline solvers, please consult Chapter 6 of the DMU textbook [1]. The solver we will use is POMCP (Partially Observable Monte Carlo Planning) [2], implemented in [BasicPOMCP.jl](https://github.com/JuliaPOMDP/BasicPOMCP.jl). We will compare it against a policy that chooses actions at random.

[1] Kochenderfer, Mykel J. Decision Making Under Uncertainty: Theory and Application. MIT Press, 2015.

[2] Silver, David, and Joel Veness. "Monte-Carlo planning in large POMDPs." In Advances in Neural Information Processing Systems, 2010.

### POMDP Model
For this example, we will use the LightDark1D POMDP (defined in [POMDPModels](https://github.com/JuliaPOMDP/POMDPModels.jl)), which is an instance of an explicit POMDP. Please see this [notebook](https://github.com/JuliaPOMDP/POMDPExamples) for how to define a POMDP with the explicit interface.

In [1]:
using POMDPs
using POMDPModels # For the problem
using BasicPOMCP # For the solver
using POMDPPolicies # For creating a random policy

In [2]:
# Define the POMDP problem with default params
pomdp = LightDark1D()

LightDark1D{typeof(POMDPModels.default_sigma)}(0.9, 10.0, -10.0, 1.0, 0.0, POMDPModels.default_sigma)

In [9]:
# Define the POMCP solver; use keyword arguments to adjust parameters
solver = POMCPSolver(c=10.0)

POMCPSolver
  max_depth: Int64 20
  c: Float64 10.0
  tree_queries: Int64 1000
  max_time: Float64 Inf
  tree_in_info: Bool false
  default_action: ExceptionRethrow ExceptionRethrow()
  rng: Random.MersenneTwister
  estimate_value: RolloutEstimator
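
The fields printed above can be set through keyword arguments of the same name. The following is a minimal sketch of a cheaper solver configuration; the specific values and the name `small_solver` are illustrative assumptions, not recommendations from the original notebook.

```julia
using BasicPOMCP
using Random

# Illustrative values only; the keyword names match the fields printed above
small_solver = POMCPSolver(
    tree_queries = 100,        # fewer tree simulations per action: faster, less accurate
    max_depth = 10,            # shallower search tree
    c = 10.0,                  # exploration constant, same as in the cell above
    rng = MersenneTwister(1),  # fixed seed so that planning is repeatable
)

@show small_solver.tree_queries
@show small_solver.max_depth
```

Lowering `tree_queries` or `max_depth` reduces the planning time per action, typically at the cost of decision quality.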


### Creating the planner

A planner is created by calling `solve` on the solver and the POMDP problem.
Since POMCP is an online solver, `solve` does not actually do any computation; instead, the "planner" object it returns implements `POMDPs.action`, which performs the planning computation online during a simulation.
Semantically, the planner behaves exactly like a `POMDPs.Policy` object (and is indeed a subtype of `POMDPs.Policy`).

In [12]:
planner = solve(solver, pomdp);
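
Because the planner is a `Policy`, it can also be queried directly outside of a simulation. The sketch below asks it for a single action at the initial belief. It assumes POMDPs.jl v0.9 or later, where `initialstate(pomdp)` returns a distribution; the variable names `b0` and `a` are only for illustration.

```julia
using POMDPs
using POMDPModels
using BasicPOMCP

pomdp = LightDark1D()
planner = solve(POMCPSolver(c=10.0), pomdp)

# Assumption: POMDPs.jl v0.9+, where initialstate returns the initial state distribution
b0 = initialstate(pomdp)

# The tree search happens here, not in solve
a = action(planner, b0)

@show planner isa Policy
@show a in actions(pomdp)
```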

We will compare the planner against a random policy:

In [5]:
# Create a LightDark1D policy that chooses actions at random
rand_policy = RandomPolicy(pomdp);

### Benchmarking the planner
We will compare the performance of the POMCP planner against that of the random policy on the LightDark1D POMDP. Since we only care about the discounted reward, we can use the rollout simulator defined in [POMDPSimulators](https://github.com/JuliaPOMDP/POMDPSimulators.jl). Check out this [notebook](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb) for ways to use the other simulators as well.
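
For reference, the discounted reward of a single rollout can be written as below. This is the standard definition and is assumed here to be what the rollout simulator returns; $\gamma$ is the discount factor, $r_t$ is the reward received at step $t$, and $T$ is the number of simulated steps.

```latex
R = \sum_{t=0}^{T-1} \gamma^{t} \, r_t
```

For example, with $\gamma = 0.9$, a reward of 10 received at step $t = 2$ contributes $0.9^2 \cdot 10 = 8.1$ to the total.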

We also need a way to update the belief after taking an action and receiving an observation; see [BeliefUpdaters](https://github.com/JuliaPOMDP/BeliefUpdaters.jl) for more information. The default updater for a `FunctionPolicy` is the `PreviousObservationUpdater`, which is used when no updater argument is passed to `simulate`. Here we instead use a more sophisticated particle filter updater; see [ParticleFilters](https://github.com/JuliaPOMDP/ParticleFilters.jl) for more information. Finally, we compare the discounted rewards obtained by each policy; the POMCP planner usually does significantly better than random.

In [24]:
using POMDPSimulators 
using ParticleFilters

# Define the SIR (sampling importance resampling) particle filter with 1000 particles
pf = SIRParticleFilter(pomdp, 1000);
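
To make the role of the updater concrete, here is a minimal sketch of one manual belief update with the particle filter, outside of any simulator. It assumes POMDPs.jl v0.9 or later (for `initialstate`) and a ParticleFilters.jl version that still provides `SIRParticleFilter`; the action `a` and observation `o` are hypothetical values chosen only for illustration.

```julia
using POMDPs
using POMDPModels
using ParticleFilters

pomdp = LightDark1D()
pf = SIRParticleFilter(pomdp, 1000)

# Assumption: POMDPs.jl v0.9+, where initialstate returns the initial state distribution
b0 = initialize_belief(pf, initialstate(pomdp))

a = 1    # hypothetical action (LightDark1D actions are integers)
o = 3.0  # hypothetical observation (LightDark1D observations are Float64)

# Propagate the particles through the model, reweight them by the observation, and resample
b1 = update(pf, b0, a, o)

@show n_particles(b0)
@show n_particles(b1)
```

The simulator performs this same `update` call after every step, so the planner always acts on the latest particle belief.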

In [25]:
# Create and run the rollout simulator
rollout_sim = RolloutSimulator(max_steps=10);
r_pomcp = simulate(rollout_sim, pomdp, planner, pf);
r_rand = simulate(rollout_sim, pomdp, rand_policy);

In [26]:
@show r_pomcp;
@show r_rand;

r_pomcp = 8.100000000000001
r_rand = -9.0
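
A single rollout is a noisy sample, so the two numbers above can vary from run to run. One way to get a steadier comparison is to average over several rollouts. The sketch below repeats the setup from the earlier cells so that it is self-contained; the run count `n_runs` and the variable names are hypothetical choices, and no particular result is implied.

```julia
using POMDPs
using POMDPModels
using BasicPOMCP
using POMDPPolicies
using POMDPSimulators
using ParticleFilters
using Statistics  # for mean

pomdp = LightDark1D()
planner = solve(POMCPSolver(c=10.0), pomdp)
rand_policy = RandomPolicy(pomdp)
pf = SIRParticleFilter(pomdp, 1000)
rollout_sim = RolloutSimulator(max_steps=10)

n_runs = 20  # hypothetical number of rollouts; increase for a tighter estimate

# Each call to simulate returns the discounted reward of one rollout
pomcp_returns = [simulate(rollout_sim, pomdp, planner, pf) for _ in 1:n_runs]
rand_returns = [simulate(rollout_sim, pomdp, rand_policy) for _ in 1:n_runs]

@show mean(pomcp_returns)
@show mean(rand_returns)
```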
