# Running Simulations

All simulators in POMDPs.jl should be consistent with the [Simulation Standard](https://juliapomdp.github.io/POMDPs.jl/latest/simulation), and the Simulation Standard document should be consulted to answer questions about how a simulation is defined.

There are several ways to perform simulations in the POMDPs.jl ecosystem. The [POMDPSimulators package](https://github.com/JuliaPOMDP/POMDPSimulators.jl) provides implementations of several simulators and [a page to help decide which to use](https://juliapomdp.github.io/POMDPSimulators.jl/latest/which). This tutorial will show a few examples demonstrating how to use the simulators, but the POMDPSimulators documentation serves as a more in-depth reference.

## Ingredients

The [Simulation Standard documentation](https://juliapomdp.github.io/POMDPs.jl/latest/simulation) describes the ingredients of a simulation. At a minimum, a simulation requires a `POMDP` or `MDP` model and a `Policy`.

Models from the [POMDPModels package](https://github.com/JuliaPOMDP/POMDPModels.jl) and policies from the [POMDPPolicies package](https://github.com/JuliaPOMDP/POMDPPolicies.jl) will be used in these examples. For more information on defining models, see the tutorials about defining POMDPs and MDPs; for information on hand-specifying policies see the Heuristic Policies tutorial [TODO: add link]; and for examples of solving for policies, see the tutorials about solving POMDPs online and offline [TODO: add link]. Also consult the docstrings for the various functions used here.

In [1]:
using POMDPs
using POMDPSimulators
using POMDPModels
using POMDPPolicies
using POMDPModelTools
using BeliefUpdaters
using Printf
using Random

Below we define some problems, policies, and belief updaters that will be used in the simulations. Use [Julia's help system (type `?`)](https://docs.julialang.org/en/v1/stdlib/REPL/#Help-mode-1) to find out more by reading the docstrings.

In [2]:
tiger = TigerPOMDP()
grid = SimpleGridWorld();
listen = FunctionPolicy(b->TIGER_LISTEN)
tiger_filter = DiscreteUpdater(tiger);

## Stepthrough

The recommended first way to try out POMDPs.jl simulations is with the [`stepthrough` function](https://juliapomdp.github.io/POMDPSimulators.jl/latest/stepthrough/#POMDPSimulators.stepthrough). This function provides a window into the simulation with a for-loop syntax.

Within the body of the for loop, we have access to the belief, `b`, the action, `a`, the observation, `o`, and the reward, `r`, at each step. As more information is gathered through listening, the belief about whether the tiger is behind the left or right door becomes much more confident, although a single contradictory observation can pull it back. The sum of the rewards is also calculated here, but note that this is *not the discounted reward*.

In [3]:
rsum = 0.0
for (b,a,o,r) in stepthrough(tiger, listen, tiger_filter, "b,a,o,r", max_steps=5)
    @printf("belief that tiger is behind left door: %4.1f%%\n",
            100*pdf(b, TIGER_LEFT))
    @printf("action: %s, observation: %s\n", a, o)
    global rsum += r
end
@show rsum;

belief that tiger is behind left door: 50.0%
action: 0, observation: true
belief that tiger is behind left door: 85.0%
action: 0, observation: true
belief that tiger is behind left door: 97.0%
action: 0, observation: false
belief that tiger is behind left door: 85.0%
action: 0, observation: true
belief that tiger is behind left door: 97.0%
action: 0, observation: true
rsum = -5.0
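Since the sum above is undiscounted, it can help to see how a discounted sum could be accumulated in the same loop. The following is only a sketch: `reward_sums` is a hypothetical helper name (not part of any package), and it assumes that `discount` from POMDPs.jl returns the discount factor of the model and that `stepthrough` accepts the `"a,r"` spec string in the same way as the `"b,a,o,r"` spec used above.

```julia
using POMDPs, POMDPModels, POMDPPolicies, POMDPSimulators, BeliefUpdaters

tiger = TigerPOMDP()
listen = FunctionPolicy(b -> TIGER_LISTEN)
tiger_filter = DiscreteUpdater(tiger)

# Hypothetical helper: accumulate the plain and the discounted reward in one pass.
function reward_sums(m, policy, up; max_steps=5)
    γ = discount(m)   # discount factor of the model
    weight = 1.0      # holds γ^t for the current step
    rsum = 0.0        # undiscounted sum
    dsum = 0.0        # discounted sum
    for (a, r) in stepthrough(m, policy, up, "a,r", max_steps=max_steps)
        rsum += r
        dsum += weight * r
        weight *= γ
    end
    return rsum, dsum
end

rsum, dsum = reward_sums(tiger, listen, tiger_filter)
@show rsum dsum
```

Wrapping the loop in a function also avoids the `global` annotation that is needed when the accumulator lives at the top level of a script.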


## Rollout Simulations

While `stepthrough` is a flexible and convenient tool for many user-facing demonstrations, it is often less error-prone to use the standard [`simulate`](https://juliapomdp.github.io/POMDPs.jl/latest/api#POMDPs.simulate) function with a [`Simulator`](https://juliapomdp.github.io/POMDPs.jl/latest/concepts#Simulators-1) object. The simplest `Simulator` is the [`RolloutSimulator`](https://juliapomdp.github.io/POMDPSimulators.jl/latest/rollout#POMDPSimulators.RolloutSimulator), which runs a simulation as fast as possible and returns the discounted reward. Each listen action has a reward of $-1$ (the five-step undiscounted sum above is $-5$), so the discounted reward for the simulation with the larger number of steps approaches the theoretical value of a policy of always listening, $\sum_{t=0}^\infty -\gamma^t = -1/(1-\gamma)$.

In [4]:
short = RolloutSimulator(max_steps=5)
short_dr = simulate(short, tiger, listen)
long = RolloutSimulator(max_steps=1_000_000)
long_dr = simulate(long, tiger, listen)
@show short_dr
@show long_dr;

short_dr = -4.524381249999999
long_dr = -19.999999999999975
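Both numbers can be checked by hand with the geometric series. The derivation below assumes a reward of $-1$ per listen action (consistent with `rsum = -5.0` over five steps) and a discount factor of $\gamma = 0.95$, which is inferred from the outputs rather than stated in this tutorial:

$$
\sum_{t=0}^{T-1} -\gamma^t = -\frac{1-\gamma^T}{1-\gamma},
\qquad
T = 5:\; -\frac{1-0.95^5}{0.05} \approx -4.5244,
\qquad
T \to \infty:\; -\frac{1}{1-0.95} = -20 .
$$

The five-step value agrees with `short_dr`, and `long_dr` differs from the limit of $-20$ only by floating-point error.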


## Recording Histories

Sometimes it is important to record the entire history of a simulation for further examination. This can be accomplished with a [`HistoryRecorder`](https://juliapomdp.github.io/POMDPSimulators.jl/latest/history_recorder#POMDPSimulators.HistoryRecorder). 

In [5]:
hr = HistoryRecorder(max_steps=5)
history = simulate(hr, tiger, listen, tiger_filter);

### Histories

The history object produced by a `HistoryRecorder` is a [`SimHistory`, documented in the POMDPSimulators package](https://juliapomdp.github.io/POMDPSimulators.jl/latest/histories#Histories-1). The information in this object can be accessed in several ways. For example, the `discounted_reward` function returns the discounted sum of the rewards recorded in the history:

In [6]:
discounted_reward(history)

-4.524381249999999

Accessor functions such as `state_hist`, `action_hist`, and `observation_hist` can be used to access parts of the history:

In [10]:
@show state_hist(history)
@show collect(observation_hist(history));

state_hist(history) = Bool[0, 0, 0, 0, 0, 0]
collect(observation_hist(history)) = Bool[0, 0, 0, 0, 0]


However, keeping track of which states, actions, and observations belong together can be tricky. A history has a starting state and an ending state, but no action is taken from the ending state, so the list of actions is shorter than the list of states (note the difference in length between the state and observation histories above). For this reason, it is often better to think of histories in terms of *steps* that include both starting and ending states, i.e. $(s, a, r, s')$/`(s, a, r, sp)` tuples for MDPs.
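To make the off-by-one relationship concrete, the sketch below pairs each state with the action taken from it and the state that follows, using only the accessor functions and plain Julia. It assumes that `action_hist` can be collected into a vector in the same way as `observation_hist` above; the variable names are illustrative.

```julia
using POMDPs, POMDPModels, POMDPPolicies, POMDPSimulators, BeliefUpdaters

tiger = TigerPOMDP()
listen = FunctionPolicy(b -> TIGER_LISTEN)
history = simulate(HistoryRecorder(max_steps=5), tiger, listen, DiscreteUpdater(tiger))

states = collect(state_hist(history))
actions = collect(action_hist(history))

# There is one more state than there are actions.
@show length(states) length(actions)

# Align s, a, and sp by dropping the last state for s and the first state for sp.
for (t, (s, a, sp)) in enumerate(zip(states[1:end-1], actions, states[2:end]))
    println("step $t: s = $s, a = $a, sp = $sp")
end
```

Doing this alignment by hand is easy to get wrong, which is the motivation for iterating over whole steps instead.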

### `eachstep`

The most powerful function for accessing the information in a `SimHistory` is the [`eachstep` function](https://juliapomdp.github.io/POMDPSimulators.jl/latest/histories#POMDPSimulators.eachstep) which returns an iterator through [`NamedTuple`](https://docs.julialang.org/en/v1/manual/types/index.html#Named-Tuple-Types-1)s representing each step in the history. The `eachstep` function is similar to the `stepthrough` function above except that it iterates through the immutable steps of a previously simulated history instead of conducting the simulation as the for loop is being carried out.

In [11]:
rsum = 0.0
for (b,a,o,r) in eachstep(history, "b,a,o,r")
    @printf("belief that tiger is behind left door: %4.1f%%\n",
            100*pdf(b, TIGER_LEFT))
    @printf("action: %s, observation: %s\n", a, o)
    global rsum += r
end
@show rsum;

belief that tiger is behind left door: 50.0%
action: 0, observation: false
belief that tiger is behind left door: 15.0%
action: 0, observation: false
belief that tiger is behind left door:  3.0%
action: 0, observation: false
belief that tiger is behind left door:  0.5%
action: 0, observation: false
belief that tiger is behind left door:  0.1%
action: 0, observation: false
rsum = -5.0


Since each step returned by `eachstep` is a [`NamedTuple`](https://docs.julialang.org/en/v1/manual/types/index.html#Named-Tuple-Types-1), the elements in it can also be accessed as fields:

In [12]:
rsum = 0.0
for step in eachstep(history)
    @printf("belief that tiger is behind left door: %4.1f%%\n",
            100*pdf(step.b, TIGER_LEFT))
    @printf("action: %s, observation: %s\n", step.a, step.o)
    global rsum += step.r
end
@show rsum;

belief that tiger is behind left door: 50.0%
action: 0, observation: false
belief that tiger is behind left door: 15.0%
action: 0, observation: false
belief that tiger is behind left door:  3.0%
action: 0, observation: false
belief that tiger is behind left door:  0.5%
action: 0, observation: false
belief that tiger is behind left door:  0.1%
action: 0, observation: false
rsum = -5.0
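The same iterator can also yield the `(s, a, r, sp)` steps mentioned earlier. The sketch below assumes that `eachstep` accepts `"s,a,r,sp"` as a spec string, by analogy with the `"b,a,o,r"` spec used above, and that the resulting iterator can be collected into a vector.

```julia
using POMDPs, POMDPModels, POMDPPolicies, POMDPSimulators, BeliefUpdaters

tiger = TigerPOMDP()
listen = FunctionPolicy(b -> TIGER_LISTEN)
history = simulate(HistoryRecorder(max_steps=5), tiger, listen, DiscreteUpdater(tiger))

# Each element holds the state, action, reward, and next state of one step.
steps = collect(eachstep(history, "s,a,r,sp"))

for (t, (s, a, r, sp)) in enumerate(steps)
    println("step $t: s = $s, a = $a, r = $r, sp = $sp")
end

@show length(steps) n_steps(history)
```

Because every element already contains both the starting and the ending state, no index arithmetic is needed to line up the histories.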


## Parallel Simulations

It is often useful to evaluate a policy by running many simulations.
The parallel simulator is the most effective tool for this.
To use the parallel simulator, one should first create a list of `Sim` objects, each of which contains all of the information needed to run a simulation, and then run the simulations using `run_parallel`, which will return a `DataFrame` with the results.

Details can be found in the [POMDPSimulators package documentation](https://juliapomdp.github.io/POMDPSimulators.jl/stable/parallel/).

In [13]:
pomdp = BabyPOMDP()
fwc = FeedWhenCrying()
rnd = solve(RandomSolver(MersenneTwister(7)), pomdp)

q = [] # vector of the simulations to be run
push!(q, Sim(pomdp, fwc, max_steps=32, rng=MersenneTwister(4), metadata=Dict(:policy=>"feed when crying")))
push!(q, Sim(pomdp, rnd, max_steps=32, rng=MersenneTwister(4), metadata=Dict(:policy=>"random")))

# this creates two simulations, one with the feed-when-crying policy and one with a random policy

data = run_parallel(q)

│ 
│ 
│ To use multiple processes, use addprocs() or the -p option (e.g. julia -p 4) and make sure the correct worker pool is assigned to argument `pool` in the call to run_parallel.
└ @ POMDPSimulators /home/zach/.julia/packages/POMDPSimulators/Ipuzk/src/parallel.jl:123
Simulating...100%|██████████████████████████████████████| Time: 0:00:01


| | policy | reward |
|---|---|---|
| | String⍰ | Float64⍰ |
| 1 | feed when crying | -4.5874 |
| 2 | random | -27.4139 |


By default, the parallel simulator only returns the reward from each simulation, but more information can be gathered by specifying a function to analyze the `Sim`-history pair and record additional statistics.

In [15]:
# to perform additional analysis on each of the simulations one can define a processing function with the `do` syntax:
data2 = run_parallel(q, show_progress=false) do sim, hist
    println("finished a simulation - final state was $(last(state_hist(hist)))")
    return [:steps=>n_steps(hist), :reward=>discounted_reward(hist)]
end

finished a simulation - final state was false




finished a simulation - final state was false


| | policy | reward | steps |
|---|---|---|---|
| | String⍰ | Float64⍰ | Float64⍰ |
| 1 | feed when crying | -18.2874 | 32.0 |
| 2 | random | -17.7054 | 32.0 |

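When evaluating a policy, the queue usually holds many simulations of the same policy with different random seeds, and the resulting rewards are then summarized. The following is a rough sketch of that pattern. It assumes that the returned `DataFrame` supports property access to the `reward` column (`data.reward`), which may depend on the installed DataFrames version; the number of simulations and the seeds are arbitrary choices.

```julia
using POMDPs, POMDPModels, POMDPPolicies, POMDPSimulators
using Random, Statistics

pomdp = BabyPOMDP()
fwc = FeedWhenCrying()

# Queue 20 simulations of the same policy, each with its own random seed.
q = []
for seed in 1:20
    push!(q, Sim(pomdp, fwc, max_steps=32, rng=MersenneTwister(seed),
                 metadata=Dict(:policy=>"feed when crying")))
end

data = run_parallel(q, show_progress=false)

# Summarize the discounted rewards across all runs.
@show mean(data.reward) std(data.reward)
```

Giving each `Sim` its own `rng` keeps the individual runs reproducible regardless of the order in which they are executed.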

## `sim()` [This tutorial is a work in progress]

The [`sim` function](https://juliapomdp.github.io/POMDPSimulators.jl/latest/sim) provides a convenient way to interact with a `POMDP` or `MDP` environment from the perspective of an agent acting in that environment.

TODO: Add Example
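As a minimal sketch of the pattern described above, and not a definitive reference, the snippet below assumes that `sim` accepts a function of the observation through Julia's `do` syntax, takes a `max_steps` keyword argument, and returns a history that works with `discounted_reward` and `n_steps`. The always-listen behavior is chosen here only for illustration.

```julia
using POMDPs, POMDPModels, POMDPSimulators

tiger = TigerPOMDP()

# The do-block plays the role of the agent: it receives an observation
# and returns the action to take next.
history = sim(tiger, max_steps=5) do obs
    println("observation: $obs")
    return TIGER_LISTEN
end

@show n_steps(history)
@show discounted_reward(history)
```

Any logic that maps observations to actions, including reading input from a user, could be placed inside the `do` block.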