In [1]:
using MCTS
using POMDPModels # for GridWorld

# Choosing Actions For Progressive Widening

When using double progressive widening (DPW), we often want the planner to consider certain actions first. We can accomplish this by defining a subtype of `ActionGenerator` and implementing a new method of `next_action` for it.
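For context, progressive widening limits how many actions a state node may hold as a function of how often it has been visited. The sketch below shows one common form of that criterion, in which a new action is requested while the number of children is at most `k * N^alpha`. The function name `should_widen` and the parameter values are hypothetical and are not part of MCTS.jl; the solver's exact rule may differ.

```julia
# Hypothetical helper, not part of MCTS.jl: decide whether a node may add an action.
# n_children: actions already in the node; n_visits: number of visits to the node.
# k and alpha are the widening parameters (the defaults here are illustrative only).
function should_widen(n_children::Integer, n_visits::Integer; k::Real=1.0, alpha::Real=0.5)
    return n_children <= k * n_visits^alpha
end

should_widen(0, 0)   # true: an unvisited node can always take its first action
should_widen(3, 4)   # false: 3 > 1.0 * 4^0.5 = 2
should_widen(2, 4)   # true: 2 <= 2
```

Under this kind of rule, a node that is visited only a few times holds only the first few actions the generator returns, which is why the order in which `next_action` proposes actions matters.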

## Trivial Example

Let's begin with a trivial example: suppose that we only want to consider "up" actions in the grid world problem. We define the `OnlyUp` action generator to accomplish this.

In [2]:
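# An action generator with no fields; it always proposes the same action.
type OnlyUp <: ActionGenerator end

# Return the next action to add at state s; snode is the tree node for s (unused here).
function MCTS.next_action(gen::OnlyUp, mdp::GridWorld, s, snode)
    return GridWorldAction(:up)
end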

Now we can pass this generator to the DPW solver:

In [3]:
mdp = GridWorld()
solver = DPWSolver(action_generator=OnlyUp(), n_iterations=10)
policy = solve(solver, mdp)
action(policy, GridWorldState(1,1))

POMDPModels.GridWorldAction(:up)

Examining the tree below, we see that the planner only considers "up" actions.

In [4]:
TreeVisualizer(policy, GridWorldState(1,1))

## Considering Previously Explored Actions

Considering only one action, as above, is usually a terrible idea. More often, we would like to consider some important action first, and then other actions. The following example considers a special "priority" action first, and then selects random actions. To accomplish this, `next_action` has to look at the state node in the MCTS tree (here, its `A` field) to check whether the priority action has already been considered.

In [5]:
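type PriorityAct <: ActionGenerator
    priority          # the action to propose first at every state node
    rng::AbstractRNG  # source of randomness for the subsequent actions
end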

function MCTS.next_action(gen::PriorityAct, mdp::GridWorld, s, snode::DPWStateNode)
    if haskey(snode.A, gen.priority) # the priority action is already there
        return rand(gen.rng, actions(mdp, s)) # add a random action
    else
        return gen.priority
    end
end

We can test this out, setting the priority action to "move right", as follows:

In [6]:
right_priority = PriorityAct(GridWorldAction(:right), Base.GLOBAL_RNG)
mdp = GridWorld()
solver = DPWSolver(action_generator=right_priority, n_iterations=5)
policy = solve(solver, mdp)
action(policy, GridWorldState(1,1))

POMDPModels.GridWorldAction(:up)

When we examine the tree below, we see that the "move right" priority action is always considered, and that other actions appear only in nodes that have been visited several times.

In [7]:
TreeVisualizer(policy, GridWorldState(1,1))
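
The priority-first logic can also be tried in isolation, without building an MCTS tree. The following is a minimal sketch assuming Julia 1.x; `MockNode` and `priority_first` are hypothetical stand-ins for the solver's state node and for `next_action`, and plain symbols stand in for `GridWorldAction` values.

```julia
using Random

# Hypothetical stand-in for a DPW state node: A maps each tried action to its data.
struct MockNode
    A::Dict{Symbol,Int}
end

# Return the priority action until the node contains it, then sample at random.
function priority_first(rng::AbstractRNG, priority::Symbol, all_actions, node::MockNode)
    if haskey(node.A, priority)
        return rand(rng, all_actions)
    else
        return priority
    end
end

rng = MersenneTwister(1)
all_actions = [:up, :down, :left, :right]
node = MockNode(Dict{Symbol,Int}())

first_choice = priority_first(rng, :right, all_actions, node)  # :right, since the node is empty
node.A[first_choice] = 0                                       # record it, as the tree would
later_choice = priority_first(rng, :right, all_actions, node)  # some element of all_actions
```

Because the random draw is taken over all actions, it can return the priority action again; excluding actions already in the node is one possible refinement.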