
Solutions

Erwin Walraven edited this page Nov 12, 2018 · 20 revisions

Solution objects represent the solution computed by a planning algorithm. Agents use them to decide how to behave in an uncertain environment with limited resource availability. For example, a solution can be a policy that specifies which action to execute in each environment state. Other examples are collections of policies and finite-state controllers. The toolbox provides generic data structures that represent such solutions, and below we discuss them in more detail for both Markov Decision Processes and Partially Observable Markov Decision Processes.

Solutions for Markov Decision Processes

The solution of an agent is defined by an MDPSolutionFinite object. Its getPolicy() method returns the policy that the agent should execute.

Solutions

  • individual policy
  • set of policies

Policies

Policies are represented by an MDPPolicyFinite object. This is an interface that contains the method getAction(t,s), which should return the action to be executed in state s at time t. Currently there are two implementations of the policy interface available, which we discuss below.
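To illustrate how an agent might query such a policy during execution, here is a minimal sketch. The interface PolicySketch is a hypothetical stand-in for MDPPolicyFinite, and the int types for time, state and action are assumptions; the toolbox's actual signatures may differ. PolicyRollout and its transition function are likewise invented for illustration and are not part of the toolbox.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical stand-in for the toolbox's MDPPolicyFinite interface.
// Assumption: time steps, states and actions are identified by ints.
interface PolicySketch {
    int getAction(int t, int s);
}

// Illustrative helper (not part of the toolbox): executes a policy step by step.
final class PolicyRollout {
    // Runs the policy for `horizon` steps starting in `initialState`.
    // `transition` maps (state, action) to the next state; real MDP dynamics
    // are stochastic, a deterministic function is used here for brevity.
    static int[] run(PolicySketch policy, IntBinaryOperator transition,
                     int initialState, int horizon) {
        int[] actions = new int[horizon];
        int state = initialState;
        for (int t = 0; t < horizon; t++) {
            int action = policy.getAction(t, state); // query the policy for (t, s)
            actions[t] = action;
            state = transition.applyAsInt(state, action);
        }
        return actions;
    }
}
```

The loop only depends on getAction(t,s), so the same code would work with either of the implementations discussed below.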

TODO: fig with implementing classes

Deterministic policy

Class: solutions.MDPPolicyFiniteDet

The getAction(t,s) method returns the single action that the policy prescribes for state s at time t. Repeated calls with the same t and s always return the same action.
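One way to picture a deterministic finite-horizon policy is as a lookup table indexed by time and state. The sketch below assumes int identifiers and a two-dimensional array as storage; DeterministicPolicySketch is a hypothetical class, and the internals of solutions.MDPPolicyFiniteDet may be organized differently.

```java
// Hypothetical illustration of a deterministic policy, not the toolbox class.
final class DeterministicPolicySketch {
    // actionTable[t][s] holds the action to execute in state s at time t.
    private final int[][] actionTable;

    DeterministicPolicySketch(int[][] actionTable) {
        // Defensive copy so later changes to the caller's array do not alter the policy.
        this.actionTable = new int[actionTable.length][];
        for (int t = 0; t < actionTable.length; t++) {
            this.actionTable[t] = actionTable[t].clone();
        }
    }

    // Plain table lookup: the same (t, s) always yields the same action.
    int getAction(int t, int s) {
        return actionTable[t][s];
    }
}
```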

Stochastic policy

Class: solutions.MDPPolicyFiniteStochastic

The getAction(t,s) method samples an action from the distribution represented by the stochastic policy, and it returns this action. Calling getAction(t,s) multiple times for the same t and s may give different actions due to the stochastic nature of the policy.
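The sampling step can be sketched as follows, assuming the policy stores one probability distribution over actions for every pair of time step and state. StochasticPolicySketch, the three-dimensional array layout and the injected java.util.Random are assumptions made for illustration; solutions.MDPPolicyFiniteStochastic may store and sample its distributions differently.

```java
import java.util.Random;

// Hypothetical illustration of a stochastic policy, not the toolbox class.
final class StochasticPolicySketch {
    // distribution[t][s][a] is the probability of choosing action a in state s at time t.
    // Assumption: each distribution[t][s] is non-negative and sums to 1.
    private final double[][][] distribution;
    private final Random rng;

    StochasticPolicySketch(double[][][] distribution, Random rng) {
        this.distribution = distribution;
        this.rng = rng;
    }

    // Samples an action by walking the cumulative distribution until it exceeds u.
    int getAction(int t, int s) {
        double[] probabilities = distribution[t][s];
        double u = rng.nextDouble(); // uniform in [0, 1)
        double cumulative = 0.0;
        for (int a = 0; a < probabilities.length; a++) {
            cumulative += probabilities[a];
            if (u < cumulative) {
                return a;
            }
        }
        // Fallback for floating-point rounding when the probabilities sum to slightly less than 1.
        return probabilities.length - 1;
    }
}
```

Passing in a seeded Random makes a run reproducible, which can help when debugging simulations that involve stochastic policies.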

Solutions for Partially Observable Markov Decision Processes

  • vector-based policy
  • deterministic policy graph
  • stochastic finite-state controller
  • set of policies
