Solutions
Solution objects represent the solution computed by a planning algorithm, and agents use them to decide how to behave in an uncertain environment with limited resource availability. For example, a solution can be a policy that describes which action to execute in each environment state. Other examples are collections of policies and finite-state controllers. The toolbox provides generic data structures that represent such solutions. Below we discuss them in more detail for both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs).
- Set of MDP policies
- Deterministic MDP policy
- Stochastic MDP policy
- Set of POMDP policies
- Deterministic vector-based POMDP policy
- Deterministic finite-state controller for POMDPs
- Stochastic finite-state controller for POMDPs
The solution corresponding to an agent is defined by an MDPSolutionFinite object, which provides a getPolicy() method that returns a policy that the agent should execute. We can distinguish two types of solutions, for which we provide an overview below.

Class: directly implemented by policies (see next section)
If the agent has one single policy to execute, then the MDPSolutionFinite object can be seen as a wrapper around the policy, and the call to getPolicy() immediately returns the policy.
Class: solutions.MDPPolicyFiniteSet
The MDPPolicyFiniteSet class represents a solution in which an agent has a set of policies with corresponding probabilities. When getPolicy() is called, the solution object samples a policy from this distribution and returns it. Each policy in the set should be represented by an MDPPolicyFinite object, which we discuss below.
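The sampling behaviour described above can be sketched as follows. This is a minimal illustration, not the toolbox's implementation: `SketchPolicy` and `SketchPolicySet` are hypothetical stand-ins, and storing the probabilities in a plain array is an assumption.

```java
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a policy type; not the toolbox's MDPPolicyFinite.
interface SketchPolicy {
    int getAction(int t, int s);
}

// Sketch of a solution that holds several policies with associated probabilities.
class SketchPolicySet {
    private final List<SketchPolicy> policies;
    private final double[] probabilities; // assumed to be non-negative and to sum to 1
    private final Random rng;

    SketchPolicySet(List<SketchPolicy> policies, double[] probabilities, Random rng) {
        if (policies.isEmpty() || policies.size() != probabilities.length) {
            throw new IllegalArgumentException("need one probability per policy");
        }
        this.policies = policies;
        this.probabilities = probabilities;
        this.rng = rng;
    }

    // Samples one policy according to the stored distribution.
    SketchPolicy getPolicy() {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        int lastPositive = 0;
        for (int i = 0; i < probabilities.length; i++) {
            if (probabilities[i] > 0.0) {
                lastPositive = i;
            }
            cumulative += probabilities[i];
            if (u < cumulative) {
                return policies.get(i);
            }
        }
        // Guard against rounding error in the cumulative sum.
        return policies.get(lastPositive);
    }
}

class PolicySetDemo {
    public static void main(String[] args) {
        SketchPolicy alwaysZero = (t, s) -> 0;
        SketchPolicy alwaysOne = (t, s) -> 1;
        SketchPolicySet solution = new SketchPolicySet(
                List.of(alwaysZero, alwaysOne), new double[] {0.3, 0.7}, new Random(42));

        // Draw a policy from the set, then query it like any other policy.
        SketchPolicy policy = solution.getPolicy();
        System.out.println("action at t=0, s=0: " + policy.getAction(0, 0));
    }
}
```

Because every call to `getPolicy()` draws a fresh sample, a caller that wants to follow one policy for a whole run would keep the returned object rather than calling `getPolicy()` again at each step.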
Policies are represented by an MDPPolicyFinite object. This is an interface that contains the method getAction(t,s), which should return the action to be executed in state s at time t. Currently there are two implementations of the policy interface available, which we discuss below. The structure of the interface and the implementing classes is also visualized in the second figure.
Both implementing classes also implement the MDPSolutionFinite interface. Their getPolicy() method returns the policy object itself.


Class: solutions.MDPPolicyFiniteDet
The getAction(t,s) method returns the action to be executed in state s at time t.
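As a rough sketch of how a deterministic policy can serve as both a policy and a solution, consider the following. The names `SketchSolution`, `SketchPolicy`, and `SketchDeterministicPolicy` are hypothetical, and the two-dimensional action table is an assumed representation; the toolbox's own classes may store the policy differently.

```java
// Hypothetical stand-ins that mirror the relationship described above;
// the toolbox's own interfaces may declare additional methods.
interface SketchSolution {
    SketchPolicy getPolicy();
}

interface SketchPolicy {
    int getAction(int t, int s);
}

// A deterministic finite-horizon policy stored as a table indexed by time and state.
class SketchDeterministicPolicy implements SketchPolicy, SketchSolution {
    private final int[][] actionTable; // actionTable[t][s] = action (assumed layout)

    SketchDeterministicPolicy(int[][] actionTable) {
        this.actionTable = actionTable;
    }

    @Override
    public int getAction(int t, int s) {
        return actionTable[t][s];
    }

    // A single policy acts as its own solution, so it returns itself.
    @Override
    public SketchPolicy getPolicy() {
        return this;
    }
}

class DeterministicPolicyDemo {
    public static void main(String[] args) {
        // Horizon 2, three states; the action indices are arbitrary illustration values.
        int[][] table = {
            {0, 1, 1},
            {2, 0, 1}
        };
        SketchSolution solution = new SketchDeterministicPolicy(table);
        SketchPolicy policy = solution.getPolicy();
        System.out.println(policy.getAction(1, 0)); // prints 2
    }
}
```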
Class: solutions.MDPPolicyFiniteStochastic
The getAction(t,s) method samples an action from the distribution represented by the stochastic policy, and it returns this action. Calling getAction(t,s) multiple times for the same t and s may give different actions due to the stochastic nature of the policy.
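The sampling step can be illustrated with a small sketch. `SketchStochasticPolicy` is a hypothetical class, and the three-dimensional probability array is an assumption made for this example rather than the data structure used by the toolbox.

```java
import java.util.Random;

// Sketch of a stochastic finite-horizon policy; the array layout is an assumption.
class SketchStochasticPolicy {
    private final double[][][] actionProbs; // actionProbs[t][s][a] = P(a | t, s)
    private final Random rng;

    SketchStochasticPolicy(double[][][] actionProbs, Random rng) {
        this.actionProbs = actionProbs;
        this.rng = rng;
    }

    // Samples an action from the distribution stored for time t and state s.
    int getAction(int t, int s) {
        double[] dist = actionProbs[t][s];
        double u = rng.nextDouble();
        double cumulative = 0.0;
        int lastPositive = 0;
        for (int a = 0; a < dist.length; a++) {
            if (dist[a] > 0.0) {
                lastPositive = a;
            }
            cumulative += dist[a];
            if (u < cumulative) {
                return a;
            }
        }
        // Guard against rounding error in the cumulative sum.
        return lastPositive;
    }
}

class StochasticPolicyDemo {
    public static void main(String[] args) {
        // One time step, one state, two actions chosen with probability 0.25 and 0.75.
        double[][][] probs = {{{0.25, 0.75}}};
        SketchStochasticPolicy policy = new SketchStochasticPolicy(probs, new Random(0));

        int countActionOne = 0;
        int samples = 10000;
        for (int i = 0; i < samples; i++) {
            countActionOne += policy.getAction(0, 0);
        }
        // Repeated calls with the same t and s give a mix of both actions.
        System.out.println("fraction of action 1: " + (double) countActionOne / samples);
    }
}
```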
The solution corresponding to an agent is defined by a POMDPSolutionFinite object, which provides a getPolicy() method that returns a policy that the agent should execute. We can distinguish two types of solutions, for which we provide an overview below.
Class: directly implemented by policies (see next section)
If the agent has one single policy to execute, then the POMDPSolutionFinite object can be seen as a wrapper around the policy, and the call to getPolicy() immediately returns the policy.
Class: solutions.POMDPPolicyFiniteSet
The POMDPPolicyFiniteSet class represents a solution in which an agent has a set of policies with corresponding probabilities. When getPolicy() is called, the solution object samples a policy from this distribution and returns it. Each policy in the set should be represented by a POMDPPolicyFinite object, which we discuss below.
Policies are represented by a POMDPPolicyFinite object. This is an interface that contains multiple methods:
- getAction(b,t): returns the action to be executed in belief b at time t
- update(a,o): updates the data structure representing the policy, depending on the action executed by the agent and the observation received (e.g., transition to a new state in a finite-state controller)
- reset(): resets the data structures that represent the policy (e.g., sets the state of a finite-state controller to its initial state)
Currently there are three implementations of the policy interface available, which we discuss below. The structure of the interface and the implementing classes is also visualized in the figure.
All implementing classes also implement the POMDPSolutionFinite interface. Their getPolicy() method returns the policy object itself.
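To show how the three methods might fit together during execution, here is a hedged sketch. `SketchPomdpPolicy` and `EchoPolicy` are hypothetical, the belief is assumed to be a probability vector stored in a `double[]`, and the call order (reset once, then getAction followed by update at every step) is an assumption about typical usage rather than something the toolbox prescribes.

```java
import java.util.Arrays;

// Hypothetical stand-in for the policy interface described above.
// A belief is assumed here to be a probability vector over states.
interface SketchPomdpPolicy {
    int getAction(double[] belief, int t);
    void update(int action, int observation);
    void reset();
}

// Toy policy with internal memory: it repeats the last observation as its action.
class EchoPolicy implements SketchPomdpPolicy {
    private int memory = 0;

    @Override
    public int getAction(double[] belief, int t) {
        return memory;
    }

    @Override
    public void update(int action, int observation) {
        memory = observation;
    }

    @Override
    public void reset() {
        memory = 0;
    }
}

class ExecutionLoopDemo {
    // Runs one episode against a fixed observation sequence and returns the chosen actions.
    static int[] run(SketchPomdpPolicy policy, int[] observations) {
        policy.reset();
        double[] belief = {0.5, 0.5}; // placeholder; belief tracking is omitted in this sketch
        int[] actions = new int[observations.length];
        for (int t = 0; t < observations.length; t++) {
            actions[t] = policy.getAction(belief, t);
            policy.update(actions[t], observations[t]);
        }
        return actions;
    }

    public static void main(String[] args) {
        int[] actions = run(new EchoPolicy(), new int[] {1, 0, 1});
        System.out.println(Arrays.toString(actions)); // prints [0, 1, 0]
    }
}
```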

Class: solutions.POMDPPolicyFiniteVector
The getAction(b,t) method returns the action to be executed in belief b at time t. The policy is represented by a set of alpha vectors for each time step.
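Action selection with alpha vectors is commonly done by picking the vector with the largest inner product with the belief and returning the action attached to it. The sketch below illustrates that idea under stated assumptions: `SketchAlphaVector` and `SketchVectorPolicy` are hypothetical classes, each vector is assumed to carry one action, and values are assumed to be maximized. The toolbox's own class may differ in these details.

```java
// Sketch: an alpha vector paired with the action it recommends (assumed pairing).
class SketchAlphaVector {
    final double[] values; // one entry per state
    final int action;

    SketchAlphaVector(double[] values, int action) {
        this.values = values;
        this.action = action;
    }

    // Inner product of this vector with a belief over states.
    double dot(double[] belief) {
        double sum = 0.0;
        for (int s = 0; s < values.length; s++) {
            sum += values[s] * belief[s];
        }
        return sum;
    }
}

class SketchVectorPolicy {
    private final SketchAlphaVector[][] vectorsPerStep; // vectorsPerStep[t] = vector set for time t

    SketchVectorPolicy(SketchAlphaVector[][] vectorsPerStep) {
        this.vectorsPerStep = vectorsPerStep;
    }

    // Returns the action of the vector that maximizes the inner product with the belief.
    int getAction(double[] belief, int t) {
        SketchAlphaVector best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (SketchAlphaVector v : vectorsPerStep[t]) {
            double value = v.dot(belief);
            if (value > bestValue) {
                bestValue = value;
                best = v;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no alpha vectors for time step " + t);
        }
        return best.action;
    }
}

class VectorPolicyDemo {
    public static void main(String[] args) {
        // Two states, one time step; the numbers are arbitrary illustration values.
        SketchAlphaVector[][] vectors = {{
            new SketchAlphaVector(new double[] {1.0, 0.0}, 0),
            new SketchAlphaVector(new double[] {0.0, 1.0}, 1)
        }};
        SketchVectorPolicy policy = new SketchVectorPolicy(vectors);
        System.out.println(policy.getAction(new double[] {0.8, 0.2}, 0)); // prints 0
        System.out.println(policy.getAction(new double[] {0.3, 0.7}, 0)); // prints 1
    }
}
```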
Class: solutions.POMDPPolicyFiniteGraph
The getAction(b,t) method returns the action to be executed based on the current state of the finite-state controller. The update(a,o) method implements the transition of controller states, and reset() sets the current state to the initial state of the controller.
For more details about this policy representation we refer to: Walraven, E., & Spaan, M. T. J. (2018). Column Generation Algorithms for Constrained POMDPs. Journal of Artificial Intelligence Research, 62, 489–533.
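A deterministic finite-state controller of this kind can be sketched in a few lines. `SketchGraphPolicy` is a hypothetical class, and the array-based encoding, with the successor node depending only on the current node and the observation, is an assumption made for illustration.

```java
// Sketch of a deterministic finite-state controller; the encoding is an assumption.
class SketchGraphPolicy {
    private final int[] nodeAction; // nodeAction[q] = action chosen in node q
    private final int[][] nextNode; // nextNode[q][o] = successor of node q after observation o
    private final int initialNode;
    private int currentNode;

    SketchGraphPolicy(int[] nodeAction, int[][] nextNode, int initialNode) {
        this.nodeAction = nodeAction;
        this.nextNode = nextNode;
        this.initialNode = initialNode;
        this.currentNode = initialNode;
    }

    // The action depends only on the current controller node; the belief is not used.
    int getAction(double[] belief, int t) {
        return nodeAction[currentNode];
    }

    // Moves to the successor node selected by the received observation.
    void update(int action, int observation) {
        currentNode = nextNode[currentNode][observation];
    }

    // Returns the controller to its initial node.
    void reset() {
        currentNode = initialNode;
    }
}

class GraphPolicyDemo {
    public static void main(String[] args) {
        // Two nodes: observation 1 switches nodes, observation 0 stays in place.
        int[] nodeAction = {0, 1};
        int[][] nextNode = {{0, 1}, {1, 0}};
        SketchGraphPolicy policy = new SketchGraphPolicy(nodeAction, nextNode, 0);

        int[] observations = {1, 1, 0};
        policy.reset();
        for (int t = 0; t < observations.length; t++) {
            int action = policy.getAction(null, t); // belief is ignored by this controller
            System.out.println("t=" + t + " action=" + action);
            policy.update(action, observations[t]);
        }
        // Actions chosen: 0, 1, 0
    }
}
```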
Class: solutions.POMDPPolicyFiniteFSC
The getAction(b,t) method returns the action to be executed based on the current state of the finite-state controller. The update(a,o) method implements the transition of controller states, and reset() sets the current state to the initial state of the controller.
For more details about this policy representation we refer to: Poupart, P., Malhotra, A., Pei, P., Kim, K.E., Goh, B., & Bowling, M. (2015). Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (pp. 3342–3348).
The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.