MDP and POMDP data structures

The Java source code of the toolbox provides several classes for modeling MDPs and POMDPs. In both cases, additional cost functions can be defined and then used in constraints. The figure illustrates the inheritance structure of the classes, which we discuss further below.

Figure: Inheritance structure of classes (POMDP and CMDP extend MDP; CPOMDP extends POMDP).

MDP

The MDP class represents a basic finite-horizon Markov Decision Process.

Attributes

nStates: number of states
nActions: number of actions
initialState: initial state
nDecisions: number of decisions, defining the horizon
feasibleActions: feasibleActions[t][s] is an array containing the actions that are feasible to execute in state s at time t
rewardsDefined: true if rewards have been set, false otherwise
hasTimeDependentReward: true if rewards are time dependent, false otherwise
rewardFunction: rewardFunction[s][a] is the reward collected when executing action a in state s. This function is only defined if hasTimeDependentReward equals false.
timeRewardFunction: timeRewardFunction[t][s][a] is the reward collected when executing action a in state s at time t. This function is only defined if hasTimeDependentReward equals true.
minReward: minimum reward in the model, computed when the reward function is set
maxReward: maximum reward in the model, computed when the reward function is set
transitionsDefined: true if transitions have been set, false otherwise
hasTimeDependentTransitions: true if transitions are time dependent, false otherwise
transitionDestinations: transitionDestinations[s][a] is an array containing states reachable when executing action a in state s. This function is only defined if hasTimeDependentTransitions equals false.
transitionProbabilities: transitionProbabilities[s][a] is an array containing the transition probabilities for the states reachable when executing action a in state s. These probabilities correspond to the destinations in transitionDestinations[s][a]. This function is only defined if hasTimeDependentTransitions equals false.
timeTransitionDestinations: timeTransitionDestinations[t][s][a] is an array containing the states reachable when executing action a in state s at time t. This function is only defined if hasTimeDependentTransitions equals true.
timeTransitionProbabilities: timeTransitionProbabilities[t][s][a] is an array containing the transition probabilities for the states reachable when executing action a in state s at time t. These probabilities correspond to the destinations in timeTransitionDestinations[t][s][a]. This function is only defined if hasTimeDependentTransitions equals true.
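To make the sparse transition layout concrete, the following standalone sketch stores a two-state, two-action model in the transitionDestinations/transitionProbabilities format, looks up a single transition probability, and checks that each (s, a) entry forms a probability distribution. It does not use the toolbox; the class SparseTransitionsSketch and its methods are hypothetical helpers. The time-dependent arrays add a leading [t] index, so the same code applies to timeTransitionDestinations[t] and timeTransitionProbabilities[t].

```java
import java.util.Arrays;

/**
 * Standalone sketch of the sparse transition layout:
 * destinations[s][a] lists the reachable next states and
 * probabilities[s][a][k] is the probability of reaching destinations[s][a][k].
 * Class and method names are hypothetical, not part of the toolbox.
 */
public class SparseTransitionsSketch {

    /** Returns P(sNext | s, a), or 0 if sNext is not listed as a destination of (s, a). */
    static double probability(int[][][] destinations, double[][][] probabilities,
                              int s, int a, int sNext) {
        int[] dest = destinations[s][a];
        for (int k = 0; k < dest.length; k++) {
            if (dest[k] == sNext) {
                return probabilities[s][a][k];
            }
        }
        return 0.0;
    }

    /** Checks that every (s, a) pair has matching array lengths and probabilities summing to 1. */
    static boolean isValid(int[][][] destinations, double[][][] probabilities, double tol) {
        for (int s = 0; s < destinations.length; s++) {
            for (int a = 0; a < destinations[s].length; a++) {
                if (destinations[s][a].length != probabilities[s][a].length) {
                    return false;
                }
                double sum = Arrays.stream(probabilities[s][a]).sum();
                if (Math.abs(sum - 1.0) > tol) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two states, two actions. Action 0 stays put; in state 0, action 1 reaches state 1 with probability 0.8.
        int[][][] destinations = {
            { {0}, {0, 1} },
            { {1}, {1} }
        };
        double[][][] probabilities = {
            { {1.0}, {0.2, 0.8} },
            { {1.0}, {1.0} }
        };
        System.out.println(isValid(destinations, probabilities, 1e-9));          // true
        System.out.println(probability(destinations, probabilities, 0, 1, 1));   // 0.8
    }
}
```

Storing only reachable destinations keeps memory proportional to the number of nonzero transitions rather than nStates × nActions × nStates, at the cost of a linear scan for single lookups.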

Initialization

Create an MDP object using the constructor MDP(int nStates, int nActions, int initialState, int nDecisions)
Set a reward function using the method setRewardFunction
Set a transition function using the method setTransitionFunction
Set feasible actions using the method setFeasibleActions (optional)
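After these steps the object holds everything a finite-horizon planner reads: rewards, transitions, feasible actions, and the horizon nDecisions. As an illustration of how those arrays fit together, the sketch below runs standard backward induction over plain Java arrays with the same indexing as the attributes listed above. It is a standalone sketch, not the toolbox's own solver; BackwardInductionSketch and solve are hypothetical names, and it assumes time-independent rewards and transitions.

```java
/**
 * Standalone sketch: finite-horizon backward induction over arrays laid out
 * like the MDP attributes (time-independent rewards and sparse transitions).
 * Class and method names are hypothetical; this is not the toolbox's solver.
 */
public class BackwardInductionSketch {

    /**
     * Returns V, where V[t][s] is the maximum expected reward collected from
     * decision t up to the horizon nDecisions when starting in state s.
     * Assumes every state has at least one feasible action at every time step.
     */
    static double[][] solve(int nStates, int nDecisions,
                            double[][] reward,           // reward[s][a]
                            int[][][] destinations,      // destinations[s][a]
                            double[][][] probabilities,  // probabilities[s][a]
                            int[][][] feasibleActions) { // feasibleActions[t][s]
        double[][] V = new double[nDecisions + 1][nStates]; // V[nDecisions][s] = 0
        for (int t = nDecisions - 1; t >= 0; t--) {
            for (int s = 0; s < nStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a : feasibleActions[t][s]) {
                    // Immediate reward plus expected value of the successor state.
                    double q = reward[s][a];
                    int[] dest = destinations[s][a];
                    for (int k = 0; k < dest.length; k++) {
                        q += probabilities[s][a][k] * V[t + 1][dest[k]];
                    }
                    best = Math.max(best, q);
                }
                V[t][s] = best;
            }
        }
        return V;
    }

    public static void main(String[] args) {
        int nStates = 2, nDecisions = 2;
        // State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
        // State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
        double[][] reward = { {0.0, 1.0}, {2.0, 0.0} };
        int[][][] destinations = { { {0}, {1} }, { {1}, {0} } };
        double[][][] probabilities = { { {1.0}, {1.0} }, { {1.0}, {1.0} } };
        int[][][] feasible = new int[nDecisions][nStates][];
        for (int t = 0; t < nDecisions; t++) {
            for (int s = 0; s < nStates; s++) {
                feasible[t][s] = new int[] {0, 1}; // every action feasible everywhere
            }
        }
        double[][] V = solve(nStates, nDecisions, reward, destinations, probabilities, feasible);
        System.out.println(V[0][0]); // 3.0: move to state 1, then stay there
    }
}
```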

POMDP

The POMDP class represents a basic finite-horizon Partially Observable Markov Decision Process, and it extends the MDP class.

Attributes

nObservations: number of observations
observationFunction: observationFunction[a][sNext][o] is the probability of observing o after executing action a and transitioning to state sNext.
b0: the initial belief, represented by a BeliefPoint object
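The observation function combines with the transition function in the standard Bayesian belief update, b'(s') ∝ observationFunction[a][s'][o] · Σ_s b(s) P(s' | s, a). The sketch below implements that update over plain arrays. It is a minimal sketch under assumptions: the belief is a double[] rather than a BeliefPoint (whose methods are not described here), transitions use the sparse time-independent layout, and BeliefUpdateSketch is a hypothetical name.

```java
import java.util.Locale;

/**
 * Standalone sketch of a Bayesian belief update using observationFunction[a][sNext][o]
 * and sparse transition arrays. The belief is a plain double[] standing in for
 * BeliefPoint; all names here are hypothetical.
 */
public class BeliefUpdateSketch {

    /**
     * Returns b' with b'(sNext) proportional to
     * O[a][sNext][o] * sum_s b(s) * P(sNext | s, a),
     * or null if observation o has probability zero under b and a.
     */
    static double[] update(double[] b, int a, int o,
                           int[][][] destinations, double[][][] probabilities,
                           double[][][] observationFunction) {
        int nStates = b.length;
        double[] next = new double[nStates];
        // Prediction: push belief mass through the transition function.
        for (int s = 0; s < nStates; s++) {
            int[] dest = destinations[s][a];
            for (int k = 0; k < dest.length; k++) {
                next[dest[k]] += b[s] * probabilities[s][a][k];
            }
        }
        // Correction: weight each successor by the probability of observing o there.
        double norm = 0.0;
        for (int sNext = 0; sNext < nStates; sNext++) {
            next[sNext] *= observationFunction[a][sNext][o];
            norm += next[sNext];
        }
        if (norm == 0.0) {
            return null;
        }
        for (int sNext = 0; sNext < nStates; sNext++) {
            next[sNext] /= norm;
        }
        return next;
    }

    public static void main(String[] args) {
        // One action that leaves the state unchanged; two states, two observations.
        int[][][] destinations = { { {0} }, { {1} } };
        double[][][] probabilities = { { {1.0} }, { {1.0} } };
        double[][][] observationFunction = { { {0.8, 0.2}, {0.3, 0.7} } };
        double[] b0 = {0.5, 0.5};
        double[] b1 = update(b0, 0, 0, destinations, probabilities, observationFunction);
        System.out.println(String.format(Locale.ROOT, "%.3f %.3f", b1[0], b1[1])); // 0.727 0.273
    }
}
```

The normalizer norm equals the probability of observing o after executing a from belief b, which is why a zero value signals an impossible observation.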

Initialization

Create a POMDP object using the constructor POMDP(int nStates, int nActions, int nObservations, double[][][] observationFunction, BeliefPoint b0, int nDecisions)
Set a reward function using the method setRewardFunction
Set a transition function using the method setTransitionFunction
Set feasible actions using the method setFeasibleActions (optional)
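Because the observation function and the initial belief are passed directly to the constructor, it can help to check them beforehand. The following hypothetical helper, not part of the toolbox, verifies that observationFunction has shape [nActions][nStates][nObservations], that every observationFunction[a][sNext] is a probability distribution over observations, and that an initial belief, represented here as a plain double[] instead of a BeliefPoint, sums to one.

```java
/**
 * Standalone sketch that checks the shape and normalization of POMDP
 * constructor inputs: observationFunction[a][sNext][o] and an initial
 * belief stored as a plain double[] (a stand-in for BeliefPoint).
 * Names are hypothetical, not the toolbox's API.
 */
public class PomdpInputCheckSketch {

    /** True if all entries are non-negative and sum to 1 within tol. */
    static boolean sumsToOne(double[] p, double tol) {
        double sum = 0.0;
        for (double x : p) {
            if (x < 0.0) {
                return false; // probabilities must be non-negative
            }
            sum += x;
        }
        return Math.abs(sum - 1.0) <= tol;
    }

    /** True if obs is [nActions][nStates][nObservations] and each obs[a][sNext] is a distribution. */
    static boolean validObservationFunction(double[][][] obs, int nStates, int nActions,
                                            int nObservations, double tol) {
        if (obs.length != nActions) {
            return false;
        }
        for (int a = 0; a < nActions; a++) {
            if (obs[a].length != nStates) {
                return false;
            }
            for (int sNext = 0; sNext < nStates; sNext++) {
                if (obs[a][sNext].length != nObservations || !sumsToOne(obs[a][sNext], tol)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][][] obs = { { {0.8, 0.2}, {0.3, 0.7} } }; // 1 action, 2 states, 2 observations
        double[] b0 = {0.5, 0.5};
        System.out.println(validObservationFunction(obs, 2, 1, 2, 1e-9)); // true
        System.out.println(b0.length == 2 && sumsToOne(b0, 1e-9));        // true
    }
}
```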

CMDP

The CMDP class represents a basic finite-horizon Markov Decision Process with additional cost functions, and it extends the MDP class.

Attributes

costFunctions: a list of cost functions. Each cost function is an array costFunction such that costFunction[s][a] is the cost of executing action a in state s.
minCost: array with minimum cost for each cost function, computed upon initialization
maxCost: array with maximum cost for each cost function, computed upon initialization
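The sketch below shows one plausible way the per-function bounds minCost and maxCost could be derived from costFunctions. It is an assumption, not the toolbox's code: it takes the minimum and maximum over all (s, a) pairs, and CostBoundsSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Standalone sketch of computing minCost and maxCost for a list of cost
 * functions costFunction[s][a]. Bounds are taken over all (s, a) pairs;
 * whether the toolbox restricts them to feasible actions is an open assumption.
 * Names are hypothetical.
 */
public class CostBoundsSketch {

    /** Returns {minCost, maxCost}, each indexed by the position k in costFunctions. */
    static double[][] bounds(List<double[][]> costFunctions) {
        int nCosts = costFunctions.size();
        double[] minCost = new double[nCosts];
        double[] maxCost = new double[nCosts];
        for (int k = 0; k < nCosts; k++) {
            double lo = Double.POSITIVE_INFINITY;
            double hi = Double.NEGATIVE_INFINITY;
            for (double[] costsInState : costFunctions.get(k)) {
                for (double c : costsInState) {
                    lo = Math.min(lo, c);
                    hi = Math.max(hi, c);
                }
            }
            minCost[k] = lo;
            maxCost[k] = hi;
        }
        return new double[][] {minCost, maxCost};
    }

    public static void main(String[] args) {
        List<double[][]> costFunctions = List.of(
            new double[][] { {0.0, 1.0}, {2.0, 0.0} },
            new double[][] { {5.0, 3.0}, {4.0, 6.0} });
        double[][] b = bounds(costFunctions);
        System.out.println(Arrays.toString(b[0])); // [0.0, 3.0]
        System.out.println(Arrays.toString(b[1])); // [2.0, 6.0]
    }
}
```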

Initialization

Create a CMDP object using the constructor CMDP(int nStates, int nActions, List<double[][]> costFunctions, int initialState, int nDecisions)
Set a reward function using the method setRewardFunction
Set a transition function using the method setTransitionFunction
Set feasible actions using the method setFeasibleActions (optional)
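Cost functions typically enter a constraint of the form "expected total cost ≤ budget". As a hedged illustration of what such a constraint measures, the following standalone sketch computes the expected cumulative cost of a fixed deterministic policy policy[t][s] over nDecisions steps from initialState, by propagating the state distribution forward through sparse transitions. The policy representation, the budget value, and the name ExpectedCostSketch are hypothetical; the toolbox's constraint formulation may differ.

```java
/**
 * Standalone sketch: expected cumulative cost of a deterministic policy
 * policy[t][s] over a finite horizon, using cost[s][a] and sparse transitions.
 * All names are hypothetical, not the toolbox's API.
 */
public class ExpectedCostSketch {

    static double expectedCost(int nStates, int initialState, int nDecisions,
                               double[][] cost, int[][] policy,
                               int[][][] destinations, double[][][] probabilities) {
        double[] dist = new double[nStates]; // probability of being in each state at time t
        dist[initialState] = 1.0;
        double total = 0.0;
        for (int t = 0; t < nDecisions; t++) {
            double[] next = new double[nStates];
            for (int s = 0; s < nStates; s++) {
                if (dist[s] == 0.0) {
                    continue;
                }
                int a = policy[t][s];
                total += dist[s] * cost[s][a]; // expected cost incurred at time t
                int[] dest = destinations[s][a];
                for (int k = 0; k < dest.length; k++) {
                    next[dest[k]] += dist[s] * probabilities[s][a][k];
                }
            }
            dist = next;
        }
        return total;
    }

    public static void main(String[] args) {
        // In state 0, action 1 costs 1 and reaches state 1 with probability 0.5; state 1 is free and absorbing.
        double[][] cost = { {0.0, 1.0}, {0.0, 0.0} };
        int[][][] destinations = { { {0}, {0, 1} }, { {1}, {1} } };
        double[][][] probabilities = { { {1.0}, {0.5, 0.5} }, { {1.0}, {1.0} } };
        int[][] policy = { {1, 1}, {1, 1}, {1, 1} }; // always take action 1
        double c = expectedCost(2, 0, 3, cost, policy, destinations, probabilities);
        double budget = 2.0; // hypothetical constraint limit
        System.out.println(c);            // 1.75
        System.out.println(c <= budget);  // true
    }
}
```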

CPOMDP

The CPOMDP class represents a basic finite-horizon Partially Observable Markov Decision Process with additional cost functions, and it extends the POMDP class.

Attributes

costFunctions: a list of cost functions. Each cost function is an array costFunction such that costFunction[s][a] is the cost of executing action a in state s.
minCost: array with minimum cost for each cost function, computed upon initialization
maxCost: array with maximum cost for each cost function, computed upon initialization
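In the partially observable case the state is not known, so the immediate cost of an action is an expectation over the belief: Σ_s b(s) · costFunction[s][a]. The minimal sketch below evaluates this for every cost function in the list; the belief is a plain double[] standing in for BeliefPoint, and BeliefCostSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Standalone sketch: expected immediate cost of action a under belief b,
 * for each cost function costFunction[s][a]. Names are hypothetical.
 */
public class BeliefCostSketch {

    /** Returns out[k] = sum_s b[s] * costFunctions.get(k)[s][a]. */
    static double[] expectedCosts(double[] b, int a, List<double[][]> costFunctions) {
        double[] out = new double[costFunctions.size()];
        for (int k = 0; k < out.length; k++) {
            double[][] c = costFunctions.get(k);
            for (int s = 0; s < b.length; s++) {
                out[k] += b[s] * c[s][a];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] b = {0.25, 0.75};
        List<double[][]> costFunctions = List.of(
            new double[][] { {0.0, 4.0}, {0.0, 8.0} },
            new double[][] { {0.0, 1.0}, {0.0, 0.0} });
        System.out.println(Arrays.toString(expectedCosts(b, 1, costFunctions))); // [7.0, 0.25]
    }
}
```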

Initialization

Create a CPOMDP object using the constructor CPOMDP(int nStates, int nActions, int nObservations, List<double[][]> costFunctions, double[][][] observationFunction, BeliefPoint b0, int nDecisions)
Set a reward function using the method setRewardFunction
Set a transition function using the method setTransitionFunction
Set feasible actions using the method setFeasibleActions (optional)

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.