MDP and POMDP data structures

The Java source code of the toolbox provides several classes for modeling MDPs and POMDPs. Both model types can be extended with additional cost functions, which can be used in constraints. The figure illustrates the inheritance structure of these classes, which are discussed in more detail below.

Inheritance structure of classes

MDP

The MDP class represents a basic finite-horizon Markov Decision Process.

Attributes

nStates: number of states
nActions: number of actions
initialState: initial state
nDecisions: number of decisions, defining the horizon
feasibleActions: feasibleActions[t][s] is an array containing the actions that are feasible to execute in state s at time t
rewardsDefined: true if rewards have been set, false otherwise
hasTimeDependentReward: true if rewards are time dependent, false otherwise
rewardFunction: rewardFunction[s][a] is the reward collected when executing action a in state s. This function is only defined if hasTimeDependentReward equals false.
timeRewardFunction: timeRewardFunction[t][s][a] is the reward collected when executing action a in state s at time t. This function is only defined if hasTimeDependentReward equals true.
minReward: minimum reward in the model, computed when the reward function is set
maxReward: maximum reward in the model, computed when the reward function is set
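
To make the relationship between the reward attributes concrete, the following is a minimal sketch and not the toolbox's own code. The class name RewardModelSketch, the setter setTimeRewardFunction, the accessor getReward, and all parameter types are assumptions made for illustration. The sketch only shows how hasTimeDependentReward could select between the two reward tables, and how minReward and maxReward could be derived when a reward function is set.

```java
// Hypothetical stand-in, not the toolbox's MDP class. It mirrors only the
// reward-related attributes listed above; all signatures are assumptions.
class RewardModelSketch {
    boolean rewardsDefined = false;
    boolean hasTimeDependentReward = false;
    double[][] rewardFunction;        // [s][a], used when hasTimeDependentReward is false
    double[][][] timeRewardFunction;  // [t][s][a], used when hasTimeDependentReward is true
    double minReward;
    double maxReward;

    // Assumed signature: stationary rewards indexed as [s][a].
    void setRewardFunction(double[][] rewards) {
        rewardFunction = rewards;
        timeRewardFunction = null;
        hasTimeDependentReward = false;
        updateBounds(new double[][][] { rewards });
    }

    // Assumed name and signature: time-dependent rewards indexed as [t][s][a].
    void setTimeRewardFunction(double[][][] rewards) {
        timeRewardFunction = rewards;
        rewardFunction = null;
        hasTimeDependentReward = true;
        updateBounds(rewards);
    }

    // Recompute minReward and maxReward over every entry of the given table.
    private void updateBounds(double[][][] table) {
        minReward = Double.POSITIVE_INFINITY;
        maxReward = Double.NEGATIVE_INFINITY;
        for (double[][] perState : table) {
            for (double[] perAction : perState) {
                for (double r : perAction) {
                    minReward = Math.min(minReward, r);
                    maxReward = Math.max(maxReward, r);
                }
            }
        }
        rewardsDefined = true;
    }

    // Hypothetical accessor: the time index t is ignored for stationary rewards.
    double getReward(int t, int s, int a) {
        if (!rewardsDefined) {
            throw new IllegalStateException("rewards have not been set");
        }
        return hasTimeDependentReward ? timeRewardFunction[t][s][a] : rewardFunction[s][a];
    }

    public static void main(String[] args) {
        RewardModelSketch model = new RewardModelSketch();
        model.setRewardFunction(new double[][] { {0.0, 1.0}, {-2.0, 5.0} });
        System.out.println("min=" + model.minReward + ", max=" + model.maxReward);
        System.out.println("reward(t=3, s=1, a=1)=" + model.getReward(3, 1, 1));
    }
}
```

Keeping only one of the two tables populated at a time matches the description that each reward function is defined only for the corresponding value of hasTimeDependentReward.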

Initialization

Create an MDP object using the constructor MDP(int nStates, int nActions, int initialState, int nDecisions)
Set a reward function using the method setRewardFunction
Set a transition function using the method setTransitionFunction
Set feasible actions using the method setFeasibleActions (optional)
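
The initialization order above can be sketched as follows. Since the parameter types of setRewardFunction, setTransitionFunction and setFeasibleActions are not given here, the example uses a hypothetical stand-in class MdpStandIn rather than the toolbox's MDP class. Only the constructor signature is taken from the text; the dense transition layout [s][a][s'], the array-based setter arguments, and the default of treating every action as feasible are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical stand-in for illustration; only the constructor signature
// comes from the documentation, everything else is an assumption.
class MdpStandIn {
    final int nStates;
    final int nActions;
    final int initialState;
    final int nDecisions;
    double[][] rewardFunction;        // [s][a]
    double[][][] transitionFunction;  // [s][a][s'], assumed dense layout
    int[][][] feasibleActions;        // [t][s] -> array of feasible actions

    MdpStandIn(int nStates, int nActions, int initialState, int nDecisions) {
        if (initialState < 0 || initialState >= nStates) {
            throw new IllegalArgumentException("initialState out of range");
        }
        this.nStates = nStates;
        this.nActions = nActions;
        this.initialState = initialState;
        this.nDecisions = nDecisions;
        // Assumption: every action is feasible unless setFeasibleActions is called.
        feasibleActions = new int[nDecisions][nStates][];
        for (int t = 0; t < nDecisions; t++) {
            for (int s = 0; s < nStates; s++) {
                feasibleActions[t][s] = IntStream.range(0, nActions).toArray();
            }
        }
    }

    void setRewardFunction(double[][] rewards) {
        rewardFunction = rewards;
    }

    // Rejects tables in which some row P(. | s, a) does not sum to one.
    void setTransitionFunction(double[][][] probabilities) {
        for (double[][] perState : probabilities) {
            for (double[] row : perState) {
                double sum = Arrays.stream(row).sum();
                if (Math.abs(sum - 1.0) > 1e-9) {
                    throw new IllegalArgumentException("transition row sums to " + sum);
                }
            }
        }
        transitionFunction = probabilities;
    }

    void setFeasibleActions(int[][][] feasible) {
        if (feasible.length != nDecisions) {
            throw new IllegalArgumentException("expected one entry per decision step");
        }
        feasibleActions = feasible;
    }

    public static void main(String[] args) {
        // Step 1: two states, two actions, initial state 0, three decisions.
        MdpStandIn mdp = new MdpStandIn(2, 2, 0, 3);
        // Step 2: reward for each (state, action) pair.
        mdp.setRewardFunction(new double[][] { {1.0, 0.0}, {0.0, 2.0} });
        // Step 3: transition probabilities for each (state, action, next state).
        mdp.setTransitionFunction(new double[][][] {
            { {0.9, 0.1}, {0.5, 0.5} },
            { {0.0, 1.0}, {1.0, 0.0} }
        });
        // Step 4 (optional): allow only action 0 in state 1 at time 0.
        int[][][] feasible = mdp.feasibleActions;
        feasible[0][1] = new int[] { 0 };
        mdp.setFeasibleActions(feasible);
        System.out.println(Arrays.toString(mdp.feasibleActions[0][1]));
    }
}
```

The argument types accepted by the toolbox's actual setters should be checked against the source code before adapting this sketch.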

POMDP

CMDP

CPOMDP

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.