MDP and POMDP data structures
Erwin Walraven edited this page Nov 20, 2018 · 8 revisions
The Java source code of the toolbox provides several classes to model MDPs and POMDPs. In both cases, additional cost functions can be defined and used in constraints. The figure illustrates the inheritance structure of the classes, which we discuss further below.

The MDP class represents a basic finite-horizon Markov Decision Process.
- nStates: number of states
- nActions: number of actions
- initialState: initial state
- nDecisions: number of decisions, defining the horizon
- feasibleActions: feasibleActions[t][s] is an array containing the actions that are feasible to execute in state s at time t
- rewardsDefined: true if rewards have been set, false otherwise
- hasTimeDependentReward: true if rewards are time dependent, false otherwise
- rewardFunction: rewardFunction[s][a] is the reward collected when executing action a in state s. This function is only defined if hasTimeDependentReward equals false.
- timeRewardFunction: timeRewardFunction[t][s][a] is the reward collected when executing action a in state s at time t. This function is only defined if hasTimeDependentReward equals true.
- minReward: minimum reward in the model, computed when the reward function is set
- maxReward: maximum reward in the model, computed when the reward function is set
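
The relationship between the reward fields can be illustrated with a minimal, self-contained sketch. This is not the toolbox's MDP class: the class RewardFieldsSketch, its two setRewardFunction overloads and the getReward method are hypothetical and exist only for illustration. Only the field names and their meaning are taken from the list above.

```java
/** Hypothetical stand-in, NOT the toolbox's MDP class. Field names follow the list above. */
final class RewardFieldsSketch {
    boolean rewardsDefined = false;
    boolean hasTimeDependentReward = false;
    double[][] rewardFunction;        // [s][a], only set for time-independent rewards
    double[][][] timeRewardFunction;  // [t][s][a], only set for time-dependent rewards
    double minReward;
    double maxReward;

    /** Assumed setter for a time-independent reward function. */
    void setRewardFunction(double[][] reward) {
        rewardFunction = reward;
        timeRewardFunction = null;
        hasTimeDependentReward = false;
        resetBounds();
        for (double[] perState : reward) {
            updateBounds(perState);
        }
        rewardsDefined = true;
    }

    /** Assumed setter for a time-dependent reward function. */
    void setRewardFunction(double[][][] reward) {
        timeRewardFunction = reward;
        rewardFunction = null;
        hasTimeDependentReward = true;
        resetBounds();
        for (double[][] perTime : reward) {
            for (double[] perState : perTime) {
                updateBounds(perState);
            }
        }
        rewardsDefined = true;
    }

    /** Looks up the reward in whichever table is currently defined. */
    double getReward(int t, int s, int a) {
        if (!rewardsDefined) {
            throw new IllegalStateException("rewards have not been set");
        }
        return hasTimeDependentReward ? timeRewardFunction[t][s][a] : rewardFunction[s][a];
    }

    private void resetBounds() {
        minReward = Double.POSITIVE_INFINITY;
        maxReward = Double.NEGATIVE_INFINITY;
    }

    private void updateBounds(double[] values) {
        for (double v : values) {
            minReward = Math.min(minReward, v);
            maxReward = Math.max(maxReward, v);
        }
    }

    public static void main(String[] args) {
        RewardFieldsSketch model = new RewardFieldsSketch();
        model.setRewardFunction(new double[][] {{1.0, -2.0}, {0.5, 3.0}});
        // The time index is ignored because the rewards are time independent.
        System.out.println(model.getReward(0, 1, 1));
        System.out.println(model.minReward + " " + model.maxReward);
    }
}
```

In this sketch, exactly one of the two reward tables is non-null at any time, which mirrors the statement that each function is only defined for one value of hasTimeDependentReward.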

An MDP object can be created using the following steps:
- Create an MDP object using the constructor MDP(int nStates, int nActions, int initialState, int nDecisions)
- Set a reward function using the method setRewardFunction
- Set a transition function using the method setTransitionFunction
- Set feasible actions using the method setFeasibleActions (optional)
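
The steps above can be sketched in a self-contained form as follows. The class SketchMDP is a hypothetical stand-in, not the toolbox's MDP class. Its constructor mirrors the signature given above, but the parameter types of setRewardFunction, setTransitionFunction and setFeasibleActions are assumptions, as are the dense [s][a][sNext] transition layout and the default that all actions are feasible. Check the toolbox source for the actual signatures.

```java
/** Hypothetical stand-in used only to illustrate the steps; NOT the toolbox's MDP class. */
final class SketchMDP {
    final int nStates, nActions, initialState, nDecisions;
    double[][] rewardFunction;        // [s][a]
    double[][][] transitionFunction;  // [s][a][sNext], assumed dense layout
    int[][][] feasibleActions;        // [t][s] is the array of feasible actions

    SketchMDP(int nStates, int nActions, int initialState, int nDecisions) {
        this.nStates = nStates;
        this.nActions = nActions;
        this.initialState = initialState;
        this.nDecisions = nDecisions;
        // Assumed default: every action is feasible in every state at every time step.
        int[] all = new int[nActions];
        for (int a = 0; a < nActions; a++) {
            all[a] = a;
        }
        feasibleActions = new int[nDecisions][nStates][];
        for (int t = 0; t < nDecisions; t++) {
            for (int s = 0; s < nStates; s++) {
                feasibleActions[t][s] = all.clone();
            }
        }
    }

    void setRewardFunction(double[][] reward) {
        rewardFunction = reward;
    }

    /** Rejects transition tables whose outgoing probabilities do not sum to one. */
    void setTransitionFunction(double[][][] p) {
        for (int s = 0; s < nStates; s++) {
            for (int a = 0; a < nActions; a++) {
                double sum = 0.0;
                for (double prob : p[s][a]) {
                    sum += prob;
                }
                if (Math.abs(sum - 1.0) > 1e-9) {
                    throw new IllegalArgumentException(
                        "probabilities for s=" + s + ", a=" + a + " sum to " + sum);
                }
            }
        }
        transitionFunction = p;
    }

    void setFeasibleActions(int[][][] feasible) {
        feasibleActions = feasible;
    }

    public static void main(String[] args) {
        // Step 1: two states, two actions, initial state 0, three decisions.
        SketchMDP mdp = new SketchMDP(2, 2, 0, 3);
        // Step 2: reward for each state-action pair.
        mdp.setRewardFunction(new double[][] {{0.0, 1.0}, {2.0, 0.0}});
        // Step 3: one distribution over next states per state-action pair.
        mdp.setTransitionFunction(new double[][][] {
            {{0.9, 0.1}, {0.2, 0.8}},
            {{0.5, 0.5}, {0.0, 1.0}}
        });
        // Step 4 (optional): allow only action 0 in state 1 at time 0.
        int[][][] feasible = mdp.feasibleActions;
        feasible[0][1] = new int[] {0};
        mdp.setFeasibleActions(feasible);
        System.out.println("feasible in state 1 at time 0: " + mdp.feasibleActions[0][1].length);
    }
}
```

Validating the transition probabilities in the setter is a design choice of this sketch; it surfaces modelling mistakes when the model is built rather than during planning.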
The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.