
MDP and POMDP data structures

Erwin Walraven edited this page Nov 20, 2018 · 8 revisions

The Java source code of the toolbox provides several classes for modeling MDPs and POMDPs. In both cases, it is possible to define additional cost functions, which can be used in constraints. The figure illustrates the inheritance structure of these classes, which we discuss further below.

Figure: Inheritance structure of the classes

MDP

The MDP class represents a basic finite-horizon Markov Decision Process.

Attributes

  • nStates: number of states
  • nActions: number of actions
  • initialState: initial state
  • nDecisions: number of decisions, defining the horizon
  • feasibleActions: feasibleActions[t][s] is an array containing the actions that are feasible to execute in state s at time t
  • rewardsDefined: true if rewards have been set, false otherwise
  • hasTimeDependentReward: true if rewards are time dependent, false otherwise
  • rewardFunction: rewardFunction[s][a] is the reward collected when executing action a in state s. This function is only defined if hasTimeDependentReward equals false.
  • timeRewardFunction: timeRewardFunction[t][s][a] is the reward collected when executing action a in state s at time t. This function is only defined if hasTimeDependentReward equals true.
  • minReward: minimum reward in the model, computed when the reward function is set
  • maxReward: maximum reward in the model, computed when the reward function is set
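The interplay between the reward attributes can be illustrated with a minimal sketch. The class SimpleMdp below is hypothetical and is not the toolbox's MDP class. It assumes that rewards are stored as double arrays and that two overloaded setRewardFunction variants (an assumed signature) choose between the stationary and the time-dependent representation. The accessor getReward (also hypothetical) shows which array is read depending on hasTimeDependentReward, and minReward/maxReward are recomputed whenever a reward function is set.

```java
/**
 * Hypothetical sketch, NOT the toolbox's MDP class: it only illustrates how
 * the reward-related attributes listed above could interact.
 */
class SimpleMdp {
    final int nStates;
    final int nActions;
    final int initialState;
    final int nDecisions;

    boolean rewardsDefined = false;
    boolean hasTimeDependentReward = false;
    double[][] rewardFunction;       // [s][a], read only if hasTimeDependentReward == false
    double[][][] timeRewardFunction; // [t][s][a], read only if hasTimeDependentReward == true
    double minReward = Double.NaN;   // undefined until a reward function is set
    double maxReward = Double.NaN;

    SimpleMdp(int nStates, int nActions, int initialState, int nDecisions) {
        this.nStates = nStates;
        this.nActions = nActions;
        this.initialState = initialState;
        this.nDecisions = nDecisions;
    }

    // Assumed signature: stationary rewards R(s, a), identical at every time step.
    void setRewardFunction(double[][] rewardFunction) {
        this.rewardFunction = rewardFunction;
        this.timeRewardFunction = null;
        this.hasTimeDependentReward = false;
        this.rewardsDefined = true;
        updateRewardBounds(new double[][][] {rewardFunction});
    }

    // Assumed signature: time-dependent rewards R_t(s, a).
    void setRewardFunction(double[][][] timeRewardFunction) {
        this.timeRewardFunction = timeRewardFunction;
        this.rewardFunction = null;
        this.hasTimeDependentReward = true;
        this.rewardsDefined = true;
        updateRewardBounds(timeRewardFunction);
    }

    // Hypothetical accessor: reads the array that matches hasTimeDependentReward.
    double getReward(int t, int s, int a) {
        if (!rewardsDefined) {
            throw new IllegalStateException("reward function has not been set");
        }
        return hasTimeDependentReward ? timeRewardFunction[t][s][a] : rewardFunction[s][a];
    }

    // Recompute minReward and maxReward over every entry of the reward arrays.
    private void updateRewardBounds(double[][][] rewards) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[][] layer : rewards) {
            for (double[] row : layer) {
                for (double r : row) {
                    min = Math.min(min, r);
                    max = Math.max(max, r);
                }
            }
        }
        minReward = min;
        maxReward = max;
    }
}
```

Storing the stationary case as a single nStates × nActions table avoids keeping nDecisions identical copies of it; the time index t is simply ignored when hasTimeDependentReward is false.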

Initialization

  1. Create an MDP object using the constructor MDP(int nStates, int nActions, int initialState, int nDecisions)
  2. Set a reward function using the method setRewardFunction
  3. Set a transition function using the method setTransitionFunction
  4. Set feasible actions using the method setFeasibleActions (optional)
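The four initialization steps can be walked through with another minimal, self-contained sketch. InitSketchMdp, its field types and its method signatures are hypothetical stand-ins, not the toolbox's API. The sketch assumes stationary rewards, a dense array transitionFunction[s][a][s'] holding P(s' | s, a), and that skipping step 4 leaves every action feasible. The method optimalValue (also hypothetical) runs standard finite-horizon backward induction only to show how nDecisions, feasibleActions, the reward function and the transition function are consumed together; the toolbox's own solvers may be organized differently.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the initialization steps. Names, field types and
 * method signatures are assumptions, not the toolbox's actual API.
 */
class InitSketchMdp {
    final int nStates, nActions, initialState, nDecisions;
    double[][] rewardFunction;       // [s][a], stationary rewards (assumption)
    double[][][] transitionFunction; // [s][a][s'] = P(s' | s, a), dense (assumption)
    int[][][] feasibleActions;       // [t][s] -> actions feasible in state s at time t

    // Step 1: constructor with the same parameters as MDP(...).
    InitSketchMdp(int nStates, int nActions, int initialState, int nDecisions) {
        this.nStates = nStates;
        this.nActions = nActions;
        this.initialState = initialState;
        this.nDecisions = nDecisions;
        // Assumption: if step 4 is skipped, every action is feasible everywhere.
        int[] all = new int[nActions];
        for (int a = 0; a < nActions; a++) all[a] = a;
        feasibleActions = new int[nDecisions][nStates][];
        for (int t = 0; t < nDecisions; t++)
            for (int s = 0; s < nStates; s++)
                feasibleActions[t][s] = all.clone();
    }

    // Step 2: set the reward function.
    void setRewardFunction(double[][] rewardFunction) {
        this.rewardFunction = rewardFunction;
    }

    // Step 3: set the transition function, rejecting rows that do not sum to 1.
    void setTransitionFunction(double[][][] transitionFunction) {
        for (int s = 0; s < nStates; s++) {
            for (int a = 0; a < nActions; a++) {
                double sum = Arrays.stream(transitionFunction[s][a]).sum();
                if (Math.abs(sum - 1.0) > 1e-9) {
                    throw new IllegalArgumentException(
                        "P(.|" + s + "," + a + ") sums to " + sum);
                }
            }
        }
        this.transitionFunction = transitionFunction;
    }

    // Step 4 (optional): restrict the actions available per time step and state.
    void setFeasibleActions(int[][][] feasibleActions) {
        this.feasibleActions = feasibleActions;
    }

    // Backward induction: V_T = 0, V_t(s) = max over feasible a of R(s,a) + sum_s' P(s'|s,a) V_{t+1}(s').
    // A state without feasible actions gets value -infinity.
    double optimalValue() {
        double[] next = new double[nStates]; // V_{nDecisions}(s) = 0
        for (int t = nDecisions - 1; t >= 0; t--) {
            double[] cur = new double[nStates];
            for (int s = 0; s < nStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a : feasibleActions[t][s]) {
                    double q = rewardFunction[s][a];
                    for (int sp = 0; sp < nStates; sp++) {
                        q += transitionFunction[s][a][sp] * next[sp];
                    }
                    best = Math.max(best, q);
                }
                cur[s] = best;
            }
            next = cur;
        }
        return next[initialState];
    }
}
```

Validating each transition row when it is set catches a malformed model before planning starts, rather than producing silently wrong values later.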

POMDP

CMDP

CPOMDP
