
MDP and POMDP data structures

Erwin Walraven edited this page Nov 20, 2018 · 8 revisions

The Java source code of the toolbox provides several classes to model MDPs and POMDPs. In both cases it is possible to define additional cost functions, which can be used in constraints. The figure illustrates the inheritance structure of the classes, which we further discuss below.

Figure: Inheritance structure of classes

MDP

The MDP class represents a basic finite-horizon Markov Decision Process.

Attributes

  • nStates: number of states
  • nActions: number of actions
  • initialState: initial state
  • nDecisions: number of decisions, defining the horizon
  • feasibleActions: feasibleActions[t][s] is an array containing the actions that are feasible to execute in state s at time t
  • rewardsDefined: true if rewards have been set, false otherwise
  • hasTimeDependentReward: true if rewards are time dependent, false otherwise
  • rewardFunction: rewardFunction[s][a] is the reward collected when executing action a in state s. This function is only defined if hasTimeDependentReward equals false.
  • timeRewardFunction: timeRewardFunction[t][s][a] is the reward collected when executing action a in state s at time t. This function is only defined if hasTimeDependentReward equals true.
  • minReward: minimum reward in the model, computed upon setting reward function
  • maxReward: maximum reward in the model, computed upon setting reward function
  • transitionsDefined: true if transitions have been set, false otherwise
  • hasTimeDependentTransitions: true if transitions are time dependent, false otherwise
  • transitionDestinations: transitionDestinations[s][a] is an array containing states reachable when executing action a in state s. This function is only defined if hasTimeDependentTransitions equals false.
  • transitionProbabilities: transitionProbabilities[s][a] is an array containing the transition probabilities for the states reachable when executing action a in state s. These probabilities correspond to the destinations in transitionDestinations[s][a]. This function is only defined if hasTimeDependentTransitions equals false.
  • timeTransitionDestinations: timeTransitionDestinations[t][s][a] is an array containing states reachable when executing action a in state s at time t. This function is only defined if hasTimeDependentTransitions equals true.
  • timeTransitionProbabilities: timeTransitionProbabilities[t][s][a] is an array containing transition probabilities for the states reachable when executing action a in state s at time t. These probabilities correspond to the destinations in timeTransitionDestinations[t][s][a]. This function is only defined if hasTimeDependentTransitions equals true.
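
The destination and probability arrays form a sparse representation: entry i of transitionProbabilities[s][a] is the probability of reaching entry i of transitionDestinations[s][a]. The sketch below illustrates how such a pair of arrays can be read. It assumes the arrays are of type int[][][] and double[][][], which this page does not state, and the class SparseTransitionSketch and its methods are hypothetical names that are not part of the toolbox.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of the toolbox. */
final class SparseTransitionSketch {

    /** Probability of reaching sNext when executing a in s (0 if sNext is not listed). */
    static double probability(int[][][] destinations, double[][][] probabilities,
                              int s, int a, int sNext) {
        double p = 0.0;
        for (int i = 0; i < destinations[s][a].length; i++) {
            if (destinations[s][a][i] == sNext) {
                p += probabilities[s][a][i];
            }
        }
        return p;
    }

    /** Samples a successor state by walking the cumulative distribution. */
    static int sample(int[][][] destinations, double[][][] probabilities,
                      int s, int a, Random rng) {
        int[] dest = destinations[s][a];
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < dest.length; i++) {
            cumulative += probabilities[s][a][i];
            if (u < cumulative) {
                return dest[i];
            }
        }
        // Guard against rounding error in the cumulative sum.
        return dest[dest.length - 1];
    }
}
```

The time-dependent arrays can be read in the same way after first indexing by t.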

Initialization

  1. Create MDP object using the constructor MDP(int nStates, int nActions, int initialState, int nDecisions)
  2. Set a reward function using the method setRewardFunction
  3. Set a transition function using the method setTransitionFunction
  4. Set feasible actions using the method setFeasibleActions (optional)
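
This page does not list the argument types of setRewardFunction and setTransitionFunction, so the following sketch only checks arrays laid out as described under Attributes before they are handed to the toolbox. The class MdpInputSketch and the method isValid are hypothetical, and the requirement that every state-action pair has a complete distribution is an assumption of this sketch rather than a documented rule of the toolbox.

```java
/** Hypothetical pre-check for illustration; not part of the toolbox. */
final class MdpInputSketch {

    /**
     * Returns true if every (s, a) pair lists destinations inside [0, nStates)
     * with non-negative probabilities that sum to 1.
     */
    static boolean isValid(int nStates, int nActions,
                           int[][][] destinations, double[][][] probabilities) {
        if (destinations.length != nStates || probabilities.length != nStates) {
            return false;
        }
        for (int s = 0; s < nStates; s++) {
            if (destinations[s].length != nActions || probabilities[s].length != nActions) {
                return false;
            }
            for (int a = 0; a < nActions; a++) {
                int[] dest = destinations[s][a];
                double[] prob = probabilities[s][a];
                if (dest.length != prob.length) {
                    return false;
                }
                double sum = 0.0;
                for (int i = 0; i < dest.length; i++) {
                    if (dest[i] < 0 || dest[i] >= nStates || prob[i] < 0.0) {
                        return false;
                    }
                    sum += prob[i];
                }
                if (Math.abs(sum - 1.0) > 1e-9) {
                    return false;
                }
            }
        }
        return true;
    }

    // Assumed usage with the toolbox once the arrays pass the check
    // (setter argument types are not confirmed by this page):
    //   MDP mdp = new MDP(nStates, nActions, initialState, nDecisions);
    //   mdp.setRewardFunction(rewardFunction);
    //   mdp.setTransitionFunction(destinations, probabilities);
}
```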

POMDP

The POMDP class represents a basic finite-horizon Partially Observable Markov Decision Process, and it extends the MDP class.

Attributes

  • nObservations: number of observations
  • observationFunction: observationFunction[a][sNext][o] is the probability of observing o after executing action a and transitioning to state sNext.
  • b0: the initial belief, represented by a BeliefPoint object
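
The index order observationFunction[a][sNext][o] matches the standard Bayesian belief update, in which the new belief in sNext is proportional to observationFunction[a][sNext][o] times the probability of arriving in sNext. The sketch below illustrates this for time-independent transitions. It represents a belief as a plain double[] because the internals of BeliefPoint are not described on this page, and the class BeliefUpdateSketch is a hypothetical name that is not part of the toolbox.

```java
/** Hypothetical helper for illustration; not part of the toolbox. */
final class BeliefUpdateSketch {

    /**
     * Computes the belief after executing action a and observing o.
     * Returns null if o has probability zero under belief b and action a.
     */
    static double[] update(double[] b, int a, int o,
                           int[][][] destinations, double[][][] probabilities,
                           double[][][] observationFunction) {
        int nStates = b.length;
        double[] next = new double[nStates];

        // Prediction step: push the belief through the sparse transition arrays.
        for (int s = 0; s < nStates; s++) {
            if (b[s] == 0.0) {
                continue;
            }
            for (int i = 0; i < destinations[s][a].length; i++) {
                next[destinations[s][a][i]] += b[s] * probabilities[s][a][i];
            }
        }

        // Correction step: weight by the observation probability, then normalize.
        double norm = 0.0;
        for (int sNext = 0; sNext < nStates; sNext++) {
            next[sNext] *= observationFunction[a][sNext][o];
            norm += next[sNext];
        }
        if (norm == 0.0) {
            return null;
        }
        for (int sNext = 0; sNext < nStates; sNext++) {
            next[sNext] /= norm;
        }
        return next;
    }
}
```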

Initialization

  1. Create POMDP object using the constructor POMDP(int nStates, int nActions, int nObservations, double[][][] observationFunction, BeliefPoint b0, int nDecisions)
  2. Set a reward function using the method setRewardFunction
  3. Set a transition function using the method setTransitionFunction
  4. Set feasible actions using the method setFeasibleActions (optional)

CMDP

The CMDP class represents a basic finite-horizon Markov Decision Process with additional cost functions, and it extends the MDP class.

Attributes

  • costFunctions: a list containing cost functions. Each cost function is defined by an array costFunction such that costFunction[s][a] is the cost for executing action a in state s
  • minCost: array with minimum cost for each cost function, computed upon initialization
  • maxCost: array with maximum cost for each cost function, computed upon initialization
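
As an illustration of how minCost and maxCost relate to costFunctions, the sketch below computes one minimum and one maximum per cost function by scanning all entries costFunction[s][a]. The class CostBoundsSketch is hypothetical and not part of the toolbox, and the toolbox itself may compute these bounds differently, for example by considering feasible actions only.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the toolbox. */
final class CostBoundsSketch {

    /** Entry k is the smallest value in cost function k. */
    static double[] minCost(List<double[][]> costFunctions) {
        double[] min = new double[costFunctions.size()];
        for (int k = 0; k < min.length; k++) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] row : costFunctions.get(k)) {
                for (double c : row) {
                    best = Math.min(best, c);
                }
            }
            min[k] = best;
        }
        return min;
    }

    /** Entry k is the largest value in cost function k. */
    static double[] maxCost(List<double[][]> costFunctions) {
        double[] max = new double[costFunctions.size()];
        for (int k = 0; k < max.length; k++) {
            double best = Double.NEGATIVE_INFINITY;
            for (double[] row : costFunctions.get(k)) {
                for (double c : row) {
                    best = Math.max(best, c);
                }
            }
            max[k] = best;
        }
        return max;
    }
}
```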

Initialization

  1. Create CMDP object using the constructor CMDP(int nStates, int nActions, List<double[][]> costFunctions, int initialState, int nDecisions)
  2. Set a reward function using the method setRewardFunction
  3. Set a transition function using the method setTransitionFunction
  4. Set feasible actions using the method setFeasibleActions (optional)

CPOMDP

The CPOMDP class represents a basic finite-horizon Partially Observable Markov Decision Process with additional cost functions, and it extends the POMDP class.

Attributes

  • costFunctions: a list containing cost functions. Each cost function is defined by an array costFunction such that costFunction[s][a] is the cost for executing action a in state s
  • minCost: array with minimum cost for each cost function, computed upon initialization
  • maxCost: array with maximum cost for each cost function, computed upon initialization
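
Because the state is not observed in a CPOMDP, a cost function costFunction[s][a] is typically evaluated in expectation over a belief. The sketch below shows this computation, assuming the belief is available as a plain double[] of state probabilities. The class ExpectedCostSketch is hypothetical and not part of the toolbox, and the way the toolbox's algorithms evaluate costs is not confirmed by this page.

```java
/** Hypothetical helper for illustration; not part of the toolbox. */
final class ExpectedCostSketch {

    /** Expected immediate cost of action a under belief b: sum over s of b[s] * costFunction[s][a]. */
    static double expectedCost(double[] b, double[][] costFunction, int a) {
        double expected = 0.0;
        for (int s = 0; s < b.length; s++) {
            expected += b[s] * costFunction[s][a];
        }
        return expected;
    }
}
```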

Initialization

  1. Create CPOMDP object using the constructor CPOMDP(int nStates, int nActions, int nObservations, List<double[][]> costFunctions, double[][][] observationFunction, BeliefPoint b0, int nDecisions)
  2. Set a reward function using the method setRewardFunction
  3. Set a transition function using the method setTransitionFunction
  4. Set feasible actions using the method setFeasibleActions (optional)
