Evaluation

The toolbox provides two simulators that can be used to simulate agent policies in an environment. A simulator takes an instance and a solution as input arguments, and its run() method executes the desired number of simulation runs. During these runs the simulator keeps track of statistics such as the mean reward and the resource consumption. More details about the simulators can be found below.
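To illustrate what a simulator computes during its runs, here is a minimal Monte Carlo sketch in Java. It does not use the toolbox: the class ToySimulator, its constructor, run(policies, initialStates, numRuns, seed), meanReward() and meanCost() are hypothetical names chosen for this example. It assumes a small tabular model with a single resource, deterministic finite-horizon policies indexed as policies[i][t][s], and agents whose dynamics are independent. Per-run totals are summed over the agents and then averaged over the runs.

```java
import java.util.Random;

/**
 * Hypothetical toy simulator illustrating Monte Carlo policy evaluation.
 * This is NOT the ConstrainedPlanningToolbox API; all names are illustrative.
 */
class ToySimulator {
    private final double[][][] transition; // transition[s][a][s2] = P(s2 | s, a)
    private final double[][] reward;       // reward[s][a]
    private final double[][] cost;         // cost[s][a] for a single resource
    private final int horizon;

    private double sumReward = 0.0;
    private double sumCost = 0.0;
    private int runs = 0;

    ToySimulator(double[][][] transition, double[][] reward, double[][] cost, int horizon) {
        this.transition = transition;
        this.reward = reward;
        this.cost = cost;
        this.horizon = horizon;
    }

    /** policies[i][t][s] is the action of agent i at time t in state s. */
    void run(int[][][] policies, int[] initialStates, int numRuns, long seed) {
        Random rng = new Random(seed);
        for (int r = 0; r < numRuns; r++) {
            double runReward = 0.0;
            double runCost = 0.0;
            for (int i = 0; i < policies.length; i++) {
                int s = initialStates[i];
                for (int t = 0; t < horizon; t++) {
                    int a = policies[i][t][s];
                    runReward += reward[s][a];
                    runCost += cost[s][a];
                    s = sampleNextState(transition[s][a], rng);
                }
            }
            // Totals are summed over all agents, then averaged over runs.
            sumReward += runReward;
            sumCost += runCost;
            runs++;
        }
    }

    /** Draws a successor state from a discrete distribution. */
    private static int sampleNextState(double[] dist, Random rng) {
        double u = rng.nextDouble();
        double acc = 0.0;
        for (int s2 = 0; s2 < dist.length; s2++) {
            acc += dist[s2];
            if (u < acc) return s2;
        }
        return dist.length - 1; // guard against floating-point rounding
    }

    double meanReward() { return runs == 0 ? Double.NaN : sumReward / runs; }

    double meanCost() { return runs == 0 ? Double.NaN : sumCost / runs; }
}
```

With a fixed seed the estimates are reproducible; increasing numRuns reduces their variance.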

Simulator structure

MDP simulator

Class: evaluation.CMDPSimulatorFiniteHorizon

Input: CMDPInstance and an array of MDPSolutionFinite objects

Available statistics:

Mean total reward of the agents, getMeanReward()
Mean total consumption of resource k, measured over all time steps, getMeanTotalCost(k)
Mean instantaneous consumption of resource k at time t, getMeanInstantaneousCost(k,t)
Empirical estimate of the probability that the total consumption of resource k exceeds its limit, getViolationProbabilityEstimateTotal(k). This statistic is only provided for problems with budget constraints.
Empirical estimate of the probability that the consumption of resource k exceeds its limit at time t, getViolationProbabilityEstimateInstantaneous(k,t). This statistic is only provided for problems with instantaneous constraints.
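These statistics are sample averages and empirical frequencies over the simulation runs. The sketch below shows one plausible way to compute them from recorded consumption. It assumes a hypothetical layout costs[r][k][t] (consumption of resource k at time t in run r, summed over the agents) and assumes that "exceeded" means strictly greater than the limit. The class CostStatistics and its methods are illustrative only and are not the toolbox API.

```java
/**
 * Hypothetical estimators mirroring the MDP simulator statistics.
 * costs[r][k][t] = consumption of resource k at time t in run r,
 * summed over all agents (an assumed recording format).
 * This is NOT the ConstrainedPlanningToolbox implementation.
 */
class CostStatistics {
    private final double[][][] costs;

    CostStatistics(double[][][] costs) {
        this.costs = costs;
    }

    /** Total consumption of resource k over all time steps of run r. */
    private double totalInRun(int r, int k) {
        double sum = 0.0;
        for (double c : costs[r][k]) sum += c;
        return sum;
    }

    /** Mean over runs of the total consumption of resource k. */
    double meanTotalCost(int k) {
        double sum = 0.0;
        for (int r = 0; r < costs.length; r++) sum += totalInRun(r, k);
        return sum / costs.length;
    }

    /** Mean over runs of the consumption of resource k at time t. */
    double meanInstantaneousCost(int k, int t) {
        double sum = 0.0;
        for (double[][] run : costs) sum += run[k][t];
        return sum / costs.length;
    }

    /** Fraction of runs in which the total consumption of k exceeds the budget. */
    double violationProbabilityTotal(int k, double budget) {
        int violations = 0;
        for (int r = 0; r < costs.length; r++) {
            if (totalInRun(r, k) > budget) violations++;
        }
        return (double) violations / costs.length;
    }

    /** Fraction of runs in which the consumption of k at time t exceeds the limit. */
    double violationProbabilityInstantaneous(int k, int t, double limit) {
        int violations = 0;
        for (double[][] run : costs) {
            if (run[k][t] > limit) violations++;
        }
        return (double) violations / costs.length;
    }
}
```

A violation probability estimate is simply the number of violating runs divided by the number of runs, so its accuracy improves as more runs are simulated.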

POMDP simulator

Class: evaluation.CPOMDPSimulatorFiniteHorizon

Input: CPOMDPInstance and an array of POMDPSolutionFinite objects

Available statistics:

Mean total reward of the agents, getMeanReward()
Mean total cost of the agents, getMeanCost()
Standard deviation of the total reward of the agents, getStdReward()
Standard deviation of the total cost of the agents, getStdCost()
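Means and standard deviations like these can be accumulated run by run without storing every outcome. The following sketch uses Welford's online algorithm; the class RunningStats is hypothetical and is not part of the toolbox. The text above does not state whether getStdReward() and getStdCost() use the sample (n - 1) or the population (n) convention, so both are shown.

```java
/**
 * Hypothetical running accumulator for mean and standard deviation,
 * based on Welford's online algorithm. Not the toolbox implementation.
 */
class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    /** Adds one observation, e.g. the total reward of one simulation run. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the updated mean
    }

    double mean() { return n == 0 ? Double.NaN : mean; }

    /** Sample standard deviation (divides by n - 1). */
    double sampleStd() { return n < 2 ? Double.NaN : Math.sqrt(m2 / (n - 1)); }

    /** Population standard deviation (divides by n). */
    double populationStd() { return n == 0 ? Double.NaN : Math.sqrt(m2 / n); }
}
```

One accumulator would receive the total reward of each run and a second one the total cost, which together yield the four statistics listed above.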

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.