Evaluation

The toolbox provides two simulators for evaluating agent policies in an environment. Each simulator takes an instance and a solution as input arguments, and its run() method executes the desired number of simulation runs. During these runs the simulator tracks statistics such as the mean reward and the consumption of resources. The simulators are described in more detail below.
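The idea of repeated simulation runs can be pictured with a minimal, self-contained sketch. It is not the toolbox's implementation: the class ToySimulator, its two-state model, the fixed policy, and the parameterless getMeanTotalCost() are all hypothetical stand-ins, used only to show how averaging over many runs yields a mean reward and a mean resource consumption. The toolbox's own simulators work on an instance and a solution instead.

```java
import java.util.Random;

// Hypothetical stand-in, NOT the toolbox's simulator: a two-state toy model
// simulated repeatedly to estimate mean reward and mean resource consumption.
class ToySimulator {
    private final int horizon;
    private final Random rng;
    private double meanReward;
    private double meanCost;

    ToySimulator(int horizon, long seed) {
        this.horizon = horizon;
        this.rng = new Random(seed);
    }

    // Fixed toy policy: action 1 (uses one resource unit) in state 0, else action 0.
    private int policy(int state) {
        return state == 0 ? 1 : 0;
    }

    // Executes numRuns independent simulation runs and stores the averages.
    void run(int numRuns) {
        double rewardSum = 0.0;
        double costSum = 0.0;
        for (int run = 0; run < numRuns; run++) {
            int state = 0; // every run starts in state 0
            for (int t = 0; t < horizon; t++) {
                int action = policy(state);
                rewardSum += (action == 1) ? 1.0 : 0.0; // reward 1 when the resource is used
                costSum += action;                       // action 1 consumes one unit
                // Toy dynamics: switch to the other state with probability 0.5.
                if (rng.nextDouble() < 0.5) {
                    state = 1 - state;
                }
            }
        }
        meanReward = rewardSum / numRuns;
        meanCost = costSum / numRuns;
    }

    double getMeanReward() { return meanReward; }

    double getMeanTotalCost() { return meanCost; }

    public static void main(String[] args) {
        ToySimulator sim = new ToySimulator(10, 42L);
        sim.run(10000);
        System.out.println("Mean reward: " + sim.getMeanReward());
        System.out.println("Mean total cost: " + sim.getMeanTotalCost());
    }
}
```

In this toy model the expected total reward over 10 steps is 5.5, so the printed estimates should land close to that value. Increasing the number of runs tightens the estimate at the cost of longer simulation time.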

Simulator structure

MDP simulator

Class: evaluation.CMDPSimulator

Input: CMDPInstance and MDPSolution

Available statistics:

Mean total reward of the agents, getMeanReward()
Mean total consumption of resource k, measured over all time steps, getMeanTotalCost(k)
Mean instantaneous consumption of resource k at time t, getMeanInstantaneousCost(k,t)
Empirical estimate of the probability that the limit of resource k is exceeded, getViolationProbabilityEstimateTotal(k). This statistic is only provided for problems with budget constraints.
Empirical estimate of the probability that the limit of resource k is exceeded at time t, getViolationProbabilityEstimateInstantaneous(k,t). This statistic is only provided for problems with instantaneous constraints.
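The statistics listed above can be read as simple averages over simulation runs. The sketch below illustrates this for a single resource. It is an assumption-laden illustration, not the toolbox's code: the class CostStatistics, its method names, and the costs[run][t] layout are hypothetical, and "exceeded" is taken to mean strictly greater than the limit.

```java
// Hypothetical helper, NOT part of the toolbox: computes the listed statistics
// from recorded costs, where costs[run][t] is the consumption of one resource
// at time step t in a given simulation run.
class CostStatistics {
    // Mean over runs of the consumption summed over all time steps.
    static double meanTotalCost(double[][] costs) {
        double sum = 0.0;
        for (double[] run : costs) {
            for (double c : run) sum += c;
        }
        return sum / costs.length;
    }

    // Mean over runs of the consumption at time step t.
    static double meanInstantaneousCost(double[][] costs, int t) {
        double sum = 0.0;
        for (double[] run : costs) sum += run[t];
        return sum / costs.length;
    }

    // Fraction of runs whose total consumption exceeds the budget
    // (assumption: "exceeds" means strictly greater).
    static double violationProbabilityTotal(double[][] costs, double budget) {
        int violations = 0;
        for (double[] run : costs) {
            double total = 0.0;
            for (double c : run) total += c;
            if (total > budget) violations++;
        }
        return (double) violations / costs.length;
    }

    // Fraction of runs whose consumption at time step t exceeds the limit.
    static double violationProbabilityInstantaneous(double[][] costs, int t, double limit) {
        int violations = 0;
        for (double[] run : costs) {
            if (run[t] > limit) violations++;
        }
        return (double) violations / costs.length;
    }

    public static void main(String[] args) {
        // Four made-up runs over three time steps, for illustration only.
        double[][] costs = { {1, 0, 2}, {0, 1, 1}, {2, 2, 1}, {1, 1, 0} };
        System.out.println(meanTotalCost(costs));                           // 3.0
        System.out.println(meanInstantaneousCost(costs, 0));                // 1.0
        System.out.println(violationProbabilityTotal(costs, 3.0));          // 0.25
        System.out.println(violationProbabilityInstantaneous(costs, 1, 1)); // 0.25
    }
}
```

The two violation estimates answer different questions: the total variant compares each run's summed consumption against a budget, while the instantaneous variant checks a single time step against a per-step limit. This matches the distinction between budget constraints and instantaneous constraints drawn in the list.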

POMDP simulator

Class: evaluation.CPOMDPSimulator

Input: CPOMDPInstance and POMDPSolution

Available statistics:

Mean total reward of the agents, getMeanReward()
Mean total consumption of resource k, measured over all time steps, getMeanTotalCost(k)
Mean instantaneous consumption of resource k at time t, getMeanInstantaneousCost(k,t)
Empirical estimate of the probability that the limit of resource k is exceeded, getViolationProbabilityEstimateTotal(k). This statistic is only provided for problems with budget constraints.

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.