Examples in Java
The toolbox comes with example code which illustrates how problem instances can be solved for both MDPs and POMDPs. This page provides a more elaborate description of these examples. In addition to the examples based on built-in problem domains, we also explain how new problem domains can be defined.
The class executables.TestCMDP provides example code for solving a constrained MDP (CMDP) problem instance. Its first few lines initialize the LP solver and the random number generator. To use LPSolve instead of Gurobi, initialize LPSolverLPSolve rather than LPSolverGurobi.
```java
LPSolver lpSolver = new LPSolverGurobi();
Random rnd = new Random(222);
```

The next step is to generate a problem instance using one of the instance generators in the domains package. The code fragment below generates a problem instance with 2 agents and 10 sequential decisions based on the advertising domain.
```java
CMDPInstanceGenerator generator = new AdvertisingInstanceGenerator();
int nAgents = 2;
int nDecisions = 10;
CMDPInstance instance = generator.getInstance(nAgents, nDecisions);
```

After generating the instance, the next step is to solve it using one of the algorithms in the toolbox. The code fragment below initializes an algorithm based on the linear program for CMDPs, passes it the previously generated instance by calling the setInstance method, and finally obtains a solution by calling the algorithm's solve method.
```java
CMDPAlgorithmFiniteHorizon alg = new ConstrainedMDPFiniteHorizon(lpSolver, rnd);
try {
    alg.setInstance(instance);
} catch (UnsupportedInstanceException e) {
    // The algorithm cannot handle this instance: report it and exit with a non-zero status
    e.printStackTrace();
    System.exit(1);
}
MDPSolutionFinite[] solution = alg.solve();
```

After computing a solution, the resulting expected reward can be printed as illustrated below.
```java
// Add up the expected rewards of all entries in the solution array
double expectedReward = 0.0;
for (int i = 0; i < solution.length; i++) {
    expectedReward += solution[i].getExpectedReward();
}
System.out.println("Expected reward: " + expectedReward);
```

Finally, the computed solution can be evaluated through simulation. The code fragment below initializes a simulation environment using the CMDPSimulatorFiniteHorizon class, executes 1000000 simulation runs, and prints the mean reward obtained in these runs.
```java
CMDPSimulatorFiniteHorizon sim = new CMDPSimulatorFiniteHorizon(instance, solution, rnd);
sim.run(1000000);
System.out.println("Mean reward: " + sim.getMeanReward());
```

The example code in executables.TestCMDP also shows how other problem domains and algorithms can be used.
The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.