Examples in Python

For Python users we provide additional source code that makes it possible to solve constrained planning problems by interacting with our Java toolbox. This requires running a Java server and a Python client that sends requests to it. Planning domains and problem instances can be constructed entirely in the Python client; only the planning step is performed by the Java server.

Starting the server

The Python client and Java server communicate over a client-server architecture, a full description of which is provided here. Starting the server is straightforward: run the class executables.Server or the Server.jar file that is included in the archive. Once started, the server waits until a client connects, after which it can process requests.
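The waiting-then-processing behavior described above can be illustrated with a generic sketch using only the Python standard library. This is not the toolbox's actual protocol: the line-based message format, the "handled:" reply prefix, and the function names handle_client, send_requests and demo are all assumptions made for illustration.

```python
import socket
import threading


def handle_client(conn):
    """Answer each newline-terminated request until the client disconnects."""
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            # Hypothetical reply format; a real server would run the planner here.
            reply = "handled:" + line.strip() + "\n"
            conn.sendall(reply.encode("utf-8"))


def send_requests(port, requests):
    """Connect to the local server, send each request, and collect the replies."""
    replies = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            for request in requests:
                sock.sendall((request + "\n").encode("utf-8"))
                replies.append(reader.readline().strip())
    return replies


def demo():
    """Start a server thread, let one client talk to it, and return the replies."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    def serve():
        conn, _ = server_sock.accept()  # blocks until a client connects
        handle_client(conn)

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    replies = send_requests(port, ["connect", "solve"])
    thread.join(timeout=5)
    server_sock.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

The server thread does nothing until accept returns, which mirrors the behavior of a server that idles until a client connects.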

Solving MDPs with constraints

We illustrate how an MDP planning problem with constraints can be solved in Python. We follow the same example as the corresponding Java example. The Python code can be found in TestCMDP.py. The first step is to connect to the Java server:

ToolboxServer.connect()

Once the connection is established, we can obtain a problem instance for the advertising domain with 2 agents and 10 sequential decisions.

num_agents = 2
num_decisions = 10
instance = InstanceGenerator.get_advertising_instance(num_agents, num_decisions)

After generating the instance, we solve it using one of the algorithms in the toolbox. The code fragment below uses the algorithm based on the linear program for CMDPs: it obtains a solution by calling the algorithm's solve method and then prints the expected reward of that solution.

expected_reward = ConstrainedMDPFiniteHorizon.solve(instance)
print("Expected reward:", expected_reward)
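For readers unfamiliar with the linear program for CMDPs mentioned above, the following is the standard occupancy-measure formulation for a single finite-horizon CMDP, given as a sketch rather than the exact program built by the toolbox. The symbols are assumptions introduced here: horizon $h$, reward $R$, cost $C$, transition function $P$, initial state distribution $b$, cost limit $L$, and variables $x_{t,s,a}$ denoting the probability of being in state $s$ and taking action $a$ at time $t$.

```latex
\begin{align*}
\max_{x} \quad & \sum_{t=1}^{h} \sum_{s} \sum_{a} x_{t,s,a}\, R(s,a) \\
\text{s.t.} \quad
& \sum_{a} x_{1,s,a} = b(s) && \forall s, \\
& \sum_{a} x_{t+1,s',a} = \sum_{s} \sum_{a} P(s' \mid s,a)\, x_{t,s,a} && \forall s',\; t = 1,\dots,h-1, \\
& \sum_{t=1}^{h} \sum_{s} \sum_{a} x_{t,s,a}\, C(s,a) \le L, \\
& x_{t,s,a} \ge 0 && \forall t, s, a.
\end{align*}
```

A randomized policy can be read off the solution as $\pi_t(a \mid s) = x_{t,s,a} / \sum_{a'} x_{t,s,a'}$ whenever the denominator is positive. With multiple agents, as in the advertising instance, one would typically keep separate variables per agent and sum the objective and the cost constraint over all agents.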

Finally, the computed solution can be evaluated through simulation. The code fragment below initializes a simulation environment using the CMDPSimulator class. Using this simulator, it executes 1000 simulation runs and prints the mean reward obtained in these runs. In addition to the mean reward, the simulator also provides methods to obtain the expected resource consumption (cost) and estimates of the constraint violation probabilities.

sim = CMDPSimulator(instance)
mean_reward = sim.run(1000)
print("Mean reward:", mean_reward)
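The quantities reported by such a simulation (mean reward, mean cost and an estimate of the constraint violation probability) are plain Monte Carlo averages. The sketch below shows how they can be computed for a hypothetical toy process; it does not use or reproduce the CMDPSimulator class, and the toy policy, the reward and cost values, and the names simulate_run and evaluate are assumptions for illustration only.

```python
import random


def simulate_run(rng, num_decisions):
    """Simulate one run of a toy process and return (total_reward, total_cost)."""
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(num_decisions):
        if rng.random() < 0.5:
            # Toy "costly" action: consumes one unit of resource, pays reward 2.
            total_cost += 1.0
            total_reward += 2.0
        else:
            # Toy "free" action: no resource use, pays reward 0.5.
            total_reward += 0.5
    return total_reward, total_cost


def evaluate(num_runs, num_decisions, cost_limit, seed=0):
    """Estimate mean reward, mean cost, and the probability of exceeding cost_limit."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    reward_sum = 0.0
    cost_sum = 0.0
    violations = 0
    for _ in range(num_runs):
        reward, cost = simulate_run(rng, num_decisions)
        reward_sum += reward
        cost_sum += cost
        if cost > cost_limit:
            violations += 1
    return reward_sum / num_runs, cost_sum / num_runs, violations / num_runs


if __name__ == "__main__":
    mean_reward, mean_cost, violation_prob = evaluate(1000, 10, cost_limit=7.0)
    print("Mean reward:", mean_reward)
    print("Mean cost:", mean_cost)
    print("Estimated violation probability:", violation_prob)
```

A constraint on expected cost can be satisfied on average while individual runs still exceed the limit, which is why the violation probability is estimated separately from the mean cost.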

Once we are done, we no longer need the server connection, so we disconnect.

ToolboxServer.disconnect()
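If an error occurs between connecting and disconnecting, the disconnect call may never run. One possible way to guard against this is a small context manager, sketched below. The helper server_session is hypothetical and not part of the toolbox; the demonstration uses stand-in functions so that it runs without a server, and in real use one would presumably pass ToolboxServer.connect and ToolboxServer.disconnect instead.

```python
from contextlib import contextmanager


@contextmanager
def server_session(connect, disconnect):
    """Call connect() on entry and always call disconnect() on exit."""
    connect()
    try:
        yield
    finally:
        # Runs even if the body of the with-block raises an exception.
        disconnect()


def run_demo():
    """Show the order of events when the planning step fails inside the session."""
    events = []
    try:
        with server_session(lambda: events.append("connect"),
                            lambda: events.append("disconnect")):
            events.append("work")
            raise RuntimeError("planning failed")
    except RuntimeError:
        events.append("error handled")
    return events


if __name__ == "__main__":
    print(run_demo())
```

Because the disconnect happens in a finally clause, the connection is released before the exception reaches the caller.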

Solving POMDPs with constraints

The example code for POMDPs with constraints can be found in TestCPOMDP.py. It follows exactly the same structure as the example for MDPs, and additional explanations are therefore omitted.

ToolboxServer.connect()

num_agents = 2
num_decisions = 10
instance = InstanceGenerator.get_cbm_instance(num_agents, num_decisions)

expected_reward = CGCP.solve(instance)
print("Expected reward:", expected_reward)

sim = CPOMDPSimulator(instance)
mean_reward = sim.run(1000)
print("Mean reward:", mean_reward)

ToolboxServer.disconnect()

Defining new domains and problem instances

New domains and problem instances can be defined by initializing CMDP and CPOMDP objects. These objects follow roughly the same structure as the objects in Java, except that there is no inheritance structure. The Python code corresponding to the Java example is as follows:

test

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.