Domains

The toolbox has several built-in problem domains, which can be obtained using a CMDPInstanceGenerator or CPOMDPInstanceGenerator in the domains package. Each domain provides a function getInstance(int numAgents, int numDecisions), which returns a CMDPInstance or CPOMDPInstance with the given number of agents and decisions. The instance also defines the resource limits that constrain the behavior of the agents.
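The call pattern described above can be sketched as follows. The toolbox classes themselves are not reproduced here, so the sketch uses hypothetical stand-in types (InstanceGenerator, Instance, ToyDomain) and an assumed single scalar resource limit. Only the getInstance(int numAgents, int numDecisions) signature is taken from the text.

```java
// Hypothetical stand-ins for the toolbox types; only the getInstance
// signature comes from the description above.
interface InstanceGenerator {
    Instance getInstance(int numAgents, int numDecisions);
}

// Minimal stand-in for a problem instance: agents, horizon and one resource limit.
final class Instance {
    final int numAgents;
    final int numDecisions;
    final double resourceLimit; // assumption: a single scalar limit

    Instance(int numAgents, int numDecisions, double resourceLimit) {
        this.numAgents = numAgents;
        this.numDecisions = numDecisions;
        this.resourceLimit = resourceLimit;
    }
}

// Toy domain (assumption): the limit lets half of the agents consume one unit each.
final class ToyDomain implements InstanceGenerator {
    @Override
    public Instance getInstance(int numAgents, int numDecisions) {
        if (numAgents <= 0 || numDecisions <= 0) {
            throw new IllegalArgumentException("numAgents and numDecisions must be positive");
        }
        return new Instance(numAgents, numDecisions, numAgents / 2.0);
    }
}

final class GetInstanceDemo {
    public static void main(String[] args) {
        InstanceGenerator generator = new ToyDomain();
        Instance instance = generator.getInstance(4, 10);
        System.out.println("agents=" + instance.numAgents
                + " decisions=" + instance.numDecisions
                + " limit=" + instance.resourceLimit);
    }
}
```

In the toolbox itself, the generator and instance classes from the domains package would take the place of these stand-ins.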

Problem instances

describe notion of problem instance in the code

Domain descriptions

Several domains have been integrated in the toolbox already. A brief description of the domains is provided below, including references to the literature which either uses or describes the domain.

Online advertising

Online advertising involves presenting advertisements to users who browse the internet in such a way that they become interested in, e.g., buying a product in a webshop. If only a limited amount of money is available for advertising, it is necessary to decide how this budget is spent in order to maximize revenue. Each user browsing the internet is modeled as a Markov Decision Process in which states represent the level of interest of the user and actions represent the advertisements that can be shown to the user. Each action has a cost associated with it, corresponding to the amount of money that is required to show the advertisement. The global budget imposes a constraint on the advertisements that can be shown to the users.
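To make the budget constraint concrete, the following minimal sketch sums the cost of all advertisements shown over the whole horizon and compares it with the global budget. The class name, the array layout and the numbers are illustrative assumptions, not part of the toolbox. The sketch checks one realized schedule, whereas a planner may instead bound the expected cost.

```java
// Illustrative sketch (not toolbox code): a budget constraint limits the
// total cost accumulated over all users and all time steps.
final class BudgetCheck {

    // costs[t][i] is the cost of the advertisement shown to user i at step t
    // (0.0 if no advertisement is shown).
    static double totalCost(double[][] costs) {
        double total = 0.0;
        for (double[] step : costs) {
            for (double cost : step) {
                total += cost;
            }
        }
        return total;
    }

    // True if the schedule spends no more than the global budget.
    static boolean withinBudget(double[][] costs, double budget) {
        return totalCost(costs) <= budget;
    }

    public static void main(String[] args) {
        double[][] costs = {
            {1.0, 0.0},  // step 0: an advertisement for user 0 only
            {2.0, 0.5}   // step 1: advertisements for both users
        };
        System.out.println("total cost: " + totalCost(costs));
        System.out.println("within budget 4.0: " + withinBudget(costs, 4.0));
    }
}
```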

Model: MDP

Type of constraints: budget

Literature: Boutilier, C., & Lu, T. (2016). Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (pp. 823–830).

De Nijs, F., Walraven, E., De Weerdt, M. M., & Spaan, M. T. J. (2017). Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3562–3568).

Mars rovers: Maze

In the Maze domain, a limited set of tools must be assigned to Mars rovers. These tools are required to perform research tasks, and the assignment of tools to rovers influences the total value of the research tasks performed. The planner needs to decide how the tools are assigned such that the expected value of the research tasks is maximized.

Model: MDP

Type of constraints: instantaneous

Literature: Wu, J., & Durfee, E. H. (2010). Resource-Driven Mission-Phasing Techniques for Constrained Agents in Stochastic Environments. Journal of Artificial Intelligence Research, 38, 415–473.

De Nijs, F., Walraven, E., De Weerdt, M. M., & Spaan, M. T. J. (2017). Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3562–3568).

Thermostatically Controlled Loads (TCL)

A Thermostatically Controlled Load (TCL) is a load in a power grid that is controlled autonomously using a thermostat. For example, heating systems in a house are controlled by a thermostat in order to ensure that the room temperature is close to a given setpoint. Since the temperature in a room decreases over time if it is cold outside, the thermostat needs to activate the heating multiple times a day to ensure that the temperature remains close to the setpoint. If there are multiple houses connected to the same power line, then the capacity limit of this line imposes constraints on the behavior of the thermostats. For example, activating all heating systems in a street at the same time may lead to more power consumption than the line can accommodate, which should be prevented at all times. In this domain each thermostat is represented by a Markov Decision Process in which the actions correspond to activating and deactivating the heating, and the states represent the room temperature.
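The capacity limit described above is an instantaneous constraint: it must hold at every time step separately, not only in total. A minimal sketch of such a check is given below. The class name, the assumption that every heater draws the same power, and the numbers are all illustrative and not taken from the toolbox.

```java
// Illustrative sketch (not toolbox code): an instantaneous constraint is
// checked per time step against the capacity of the shared power line.
final class CapacityCheck {

    // heaterOn[t][i] is true if thermostat i activates its heating at step t.
    // Assumption: every heater draws the same power when active.
    static boolean respectsCapacity(boolean[][] heaterOn, double powerPerHeater, double capacity) {
        for (boolean[] step : heaterOn) {
            int active = 0;
            for (boolean on : step) {
                if (on) {
                    active++;
                }
            }
            if (active * powerPerHeater > capacity) {
                return false; // the line is overloaded at this step
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[][] allAtOnce = {{true, true, true}, {false, false, false}};
        boolean[][] staggered = {{true, false, true}, {false, true, false}};
        System.out.println("all at once feasible: " + respectsCapacity(allAtOnce, 1.5, 3.0));
        System.out.println("staggered feasible:   " + respectsCapacity(staggered, 1.5, 3.0));
    }
}
```

Both schedules activate each heater once, but only the staggered one keeps the consumption at every step within the line capacity.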

Model: MDP

Type of constraints: instantaneous

Literature: De Nijs, F., Spaan, M. T. J., & De Weerdt, M. M. (2015). Best-Response Planning of Thermostatically Controlled Loads under Power Constraints. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (pp. 615–621).

De Nijs, F., Walraven, E., De Weerdt, M. M., & Spaan, M. T. J. (2017). Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3562–3568).

WebAd

The web-ad domain is based on the same principle as the online advertising domain described earlier, but it includes partial observability of the level of interest of the user. As a result, the decision maker needs to maintain a belief regarding the level of interest of the users, rather than observing the level of interest directly. The original POMDP file corresponding to the domain can be found here.
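Maintaining a belief means applying the standard Bayesian belief update after each action and observation. The sketch below shows this update for a fixed action and a received observation. The class name, the array layout and the example numbers are assumptions for illustration and do not reflect how the toolbox stores its POMDP models.

```java
// Illustrative sketch (not toolbox code) of the standard POMDP belief update:
// b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
final class BeliefUpdate {

    // belief[s]         = current probability of state s
    // transition[s][s2] = P(s2 | s, a) for the action a that was executed
    // observation[s2]   = P(o | s2, a) for the observation o that was received
    static double[] update(double[] belief, double[][] transition, double[] observation) {
        int n = belief.length;
        double[] next = new double[n];
        double norm = 0.0;
        for (int s2 = 0; s2 < n; s2++) {
            double predicted = 0.0;
            for (int s = 0; s < n; s++) {
                predicted += transition[s][s2] * belief[s];
            }
            next[s2] = observation[s2] * predicted;
            norm += next[s2];
        }
        if (norm == 0.0) {
            throw new IllegalArgumentException("observation has zero probability under this belief");
        }
        for (int s2 = 0; s2 < n; s2++) {
            next[s2] /= norm; // normalize so that the belief sums to one
        }
        return next;
    }

    public static void main(String[] args) {
        double[] belief = {0.5, 0.5};               // two interest levels, equally likely
        double[][] transition = {{1.0, 0.0}, {0.0, 1.0}}; // interest level stays the same
        double[] observation = {0.8, 0.2};          // observation is more likely in state 0
        double[] next = update(belief, transition, observation);
        System.out.println("updated belief: " + next[0] + ", " + next[1]);
    }
}
```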

Model: POMDP

Type of constraints: budget

Literature: Walraven, E., & Spaan, M. T. J. (2018). Column Generation Algorithms for Constrained POMDPs. Journal of Artificial Intelligence Research, 62, 489–533.

Condition-based maintenance

todo

Define your own domain

explain how new domain can be used

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.