# Chapter 18 - Multiagent Decision Making

*In which we examine what to do when more than one agent inhabits the environment.* - AI: A Modern Approach

## 18.1 Properties of Multiagent Environments
 
- Multiagent systems involve multiple actors making decisions, differing from scenarios where only a single agent is considered.
- Adding agents complicates sensing, planning, and acting, but it models many real-world AI applications more faithfully.
- The nature of multiagent planning problems and the strategies for addressing them vary based on the relationships between the agents within the system.

### 18.1.1 One Decision Maker
- The environment may contain multiple actors but only one entity that makes decisions. That decision maker plans the actions of the others under the **benevolent agent assumption**: the other actors simply do what they are told. An example is a human manager planning the actions of a team of robots.
- Planning for multiple actors requires synchronization of actions, which can involve joint, mutually exclusive, or sequential actions depending on the situation.
- A single decision maker with multiple effectors (e.g., a human who can walk and talk simultaneously) needs to engage in multieffector planning to coordinate concurrent actions and manage interactions between effectors.
- In cases where effectors are separate units (like a fleet of delivery robots), this planning extends to multibody planning, still considered a single-agent problem if sensor data can be pooled to inform a unified plan.
- Decentralized planning problems arise when communication constraints prevent pooling of sensor information, necessitating plans that include provisions for intermittent communication among separate units.

### 18.1.2 Multiple Decision Makers
- Multiple actors in the environment can make their own decisions, referred to as counterparts, with each having their own preferences and plans. 
- Two main scenarios exist among multiple decision makers:
  - All decision makers share a common goal, like employees of a company, so coordination is essential to avoid conflicting actions.
  - Decision makers have individual preferences, which may be diametrically opposed (as in zero-sum games such as chess) or related in more complex ways.
- Multiagent systems with diverse preferences require consideration of other agents' preferences and strategies, leading to strategic decision-making akin to game theory.
- Game theory, distinct from decision theory, provides a foundation for understanding strategic interactions among agents, applicable far beyond recreational games to include significant economic and strategic situations. 
- Uses of game theory in AI:
  1. Agent design: analyzing an agent's decisions and computing the expected utility of each, assuming the other agents behave rationally.
  2. Mechanism design: defining the rules of the environment so that each agent maximizing its own utility leads to outcomes that are good for all.
- Game theory distinguishes between cooperative games (with binding agreements) and non-cooperative games (without binding agreements), each requiring different analytical approaches.
- Multiagent systems often involve a mix of strategies, including centralized planning and autonomous adjustments to dynamic conditions, underpinned by incentive structures to align individual goals with the collective objectives.

### 18.1.3 Multiagent Planning

- Multiagent planning involves defining transition models, correct plans, and efficient planning algorithms for environments with multiple actors, including effectors, bodies, and agents.
- A correct plan achieves the goal when executed by all actors, though in true multiagent settings, agreement on plan execution might not be reached.
- Key challenges in multiagent planning include modeling concurrency (simultaneous execution of plans by different actors) and considering interactions between actions of different agents. 
- Three approaches to model concurrency: 
  - **Interleaved Execution** : Assumes actions from different plans are executed in turn while preserving order within each plan, but struggles with scalability and does not represent simultaneous actions well. 
  - **True Concurrency** : Maintains a partial order of actions, acknowledging that some actions can happen concurrently, offering a more theoretical than practical solution. 
  - **Perfect Synchronization** : Assumes a global clock for simultaneous action execution, simplifying concurrency modeling but not fully capturing real-world complexities.
- Transition models in multiagent settings consider joint actions by all actors, significantly increasing the complexity and branching factor of planning.
- Research focuses on decoupling actors to reduce complexity, with strategies for loosely coupled systems inspired by successful methods in constraint satisfaction problems (CSPs) and planning heuristics.
- A standard approach for loosely coupled problems involves planning as if actors were independent, later adjusting for interactions.
- Action schemas may include concurrent action constraints to specify allowable simultaneous actions, helping to generate multiactor plans that consider the interactions and constraints among actors' actions.
- Efficient multiagent planning can be achieved by adapting single-agent planning algorithms to account for concurrency constraints, with the potential for high efficiency in loosely coupled scenarios.
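The growth in branching factor from joint actions, and the pruning effect of a concurrent-action constraint, can be seen in a short Python sketch. The action names and the "at most one actor may hit the ball" constraint are hypothetical, chosen only for illustration; they are not a prescribed action-schema format.

```python
from itertools import product

# Hypothetical per-actor action set (every actor has the same choices here).
ACTIONS = ["Go(Left)", "Go(Right)", "Hit(Ball)", "NoOp"]

def joint_actions(actors, actions, allowed=lambda joint: True):
    """Enumerate joint actions (one action per actor) that satisfy a
    concurrent-action constraint; without constraints there are b**n of them."""
    return [joint for joint in product(actions, repeat=len(actors))
            if allowed(joint)]

def at_most_one_hitter(joint):
    # Hypothetical concurrency constraint: two actors may not hit the ball at once.
    return sum(action == "Hit(Ball)" for action in joint) <= 1

actors = ["A", "B"]
unconstrained = joint_actions(actors, ACTIONS)
constrained = joint_actions(actors, ACTIONS, at_most_one_hitter)
print(len(unconstrained), len(constrained))        # 16 15
print(len(joint_actions(["A", "B", "C"], ACTIONS)))  # 64
```

Each extra actor multiplies the number of joint actions by the size of its action set, which is why decoupling loosely coupled actors pays off.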

### 18.1.4 Planning with Multiple Agents: Cooperation and Coordination
- In true multiagent settings, each agent independently creates its own plan, even when goals and knowledge are shared, which complicates achieving a unified outcome.
- Multiple valid plans can exist for achieving a goal, but coordination is necessary to ensure all agents execute a compatible portion of a jointly agreed plan to avoid failure. 
- Coordination strategies include: 
  - **Adopting conventions** : Predefined constraints on plan selection, such as "stick to your side of the court," to ensure agents inherently select compatible actions. 
  - **Social laws** : Widespread conventions that govern agent behavior in a community, akin to driving on a specific side of the road or speaking a common language to facilitate predictable and harmonious interactions. 
  - **Communication** : Direct or indirect exchange of information to establish common knowledge of a feasible joint plan, like verbal cues or observable actions indicating a preferred plan. 
  - **Plan recognition** : Inferring and aligning with a joint plan based on observable actions of other agents, allowing for dynamic and context-specific coordination without explicit communication.

These mechanisms facilitate cooperation among multiple decision-makers, ensuring that even in the absence of centralized control, agents can work together towards common goals.

## 18.2 Non-Cooperative Game Theory
- Introduces the foundational concepts and analytical methods of game theory, focusing on non-cooperative game theory as a critical framework for understanding decision-making processes in environments with multiple autonomous agents.
- Emphasizes the strategic considerations agents must make when their interests do not necessarily align, exploring how agents can make rational decisions in competitive settings.

### 18.2.1 Games with a Single Move: Normal Form Games
- **Definition** : Normal form games involve players making decisions simultaneously without knowledge of others' choices. These games are defined by players, actions, and a payoff function detailing utility for each action combination. 
- **Key Components** : 
  - **Players/Agents** : Usually two, but there can be more. 
  - **Actions** : The choices available to each player; these may differ between players. 
  - **Payoff Function** : Determines the utility for each player for all possible action combinations. In two-player games, this is represented by a matrix showing the payoffs for both players. 
- **Example** : The two-finger Morra game illustrates how payoffs are determined based on the actions chosen by players simultaneously. 
#### Strategies
- **Pure Strategy** : A deterministic action choice. 
- **Mixed Strategy** : A probabilistic approach, choosing actions based on a probability distribution. 
- **Strategy Profile** : Assigns a strategy to each player, determining the game's outcome. 
- **Solution Concepts** : 
  - **Dominant Strategy** : A strategy that yields a higher payoff regardless of the other players' actions. 
  - **Dominant Strategy Equilibrium** : The outcome when all players choose their dominant strategy, offering no incentive to deviate. 
  - **Nash Equilibrium** : A strategy profile where no player can obtain a higher payoff by unilaterally changing their strategy, assuming the others' strategies remain unchanged. Every finite game (finitely many players and actions) has at least one Nash equilibrium, possibly in mixed strategies. 
  - **Prisoner's Dilemma** : Highlights the concept of dominant strategies and equilibrium, where individually rational decision-making leads to a collectively suboptimal outcome. 
  - **Coordination and Focal Points** : Address coordination challenges in games with multiple Nash equilibria by identifying "obvious" outcomes for cooperation. 
  - **Matching Pennies Example** : Illustrates a game without a Nash equilibrium in pure strategies, highlighting the significance of mixed strategies for achieving equilibrium.
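To make the pure-strategy Nash equilibrium definition concrete, here is a minimal Python sketch that checks every pure strategy profile of a two-player normal-form game. The prisoner's dilemma payoffs (-5, 0, -10, -1) are the usual textbook values and the matching-pennies payoffs are ±1; both are assumptions for illustration.

```python
from itertools import product

# Two-player normal-form game: payoffs[(a1, a2)] = (u1, u2).
PRISONERS_DILEMMA = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}
MATCHING_PENNIES = {
    ("heads", "heads"): (1, -1),
    ("heads", "tails"): (-1, 1),
    ("tails", "heads"): (-1, 1),
    ("tails", "tails"): (1, -1),
}

def actions_of(payoffs, player):
    # Collect a player's actions from the profile keys, preserving order.
    return list(dict.fromkeys(profile[player] for profile in payoffs))

def pure_nash_equilibria(payoffs):
    """Return every pure profile in which neither player can gain by
    unilaterally switching to another action."""
    rows, cols = actions_of(payoffs, 0), actions_of(payoffs, 1)
    equilibria = []
    for a1, a2 in product(rows, cols):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(payoffs[(alt, a2)][0] <= u1 for alt in rows)
        best2 = all(payoffs[(a1, alt)][1] <= u2 for alt in cols)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

print(pure_nash_equilibria(PRISONERS_DILEMMA))  # [('testify', 'testify')]
print(pure_nash_equilibria(MATCHING_PENNIES))   # []
```

The empty result for matching pennies is exactly the case where an equilibrium exists only in mixed strategies.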

Normal form games encapsulate the strategic interdependence of players' decisions in a simplified model, underlining the importance of considering others' potential actions in determining optimal strategies.

### 18.2.2 Social Welfare

Social welfare in game theory takes a broader view than individual players' objectives, focusing on the best overall outcome for society or the collective of players involved. It introduces concepts for evaluating and choosing outcomes that optimize social benefits. 
- **Pareto Optimality** : An outcome is Pareto optimal if no other outcome can make any player better off without making at least one player worse off. Choosing a non-Pareto optimal outcome is considered wasteful since it misses opportunities to improve utility for some without harming others. 
- **Utilitarian Social Welfare** : This approach sums the utilities of all players to assess the aggregate welfare of an outcome. However, it faces criticism for ignoring utility distribution among players and assuming utilities can be compared on a common scale, which is contentious due to the subjective nature of utility. 
- **Egalitarian Social Welfare and Other Measures** : Egalitarian approaches, like the maximin principle, focus on improving the utility of the least well-off to address utility distribution concerns. Other measures, like the Gini coefficient, assess how evenly utility is distributed. These approaches can sometimes sacrifice total welfare for minor distributional improvements and still struggle with the issue of measuring utility on a common scale.

In the context of the prisoner’s dilemma, the dilemma stems from the conflict between pursuing individual rationality, leading to a non-Pareto optimal dominant strategy equilibrium (testify, testify), and achieving the socially optimal outcome (refuse, refuse) which maximizes both utilitarian and egalitarian welfare but is inaccessible through individual rational decision-making.
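The three welfare measures can be checked directly on the prisoner's dilemma outcomes. This Python sketch assumes the usual textbook payoffs (utilities for Ali and Bo); it lists the Pareto-optimal outcomes and picks the outcome favored by the utilitarian (sum) and egalitarian (minimum) criteria.

```python
# Prisoner's dilemma outcomes -> (Ali's utility, Bo's utility); values assumed.
OUTCOMES = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}

def pareto_dominates(u, v):
    """True if u makes nobody worse off than v and somebody strictly better off."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_optimal(outcomes):
    return [o for o, u in outcomes.items()
            if not any(pareto_dominates(v, u) for v in outcomes.values())]

def utilitarian(u):
    return sum(u)  # total utility across players

def egalitarian(u):
    return min(u)  # utility of the worst-off player

print(pareto_optimal(OUTCOMES))
# [('testify', 'refuse'), ('refuse', 'testify'), ('refuse', 'refuse')]
print(max(OUTCOMES, key=lambda o: utilitarian(OUTCOMES[o])))  # ('refuse', 'refuse')
print(max(OUTCOMES, key=lambda o: egalitarian(OUTCOMES[o])))  # ('refuse', 'refuse')
```

Only (testify, testify), the dominant strategy equilibrium, is missing from the Pareto-optimal list, which is the dilemma in one line of output.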

#### Computing Equilibria
- **Pure Strategies** : Finding Nash equilibria in pure strategies can involve exhaustive search or iterative methods like myopic best response, which adjusts strategies towards optimal responses until a Nash equilibrium is reached, though convergence is not guaranteed for all games. 
- **Mixed-Strategy Equilibria** : For two-player zero-sum games, the maximin method developed by von Neumann identifies optimal mixed strategies that ensure a player cannot do worse by revealing their strategy. This involves linear programming to handle the complexities of mixed strategies across multiple possible actions. 
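- **General Strategy for Non-Zero-Sum Games** : One approach enumerates the possible supports of the mixed strategies (the subsets of actions each player plays with positive probability) and, for each, checks whether an equilibrium with those supports exists. For two-player games the resulting conditions are linear and can be handled with linear programming; for games with three or more players they are nonlinear and harder to solve.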
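Two of the techniques above fit in a short Python sketch: myopic best response for pure strategies, and the maximin mixed strategy for a 2×2 zero-sum game. For the 2×2 case the sketch uses the standard closed form obtained by making the opponent indifferent between its two columns, which is valid only when there is no pure saddle point; larger zero-sum games need the linear program. The Morra payoffs assume the even player E wins the total number of fingers shown when it is even and pays it when it is odd.

```python
from fractions import Fraction

PRISONERS_DILEMMA = {
    ("testify", "testify"): (-5, -5), ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0), ("refuse", "refuse"): (-1, -1),
}
MATCHING_PENNIES = {
    ("heads", "heads"): (1, -1), ("heads", "tails"): (-1, 1),
    ("tails", "heads"): (-1, 1), ("tails", "tails"): (1, -1),
}

def utility(payoffs, profile, player, action):
    """Payoff to `player` if it switches to `action` while the other stays put."""
    trial = list(profile)
    trial[player] = action
    return payoffs[tuple(trial)][player]

def myopic_best_response(payoffs, start, max_rounds=100):
    """Let any player who is not best-responding switch to a best response.
    Returns the profile once nobody wants to switch (a pure Nash equilibrium),
    or None if that has not happened after max_rounds passes."""
    actions = [list(dict.fromkeys(p[i] for p in payoffs)) for i in (0, 1)]
    profile = list(start)
    for _ in range(max_rounds):
        changed = False
        for i in (0, 1):
            best = max(actions[i], key=lambda a: utility(payoffs, profile, i, a))
            if utility(payoffs, profile, i, best) > utility(payoffs, profile, i, profile[i]):
                profile[i] = best
                changed = True
        if not changed:
            return tuple(profile)
    return None

def solve_2x2_zero_sum(m):
    """Maximin mixed strategy for the row player of the zero-sum game
    m = [[a, b], [c, d]] (row player's payoffs). Returns (p, v): play row 0
    with probability p to guarantee an expected payoff of v."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # best guaranteed pure-row payoff
    upper = min(max(a, c), max(b, d))   # column player's best pure cap
    if lower == upper:
        raise ValueError("pure saddle point: no mixing needed")
    denom = a - b - c + d
    # p makes the column player indifferent between its two columns.
    return Fraction(d - c, denom), Fraction(a * d - b * c, denom)

# Two-finger Morra, payoffs to E: rows = E shows one/two, columns = O shows one/two.
MORRA_E = [[2, -3], [-3, 4]]

print(myopic_best_response(PRISONERS_DILEMMA, ("refuse", "refuse")))  # ('testify', 'testify')
print(myopic_best_response(MATCHING_PENNIES, ("heads", "heads")))     # None
print(solve_2x2_zero_sum(MORRA_E))  # (Fraction(7, 12), Fraction(-1, 12))
```

Matching pennies cycles forever under myopic best response, illustrating why convergence is not guaranteed; in Morra, E should show one finger with probability 7/12 and then expects to lose 1/12 per game.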

The concept of social welfare and the computational approaches to finding game equilibria highlight the complexity of optimizing outcomes not only from the perspective of individual players but also from the broader social or collective standpoint.

### 18.2.3 Repeated Games
- **Introduction** : Repeated games, or iterated games, involve players playing multiple rounds of a single-move game (the stage game), with strategies that can depend on the entire history of play. 
- **Finite Repeated Games** :
  - In games repeated a known finite number of times, backward induction predicts that rational players will play the Nash equilibrium of the stage game in every round (when that equilibrium is unique, as in the prisoner's dilemma). The last round is effectively a single-shot game, so its outcome is fixed; that makes the second-to-last round effectively single-shot too, and the argument unwinds back to the first round.
  - Example: In a 100-round prisoner's dilemma, rational players will testify against each other in every round, leading to a suboptimal outcome for both. 
- **Infinite Repeated Games** :
  - Indefinitely or infinitely repeated games do not allow for backward induction, as there is no final round from which to reason backward.
  - Strategies are often represented as finite state machines (FSMs), because storing and conditioning on unbounded histories is impractical. 
  - **FSM Strategies** : Strategies like Tit-for-Tat, GRIM, HAWK, and DOVE illustrate different approaches to iterated play: HAWK always testifies, DOVE always refuses, Tit-for-Tat copies the opponent's last move, and GRIM refuses until the opponent testifies once, then testifies forever. 
  - **Utility in Infinite Games** : The limit of means approach defines the utility of an infinite sequence of payoffs as the long-run average payoff per round. 
- **Nash Equilibria and Folk Theorems** :
  - In infinitely repeated games, outcomes that are not equilibria of the single-shot game can be sustained in equilibrium.
  - The Nash folk theorems state that any outcome giving each player at least their security level can be sustained as a Nash equilibrium, with players using strategies like GRIM to enforce cooperation.
  - Mutual punishment mechanisms thus allow cooperative outcomes that would not be possible in single-shot or finitely repeated versions of the game. 
- **Implications** : Infinitely repeated games show how changing the game's temporal dynamics can significantly alter strategic outcomes, allowing for cooperation in scenarios where single-shot games predict defection. This highlights the importance of expectations about future interactions in determining current behavior.
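A small Python simulation makes the FSM strategies and the averaging idea concrete. For brevity each strategy is written as a function of the opponent's history rather than as an explicit state machine; each is equivalent to a tiny FSM (GRIM's single bit of state is "has the opponent ever testified"). The prisoner's dilemma payoffs are the usual textbook values, assumed here, and the finite-horizon average is only a stand-in for the limit of means.

```python
from fractions import Fraction

# (my action, opponent's action) -> my utility; textbook values assumed.
PAYOFF = {
    ("testify", "testify"): -5, ("testify", "refuse"): 0,
    ("refuse", "testify"): -10, ("refuse", "refuse"): -1,
}

# Each strategy maps the opponent's past moves to this round's move.
def hawk(opp_history):
    return "testify"

def dove(opp_history):
    return "refuse"

def tit_for_tat(opp_history):
    # Refuse in round one, then copy the opponent's previous move.
    return opp_history[-1] if opp_history else "refuse"

def grim(opp_history):
    # Refuse until the opponent testifies once, then testify forever.
    return "testify" if "testify" in opp_history else "refuse"

def average_payoffs(s1, s2, rounds=100):
    """Average per-round utility of each player over a finite horizon."""
    h1, h2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        a1, a2 = s1(h2), s2(h1)
        total1 += PAYOFF[(a1, a2)]
        total2 += PAYOFF[(a2, a1)]
        h1.append(a1)
        h2.append(a2)
    return Fraction(total1, rounds), Fraction(total2, rounds)

print(average_payoffs(grim, grim))         # (Fraction(-1, 1), Fraction(-1, 1))
print(average_payoffs(tit_for_tat, hawk))  # (Fraction(-101, 20), Fraction(-99, 20))
```

As the horizon grows, Tit-for-Tat's one-round loss against HAWK vanishes from the average, and two GRIM players sustain the cooperative payoff of -1 per round.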

### 18.2.4 Sequential Games: The Extensive Form
- **Definition and Structure** : Sequential games, represented by game trees (extensive form), include multiple turns with potentially different actions. These trees capture essential information: initial state, who's playing, possible actions, state transitions, and payoffs. Stochastic elements are introduced through a special player, Chance, with predetermined probabilities for its actions. 
- **Perfect Information Assumption** : Initially, it's assumed players have perfect information, meaning they know their exact position in the game tree without any uncertainty about past game events. This applies to games like chess but not to games like poker. 
- **Strategy in Extensive-Form Games** : A strategy dictates a player's action at every decision point. A complete strategy profile leads to a path in the game tree from start to finish, determining the game's outcome. 
- **Analysis Tools** : Nash equilibria concepts apply to extensive-form games. Backward induction, akin to dynamic programming, is used for computing Nash equilibrium strategies efficiently, assuming perfect information. 
- **Subgame Perfect Nash Equilibrium** : A refinement to handle credibility of threats, requiring strategies to form Nash equilibria within all subgames of the main game, ensuring rationality at every decision point. 
- **Handling Stochastic and Simultaneous Moves** : Games with random elements involve Chance as a player. Simultaneous moves can be represented by ordering players arbitrarily but hiding earlier players’ actions from later ones. 
- **Imperfect Information** : Extensive form can also model games where players don’t have complete knowledge about the game state, using information sets to group states indistinguishable to a player. However, solving games with imperfect information is more complex than those with perfect information. 
- **Computational Considerations** : While backward induction and Nash equilibrium computation are polynomial in the size of the game tree for perfect information games, real-world applications often face challenges due to exponentially large trees. Sequential games with imperfect information introduce additional complexities that make direct computation impractical for large game trees. 
- **Abstraction and Solution Methods** : For complex games, abstraction (simplifying the game to a manageable size) and the sequence form representation (linear in the tree size) are techniques used to make solution computation feasible. Advanced poker programs, for instance, combine these methods with others, such as Monte Carlo tree search, to compete at high levels. 
- **Limitations of Extensive Form** : While versatile, the extensive form struggles with continuous states/actions and assumes the game structure is known. Discovering actions, strategies, and adapting to opponents’ rationality or irrationality pose additional challenges not fully addressed by traditional game theory.
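Backward induction on a perfect-information game tree takes only a few lines. In this Python sketch the tree, its node names, and its payoffs are hypothetical; ties are broken in favor of the first action listed. Recording the chosen action at every decision node, including nodes off the equilibrium path, yields a subgame perfect equilibrium.

```python
# Terminal nodes are payoff tuples (one entry per player); decision nodes are
# dicts naming the player to move and the child reached by each action.
TREE = {
    "name": "root", "player": 0, "children": {
        "L": {"name": "after_L", "player": 1,
              "children": {"l": (3, 1), "r": (0, 0)}},
        "R": {"name": "after_R", "player": 1,
              "children": {"l": (1, 2), "r": (2, 1)}},
    },
}

def backward_induction(node, strategy=None):
    """Return the payoff vector reached from `node` when every player picks a
    best action in every subgame, plus the action chosen at each decision node."""
    if strategy is None:
        strategy = {}
    if isinstance(node, tuple):  # terminal node: payoffs are known
        return node, strategy
    mover = node["player"]
    best_action, best_value = None, None
    for action, child in node["children"].items():
        value, _ = backward_induction(child, strategy)
        if best_value is None or value[mover] > best_value[mover]:
            best_action, best_value = action, value
    strategy[node["name"]] = best_action
    return best_value, strategy

value, strategy = backward_induction(TREE)
print(value)     # (3, 1)
print(strategy)  # {'after_L': 'l', 'after_R': 'l', 'root': 'L'}
```

The work is proportional to the number of nodes, which is why the method is polynomial in the size of the tree yet impractical when the tree itself is exponentially large.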

### 18.2.5 Uncertain Payoffs and Assistance Games
- **Uncertainty and AI** : This section explores the design of AI systems that can operate under uncertainty about the true human objective. It builds on concepts introduced earlier, such as handling uncertain preferences through latent variables and sensor models, illustrated with the example of durian-flavored ice cream. 
- **The Off-Switch Problem** : It revisits the scenario where a robot, uncertain about human preferences, would allow itself to be switched off, showing deference to human decision-making. 
- **Assistance Games** : The concept is expanded into assistance games, a two-person game model involving a human (Harriet) and a robot (Robbie). Harriet knows her preferences (θ) and acts accordingly, while Robbie has a prior probability over these preferences. The goal for both is to maximize Harriet's payoff, embodying the idea of provably beneficial AI. 
- **Equilibrium Strategies in Assistance Games** : These games can lead to behaviors such as teaching, rewarding, and explaining by Harriet, and asking permission or learning from demonstrations by Robbie, emerging naturally as equilibrium strategies without needing to be explicitly scripted. 
- **Paperclip Game Example** : This simple game illustrates how Harriet signals her preferences to Robbie, who then interprets these signals to understand her preferences better. The game setup involves choices between making paperclips and staples, with the outcome depending on Harriet's signaling and Robbie's interpretation, leading to a Nash equilibrium strategy. 
- **Teaching and Learning Preferences** : Through the equilibrium strategy, Harriet effectively teaches Robbie about her preferences, enabling Robbie to act optimally on her behalf, even without knowing her preferences exactly. This outcome demonstrates the robot's provable benefit to Harriet. 
- **Myopic Best Response and POMDPs** : While the myopic best response strategy works in simple cases like the paperclip game, solving more complex assistance games is reducible to solving a Partially Observable Markov Decision Process (POMDP), which can be challenging but is facilitated by the additional structure of assistance games. 
- **Generalizing Assistance Games** : The section concludes with the potential to extend assistance games to include multiple humans, robots, and various complexities, such as imperfectly rational humans or those unaware of their own preferences. The overarching idea is that the more intelligent the robot, the better the outcome for the human.
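The paperclip game's equilibrium can be checked numerically. This Python sketch assumes one concrete setup: $\theta$ is Harriet's value for a paperclip and a staple is worth $1-\theta$; Harriet makes 2 paperclips, 1 of each, or 2 staples; Robbie then makes 90 paperclips, 50 of each, or 90 staples; Robbie's prior on $\theta$ is uniform on $[0, 1]$. Under those assumptions Harriet is indifferent at $92(1-\theta) = 51$ and $92\theta = 51$, and the code confirms that "Robbie mirrors Harriet's signal" and Harriet's threshold policy are mutual best responses.

```python
from fractions import Fraction

# Assumed quantities: (paperclips, staples) each side can produce.
HARRIET_ACTIONS = [(2, 0), (1, 1), (0, 2)]
ROBBIE_ACTIONS = [(90, 0), (50, 50), (0, 90)]

def value(theta, bundle):
    clips, staples = bundle
    return theta * clips + (1 - theta) * staples

def robbie_best_response(lo, hi):
    """Robbie's best action if his posterior on theta is uniform on [lo, hi];
    utility is linear in theta, so only the posterior mean matters."""
    mean = (lo + hi) / 2
    return max(ROBBIE_ACTIONS, key=lambda b: value(mean, b))

def harriet_best_response(theta, robbie_policy):
    """Harriet's signal maximizing her own output plus Robbie's reply."""
    return max(HARRIET_ACTIONS,
               key=lambda h: value(theta, h) + value(theta, robbie_policy[h]))

# Harriet's indifference points under the mirroring policy.
low, high = Fraction(41, 92), Fraction(51, 92)
robbie_policy = {
    (0, 2): robbie_best_response(Fraction(0), low),
    (1, 1): robbie_best_response(low, high),
    (2, 0): robbie_best_response(high, Fraction(1)),
}
print(robbie_policy)  # {(0, 2): (0, 90), (1, 1): (50, 50), (2, 0): (90, 0)}
print(harriet_best_response(Fraction(1, 2), robbie_policy))  # (1, 1)
```

Harriet's choice of what to make herself is driven by what it teaches Robbie, not only by its direct value: near $\theta = 1/2$ she makes one of each to signal indifference.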

## 18.3 Cooperative Game Theory

Cooperative game theory addresses scenarios where agents can form binding agreements to cooperate, aiming to achieve greater value together than they could individually. 
- **Cooperative Games with Transferable Utility** : These games are defined in the characteristic function form, focusing on the collective utility a group of agents can generate when they choose to cooperate. The key components are: 
  - A set of players $N = \{1, \ldots, n\}$. 
  - A characteristic function $\nu$, which assigns a value $\nu(C)$ to every subset of players $C \subseteq N$, representing the utility that the group can obtain by cooperating. 
- **Assumptions** : 
  - The value of the empty set of players is zero ($\nu(\{\}) = 0$), indicating no utility is generated without participation. 
  - The characteristic function is non-negative ($\nu(C) \geq 0$ for all $C$), so no coalition has negative value. 
  - In certain games, it is assumed that individual players cannot achieve any utility on their own ($\nu(\{i\}) = 0$ for all $i \in N$), emphasizing the importance of cooperation.

The model abstracts away from specific actions that agents might take and does not dictate how the generated value should be distributed among them. The focus is on the potential utility from cooperation, leaving the division of this utility as a matter to be resolved within the cooperative framework.

### 18.3.1 Coalition Structures and Outcomes
- **Coalitions** : Any subset of players is termed a coalition. The entire set of players, $N$, is called the grand coalition. Each player joins exactly one coalition, which might consist solely of itself. 
- **Coalition Structure** : A partition of the player set $N$ into coalitions $C_1, \ldots, C_k$ where: 
  - Each coalition $C_i$ is non-empty. 
  - Each coalition $C_i$ is a subset of $N$. 
  - Any two coalitions are disjoint: $C_i \cap C_j = \{\}$ for all $i \neq j$. 
  - The union of all coalitions equals $N$. 
- **Possible Coalition Structures** : With the player set $N = \{1, 2, 3\}$, there are seven possible (non-empty) coalitions and five possible coalition structures, ranging from each player acting individually to all players forming the grand coalition. 
- **Outcome of a Game** : Determined by the coalitions players form and the division of the value $\nu(C)$ that each coalition receives. An outcome consists of a coalition structure $CS$ and a payoff vector $x = (x_1, \ldots, x_n)$, where $x_i$ is the payoff to player $i$, subject to the constraint that each coalition's value is distributed among its members: $\sum_{i \in C} x_i = \nu(C)$ for every $C \in CS$. 
- **Example** : In a game with $N = \{1, 2, 3\}$, $\nu(\{1\}) = 4$, and $\nu(\{2, 3\}) = 10$, a possible outcome is one where player 1 operates alone and receives 4, while players 2 and 3 form a coalition and split its value of 10 evenly, giving the payoff vector $x = (4, 5, 5)$. 
- **Superadditivity** : A game is superadditive if merging any two disjoint coalitions $C$ and $D$ yields at least the sum of their separate values: $\nu(C \cup D) \geq \nu(C) + \nu(D)$. In superadditive games, the grand coalition achieves the maximum total value. However, the grand coalition is not guaranteed to form, because of strategic considerations, much as self-interested players fail to reach a Pareto-optimal outcome in the prisoner's dilemma.
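The count of five coalition structures for three players, and the feasibility condition on an outcome, can be checked with a minimal Python sketch. The generator below enumerates set partitions recursively; the `nu` dictionary covers only the two coalitions used in the example.

```python
def coalition_structures(players):
    """Yield every partition of `players` into non-empty, disjoint coalitions."""
    players = list(players)
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for structure in coalition_structures(rest):
        # Either `first` forms a coalition on its own ...
        yield [frozenset({first})] + structure
        # ... or it joins one of the existing coalitions.
        for k, coalition in enumerate(structure):
            yield structure[:k] + [coalition | {first}] + structure[k + 1:]

def is_valid_outcome(nu, structure, payoffs):
    """Feasible if the members of each coalition C share exactly nu(C)."""
    return all(sum(payoffs[i] for i in c) == nu[c] for c in structure)

structures = list(coalition_structures([1, 2, 3]))
print(len(structures))  # 5

nu = {frozenset({1}): 4, frozenset({2, 3}): 10}
outcome = [frozenset({1}), frozenset({2, 3})]
print(is_valid_outcome(nu, outcome, {1: 4, 2: 5, 3: 5}))  # True
```

The number of structures grows quickly (15 for four players, 52 for five), a first hint of the computational issues discussed later in this section.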


### 18.3.2 Strategy in Cooperative Games

Cooperative game theory focuses on how players strategically choose with whom to form coalitions, aiming to align with those who maximize the overall value of the coalition. Key concepts and definitions relevant to understanding strategy in cooperative games include: 
- **Imputation** : A payoff vector distributing the total value of the grand coalition while ensuring individual rationality, meaning each player receives at least as much as they would on their own. 
- **Core** : The set of imputations where no subset of players (coalition) could gain more by breaking away from the grand coalition. The core represents stable payoff distributions where no group has an incentive to deviate. 
- **Computational Aspects of the Core** : Determining the core involves solving a system of linear inequalities. Although this can theoretically be done using linear programming, the exponential number of potential coalitions makes the task computationally challenging, with co-NP-complete complexity for many game classes. 
- **Superadditivity and the Core** : In superadditive games, where the value of a merged coalition is at least the sum of its parts, the grand coalition maximizes total value, yet it may still fail to form: the core can be empty, in which case every division of $\nu(N)$ leaves some coalition with an incentive to break away. 
- **Shapley Value** : A proposed fair distribution scheme for the value generated by the grand coalition, based on players' marginal contributions across all possible coalition formations. The Shapley value is unique in satisfying a set of fairness axioms (Efficiency, Dummy Player, Symmetry, Additivity), making it a standard for equitable payoff distribution in cooperative games.

The Shapley value stands out for its focus on fairness, rewarding players based on their contributions to the collective success, rather than arbitrary characteristics. This method accounts for all permutations of player orderings, ensuring each player's payoff reflects their average marginal contribution to the coalition's value, thus embodying a principle of contribution-based fairness.
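The permutation definition of the Shapley value and the core condition translate directly into Python. This sketch is exponential in the number of players and is meant only for small examples. The three-player majority game (any two players together earn 1) is a hypothetical example: it is superadditive, yet its core is empty, so even the symmetric Shapley split leaves a pair of players better off breaking away.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def shapley_values(players, nu):
    """Average each player's marginal contribution nu(S | {i}) - nu(S) over
    all orderings, where S is the set of players ahead of i."""
    totals = {i: Fraction(0) for i in players}
    for order in permutations(players):
        ahead = frozenset()
        for i in order:
            totals[i] += nu(ahead | {i}) - nu(ahead)
            ahead = ahead | {i}
    n_orders = factorial(len(players))
    return {i: totals[i] / n_orders for i in players}

def in_core(players, nu, x):
    """x is in the core if it splits nu(N) exactly and every coalition C
    receives in total at least what it could earn on its own."""
    if sum(x.values()) != nu(frozenset(players)):
        return False
    return all(sum(x[i] for i in c) >= nu(frozenset(c))
               for size in range(1, len(players) + 1)
               for c in combinations(players, size))

# Hypothetical three-player majority game: any two players can earn 1 unit.
def majority(coalition):
    return 1 if len(coalition) >= 2 else 0

players = [1, 2, 3]
phi = shapley_values(players, majority)
print(phi)                              # each player gets Fraction(1, 3)
print(in_core(players, majority, phi))  # False: {1, 2} gets 2/3 < 1
```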


### 18.3.3 Computation in Cooperative Games

The theoretical framework for cooperative games is well-established, focusing on equitable distributions of collective gains (e.g., through the core and the Shapley value). However, practical computation and representation of these games pose significant challenges: 
- **Characteristic Function Representation** : Ideally, the value $\nu(C)$ of every possible coalition $C$ would be listed in a table. However, this approach becomes impractical for a large number of players $n$, because there are $2^n$ possible coalitions. 
- **Compact Representation** : To manage computational complexity, researchers have developed methods for compactly representing cooperative games. These methods vary in their completeness and compactness: 
  - **Complete Representation Schemes** : Capable of representing any cooperative game. The downside is that not all games can be represented compactly within these schemes. 
  - **Guaranteed Compact Representation** : These schemes ensure a compact representation but are not complete, meaning they cannot represent every possible cooperative game.

The choice between complete and compact representation schemes involves a trade-off between the universality of the representation and the practicality of managing computational resources.

#### Marginal Contribution Nets (MC-nets)

Marginal Contribution Nets (MC-nets) provide a way to represent the characteristic function of cooperative games in a structured format. Here's a summary of the key points: 
- **Basic Concept** : MC-nets represent a game's characteristic function through a set of rules $R$. Each rule is a pair $(C_i, x_i)$ where $C_i \subseteq N$ is a coalition of players and $x_i$ is a value associated with that coalition. The value of any coalition $C$ is the sum of the values $x_i$ over all rules whose coalition $C_i$ is a subset of $C$: $\nu(C) = \sum_{(C_i, x_i) \in R,\ C_i \subseteq C} x_i$. 
- **Example of MC-nets** : Given the rule set $R = \{(\{1, 2\}, 5), (\{2\}, 2), (\{3\}, 4)\}$, the values of different coalitions are: 
  - $\nu(\{1\}) = 0$ (no rules apply), 
  - $\nu(\{3\}) = 4$ (third rule applies), 
  - $\nu(\{1, 3\}) = 4$ (third rule applies), 
  - $\nu(\{2, 3\}) = 6$ (second and third rules apply), 
  - $\nu(\{1, 2, 3\}) = 11$ (all rules apply). 
- **Computation of the Shapley Value** : The Shapley value for a player in a game represented by MC-nets can be computed efficiently by treating each rule as a symmetric game among the players it involves. The Shapley value of player $i$ under rule set $R$ is the sum of the player's shares across the rules: $\phi_i(R) = \sum_{(C, x) \in R,\ i \in C} x / |C|$, i.e. each rule contributes $x / |C|$ to every member of $C$ and $0$ to everyone else. 
- **Completeness and Limitations** : The simplified version of MC-nets presented here is not complete; it cannot represent the characteristic function of every possible game. However, a more sophisticated version that uses propositional logic formulas to define conditions for coalition formation is complete. In this advanced scheme, a rule's condition is satisfied if the coalition matches a satisfying assignment for the propositional logic formula, allowing for a complete representation of any game. This version also supports polynomial-time computation of the Shapley value, though it involves more complex mechanisms than the simplified version.
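The simple MC-net semantics and the per-rule Shapley computation can be sketched in Python using the rule set from the example above. Because Shapley values are additive across rules and each rule is a symmetric game among its members, the computation is a single pass over the rules.

```python
from fractions import Fraction

# The example rule set: each rule is (coalition, value).
RULES = [(frozenset({1, 2}), 5), (frozenset({2}), 2), (frozenset({3}), 4)]

def mc_net_value(rules, coalition):
    """nu(C) is the sum of the values of all rules whose coalition lies inside C."""
    return sum(x for c, x in rules if c <= coalition)

def mc_net_shapley(rules, players):
    """Each rule splits its value equally among its members (x / |C_i| each)."""
    phi = {i: Fraction(0) for i in players}
    for c, x in rules:
        for i in c:
            phi[i] += Fraction(x, len(c))
    return phi

for c in [{1}, {3}, {1, 3}, {2, 3}, {1, 2, 3}]:
    print(sorted(c), mc_net_value(RULES, frozenset(c)))
# prints [1] 0, [3] 4, [1, 3] 4, [2, 3] 6, [1, 2, 3] 11 on separate lines
print(mc_net_shapley(RULES, [1, 2, 3]))
# {1: Fraction(5, 2), 2: Fraction(9, 2), 3: Fraction(4, 1)}
```

The three shares sum to 11, the value of the grand coalition, as the efficiency axiom requires; the cost is linear in the total size of the rules rather than exponential in the number of players.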

In summary, MC-nets offer a structured and efficient way to represent and analyze cooperative games, especially for calculating the Shapley value, though the most straightforward version has limitations in terms of completeness.

#### Coalition Structures for maximum social welfare

When examining cooperative games from the perspective of maximizing social welfare, the focus shifts from individual strategic considerations to optimizing the overall productivity or value generated by all players working together in teams or coalitions. This approach, aiming to maximize social welfare, seeks to find the coalition structure that yields the highest total value across all possible groupings of players. The social welfare of a coalition structure is the sum of the values of all individual coalitions within it.

However, identifying the socially optimal coalition structure $CS^*$, the one that maximizes social welfare, is computationally challenging. The problem is closely related to the set partitioning problem and is NP-hard; moreover, the number of possible coalition structures grows faster than exponentially with the number of players, so exhaustive search for the optimal structure is impractical beyond small games.

A notable strategy to tackle this problem involves searching within a subset of the total coalition structure space, illustrated through the concept of a coalition structure graph. This graph organizes all potential coalition structures based on the number of coalitions they contain, with structures grouped by levels according to their coalition counts. The search for the optimal structure is then conducted within this graph, focusing on a subspace that promises a practical compromise between exhaustive search and optimal outcome identification.

In practice, one can limit the search to the bottom two levels of the coalition structure graph: the grand coalition and all structures $\{C, N \setminus C\}$ with exactly two coalitions. Every possible coalition appears in at least one of these structures, so (since $\nu \geq 0$) the best structure found there, $CS'$, has social welfare at least as high as the value of the best single coalition. Because $CS^*$ contains at most $n$ coalitions, each worth no more than that best coalition, the value of $CS'$ is guaranteed to be at least $1/n$ of the value of $CS^*$, where $n$ is the number of agents. This search does not always find the optimal structure, but it provides a worst-case guarantee while examining only a small part of the space, and in practice the structure found often does considerably better than the $1/n$ lower bound.
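The $1/n$ guarantee can be illustrated by comparing exhaustive search with the bottom-two-levels search in Python. The "loners only" characteristic function (a coalition is worth 1 if it has a single member and 0 otherwise) is a hypothetical game chosen because it makes the bound tight: the optimum puts every player alone, which the restricted search cannot reach.

```python
def coalition_structures(players):
    """Yield every partition of `players` into non-empty, disjoint coalitions."""
    players = list(players)
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for structure in coalition_structures(rest):
        yield [frozenset({first})] + structure
        for k, coalition in enumerate(structure):
            yield structure[:k] + [coalition | {first}] + structure[k + 1:]

def welfare(nu, structure):
    # Social welfare of a structure: the sum of its coalitions' values.
    return sum(nu(c) for c in structure)

def optimal_structure(players, nu):
    # Exhaustive search over all structures; feasible only for a few players.
    return max(coalition_structures(players), key=lambda cs: welfare(nu, cs))

def bottom_two_levels(players, nu):
    # Level 1 is the grand coalition; level 2 holds every split {C, N \ C}.
    candidates = (cs for cs in coalition_structures(players) if len(cs) <= 2)
    return max(candidates, key=lambda cs: welfare(nu, cs))

# Hypothetical game in which only players working alone produce value.
def loners_only(coalition):
    return 1 if len(coalition) == 1 else 0

players = [1, 2, 3, 4]
best = welfare(loners_only, optimal_structure(players, loners_only))
found = welfare(loners_only, bottom_two_levels(players, loners_only))
print(best, found)  # 4 1
```

Here the restricted search achieves exactly $1/n$ of the optimum, so the bound cannot be improved in general.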

## 18.4 Making Collective Decisions

Making collective decisions shifts the focus from agent design to mechanism design: choosing the rules of the game that a group of agents will play. The aim is that when each agent pursues its own interests within those rules, the resulting outcome is the one the designer wants. The key components of a mechanism are:
1. **Language for Strategies:**  A defined language that specifies all permissible strategies agents can adopt. This aspect ensures that the actions agents can take are clearly outlined and understood. 
2. **The Center:**  A central authority or distinguished agent, known as the center, is responsible for collecting the strategy choices from all participating agents. This role is pivotal for orchestrating the game and can be exemplified by the auctioneer in an auction context, who collects bids from participants. 
3. **Outcome Rule:**  A predefined rule, transparent to all agents, that the center employs to determine the payoffs for each agent based on the strategies they have chosen. This rule is critical for ensuring that the game operates fairly and predictably, guiding agents on how their actions translate into outcomes.

Mechanism design is pivotal in various domains, such as auctions, voting systems, and market design, where the goal is to align individual agent behaviors with overall system objectives, often ensuring efficiency, fairness, or some other desired property. By carefully designing the game's rules and structure, mechanism designers aim to induce outcomes that are optimal or satisfactory from a collective standpoint, even as each agent acts out of self-interest.

### 18.4.1 Allocating Tasks with the Contract Net

The Contract Net Protocol is a foundational method in multiagent systems, facilitating task sharing among agents through a structured negotiation process inspired by how companies use contracts. This protocol operates in four main phases: 
1. **Task Announcement** : An agent, recognizing the need for cooperation to accomplish a task beyond its individual capacity or to achieve a more desirable outcome (e.g., efficiency, speed, accuracy), advertises this task to others in the network. The announcement includes details necessary for potential bidders to evaluate their ability and willingness to take on the task, such as task requirements, deadlines, and quality standards. 
2. **Bid Submission** : Upon receiving a task announcement, agents assess the task against their capabilities and preferences. If an agent deems itself capable and interested, it submits a bid to the task's manager, outlining its qualifications and the conditions under which it can perform the task. 
3. **Bid Evaluation and Task Awarding** : The manager, who may receive several bids, evaluates these submissions to select the most suitable agent(s) for the task. The chosen agent(s) are then notified of their selection through an award message, making them contractors responsible for completing the task. 
4. **Task Execution** : The awarded contractor(s) proceed to fulfill the task, which might involve generating and advertising new subtasks to other agents, continuing the cycle of cooperation.

Key computational aspects of implementing the Contract Net Protocol include processing task announcements to decide on bidding, evaluating bids to select contractors, and managing the awarded tasks, potentially leading to further task decomposition and delegation. This protocol, notable for its simplicity and versatility, has been widely implemented across various applications, illustrating its effectiveness in facilitating cooperative problem solving among agents.
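A minimal Python sketch of one announce, bid, award cycle is shown below. The announcement fields, the `Agent` class, and the award rule (earliest promised completion time) are hypothetical choices for illustration; the protocol itself leaves the bidding and evaluation criteria to the application.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Announcement:
    task: str
    skill: str      # capability the task requires
    deadline: int   # latest acceptable completion time (hypothetical unit: hours)

@dataclass
class Agent:
    name: str
    hours: dict                  # skill -> hours this agent needs for such a task
    busy_until: int = 0          # time at which already-accepted work is finished
    contracts: list = field(default_factory=list)

    def bid(self, ann: Announcement) -> Optional[int]:
        """Bid submission: promise a completion time, or decline with None."""
        if ann.skill not in self.hours:
            return None  # not capable
        finish = self.busy_until + self.hours[ann.skill]
        return finish if finish <= ann.deadline else None  # decline if too late

    def accept_award(self, ann: Announcement) -> None:
        """Task execution: commit the time (a real contractor might subcontract here)."""
        self.busy_until += self.hours[ann.skill]
        self.contracts.append(ann.task)

def contract_net(ann: Announcement, agents: list) -> Optional[str]:
    # Task announcement: broadcast to every agent and collect the bids.
    bids = {}
    for agent in agents:
        b = agent.bid(ann)
        if b is not None:
            bids[agent.name] = b
    if not bids:
        return None  # nobody bid; the manager must do the task or re-announce it
    # Bid evaluation and awarding: the earliest promised completion wins.
    winner = min(bids, key=bids.get)
    next(a for a in agents if a.name == winner).accept_award(ann)
    return winner

robots = [Agent("r1", {"weld": 3, "paint": 2}), Agent("r2", {"paint": 1}), Agent("r3", {"weld": 2})]
jobs = [Announcement("j1", "weld", 5), Announcement("j2", "paint", 4),
        Announcement("j3", "weld", 3), Announcement("j4", "paint", 1)]
print([contract_net(job, robots) for job in jobs])  # ['r3', 'r2', 'r1', None]
```

Because accepted work is tracked in `busy_until`, later bids reflect earlier awards: r3 cannot meet the deadline for j3 after taking j1, and nobody can meet j4's deadline.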

### 18.4.2 Allocating Scarce Resources with Auctions

Auctions are a central mechanism in multiagent systems for allocating scarce resources, determining how a limited supply is distributed among interested parties. In an auction, each bidder $i$ has its own valuation $v_i$ for the item on offer.

Valuations can be private, where each bidder's value is personal and may differ greatly from the others', or common, where the item is worth the same to everyone but bidders are uncertain about that value because they hold different information. Bidders place bids $b_i$, and the item goes to the highest bidder, but the price the winner pays need not equal the highest bid; the payment rule is part of the auction's design.

The ascending-bid (or English) auction is a popular format, starting at a minimum bid and increasing incrementally until no further bids are made, with the item going to the last bidder at their bid price. Auctions aim for efficiency (allocating goods to the highest valuer) and maximum revenue for the seller, though these goals can be compromised by factors like overly high or low reserve prices, or collusion among bidders to manipulate prices.

In 1999, a German auction for cellphone spectrum highlighted the issue of collusion, where two bidders tacitly agreed to keep bids low, showing how auction rules can be exploited to limit competition. Altering the auction's mechanism, such as changing the reserve price, adopting a sealed-bid auction, or inviting more bidders, can prevent such outcomes and encourage fairer, more competitive bidding.

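Mechanisms that are simple to participate in and encourage truthful bidding, where bidders reveal their true value $v_i$, are preferred. The ascending-bid auction comes close: each bidder has a simple dominant strategy (keep bidding while the current price is below $v_i$), and the item goes to the bidder who values it most, at roughly the second-highest value plus the bid increment. It is not fully truth-revealing, however, because the winner only reveals that its value is at least the final price, and it can discourage competition when one bidder is widely expected to win. In a sealed-bid first-price auction, each bidder submits a single bid and the highest bidder pays its own bid; there is no simple dominant strategy, because the best bid depends on what the others bid. The Vickrey (sealed-bid second-price) auction, in which the winner pays the second-highest bid, makes bidding $b_i = v_i$ a dominant strategy and is therefore truth-revealing.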
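The Python sketch below simulates both formats, assuming every bidder follows the dominant strategy for that format: in the ascending-bid auction, stay in while the asking price is at most your value; in the Vickrey auction, bid your true value. The bidder names, values, reserve price, and increment are hypothetical, and for determinism the simulation assumes that the remaining challenger with the highest value is the one who raises the bid each time.

```python
def ascending_bid(values, reserve, increment):
    """English auction with dominant-strategy bidders.
    Returns (winner, price paid), or (None, None) if nobody meets the reserve."""
    price, leader = reserve, None
    while True:
        # Bidders other than the current leader who are still willing to pay `price`.
        challengers = [b for b, v in values.items() if b != leader and v >= price]
        if not challengers:
            # Nobody raises: the leader wins at its last bid (price - increment).
            return (leader, price - increment) if leader is not None else (None, None)
        leader = max(challengers, key=values.get)  # assumption: highest-value challenger bids
        price += increment

def vickrey(bids):
    """Sealed-bid second-price auction: highest bid wins and pays the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]

values = {"A": 100, "B": 80, "C": 60}
print(ascending_bid(values, reserve=10, increment=1))  # ('A', 80)
print(vickrey(values))                                 # truthful bids: ('A', 80)
print(vickrey({**values, "B": 120}))                   # B overbids: ('B', 100), a loss for B
```

Both formats give the item to A at a price tied to B's value of 80. The last line shows why overbidding in a Vickrey auction is irrational: B wins but pays 100 for an item it values at 80.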

Auctions not only facilitate resource allocation but also play a role in cooperative decision-making among agents, demonstrating their versatility and critical role in distributed AI and multiagent systems.


#### Common Goods

- The tragedy of the commons is illustrated through a game in which countries choose between reducing pollution, at a cost, and continuing to pollute. Continuing to pollute is the dominant strategy for each country, yet when every country follows it, all of them end up worse off than if all had reduced pollution.
- The situation demonstrates how a common resource, if unchecked, may be exploited to everyone's detriment, resembling the prisoner's dilemma but on a larger scale involving shared resources.
- A solution involves changing the game's mechanism to charge agents for using the common resource, aiming to internalize externalities (unaccounted effects on global utility).
- Correctly setting charges or prices is challenging, but the goal is to create a mechanism where agents, through local decisions, effectively contribute to maximizing global utility. A carbon tax is given as an example of such a mechanism.
- The Vickrey–Clarke–Groves (VCG) mechanism is introduced as a solution that both maximizes global utility and incentivizes agents to reveal their true values, thereby avoiding the need for strategic misrepresentation.
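- The VCG mechanism works by having agents report their values, allocating goods so as to maximize the sum of reported values, and then charging each winning agent a tax equal to the loss its presence imposes on the other agents: the total value the others could have obtained without it, minus the total value they obtain in the chosen allocation.
- As a result, winners pay no more than their value, and losers are as well off as they can be, because each loser values the goods less than the tax it would have to pay to win them. Reporting a value above or below one's true value cannot improve an agent's outcome, so truthful reporting is a dominant strategy.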
- The VCG mechanism's generality means it can apply to various scenarios, including complex auctions where the computation of optimal outcomes can be NP-complete. It's noted that, with some exceptions, the VCG mechanism or its variations are essentially the only ones that can achieve these optimal, truth-revealing outcomes.
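As a concrete instance, the sketch below applies VCG to a hypothetical setting in which $k$ identical items are for sale and each agent wants at most one. The allocation is found by brute force, and each winner's tax is computed as described above: the best total value the other agents could get without that winner, minus the value they actually get. Agent names and values are invented for illustration, and reported values are assumed to be non-negative.

```python
from itertools import combinations

def best_allocation(values, k, exclude=None):
    """Brute force: the set of at most k agents (ignoring `exclude`) with the largest
    total reported value. Returns (total value, winners). Assumes values >= 0."""
    agents = [a for a in values if a != exclude]
    best_total, best_set = 0, frozenset()
    for r in range(1, min(k, len(agents)) + 1):
        for subset in combinations(agents, r):
            total = sum(values[a] for a in subset)
            if total > best_total:
                best_total, best_set = total, frozenset(subset)
    return best_total, best_set

def vcg(values, k):
    """Allocate to maximize reported value; charge each winner the loss it imposes on others."""
    _, winners = best_allocation(values, k)
    taxes = {}
    for i in winners:
        others_without_i, _ = best_allocation(values, k, exclude=i)
        others_with_i = sum(values[a] for a in winners if a != i)
        taxes[i] = others_without_i - others_with_i
    return winners, taxes

reports = {"A": 10, "B": 8, "C": 5, "D": 3}
winners, taxes = vcg(reports, k=2)
print(sorted(winners), taxes["A"], taxes["B"])  # ['A', 'B'] 5 5
```

Each winner pays 5, the value of the best losing report, which is below its own value. With `k=1` the same code charges the winner the second-highest report, which is exactly the Vickrey auction.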

### 18.4.3 Voting

- Voting procedures are part of social choice theory, used for making decisions in democratic societies through the aggregation of individual preferences into a social preference order or outcome.
- Voters have qualitative preferences over possible outcomes or candidates, and the goal is to derive a collective preference or select a set of winners that reflect these individual preferences.
- The process faces challenges such as Condorcet's Paradox, where no candidate is preferred by a majority over every other candidate, leading to a situation where any chosen outcome can be contested by a majority preferring a different outcome. 
- Desirable properties for social welfare functions include:
  - **Pareto Condition** : If all voters prefer one outcome over another, the social preference should reflect this.
  - **Condorcet Winner Condition** : A candidate preferred by a majority over every other candidate should be the winner.
  - **Independence of Irrelevant Alternatives (IIA)** : The social preference between two outcomes should depend only on voters' preferences between those two outcomes, so it should not change when preferences involving other outcomes change.
  - **No Dictatorships** : The collective decision should not simply replicate the preferences of a single voter, disregarding the others.
- Arrow's theorem shows that when there are three or more outcomes, no social welfare function can satisfy the Pareto condition, independence of irrelevant alternatives, and no dictatorships simultaneously, highlighting inherent limits on ideal democratic decision making.
- Various voting procedures are used to approach these challenges:
  - **Simple Majority Vote** : Effective with two candidates, where the one with the most votes wins.
  - **Plurality Voting** : A common but criticized method in which the candidate with the most first-place votes wins, even without a majority.
  - **Borda Count** : Uses voters' full preference orders: with $k$ candidates, a candidate ranked in position $i$ on a ballot receives $k - i$ points, and the candidate with the highest total wins.
  - **Approval Voting** : Voters select a subset of acceptable candidates, and the winners are those approved by the most voters.
  - **Instant Runoff Voting** : Repeatedly eliminates the candidate with the fewest first-place votes, transferring those ballots to their next choice, until one candidate holds a majority of first-place votes.
  - **True Majority Rule** : The winner is the candidate who beats every other candidate in pairwise majority comparisons; when preferences form a cycle, as in Condorcet's paradox, no such candidate exists and the rule is indecisive.
- Each voting method has its strengths and weaknesses, reflecting the complex trade-offs involved in democratic decision-making.
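To make the differences concrete, the Python sketch below runs several of these rules on the same hypothetical profile of nine ballots (each ballot lists candidates from most to least preferred) and checks for a Condorcet winner. Ties are broken alphabetically, an arbitrary choice made only so the results are deterministic.

```python
from collections import Counter

def plurality(profile):
    counts = Counter(ballot[0] for ballot in profile)
    return min(counts, key=lambda c: (-counts[c], c))

def borda(profile):
    k = len(profile[0])
    scores = Counter()
    for ballot in profile:
        for position, cand in enumerate(ballot):
            scores[cand] += k - 1 - position   # top choice gets k-1 points, last gets 0
    return min(scores, key=lambda c: (-scores[c], c))

def instant_runoff(profile):
    remaining = set(profile[0])
    while True:
        # Count each ballot for its highest-ranked candidate still in the race.
        firsts = Counter(next(c for c in ballot if c in remaining) for ballot in profile)
        leader = min(remaining, key=lambda c: (-firsts[c], c))
        if 2 * firsts[leader] > len(profile):
            return leader
        remaining.remove(min(remaining, key=lambda c: (firsts[c], c)))

def condorcet_winner(profile):
    """Candidate who beats every other in pairwise majority contests, or None."""
    cands = profile[0]
    def beats(a, b):
        return 2 * sum(ballot.index(a) < ballot.index(b) for ballot in profile) > len(profile)
    return next((a for a in cands if all(beats(a, b) for b in cands if b != a)), None)

profile = [("A", "B", "C")] * 4 + [("C", "B", "A")] * 3 + [("B", "C", "A")] * 2
print(plurality(profile), borda(profile), instant_runoff(profile), condorcet_winner(profile))
# A B C B
cycle = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
print(condorcet_winner(cycle))  # None: Condorcet's paradox
```

The same nine ballots produce three different winners under plurality, Borda, and instant runoff; here the Borda count happens to agree with the Condorcet winner, B.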

#### Strategic Manipulation

- The Gibbard–Satterthwaite Theorem is a pivotal result in social choice theory, addressing the potential for voters to gain from misrepresenting their preferences in a voting process.
- A social choice function, which takes voters' preference orders to output winning candidates, doesn't inherently compel truthful reporting of preferences by voters. Voters can strategically declare preferences to potentially increase the utility of the outcome for themselves.
- An example of strategic voting is in plurality systems, where voters may choose their second preference if their first choice seems unlikely to win, aiming to maximize their expected utility by considering the likely preferences of other voters.
- The theorem states that, when there are three or more possible outcomes, no voting mechanism can be both immune to strategic manipulation and democratic (satisfying the Pareto condition). Any such mechanism is either susceptible to manipulation or effectively a dictatorship, in which the outcome hinges on a single voter's preferences.
- Despite the theoretical possibility of manipulation under any reasonable voting system, the theorem does not specify how manipulation could be carried out nor does it guarantee that manipulation will occur in practice. It highlights a fundamental dilemma in creating a perfect, manipulation-proof voting system.
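A small hypothetical plurality election shows the kind of manipulation described above. Two voters whose true ranking is C > B > A gain by reporting B first: their favorite cannot win, and switching their votes lets B beat their least-preferred candidate, A.

```python
from collections import Counter

def plurality(ballots):
    """Winner by most first-place votes; ties broken alphabetically."""
    counts = Counter(ballot[0] for ballot in ballots)
    return min(counts, key=lambda c: (-counts[c], c))

others = [("A", "B", "C")] * 4 + [("B", "A", "C")] * 3
true_prefs = ("C", "B", "A")                 # the two manipulating voters' real ranking
truthful = others + [true_prefs] * 2
strategic = others + [("B", "C", "A")] * 2   # same voters, B reported first

for label, ballots in (("truthful", truthful), ("strategic", strategic)):
    winner = plurality(ballots)
    # A lower index in true_prefs means the manipulators like the outcome more.
    print(label, winner, true_prefs.index(winner))
# truthful A 2
# strategic B 1
```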

### 18.4.4 Bargaining

- Bargaining, also known as negotiation, is a mechanism commonly employed in both everyday situations and in the field of game theory, where it has been a subject of study since the 1950s. Recently, it has also been explored in the context of automated agents.
- It is utilized when agents must come to an agreement on issues of mutual interest. These agents engage by making offers, which can also be referred to as proposals or deals, to one another following certain protocols.
- During the bargaining process, the involved parties have the option to either accept or reject the offers made by the other parties.

#### Bargaining with the Alternating Offers Protocol
- The alternating offers bargaining model is a key protocol for negotiation, assuming two agents for simplicity. Negotiation unfolds over a series of rounds, starting with Agent 1 (A1) making an offer. If Agent 2 (A2) accepts this offer, the deal is implemented; if rejected, the process moves to the next round where roles are reversed, and so forth. A never-ending negotiation results in a conflict deal, but it's assumed both agents prefer reaching any agreement over this endless conflict.
- To illustrate, consider dividing a "pie" (a resource valued at 1) between A1 and A2. An offer is represented as a pair (x, 1 − x), where x is the portion for A1, and 1 − x is for A2. The negotiation set comprises all possible divisions of the pie: {(x, 1 − x) : 0 ≤ x ≤ 1}. 
- **One-round scenario (Ultimatum Game):**  In a single round, A1 has all the power, proposing a division of the pie. If A2 rejects, the conflict deal is enforced, which is less desirable than accepting any portion of the pie. Thus, A1 proposing to take all the pie and A2 accepting constitutes a Nash equilibrium. 
- **Two-round scenario:**  Power shifts to A2, who can reject A1's initial offer; the game then becomes a one-round negotiation with A2 as the proposer, so A2 can secure the whole pie. More generally, with a fixed number of rounds, the agent who makes the final offer gets the whole pie.
- **Unlimited rounds:**  With no round limit, if A1 always proposes (1, 0) and rejects every counteroffer, A2's best response is to accept A1's first proposal, since holding out only leads to endless negotiation. The same argument works for any division: for every (x, 1 − x) there is a Nash equilibrium in which A1 always proposes that division and A2 accepts it immediately. Without further assumptions, the protocol therefore has infinitely many equilibria and does not determine how the pie is split, which shows how strongly negotiation outcomes depend on the number of rounds.

#### Impatient Agents

- Introducing impatience, through a discount factor $\gamma_i$ for each agent, changes the bargaining dynamics by making time matter: agents prefer to receive an outcome sooner rather than later. The discount factor $\gamma_i$ (where $0 \le \gamma_i < 1$) quantifies this impatience: the value to agent $i$ of a slice $x$ of the pie received at time $t$ is $\gamma_i^t x$, so a higher $\gamma_i$ indicates a more patient agent.
- **One-round scenario:**  This is still an ultimatum game. Discounting plays no role because there is no later round, so A1 proposes to take the whole pie and A2 accepts, since rejecting leads to the conflict deal.
- **Two-round scenario:**  The value of the pie diminishes over time for the agents due to the discount factors γi. If A2 rejects A1’s offer in the first round, the value of receiving the entire pie in the second round decreases to γ2 for A2. A1 can exploit A2’s impatience by offering (1−γ2, γ2), which A2 is incentivized to accept as it maximizes A2's outcome given their impatience. This strategy pair forms a Nash equilibrium. 
- **General case with unlimited rounds:**  In equilibrium, A1's first offer gives A2 exactly what A2 could expect by rejecting and making its own offer in the next round, discounted by $\gamma_2$, so A2 accepts immediately. A1's share of the pie is then $\frac{1-\gamma_2}{1-\gamma_1\gamma_2}$, and A2 receives the remainder. A1's share rises with $\gamma_1$ and falls as $\gamma_2$ rises, so the more patient player (the one with the higher discount factor) secures a larger portion of the pie: patience is a virtue in bargaining.
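These equilibrium shares can be checked by backward induction. In the Python sketch below, the proposer in the final round takes the whole pie, and in each earlier round the proposer offers the responder exactly the responder's discounted share from the following round. With discount factors of 1 (no impatience) this reproduces the fixed-horizon result that the last proposer takes everything; with discount factors below 1 and many rounds, A1's share approaches $(1-\gamma_2)/(1-\gamma_1\gamma_2)$. The function name and parameters are illustrative.

```python
def first_proposer_share(rounds, g1, g2):
    """A1's equilibrium share when A1 proposes in rounds 0, 2, 4, ... and A2 in
    rounds 1, 3, 5, ...; g1 and g2 are the agents' discount factors."""
    share = 1.0  # the proposer in the final round takes the whole pie
    for t in range(rounds - 2, -1, -1):
        responder_discount = g2 if t % 2 == 0 else g1
        # Offer the responder what it would get by waiting one round, discounted.
        share = 1.0 - responder_discount * share
    return share

print(first_proposer_share(1, 1.0, 1.0))   # 1.0: ultimatum game
print(first_proposer_share(2, 1.0, 1.0))   # 0.0: A2 makes the last offer
print(first_proposer_share(2, 0.9, 0.9))   # about 0.1, i.e. the offer (1 - g2, g2)

g1, g2 = 0.9, 0.95                         # A2 is the more patient agent
print(first_proposer_share(500, g1, g2), (1 - g2) / (1 - g1 * g2))
```

In the last line both numbers are about 0.345, so the more patient A2 ends up with roughly 65 percent of the pie even though A1 moves first.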

#### Negotiation in Task-Oriented Domains
- **Context:**  Negotiation in task-oriented domains involves agents agreeing on the distribution of tasks to maximize efficiency and minimize costs. This is common in scenarios where tasks have different requirements, such as needing specific machinery or setups, leading to significant setup costs for each agent. An example of negotiation could be agents exchanging tasks based on the machinery they're already set up to use to reduce overall setup costs. 
- **Initial Allocation:**  Initially, tasks are distributed among agents. If negotiation fails, each agent $i$ simply carries out its originally assigned tasks, denoted $T_i^0$.
- **Simple Two-Agent Model:**  For simplicity, this discussion considers two agents. $T$ is the set of all tasks and $(T_1^0, T_2^0)$ the initial task allocation. Each task must be assigned to exactly one agent.
- **Cost Function:**  A cost function $c$ assigns a non-negative real cost to carrying out any set of tasks $T'$, independent of which agent performs them. The function is monotonic, meaning adding tasks never decreases the cost, and doing no tasks costs nothing. For example, setting up a milling machine might cost 10 units, with each task adding 1 unit to the total cost.
- **Offers and Utility:**  An offer $(T_1, T_2)$ is a proposed reassignment in which $T_1$ and $T_2$ are the new sets of tasks for agents 1 and 2, respectively. The utility of an offer to agent $i$ is the cost it saves relative to its originally assigned tasks: $U_i((T_1, T_2)) = c(T_i^0) - c(T_i)$.
- **Rationality and Pareto Optimality:**  An offer is individually rational if it is at least as good for each agent as performing its originally assigned tasks, that is, $U_i \ge 0$ for both agents. The negotiation set consists of the offers that are both individually rational and Pareto optimal, meaning no other offer could improve one agent's utility without reducing the other's. Offers outside this set are either irrational (one agent would refuse them) or suboptimal (a better alternative exists).

This framework provides a structured approach to negotiation in task-oriented domains, focusing on rational decision-making and efficiency in task distribution among agents.
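The Python sketch below enumerates every reassignment in a tiny hypothetical domain and keeps the offers in the negotiation set. The four tasks, their machine types, and the costs (10 units per machine setup, 1 unit per task) are assumptions chosen to mirror the milling example; each agent starts with one task of each type, so both must set up two machines.

```python
from itertools import product

SETUP, PER_TASK = 10, 1   # hypothetical: machine setup cost and per-task cost
MACHINE = {"t1": "mill", "t2": "mill", "t3": "drill", "t4": "drill"}  # hypothetical tasks

def cost(tasks):
    """Monotonic cost: one setup per machine type used plus a per-task cost; c(empty) = 0."""
    return SETUP * len({MACHINE[t] for t in tasks}) + PER_TASK * len(tasks)

def negotiation_set(initial):
    """Enumerate all reassignments; keep those individually rational and Pareto optimal."""
    t1_0, t2_0 = initial
    tasks = sorted(t1_0 | t2_0)
    offers = []
    for owner in product((1, 2), repeat=len(tasks)):
        T1 = frozenset(t for t, o in zip(tasks, owner) if o == 1)
        T2 = frozenset(tasks) - T1
        utility = (cost(t1_0) - cost(T1), cost(t2_0) - cost(T2))  # U_i = c(T_i^0) - c(T_i)
        offers.append(((T1, T2), utility))

    def dominated(u):
        return any(v[0] >= u[0] and v[1] >= u[1] and v != u for _, v in offers)

    return [(deal, u) for deal, u in offers if min(u) >= 0 and not dominated(u)]

initial = (frozenset({"t1", "t3"}), frozenset({"t2", "t4"}))
for (T1, T2), u in negotiation_set(initial):
    print(sorted(T1), sorted(T2), u)
# ['t1', 't2'] ['t3', 't4'] (10, 10)
# ['t3', 't4'] ['t1', 't2'] (10, 10)
```

Only the two swaps in which each agent specializes on one machine survive: each saves a setup cost of 10, and no other offer does better for either agent without hurting the other.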

#### The Monotonic Concession Protocol
- The protocol is used in task-oriented domains where agents negotiate who will perform which tasks.
- Negotiation involves multiple rounds with both agents simultaneously proposing deals in the first round.
- A deal is a proposal on task distribution between the two agents.
- An agreement is reached if one agent rates the other agent's proposal at least as highly as its own proposal.
- If both agents rate the other's proposal at least as highly as their own, one of the two proposals is selected at random; if only agent $i$ does, agent $j$'s proposal becomes the agreement.
- If no agreement is reached, the next round involves either repeating the previous proposal or making a concession to propose a deal more favorable to the other agent.
- The negotiation ends if neither agent makes a concession, resulting in both agents sticking to their original tasks (conflict deal).
- While the protocol ensures that negotiation will eventually end, there's no guarantee of a quick resolution due to the potentially large number of possible deals.

#### The Zeuthen Strategy
- The Zeuthen strategy guides agents on how to behave during negotiations using the monotonic concession protocol in task-oriented domains.
- This strategy assesses an agent's willingness to risk conflict based on the potential utility loss from conceding versus causing a conflict.
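- An agent's risk of conflict is the ratio of the utility it would lose by conceding (accepting the other agent's proposal) to the utility it would lose by not conceding and causing a conflict. With conflict utility 0, agent $i$'s risk when it proposes $\delta_i$ and its counterpart proposes $\delta_j$ is $risk_i = \frac{U_i(\delta_i) - U_i(\delta_j)}{U_i(\delta_i)}$.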
- Initially, each agent proposes a deal maximizing its utility. In subsequent rounds, the agent with the lower risk of conflict (having more to lose from a conflict) should concede.
- The extent of concession is just enough to make the other agent more likely to concede in the next round, thereby shifting the balance of risk.
- If both agents have equal risk, a random decision (like flipping a coin) determines who concedes to prevent both from conceding simultaneously.
- Agreements resulting from the Zeuthen strategy are both Pareto optimal (no one can be better off without making someone else worse off) and individually rational (better than or equal to the initial allocation).
- Implementing the Zeuthen strategy efficiently can be computationally intensive, as it may require evaluating an exponential number of possible deals.
- The strategy, including the coin-flip rule for equal risk scenarios, forms a Nash equilibrium, indicating that no agent has an incentive to deviate from this strategy unilaterally.
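The protocol and strategy can be combined into a short simulation. The Python sketch below is a simplified rendering under stated assumptions: a finite list of deals, utilities given as dictionaries, a conflict utility of 0, and a deterministic tie rule (agent 1 concedes when risks are equal) standing in for the coin flip so the run is reproducible. A conceding agent picks the concession that is best for itself among those that leave its counterpart's risk no higher than its own. The example splits a hypothetical pie of 10 units with linear utilities.

```python
from fractions import Fraction

def risk(u, own, other):
    """Zeuthen risk: utility lost by accepting `other`, over utility lost in conflict (0)."""
    if u[own] == 0:
        return Fraction(1)
    return Fraction(u[own] - u[other], u[own])

def zeuthen(deals, u1, u2, max_rounds=1000):
    utils = (u1, u2)
    offers = [max(deals, key=u1.get), max(deals, key=u2.get)]  # round 1: each agent's best deal
    for _ in range(max_rounds):
        # Agreement: an agent values the other's offer at least as much as its own.
        # (If both do, the protocol picks one at random; this sketch takes agent 2's.)
        if u1[offers[1]] >= u1[offers[0]]:
            return offers[1]
        if u2[offers[0]] >= u2[offers[1]]:
            return offers[0]
        r1 = risk(u1, offers[0], offers[1])
        r2 = risk(u2, offers[1], offers[0])
        i = 0 if r1 <= r2 else 1   # lower risk concedes; tie -> agent 1 (stand-in for a coin flip)
        j = 1 - i
        ui, uj = utils[i], utils[j]
        # A concession must give the counterpart strictly more than the current offer.
        concessions = [d for d in deals if uj[d] > uj[offers[i]]]
        if not concessions:
            return None  # no concession possible: conflict deal
        # Concede just enough that the counterpart's risk is no higher than ours.
        enough = [d for d in concessions if risk(uj, offers[j], d) <= risk(ui, d, offers[j])]
        offers[i] = max(enough or concessions, key=ui.get)
    return None

pie = list(range(11))                 # deal x: agent 1 gets x units, agent 2 gets 10 - x
u1 = {x: x for x in pie}
u2 = {x: 10 - x for x in pie}
print(zeuthen(pie, u1, u2))           # 5
```

In this symmetric example the agents take turns conceding one unit at a time and settle on the even split. Exact `Fraction` arithmetic is used so that equal risks compare as equal.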


## Chapter 18 Summary

- **Multiagent Planning** : Necessary for environments with multiple agents to coordinate or compete, requiring joint plans and coordination to execute agreed-upon plans. 
- **Game Theory** : Provides a framework for rational behavior in multiagent interactions, akin to decision theory for single-agent decisions. 
- **Solution Concepts in Game Theory** : Aim to characterize rational outcomes where all agents act rationally, guiding decisions in various games. 
- **Non-cooperative Game Theory** : Focuses on independent decision-making by agents, with Nash equilibrium being the key concept where no agent benefits from changing its strategy unilaterally. Techniques exist for repeated and sequential games. 
- **Cooperative Game Theory** : Deals with settings where agents can form binding agreements to cooperate within coalitions, focusing on stability (the core) and fair value distribution (the Shapley value). 
- **Specialized Techniques for Multiagent Decision Making** : Includes the contract net for task sharing, auctions for resource allocation, bargaining for reaching mutual agreements, and voting procedures for preference aggregation.

## Bibliographical and Historical Notes

- **Early hints of multiagent systems** : Ideas suggesting multiagent systems appeared before the field was formally recognized, notably:
  - Marvin Minsky's Society of Mind theory (1986, 2007), proposing that human minds are ensembles of agents.
  - Doug Lenat's BEINGS framework (1975).
  - Carl Hewitt's actor model of computation (1977; Agha, 1986), foundational in concurrent computation.
- **Formal establishment in the 1980s and 1990s** : The field of multiagent systems became a distinctive subdiscipline of AI, with earlier research focusing on cooperative distributed problem solving, as seen in the Distributed Vehicle Monitoring Testbed (DVMT) under Victor Lesser and Daniel Corkill (1988). 
- **Influence of game theory from the late 1980s** : Recognizing the norm of agents with differing preferences, game theory became the main methodology for studying such agents. 
- **Developments in multiagent planning** :
  - Formalization by Konolige (1982) and Pednault (1986).
  - Joint intention from communicative acts research (Cohen and Perrault, 1979; Cohen and Levesque, 1990; Cohen et al., 1990).
  - Adaptation of partial-order planning by Boutilier and Brafman (2001), and an efficient multiactor planning algorithm by Brafman and Domshlak (2008).
- **Challenges in adversarial planning** : Highlighted by quotes from Jean-Paul Sartre (1960) and General Dwight D. Eisenhower, emphasizing the complexity added by opposition and the indispensability of planning. 
- **Distributed and multiagent reinforcement learning** : Growing interest in methods for coordinated learning to optimize a common utility function, with foundational work by Guestrin et al. (2002) and Russell and Zimdars (2003). 
- **Historical game theory milestones** :
  - Early proposals by Christiaan Huygens and Gottfried Leibniz in the 17th century.
  - Formal results by Zermelo (1913) and Emile Borel (1921).
  - John von Neumann's foundational contributions (1928) and collaboration with Oskar Morgenstern leading to "Theory of Games and Economic Behavior" (1944).
  - John Nash's equilibrium concept (1950), leading to his Nobel Prize in 1994.
- **Multiagent reinforcement learning** : Distinction between distributed RL and multiagent RL, with the latter dealing with sequential game-theoretic problems or Markov games. Challenges include the nonstationary environment due to policy changes by opponents. 
- **Awards and recognitions** : Nobel Prizes awarded to John Nash (1994), Leonid Hurwicz, Eric S. Maskin, and Roger B. Myerson (2007) for contributions to game theory and mechanism design. 
- **Books and key publications** :
  - "Readings in Distributed Artificial Intelligence" (Bond and Gasser, 1988) documents early challenges.
  - Significant contributions by von Neumann and Morgenstern, Lloyd Shapley, and various economists and researchers across decades.
  - Textbooks from both economics (Myerson, Fudenberg and Tirole, Osborne) and AI perspectives (Nisan et al., Leyton-Brown and Shoham).
- **Conferences and journals** : The International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), the ACM Conference on Electronic Commerce (EC), and the journal Games and Economic Behavior are key publication venues.