# Chapter 1 - Interdisciplinarity of the Prisoner's Dilemma

This notebook serves as a "grab-bag" of Prisoner's Dilemma (PD) examples for an interdisciplinary first lecture. It combines classic intellectual lineage, matrix examples across disciplines (from lecture slides), and modern applications in AI/ML.

## (A) Classic/Intellectual Lineage

### Core References and Teachable "Anchor" Citations

1.  **Origin + Naming (RAND → Tucker story)**
    *   Historical overview and references to Flood/Dresher and Tucker’s framing are summarized well in the Stanford Encyclopedia of Philosophy entry.
    *   [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/entries/prisoner-dilemma/)

2.  **Why PD Matters (Iteration & Emergence of Cooperation)**
    *   Axelrod’s tournament story and *The Evolution of Cooperation* remain the cleanest narrative bridge from one-shot defection to repeated-game cooperation norms.
    *   [PhilPapers](https://philpapers.org/)

3.  **Modern Theoretical Shockwave in IPD**
    *   Zero-determinant / extortion strategies (Press & Dyson, PNAS 2012) are a great “students didn’t see that coming” moment.
    *   [PNAS: Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent](https://www.pnas.org/doi/10.1073/pnas.1206569109)

## Matrix Examples: The Canonical PD and Interdisciplinary Variants

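A payoff matrix describes a two-player game by listing each player's payoff for every combination of actions. The Prisoner's Dilemma is defined by the ordering $T > R > P > S$ (Temptation > Reward > Punishment > Sucker's Payoff), where higher payoffs are better. When an example reports costs instead (years in prison, milliseconds of delay), the same ordering holds with "better" meaning "lower".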
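The ordering can be checked mechanically. The following is a minimal sketch, not part of the lecture slides: the `Payoffs` container, the `is_prisoners_dilemma` helper, and the numbers 5, 3, 1, 0 are illustrative assumptions.

```python
from typing import NamedTuple


class Payoffs(NamedTuple):
    """Hypothetical container for the four PD payoffs (higher is better)."""
    T: float  # Temptation: defect while the other cooperates
    R: float  # Reward: mutual cooperation
    P: float  # Punishment: mutual defection
    S: float  # Sucker's payoff: cooperate while the other defects


def is_prisoners_dilemma(p: Payoffs) -> bool:
    """Return True when the payoffs satisfy T > R > P > S."""
    return p.T > p.R > p.P > p.S


# Illustrative numbers only; they are not taken from the lecture slides.
example = Payoffs(T=5, R=3, P=1, S=0)
print(is_prisoners_dilemma(example))                        # True
print(is_prisoners_dilemma(Payoffs(T=3, R=5, P=1, S=0)))    # False: R exceeds T

# Costs (lower is better) can be negated before the check.
print(is_prisoners_dilemma(Payoffs(T=0, R=-1, P=-2, S=-3)))  # True
```

Negating costs keeps a single "higher is better" convention in the code, which avoids flipping the inequality for each example.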

### 1. The Canonical Story (Criminology/Law)
Two thieves are caught red-handed. Police interrogate them separately.
*   If both stay silent (Cooperate), they get 1 year ($R$).
*   If both confess (Defect), they get 2 years ($P$).
*   If one confesses and the other is silent, the confessor goes free ($T$, 0 years) and the silent one gets 3 years ($S$).

| | Silent (Col) | Confess (Col) |
|---|---|---|
| **Silent (Row)** | 1, 1 ($R$) | 3, 0 ($S$, $T$) |
| **Confess (Row)**| 0, 3 ($T$, $S$) | 2, 2 ($P$) |

*(Note: Lower numbers are better here, representing years in prison)*

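**Dominant Strategy**: Confess. Whatever the other thief does, confessing means less prison time: 0 years instead of 1 if the other stays silent, and 2 years instead of 3 if the other confesses. Both therefore confess and serve 2 years each, although mutual silence would have cost each only 1 year.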
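The dominance argument can also be verified in code. This is a minimal sketch that assumes the years from the table above; the helper name `row_dominant_action` and the integer encoding of actions are hypothetical choices made for illustration.

```python
# Years in prison for (row, column), indexed by action: 0 = Silent, 1 = Confess.
# Values copied from the table above; lower is better.
ACTIONS = ["Silent", "Confess"]
YEARS = {
    (0, 0): (1, 1),
    (0, 1): (3, 0),
    (1, 0): (0, 3),
    (1, 1): (2, 2),
}


def row_dominant_action(costs):
    """Return the row action that is strictly cheaper against every column action, or None."""
    for a in (0, 1):
        b = 1 - a  # the alternative row action
        if all(costs[(a, col)][0] < costs[(b, col)][0] for col in (0, 1)):
            return a
    return None


best = row_dominant_action(YEARS)
print(ACTIONS[best])  # Confess
```

Because the game is symmetric, the same check applied to the column player's entries gives the same answer.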

### 2. Economics: Mobile Device Market Share

*   **Players**: Apple/Google vs. RIM (BlackBerry)/Microsoft
*   **Actions**: {Develop New OS, Stick to Current Platform}

Companies can either innovate (expensive) or stick to their current platform. If both innovate, they maintain share but at a cost. If one innovates and the other doesn't, the innovator wins big.

| | New OS (Player 2) | Current Platform (Player 2) |
|---|---|---|
| **New OS (Player 1)** | Competitive Share ($R$) | Win, Lose ($T$, $S$) |
| **Current (Player 1)**| Lose, Win ($S$, $T$) | Competitive Share + Decreasing Satisfaction ($P$) |

*Context*: First Motorola mobile phone (1973) → RIM dominates the mid-2000s → Google purchases Android (2005) → Apple iPhone (2007).

### 3. Psychology: The Addict (Intertemporal Decision Problem)

*   **Players**: Before-self (Player 1) vs. After-self (Player 2)
*   **Actions**: {Clean, Relapse}

Analysis of self-control as a game between a present and future self.

*   **T (Temptation)**: Enjoying the drug.
*   **S (Sucker)**: Effort to get clean (suffering).
*   **R (Reward)**: Happy to be clean.
*   **P (Punishment)**: Remain an addict.

| | Clean (After-self) | Relapse (After-self) |
|---|---|---|
| **Clean (Before-self)** | $R, R$ (Happy/Clean) | $S, T$ (Effort/Enjoyment) |
| **Relapse (Before-self)**| $T, S$ (Enjoyment/Effort) | $P, P$ (Remain Addict) |

If the before-self chooses to relapse because of temptation ($T$), he is "cheating" on his after-self, who will have to suffer to get clean again ($S$).

### 4. International Relations: The Security Dilemma

*   **Players**: USA vs. Soviet Union (Cold War)
*   **Actions**: {Disarm/Safe, Arm/Risk}

Each country either builds weapons (Arm) or signs arms-control agreements (Disarm).

*   **T (Win)**: Acquire territory, alliances, hegemony.
*   **R (Safe)**: Peace, less spending on weapons, more on public welfare.
*   **P (Risk)**: Spending on arms, intelligence, constant risk of war.
*   **S (Lose)**: Surrender territory, alliances, hegemony.

| | Disarm (USSR) | Arm (USSR) |
|---|---|---|
| **Disarm (USA)** | Safe, Safe ($R$) | Lose, Win ($S, T$) |
| **Arm (USA)** | Win, Lose ($T, S$) | Risk of War, Risk of War ($P$) |

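Rule: $\text{Win} > \text{Safe} > \text{Risk} > \text{Lose}$, that is, $T > R > P > S$. Result: arming is dominant for both sides, which produces an arms race.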

### 5. Biology: *Bacillus subtilis* Survival

*   **Players**: Two Microbes under harsh conditions
*   **Actions**: {Sporulate, Competent}

Strategies involve dumping DNA (Sporulate) or taking DNA (Competent).

*   **T (Strong)**: Use other's DNA to strengthen self.
*   **R (Survive)**: Both sporulate, enough DNA to survive.
*   **P (Risk Dying)**: Both competent, not enough DNA.
*   **S (Weak)**: Dump DNA, become weak.

| | Sporulate (M2) | Competent (M2) |
|---|---|---|
| **Sporulate (M1)** | Survive, Survive ($R$) | Become Weak, Continue Strong ($S, T$) |
| **Competent (M1)** | Continue Strong, Become Weak ($T, S$) | Risk Dying, Risk Dying ($P$) |

### 6. Electrical Engineering: TCP Backoff Game

*   **Players**: Two Computers
*   **Actions**: {TCP Backoff, Defective}
*   **Payoffs**: Delay time in ms (Lower is better)

*   $T = 0$ ms
*   $R = 1$ ms
*   $P = 3$ ms
*   $S = 4$ ms

| | TCP Backoff (Comp 2) | Defective (Comp 2) |
|---|---|---|
| **TCP Backoff (Comp 1)** | 1, 1 ($R$) | 4, 0 ($S, T$) |
| **Defective (Comp 1)** | 0, 4 ($T, S$) | 3, 3 ($P$) |

Each computer tries to reduce its own delay. If both back off, delay is low (1 ms each). If one defects (doesn't back off), it gets 0 ms delay while the other waits 4 ms. If both defect, each waits 3 ms, which is worse for both than mutual backoff.
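A short sketch can confirm that mutual defection is the only stable outcome and that it is collectively worse. It assumes the delays from the table above; the function name `pure_nash_equilibria` and the 0/1 action encoding are illustrative, not taken from the slides.

```python
from itertools import product

# Delay in ms for (computer 1, computer 2); 0 = TCP Backoff, 1 = Defective.
# Values copied from the table above; lower is better.
DELAY = {
    (0, 0): (1, 1),
    (0, 1): (4, 0),
    (1, 0): (0, 4),
    (1, 1): (3, 3),
}


def pure_nash_equilibria(cost):
    """List action pairs where neither player can cut its own delay by deviating alone."""
    equilibria = []
    for a1, a2 in product((0, 1), repeat=2):
        c1, c2 = cost[(a1, a2)]
        p1_stays = c1 <= cost[(1 - a1, a2)][0]  # computer 1 cannot do better by switching
        p2_stays = c2 <= cost[(a1, 1 - a2)][1]  # computer 2 cannot do better by switching
        if p1_stays and p2_stays:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria(DELAY))              # [(1, 1)]: both defect
print(sum(DELAY[(1, 1)]), sum(DELAY[(0, 0)]))   # 6 2: total delay at equilibrium vs mutual backoff
```

The gap between the two totals (6 ms versus 2 ms) is the cost of the dilemma in this example.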

## (B) Interdisciplinary "Real-World" Domains

These are broad categories that the specific examples above fit into. You can map each to the PD payoff logic (temptation vs mutual benefit).

**1. International Relations / Security Dilemma**  
Arms races, crisis bargaining, alliance tensions: PD is a standard formalization in IR/security-dilemma discussions. Axelrod’s framing explicitly uses trade barriers/arms-race style logic.
*   [Stanford EE Reference](https://ee.stanford.edu/)

**2. Competition / Cartels / Price Wars (Industrial Organization)**  
Two firms prefer high prices jointly, but each has an incentive to undercut (defect). Press & Dyson explicitly list cartel behavior as a canonical PD-style application area.
*   [PNAS](https://www.pnas.org/)

**3. Public Policy: Climate Agreements**  
Countries benefit from mutual emissions cuts; each is tempted to free-ride. This is a standard classroom mapping motivated via “collective action” logic.
*   [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/entries/prisoner-dilemma/)

**4. Biology: Reciprocity and Evolution of Cooperation**  
PD/IPD is central to modeling reciprocal altruism and strategy selection pressures; Axelrod’s work is the standard doorway for interdisciplinary cohorts.
*   [PhilPapers](https://philpapers.org/)

**5. Behavioral Econ / Psych: Trust, Fairness, and Deviations**  
Experiments revisiting early PD play and the role of fairness norms make a strong “humans aren’t Nash robots” point.
*   [SAGE Journals](https://journals.sagepub.com/)

## (C) Where PD Shows Up in AI / ML

**1. Multi-Agent Reinforcement Learning (MARL): PD as the “Fruit Fly”**
PD is used to study equilibrium selection, learning-induced non-stationarity, and reward shaping.
*   **Sequential PD + Deep MARL**: Policies that adapt based on inferred opponent cooperativeness. ([arXiv](https://arxiv.org/))
*   **Selective Interaction**: Agents learn *who* to interact with (IJCAI 2024).
*   **Punishment Mechanisms (2025)**: Direct punishment changes emergence of cooperation. ([Springer](https://link.springer.com/))
*   **Reputation-driven Cooperation (AAMAS 2025)**: Bottom-up reputation as an internal reward signal.

**2. "Learning in Games" Theory Meets Modern RL**
*   **Closed-form Characterization (2024)**: When Q-learning-like dynamics cooperate vs defect depends on learning rates/payoffs. ([ScienceDirect](https://www.sciencedirect.com/))
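For a classroom demo of learning dynamics in the PD, a toy self-play loop is often enough. The sketch below is a generic, stateless epsilon-greedy Q-learner and is not the dynamics analyzed in the cited work; the payoffs, the hyperparameters (`alpha`, `epsilon`, `rounds`), and the function name `run_self_play` are all assumptions made for illustration.

```python
import random

# Illustrative payoffs (higher is better); not taken from the cited paper.
T, R, P, S = 5, 3, 1, 0
# PAYOFF[(my_action, other_action)] with 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): R, (0, 1): S, (1, 0): T, (1, 1): P}


def run_self_play(rounds=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent, stateless epsilon-greedy Q-learners play a repeated PD."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[player][action]
    for _ in range(rounds):
        actions = []
        for player in (0, 1):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore
            else:
                # Exploit; ties go to defect.
                actions.append(0 if q[player][0] > q[player][1] else 1)
        for player in (0, 1):
            mine, other = actions[player], actions[1 - player]
            reward = PAYOFF[(mine, other)]
            # Stateless update: move the estimate toward the observed reward.
            q[player][mine] += alpha * (reward - q[player][mine])
    return q


q_values = run_self_play()
for player, (q_c, q_d) in enumerate(q_values):
    print(f"player {player}: Q(cooperate)={q_c:.2f}, Q(defect)={q_d:.2f}")
```

Students can vary `alpha`, `epsilon`, and the payoffs to see how sensitive the learned values are, which connects to the point that outcomes depend on learning rates and payoffs.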

**3. LLMs Playing PD and Strategic Games**
Evaluating LLMs on consistency, susceptibility to prompts, and alignment.
*   **NeurIPS 2024**: Experiments where LLMs exhibit different cooperation tendencies than humans and can be steered by prompting. ([NeurIPS Proceedings](https://proceedings.neurips.cc/))
    *   *Demo Idea*: Have students propose prompts that push a model toward cooperate/defect.

**4. Federated Learning (FL): Free-Riding Dilemmas**
FL has PD-shaped incentive problems: each party wants the global model but is tempted to contribute less data/compute.
*   **Evolutionary-game modeling of FL participation (2023)**. ([ScienceDirect](https://www.sciencedirect.com/))
*   **Incentive-based FL overview (2025)**: Frames issues in "dilemma/free-riding" terms. ([arXiv](https://arxiv.org/))
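The free-riding incentive can be illustrated with a two-party toy model. This is a hypothetical sketch, not a model from the cited papers: it assumes each contributing party adds a fixed value `v` to the shared model for both parties and pays a private cost `c`. Under that assumption the game is a PD exactly when `v < c < 2v`.

```python
def fl_payoffs(value_per_contribution: float, cost: float) -> dict:
    """Map a two-party contribution game onto T, R, P, S (higher is better).

    Hypothetical model: each party that contributes adds `value_per_contribution`
    to the shared model's worth for both parties and pays `cost` itself.
    """
    v, c = value_per_contribution, cost
    return {
        "T": v,          # free-ride on the other party's contribution
        "R": 2 * v - c,  # both contribute
        "P": 0.0,        # nobody contributes
        "S": v - c,      # contribute alone
    }


def is_pd(p: dict) -> bool:
    """Check the PD ordering T > R > P > S."""
    return p["T"] > p["R"] > p["P"] > p["S"]


print(is_pd(fl_payoffs(value_per_contribution=2.0, cost=3.0)))  # True: v < c < 2v
print(is_pd(fl_payoffs(value_per_contribution=2.0, cost=1.0)))  # False: contributing pays for itself
```

Incentive mechanisms can be read as attempts to move a system from the first case to the second, for example by lowering the effective cost of contributing.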

**5. Cooperative AI / AI Safety**
PD as the "minimal model" of alignment between agents.
*   **Foundations of Cooperative AI (AAAI 2023)**. ([AAAI](https://ojs.aaai.org/))
*   **"Collusion" and emergent cooperation (2025 arXiv)**: Self-play learners converging to collusive equilibria.

**6. "AI Changes the Game"**
*   **Discriminatory and Samaritan AIs (2024)**: How AI behaviors shift cooperation outcomes in populations. ([Royal Society](https://royalsocietypublishing.org/))

## Lecture Structure Suggestion

A high-yield way to structure Lecture 1 using these examples:

1.  **One-shot PD**: Dominant strategy defect → Quick in-class vote.
2.  **Iterated PD**: Why "shadow of the future" changes incentives (Axelrod).
3.  **Mechanisms**: Punishment, reputation, partner choice → Jump to MARL papers that instantiate each.
4.  **Modern AI Angles**: 
    *   LLMs in PD as behavioral subjects.
    *   Federated Learning as a system with PD incentives.
    *   Cooperative AI / Safety lens.
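Step 2 of the outline (iterated play) can be made concrete with a small simulation. The sketch below assumes illustrative payoffs 5, 3, 1, 0 and hypothetical helper names. It pits tit-for-tat against itself and against unconditional defection, then computes the standard grim-trigger condition $\delta \ge (T - R)/(T - P)$ for sustaining cooperation in an infinitely repeated game with discount factor $\delta$.

```python
# Illustrative payoffs (higher is better); not taken from the lecture slides.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return total payoffs for two strategies over a fixed number of rounds."""
    moves_a, moves_b, total_a, total_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(moves_b), strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        total_a, total_b = total_a + pay_a, total_b + pay_b
        moves_a.append(a)
        moves_b.append(b)
    return total_a, total_b


def grim_trigger_threshold(t, r, p):
    """Smallest discount factor at which mutual cooperation survives under grim trigger."""
    return (t - r) / (t - p)


print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(grim_trigger_threshold(T, R, P))     # 0.5
```

Tit-for-tat loses its head-to-head match against the defector (9 versus 14) yet earns far more when paired with a cooperator (30), which is the intuition behind the "shadow of the future".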