TheWizardsCode · SorraTheOrc · Dec 4, 2025
diff --git a/.github/workflows/ai-tournament.yml b/.github/workflows/ai-tournament.yml
@@ -0,0 +1,68 @@
+name: AI Tournament
+
+on:
+  schedule:
+    # Run nightly at 2:00 AM UTC
+    - cron: '0 2 * * *'
+  workflow_dispatch:
+    inputs:
+      games:
+        description: 'Number of games to run'
+        required: false
+        default: '100'
+      ticks:
+        description: 'Ticks per game'
+        required: false
+        default: '100'
+
+permissions:
+  contents: read
+
+jobs:
+  tournament:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python 3.12
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+
+      - name: Run AI tournament
+        run: |
+          python scripts/run_ai_tournament.py \
+            --games ${{ github.event.inputs.games || '100' }} \
+            --ticks ${{ github.event.inputs.ticks || '100' }} \
+            --strategies balanced aggressive diplomatic \
+            --seed 42 \
+            --verbose \
+            --output build/tournament-results.json
+
+      - name: Analyze tournament results
+        run: |
+          python scripts/analyze_ai_games.py \
+            --input build/tournament-results.json \
+            --world default \
+            --output build/tournament-analysis.json
+
+      - name: Archive tournament results
+        uses: actions/upload-artifact@v4
+        with:
+          name: tournament-results-${{ github.run_id }}
+          path: |
+            build/tournament-results.json
+            build/tournament-analysis.json
+          retention-days: 90
+
+      - name: Print analysis summary
+        run: |
+          python scripts/analyze_ai_games.py \
+            --input build/tournament-results.json \
+            --world default
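
Note on the workflow above: it pins `--seed 42`, so nightly results should change only when code or content changes, provided the tournament script is fully deterministic for a given seed (this diff does not show that). A minimal Python sketch of the idea follows. `simulate_game`, `run_tournament`, and the `base_seed + index` derivation are hypothetical stand-ins, not the repository's implementation.

```python
import random


def simulate_game(seed: int, ticks: int = 100) -> float:
    """Toy stand-in for one game: returns a final stability value in [0, 1]."""
    rng = random.Random(seed)  # private RNG, so no global random state is shared
    stability = 1.0
    for _ in range(ticks):
        # Small random drift per tick, clamped to the [0, 1] range.
        stability = min(1.0, max(0.0, stability + rng.uniform(-0.02, 0.015)))
    return stability


def run_tournament(base_seed: int, games: int, ticks: int = 100) -> list[float]:
    """Run `games` toy games; each game's seed is derived from the base seed."""
    # Assumption: per-game seed = base seed + game index.
    return [simulate_game(base_seed + index, ticks) for index in range(games)]


first = run_tournament(base_seed=42, games=5)
second = run_tournament(base_seed=42, games=5)
print(first == second)  # True: same base seed, same per-game results
```

If the real script draws randomness from anywhere other than the seeded generator (wall-clock time, worker scheduling order), two nightly runs could differ even on unchanged code, which would be worth confirming before treating artifact diffs as balance regressions.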
diff --git a/.pm/tracker.md b/.pm/tracker.md
@@ -148,7 +148,7 @@
 
 **Recommended Next Tasks:**
 
-1. **9.4.1 - AI Tournaments & Balance Tooling** (Priority: MEDIUM, Effort: High)
+1. **9.4.1 - AI Tournaments & Balance Tooling** (Priority: MEDIUM, Effort: High) - Issue [#49](https://github.com/TheWizardsCode/GEngine/issues/49)
    - Why: Final Phase 9 task; all AI infrastructure complete (observer, actor, hybrid strategy)
    - Owner needed: Gamedev agent
    - Impact: Balance validation and AI testing at scale

diff --git a/README.md b/README.md
@@ -1,10 +1,9 @@
 # GEngine: Echoes of Emergence
 
-A staged simulation project that prototypes the "Echoes of Emergence" CLI + LLM
-experience. The long-term goal is a service-first architecture (simulation
-service, CLI gateway, LLM intent service) designed for Kubernetes. This README
-summarizes the current state of development and the immediate workflows you can
-run locally.
+
+A staged simulation project that prototypes the "Echoes of Emergence" CLI + LLM experience. The long-term goal is a service-first architecture (simulation service, CLI gateway, LLM intent service) designed for Kubernetes. This README summarizes the current state of development and the immediate workflows you can run locally.
+
+**For AI tournament and balance analysis tooling, see [Section 13: AI Tournament & Balance Analysis](docs/gengine/ai_tournament_and_balance_analysis.md).**
 
 ## Current Status (Phases 1–4)
 
@@ -1330,15 +1329,50 @@ service names as hostnames:
 - Gateway → Simulation: `http://simulation:8000`
 - Gateway → LLM: `http://llm:8001`
 
+## AI Tournaments & Balance Tooling
+
+Phase 9 M9.4 provides tournament infrastructure for automated balance testing.
+
+### Running Tournaments
+
+```bash
+# Run 100 games with default strategies
+uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json
+
+# Run with specific strategies and more ticks
+uv run python scripts/run_ai_tournament.py \
+    --games 50 --ticks 200 --strategies balanced aggressive diplomatic --verbose
+```
+
+### Analyzing Results
+
+```bash
+# Analyze tournament results
+uv run python scripts/analyze_ai_games.py --input build/tournament.json
+
+# Compare against authored story seeds
+uv run python scripts/analyze_ai_games.py --input build/tournament.json --world default
+```
+
+The analysis identifies:
+- Win rate deltas between strategies
+- Dominant or underperforming strategies
+- Unused story seeds
+- Overpowered actions
+
+### CI Integration
+
+The `.github/workflows/ai-tournament.yml` workflow runs nightly tournaments
+and archives results. Trigger manual runs via the GitHub Actions UI.
+
+See `docs/gengine/how_to_play_echoes.md` Section 13 for the complete balance
+iteration workflow.
+
 ## Next Steps
 
 1. **Phase 8 – Kubernetes Deployment** – create Kubernetes manifests for local
    minikube deployment, enabling multi-container orchestration and service
    discovery. Docker containerization is complete (see Docker section above).
-2. **Phase 9 M9.4 – AI tournaments and balance tooling** – create tournament
-   scripts that run multiple AI strategies in parallel, aggregate comparative
-   reports (win rates, stability curves, story seed coverage), and identify
-   balance outliers. See the Phase 9 section of the implementation plan.
 
 Progress is tracked in the implementation plan document; update this README as
 new phases land (CLI tooling, services, Kubernetes manifests, etc.).
diff --git a/docs/gengine/ai_tournament_and_balance_analysis.md b/docs/gengine/ai_tournament_and_balance_analysis.md
@@ -0,0 +1,54 @@
+# Section 13: AI Tournament & Balance Analysis
+
+**Last Updated:** 2025-12-03
+
+## Overview
+This section describes how to use the AI tournament and balance analysis tooling introduced in Phase 9. These tools help designers and developers run large batches of AI-driven games in parallel, compare strategy performance, and identify balance issues or underutilized content.
+
+## Running AI Tournaments
+
+The tournament script executes multiple games in parallel, each using a configurable AI strategy (BALANCED, AGGRESSIVE, DIPLOMATIC, HYBRID). Telemetry is captured for each game, and results are aggregated into a single JSON file.
+
+**Example:**
+```bash
+uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json
+```
+- `--games`: Number of games to run (default: 100)
+- `--output`: Path to save the aggregated results
+- Additional flags allow you to specify strategies, seeds, and world configs.
+
+## Analyzing Tournament Results
+
+After running a tournament, use the analysis script to generate comparative reports. This tool surfaces win rate differences, balance anomalies, and unused story seeds.
+
+**Example:**
+```bash
+uv run python scripts/analyze_ai_games.py --input build/tournament.json --output build/analysis.json
+```
+- `--output`: Path to save the analysis output
+
+The report includes:
+- Win rate comparison across strategies
+- Detection of unused story seeds
+- Flagging of balance outliers
+
+## Balance Iteration Workflow
+
+1. Run a tournament with a large number of games and varied strategies.
+2. Analyze the results to identify dominant strategies, underpowered/overpowered actions, and unused content.
+3. Adjust simulation parameters or authored content as needed.
+4. Repeat the process to validate improvements.
+
+## CI Integration
+
+A nightly CI workflow automatically runs tournaments and archives results for ongoing balance review. See `.github/workflows/ai-tournament.yml` for details.
+
+## Usage Tips
+- Use different world configs and seeds to stress-test balance across scenarios.
+- Review the analysis report regularly to guide design iteration.
+- Archived CI artifacts provide a historical record of balance changes.
+
+## See Also
+- [How to Play Echoes](./how_to_play_echoes.md)
+- [Implementation Plan](../simul/emergent_story_game_implementation_plan.md)
+- [README](../../README.md)
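
Note on the balance iteration workflow in the file above: the "repeat to validate improvements" step comes down to comparing per-strategy win rates between two runs. A rough Python sketch follows. It works on plain dictionaries because the JSON layout written by the scripts is not shown in this diff. `win_rate_changes` and `spread` are hypothetical helpers, and the `tuned` numbers are invented purely for illustration.

```python
def win_rate_changes(baseline: dict[str, float], tuned: dict[str, float]) -> dict[str, float]:
    """Return tuned minus baseline (in percentage points) for shared strategies."""
    shared = sorted(set(baseline) & set(tuned))
    return {name: round(tuned[name] - baseline[name], 1) for name in shared}


def spread(win_rates: dict[str, float]) -> float:
    """Gap between the best and worst win rate, in percentage points."""
    return round(max(win_rates.values()) - min(win_rates.values()), 1)


# Baseline figures mirror the sample tournament output in the guide;
# the tuned figures are made up for this illustration.
baseline = {"balanced": 65.0, "aggressive": 72.0, "diplomatic": 58.0}
tuned = {"balanced": 66.0, "aggressive": 68.0, "diplomatic": 62.0}

for name, change in win_rate_changes(baseline, tuned).items():
    print(f"{name:<12} {change:+.1f}")
print(f"spread: {spread(baseline):.1f} -> {spread(tuned):.1f}")
```

A shrinking spread suggests the tuning pulled strategies closer together. A change that only reshuffles which strategy leads would leave the spread roughly the same.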
diff --git a/docs/gengine/how_to_play_echoes.md b/docs/gengine/how_to_play_echoes.md
@@ -1,9 +1,9 @@
 # How to Play Echoes of Emergence
 
-This guide explains how to run the current Echoes of Emergence prototype,
-interpret its outputs, and iterate on the simulation while new systems are
-under construction. It assumes you have cloned the repository and installed all
-runtime/dev dependencies via `uv sync --group dev`.
+
+This guide explains how to run the current Echoes of Emergence prototype, interpret its outputs, and iterate on the simulation while new systems are under construction. It assumes you have cloned the repository and installed all runtime/dev dependencies via `uv sync --group dev`.
+
+**New!** For large-scale AI playtesting and balance iteration, see [Section 13: AI Tournament & Balance Analysis](./ai_tournament_and_balance_analysis.md).
 
 ## 1. Launching the Shell
 
@@ -951,3 +951,170 @@ Post-mortem summary:
 
 The post-mortem is saved alongside the campaign data for later review. Ended
 campaigns can still be resumed if you want to continue playing.
+
+## 13. AI Tournaments & Balance Tooling
+
+The repository includes AI tournament infrastructure for automated balance
+testing and validation. Tournaments run multiple AI players with different
+strategies in parallel, then aggregate results to identify balance anomalies.
+
+### Running Tournaments
+
+Run a tournament with default settings:
+
+```bash
+uv run python scripts/run_ai_tournament.py \
+    --games 100 --ticks 100 --output build/tournament.json
+```
+
+The tournament script supports several options:
+
+| Flag           | Description                                          |
+| -------------- | ---------------------------------------------------- |
+| `--games/-g`   | Total number of games to run (default: 100)          |
+| `--ticks/-t`   | Ticks per game (default: 100)                        |
+| `--strategies` | Strategies to test (balanced, aggressive, diplomatic)|
+| `--seed`       | Base random seed for deterministic runs (default: 42)|
+| `--workers`    | Max parallel workers (default: auto)                 |
+| `--output/-o`  | Path to write JSON results                           |
+| `--verbose/-v` | Print progress during tournament                     |
+
+Example output shows win rates and stability metrics per strategy:
+
+```
+================================================================================
+AI TOURNAMENT RESULTS
+================================================================================
+
+Games: 100/100 completed (0 failed)
+Total duration: 45.2s
+
+Strategy     Win Rate   Avg Stab   Min Stab   Max Stab   Avg Actions
+--------------------------------------------------------------------------------
+balanced        65.0%      0.720      0.450      1.000          5.2
+aggressive      72.0%      0.680      0.380      1.000          8.1
+diplomatic      58.0%      0.750      0.520      1.000          3.4
+--------------------------------------------------------------------------------
+```
+
+### Analyzing Results
+
+After running a tournament, analyze the results for balance insights:
+
+```bash
+uv run python scripts/analyze_ai_games.py \
+    --input build/tournament.json --world default
+```
+
+The analysis script:
+
+- Compares win rates across strategies
+- Flags strategy imbalance (win rate delta of 15-20%) and dominant strategies (delta > 20%)
+- Flags unused or underused story seeds
+- Detects overpowered actions
+- Generates actionable recommendations
+
+Example analysis output:
+
+```
+================================================================================
+AI TOURNAMENT ANALYSIS REPORT
+================================================================================
+
+Tournament: 100 games, 100 ticks each
+Strategies: balanced, aggressive, diplomatic
+
+--------------------------------------------------------------------------------
+WIN RATE ANALYSIS
+--------------------------------------------------------------------------------
+Best strategy: aggressive (72.0%)
+Worst strategy: diplomatic (58.0%)
+Win rate delta: 14.0%
+Balance status: ✓ Balanced
+
+--------------------------------------------------------------------------------
+ACTION ANALYSIS
+--------------------------------------------------------------------------------
+Most used: INSPECT (450 times)
+Least used: NEGOTIATE (120 times)
+
+--------------------------------------------------------------------------------
+RECOMMENDATIONS
+--------------------------------------------------------------------------------
+1. No significant balance issues detected - system appears well-tuned
+================================================================================
+```
+
+### Balance Iteration Workflow
+
+When tuning game balance, follow this workflow:
+
+1. **Run baseline tournament**: Capture initial metrics with `--seed 42` for
+   reproducibility.
+
+   ```bash
+   uv run python scripts/run_ai_tournament.py \
+       --games 100 --output build/baseline.json --seed 42
+   ```
+
+2. **Analyze baseline**: Review strategy balance, action distribution, and seed
+   coverage.
+
+   ```bash
+   uv run python scripts/analyze_ai_games.py \
+       --input build/baseline.json --world default
+   ```
+
+3. **Adjust parameters**: Based on analysis findings, modify config values in
+   `content/config/simulation.yml`:
+
+   - Strategy thresholds affect AI decision-making
+   - Economy settings influence resource pressure
+   - Director pacing controls narrative density
+
+4. **Run comparison tournament**: Use the same seed for deterministic comparison.
+
+   ```bash
+   uv run python scripts/run_ai_tournament.py \
+       --games 100 --output build/tuned.json --seed 42
+   ```
+
+5. **Compare results**: Diff the analysis reports to validate improvements.
+
+   ```bash
+   # Compare win rates between runs
+   uv run python scripts/analyze_ai_games.py --input build/baseline.json --json > /tmp/a.json
+   uv run python scripts/analyze_ai_games.py --input build/tuned.json --json > /tmp/b.json
+   diff /tmp/a.json /tmp/b.json
+   ```
+
+6. **Iterate**: Repeat steps 3-5 until balance metrics fall within acceptable
+   ranges.
+
+### CI Integration
+
+The repository includes a GitHub Actions workflow (`.github/workflows/ai-tournament.yml`)
+that runs nightly tournaments:
+
+- Executes 100 games with all strategies
+- Archives results as artifacts for 90 days
+- Prints analysis summary in the job log
+
+To trigger a manual tournament run, use the GitHub Actions UI and select
+"Run workflow" with optional game/tick counts.
+
+### Interpreting Anomalies
+
+The analysis script flags several types of balance issues:
+
+| Anomaly Type         | Severity | Meaning                                    |
+| -------------------- | -------- | ------------------------------------------ |
+| `dominant_strategy`  | High     | Win rate delta above 20 percentage points  |
+| `strategy_imbalance` | Medium   | Win rate delta of 15-20 percentage points  |
+| `dominant_action`    | Medium   | One action accounts for > 50% of all uses  |
+| `unused_story_seeds` | High/Low | Story seeds never triggered during games   |
+| `low_seed_coverage`  | Medium   | Less than 50% of seeds were activated      |
+| `low_activity`       | Low      | A strategy averages < 1 action per game    |
+
+Recommendations are generated automatically based on detected anomalies. Use
+them as starting points for parameter tuning rather than prescriptive fixes.
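
Note on the anomaly table above: the two win-rate rows can be read as thresholds on the gap between the best and worst strategy. A minimal Python sketch under that reading follows. `classify_win_rate_delta` is a hypothetical helper, not the analysis script's code, and the handling of the exact 15 and 20 boundaries is an assumption because the table does not say which side they fall on.

```python
def classify_win_rate_delta(win_rates: dict[str, float]) -> tuple[float, str]:
    """Return (delta, label) for per-strategy win rates given in percent."""
    if not win_rates:
        raise ValueError("win_rates must not be empty")
    delta = max(win_rates.values()) - min(win_rates.values())
    if delta > 20.0:
        label = "dominant_strategy"
    elif delta >= 15.0:  # assumption: a delta of exactly 15 counts as imbalance
        label = "strategy_imbalance"
    else:
        label = "balanced"
    return delta, label


# Win rates taken from the sample tournament output in the guide.
sample = {"balanced": 65.0, "aggressive": 72.0, "diplomatic": 58.0}
delta, label = classify_win_rate_delta(sample)
print(f"Win rate delta: {delta:.1f}% -> {label}")  # Win rate delta: 14.0% -> balanced
```

This matches the sample analysis report, where a 14.0% delta is shown as balanced.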