Merged
68 changes: 68 additions & 0 deletions .github/workflows/ai-tournament.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
name: AI Tournament

on:
  schedule:
    # Run nightly at 2:00 AM UTC
    - cron: '0 2 * * *'
  workflow_dispatch:
    inputs:
      games:
        description: 'Number of games to run'
        required: false
        default: '100'
      ticks:
        description: 'Ticks per game'
        required: false
        default: '100'

permissions:
  contents: read

jobs:
  tournament:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run AI tournament
        run: |
          python scripts/run_ai_tournament.py \
            --games ${{ github.event.inputs.games || '100' }} \
            --ticks ${{ github.event.inputs.ticks || '100' }} \
            --strategies balanced aggressive diplomatic \
            --seed 42 \
            --verbose \
            --output build/tournament-results.json

      - name: Analyze tournament results
        run: |
          python scripts/analyze_ai_games.py \
            --input build/tournament-results.json \
            --world default \
            --output build/tournament-analysis.json

      - name: Archive tournament results
        uses: actions/upload-artifact@v4
        with:
          name: tournament-results-${{ github.run_id }}
          path: |
            build/tournament-results.json
            build/tournament-analysis.json
          retention-days: 90

      - name: Print analysis summary
        run: |
          python scripts/analyze_ai_games.py \
            --input build/tournament-results.json \
            --world default
2 changes: 1 addition & 1 deletion .pm/tracker.md
@@ -148,7 +148,7 @@

**Recommended Next Tasks:**

1. **9.4.1 - AI Tournaments & Balance Tooling** (Priority: MEDIUM, Effort: High)
1. **9.4.1 - AI Tournaments & Balance Tooling** (Priority: MEDIUM, Effort: High) - Issue [#49](https://github.com/TheWizardsCode/GEngine/issues/49)
- Why: Final Phase 9 task; all AI infrastructure complete (observer, actor, hybrid strategy)
- Owner needed: Gamedev agent
- Impact: Balance validation and AI testing at scale
52 changes: 43 additions & 9 deletions README.md
@@ -1,10 +1,9 @@
# GEngine: Echoes of Emergence

A staged simulation project that prototypes the "Echoes of Emergence" CLI + LLM
experience. The long-term goal is a service-first architecture (simulation
service, CLI gateway, LLM intent service) designed for Kubernetes. This README
summarizes the current state of development and the immediate workflows you can
run locally.

A staged simulation project that prototypes the "Echoes of Emergence" CLI + LLM experience. The long-term goal is a service-first architecture (simulation service, CLI gateway, LLM intent service) designed for Kubernetes. This README summarizes the current state of development and the immediate workflows you can run locally.

**For AI tournament and balance analysis tooling, see [Section 13: AI Tournament & Balance Analysis](docs/gengine/ai_tournament_and_balance_analysis.md).**

## Current Status (Phases 1–4)

@@ -1330,15 +1329,50 @@ service names as hostnames:
- Gateway → Simulation: `http://simulation:8000`
- Gateway → LLM: `http://llm:8001`

## AI Tournaments & Balance Tooling

Phase 9 M9.4 provides tournament infrastructure for automated balance testing:

### Running Tournaments

```bash
# Run 100 games with default strategies
uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json

# Run with specific strategies and more ticks
uv run python scripts/run_ai_tournament.py \
--games 50 --ticks 200 --strategies balanced aggressive diplomatic --verbose
```

### Analyzing Results

```bash
# Analyze tournament results
uv run python scripts/analyze_ai_games.py --input build/tournament.json

# Compare against authored story seeds
uv run python scripts/analyze_ai_games.py --input build/tournament.json --world default
```

The analysis identifies:
- Win rate deltas between strategies
- Dominant or underperforming strategies
- Unused story seeds
- Overpowered actions

### CI Integration

The `.github/workflows/ai-tournament.yml` workflow runs nightly tournaments
and archives results. Trigger manual runs via the GitHub Actions UI.

See `docs/gengine/how_to_play_echoes.md` Section 13 for the complete balance
iteration workflow.

## Next Steps

1. **Phase 8 – Kubernetes Deployment** – create Kubernetes manifests for local
minikube deployment, enabling multi-container orchestration and service
discovery. Docker containerization is complete (see Docker section above).
2. **Phase 9 M9.4 – AI tournaments and balance tooling** – create tournament
scripts that run multiple AI strategies in parallel, aggregate comparative
reports (win rates, stability curves, story seed coverage), and identify
balance outliers. See the Phase 9 section of the implementation plan.

Progress is tracked in the implementation plan document; update this README as
new phases land (CLI tooling, services, Kubernetes manifests, etc.).
54 changes: 54 additions & 0 deletions docs/gengine/ai_tournament_and_balance_analysis.md
@@ -0,0 +1,54 @@
# Section 13: AI Tournament & Balance Analysis

**Last Updated:** 2025-12-03

## Overview
This section describes how to use the AI tournament and balance analysis tooling introduced in Phase 9. These tools help designers and developers run large batches of AI-driven games in parallel, compare strategy performance, and identify balance issues or underutilized content.

## Running AI Tournaments

The tournament script executes multiple games in parallel, each using a configurable AI strategy (BALANCED, AGGRESSIVE, DIPLOMATIC, HYBRID). Telemetry is captured for each game, and results are aggregated into a single JSON file.

**Example:**
```bash
uv run python scripts/run_ai_tournament.py --games 100 --output build/tournament.json
```
- `--games`: Number of games to run (default: 100)
- `--output`: Path to save the aggregated results
- Additional flags allow you to specify strategies, seeds, and world configs.
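
As a rough illustration of how a batch like this can stay reproducible while running in parallel, the sketch below plans one game per slot, rotates through the strategies, and derives each game's seed from a base seed. It is not the code in `scripts/run_ai_tournament.py`: `GameSpec`, `plan_games`, `play_stub`, the round-robin assignment, the `base_seed + i` scheme, and the use of threads are all assumptions made for this example.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

STRATEGIES = ("balanced", "aggressive", "diplomatic")


@dataclass(frozen=True)
class GameSpec:
    """Hypothetical description of one tournament game."""
    game_id: int
    strategy: str
    seed: int


def plan_games(games, strategies=STRATEGIES, base_seed=42):
    """Assign strategies round-robin and give every game its own seed."""
    return [
        GameSpec(i, strategies[i % len(strategies)], base_seed + i)
        for i in range(games)
    ]


def play_stub(spec):
    """Stand-in for a real game: derives a fake stability from the seed."""
    rng = random.Random(spec.seed)
    return {
        "game_id": spec.game_id,
        "strategy": spec.strategy,
        "final_stability": round(rng.random(), 3),
    }


def run_tournament(games, workers=4):
    """Run all planned games on a worker pool, keeping submission order."""
    specs = plan_games(games)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(play_stub, specs))
```

Because each game owns its seed, the results do not depend on which worker picks up which game, so two runs with different worker counts produce the same list.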

## Analyzing Tournament Results

After running a tournament, use the analysis script to generate comparative reports. This tool surfaces win rate differences, balance anomalies, and unused story seeds.

**Example:**
```bash
uv run python scripts/analyze_ai_games.py --input build/tournament.json --output build/analysis.json
```
- `--input`: Path to the tournament results JSON
- `--output`: Path to save the analysis output

The report includes:
- Win rate comparison across strategies
- Detection of unused story seeds
- Flagging of balance outliers
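
Unused-seed detection can be pictured as a set difference between the seeds a world authors and the seeds that fired during the tournament. The sketch below is only an illustration of that idea: `seed_coverage` and the seed names are hypothetical, and the real analysis script may track seeds differently.

```python
def seed_coverage(authored_seeds, triggered_seeds):
    """Return the authored seeds that never fired and the covered fraction."""
    authored = set(authored_seeds)
    used = authored & set(triggered_seeds)
    unused = sorted(authored - used)
    coverage = len(used) / len(authored) if authored else 0.0
    return {"unused": unused, "coverage": coverage}


# Hypothetical seed names, for illustration only.
report = seed_coverage(
    authored_seeds=["blackout", "strike", "festival", "flood"],
    triggered_seeds=["strike", "blackout", "strike"],
)
print(report)  # {'unused': ['festival', 'flood'], 'coverage': 0.5}
```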

## Balance Iteration Workflow

1. Run a tournament with a large number of games and varied strategies.
2. Analyze the results to identify dominant strategies, underpowered/overpowered actions, and unused content.
3. Adjust simulation parameters or authored content as needed.
4. Repeat the process to validate improvements.

## CI Integration

A nightly CI workflow automatically runs tournaments and archives results for ongoing balance review. See `.github/workflows/ai-tournament.yml` for details.

## Usage Tips
- Use different world configs and seeds to stress-test balance across scenarios.
- Review the analysis report regularly to guide design iteration.
- Archived CI artifacts provide a historical record of balance changes.

## See Also
- [How to Play Echoes](./how_to_play_echoes.md)
- [Implementation Plan](../simul/emergent_story_game_implementation_plan.md)
- [README](../../README.md)
175 changes: 171 additions & 4 deletions docs/gengine/how_to_play_echoes.md
@@ -1,9 +1,9 @@
# How to Play Echoes of Emergence

This guide explains how to run the current Echoes of Emergence prototype,
interpret its outputs, and iterate on the simulation while new systems are
under construction. It assumes you have cloned the repository and installed all
runtime/dev dependencies via `uv sync --group dev`.

This guide explains how to run the current Echoes of Emergence prototype, interpret its outputs, and iterate on the simulation while new systems are under construction. It assumes you have cloned the repository and installed all runtime/dev dependencies via `uv sync --group dev`.

**New!** For large-scale AI playtesting and balance iteration, see [Section 13: AI Tournament & Balance Analysis](./ai_tournament_and_balance_analysis.md).

## 1. Launching the Shell

@@ -951,3 +951,170 @@ Post-mortem summary:

The post-mortem is saved alongside the campaign data for later review. Ended
campaigns can still be resumed if you want to continue playing.

## 13. AI Tournaments & Balance Tooling

The repository includes AI tournament infrastructure for automated balance
testing and validation. Tournaments run multiple AI players with different
strategies in parallel, then aggregate results to identify balance anomalies.

### Running Tournaments

Run a tournament with default settings:

```bash
uv run python scripts/run_ai_tournament.py \
--games 100 --ticks 100 --output build/tournament.json
```

The tournament script supports several options:

| Flag | Description |
| -------------- | ---------------------------------------------------- |
| `--games/-g` | Total number of games to run (default: 100) |
| `--ticks/-t` | Ticks per game (default: 100) |
| `--strategies` | Strategies to test (balanced, aggressive, diplomatic)|
| `--seed` | Base random seed for deterministic runs (default: 42)|
| `--workers` | Max parallel workers (default: auto) |
| `--output/-o` | Path to write JSON results |
| `--verbose/-v` | Print progress during tournament |

Example output shows win rates and stability metrics per strategy:

```
================================================================================
AI TOURNAMENT RESULTS
================================================================================

Games: 100/100 completed (0 failed)
Total duration: 45.2s

Strategy Win Rate Avg Stab Min Stab Max Stab Avg Actions
--------------------------------------------------------------------------------
balanced 65.0% 0.720 0.450 1.000 5.2
aggressive 72.0% 0.680 0.380 1.000 8.1
diplomatic 58.0% 0.750 0.520 1.000 3.4
--------------------------------------------------------------------------------
```
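
The columns in that table can be derived from per-game records with a simple group-by. The sketch below assumes each record carries `strategy`, `won`, `final_stability`, and `actions` fields; that schema and the `summarize` helper are hypothetical, since the layout of the tournament JSON is not described here.

```python
from collections import defaultdict
from statistics import mean


def summarize(games):
    """Group game records by strategy and compute the table's columns."""
    by_strategy = defaultdict(list)
    for game in games:
        by_strategy[game["strategy"]].append(game)

    summary = {}
    for name, rows in by_strategy.items():
        stabilities = [row["final_stability"] for row in rows]
        summary[name] = {
            "win_rate": sum(row["won"] for row in rows) / len(rows),
            "avg_stability": mean(stabilities),
            "min_stability": min(stabilities),
            "max_stability": max(stabilities),
            "avg_actions": mean(row["actions"] for row in rows),
        }
    return summary


# Made-up records in the assumed schema.
sample = [
    {"strategy": "balanced", "won": True, "final_stability": 1.0, "actions": 4},
    {"strategy": "balanced", "won": False, "final_stability": 0.5, "actions": 6},
    {"strategy": "aggressive", "won": True, "final_stability": 0.75, "actions": 8},
]
summary = summarize(sample)
```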

### Analyzing Results

After running a tournament, analyze the results for balance insights:

```bash
uv run python scripts/analyze_ai_games.py \
--input build/tournament.json --world default
```

The analysis script:

- Compares win rates across strategies
- Flags strategy imbalance (win rate delta above 15%) and dominant strategies (delta above 20%)
- Flags unused or underused story seeds
- Detects overpowered actions
- Generates actionable recommendations

Example analysis output:

```
================================================================================
AI TOURNAMENT ANALYSIS REPORT
================================================================================

Tournament: 100 games, 100 ticks each
Strategies: balanced, aggressive, diplomatic

--------------------------------------------------------------------------------
WIN RATE ANALYSIS
--------------------------------------------------------------------------------
Best strategy: aggressive (72.0%)
Worst strategy: diplomatic (58.0%)
Win rate delta: 14.0%
Balance status: ✓ Balanced

--------------------------------------------------------------------------------
ACTION ANALYSIS
--------------------------------------------------------------------------------
Most used: INSPECT (450 times)
Least used: NEGOTIATE (120 times)

--------------------------------------------------------------------------------
RECOMMENDATIONS
--------------------------------------------------------------------------------
1. No significant balance issues detected - system appears well-tuned
================================================================================
```
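
The win rate figures in this report follow from simple arithmetic: 72.0 - 58.0 = 14.0, which is under the 15% threshold, so the run is reported as balanced. A minimal sketch of that calculation is shown below; `win_rate_report` is a hypothetical helper, and treating a delta of exactly 15 as balanced is an assumption.

```python
def win_rate_report(win_rates, threshold=15.0):
    """Compare the best and worst strategy by win rate (in percent)."""
    best = max(win_rates, key=win_rates.get)
    worst = min(win_rates, key=win_rates.get)
    delta = win_rates[best] - win_rates[worst]
    return {
        "best": best,
        "worst": worst,
        "delta": delta,
        "balanced": delta <= threshold,
    }


# Win rates taken from the example tournament output.
report = win_rate_report(
    {"balanced": 65.0, "aggressive": 72.0, "diplomatic": 58.0}
)
print(report)
# {'best': 'aggressive', 'worst': 'diplomatic', 'delta': 14.0, 'balanced': True}
```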

### Balance Iteration Workflow

When tuning game balance, follow this workflow:

1. **Run baseline tournament**: Capture initial metrics with `--seed 42` for
reproducibility.

```bash
uv run python scripts/run_ai_tournament.py \
--games 100 --seed 42 --output build/baseline.json
```

2. **Analyze baseline**: Review strategy balance, action distribution, and seed
coverage.

```bash
uv run python scripts/analyze_ai_games.py \
--input build/baseline.json --world default
```

3. **Adjust parameters**: Based on analysis findings, modify config values in
`content/config/simulation.yml`:

- Strategy thresholds affect AI decision-making
- Economy settings influence resource pressure
- Director pacing controls narrative density

4. **Run comparison tournament**: Use the same seed for deterministic comparison.

```bash
uv run python scripts/run_ai_tournament.py \
--games 100 --output build/tuned.json --seed 42
```

5. **Compare results**: Diff the analysis reports to validate improvements.

```bash
# Compare win rates between runs
uv run python scripts/analyze_ai_games.py --input build/baseline.json --json > /tmp/a.json
uv run python scripts/analyze_ai_games.py --input build/tuned.json --json > /tmp/b.json
diff /tmp/a.json /tmp/b.json
```

6. **Iterate**: Repeat steps 3-5 until balance metrics fall within acceptable
ranges.
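
A line-based `diff` of two JSON reports shows every changed line, including formatting noise. When only the metrics matter, a small script can list just the numeric fields that moved. The sketch below makes no assumption about the report schema: it flattens any nested JSON into dotted keys and compares the numbers. The helper names and the sample values are illustrative only.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into {"a.b.0": value} form."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = node
    return flat


def _is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def numeric_changes(baseline, tuned):
    """Map each changed numeric key to (before, after, difference)."""
    before, after = flatten(baseline), flatten(tuned)
    changes = {}
    for key in sorted(before.keys() & after.keys()):
        old, new = before[key], after[key]
        if _is_number(old) and _is_number(new) and old != new:
            changes[key] = (old, new, new - old)
    return changes


def compare_files(baseline_path, tuned_path):
    """Load two JSON reports from disk and compare their numeric fields."""
    with open(baseline_path) as a, open(tuned_path) as b:
        return numeric_changes(json.load(a), json.load(b))


# In-memory example with made-up values.
baseline = {"win_rates": {"aggressive": 72.0, "balanced": 65.0}, "games": 100}
tuned = {"win_rates": {"aggressive": 68.0, "balanced": 65.0}, "games": 100}
changes = numeric_changes(baseline, tuned)
print(changes)  # {'win_rates.aggressive': (72.0, 68.0, -4.0)}
```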

### CI Integration

The repository includes a GitHub Actions workflow (`.github/workflows/ai-tournament.yml`)
that runs nightly tournaments:

- Executes 100 games with all strategies
- Archives results as artifacts for 90 days
- Prints analysis summary in the job log

To trigger a manual tournament run, use the GitHub Actions UI and select
"Run workflow" with optional game/tick counts.

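Manual runs can also be started from a script through GitHub's REST endpoint for `workflow_dispatch` events. The sketch below only builds the request; sending it is left to the caller. The `games` and `ticks` input names come from the workflow file, while the `main` branch name, the token placeholder, and the `build_dispatch_request` helper are assumptions for this example.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, token,
                           games=None, ticks=None):
    """Build a POST request that triggers a workflow_dispatch run."""
    inputs = {}
    if games is not None:
        inputs["games"] = str(games)  # workflow inputs are sent as strings
    if ticks is not None:
        inputs["ticks"] = str(ticks)

    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_dispatch_request(
    "TheWizardsCode", "GEngine", "ai-tournament.yml",
    ref="main",          # assumed default branch
    token="<token>",     # placeholder; supply a real token to send
    games=250, ticks=200,
)
# urllib.request.urlopen(request)  # uncomment to actually trigger the run
```

Omitted inputs fall back to the defaults declared in the workflow (`100` games, `100` ticks).
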
### Interpreting Anomalies

The analysis script flags several types of balance issues:

| Anomaly Type | Severity | Meaning |
| -------------------- | -------- | ------------------------------------------ |
| `dominant_strategy` | High | One strategy's win rate exceeds the others by more than 20% |
| `strategy_imbalance` | Medium | Win rate delta between 15% and 20% |
| `dominant_action` | Medium | One action accounts for > 50% of all uses |
| `unused_story_seeds` | High/Low | Story seeds never triggered during games |
| `low_seed_coverage` | Medium | Less than 50% of seeds were activated |
| `low_activity` | Low | A strategy averages < 1 action per game |

Recommendations are generated automatically based on detected anomalies. Use
them as starting points for parameter tuning rather than prescriptive fixes.
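
The thresholds in the table translate into a handful of comparisons. The sketch below is one possible reading of them, not the logic of `scripts/analyze_ai_games.py`: `detect_anomalies` and its arguments are hypothetical, the strategy checks use the best-minus-worst win rate delta in percent, and `unused_story_seeds` is left out because its severity rule is not spelled out here. It expects at least one strategy in `win_rates`.

```python
def detect_anomalies(win_rates, action_counts, avg_actions,
                     seeds_used, seeds_total):
    """Return (anomaly_type, severity) pairs for the supplied metrics."""
    anomalies = []

    # Strategy balance: compare best and worst win rate (percent).
    delta = max(win_rates.values()) - min(win_rates.values())
    if delta > 20:
        anomalies.append(("dominant_strategy", "high"))
    elif delta >= 15:
        anomalies.append(("strategy_imbalance", "medium"))

    # Action balance: one action taking more than half of all uses.
    total_actions = sum(action_counts.values())
    if total_actions and max(action_counts.values()) / total_actions > 0.5:
        anomalies.append(("dominant_action", "medium"))

    # Seed coverage: fewer than half of the authored seeds activated.
    if seeds_total and seeds_used / seeds_total < 0.5:
        anomalies.append(("low_seed_coverage", "medium"))

    # Activity: any strategy averaging under one action per game.
    if any(avg < 1 for avg in avg_actions.values()):
        anomalies.append(("low_activity", "low"))

    return anomalies


# Made-up metrics that trip every check.
flagged = detect_anomalies(
    win_rates={"balanced": 50.0, "aggressive": 75.0, "diplomatic": 52.0},
    action_counts={"INSPECT": 600, "NEGOTIATE": 300},
    avg_actions={"balanced": 5.2, "diplomatic": 0.4},
    seeds_used=2,
    seeds_total=8,
)
```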