GAMBIT

Formerly known as "Crucible", GAMBIT is an adversarial AI simulation where two agents play 100 rounds of Split or Steal. Through private reflection and experience, they discover deception, trust manipulation, and counter-deception. Nothing is prompted. Everything emerges.

What this is

An adversarial simulation engine for studying emergent deception in LLM agents. Both agents start with identical naive prompts and zero strategic priming. Deceptive behavior develops purely through experience and private reflection. GAMBIT measures how it happens, when it happens, and distills defensive skills from the patterns that emerge.

The security application: AI copilots are entering every enterprise workflow. GAMBIT stress-tests how these agents behave under adversarial pressure and produces deployable countermeasures.

Why this matters for Gradient

GAMBIT turns DigitalOcean Gradient™ AI into an adversarial lab for AI copilots. Teams can point GAMBIT at any Gradient-hosted model, run 100-round stress tests under iterated game-theory pressure, and export defensive prompt modules when deception emerges. This makes it a reusable tool for hardening Gradient-based agents before they’re deployed into real workflows.

Key findings

Metric                     Result
Mutual destruction rate    86%
Cooperation rate           6%
Deception Index            22.9 / 100
First betrayal             Round 6

Round 6 is the inflection point. After five rounds of cooperation, one agent identifies the opponent's trust pattern and exploits it. The opponent develops a theory of mind about the attacker within one round. From there, mutual destruction dominates and trust never recovers.
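The dynamics above follow directly from the payoff structure of Split or Steal. As a reference, here is a minimal sketch of the standard payoff rules (the engine's actual scoring lives in engine/game.py and may differ in details such as pot size):

```python
def payoff(choice_a: str, choice_b: str, pot: float = 100.0) -> tuple[float, float]:
    """Standard Split or Steal payoffs for one round.

    split/split shares the pot, steal/split takes it all,
    and steal/steal destroys it -- the "mutual destruction" outcome.
    """
    if choice_a == "split" and choice_b == "split":
        return pot / 2, pot / 2
    if choice_a == "steal" and choice_b == "split":
        return pot, 0.0
    if choice_a == "split" and choice_b == "steal":
        return 0.0, pot
    return 0.0, 0.0  # steal/steal: both walk away with nothing
```

With these payoffs, an 86% mutual destruction rate means both agents left almost every pot on the table.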

Stack

  • Game engine: DigitalOcean Gradient AI (configurable model, default llama3.3-70b-instruct)
  • Metrics pipeline: Mutual information decay, strategy entropy, exploitation windows, language drift, composite Deception Index
  • Skill distillation: Converts emergent strategy patterns into deployable prompt modules for hardening customer-facing agents
  • Voice rendering: ElevenLabs TTS with emotion-mapped parameters (two distinct agent voices)
  • Observability: Datadog LLM Observability integration
  • Evaluation: Braintrust structured eval logging
  • Frontend: Static HTML dashboard with split-screen agent view, strategy analysis, and skill cards
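To give a flavor of the metrics pipeline, here is a sketch of one component, strategy entropy, computed over a window of an agent's choices (the function name and windowing are illustrative; the real pipeline is in engine/metrics.py):

```python
import math
from collections import Counter

def strategy_entropy(choices: list[str]) -> float:
    """Shannon entropy (in bits) of an agent's split/steal distribution.

    0.0 means fully predictable (always the same choice);
    1.0 means a 50/50 coin flip between split and steal.
    """
    counts = Counter(choices)
    total = len(choices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A collapse in entropy mid-game (e.g. an agent locking into "steal" every round) is one signal the pipeline can combine with mutual information decay and language drift into the composite Deception Index.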

Quick start

Run a new game

pip install -r requirements.txt
cp .env.example .env  # add your API keys (MODEL_ACCESS_KEY required)

# Run 100 rounds
python -m engine.run --rounds 100 --turns 3

# Optional prompt controls:
#   --prompt-mode {balanced_competitive,hard_max,legacy}
#   --psychology-block {on,off}
#   --deception-policy {explicit,implicit,discourage}

# Render voice clips for highlight rounds
python -m engine.voice --rounds auto

# Distill defensive skills from the run
python -m engine.distill

# Evaluate the distilled skill bundle
python -m engine.skill_eval

Explore saved demo data

If you have the data/ folder with pre-run results (JSON + audio), no API keys are needed:

python serve.py
# Main dashboard:      http://localhost:8080/demo/
# Strategy analysis:   http://localhost:8080/demo/analysis.html
# Distilled skills:    http://localhost:8080/demo/skills.html

Compare prompt modes

python scripts/compare_prompt_modes.py --rounds 25 --turns 2

Set up DigitalOcean Gradient Agents

python scripts/setup_gradient.py

Creates two Gradient player agents, a Knowledge Base, guardrails, and a Game Master agent. Requires DO_API_TOKEN env var.

Structure

engine/
  game.py               # Core game loop (conversation, choice, private reflection)
  run.py                # CLI runner
  metrics.py            # Adaptation metrics pipeline (MI decay, entropy, drift)
  distill.py            # Skill distillation (strategy patterns -> prompt modules)
  skill_eval.py         # Evaluation harness for distilled skills
  voice.py              # ElevenLabs voice renderer (emotion-mapped)
  prompt_packager.py    # Prompt mode system (balanced_competitive, hard_max, legacy)
  instrumentation.py    # Datadog LLM Observability integration
shared/
  models.py             # Pydantic models (GameState, RoundState, AgentMemory)
  skills.py             # SkillCard, DistilledSkillBundle models
demo/
  index.html            # Main dashboard (split-screen agent view, audio playback)
  analysis.html         # Strategy deep dive (timeline, entropy curves, MI decay)
  skills.html           # Distilled skill cards UI
scripts/
  setup_gradient.py         # DigitalOcean Gradient AI provisioning
  compare_prompt_modes.py   # Run multiple prompt configs side by side
  clean_latest.py           # Strip artifacts from run JSON
  render_highlights.py      # Generate highlight clips
data/                   # Run outputs (gitignored)
  latest_game.json      # Full game state (conversations, choices, reflections)
  latest_metrics.json   # Computed metrics
  latest_skills.json    # Distilled skill bundle
  audio/                # Per-round voice clips (MP3)
  skills/               # Skill bundles by run ID

DigitalOcean Gradient AI Integration

GAMBIT uses DigitalOcean Gradient Serverless Inference as its LLM backend. Key Gradient features used:

  • Serverless Inference: All LLM calls route through the Gradient Serverless Inference endpoint (https://inference.do-ai.run/v1/) using the OpenAI-compatible API. Default model: llama3.3-70b-instruct. No GPU infrastructure to manage.
  • Gradient Agents: scripts/setup_gradient.py provisions two Gradient Agents (gambit-player-a and gambit-player-b) that represent the two players in the adversarial simulation. A third Game Master agent routes to the player agents.
  • Knowledge Base: A Knowledge Base with game theory content (Nash equilibrium, iterated prisoner's dilemma, tit-for-tat strategies) is created and attached to both player agents, giving them domain context for strategic reasoning.
  • Guardrails: Content Moderation and Jailbreak guardrails are attached to both player agents, ensuring that emergent deceptive behavior stays within safe boundaries and does not produce harmful content.
  • Agent Routing: The Game Master agent uses Gradient’s agent routing capabilities to orchestrate which player agent handles each turn, making the multi-agent setup native to the Gradient platform.
  • Model Flexibility: Switch models by changing the GRADIENT_MODEL environment variable. Compare adversarial resilience across different models without code changes.
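Because the endpoint is OpenAI-compatible, any OpenAI-style client works. As a dependency-free illustration, here is a stdlib-only sketch of building a chat completion request against the Gradient endpoint (the engine's actual client wiring lives in engine/; in practice you would likely use the openai Python client with base_url set to the same URL):

```python
import json
import os
import urllib.request

def build_request(prompt: str) -> urllib.request.Request:
    """Build a chat completion request for Gradient Serverless Inference."""
    body = {
        "model": os.getenv("GRADIENT_MODEL", "llama3.3-70b-instruct"),
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://inference.do-ai.run/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.getenv('MODEL_ACCESS_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# With MODEL_ACCESS_KEY set, send the request and read the reply:
# resp = urllib.request.urlopen(build_request("Round 1: split or steal?"))
# print(json.load(resp)["choices"][0]["message"]["content"])
```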

Environment variables

MODEL_ACCESS_KEY=...         # Required. DigitalOcean Gradient AI access key.
GRADIENT_MODEL=llama3.3-70b-instruct  # Optional. Override model.
ELEVENLABS_API_KEY=...       # Optional. For voice rendering.
DD_API_KEY=...               # Optional. Datadog LLM tracing.
BRAINTRUST_API_KEY=...       # Optional. Structured eval logging.
DO_API_TOKEN=...             # Required for scripts/setup_gradient.py.
DO_PROJECT_ID=...            # DigitalOcean project UUID.
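Since only MODEL_ACCESS_KEY is strictly required, a fail-fast check at startup avoids confusing mid-run errors. A minimal illustrative helper (not part of the project's actual code):

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable, failing with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# required at startup; optional keys fall back to defaults
# model_key = require_env("MODEL_ACCESS_KEY")
# model = os.getenv("GRADIENT_MODEL", "llama3.3-70b-instruct")
```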

License

MIT License. © 2026 Hammad Arifeen. See LICENSE.

Credits

Built by Hammad Arifeen for the DigitalOcean Gradient™ AI Hackathon.
