A multi-agent marathon simulation built with Google ADK and Gemini. AI agents plan a marathon route through Las Vegas, simulate the environment around it (weather, traffic, crowds), and run the race autonomously. Everything talks over the A2A protocol.
Originally demoed at the Google Cloud Next '26 Developer Keynote.
This is the open-source release of the multi-agent simulation we ran at the Google Cloud Next '26 Developer Keynote. It is also a working reference architecture for a few patterns that are hard to study in isolation:
- Cached vs live replay. The frontend can replay NDJSON streams recorded from real agent runs, indistinguishable from a live session. We use it for keynote reliability; you can use it to demo, test UI changes, or teach without paying for LLM calls.
- A deterministic runner variant. `runner_autopilot` makes the same shape of decisions a real LLM-powered runner would, but with zero API calls. It is the right baseline when you want to measure the simulator under load without your bill becoming the experiment.
- A planner ladder. Three variants of the planner (`planner`, `planner_with_eval`, `planner_with_memory`) show what each layer actually adds — eval gating, persistent memory in AlloyDB — as separate agents instead of feature flags.
- The Hub session pattern. A Go gateway routes WebSocket traffic to and from Python ADK agents over A2A, with batching to keep the system from thundering-herding itself when hundreds of runners broadcast on the same tick.
- Backend-driven UI via A2UI. Agents emit UI primitives (cards, route lists, action buttons) over the wire instead of the frontend hard-coding layouts per response shape.
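The cached-replay pattern in the first bullet is simple enough to sketch. Assuming each recorded NDJSON line carries an elapsed-time offset (`"t"`, seconds) and a payload — the repo's actual recording format may differ — a replayer that preserves the original inter-event timing looks roughly like this:

```python
# Hypothetical sketch of NDJSON replay with original timing preserved.
# The {"t": ..., "payload": ...} line format is an assumption for
# illustration, not the repo's actual recording schema.
import json
import time
from typing import Iterator


def replay(path: str, speed: float = 1.0) -> Iterator[dict]:
    """Yield recorded events, sleeping so timing matches the recording."""
    start = time.monotonic()
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            # Wait until this event's recorded offset, scaled by speed.
            due = event["t"] / speed
            delay = due - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
            yield event["payload"]
```

Because timing comes from the recording rather than from live agents, the frontend consuming this stream cannot tell the difference between cached and live sessions, which is the point.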
The README and the in-repo agent skills are organized so you can pull any one of these threads without running the whole thing.
A few things you saw on stage are not in this repo, and that is intentional:
- Some demos used products that are still in private preview. We cut those paths rather than ship code only the preview list could run.
- Some resources (always-on Memorystore at production sizing, GKE autoscaling) cost more than is reasonable for a demo. The default deploy scales compute to zero between runs.
- Some choices in the code reflect a keynote schedule — deadlines measured in days for things you would normally give weeks. Those edges are real. Other choices look strange and are deliberate; the cached-replay mode is the obvious one. If you are not sure which is which, the agent skills (`exploring-the-codebase`) explain the design decisions.
We will fold preview-gated features back in as those products move to public preview.
A multi-agent system on Google Cloud, dressed up as a marathon. The agents split the work like this:
- A Planner designs the race course using Google Maps data, GIS tools, and a bit of financial modeling.
- A Simulator runs the environment tick by tick: weather, traffic, crowds, race progression.
- Runner agents (the NPCs) each decide their own pacing, hydration, and strategy as the race unfolds.
They coordinate over the Agent-to-Agent (A2A) protocol. A Go gateway sits in the middle and routes WebSocket traffic between a 3D Angular/Three.js frontend and the Python agents.
Locally, it runs on Docker Compose (Redis, Pub/Sub emulator, PostgreSQL). In the cloud, it deploys to Cloud Run and Vertex AI Agent Engine.
The fastest way to get Race Condition running is to ask your AI coding assistant to do it. The repo ships an AGENTS.md at the root that any modern AI dev tool will read on entry, plus four detailed skill files under .claude/skills/.
Skills auto-discover. Open the repo and ask:
```
cd race-condition
claude
> Set up this project and get the simulation running
```

Claude will load the `getting-started` skill on its own.
These tools read AGENTS.md automatically but do not auto-discover the skill files. Point your assistant at the relevant skill explicitly:
```
cd race-condition
gemini   # or opencode, or open the folder in Cursor / VS Code
> Read .claude/skills/getting-started/SKILL.md and walk me through it
```

Same pattern for any other task. Swap in the skill that fits:
| Task | Skill to point at |
|---|---|
| First-time local setup | .claude/skills/getting-started/SKILL.md |
| Understanding the architecture and design decisions | .claude/skills/exploring-the-codebase/SKILL.md |
| Deploying to your own GCP project | .claude/skills/deploying/SKILL.md |
| Preparing a contribution (hooks, tests, PR) | .claude/skills/contributing/SKILL.md |
The skill files are plain Markdown with YAML frontmatter. The assistant will walk you through GCP authentication, API enablement, dependency installation, and starting the simulation. Some steps (like gcloud auth login) need you to act in a browser; the assistant will tell you when.
You can also set up manually. See the Quickstart section below.
```mermaid
graph TD
    Frontend["Frontend<br/>Angular + Three.js"]
    Gateway["Gateway<br/>Go · WebSocket · Gin"]
    Planner["Planner Agent<br/>Python · ADK"]
    Simulator["Simulator Agent<br/>Python · ADK"]
    Runners["Runner Agents<br/>Python · ADK"]
    Redis["Redis"]
    PubSub["Pub/Sub"]
    Postgres["PostgreSQL<br/>pgvector"]

    Frontend -->|WebSocket| Gateway
    Gateway -->|A2A| Planner
    Gateway -->|A2A| Simulator
    Gateway -->|A2A| Runners
    Planner ---|A2A| Simulator
    Simulator ---|A2A| Runners
    Gateway --> Redis
    Gateway --> PubSub
    Planner --> Postgres
```
| Component | What it does |
|---|---|
| Gateway | Central WebSocket hub (Go/Gin). Routes requests, manages sessions, bridges frontends with agents via A2A. |
| Planner | Designs marathon routes using GIS data, Google Maps MCP tools, and financial modeling. Three variants: base, with eval (LLM-as-Judge), and with memory (AlloyDB persistence). |
| Simulator | Runs the race as a pipeline: pre-race setup, a tick-based loop engine (up to 200 ticks), and post-race analysis. Spawns and coordinates runner agents. |
| Runners | Individual NPC agents. The LLM-powered variant uses Gemini to make strategic decisions each tick. The autopilot variant is deterministic (no LLM calls). |
| Frontend | Angular 21 + Three.js app rendering a 3D Las Vegas environment with real-time runner positions, weather, and crowd reactions. |
| Infrastructure | Redis (state/pub-sub fanout), Pub/Sub emulator (telemetry streaming), PostgreSQL with pgvector (route memory and embeddings). |
Install these before you start. make check-prereqs will verify them for you.
| Tool | Version | What it's for | Install |
|---|---|---|---|
| Go | 1.25+ | Gateway, admin, and frontend BFF servers | go.dev/dl |
| Python | 3.13+ | AI agents (installed and managed by uv) | python.org |
| uv | latest | Python package manager, virtual env, and task runner | docs.astral.sh/uv |
| Node.js | 24+ | Frontend (Angular), admin dashboard, tester UI | nodejs.org |
| Docker + Compose | latest | Local infrastructure: Redis, Pub/Sub emulator, PostgreSQL | docs.docker.com |
| Google Cloud SDK | latest | gcloud CLI for auth and API enablement | cloud.google.com/sdk |
Compute scales to zero by default (min_instances=0, max_instances=1), so you pay for it only when something is running. The unavoidable fixed cost is around $91/month for Memorystore Redis, Cloud SQL, and Cloud NAT. Each simulation run is roughly $3-4 in Gemini API calls. If you want to develop without burning API credits, use the runner_autopilot variant — it's deterministic and makes zero LLM calls.
The deploy entry point (scripts/deploy.sh and make deploy) prints the same breakdown and waits for confirmation before it provisions anything. Once you confirm, it pre-flights the project (enables the APIs Cloud Build needs and grants the Cloud Build default SA the IAM roles to run Terraform; safe to re-run), then submits the build.
Tear down with cd infra && terraform destroy. If you want to keep services warm and skip cold starts, bump min_instances for the services you care about in infra/terraform.tfvars via the service_sizing map and re-apply.
You need a GCP project with billing enabled where you are an Owner (or have roles/aiplatform.user at minimum). If you just created the project, you're already Owner.
```sh
# Log in and write Application Default Credentials in one step
gcloud auth login --update-adc

# Set your project (replace MY_PROJECT_ID everywhere below)
gcloud config set project MY_PROJECT_ID

# Enable required APIs
gcloud services enable aiplatform.googleapis.com            # Vertex AI (agent LLM calls)
gcloud services enable generativelanguage.googleapis.com    # Gemini API (GIS traffic tool)
gcloud services enable cloudresourcemanager.googleapis.com  # Required by Pub/Sub client
gcloud services enable pubsub.googleapis.com                # Telemetry (emulated locally, but client validates the API)
gcloud services enable iam.googleapis.com                   # ADC token exchange

# Set the quota project so API calls are billed correctly
gcloud auth application-default set-quota-project MY_PROJECT_ID
```

Note: API enablement can take a minute or two to propagate. If you see 403 errors on first start, wait a minute and run `make restart`.
```sh
git clone https://github.com/GoogleCloudPlatform/race-condition.git
cd race-condition

# Install everything and build (checks prereqs, creates .env, installs deps)
make init

# Set your GCP project in .env (this is what the agents actually read)
sed -i '' 's/your-gcp-project-id/MY_PROJECT_ID/g' .env  # macOS
# sed -i 's/your-gcp-project-id/MY_PROJECT_ID/g' .env   # Linux
```

The sed command sets `GOOGLE_CLOUD_PROJECT` and `PROJECT_ID` in `.env`. The agents read their project from this file, not from `gcloud config`.

```sh
make start
```

The frontend opens at http://localhost:9119. The admin dashboard at http://localhost:9100 shows service health.
What `make init` does:

- Checks that Go, Python, uv, Node.js, and Docker are installed.
- Copies `.env.example` to `.env` (if `.env` doesn't exist).
- Installs Python dependencies with `uv sync`.
- Installs and builds the frontend and web UIs.
- Starts Docker infrastructure (Redis, Pub/Sub emulator, PostgreSQL).
- Builds Go services.

What `make start` does:

- Verifies `.env` exists and no services are already running.
- Starts Docker infrastructure.
- Checks that all required ports are free.
- Launches all services via Honcho (13 processes).
- Logs output to `logs/simulation.log`.
Use make stop to shut everything down, or make restart to cycle.
The frontend boots in Cached mode. That is on purpose. When you are presenting to thousands of people on a stage with one network drop, you do not want a live LLM call to be the difference between a clean demo and a long, awkward pause. Cached mode replays NDJSON streams that were recorded from real runs, so the timing is real and the agent output is real, but nothing depends on the network.
Live mode talks to the agents over WebSockets and runs them for real.
Toggle between the two with Ctrl+L, or use the segmented control in the chat panel's Settings dropdown. A small indicator flashes the new mode in the corner.
The simulation ships with nine demos. Each one configures the active agent, the cached recording, and a few UI defaults. Switch between them with Ctrl+<key>:
| Hotkey | Demo | What it shows |
|---|---|---|
| `Ctrl+0` | Sandbox | Held intro shot and a pre-loaded marathon plan. Press `Ctrl+I` to release the camera. |
| `Ctrl+1` | Build agents with Agent Platform | The base planner working on its own. |
| `Ctrl+2` | Creating multi-agent systems | Planner with eval (LLM-as-Judge). |
| `Ctrl+3` | Enhancing agents with memory | Planner backed by AlloyDB route memory. |
| `Ctrl+4` | Debugging at scale | Simulator with deliberate fault injection. |
| `Ctrl+5` / `Ctrl+Shift+5` | Intent to infrastructure with Cloud Assist | Base then upgraded variant. |
| `Ctrl+7` / `Ctrl+Shift+7` | Securing agents | Insecure then secure variant. |
Ctrl+R resets the current demo. Pressing the same demo's hotkey twice does the same thing.
Ctrl+A, Ctrl+S, and Ctrl+D switch between Camera A, B, and C. Heads up: Ctrl+D also toggles the alternative side panels because of a hotkey collision we never split apart. Patches welcome.
Ctrl+I plays the held camera intro (Sandbox uses this). Ctrl+Shift+I skips it.
Sandbox is the playground. It loads `planner_with_memory` and shows the Organizer's "top 3 routes" panel as soon as the page is ready. In cached mode you get three pre-recorded routes you can preview by clicking Show Route. Switch to Live mode and the same panel sends "list the top 3 best routes" to a fresh `planner_with_memory` agent — whatever that agent decides to return is what you see.
If you want to poke at the memory agent ad-hoc without spinning up a full simulation, this is the place to do it.
The Planner agent can use Google Maps MCP tools (`search_places`, `compute_routes`, `lookup_weather`) to design geographically accurate marathon routes. This requires a Google Maps API key. Without one, the planner still works but plans routes without live map data.
Enable these APIs before creating the key; the key restriction dropdown in the console only shows APIs that are already enabled.

```sh
gcloud services enable apikeys.googleapis.com        # API key management
gcloud services enable agentregistry.googleapis.com  # ADK discovers Maps MCP server here
gcloud services enable mapstools.googleapis.com      # Maps MCP server
gcloud services enable places.googleapis.com         # search_places tool
gcloud services enable weather.googleapis.com        # lookup_weather tool
```

Note: API enablement can take a minute or two to propagate. Wait a couple of minutes before creating the key.
- Open the Credentials page in the Google Cloud Console.
- Click Create credentials > API key.
- Copy the key value (you'll need it for Step 3).
- Click Edit API key (or click the key name in the list).
- Under API restrictions, select Restrict key.
- From the dropdown, select these APIs (use the filter box to find them):
- Cloud API Registry API
- Maps Grounding Lite API
- Places API (New)
- Weather API
- Click Save.
Add the key to `.env`:

```sh
GOOGLE_MAPS_API_KEY=AIza...your-key-here
```

Then restart the simulation (`make restart`).
The planner resolves the key in this order:

1. `GOOGLE_MAPS_API_KEY` environment variable (if set and non-empty).
2. Google Cloud Secret Manager: `gcloud secrets versions access latest --secret=maps-api-key --project=$GOOGLE_CLOUD_PROJECT`.
3. If neither is available, Maps tools are disabled and the planner logs a warning.
```
race-condition/
├── agents/                    # Python AI agents (Google ADK)
│   ├── planner/               # Route planning with GIS + Maps MCP
│   ├── planner_with_eval/     # + LLM-as-Judge plan evaluation
│   ├── planner_with_memory/   # + AlloyDB route persistence
│   ├── simulator/             # Race engine (pipeline: setup → ticks → results)
│   ├── simulator_with_failure/ # Fault-injection test variant
│   ├── runner/                # LLM-powered marathon runner (Gemini/Ollama/vLLM)
│   └── runner_autopilot/      # Deterministic runner (no LLM calls)
├── cmd/                       # Go service entry points
│   ├── gateway/               # WebSocket hub + A2A routing
│   ├── admin/                 # Admin dashboard server
│   ├── tester/                # Tester UI server
│   └── frontend/              # Frontend BFF (serves Angular app)
├── internal/                  # Go internal packages
│   ├── hub/                   # Session routing + WebSocket management
│   ├── ecs/                   # Entity-Component-System for simulation state
│   ├── sim/                   # Simulation lifecycle management
│   ├── session/               # Session store (Redis, in-memory)
│   └── agent/                 # A2A client + agent discovery
├── web/                       # Web frontends
│   ├── frontend/              # Angular 21 + Three.js (3D visualization)
│   ├── admin-dash/            # Service health dashboard (Vite)
│   ├── tester/                # Developer testing console (Vite + Tailwind)
│   └── agent-dash/            # Real-time agent debug console (Chart.js)
├── gen_proto/                 # Generated protobuf code (Go + Python, committed)
├── docker-compose.yml         # Redis, Pub/Sub emulator, PostgreSQL (pgvector)
├── Dockerfile                 # Multi-stage build for all services
├── Makefile                   # Build, test, lint, run targets
├── Procfile                   # Service definitions for Honcho
└── pyproject.toml             # Python dependencies (managed by uv)
```
Agents discover each other through agent cards served at /.well-known/agent-card.json. The gateway fetches these cards at startup and routes messages to the right agent based on declared skills.
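The discovery step can be sketched in a few lines. This is a hedged illustration, not the gateway's actual Go code: it assumes the agent card has a `skills` list whose entries carry an `id`, which follows the general A2A agent-card shape but may not match this repo's exact fields.

```python
# Illustrative sketch of skill-based discovery: fetch each agent's card
# at /.well-known/agent-card.json and build a skill-id -> endpoint table.
# Field names ("skills", "id") are assumptions about the card shape.
import json
from urllib.request import urlopen


def index_card(card: dict, base_url: str, routes: dict[str, str]) -> None:
    """Record which endpoint serves each declared skill (first wins)."""
    for skill in card.get("skills", []):
        routes.setdefault(skill["id"], base_url)


def discover(agent_base_urls: list[str]) -> dict[str, str]:
    routes: dict[str, str] = {}
    for base in agent_base_urls:
        with urlopen(f"{base}/.well-known/agent-card.json") as resp:
            index_card(json.load(resp), base, routes)
    return routes
```

With a table like this in hand, routing a message is a dictionary lookup on the skill the message declares.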
The simulator runs a SequentialAgent pipeline:
```mermaid
graph LR
    PreRace["Pre-race<br/>Parse plan, spawn runners"] --> RaceEngine["Race engine<br/>LoopAgent, up to 200 ticks"]
    RaceEngine --> PostRace["Post-race<br/>Compile results"]
```
Each tick, the simulator advances the race clock, updates conditions (weather, traffic, crowd density), and broadcasts state to all runner agents. Runners respond with their decisions (accelerate, brake, hydrate).
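To give a feel for the decision shape, here is a hypothetical deterministic policy in the spirit of `runner_autopilot`. The field names, thresholds, and `"hold"` action are illustrative assumptions, not the repo's actual logic; the point is that the output has the same shape an LLM-powered runner would emit, with zero API calls.

```python
# Hypothetical deterministic runner policy. Same decision shape as an
# LLM runner (accelerate / brake / hydrate), no model call. All
# thresholds below are invented for illustration.
from dataclasses import dataclass


@dataclass
class TickState:
    temperature_c: float
    crowd_density: float    # 0.0 (empty course) to 1.0 (packed)
    distance_done_km: float
    hydration: float        # 0.0 (depleted) to 1.0 (full)


def decide(state: TickState) -> dict:
    """Return a decision dict in the shape an LLM runner would emit."""
    if state.hydration < 0.3:
        return {"action": "hydrate"}
    # Back off in heat or dense crowds; push when fresh and clear.
    if state.temperature_c > 32 or state.crowd_density > 0.8:
        return {"action": "brake"}
    if state.distance_done_km < 30 and state.hydration > 0.6:
        return {"action": "accelerate"}
    return {"action": "hold"}
```

Because the policy is a pure function of tick state, runs are reproducible, which is what makes it a useful load-testing baseline for the simulator itself.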
| Variant | Model | Cost | Use case |
|---|---|---|---|
| `runner` | Gemini 3.1 Flash Lite (default) | Low | LLM-driven strategic decisions per tick |
| `runner` (Ollama) | Gemma 4 (local) | Free | Local development without API costs |
| `runner` (vLLM/GKE) | Gemma 4 on GKE | Self-hosted | Production-scale on Kubernetes |
| `runner_autopilot` | None (deterministic) | Free | Baseline testing, no LLM calls |
Configure the runner model in .env:
```sh
# Gemini (default, requires Vertex AI)
RUNNER_MODEL=gemini-3.1-flash-lite-preview

# Ollama (local, free)
RUNNER_MODEL=ollama_chat/gemma4:e2b

# vLLM on GKE
RUNNER_MODEL=openai/gemma-4-E4B-it
VLLM_API_URL=http://localhost:8080/v1
```

All ports are configured in `.env`. Defaults:
| Service | Port | URL |
|---|---|---|
| Frontend (3D) | 9119 | http://localhost:9119 |
| Admin dashboard | 9100 | http://localhost:9100 |
| Gateway API | 9101 | http://localhost:9101 |
| Tester UI | 9112 | http://localhost:9112 |
| Agent debug console | 9111 | http://localhost:9111 |
| Planner | 9105 | |
| Planner (with eval) | 9106 | |
| Planner (with memory) | 9109 | |
| Simulator | 9104 | |
| Runner | 9108 | |
| Runner (autopilot) | 9110 | |
| Redis | 9102 | |
| Pub/Sub emulator | 9103 | |
| PostgreSQL | 9113 | |
| Target | What it does |
|---|---|
| `make init` | One-time setup: installs deps, creates `.env`, starts infra, builds |
| `make start` | Start all services |
| `make stop` | Stop all services |
| `make restart` | Stop then start |
| `make test` | Run Go + Python + web tests |
| `make build` | Build Go services |
| `make lint` | Run Go + Python linters |
| `make fmt` | Format all code |
| `make coverage` | Generate coverage reports |
| `make eval` | Run agent evaluations (requires Gemini API) |
| `make check-prereqs` | Verify all tools are installed |
In production, Race Condition runs on Cloud Run (gateway, frontend BFF), Vertex AI Agent Engine (Python agents), AlloyDB (route memory, embeddings), Memorystore Redis (sessions), and Pub/Sub (telemetry).
The Dockerfile is multi-stage and covers every service. The local docker-compose.yml mirrors the same topology, which is what makes "works on my laptop" actually mean something here.
If a Cloud Build run fails partway through `tf-apply-services`, the next run may fail at `tf-apply-base` with:

```
Error: cannot destroy service without setting deletion_protection=false
and running `terraform apply`
```
This happens when Terraform state holds Cloud Run service resources from the partial run, but those resources were created before the deletion-protection fix was applied to the local code. The recovery is one-shot: delete the orphan Cloud Run services via gcloud, then re-run the same Cloud Build:
```sh
PROJECT=$(gcloud config get-value project)
REGION=us-central1
for SVC in admin dash frontend gateway runner-autopilot runner-cloudrun tester; do
  gcloud run services delete "$SVC" \
    --project="$PROJECT" --region="$REGION" --quiet || true
done
# Then re-run the bootstrap (Make target or scripts/deploy.sh).
```

GCP-side `deletionProtection` is unset on these services (the block is purely a Terraform-provider-level guard), so the deletes succeed without needing `gcloud run services update --no-deletion-protection` first. After the deletes, Terraform's next apply will refresh state, see the resources are gone, and re-create them cleanly with the new declarations.
```sh
# Run everything
make test

# Just Go
make test-go

# Just Python (skips slow/eval tests)
make test-py

# Just web UIs
make test-web

# Agent evaluations (calls Gemini, costs money)
make eval
```

Python tests run without real GCP credentials. A root `conftest.py` patches `google.auth.default` with mock credentials so agent modules can import and run tests offline.
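A minimal version of that pattern looks like the sketch below. The helper name and stub shape are illustrative, not copied from the repo's `conftest.py`; the idea is just to register a fake `google.auth` whose `default()` hands back mock credentials before any agent module imports it.

```python
# Illustrative sketch: install a stub google.auth module so that
# google.auth.default() returns mock credentials instead of hitting
# real Application Default Credentials. Names here are hypothetical.
import sys
import types
from unittest import mock


def install_fake_google_auth() -> None:
    """Register a stub google.auth whose default() returns mocks."""
    fake_auth = types.ModuleType("google.auth")
    fake_auth.default = lambda *args, **kwargs: (mock.Mock(), "test-project")
    # Ensure a "google" package object exists and point its .auth at the stub.
    google_pkg = sys.modules.setdefault("google", types.ModuleType("google"))
    google_pkg.auth = fake_auth
    sys.modules["google.auth"] = fake_auth
```

In a real pytest setup this would typically run at `conftest.py` import time (or via `monkeypatch` in an autouse fixture) so it takes effect before any agent module is collected.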
We treat this as scaffolding more than a finished product. Plenty of ideas got cut on our way to the keynote stage that we would love to see someone pick up:
- Public safety agents. Medics, traffic management, emergency response. What happens to the runner field when an agent has to reroute the marathon around an incident?
- Local economy agents. Coffee shops, rideshare, vendors reacting to crowd density and weather. Pull the simulation out of pure logistics and into something more like a city.
- Less-cooperative runners. What if some runner agents cheat — hide a shortcut, draft illegally, lie about their hydration? Can the simulator catch them?
- Spectator interaction. Wire a phone, a Twitch chat, or a microphone into the gateway so the crowd can actually cheer for runners. The runner agents already model crowd density as an environmental input; turn that input into something a real human can move, and you have a live multi-agent system the audience is part of.
Those are the ones we still talk about. If you build something interesting, open a PR or just tell us about it on the issue tracker.
Contributions welcome. See CONTRIBUTING.md for the CLA process, code style, and PR guidelines.
The humans who built this, grouped by what they led. Inspired by the all-contributors spec, with role labels that match how the team actually worked.
- Casey West, Nicholas White
- Jason Davenport, Wei Hsia, Lucia Subatin, Jack Wotherspoon, Mofi Rahman
- Olle Kaiser, Klas Kroon
- Lisa Granlund
- Tom Greenaway, Jonatan Vallin, Alex Hsie, Jon Callard
Apache 2.0. See LICENSE.
This is not an officially supported Google product.
