TracePilot — self-debugging ADK/Gemini operator demo

Compact internal hackathon package for the Arize @ Google Cloud Partnerships Hackathon track: a small Google ADK shopping agent, Gemini, OpenInference/Phoenix tracing, Phoenix MCP retrieval, and a thin TracePilot operator loop that diagnoses a real trace and proposes the next better task.

This repo uses a tiny in-memory catalog so you can run locally in minutes (no PyTorch, Pyserini, or multi-gigabyte product downloads). The agent still exposes familiar search / click tools and a shopping-focused system prompt derived from google/adk-samples personalized-shopping.

What it proves

TracePilot is an operator layer above an agent, not another shopping bot. The demo flow is:

Run a real Google ADK/Gemini shopping turn.
Emit OpenInference spans to Phoenix Cloud.
Retrieve the latest trace through @arizeai/phoenix-mcp.
Score whether the agent satisfied the task: trace health, search/click behavior, size constraint, and tool-step explanation.
Write safe before/diagnosis/improvement artifacts with a refined next task.

No credentials are printed or saved in generated reports.
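
The scoring step in the flow above can be sketched as a small deterministic rubric. This is a minimal illustration, not the code in tracepilot_operator.py: the `TraceFacts` fields, the equal 25-point weights, and the string-matching checks are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceFacts:
    """Hypothetical, already-sanitized facts pulled from one trace."""
    span_count: int
    error_count: int
    tool_calls: tuple[str, ...]
    final_answer: str


# Illustrative equal weights; the real rubric's weights are not stated in this README.
WEIGHTS = {
    "trace_health": 25,
    "search_click": 25,
    "size_constraint": 25,
    "tool_explanation": 25,
}


def run_checks(facts: TraceFacts, size: str) -> dict[str, bool]:
    """Evaluate the four rubric areas named in the demo flow."""
    answer = facts.final_answer.lower()
    return {
        "trace_health": facts.span_count > 0 and facts.error_count == 0,
        "search_click": "search" in facts.tool_calls and "click" in facts.tool_calls,
        "size_constraint": f"size {size.lower()}" in answer,
        "tool_explanation": "search" in answer and "click" in answer,
    }


def score(facts: TraceFacts, size: str) -> tuple[int, list[str]]:
    """Return the total score and the names of the failed checks."""
    checks = run_checks(facts, size)
    total = sum(WEIGHTS[name] for name, ok in checks.items() if ok)
    failed = [name for name, ok in checks.items() if not ok]
    return total, failed


facts = TraceFacts(
    span_count=6,
    error_count=0,
    tool_calls=("search",),
    final_answer="I found a floral dress in size M using search.",
)
total, failed = score(facts, "M")
print(total, failed)
```

In this example the agent searched but never clicked into a product page, so the search/click and tool-explanation checks fail. A failed-check list like this is the kind of signal a refined next task can be built from.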

Prerequisites

Python 3.10–3.12
uv
Google auth for Gemini: either GOOGLE_API_KEY or Vertex (gcloud auth application-default login + project/location)
Phoenix Cloud API key (Phoenix)

10-minute quickstart

Clone and install

 cd gemini-hackathon
 cp .env.example .env
 # Edit .env: PHOENIX_API_KEY, PHOENIX_COLLECTOR_ENDPOINT (Hostname with /s/...), and either GOOGLE_API_KEY or Vertex settings.
 uv sync

Run the TracePilot proof/demo package

 make tracepilot-demo MESSAGE='Find a floral dress in size M, then explain which tool steps you used.'

This runs the real proof gate, retrieves Phoenix trace context through MCP, and writes tracepilot_artifacts/<timestamp>/diagnosis.json, demo_report.md, and refined_task.txt.

Fast local re-render of the package (if a proof gate artifact already exists and you do not want another Gemini run)

 make tracepilot-demo-local

Run a single traced shopping turn only

 make run MESSAGE='Find a floral dress in size M'

Open Phoenix — project name defaults to PHOENIX_PROJECT_NAME (gemini-hackathon). Confirm LLM and tool spans appear.
(Optional) ADK CLI

 make run-adk
 # Find a floral dress in size M

This path also loads .env and initializes Phoenix tracing.
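
Before running any of the commands above, it can help to confirm that .env carries the settings the quickstart asks for. The following is a hedged sketch of such a check, not the repo's own preflight. The function name `preflight` is hypothetical, and the Vertex variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` are assumptions, so check .env.example for the names this repo actually uses.

```python
import os
from typing import Mapping

REQUIRED = ("PHOENIX_API_KEY", "PHOENIX_COLLECTOR_ENDPOINT")
# Assumed Vertex variable names; .env.example is the source of truth.
VERTEX_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")


def preflight(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems. Variable values are never included."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    endpoint = env.get("PHOENIX_COLLECTOR_ENDPOINT", "")
    if endpoint and "/s/" not in endpoint:
        problems.append("PHOENIX_COLLECTOR_ENDPOINT has no /s/<space> path")

    has_api_key = bool(env.get("GOOGLE_API_KEY"))
    has_vertex = all(env.get(name) for name in VERTEX_VARS)
    if not (has_api_key or has_vertex):
        problems.append("set GOOGLE_API_KEY or the Vertex project/location variables")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("preflight:", problem)
```

The check reports only variable names and never their values, in line with the rule that no credentials are printed or saved.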

Phoenix MCP (Gemini CLI)

Phoenix MCP runs inside Gemini CLI, not inside the Python ADK process. After traces are flowing from make run, you can inspect the same Phoenix space from the CLI. Setup patterns and clients are covered in Phoenix MCP server.

Configure MCP — Ensure [.gemini/settings.json](.gemini/settings.json) in this repo (or ~/.gemini/settings.json) includes the phoenix server with @arizeai/phoenix-mcp@latest. Set --baseUrl to your Phoenix space hostname (same idea as PHOENIX_COLLECTOR_ENDPOINT: https://app.phoenix.arize.com/s/your-space) and set --apiKey to your Phoenix API key (px_live_...), or keep keys only in env if your CLI supports that pattern.
Export your API key in the shell that launches Gemini CLI (if the MCP server reads it from the environment):

 export PHOENIX_API_KEY=...

Start Gemini CLI from the repo root (or merge the mcpServers block into your global Gemini config). Restart the CLI if you just changed MCP settings.
Agent queries Phoenix via MCP (runtime superpower) — With @arizeai/phoenix-mcp configured, the assistant gets tools over your Phoenix workspace (traces, sessions, experiments, prompts, datasets, and more). Try prompts such as:

“In Phoenix, show me the last 3 traces in my gemini-hackathon project.”
“In Phoenix, summarize my latest experiment results.”
“In Phoenix, create a prompt that classifies user intent.”

Additional ideas (sessions, annotation configs, datasets): Using the Phoenix MCP server.

(Optional) The same file defines Phoenix Docs MCP (phoenix-docs) for in-IDE Phoenix documentation.

More context: Phoenix docs.
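
The MCP configuration described in the first setup step can be sketched as below. This is an illustration rather than a copy of the repo's .gemini/settings.json: the helper name `phoenix_mcp_settings` is hypothetical, and launching the server through `npx -y` is an assumption based on how npm-packaged MCP servers are commonly started. The `--baseUrl` and `--apiKey` flags and the `mcpServers` block come from the text above.

```python
import json


def phoenix_mcp_settings(base_url: str, api_key: str) -> dict:
    """Build a Gemini CLI settings fragment with a single `phoenix` MCP server."""
    return {
        "mcpServers": {
            "phoenix": {
                # Assumption: the server is launched with npx.
                "command": "npx",
                "args": [
                    "-y",
                    "@arizeai/phoenix-mcp@latest",
                    "--baseUrl",
                    base_url,
                    "--apiKey",
                    api_key,
                ],
            }
        }
    }


# Placeholder values only; substitute your own space URL and key locally.
settings = phoenix_mcp_settings(
    "https://app.phoenix.arize.com/s/your-space",
    "px_live_...",
)
print(json.dumps(settings, indent=2))
```

If you generate the file this way, keep the real key out of version control, since the rendered JSON contains it in plain text.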

TracePilot demo commands

# Full real-stack proof + operator package.
make tracepilot-demo MESSAGE='Find a floral dress in size M, then explain which tool steps you used.'

# Existing real proof gate remains available.
MESSAGE='Find a floral dress in size M, then explain which tool steps you used.' proof_gate/run_real_gemini_phoenix_gate.sh

# Rebuild diagnosis/report from an existing proof artifact, no external calls.
make tracepilot-demo-local

Expected proof artifacts:

proof_gate/artifacts/<timestamp>/preflight.txt — local dependency/auth readiness, no secrets.
proof_gate/artifacts/<timestamp>/phoenix_mcp_summary.json — Phoenix MCP retrieval metadata.
proof_gate/artifacts/<timestamp>/phoenix_mcp_get_trace.json — latest trace payload returned by MCP.
tracepilot_artifacts/<timestamp>/diagnosis.json — rubric score, failed checks, trace facts, refined task.
tracepilot_artifacts/<timestamp>/demo_report.md — human-readable before/diagnosis/improvement card.
tracepilot_artifacts/latest/ — copy of the latest TracePilot package.
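
A quick way to confirm that a demo run produced its package is to check for the expected files. The sketch below uses only file names mentioned in this README, and the helper name `missing_artifacts` is hypothetical. It makes no assumption about what the files contain.

```python
from pathlib import Path

# File names taken from the artifact descriptions in this README.
EXPECTED = ("diagnosis.json", "demo_report.md", "refined_task.txt")


def missing_artifacts(package_dir: Path) -> list[str]:
    """Return the expected file names that are absent from package_dir."""
    return [name for name in EXPECTED if not (package_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(Path("tracepilot_artifacts/latest"))
    print("missing:", missing if missing else "none")
```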

Architecture

User task
  -> Google ADK shopping agent (Gemini + search/click tools)
  -> OpenInference instrumentation
  -> Phoenix Cloud traces
  -> Phoenix MCP retrieval
  -> tracepilot_operator.py rubric + diagnosis
  -> safe demo artifacts + refined next task

See also: docs/architecture.mmd for a simple Mermaid diagram covering the Google ADK coordinator, product_selection_agent, purchase_verification_agent, Gemini, OpenInference/Phoenix, Phoenix MCP, and TracePilot operator loop.

The operator script is intentionally small and deterministic. It does not mutate Phoenix, submit anything externally, or store credentials. It reads proof artifacts, extracts safe trace facts, scores the run, and writes a concrete next task that should force better agent behavior.
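
Extracting "safe" facts implies masking anything that looks like a credential before it reaches a report. The following sketch shows one way to do that for JSON-like trace payloads. It is not the operator's actual logic: the field-name pattern and the `px_live_` value pattern are assumptions chosen for illustration.

```python
import re

# Assumed patterns: Phoenix keys written as px_live_..., plus key-like field names.
SECRET_VALUE = re.compile(r"px_live_[A-Za-z0-9_\-]+")
SECRET_FIELD = re.compile(r"(api[_-]?key|authorization|secret|password)", re.IGNORECASE)


def redact(value):
    """Recursively copy JSON-like data, masking secret-looking fields and values."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if SECRET_FIELD.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return SECRET_VALUE.sub("[REDACTED]", value)
    return value


payload = {
    "trace_id": "abc",
    "headers": {"Authorization": "Bearer x"},
    "spans": [{"name": "search", "note": "key px_live_abc123 used"}],
}
print(redact(payload))
```

The function returns a new structure and leaves its input untouched, which matches the read-only role the operator script is described as having.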

Local submission package

An operator-ready local review package is available in submission_package/:

DEVPOST_DRAFT.md — draft public story, still approval-gated.
DEMO_SCRIPT.md — 2–3 minute walkthrough with exact local commands and artifact paths.
PROOF_TABLE.md — verified 71/100 → 71/100 → 100/100 evidence table.
SUBMISSION_CHECKLIST.md — approval, screenshot/video, and secret-hygiene gates.

Suggested internal demo narrative:

Problem: agents often fail quietly: they produce plausible answers without proving they followed constraints or used tools correctly.
Demo: ask the shopping agent for a floral dress in size M and a tool-step explanation.
Observability loop: Phoenix/OpenInference captures the actual ADK/Gemini/tool trace; Phoenix MCP gives TracePilot retrieval access.
Self-debugging moment: TracePilot scores the trace, finds missing behavior such as product inspection or explicit size confirmation, and emits a refined task.
Evidence: show demo_report.md, diagnosis.json, Phoenix trace ID, and MCP summary.

Known caveats:

This is an internal proof slice, not a hosted product.
The refined task is generated locally; the default demo does not automatically run a second Gemini turn unless the operator chooses to run the proof gate again with tracepilot_artifacts/latest/refined_task.txt.
Current verified proof: baseline 71/100 → second turn 71/100 → latest completion-fix run 100/100 (tracepilot_artifacts/latest/before_after_report.md).
External Devpost/GitHub publication remains approval-gated.

Layout

| Path | Purpose |
| --- | --- |
| `README.md` | Demo quickstart, architecture, submission notes |
| `.env.example` | `PHOENIX_`, `GOOGLE_`, optional `GEMINI_MODEL` |
| `.gemini/settings.json` | Phoenix MCP + Phoenix Docs MCP |
| `agent/main.py` | One-shot CLI run with tracing |
| `agent/instrumentation.py` | `phoenix.otel.register(..., auto_instrument=True)` for ADK tracing |
| `agent/shopping_demo/` | ADK `root_agent`, prompt, tools, mini webshop |
| `proof_gate/` | Real-stack preflight, ADK/Gemini/Phoenix/MCP proof gate, generated proof artifacts |
| `tracepilot_operator.py` | Deterministic TracePilot diagnosis/refinement package builder |
| `tracepilot_artifacts/` | Generated safe demo package artifacts |
| `submission_package/` | Local Devpost draft, demo script, proof table, and public-submission checklist |
| `Makefile` | `make setup`, `make run`, `make run-adk`, `make tracepilot-demo`, `make tracepilot-demo-local` |

Upstream credit

Agent structure and prompts are adapted from Google ADK Samples: personalized-shopping (Apache-2.0). Replace agent/shopping_demo/mini_webshop.py with the full WebShop stack when you need the fidelity of the original.

License

Apache-2.0 — see LICENSE.

ADK multi-agent shape

TracePilot now runs as a Google ADK multi-agent app locally:

personalized_shopping_agent is the root ADK coordinator.
product_selection_agent is an ADK specialist for webshop search/product selection.
purchase_verification_agent is an ADK specialist for product-page option verification.

The coordinator still exposes the direct search and click tools used in the latest 100/100 completion-fix proof so the one-turn demo behavior stays stable. The local static/runtime guard is:

PHOENIX_API_KEY= .venv/bin/python proof_gate/check_multi_agent_structure.py

A fresh real Gemini/Phoenix proof rerun is still recommended before public submission because the canonical 100/100 proof was recorded before this multi-agent wiring change.
