Compact internal hackathon package for the Arize @ Google Cloud Partnerships Hackathon track: a small Google ADK shopping agent, Gemini, OpenInference/Phoenix tracing, Phoenix MCP retrieval, and a thin TracePilot operator loop that diagnoses a real trace and proposes a better next task.
This repo uses a tiny in-memory catalog so you can run locally in minutes (no PyTorch, Pyserini, or multi-gigabyte product downloads). The agent still exposes familiar search / click tools and a shopping-focused system prompt derived from google/adk-samples personalized-shopping.
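For readers new to the WebShop-style tool surface, here is a minimal sketch of what a tiny in-memory catalog with search and click tools can look like. The `Product` type, the `CATALOG` contents, and the `search`/`click` signatures are hypothetical and are not the repo's `mini_webshop.py`. ADK can wrap plain Python functions like these as tools.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Product:
    """Hypothetical catalog entry; the repo's mini webshop may differ."""
    asin: str
    title: str
    sizes: tuple[str, ...] = field(default_factory=tuple)


# Hypothetical in-memory catalog used only for this sketch.
CATALOG = [
    Product("B001", "Floral wrap dress", ("S", "M", "L")),
    Product("B002", "Striped linen shirt", ("M", "L")),
    Product("B003", "Floral midi skirt", ("XS", "S")),
]


def search(keywords: str) -> list[str]:
    """Return IDs of products whose titles contain every keyword (case-insensitive)."""
    terms = keywords.lower().split()
    return [p.asin for p in CATALOG if all(t in p.title.lower() for t in terms)]


def click(asin: str) -> dict:
    """Return a product 'page' as a dict, or an error entry for unknown IDs."""
    for p in CATALOG:
        if p.asin == asin:
            return {"asin": p.asin, "title": p.title, "sizes": list(p.sizes)}
    return {"error": f"unknown product {asin}"}
```

Here `search("floral dress")` narrows to one product, and `click` exposes its sizes, which is the step an agent needs before it can honestly confirm size M.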
TracePilot is an operator layer above an agent, not another shopping bot. The demo flow is:
- Run a real Google ADK/Gemini shopping turn.
- Emit OpenInference spans to Phoenix Cloud.
- Retrieve the latest trace through `@arizeai/phoenix-mcp`.
- Score whether the agent satisfied the task: trace health, search/click behavior, the size constraint, and a tool-step explanation.
- Write safe before/diagnosis/improvement artifacts with a refined next task.
No credentials are printed or saved in generated reports.
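The scoring step can be pictured as a deterministic rubric over facts pulled from one trace. This is a minimal sketch under stated assumptions: the `TraceFacts` fields, the check names, and the equal 25-point weights are hypothetical. They are not the rubric in `tracepilot_operator.py` and do not reproduce its verified scores (for example 71/100).

```python
from dataclasses import dataclass


@dataclass
class TraceFacts:
    """Hypothetical facts extracted from one Phoenix trace."""
    has_error_span: bool
    tool_calls: list[str]  # tool names in call order, e.g. ["search", "click"]
    final_answer: str


# Hypothetical rubric: (check name, points). Equal weights are an assumption.
RUBRIC = [("trace_health", 25), ("search_then_click", 25),
          ("size_confirmed", 25), ("explains_tool_steps", 25)]


def run_checks(facts: TraceFacts, size: str) -> dict[str, bool]:
    answer = facts.final_answer.lower()
    calls = facts.tool_calls
    return {
        "trace_health": not facts.has_error_span,
        # A click must come after at least one search.
        "search_then_click": "search" in calls
        and "click" in calls[calls.index("search") + 1:],
        "size_confirmed": f"size {size.lower()}" in answer,
        "explains_tool_steps": "search" in answer and "click" in answer,
    }


def score(facts: TraceFacts, size: str = "M") -> tuple[int, list[str]]:
    """Return (total points, names of failed checks in rubric order)."""
    results = run_checks(facts, size)
    total = sum(points for name, points in RUBRIC if results[name])
    failed = [name for name, _ in RUBRIC if not results[name]]
    return total, failed
```

Because every check is a pure function of the extracted facts, the same trace always yields the same score and the same failed-check list, which is what lets a refined task target exactly the missing behavior.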
- Python 3.10–3.12
- uv
- Google auth for Gemini: either `GOOGLE_API_KEY` or Vertex AI (`gcloud auth application-default login` plus project/location)
- Phoenix Cloud API key (Phoenix)
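A quick preflight can confirm that one Gemini auth path and the Phoenix settings are present before anything runs. This sketch assumes the environment variable names commonly used by ADK and the Google Gen AI SDK (`GOOGLE_API_KEY`, or `GOOGLE_GENAI_USE_VERTEXAI` with `GOOGLE_CLOUD_PROJECT`/`GOOGLE_CLOUD_LOCATION`); check `.env.example` for the exact names this repo expects. It reports only which names are missing, never their values.

```python
import os
from collections.abc import Mapping


def missing_auth(env: Mapping[str, str]) -> list[str]:
    """Return names of missing settings; values are never returned or printed."""
    missing = []
    for name in ("PHOENIX_API_KEY", "PHOENIX_COLLECTOR_ENDPOINT"):
        if not env.get(name):
            missing.append(name)
    uses_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("1", "true")
    if uses_vertex:
        # Vertex AI path: ADC supplies credentials, but project/location are needed.
        for name in ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION"):
            if not env.get(name):
                missing.append(name)
    elif not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY or GOOGLE_GENAI_USE_VERTEXAI=TRUE")
    return missing


if __name__ == "__main__":
    gaps = missing_auth(os.environ)
    print("auth OK" if not gaps else "missing: " + ", ".join(gaps))
```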
1. Clone and install

   ```bash
   cd gemini-hackathon
   cp .env.example .env
   # Edit .env: PHOENIX_API_KEY, PHOENIX_COLLECTOR_ENDPOINT (hostname with /s/...), and either GOOGLE_API_KEY or Vertex settings.
   uv sync
   ```

2. Run the TracePilot proof/demo package

   ```bash
   make tracepilot-demo MESSAGE='Find a floral dress in size M, then explain which tool steps you used.'
   ```

   This runs the real proof gate, retrieves Phoenix trace context through MCP, and writes `tracepilot_artifacts/<timestamp>/diagnosis.json`, `demo_report.md`, and `refined_task.txt`.
3. Fast local re-render of the package, for when a proof gate artifact already exists and you do not want another Gemini run:

   ```bash
   make tracepilot-demo-local
   ```

4. Run a single traced shopping turn only

   ```bash
   make run MESSAGE='Find a floral dress in size M'
   ```

5. Open Phoenix: the project name comes from `PHOENIX_PROJECT_NAME` (default `gemini-hackathon`). Confirm that LLM and tool spans appear.
6. (Optional) ADK CLI

   ```bash
   make run-adk
   # Find a floral dress in size M
   ```

   This path also loads `.env` and initializes Phoenix tracing.
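Each demo run lands in its own `<timestamp>` directory. A minimal sketch for locating the newest run and reading its refined task follows. It assumes every subdirectory of `tracepilot_artifacts/` other than `latest/` is one run; because the timestamp format isn't documented here, it picks by modification time rather than by name. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def newest_run(root: Path = Path("tracepilot_artifacts")) -> Path | None:
    """Return the most recently modified run directory, skipping latest/."""
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir() and p.name != "latest"]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def read_refined_task(run_dir: Path) -> str:
    """Read the refined next task written by the demo."""
    return (run_dir / "refined_task.txt").read_text(encoding="utf-8").strip()
```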
Phoenix MCP runs inside Gemini CLI, not inside the Python ADK process. Once traces are flowing from `make run`, you can inspect the same Phoenix space from the CLI. Setup patterns and supported clients are covered in the Phoenix MCP server documentation.
1. Configure MCP: ensure [`.gemini/settings.json`](.gemini/settings.json) in this repo (or `~/.gemini/settings.json`) includes the `phoenix` server with `@arizeai/phoenix-mcp@latest`. Set `--baseUrl` to your Phoenix space hostname (same idea as `PHOENIX_COLLECTOR_ENDPOINT`: `https://app.phoenix.arize.com/s/your-space`) and set `--apiKey` to your Phoenix API key (`px_live_...`), or keep keys only in env if your CLI supports that pattern.
2. Export your API key in the shell that launches Gemini CLI (if the MCP server reads it from the environment):

   ```bash
   export PHOENIX_API_KEY=...
   ```

3. Start Gemini CLI from the repo root (or merge the `mcpServers` block into your global Gemini config). Restart the CLI if you just changed MCP settings.
4. Let the agent query Phoenix via MCP (runtime superpower): with `@arizeai/phoenix-mcp` configured, the assistant gets tools over your Phoenix workspace (traces, sessions, experiments, prompts, datasets, and more). Try prompts such as:
   - “In Phoenix, show me the last 3 traces in my gemini-hackathon project.”
   - “In Phoenix, summarize my latest experiment results.”
   - “In Phoenix, create a prompt that classifies user intent.”

   For more ideas (sessions, annotation configs, datasets), see Using the Phoenix MCP server.
5. (Optional) The same file defines the Phoenix Docs MCP server (`phoenix-docs`) for in-IDE Phoenix documentation.
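For orientation, this sketch prints an `mcpServers` entry in the shape the Phoenix MCP documentation uses (`npx -y @arizeai/phoenix-mcp@latest --baseUrl ... --apiKey ...`). The space URL and the key are placeholders; the key is deliberately not read from the environment so the output never echoes a real secret. Compare the result with the repo's `.gemini/settings.json` rather than treating it as authoritative.

```python
import json


def phoenix_mcp_block(base_url: str, api_key_placeholder: str = "px_live_...") -> dict:
    """Build an mcpServers entry for @arizeai/phoenix-mcp with a placeholder key."""
    return {
        "mcpServers": {
            "phoenix": {
                "command": "npx",
                "args": [
                    "-y",
                    "@arizeai/phoenix-mcp@latest",
                    "--baseUrl", base_url,
                    "--apiKey", api_key_placeholder,
                ],
            }
        }
    }


if __name__ == "__main__":
    block = phoenix_mcp_block("https://app.phoenix.arize.com/s/your-space")
    print(json.dumps(block, indent=2))
```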
More context: Phoenix docs.
```bash
# Full real-stack proof + operator package.
make tracepilot-demo MESSAGE='Find a floral dress in size M, then explain which tool steps you used.'

# Existing real proof gate remains available.
MESSAGE='Find a floral dress in size M, then explain which tool steps you used.' proof_gate/run_real_gemini_phoenix_gate.sh

# Rebuild diagnosis/report from an existing proof artifact, no external calls.
make tracepilot-demo-local
```

Expected proof artifacts:
- `proof_gate/artifacts/<timestamp>/preflight.txt`: local dependency/auth readiness, no secrets.
- `proof_gate/artifacts/<timestamp>/phoenix_mcp_summary.json`: Phoenix MCP retrieval metadata.
- `proof_gate/artifacts/<timestamp>/phoenix_mcp_get_trace.json`: latest trace payload returned by MCP.
- `tracepilot_artifacts/<timestamp>/diagnosis.json`: rubric score, failed checks, trace facts, refined task.
- `tracepilot_artifacts/<timestamp>/demo_report.md`: human-readable before/diagnosis/improvement card.
- `tracepilot_artifacts/latest/`: copy of the latest TracePilot package.
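After a `make tracepilot-demo` run, a presence check over the files listed above can catch a half-finished package before a demo. The file names come from this README; the function names and the idea of passing one proof run directory and one package directory are hypothetical. It checks only that files exist, not what they contain.

```python
from pathlib import Path

PROOF_FILES = ("preflight.txt", "phoenix_mcp_summary.json", "phoenix_mcp_get_trace.json")
PACKAGE_FILES = ("diagnosis.json", "demo_report.md", "refined_task.txt")


def missing_artifacts(proof_run: Path, package_run: Path) -> list[str]:
    """List expected artifact paths that do not exist (presence check only)."""
    expected = [proof_run / name for name in PROOF_FILES]
    expected += [package_run / name for name in PACKAGE_FILES]
    return [str(p) for p in expected if not p.is_file()]
```

For example, `missing_artifacts(Path("proof_gate/artifacts/<timestamp>"), Path("tracepilot_artifacts/latest"))`, with `<timestamp>` replaced by a real run directory, returns an empty list when the package is complete.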
```text
User task
  -> Google ADK shopping agent (Gemini + search/click tools)
  -> OpenInference instrumentation
  -> Phoenix Cloud traces
  -> Phoenix MCP retrieval
  -> tracepilot_operator.py rubric + diagnosis
  -> safe demo artifacts + refined next task
```
See also: docs/architecture.mmd for a simple Mermaid diagram covering the Google ADK coordinator, product_selection_agent, purchase_verification_agent, Gemini, OpenInference/Phoenix, Phoenix MCP, and TracePilot operator loop.
The operator script is intentionally small and deterministic. It does not mutate Phoenix, submit anything externally, or store credentials. It reads proof artifacts, extracts safe trace facts, scores the run, and writes a concrete next task that should force better agent behavior.
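The "extract safe trace facts" step can be sketched as a pure function over the MCP trace payload. The payload shape assumed here (a `spans` list whose items carry `name`, `status_code`, and `attributes`) is hypothetical, as is the list of credential-like key markers; the real `phoenix_mcp_get_trace.json` layout may differ. What the sketch shows is the pattern: strip anything that looks like a secret, then keep only a small summary.

```python
from typing import Any

# Hypothetical markers for attribute keys that must never reach reports.
# "token" is deliberately absent so token-count attributes survive.
SECRET_MARKERS = ("api_key", "apikey", "authorization", "password", "secret")


def redact(value: Any) -> Any:
    """Recursively drop dict entries whose keys look like credentials."""
    if isinstance(value, dict):
        return {
            k: redact(v)
            for k, v in value.items()
            if not any(m in str(k).lower() for m in SECRET_MARKERS)
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def extract_facts(trace: dict) -> dict:
    """Pull a small, credential-free summary from an assumed trace payload."""
    spans = redact(trace.get("spans", []))
    return {
        "span_count": len(spans),
        "span_names": [s.get("name", "") for s in spans],
        "error_spans": sum(1 for s in spans if s.get("status_code") == "ERROR"),
    }
```

Redacting before summarizing means a later change that copies more fields into a report still cannot leak a key that was already dropped.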
An operator-ready local review package is available in submission_package/:
- `DEVPOST_DRAFT.md`: draft public story, still approval-gated.
- `DEMO_SCRIPT.md`: 2–3 minute walkthrough with exact local commands and artifact paths.
- `PROOF_TABLE.md`: verified 71/100 → 71/100 → 100/100 evidence table.
- `SUBMISSION_CHECKLIST.md`: approval, screenshot/video, and secret-hygiene gates.
Suggested internal demo narrative:
- Problem: agents often fail quietly, producing plausible answers without proving they followed constraints or used tools correctly.
- Demo: ask the shopping agent for a floral dress in size M and a tool-step explanation.
- Observability loop: Phoenix/OpenInference captures the actual ADK/Gemini/tool trace; Phoenix MCP gives TracePilot retrieval access.
- Self-debugging moment: TracePilot scores the trace, finds missing behavior such as product inspection or explicit size confirmation, and emits a refined task.
- Evidence: show `demo_report.md`, `diagnosis.json`, the Phoenix trace ID, and the MCP summary.
Known caveats:
- This is an internal proof slice, not a hosted product.
- The refined task is generated locally; the default demo does not automatically run a second Gemini turn unless the operator chooses to run the proof gate again with `tracepilot_artifacts/latest/refined_task.txt`.
- Current verified proof: baseline `71/100` → second turn `71/100` → latest completion-fix run `100/100` (`tracepilot_artifacts/latest/before_after_report.md`).
- External Devpost/GitHub publication remains approval-gated.
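To close the loop manually, the refined task can be fed back into the proof gate. This sketch assumes, based on the `MESSAGE=... proof_gate/run_real_gemini_phoenix_gate.sh` invocation shown earlier in this README, that the script reads its prompt from the `MESSAGE` environment variable; the helper names are hypothetical. `dry_run` defaults to `True`, so nothing calls Gemini or Phoenix until you opt in.

```python
import os
import subprocess
from pathlib import Path

GATE = "proof_gate/run_real_gemini_phoenix_gate.sh"


def build_rerun(task_file: Path) -> tuple[list[str], dict[str, str]]:
    """Return the command and environment for a proof-gate rerun."""
    message = task_file.read_text(encoding="utf-8").strip()
    env = {**os.environ, "MESSAGE": message}
    return [GATE], env


def rerun(task_file: Path = Path("tracepilot_artifacts/latest/refined_task.txt"),
          dry_run: bool = True) -> int:
    cmd, env = build_rerun(task_file)
    if dry_run:
        # Print only the task text; the inherited environment may hold secrets.
        print(f"would run {cmd[0]} with MESSAGE={env['MESSAGE']!r}")
        return 0
    return subprocess.run(cmd, env=env, check=False).returncode
```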
| Path | Purpose |
|---|---|
| `README.md` | Demo quickstart, architecture, submission notes |
| `.env.example` | `PHOENIX_*`, `GOOGLE_*`, optional `GEMINI_MODEL` |
| `.gemini/settings.json` | Phoenix MCP + Phoenix Docs MCP |
| `agent/main.py` | One-shot CLI run with tracing |
| `agent/instrumentation.py` | `phoenix.otel.register(..., auto_instrument=True)` for ADK tracing |
| `agent/shopping_demo/` | ADK `root_agent`, prompt, tools, mini webshop |
| `proof_gate/` | Real-stack preflight, ADK/Gemini/Phoenix/MCP proof gate, generated proof artifacts |
| `tracepilot_operator.py` | Deterministic TracePilot diagnosis/refinement package builder |
| `tracepilot_artifacts/` | Generated safe demo package artifacts |
| `submission_package/` | Local Devpost draft, demo script, proof table, and public-submission checklist |
| `Makefile` | `make setup`, `make run`, `make run-adk`, `make tracepilot-demo`, `make tracepilot-demo-local` |
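The `agent/instrumentation.py` entry refers to `phoenix.otel.register(..., auto_instrument=True)`. Below is a minimal sketch of such an init, assuming the `arize-phoenix-otel` package and an OpenInference instrumentor for Google ADK are installed, and that `register` picks up `PHOENIX_COLLECTOR_ENDPOINT` and `PHOENIX_API_KEY` from the environment. The `gemini-hackathon` fallback mirrors the project-name default mentioned in the quickstart; the helper names are hypothetical and may not match the repo's file.

```python
import os


def tracing_kwargs() -> dict:
    """Keyword arguments for phoenix.otel.register, resolved from the environment."""
    return {
        "project_name": os.environ.get("PHOENIX_PROJECT_NAME", "gemini-hackathon"),
        # Let any installed OpenInference instrumentors (e.g. for ADK) attach.
        "auto_instrument": True,
    }


def init_tracing():
    """Register a Phoenix tracer provider; imported lazily so config stays testable."""
    from phoenix.otel import register  # provided by arize-phoenix-otel

    return register(**tracing_kwargs())
```

Keeping the kwargs in a separate function lets you check the resolved project name without importing Phoenix or opening an exporter.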
Agent structure and prompts are adapted from Google ADK Samples (personalized-shopping, Apache-2.0). Replace `shopping_demo/mini_webshop.py` with the full WebShop stack when you need the original sample's fidelity.
Apache-2.0 — see LICENSE.
TracePilot now runs as a Google ADK multi-agent app locally:
- `personalized_shopping_agent` is the root ADK coordinator.
- `product_selection_agent` is an ADK specialist for webshop search/product selection.
- `purchase_verification_agent` is an ADK specialist for product-page option verification.
The coordinator still exposes the direct search and click tools used in the latest 100/100 completion-fix proof so the one-turn demo behavior stays stable. The local static/runtime guard is:
```bash
PHOENIX_API_KEY= .venv/bin/python proof_gate/check_multi_agent_structure.py
```

A fresh real Gemini/Phoenix proof rerun is still recommended before public submission, because the canonical 100/100 proof was recorded before this multi-agent wiring change.
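A structural guard in the spirit of `proof_gate/check_multi_agent_structure.py` can run with no network calls at all. This sketch is hypothetical: the real script's checks are not shown in this README, and the tool names `search`/`click` are assumptions. It relies only on the `name`, `sub_agents`, and `tools` attributes that ADK agents expose, duck-typed so it can be exercised with stand-in objects.

```python
EXPECTED_ROOT = "personalized_shopping_agent"
EXPECTED_SPECIALISTS = {"product_selection_agent", "purchase_verification_agent"}
EXPECTED_TOOLS = {"search", "click"}  # hypothetical tool names


def tool_name(tool) -> str:
    # Plain functions expose __name__; ADK tool objects expose .name.
    return getattr(tool, "name", None) or getattr(tool, "__name__", "")


def structure_problems(root) -> list[str]:
    """Return human-readable problems; an empty list means the wiring looks right."""
    problems = []
    if root.name != EXPECTED_ROOT:
        problems.append(f"root agent is {root.name!r}")
    specialists = {a.name for a in getattr(root, "sub_agents", [])}
    for name in sorted(EXPECTED_SPECIALISTS - specialists):
        problems.append(f"missing sub-agent {name}")
    tools = {tool_name(t) for t in getattr(root, "tools", [])}
    for name in sorted(EXPECTED_TOOLS - tools):
        problems.append(f"coordinator lacks direct tool {name}")
    return problems
```

Checking that the coordinator still carries the direct tools matters here because the stable one-turn demo behavior depends on them, independently of the new specialists.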