# Evaluation Runner Notebook
Use this notebook to reproduce the Gemini-backed supervisor run exactly as it appears in our documentation.

## Prerequisites
- Activate the project virtual environment (e.g., `source .venv/bin/activate`).
- Ensure `.env` is populated with `GOOGLE_API_KEY` and any other required configuration.
- From the repository root, install the dependencies into that environment (`pip install -r requirements.txt`).
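
The prerequisites above can be spot-checked before running anything. The following is a minimal sketch, not part of the notebook: `in_virtualenv` and `preflight_warnings` are hypothetical helper names, and the check only confirms that a `.env` file exists, not that it actually contains `GOOGLE_API_KEY`.

```python
import sys
from pathlib import Path
from typing import Mapping


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def preflight_warnings(env_file: Path, environ: Mapping[str, str]) -> list[str]:
    """Collect human-readable warnings for unmet prerequisites (hypothetical helper)."""
    warnings = []
    if not in_virtualenv():
        warnings.append("No virtual environment appears to be active.")
    # Assumption: the key may come from the exported environment or from .env.
    if "GOOGLE_API_KEY" not in environ and not env_file.exists():
        warnings.append("GOOGLE_API_KEY is not exported and no .env file was found.")
    return warnings
```

In a notebook cell this might be called as `preflight_warnings(Path(".env"), os.environ)`; an empty list suggests the basic setup is in place.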

## What the notebook does
1. Sets `PYTHONPATH` to include `src` so ADK imports resolve.
2. Loads key/value pairs from `.env` into the subprocess environment (without overwriting already-exported variables).
3. Invokes `scripts/run_adk_supervisor.py --verbose` with the standard prompt set to capture a full leadership briefing.
4. Prints STDOUT and STDERR so the transcript can be archived or inspected inline.
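
Steps 1 and 2 can be sketched as one pure function, which makes the precedence rules easy to test in isolation. This is an illustrative sketch under stated assumptions: `build_subprocess_env` is a hypothetical name, the quote handling is a simplification (it strips any leading or trailing quote characters), and it prepends `src` to an existing `PYTHONPATH`, which is one way to satisfy step 1.

```python
import os
from pathlib import Path
from typing import Mapping


def build_subprocess_env(
    base: Mapping[str, str], src_dir: Path, dotenv_text: str = ""
) -> dict[str, str]:
    """Return a copy of `base` with src on PYTHONPATH and .env pairs merged in."""
    env = dict(base)

    # Step 1: put src first on PYTHONPATH, keeping any existing entries.
    src = str(src_dir)
    existing = env.get("PYTHONPATH", "")
    parts = [src] + [p for p in existing.split(os.pathsep) if p and p != src]
    env["PYTHONPATH"] = os.pathsep.join(parts)

    # Step 2: merge KEY=VALUE pairs; already-exported variables win.
    for raw in dotenv_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Simplification: drop surrounding quote characters, if any.
        env.setdefault(key.strip(), value.strip().strip("'\""))
    return env
```

Because the function takes the `.env` contents as text, it could be called as `build_subprocess_env(os.environ, REPO_ROOT / "src", env_path.read_text())` once the repository root is known.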

Re-run the code cells below, in order, after quotas reset to collect a fresh transcript for evaluation artifacts.

In [4]:
import os
import subprocess
from pathlib import Path

In [5]:
def find_repo_root(start: Path) -> Path:
    """Walk up from the starting directory until the project root is found."""
    for path in [start, *start.parents]:
        if (path / "README.md").exists() and (path / "src").exists():
            return path
    raise RuntimeError(f"Unable to locate repository root from {start}")


In [6]:
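REPO_ROOT = find_repo_root(Path.cwd())
env = os.environ.copy()

# Prepend src so ADK imports resolve even if PYTHONPATH is already exported.
src_path = str(REPO_ROOT / "src")
env["PYTHONPATH"] = os.pathsep.join(filter(None, [src_path, env.get("PYTHONPATH")]))

# Load .env pairs without overwriting variables that are already exported.
env_path = REPO_ROOT / ".env"
if env_path.exists():
    for line in env_path.read_text().splitlines():
        line = line.strip()  # tolerate indented comments and entries
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env.setdefault(key.strip(), value.strip())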

In [7]:
command = [
    "python",
    "scripts/run_adk_supervisor.py",
    "--verbose",
    "Give me an ops briefing: what happened overnight, what are the top risks, and what should leadership do next?",
    "Investigate prod-app-01 with the default window and summarize key log anomalies.",
    "Provide the utilization stats and risks.",
    "Draft the leadership summary and actions.",
]

In [10]:
result = subprocess.run(command, cwd=REPO_ROOT, env=env, capture_output=True, text=True)
print(result.stdout)


 ### Created new session: debug_session_id

User > Give me an ops briefing: what happened overnight, what are the top risks, and what should leadership do next?
it_ops_supervisor > [Calling tool: transfer_to_agent({'agent_name': 'log_analyst'})]
it_ops_supervisor > [Tool result: {'result': None}]
log_analyst > [Calling tool: fetch_server_logs({'window_minutes': 60})]
log_analyst > [Tool result: {'result': '2025-11-29T04:50:31.133030Z [INFO] prod-app-01: Health check passed\n2025-11-29T04:55:31...]
log_analyst > Overnight, the `prod-app-01` server experienced two critical database connection timeouts at 05:10 and 05:20. This was accompanied by a latency spike detected on the API Gateway at 05:30, and increasing replica lag. CPU utilization was also noted to be approaching a threshold.

The top risks are:
1.  **Service Disruption:** The repeated database connection timeouts and API latency spikes indicate a potential for service degradation or outages.
2.  **Data Inconsistency:** Increa

In [11]:
if result.stderr:
    print("--- STDERR ---")
    print(result.stderr)

--- STDERR ---
App name mismatch detected. The runner is configured with app name "InMemoryRunner", but the root agent was loaded from "/home/shiftmint/Documents/googleCapstone2/.venv/lib/python3.10/site-packages/google/adk/agents", which implies app name "agents".
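
To archive a run rather than only inspect it inline, the captured streams can be written to a timestamped file. This is a minimal sketch under assumptions: `archive_transcript`, the `supervisor_run_` file prefix, and the output directory are all hypothetical, not names used by the project.

```python
from datetime import datetime, timezone
from pathlib import Path


def archive_transcript(stdout: str, stderr: str, out_dir: Path) -> Path:
    """Write STDOUT (and STDERR, if any) to a UTC-timestamped text file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"supervisor_run_{stamp}.txt"  # hypothetical naming scheme

    sections = [stdout.rstrip()]
    if stderr.strip():
        # Mirror the separator the notebook prints before STDERR.
        sections += ["--- STDERR ---", stderr.rstrip()]
    path.write_text("\n".join(sections) + "\n", encoding="utf-8")
    return path
```

In this notebook it might be called as `archive_transcript(result.stdout, result.stderr, REPO_ROOT / "artifacts" / "transcripts")`, where the `artifacts/transcripts` directory is only an example location. Timestamps have one-second resolution, so two runs archived within the same second into the same directory would overwrite each other.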

