[Feature]: Query LiteLLM proxy for per-instance ACP costs

### Problem

ACP agents that don't report costs natively (e.g. Gemini CLI, Codex) currently rely on token-count-based cost estimation using LiteLLM's pricing database. This works, but it is only an approximation: it doesn't account for tiered pricing, cached-token discounts, or pricing changes.

Since all ACP agents route through the LiteLLM proxy (`LLM_BASE_URL`), the proxy already tracks **actual per-request costs** in its spend logs. We should query these instead of estimating.

### Proposed Solution: Virtual key per instance

Use **LiteLLM virtual keys**, one per eval instance. The proxy tracks spend per key automatically, so we get exact costs without any ACP protocol or server changes.

**How it works:**

1. **Before each instance**, create a virtual key via the LiteLLM admin API:
   ```python
   import httpx

   resp = httpx.post(f"{LLM_BASE_URL}/key/generate", headers={"Authorization": f"Bearer {LITELLM_MASTER_KEY}"}, json={
       "metadata": {"instance_id": "django__django-12155", "run_id": "23764348286"},
       "max_budget": 50.0,  # safety limit per instance
   })
   resp.raise_for_status()  # fail loudly if the proxy rejects the request
   virtual_key = resp.json()["key"]
   ```

2. **Pass the virtual key** to the ACP agent as `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` (depending on agent type). No ACP server changes are needed because the servers already read these environment variables.

3. **After the instance completes**, query the actual cost:
   ```python
   resp = httpx.get(f"{LLM_BASE_URL}/key/info", params={"key": virtual_key}, headers={"Authorization": f"Bearer {LITELLM_MASTER_KEY}"})
   resp.raise_for_status()
   actual_cost = resp.json()["info"]["spend"]  # exact USD from proxy
   ```

4. **Store the cost** in the instance output and **delete the virtual key**:
   ```python
   resp = httpx.post(f"{LLM_BASE_URL}/key/delete", headers={"Authorization": f"Bearer {LITELLM_MASTER_KEY}"}, json={"keys": [virtual_key]})
   resp.raise_for_status()  # surface a failed cleanup instead of silently leaking the key
   ```

**Why this works for all agents:**
- Every API call goes through the LiteLLM proxy regardless of agent type
- The proxy calculates exact per-request cost (including tiered pricing and cache discounts)
- Spend is tracked per key, so no metadata injection or header forwarding is needed
- Works for Claude Code, Codex, Gemini CLI, and any future ACP server

### Implementation

**New utility module** (`benchmarks/utils/litellm_proxy.py`):
- `create_virtual_key(instance_id, run_id) -> str`
- `get_key_spend(key) -> float`
- `delete_key(key)`
- Uses `LLM_BASE_URL` (existing) and `LITELLM_MASTER_KEY` (new secret)
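
A minimal sketch of what this module could look like, using the same three admin endpoints as the steps above. The optional `client` parameter, the `_http` and `_admin_headers` helpers, and reading both settings from environment variables are assumptions made for illustration (the `client` hook mainly exists so the functions can be exercised without a live proxy); they are not part of the proposal itself.

```python
import os
from typing import Any, Dict, Optional


def _http(client: Optional[Any]) -> Any:
    """Return the injected client, or fall back to httpx's module-level API."""
    if client is not None:
        return client
    import httpx  # imported lazily so a stand-in client can be used in tests
    return httpx


def _base_url() -> str:
    return os.environ["LLM_BASE_URL"].rstrip("/")


def _admin_headers() -> Dict[str, str]:
    # LITELLM_MASTER_KEY is the new secret proposed in this issue.
    return {"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"}


def create_virtual_key(instance_id: str, run_id: str,
                       max_budget: float = 50.0, client: Any = None) -> str:
    """Create a per-instance virtual key and return its value."""
    resp = _http(client).post(
        f"{_base_url()}/key/generate",
        headers=_admin_headers(),
        json={
            "metadata": {"instance_id": instance_id, "run_id": run_id},
            "max_budget": max_budget,  # safety limit per instance
        },
    )
    resp.raise_for_status()
    return resp.json()["key"]


def get_key_spend(key: str, client: Any = None) -> float:
    """Return the spend the proxy has recorded for this key."""
    resp = _http(client).get(
        f"{_base_url()}/key/info",
        params={"key": key},
        headers=_admin_headers(),
    )
    resp.raise_for_status()
    return float(resp.json()["info"]["spend"] or 0.0)


def delete_key(key: str, client: Any = None) -> None:
    """Delete the virtual key once its spend has been read."""
    resp = _http(client).post(
        f"{_base_url()}/key/delete",
        headers=_admin_headers(),
        json={"keys": [key]},
    )
    resp.raise_for_status()
```

With no `client` argument the functions behave like the inline snippets in the proposal; retries, timeouts, and logging are left out of this sketch.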

**Benchmark harness changes** (per-benchmark `run_infer.py`):
- Before instance: create virtual key
- Pass virtual key to agent instead of shared `LLM_API_KEY`
- After instance: query spend, store in output, delete key
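
One way to keep the create / query / delete sequence together in the harness is a context manager, so the key is removed even when the agent run raises. The sketch below is only an illustration: `per_instance_key`, its callable parameters, and the `proxy_cost_usd` output field are hypothetical names, and the callables stand in for the utility functions listed above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, MutableMapping


@contextmanager
def per_instance_key(
    create_key: Callable[[], str],
    get_spend: Callable[[str], float],
    delete: Callable[[str], None],
    output: MutableMapping[str, object],
    cost_field: str = "proxy_cost_usd",  # hypothetical output field name
) -> Iterator[str]:
    """Yield a fresh virtual key; record its spend and delete it on exit."""
    key = create_key()
    try:
        # The caller runs the agent with this key instead of the shared LLM_API_KEY.
        yield key
    finally:
        try:
            # Store whatever the proxy has recorded, even if the run failed.
            output[cost_field] = get_spend(key)
        finally:
            # Always clean up so keys do not accumulate on the proxy.
            delete(key)
```

In `run_infer.py` this could wrap a single instance, for example with `create_key=lambda: create_virtual_key(instance_id, run_id)`, `get_spend=get_key_spend`, and `delete=delete_key`. If the proxy writes spend to its database asynchronously, a short wait or retry before reading may be needed; that is not handled here.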

**ACP env forwarding** (`benchmarks/utils/acp.py`):
- Override `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` with the virtual key
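
The override itself can be a small pure function over the environment mapping. This is a sketch under assumptions: `build_agent_env` and its `key_vars` parameter are hypothetical, and the default variable names are just the two mentioned in this issue; other agents may read different variables.

```python
from typing import Dict, Iterable, Mapping


def build_agent_env(
    base_env: Mapping[str, str],
    virtual_key: str,
    key_vars: Iterable[str] = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY"),
) -> Dict[str, str]:
    """Return a copy of base_env with the API key variables set to virtual_key."""
    env = dict(base_env)  # copy, so the harness's own environment is left untouched
    for name in key_vars:
        env[name] = virtual_key
    return env
```

Returning a new dict rather than mutating `os.environ` keeps concurrently running instances from seeing each other's keys.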

**New infrastructure**:
- `LITELLM_MASTER_KEY` secret in eval K8s jobs
- LiteLLM proxy must have spend tracking enabled (database backend)

### Additional Context

- Gemini CLI doesn't report costs via the ACP protocol: google-gemini/gemini-cli#24280
- Token-based estimation implemented in `ACPAgent._record_usage()` as a stopgap
- LiteLLM virtual key docs: https://docs.litellm.ai/docs/proxy/virtual_keys
- LiteLLM spend tracking docs: https://docs.litellm.ai/docs/proxy/cost_tracking
