
[Feature]: Query LiteLLM proxy for per-instance ACP costs #592

@simonrosenberg


Problem

ACP agents that don't report costs natively (e.g. Gemini CLI, Codex) currently rely on token-count-based cost estimation using LiteLLM's pricing database. This works but is an approximation — it doesn't account for tiered pricing, cached token discounts, or pricing changes.

Since all ACP agents route through the LiteLLM proxy (LLM_BASE_URL), the proxy already tracks actual per-request costs in its spend logs. We should query these instead of estimating.

Proposed Solution: Virtual key per instance

Use LiteLLM virtual keys — one per eval instance. The proxy tracks spend per key automatically, so we get exact costs without any ACP protocol or server changes.

How it works:

  1. Before each instance, create a virtual key via the LiteLLM admin API:

    resp = httpx.post(
        f"{LLM_BASE_URL}/key/generate",
        headers={"Authorization": f"Bearer {LITELLM_MASTER_KEY}"},
        json={
            "metadata": {"instance_id": "django__django-12155", "run_id": "23764348286"},
            "max_budget": 50.0,  # safety limit per instance
        },
    )
    virtual_key = resp.json()["key"]

  2. Pass the virtual key to the ACP agent as ANTHROPIC_API_KEY / OPENAI_API_KEY (depending on agent type). No ACP server changes needed — they already read these env vars.

  3. After instance completes, query actual cost:

    resp = httpx.get(
        f"{LLM_BASE_URL}/key/info",
        params={"key": virtual_key},
        headers={"Authorization": f"Bearer {LITELLM_MASTER_KEY}"},
    )
    actual_cost = resp.json()["info"]["spend"]  # exact USD from proxy

  4. Store the cost in the instance output and delete the virtual key:

    httpx.post(
        f"{LLM_BASE_URL}/key/delete",
        headers={"Authorization": f"Bearer {LITELLM_MASTER_KEY}"},
        json={"keys": [virtual_key]},
    )

Why this works for all agents:

  • Every API call goes through the LiteLLM proxy regardless of agent type
  • The proxy calculates exact per-request cost (including tiered pricing, cache discounts)
  • Spend is tracked per key — no metadata injection or header forwarding needed
  • Works for Claude Code, Codex, Gemini CLI, and any future ACP server

Implementation

New utility module (benchmarks/utils/litellm_proxy.py):

  • create_virtual_key(instance_id, run_id) -> str
  • get_key_spend(key) -> float
  • delete_key(key)
  • Uses LLM_BASE_URL (existing) and LITELLM_MASTER_KEY (new secret)

Benchmarks harness changes (per-benchmark run_infer.py):

  • Before instance: create virtual key
  • Pass virtual key to agent instead of shared LLM_API_KEY
  • After instance: query spend, store in output, delete key
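The per-instance flow above can be sketched as a single helper. Here `create_key`, `get_spend`, and `delete_key` stand in for the proposed `litellm_proxy` utilities, and `run_agent` is a hypothetical stand-in for the existing agent invocation; passing them as parameters keeps the sketch self-contained. A `try`/`finally` guarantees the key is deleted even if the instance fails:

```python
def run_instance_with_exact_cost(instance_id, run_id,
                                 create_key, get_spend, delete_key, run_agent):
    """Sketch of the harness flow: create key, run agent, record exact spend."""
    virtual_key = create_key(instance_id, run_id)
    output = {"instance_id": instance_id}
    try:
        output["result"] = run_agent(virtual_key)  # agent makes all LLM calls with this key
        output["cost"] = get_spend(virtual_key)    # exact USD from proxy spend logs
    finally:
        delete_key(virtual_key)                    # always clean up the key
    return output
```

One design note: querying spend before deletion matters, since the proxy's `/key/info` lookup needs the key to still exist.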

ACP env forwarding (benchmarks/utils/acp.py):

  • Override ANTHROPIC_API_KEY / OPENAI_API_KEY with the virtual key
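The override itself is small; a sketch, where both the function name and the agent-type mapping are assumptions (the actual dispatch lives in benchmarks/utils/acp.py):

```python
def agent_env_with_virtual_key(base_env: dict, agent_type: str, virtual_key: str) -> dict:
    """Return a copy of the env with the provider API key replaced by the virtual key."""
    env = dict(base_env)
    # Assumed mapping: Claude Code reads ANTHROPIC_API_KEY; other ACP
    # agents here (Codex, Gemini CLI via OpenAI-compatible routing) read
    # OPENAI_API_KEY. Both point at the same LiteLLM proxy either way.
    var = "ANTHROPIC_API_KEY" if agent_type == "claude-code" else "OPENAI_API_KEY"
    env[var] = virtual_key
    return env
```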

New infrastructure:

  • LITELLM_MASTER_KEY secret in eval K8s jobs
  • LiteLLM proxy must have spend tracking enabled (database backend)
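For reference, per LiteLLM's documented proxy configuration, spend tracking requires a master key and a database backend; in the proxy's config.yaml this is roughly (environment variable names here are assumptions):

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY   # admin key used by the /key/* endpoints
  database_url: os.environ/DATABASE_URL       # Postgres backend enables spend logs
```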
