Motivation
There is currently no supported way to observe per-LLM-call usage or cost from outside the core. The only place spend is materialized is the session store, and the session store only records a sub-session when it completes successfully.
When a sub-session (delegated agent / skill / background agent) fails, its turns are dropped from persistence:
In pkg/runtime/agent_delegation.go, runForwarding returns early on the first ErrorEvent (~line 289), so parent.AddSubSession(s) and SubSessionCompletedEvent are never reached.
runCollecting has the same shape (returns on errMsg != "" before AddSubSession).
The persistence observer only writes a sub-session in response to SubSessionCompletedEvent, so a failed sub-session's turns never reach the DB.
This is reasonable for the session DB (a session is resumable conversation state, and a failed sub-session isn't part of it). But the API calls were already billed. The cost the session reports can be far below the actual provider invoice.
Concretely: in a real run against Vertex AI with several failing sub-sessions, the actual invoice was ~9× the cost shown by the session. Each failed sub-session made multiple billed model calls that never appeared in any total. This makes spend impossible to reconcile and is a real safety concern when running against a metered provider.
The root cause is that cost recording is coupled to session lifecycle (SubSessionCompletedEvent). The cleanest place to expose spend is the layer where it actually happens — once per LLM call. executeAfterLLMCallHooks already fires inside the shared turn loop (pkg/runtime/loop.go:572) on every successful model call, for sub-sessions too, and before the sub-session failure handling in agent_delegation.go. A per-turn payload with usage and cost therefore captures failed sub-session spend automatically, with no coupling to session completion.
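The gap between the two accounting strategies can be illustrated with a toy model. This is only a sketch: the types and functions below (call, subSession, lifecycleTotal, perCallTotal, demoSubs) are hypothetical and do not mirror the real runtime code. It assumes each model call carries a known cost and that a failed sub-session simply has a non-nil error.

```go
package main

import (
	"errors"
	"fmt"
)

// call is one billed model call (hypothetical type for this sketch).
type call struct {
	costUSD float64
}

// subSession is a simplified stand-in for a delegated run: the billed calls
// it made plus the error, if any, that ended it.
type subSession struct {
	calls []call
	err   error
}

// lifecycleTotal models cost recording tied to completion: a sub-session
// contributes only if it finished without error.
func lifecycleTotal(subs []subSession) float64 {
	total := 0.0
	for _, s := range subs {
		if s.err != nil {
			continue // failed run: its turns are never persisted
		}
		for _, c := range s.calls {
			total += c.costUSD
		}
	}
	return total
}

// perCallTotal models a hook that fires after every model call, regardless
// of how the enclosing sub-session ends.
func perCallTotal(subs []subSession) float64 {
	total := 0.0
	for _, s := range subs {
		for _, c := range s.calls {
			total += c.costUSD
		}
	}
	return total
}

// demoSubs returns one successful and two failed sub-sessions.
func demoSubs() []subSession {
	failed := errors.New("sub-session failed")
	return []subSession{
		{calls: []call{{0.25}}},
		{calls: []call{{0.5}, {0.5}}, err: failed},
		{calls: []call{{0.75}}, err: failed},
	}
}

func main() {
	subs := demoSubs()
	fmt.Printf("lifecycle-coupled total: $%.2f\n", lifecycleTotal(subs))
	fmt.Printf("per-call hook total:     $%.2f\n", perCallTotal(subs))
}
```

In this model the lifecycle-coupled total sees only the successful sub-session, while the per-call total includes every billed call, which is the property the proposal relies on.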
Use Cases
Spend reconciliation / cost ledger — a sidecar (agent YAML + a shell / Node / Python script) appends every billed turn to its own store, so the total matches the provider invoice even when sub-sessions fail.
Budget guard — a handler that warns or stops a run once cumulative cost crosses a threshold.
Alerting / telemetry — forward per-turn usage and cost to an external monitoring system, independent of session persistence.
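A ledger with a budget guard could look roughly like the sketch below. It rests on several assumptions that the issue does not settle: that the handler receives one JSON payload per line on stdin, that the payload exposes model_id and a numeric cost field, and that a fixed budget of $1.00 is wanted. The turn and ledger types and the record method are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// turn is an assumed subset of the widened after_llm_call payload.
// The final schema may differ, for example by making cost nullable.
type turn struct {
	ModelID string  `json:"model_id"`
	Cost    float64 `json:"cost"`
}

// ledger accumulates the cost of every billed turn it is shown.
type ledger struct {
	totalUSD float64
	turns    int
}

// record decodes one payload, adds its cost to the running total, and
// returns an error once the total exceeds budgetUSD.
func (l *ledger) record(payload []byte, budgetUSD float64) error {
	var t turn
	if err := json.Unmarshal(payload, &t); err != nil {
		return fmt.Errorf("decode turn: %w", err)
	}
	l.turns++
	l.totalUSD += t.Cost
	if l.totalUSD > budgetUSD {
		return fmt.Errorf("budget exceeded: $%.2f > $%.2f after %d turns",
			l.totalUSD, budgetUSD, l.turns)
	}
	return nil
}

func main() {
	const budgetUSD = 1.00 // illustrative threshold
	var l ledger
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue // skip blank lines
		}
		if err := l.record(sc.Bytes(), budgetUSD); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	fmt.Printf("%d turns, $%.2f total\n", l.turns, l.totalUSD)
}
```

Because the ledger only needs the per-turn payload, the same logic could be written in any language the hook mechanism can launch.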
Proposed Solution
Widen the after_llm_call payload at executeAfterLLMCallHooks to include usage and cost. The data already exists locally at the call site in the turn loop; this is plumbing, not new computation. With this primitive, a cost ledger lives entirely outside the core — no new core subsystem and no new storage owned by the core.
Because this widens a public, irreversible payload contract, the schema shape should be settled first. Open questions:
Unpriced vs free. A flat cost: 0 can't distinguish a free call from a model with no pricing. Prefer a structural signal (e.g. nullable cost = unpriced) over a documented "check usage != nil" convention.
Token field naming. Nesting under usage avoids collision with the existing compaction-related token fields — is this the preferred convention?
Coverage. Main runtime and harness paths are straightforward; compaction sub-runtimes and the chatserver / a2a / acp paths likely need follow-up. Should the initial change scope to the main loop?
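To make the first two questions concrete, here is one possible shape, offered as a sketch rather than the agreed schema. The top-level names model_id, usage, and cost come from this proposal; the token field names inside usage and the Go type names (usage, afterLLMCall, encode, priced) are assumptions. A pointer-typed cost serializes as null when no pricing is known, so a free call (0) and an unpriced call (null) stay distinguishable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage groups the four token counts named in the proposal under one nested
// object. The JSON field names here are assumptions.
type usage struct {
	InputTokens       int64 `json:"input_tokens"`
	CachedInputTokens int64 `json:"cached_input_tokens"`
	CacheWriteTokens  int64 `json:"cache_write_tokens"`
	OutputTokens      int64 `json:"output_tokens"`
}

// afterLLMCall is a hypothetical subset of the widened hook payload.
type afterLLMCall struct {
	ModelID string   `json:"model_id"`
	Usage   *usage   `json:"usage"`
	Cost    *float64 `json:"cost"` // nil encodes as null, meaning unpriced
}

// encode marshals a payload to its JSON text.
func encode(p afterLLMCall) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// priced wraps a known cost so it can be stored in the pointer field.
func priced(v float64) *float64 { return &v }

func main() {
	u := &usage{InputTokens: 1200, OutputTokens: 80}
	// A free call: pricing is known and the cost is zero.
	fmt.Println(encode(afterLLMCall{ModelID: "free-model", Usage: u, Cost: priced(0)}))
	// An unpriced call: no pricing data, so cost is null.
	fmt.Println(encode(afterLLMCall{ModelID: "unpriced-model", Usage: u}))
}
```

Nesting the token counts under usage keeps them out of the top-level namespace, which is where the existing flat token fields live.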
I have a working implementation of the payload widening plus an example sidecar ledger, and am happy to open a PR once the schema is agreed.
Alternatives
Internal Observer + built-in cost ledger (a new pkg/costlog package with its own SQLite store). This works and needs no public API change, but it makes the core own a new subsystem, storage format, and config surface, and is Go-only. The hook approach keeps the core to a primitive and lets the ledger be any language. (I have prototyped both.)
Persist failed sub-sessions into the session DB. This conflicts with the intended meaning of the session store as resumable state, so it's a poorer fit.
Additional Context
Line references are against main at the time of writing and may drift. The proposal originally also set out to reconcile the existing doc that says after_llm_call populates the model id, which at the time it did not. (Edit: that was fixed in #2911; model_id is already populated. The remaining scope of this issue is usage + cost.)
Overview
Add per-turn token usage and cost to the after_llm_call hook payload. Today the payload does not include them. This proposal adds, additively:
usage: the per-call chat.Usage (input / cached input / cache write / output tokens), nested to avoid colliding with existing flat token fields
cost: the per-call cost in USD
model_id: the model actually used for this call (edit: already landed in "fix(runtime): populate ModelID in after_llm_call hook payload", #2911; remaining scope is usage + cost only)