
feat(llm-gateway): emit $ai_input_cost_usd and $ai_output_cost_usd #60660

Merged
VojtechBartos merged 5 commits into master from posthog-code/llm-gateway-emit-input-output-cost on May 29, 2026

@VojtechBartos (Member) commented May 29, 2026

Problem

The llm-gateway captures $ai_generation events with only $ai_total_cost_usd set (sourced from LiteLLM's response_cost). Ingestion's cost calculator in nodejs/src/ingestion/ai/costs/index.ts::processCost short-circuits to the passthrough path only when both $ai_input_cost_usd and $ai_output_cost_usd are present on the event. Since the gateway sets neither, ingestion falls into the model-lookup path and rederives the per-side costs from its bundled OpenRouter catalog.

That recomputation misprices cache-read tokens versus LiteLLM's cost_breakdown — observed in production as $ai_input_cost_usd + $ai_output_cost_usd running ~4x higher than $ai_total_cost_usd on cache-heavy Anthropic traffic. Billing already reads $ai_total_cost_usd, so the billed amount is correct, but per-side cost breakdowns shown to customers (and to ourselves in usage exploration) are inflated.

Changes

  • Read cost_breakdown from LiteLLM's standard_logging_object in PostHogCallback._on_success.
  • Emit $ai_input_cost_usd as the sum of input_cost + cache_read_cost + cache_creation_cost so the property keeps PostHog's gross-input semantics (matches what the ingestion calculator would have produced from raw tokens).
  • Emit $ai_output_cost_usd from cost_breakdown.output_cost.
  • Properties are only set when LiteLLM populates them — no zero-pollution for providers that don't report a breakdown.

Once both properties are present, processCost flips to passthrough ($ai_cost_model_source = passthrough) and preserves the gateway-supplied numbers verbatim. No ingestion-side change required.
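The passthrough condition described above can be sketched as follows. This is a simplified Python model of the check, not the actual `processCost` code (which lives in the Node.js ingestion service); the function name `cost_model_source` and the `"model_lookup"` label are hypothetical, and treating a `None` value as absent is an assumption.

```python
from typing import Any, Mapping


def cost_model_source(properties: Mapping[str, Any]) -> str:
    """Return which path a simplified cost calculator would take.

    Passthrough applies only when BOTH per-side costs are present;
    otherwise the calculator falls back to looking the model up in
    its catalog. "model_lookup" is a hypothetical label for that path.
    """
    has_input = properties.get("$ai_input_cost_usd") is not None
    has_output = properties.get("$ai_output_cost_usd") is not None
    return "passthrough" if has_input and has_output else "model_lookup"


# Before this PR: only the total is set, so costs are rederived.
before = cost_model_source({"$ai_total_cost_usd": 4.1e-05})

# After this PR: both per-side costs are set, so they are preserved.
after = cost_model_source(
    {
        "$ai_input_cost_usd": 1.6e-05,
        "$ai_output_cost_usd": 2.5e-05,
        "$ai_total_cost_usd": 4.1e-05,
    }
)
```

Setting only one of the two per-side properties would still take the lookup path in this model, which is why the gateway has to emit both.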

How did you test this code?

Agent-authored; I am Claude (Opus 4.7).

Automated tests (uv run pytest tests/callbacks/test_posthog.py):

  • test_on_success_emits_cost_breakdown_components_separately — full cost_breakdown maps 1:1 to per-side and per-cache PostHog properties; sum reconciles to $ai_total_cost_usd.
  • test_on_success_omits_cache_costs_when_breakdown_lacks_them — breakdowns without cache components don't emit zeroed cache properties.
  • test_on_success_omits_cost_breakdown_when_litellm_omits_it — providers that don't populate cost_breakdown continue to emit only $ai_total_cost_usd.
  • Full callback suite (63 tests) passes.

Manual verification against a local llm-gateway built from this branch — single Anthropic claude-haiku-4-5 request via /v1/chat/completions. The captured $ai_generation event carried $ai_input_cost_usd = 1.6e-05, $ai_output_cost_usd = 2.5e-05, $ai_total_cost_usd = 4.1e-05 — per-side numbers reconcile to the total. Cache cost properties were correctly omitted on this run since LiteLLM's cost_breakdown didn't include cache components.
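The reconciliation in the manual run can be checked with a few lines of Python. This is only an illustrative check using the three values quoted above; the variable names are chosen here, and a tolerance-based comparison is used because floating-point sums are not guaranteed to match a literal exactly.

```python
import math

# Values reported on the captured $ai_generation event.
input_cost_usd = 1.6e-05
output_cost_usd = 2.5e-05
total_cost_usd = 4.1e-05

# Compare with a relative tolerance rather than ==.
reconciles = math.isclose(
    input_cost_usd + output_cost_usd, total_cost_usd, rel_tol=1e-9
)
```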

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Docs update

No docs change; this is an internal data-correctness fix.

🤖 Agent context

Authored by PostHog Code in a Slack-triggered investigation. Josh observed in production that $ai_input_cost_usd + $ai_output_cost_usd ran ~4x higher than $ai_total_cost_usd on cache-heavy traffic. Root cause traced through three layers: (1) the production llm-gateway only emits $ai_total_cost_usd, (2) the ingestion-side processCost passthrough short-circuit requires both per-side properties to be present, and (3) the recomputation path doesn't match LiteLLM's cache-aware cost split. Fix applied at the gateway — the smallest change that lets the existing passthrough path do its job.

Considered but rejected: adding an ingestion-side skip flag (e.g. $ai_cache_reporting_exclusive-style). That property already exists with a different meaning and would conflate two unrelated concerns. The passthrough mechanism is already there; the gateway just wasn't feeding it.


Created with PostHog Code

The llm-gateway only emits $ai_total_cost_usd from LiteLLM's
response_cost. Ingestion's cost calculator
(nodejs/src/ingestion/ai/costs/index.ts::processCost) then
recomputes $ai_input_cost_usd and $ai_output_cost_usd from its
bundled model catalog because the passthrough short-circuit
requires both per-side cost properties to be present. The
recomputation misprices cache-read tokens versus LiteLLM's
cost_breakdown, inflating $ai_input_cost_usd + $ai_output_cost_usd
well above $ai_total_cost_usd on cache-heavy traffic.

Emit $ai_input_cost_usd and $ai_output_cost_usd from LiteLLM's
cost_breakdown so the per-side and total properties stay in
agreement. Fold cache read/creation components into the input
total to match the PostHog property semantics. Once both
properties are present, processCost flips to passthrough and
preserves the gateway-supplied numbers.

Billing already reads $ai_total_cost_usd, so this changes the
user-visible per-side breakdown without altering billed amounts.

Generated-By: PostHog Code
Task-Id: ec6de343-7e98-492f-87af-ea2d88c14ff7
Initial revision summed cache_read_cost and cache_creation_cost into
$ai_input_cost_usd to match the ingestion calculator's gross-input
semantic. That conflicts with the convention the ai-gateway emitter
established (each component emitted as its own disjoint property:
$ai_input_cost_usd, $ai_output_cost_usd, $ai_cache_read_cost_usd,
$ai_cache_creation_cost_usd, summing to $ai_total_cost_usd) and with
LiteLLM's own semantic where input_cost is non-cached input only.

Emit each cost_breakdown component to its own PostHog property
so the data is internally consistent (sum of components reconciles
to total) and matches what the ai-gateway already emits. The
ingestion passthrough short-circuit only requires the input and
output cost to be present, so cache components remain optional
bonus context.

Generated-By: PostHog Code
Task-Id: ec6de343-7e98-492f-87af-ea2d88c14ff7
@VojtechBartos VojtechBartos self-assigned this May 29, 2026
@VojtechBartos VojtechBartos requested a review from a team May 29, 2026 12:00
@VojtechBartos VojtechBartos marked this pull request as ready for review May 29, 2026 12:00
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team May 29, 2026 12:01
greptile-apps Bot (Contributor) commented May 29, 2026

Comments Outside Diff (1)

  1. services/llm-gateway/tests/callbacks/test_posthog.py, lines 312-408

    P2 test_on_success_emits_cost_breakdown_components_separately and test_on_success_omits_cache_costs_when_breakdown_lacks_them exercise the same code path with different cost_breakdown shapes (all four components vs. only input/output). The project prefers parametrised tests; these two cases would collapse cleanly into a single @pytest.mark.parametrize covering each input dict and its expected assertions.


Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
services/llm-gateway/src/llm_gateway/callbacks/posthog.py:236-245
The loop maps `input_cost` 1-to-1 to `$ai_input_cost_usd`, but the PostHog analytics schema only extracts three cost columns — `$ai_input_cost_usd`, `$ai_output_cost_usd`, and `$ai_total_cost_usd` (see `posthog/models/ai_events/sql.py` lines 267-269). The new `$ai_cache_read_cost_usd` and `$ai_cache_creation_cost_usd` properties have no schema support anywhere in the codebase, so cache costs become invisible in analytics. This means `input_cost_usd + output_cost_usd < total_cost_usd` for cache-heavy Anthropic traffic (0.003 + 0.006 = 0.009 ≠ 0.012 in the test fixture). The PR description explicitly stated that cache components should be folded into `$ai_input_cost_usd` to preserve gross-input semantics and maintain that invariant.

```suggestion
        cost_breakdown = standard_logging_object.get("cost_breakdown") or {}
        if cost_breakdown:
            input_cost_components = [
                cost_breakdown.get(k)
                for k in ("input_cost", "cache_read_cost", "cache_creation_cost")
            ]
            input_cost_values = [v for v in input_cost_components if v is not None]
            if input_cost_values:
                properties["$ai_input_cost_usd"] = sum(input_cost_values)
            output_cost = cost_breakdown.get("output_cost")
            if output_cost is not None:
                properties["$ai_output_cost_usd"] = output_cost
```

### Issue 2 of 2
services/llm-gateway/tests/callbacks/test_posthog.py:312-408
`test_on_success_emits_cost_breakdown_components_separately` and `test_on_success_omits_cache_costs_when_breakdown_lacks_them` exercise the same code path with different `cost_breakdown` shapes (all four components vs. only input/output). The project prefers parametrised tests; these two cases would collapse cleanly into a single `@pytest.mark.parametrize` covering each input dict and its expected assertions.


mypy narrowed the outer `value` name from the posthog_properties
loop, so reusing it for cost_breakdown values failed with an
assignment incompatibility. Rename to `cost_value`.

Also collapse the rationale comment down to one short line — the
reasoning lives in the PR description.

Generated-By: PostHog Code
Task-Id: ec6de343-7e98-492f-87af-ea2d88c14ff7
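The mypy failure described in this commit can be reproduced in miniature. The sketch below is not the gateway's code: the function name `merge_properties` and the assumption that `posthog_properties` values are strings are hypothetical. It shows the pattern of giving the second loop its own variable name so the type inferred for `value` in the first loop is not reused.

```python
from typing import Dict, Optional


def merge_properties(
    posthog_properties: Dict[str, str],
    cost_breakdown: Dict[str, Optional[float]],
) -> Dict[str, object]:
    properties: Dict[str, object] = {}

    # mypy infers `value` as str from this loop.
    for key, value in posthog_properties.items():
        properties[key] = value

    # Reusing `value` here would assign Optional[float] to a name mypy
    # already treats as str, so a distinct name is used instead.
    for name, cost_value in cost_breakdown.items():
        if cost_value is not None:
            properties[name] = cost_value

    return properties
```

The code runs either way at runtime; the rename only matters to the type checker.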
Greptile (P1) caught that only $ai_input_cost_usd /
$ai_output_cost_usd / $ai_total_cost_usd are materialized as
columns in posthog/models/ai_events/sql.py. The previously
emitted $ai_cache_read_cost_usd and $ai_cache_creation_cost_usd
would have been invisible to analytics, breaking the
input + output ≈ total invariant on cache-heavy traffic.

Sum the cache_read_cost and cache_creation_cost components from
LiteLLM's cost_breakdown into $ai_input_cost_usd so the gross
input cost is captured in the materialized column. Output stays
straight from cost_breakdown.output_cost.

Also collapse the two cost-breakdown tests into a single
parametrized test per Greptile (P2).

Generated-By: PostHog Code
Task-Id: ec6de343-7e98-492f-87af-ea2d88c14ff7
Reverts the fold-cache-into-input change. Each cost_breakdown
component maps 1:1 to its own PostHog property:
$ai_input_cost_usd, $ai_output_cost_usd, $ai_cache_read_cost_usd,
$ai_cache_creation_cost_usd.

Parametrized test kept; assertions updated to check each component.

Generated-By: PostHog Code
Task-Id: ec6de343-7e98-492f-87af-ea2d88c14ff7
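A minimal sketch of the 1:1 mapping this final commit describes, assuming `cost_breakdown` is a plain dict on `standard_logging_object` with the component keys named in this PR. The helper name `cost_properties` and the constant `COST_PROPERTY_BY_COMPONENT` are hypothetical and do not come from the gateway's source.

```python
from typing import Any, Dict, Mapping

# Each LiteLLM cost_breakdown component maps to exactly one property.
COST_PROPERTY_BY_COMPONENT = {
    "input_cost": "$ai_input_cost_usd",
    "output_cost": "$ai_output_cost_usd",
    "cache_read_cost": "$ai_cache_read_cost_usd",
    "cache_creation_cost": "$ai_cache_creation_cost_usd",
}


def cost_properties(standard_logging_object: Mapping[str, Any]) -> Dict[str, float]:
    """Build per-component cost properties, skipping anything LiteLLM omitted."""
    cost_breakdown = standard_logging_object.get("cost_breakdown") or {}
    properties: Dict[str, float] = {}
    for component, property_name in COST_PROPERTY_BY_COMPONENT.items():
        cost_value = cost_breakdown.get(component)
        # Only emit what was reported, so missing components are not
        # recorded as zero.
        if cost_value is not None:
            properties[property_name] = cost_value
    return properties


# A breakdown with only input and output yields just two properties.
partial = cost_properties(
    {"cost_breakdown": {"input_cost": 0.003, "output_cost": 0.006}}
)

# No breakdown at all yields nothing, leaving only $ai_total_cost_usd.
empty = cost_properties({})
```

Checking `is not None` rather than truthiness means a component that LiteLLM reports as exactly `0.0` would still be emitted in this sketch; whether the gateway does the same is not stated in this PR.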
@VojtechBartos VojtechBartos enabled auto-merge (squash) May 29, 2026 12:34
@VojtechBartos VojtechBartos merged commit e9a438e into master May 29, 2026
143 checks passed
@VojtechBartos VojtechBartos deleted the posthog-code/llm-gateway-emit-input-output-cost branch May 29, 2026 12:37
deployment-status-posthog Bot commented May 29, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-05-29 13:10 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-29 13:35 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-29 13:39 UTC | Run |
