feat(llm-gateway): emit cache & reasoning token breakdown on $ai_generation events #60403
LiteLLM hands the callback per-call usage details in standard_logging_object.metadata.usage_object, including cache_read_input_tokens and cache_creation_input_tokens for providers that support prompt caching (Anthropic). The Prometheus callback already consumes both for its counters; the PostHog callback was dropping them on the floor, so LLM Analytics and the AI usage report saw input/output tokens but no cache breakdown. Pull the same usage_object and emit the two fields as $ai_cache_read_input_tokens / $ai_cache_creation_input_tokens when LiteLLM provides them, matching the Nullable(Int64) schema in posthog/models/ai_events/sql.py. Conditional emission keeps non-caching providers from polluting events with zeros.
Extend the cache-token parity fix to also emit $ai_reasoning_tokens when LiteLLM reports it in standard_logging_object.metadata.usage_object.completion_tokens_details.reasoning_tokens. This matches the field posthoganalytics' langchain CallbackHandler already emits (ai/langchain/callbacks.py:614), so PostHog AI traces and gateway-routed traces are now indistinguishable on the token-accounting fields LLM Analytics queries.
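The conditional emission described in these two commit messages can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the function name `token_breakdown_properties` is hypothetical, and the only things taken from the source are the `usage_object` path and the three property names.

```python
from typing import Any


def token_breakdown_properties(standard_logging_object: dict[str, Any]) -> dict[str, int]:
    """Return only the token-breakdown properties LiteLLM actually reported.

    Hypothetical helper for illustration; the real callback may be structured differently.
    """
    metadata = standard_logging_object.get("metadata") or {}
    usage_object = metadata.get("usage_object") or {}
    details = usage_object.get("completion_tokens_details") or {}

    candidates = {
        "$ai_cache_read_input_tokens": usage_object.get("cache_read_input_tokens"),
        "$ai_cache_creation_input_tokens": usage_object.get("cache_creation_input_tokens"),
        "$ai_reasoning_tokens": details.get("reasoning_tokens"),
    }
    # Drop keys the provider did not report so the nullable columns stay NULL.
    # A reported 0 is kept, because it is a real value rather than a missing one.
    return {key: value for key, value in candidates.items() if value is not None}


anthropic_call = {"metadata": {"usage_object": {"cache_read_input_tokens": 8741}}}
print(token_breakdown_properties(anthropic_call))  # {'$ai_cache_read_input_tokens': 8741}
```

Filtering on `is not None` rather than truthiness is the design choice that matters here: it separates "the provider said zero" from "the provider said nothing".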
        props = mock_client.capture.call_args.kwargs["properties"]
        assert "$ai_cache_read_input_tokens" not in props
        assert "$ai_cache_creation_input_tokens" not in props

    @pytest.mark.asyncio
    async def test_on_success_emits_reasoning_tokens_when_present(
        self, callback: PostHogCallback, auth_user: AuthenticatedUser, mock_posthog_client: tuple
    ) -> None:
        _, mock_client = mock_posthog_client
        kwargs = {
            "standard_logging_object": {
                "model": "gpt-5.2",
                "custom_llm_provider": "openai",
                "prompt_tokens": 50,
                "completion_tokens": 200,
                "metadata": {
                    "usage_object": {
                        "completion_tokens_details": {"reasoning_tokens": 120},
                    },
                },
            },
            "litellm_params": {},
        }

        with (
            patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user),
            patch("llm_gateway.callbacks.posthog.get_product", return_value="slack_app_routing"),
        ):
            await callback._on_success(kwargs, None, 0.0, 1.0, end_user_id=None)

        props = mock_client.capture.call_args.kwargs["properties"]
        assert props["$ai_reasoning_tokens"] == 120

    @pytest.mark.asyncio
    async def test_on_success_omits_reasoning_tokens_when_absent(
        self,
        callback: PostHogCallback,
        auth_user: AuthenticatedUser,
        standard_logging_object: dict,
        mock_posthog_client: tuple,
    ) -> None:
        _, mock_client = mock_posthog_client
        kwargs = {"standard_logging_object": standard_logging_object, "litellm_params": {}}

        with (
            patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user),
            patch("llm_gateway.callbacks.posthog.get_product", return_value="slack_app"),
        ):
            await callback._on_success(kwargs, None, 0.0, 1.0, end_user_id=None)

        props = mock_client.capture.call_args.kwargs["properties"]
        assert "$ai_reasoning_tokens" not in props

    @pytest.mark.asyncio
    @pytest.mark.parametrize("product", ["wizard", "posthog_code", "llm_gateway"])
    async def test_on_success_includes_ai_product(
Non-parameterized duplicate "absent" tests
The two omission tests (test_on_success_omits_cache_tokens_when_absent and test_on_success_omits_reasoning_tokens_when_absent) share identical fixtures, patches, and structure — only the asserted absent keys differ. The project prefers parameterized tests: these could be a single @pytest.mark.parametrize("absent_key", ["$ai_cache_read_input_tokens", "$ai_cache_creation_input_tokens", "$ai_reasoning_tokens"]) test, or even one non-parameterized test that asserts all three absent keys at once since they all share the same precondition (no metadata on the logging object).
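One way to act on the second option in this comment, a single test that asserts all three keys are absent, might look like the sketch below. `fake_on_success` is a hypothetical stand-in for `PostHogCallback._on_success` so the example runs on its own; the real test would keep the existing fixtures and patches and only merge the assertions.

```python
from unittest.mock import MagicMock

BREAKDOWN_KEYS = (
    "$ai_cache_read_input_tokens",
    "$ai_cache_creation_input_tokens",
    "$ai_reasoning_tokens",
)


def fake_on_success(client, standard_logging_object: dict) -> None:
    """Hypothetical stand-in for the callback: capture one $ai_generation event."""
    usage = (standard_logging_object.get("metadata") or {}).get("usage_object") or {}
    details = usage.get("completion_tokens_details") or {}
    reported = {
        "$ai_cache_read_input_tokens": usage.get("cache_read_input_tokens"),
        "$ai_cache_creation_input_tokens": usage.get("cache_creation_input_tokens"),
        "$ai_reasoning_tokens": details.get("reasoning_tokens"),
    }
    properties = {
        "$ai_input_tokens": standard_logging_object.get("prompt_tokens"),
        "$ai_output_tokens": standard_logging_object.get("completion_tokens"),
    }
    # Only attach breakdown fields that were actually reported.
    properties.update({k: v for k, v in reported.items() if v is not None})
    client.capture(event="$ai_generation", properties=properties)


def test_on_success_omits_token_breakdown_when_absent() -> None:
    mock_client = MagicMock()
    # No "metadata" key: the shared precondition for all three omissions.
    fake_on_success(mock_client, {"prompt_tokens": 50, "completion_tokens": 200})

    props = mock_client.capture.call_args.kwargs["properties"]
    for key in BREAKDOWN_KEYS:
        assert key not in props, f"{key} should be omitted"
```

The per-key failure message keeps the diagnostic value a parametrized test would give, without running the setup three times.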
    @pytest.mark.asyncio
    async def test_on_success_emits_cache_tokens_when_present(
        self, callback: PostHogCallback, auth_user: AuthenticatedUser, mock_posthog_client: tuple
    ) -> None:
we really need some kind of integration test here, with a real LLM call that returns the usage object. this is useful as a smoke test that the function works, but we're mostly testing a mock
as discussed on the call, it's a post-release item
        # present so providers that don't report them don't pollute events with
        # zeros, matching the schema in posthog/models/ai_events/sql.py and the
        # parity established by posthoganalytics' langchain CallbackHandler.
        cache_read_input_tokens = usage_object.get("cache_read_input_tokens")
we're very sensitive to this being wrong or incorrectly calculated. how do we protect against that? should we alert when this field is missing, or provide some kind of indication that it's not working?
as discussed on the call, it's a post-release item
Problem
LiteLLM hands the gateway per-call usage details in `standard_logging_object.metadata.usage_object`, including `cache_read_input_tokens`, `cache_creation_input_tokens` (Anthropic prompt caching) and `completion_tokens_details.reasoning_tokens` (OpenAI o-series and similar). The Prometheus callback already consumes all three (`callbacks/prometheus.py:42-46`), and `posthoganalytics`' langchain `CallbackHandler` emits them on every `$ai_generation` it captures (`ai/langchain/callbacks.py:610-614`), but the PostHog callback in the gateway was dropping them.

Result: gateway-routed traces in LLM Analytics showed `$ai_input_tokens` / `$ai_output_tokens` but no cache or reasoning breakdown, and `posthog/tasks/llm_analytics_usage_report.py:375-377` (which sums these fields) was silently aggregating zero for any gateway traffic. The dollar amount (`$ai_total_cost_usd`) is unaffected, since LiteLLM bakes pricing in, but the token accounting underneath it was not reconstructable from the events.

Changes
In `services/llm-gateway/src/llm_gateway/callbacks/posthog.py`, pull `usage_object` from the same path Prometheus uses and conditionally emit on `$ai_generation`:

- `$ai_cache_read_input_tokens`
- `$ai_cache_creation_input_tokens`
- `$ai_reasoning_tokens`

Conditional emission matches the `Nullable(Int64)` schema in `posthog/models/ai_events/sql.py:262-263` and skips providers that don't surface these keys at all. `_on_failure` is unchanged (no usage data on failed calls).

Parity after this PR:

- `$ai_input_tokens` / `$ai_output_tokens`
- `$ai_total_cost_usd`
- `$ai_cache_read_input_tokens` / `$ai_cache_creation_input_tokens` (`prompt_tokens_details.cached_tokens` still not handled)
- `$ai_reasoning_tokens`

How did you test this code?
I'm an agent. Verification:

- `uv run pytest tests/callbacks/test_posthog.py`: 56 passed, four new tests covering present/absent behavior for each field. Full gateway suite: 930 passed.
- `claude-sonnet-4-5` calls 12k chars apart with `cache_control: ephemeral` on the system block. Captured events showed `$ai_cache_creation_input_tokens: 8741` on call 1 and `$ai_cache_read_input_tokens: 8741` on call 2, with cost dropping ~12× between the two (matching Anthropic's cache discount).
- `$ai_reasoning_tokens` shipped on every event (0 for Claude, as expected). No reasoning-model end-to-end run yet; the code path is identical to cache, just a different `usage_object` key.

Publish to changelog?
no
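The roughly 12× cost drop reported in the verification notes above can be sanity-checked with a little arithmetic. The sketch below assumes Anthropic's published prompt-caching multipliers for the default 5-minute cache (writes at 1.25× the base input price, reads at 0.1×) and an illustrative base price; both are assumptions, not values taken from this PR.

```python
# Assumed Anthropic prompt-caching multipliers relative to the base input price.
CACHE_WRITE_MULTIPLIER = 1.25  # 5-minute cache write (assumption)
CACHE_READ_MULTIPLIER = 0.10   # cache read / hit (assumption)

# Illustrative base input price in USD per million tokens (assumption).
BASE_INPUT_PRICE_PER_MTOK = 3.00


def cached_prefix_cost(tokens: int, multiplier: float) -> float:
    """Cost in USD of the cached prefix for a single call."""
    return tokens / 1_000_000 * BASE_INPUT_PRICE_PER_MTOK * multiplier


write_cost = cached_prefix_cost(8741, CACHE_WRITE_MULTIPLIER)  # call 1: cache creation
read_cost = cached_prefix_cost(8741, CACHE_READ_MULTIPLIER)    # call 2: cache read
ratio = write_cost / read_cost

print(round(ratio, 2))  # 12.5
```

The ratio is independent of the base price, since it cancels out. Uncached input tokens and output tokens are billed the same on both calls, which would pull the observed ratio slightly below 12.5, consistent with the "~12×" figure.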
🤖 Agent context
Authored by Claude Code (Opus 4.7, 1M context) at Vojta's direction. Notes for the reviewer:
- `$ai_input_cost_usd`, `$ai_output_cost_usd`, and `$ai_saved_cache_cost_usd` deliberately left out: `posthoganalytics` doesn't emit them either, so adding them to the gateway would be a one-sided source, not a parity fix.
- `prompt_tokens_details.cached_tokens` path also left out; that's a separate normalization the gateway's Prometheus callback doesn't handle either. Worth a follow-up if OpenAI traffic from Slack becomes large enough to care about its cache visibility.
- `$ai_span_name` / `$ai_parent_id` are langchain-specific span concepts that don't map onto the gateway's flat call model. Callers can pass either via `x-posthog-property-*` headers if they need them, since the existing `posthog_properties` merge handles arbitrary fields.