feat(llm-gateway): emit cache & reasoning token breakdown on $ai_generation events by VojtechBartos · Pull Request #60403 · PostHog/posthog

VojtechBartos · 2026-05-28T09:57:54Z

Problem

LiteLLM hands the gateway per-call usage details in standard_logging_object.metadata.usage_object, including cache_read_input_tokens, cache_creation_input_tokens (Anthropic prompt caching) and completion_tokens_details.reasoning_tokens (OpenAI o-series and similar). The Prometheus callback already consumes all three (callbacks/prometheus.py:42-46), and posthoganalytics' langchain CallbackHandler emits them on every $ai_generation it captures (ai/langchain/callbacks.py:610-614) — but the PostHog callback in the gateway was dropping them.

Result: gateway-routed traces in LLM Analytics showed $ai_input_tokens / $ai_output_tokens but no cache or reasoning breakdown, and posthog/tasks/llm_analytics_usage_report.py:375-377 (which sums these fields) was silently aggregating zero for any gateway traffic. The dollar amount ($ai_total_cost_usd) is unaffected — LiteLLM bakes pricing in — but the token accounting underneath it was not reconstructable from the events.

Changes

In services/llm-gateway/src/llm_gateway/callbacks/posthog.py, pull usage_object from the same path Prometheus uses and conditionally emit on $ai_generation:

$ai_cache_read_input_tokens
$ai_cache_creation_input_tokens
$ai_reasoning_tokens

Conditional emission matches the Nullable(Int64) schema in posthog/models/ai_events/sql.py:262-263 and skips providers that don't surface these keys at all. _on_failure is unchanged (no usage data on failed calls).
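A minimal sketch of the conditional emission described above, not the code from this PR. The helper name `extract_token_breakdown` and the `is not None` presence check are assumptions; only the `usage_object` key paths and the `$ai_*` property names come from the description.

```python
from typing import Any


def extract_token_breakdown(standard_logging_object: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: map LiteLLM usage details to $ai_* properties.

    Keys that LiteLLM does not report are omitted rather than set to 0, so the
    corresponding Nullable(Int64) columns stay NULL.
    """
    metadata = standard_logging_object.get("metadata") or {}
    usage_object = metadata.get("usage_object") or {}
    props: dict[str, int] = {}

    # Assumption: "present" means the key exists with a non-None value, so an
    # explicit 0 reported by the provider is still emitted.
    cache_read = usage_object.get("cache_read_input_tokens")
    if cache_read is not None:
        props["$ai_cache_read_input_tokens"] = cache_read

    cache_creation = usage_object.get("cache_creation_input_tokens")
    if cache_creation is not None:
        props["$ai_cache_creation_input_tokens"] = cache_creation

    # reasoning_tokens is nested one level deeper than the cache keys.
    details = usage_object.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens")
    if reasoning is not None:
        props["$ai_reasoning_tokens"] = reasoning

    return props
```

Under these assumptions, a logging object with no `metadata` yields an empty dict, so nothing extra is merged into the event properties.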

Parity after this PR:

| Field | PostHog AI | Gateway (before) | Gateway (after) |
| --- | --- | --- | --- |
| `$ai_input_tokens` / `$ai_output_tokens` | yes | yes | yes |
| `$ai_total_cost_usd` | yes | yes | yes |
| `$ai_cache_read_input_tokens` / `$ai_cache_creation_input_tokens` | yes (cross-provider normalized) | no | yes (Anthropic; OpenAI's `prompt_tokens_details.cached_tokens` still not handled) |
| `$ai_reasoning_tokens` | yes | no | yes |

How did you test this code?

I'm an agent. Verification:

Unit: uv run pytest tests/callbacks/test_posthog.py — 56 passed, four new tests covering present/absent behavior for each field. Full gateway suite: 930 passed.
End-to-end on local: ran the gateway from this worktree, fired two claude-sonnet-4-5 calls 12k chars apart with cache_control: ephemeral on the system block. Captured events showed $ai_cache_creation_input_tokens: 8741 on call 1 and $ai_cache_read_input_tokens: 8741 on call 2, with cost dropping ~12× between the two (matching Anthropic's cache discount). $ai_reasoning_tokens shipped on every event (0 for Claude, as expected). No reasoning-model end-to-end run yet — the code path is identical to cache, just a different usage_object key.
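The ~12× drop is roughly what cache pricing predicts. A back-of-the-envelope check, assuming Anthropic's published multipliers for the 5-minute ephemeral cache (cache writes at 1.25× the base input price, cache reads at 0.1×) and assuming the cached prefix dominates the cost of both calls. The function name and the base price are placeholders for illustration only.

```python
# Assumed multipliers relative to the base input-token price.
CACHE_WRITE_MULTIPLIER = 1.25  # call 1: prefix is written to the cache
CACHE_READ_MULTIPLIER = 0.10   # call 2: prefix is read from the cache


def cached_prefix_cost(tokens: int, base_price_per_token: float, multiplier: float) -> float:
    """Cost of the cached prefix only; output and uncached input are ignored."""
    return tokens * base_price_per_token * multiplier


tokens = 8741                  # cache token count reported in the captured events
base_price = 3.0 / 1_000_000   # placeholder price per token; it cancels in the ratio

write_cost = cached_prefix_cost(tokens, base_price, CACHE_WRITE_MULTIPLIER)
read_cost = cached_prefix_cost(tokens, base_price, CACHE_READ_MULTIPLIER)
ratio = write_cost / read_cost

print(f"call 1 / call 2 cost ratio for the cached prefix: {ratio:.1f}x")
```

The ratio depends only on the two multipliers (1.25 / 0.1), so the observed figure should land slightly below it once output tokens and uncached input are included.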

Publish to changelog?

no

🤖 Agent context

Authored by Claude Code (Opus 4.7, 1M context) at Vojta's direction. Notes for the reviewer:

Cost-breakdown components ($ai_input_cost_usd, $ai_output_cost_usd, $ai_saved_cache_cost_usd) deliberately left out — posthoganalytics doesn't emit them either, so adding them to the gateway would be a one-sided source, not a parity fix.
OpenAI prompt_tokens_details.cached_tokens path also left out — that's a separate normalization the gateway's Prometheus callback doesn't handle either. Worth a follow-up if OpenAI traffic from Slack becomes large enough to care about its cache visibility.
$ai_span_name / $ai_parent_id are langchain-specific span concepts that don't map onto the gateway's flat call model. Callers can pass either via x-posthog-property-* headers if they need them, since the existing posthog_properties merge handles arbitrary fields.
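For the OpenAI follow-up mentioned above, one possible shape of the fallback, as a sketch only. Nothing here is part of this PR: the helper name `cache_read_tokens` is hypothetical, and mapping `prompt_tokens_details.cached_tokens` onto `$ai_cache_read_input_tokens` is an assumption about how that normalization might be done.

```python
from typing import Any, Optional


def cache_read_tokens(usage_object: dict[str, Any]) -> Optional[int]:
    """Hypothetical: return cache-read tokens from either provider shape."""
    # Anthropic-style key, the one this PR already reads.
    value = usage_object.get("cache_read_input_tokens")
    if value is not None:
        return value

    # OpenAI-style key, not handled by this PR (assumed fallback).
    details = usage_object.get("prompt_tokens_details") or {}
    return details.get("cached_tokens")
```

Returning `None` when neither key is present would keep the conditional-emission behavior unchanged for providers that report no cache data.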

LiteLLM hands the callback per-call usage details in standard_logging_object.metadata.usage_object, including cache_read_input_tokens and cache_creation_input_tokens for providers that support prompt caching (Anthropic). The Prometheus callback already consumes both for its counters; the PostHog callback was dropping them on the floor, so LLM Analytics and the AI usage report saw input/output tokens but no cache breakdown. Pull the same usage_object and emit the two fields as $ai_cache_read_input_tokens / $ai_cache_creation_input_tokens when LiteLLM provides them, matching the Nullable(Int64) schema in posthog/models/ai_events/sql.py. Conditional emission keeps non-caching providers from polluting events with zeros.

Extend the cache-token parity fix to also emit $ai_reasoning_tokens when LiteLLM reports it in standard_logging_object.metadata.usage_object.completion_tokens_details.reasoning_tokens. This matches the field posthoganalytics' langchain CallbackHandler already emits (ai/langchain/callbacks.py:614), so PostHog AI traces and gateway-routed traces are now indistinguishable on the token-accounting fields LLM Analytics queries.

greptile-apps · 2026-05-28T10:38:21Z

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
services/llm-gateway/tests/callbacks/test_posthog.py:305-360
**Non-parameterized duplicate "absent" tests**

The two omission tests (`test_on_success_omits_cache_tokens_when_absent` and `test_on_success_omits_reasoning_tokens_when_absent`) share identical fixtures, patches, and structure — only the asserted absent keys differ. The project prefers parameterized tests: these could be a single `@pytest.mark.parametrize("absent_key", ["$ai_cache_read_input_tokens", "$ai_cache_creation_input_tokens", "$ai_reasoning_tokens"])` test, or even one non-parameterized test that asserts all three absent keys at once since they all share the same precondition (no `metadata` on the logging object).

Reviews (1): Last reviewed commit: "feat(llm-gateway): emit $ai_reasoning_to..."

greptile-apps · 2026-05-28T10:38:25Z

+        props = mock_client.capture.call_args.kwargs["properties"]
+        assert "$ai_cache_read_input_tokens" not in props
+        assert "$ai_cache_creation_input_tokens" not in props
+
+    @pytest.mark.asyncio
+    async def test_on_success_emits_reasoning_tokens_when_present(
+        self, callback: PostHogCallback, auth_user: AuthenticatedUser, mock_posthog_client: tuple
+    ) -> None:
+        _, mock_client = mock_posthog_client
+        kwargs = {
+            "standard_logging_object": {
+                "model": "gpt-5.2",
+                "custom_llm_provider": "openai",
+                "prompt_tokens": 50,
+                "completion_tokens": 200,
+                "metadata": {
+                    "usage_object": {
+                        "completion_tokens_details": {"reasoning_tokens": 120},
+                    },
+                },
+            },
+            "litellm_params": {},
+        }
+
+        with (
+            patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user),
+            patch("llm_gateway.callbacks.posthog.get_product", return_value="slack_app_routing"),
+        ):
+            await callback._on_success(kwargs, None, 0.0, 1.0, end_user_id=None)
+
+        props = mock_client.capture.call_args.kwargs["properties"]
+        assert props["$ai_reasoning_tokens"] == 120
+
+    @pytest.mark.asyncio
+    async def test_on_success_omits_reasoning_tokens_when_absent(
+        self,
+        callback: PostHogCallback,
+        auth_user: AuthenticatedUser,
+        standard_logging_object: dict,
+        mock_posthog_client: tuple,
+    ) -> None:
+        _, mock_client = mock_posthog_client
+        kwargs = {"standard_logging_object": standard_logging_object, "litellm_params": {}}
+
+        with (
+            patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user),
+            patch("llm_gateway.callbacks.posthog.get_product", return_value="slack_app"),
+        ):
+            await callback._on_success(kwargs, None, 0.0, 1.0, end_user_id=None)
+
+        props = mock_client.capture.call_args.kwargs["properties"]
+        assert "$ai_reasoning_tokens" not in props
+
    @pytest.mark.asyncio
    @pytest.mark.parametrize("product", ["wizard", "posthog_code", "llm_gateway"])
    async def test_on_success_includes_ai_product(


Non-parameterized duplicate "absent" tests

The two omission tests (test_on_success_omits_cache_tokens_when_absent and test_on_success_omits_reasoning_tokens_when_absent) share identical fixtures, patches, and structure — only the asserted absent keys differ. The project prefers parameterized tests: these could be a single @pytest.mark.parametrize("absent_key", ["$ai_cache_read_input_tokens", "$ai_cache_creation_input_tokens", "$ai_reasoning_tokens"]) test, or even one non-parameterized test that asserts all three absent keys at once since they all share the same precondition (no metadata on the logging object).
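A sketch of the second option (one test asserting all three keys at once), using only the standard library. `emit_generation` is a hypothetical synchronous stand-in for `PostHogCallback._on_success`; the real test is async and relies on the `callback`, `auth_user`, and `mock_posthog_client` fixtures plus the two patches shown in the diff.

```python
from unittest.mock import MagicMock

ABSENT_KEYS = (
    "$ai_cache_read_input_tokens",
    "$ai_cache_creation_input_tokens",
    "$ai_reasoning_tokens",
)


def emit_generation(client, standard_logging_object: dict) -> None:
    """Hypothetical stand-in for the callback's success path."""
    usage = (standard_logging_object.get("metadata") or {}).get("usage_object") or {}
    details = usage.get("completion_tokens_details") or {}
    props = {
        "$ai_input_tokens": standard_logging_object.get("prompt_tokens"),
        "$ai_output_tokens": standard_logging_object.get("completion_tokens"),
    }
    optional = {
        "$ai_cache_read_input_tokens": usage.get("cache_read_input_tokens"),
        "$ai_cache_creation_input_tokens": usage.get("cache_creation_input_tokens"),
        "$ai_reasoning_tokens": details.get("reasoning_tokens"),
    }
    # Only keys the provider actually reported are added to the event.
    props.update({key: value for key, value in optional.items() if value is not None})
    client.capture(event="$ai_generation", properties=props)


def test_on_success_omits_token_breakdown_when_absent() -> None:
    mock_client = MagicMock()
    # Shared precondition for all three keys: no "metadata" on the logging object.
    emit_generation(mock_client, {"prompt_tokens": 50, "completion_tokens": 200})

    props = mock_client.capture.call_args.kwargs["properties"]
    for key in ABSENT_KEYS:
        assert key not in props
```

With pytest, the loop could instead become `@pytest.mark.parametrize("absent_key", ABSENT_KEYS)`, which reports each missing-key case separately.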


joshsny · 2026-05-28T12:50:54Z

+    @pytest.mark.asyncio
+    async def test_on_success_emits_cache_tokens_when_present(
+        self, callback: PostHogCallback, auth_user: AuthenticatedUser, mock_posthog_client: tuple
+    ) -> None:


We really need some kind of integration test here, with a real LLM call that returns the usage object. This is useful as a smoke test that the function works, but we're mostly testing a mock.

As discussed on the call, this is a post-release item.

joshsny · 2026-05-28T12:51:53Z

+        # present so providers that don't report them don't pollute events with
+        # zeros, matching the schema in posthog/models/ai_events/sql.py and the
+        # parity established by posthoganalytics' langchain CallbackHandler.
+        cache_read_input_tokens = usage_object.get("cache_read_input_tokens")


We're very sensitive to this being wrong or incorrectly calculated. How do we protect against that? Should we alert when this field is missing, or provide some indication that it isn't working?

As discussed on the call, this is a post-release item.

deployment-status-posthog · 2026-05-28T16:35:05Z

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-05-28 16:34 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-28 17:00 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-28 17:02 UTC | Run |

VojtechBartos added 2 commits May 28, 2026 11:57

VojtechBartos changed the title ~~feat(llm-gateway): emit cache token breakdown on $ai_generation events~~ feat(llm-gateway): emit cache & reasoning token breakdown on $ai_generation events May 28, 2026

VojtechBartos self-assigned this May 28, 2026

VojtechBartos requested a review from a team May 28, 2026 10:35

VojtechBartos marked this pull request as ready for review May 28, 2026 10:36

assign-reviewers-posthog Bot requested a review from a team May 28, 2026 10:36

greptile-apps Bot reviewed May 28, 2026


richardsolomou approved these changes May 28, 2026


joshsny approved these changes May 28, 2026


VojtechBartos merged commit 5a2f0dd into master May 28, 2026
173 checks passed

VojtechBartos deleted the vojtab/gateway-cache-tokens branch May 28, 2026 15:57
