fix(usage-report): trim oversize org payloads before capture by abhischekt · Pull Request #60462 · PostHog/posthog

abhischekt · 2026-05-28T15:57:15Z

Problem

The organization usage report event embeds a per-team breakdown under properties.teams. For very large organizations (a few hundred projects, each contributing ~3 KB of counters), the serialized payload exceeds Kafka's default message.max.bytes (~1 MiB). The broker rejects the event, and because posthog-python capture is fire-and-forget, the drop is silent: no organization usage report failure event fires, and the affected orgs simply don't appear in the dogfood project's usage data.

The largest org currently captured sits around 938 KB — right against the ceiling — and nothing past ~300 teams shows up at all. This was first surfaced by a stakeholder noticing one of their tracked orgs had no organization usage report events despite obvious activity in other event types.
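A rough back-of-the-envelope check of these numbers, assuming (as a simplification not stated in the PR) that payload size scales linearly with team count and that the broker limit is exactly 1 MiB. The ~294-team figure is the one reported for the largest captured org in the agent context of this PR.

```python
# Back-of-the-envelope sizing; the linear-scaling model is an assumption.
KAFKA_LIMIT_BYTES = 1024 * 1024    # ~1 MiB, the approximate message.max.bytes ceiling
LARGEST_CAPTURED_BYTES = 938_000   # "around 938 KB" for the largest captured org
LARGEST_CAPTURED_TEAMS = 294       # approximate team count reported for that org

bytes_per_team = LARGEST_CAPTURED_BYTES / LARGEST_CAPTURED_TEAMS
teams_at_limit = KAFKA_LIMIT_BYTES / bytes_per_team

print(f"~{bytes_per_team:.0f} bytes per team")
print(f"limit reached at roughly {teams_at_limit:.0f} teams")
```

Under these assumptions the per-team cost lands near the ~3 KB quoted above, and the ceiling falls in the low 300s of teams before any envelope overhead is counted, which is consistent with nothing past ~300 teams showing up.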

Changes

- Add MAX_USAGE_REPORT_PAYLOAD_BYTES = 900_000 with headroom for the Kafka envelope and the groups / $set / scope properties that capture_event layers on top.
- Introduce _trim_oversize_usage_report_payload(...): returns the dict unchanged when small enough; otherwise returns a copy with teams={} and teams_omitted_due_to_size=True. Every org-level counter (including team_count) is preserved so existing insights and downstream consumers keep working.
- capture_report passes the trimmed payload to the dogfood capture_event. The billing SQS path (_queue_report), group_identify, and per-person captures still see the full report; only the in-app analytics capture is trimmed.
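A minimal sketch of the trimming behavior described above, not the merged implementation. The function name `trim_oversize_usage_report_payload_sketch` is a hypothetical stand-in, and measuring UTF-8 bytes with a `<=` comparison is an assumption; only the constant value, the `default=str` serialization, and the returned shape come from this PR.

```python
import json
from typing import Any

MAX_USAGE_REPORT_PAYLOAD_BYTES = 900_000  # value taken from the PR description


def trim_oversize_usage_report_payload_sketch(
    full_report_dict: dict[str, Any],
    max_bytes: int = MAX_USAGE_REPORT_PAYLOAD_BYTES,
) -> dict[str, Any]:
    """Return the report unchanged if it fits, else a copy without the per-team breakdown."""
    # default=str keeps datetimes and similar values from raising TypeError.
    # Counting UTF-8 bytes (rather than characters) is an assumption of this sketch.
    size = len(json.dumps(full_report_dict, default=str).encode("utf-8"))
    if size <= max_bytes:
        return full_report_dict  # same object, no copy

    trimmed = dict(full_report_dict)  # shallow copy; the original is left untouched
    trimmed["teams"] = {}
    trimmed["teams_omitted_due_to_size"] = True
    return trimmed


# Synthetic payloads for illustration only.
small = {"organization_id": "org-1", "team_count": 1, "teams": {"1": {"event_count": 5}}}
big = {
    "organization_id": "org-2",
    "team_count": 600,
    "teams": {str(i): {"padding": "x" * 3000} for i in range(600)},
}

small_result = trim_oversize_usage_report_payload_sketch(small)
big_result = trim_oversize_usage_report_payload_sketch(big)

print(small_result is small)  # True
print(big_result["teams"], big_result["teams_omitted_due_to_size"], big_result["team_count"])  # {} True 600
print(len(big["teams"]))  # 600, the input dict is not mutated
```

Because only the `teams` key is replaced on a shallow copy, callers that still hold the full report (for example the billing path) are unaffected in this sketch.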

The durable fix is to re-enable per-org capture from the v2 Temporal workflow (TODO at posthog/temporal/usage_report/activities.py:131-135), which chunks reports to S3 and avoids the Kafka per-message ceiling entirely. This PR is the narrow band-aid on the legacy Celery path until that cutover lands.
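To illustrate why chunking sidesteps a per-message ceiling, here is a generic sketch of splitting per-team records into size-capped JSONL chunks. It is not the v2 Temporal workflow's code: the function name `chunk_team_records`, the record shape, and the 256 KB budget are all hypothetical.

```python
import json
from typing import Any, Iterator


def chunk_team_records(
    teams: dict[str, dict[str, Any]], max_chunk_bytes: int = 256_000
) -> Iterator[str]:
    """Yield JSONL strings that stay within max_chunk_bytes when UTF-8 encoded.

    A single record larger than the budget is still emitted, alone in its own chunk.
    """
    lines: list[str] = []
    size = 0
    for team_id, counters in teams.items():
        line = json.dumps({"team_id": team_id, **counters}, default=str)
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if lines and size + line_bytes > max_chunk_bytes:
            yield "\n".join(lines) + "\n"  # flush the current chunk
            lines, size = [], 0
        lines.append(line)
        size += line_bytes
    if lines:
        yield "\n".join(lines) + "\n"


# Synthetic data: 600 teams at roughly 3 KB each.
teams = {str(i): {"event_count": i, "padding": "x" * 3000} for i in range(600)}
chunks = list(chunk_team_records(teams))
print(len(chunks), max(len(c.encode("utf-8")) for c in chunks))
```

Each chunk can then be written as a separate object, so no single write has to carry the whole organization.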

How did you test this code?

I'm an agent (PostHog Code). I did not perform any manual production verification. I added three automated tests in posthog/tasks/test/test_usage_report.py:

- TestTrimOversizeUsageReportPayload.test_returns_dict_unchanged_when_under_limit: a small payload returns the same object identity (no copy, no allocation).
- TestTrimOversizeUsageReportPayload.test_drops_teams_and_sets_marker_when_over_limit: a synthetic 600-team payload exceeds the threshold; the trimmed copy drops teams, sets the marker, keeps org-level counters, and serializes under the limit. The original dict is untouched.
- TestCaptureReportTrimsOversizePayload.test_capture_report_drops_teams_when_payload_too_large: end-to-end through capture_report with a mocked PHA client, asserting the pha_client.capture call receives the trimmed properties.

All three pass. The pre-existing TestCaptureReportGroupProperties.test_capture_report_sets_org_group_properties still passes (group_identify continues to receive the full counts). Ran ruff check, ruff format, and ty check via the pre-commit hooks.

Publish to changelog?

no

Docs update

N/A — internal analytics capture path.

🤖 Agent context

Authored by PostHog Code in response to a CSM-reported anomaly: an internal insight filtering organization usage report on a specific organization_id was empty, but the org was clearly active in other event types. Investigation via the PostHog MCP confirmed the EU usage-report cron was running and emitting reports for ~140k other orgs in the same window — just not this one. No failure event existed for it either. A scan by team_count showed the largest captured org sat at 938 KB of payload (~294 teams). Nothing beyond that appeared at all — the pattern matched a per-message size limit, not a logic exclusion.

Decisions made along the way:

- Scope of the fix. Considered (a) re-enabling per-org capture from the v2 Temporal flow, (b) emitting a separate organization usage report per team event, and (c) adding only a visibility-signal event. Picked the minimal trim because the legacy Celery path is still the only producer of organization usage report, the durable fix belongs with the v2 cutover, and bundling it here would expand blast radius unnecessarily.
- What to drop. Dropped teams wholesale rather than truncating or sampling. team_count and every org-level counter are already at the top level of the report; consumers that need per-team granularity can use the chunked JSONL the v2 workflow already writes to S3.
- Where to apply. Only on the dogfood capture_event call. The billing SQS payload has separate chunking and a much larger size budget, so it stays unmodified.
- Threshold choice. 900 KB leaves ~140 KB of headroom for the Kafka message envelope plus the groups, $set, scope, and instance_metadata keys that capture_event layers on. The largest currently-captured payload (938 KB) would not have been trimmed under this threshold, so steady-state behavior is unchanged for everyone already getting through.
- Marker field. teams_omitted_due_to_size=True rather than e.g. removing the key silently, so downstream insight authors can detect the truncation and either backfill from the v2 chunks or skip the org.
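As a sketch of how a downstream consumer might honor the marker, the hypothetical helper below sums a per-team counter only when the breakdown is present. The helper name `per_team_total` and the `event_count` counter are illustrative and do not come from the PR.

```python
from typing import Any, Optional


def per_team_total(properties: dict[str, Any], counter: str) -> Optional[int]:
    """Sum a counter across teams, or return None when the breakdown was omitted."""
    if properties.get("teams_omitted_due_to_size"):
        return None  # caller should backfill from another source or skip this org
    teams = properties.get("teams") or {}
    return sum(team.get(counter, 0) for team in teams.values())


full = {"team_count": 2, "teams": {"1": {"event_count": 5}, "2": {"event_count": 7}}}
trimmed = {"team_count": 300, "teams": {}, "teams_omitted_due_to_size": True}

print(per_team_total(full, "event_count"))     # 12
print(per_team_total(trimmed, "event_count"))  # None
```

Without the marker, an empty `teams` dict would sum to 0 and a trimmed org would be indistinguishable from an inactive one.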

Created with PostHog Code

Large orgs with several hundred projects produced `organization usage report` events above Kafka's ~1 MiB `message.max.bytes`, so the broker dropped them at ingestion. The `posthog-python` capture is fire-and-forget, so nothing surfaced; the affected orgs simply never appeared in the dogfood project. Trim the per-team breakdown when the serialized payload would exceed the limit, preserving every org-level counter and marking the trimmed event with `teams_omitted_due_to_size=True` for downstream consumers. The billing SQS path is untouched.

Generated-By: PostHog Code
Task-Id: 5a0684db-daba-47e7-86ae-54b4701f995f

greptile-apps · 2026-05-28T15:59:54Z

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
posthog/tasks/test/test_usage_report.py:1741
**`json.dumps` precondition checks missing `default=str`**

The precondition assertions at lines 1741 and 1777 call `json.dumps(oversize_report)` and `json.dumps(full_report_dict)` without `default=str`, while `_trim_oversize_usage_report_payload` uses `json.dumps(full_report_dict, default=str)`. In production, the real usage-report dict can contain datetime or other non-JSON-serializable values; without `default=str`, these assertions would raise `TypeError` when the test data happens to include them, making the precondition a worse model of the actual check than intended. Both precondition calls should match the implementation's `default=str`.
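A small illustration of the mismatch this issue describes, using a synthetic dict. The `period_start` key is a hypothetical example of a non-JSON-serializable value, not a field confirmed by the PR.

```python
import json
from datetime import datetime, timezone

# Synthetic report containing a datetime, which plain json.dumps cannot encode.
report = {
    "organization_id": "org-1",
    "period_start": datetime(2026, 5, 27, tzinfo=timezone.utc),
}

try:
    json.dumps(report)
    raised = False
except TypeError:
    raised = True

# default=str falls back to str() for values json cannot encode natively.
serialized = json.dumps(report, default=str)

print(raised)      # True
print(serialized)
```

Using the same `default=str` in the precondition keeps the test measuring the same serialization the helper measures.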

### Issue 2 of 2
posthog/tasks/test/test_usage_report.py:1714-1752
**Non-parameterised tests**

The two methods in `TestTrimOversizeUsageReportPayload` — one for the under-limit path and one for the over-limit path — are good candidates for a single `@pytest.mark.parametrize` or `subTest` pattern, in line with the project's preference for parameterised tests. The under-limit case only asserts identity, but that could be expressed as a boolean flag in the parameter set (e.g., `expect_same_object=True/False`) alongside the other assertions, keeping both scenarios visible in one place without duplicating the test body structure.
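One possible shape for the combined test, sketched with the standard library's `subTest` so it runs without pytest. `trim_sketch` is a compact hypothetical stand-in for the helper under test, and the case table and class name are illustrative rather than the tests that were merged.

```python
import json
import unittest

LIMIT = 900_000  # threshold value from the PR description


def trim_sketch(report: dict) -> dict:
    # Compact stand-in for the helper under test (behavior as described in the PR).
    if len(json.dumps(report, default=str).encode("utf-8")) <= LIMIT:
        return report
    return {**report, "teams": {}, "teams_omitted_due_to_size": True}


class TestTrimSketch(unittest.TestCase):
    def test_trim(self) -> None:
        cases = [
            # (team_count, expect_same_object)
            (3, True),
            (600, False),
        ]
        for team_count, expect_same_object in cases:
            with self.subTest(team_count=team_count):
                report = {
                    "team_count": team_count,
                    "teams": {str(i): {"padding": "x" * 3000} for i in range(team_count)},
                }
                result = trim_sketch(report)
                self.assertEqual(result is report, expect_same_object)
                self.assertEqual(result["team_count"], team_count)
                if not expect_same_object:
                    self.assertEqual(result["teams"], {})
                    self.assertTrue(result["teams_omitted_due_to_size"])
                    # The original dict must be left untouched.
                    self.assertEqual(len(report["teams"]), team_count)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTrimSketch)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Driving both paths from `team_count` keeps the under-limit and over-limit scenarios side by side in one table.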

Reviews (1): Last reviewed commit: "fix(usage-report): trim oversize org pay..."

github-actions · 2026-05-28T16:13:01Z

🎭 Playwright report

⚠️ 1 flaky test:

Inline editing insight title via compact card popover (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

ceyniustranberg · 2026-05-28T16:23:52Z

> Dropped teams wholesale rather than truncating or sampling. team_count and every org-level counter are already in the top of the report;

Worth double checking if this breaks any destinations

I imagine that option b) of having a team usage report would produce lots of events, but curious to hear if you would consider it at some point?

ceyniustranberg

As far as I'm concerned, looks good


…versize-usage-report-payload

Resolve conflict in posthog/tasks/usage_report.py: keep both master's get_teams_with_sdk_logs_records_in_period helper and this branch's _trim_oversize_usage_report_payload helper.

Generated-By: PostHog Code
Task-Id: 5a0684db-daba-47e7-86ae-54b4701f995f


deployment-status-posthog · 2026-06-01T09:58:07Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-06-01 09:58 UTC	Run
prod-us	✅ Deployed	2026-06-01 10:10 UTC	Run
prod-eu	✅ Deployed	2026-06-01 10:14 UTC	Run

abhischekt self-assigned this May 28, 2026

abhischekt requested review from ceyniustranberg, mcoll-posthog and pawel-cebula May 28, 2026 15:59

greptile-apps Bot reviewed May 28, 2026

View reviewed changes

Comment thread posthog/tasks/test/test_usage_report.py Outdated

Comment thread posthog/tasks/test/test_usage_report.py Outdated

ceyniustranberg approved these changes May 29, 2026

View reviewed changes

abhischekt added 4 commits May 29, 2026 17:24

Merge remote-tracking branch 'origin/master' into posthog-code/trim-o…

2dff989

…versize-usage-report-payload # Conflicts: # posthog/tasks/usage_report.py

chore(usage-report): parameterize oversize-payload trim tests

ca72b1a

Fold the under-/over-limit cases into one parameterized test driven by team_count, and serialize the precondition assertions with default=str so they mirror what _trim_oversize_usage_report_payload measures.

Merge remote-tracking branch 'origin/master' into posthog-code/trim-o…

fb52744

…versize-usage-report-payload

abhischekt enabled auto-merge (squash) June 1, 2026 09:12

abhischekt merged commit 779336f into master Jun 1, 2026
197 checks passed

abhischekt deleted the posthog-code/trim-oversize-usage-report-payload branch June 1, 2026 09:27

abhischekt merged 5 commits into master from posthog-code/trim-oversize-usage-report-payload
