feat(error-tracking): add per-issue pre-cymbal rate limit by ablaszkiewicz · Pull Request #59611 · PostHog/posthog

ablaszkiewicz · 2026-05-22T12:40:17Z

Problem

A single noisy exception fingerprint can dominate the error-tracking consumer and crowd out the rest. We need per-issue throttling before Cymbal grouping so hot issues don't burn capacity downstream.

Stacked on #59610 (V3 lua denial-persistence fix) — without that, per-issue limits with sub-2 fillRate (100/15min ≈ 0.11/s) would wedge at -1 forever under sustained 1 req/s traffic.

Also I discovered an unwanted behavior - per team rate limiter runs before per issue rate limiter so it is possible that we will be consuming team budget and then rejecting that exception on the per-issue level. I swapped them so per-issue runs first. This means worse performance because we have to compute hashes for every exception that comes in but let's observe metrics - this is the proper way.

Changes

New pre-cymbal-group-key step — derives a stable issue key (team + exception fingerprint) for rate-limit decisions ahead of full Cymbal grouping.
Wires KeyedRateLimiterService into the error-tracking consumer, keyed by that issue key.
Refactors the shared keyed-rate-limiter-step from per-key cost summation to per-input fan-out → partial pass-through within a key group (some inputs allowed, the rest dropped) instead of all-or-nothing.
Reporting-mode flag (ERROR_TRACKING_RATE_LIMITER_REPORTING_MODE) so the limiter can be shadowed in prod before enforcement.
Swapped order of per-team and per-issue rate limiters so per-issue runs first.

How did you test this code?

tests pass,
I manually tested this and it works - I set per issue rate limit to 100 per 15 min and I am sending 1 exception per second. It quickly reaches ~105 (105 instead of 100 because bucket has a refill rate) and then traffic slows down drastically leaving us with ~1 exception per 10s (100 per 15 min is 0.11 per s). Other exceptions have their own limit

Publish to changelog?

no

🤖 Agent context

Branched off #59610 and built up from an earlier WIP branch by the agent. PR base will switch to master automatically once #59610 merges.

greptile-apps · 2026-05-22T12:44:49Z

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
nodejs/src/ingestion/error-tracking/load-error-tracking-settings-step.ts:18-25
Debug `console.log` left in production code — this fires for every event processed by the error-tracking pipeline. Under sustained load that will generate enormous log volume and expose settings data (including rate-limit values) on every invocation.

```suggestion
        const errorTrackingSettings = manager ? await manager.getSettings(input.team.id) : null

        return ok({ ...input, errorTrackingSettings })
```

### Issue 2 of 3
nodejs/src/ingestion/error-tracking/error-tracking-consumer.ts:312-356
Two `console.log` debug statements are left in production code. `getBucketConfig` is called once per unique rate-limit key per batch, so these fire frequently under real traffic. The same issue also appears in the per-issue block immediately below (`[ET-RL] per-issue bucket config`).

```suggestion
                    const config = {
                        bucketSize: value,
                        refillRate: value / (minutes * 60),
                    }

                    return config
                },
            },
            // Per-stack cap: independent opt-in from the team-global limit.
            {
                rateLimiter: this.rateLimiter,
                appMetricsAggregator: this.rateLimiterAppMetricsAggregator,
                appSource: 'exceptions',
                getKey: (input) => {
                    if (input.errorTrackingSettings?.perIssueRateLimitValue == null) {
                        return null
                    }
                    const sig = preCymbalGroupKey(input.event)
                    return sig ? `${input.team.id}:exceptions:stack:${sig}` : null
                },
                getTeamId: (input) => input.team.id,
                reportingMode: this.config.rateLimiterReportingMode,
                dropReason: 'rate_limited:per_stack',
                getBucketConfig: (input) => {
                    const settings = input.errorTrackingSettings!
                    const value = settings.perIssueRateLimitValue!
                    const minutes = settings.perIssueRateLimitBucketSizeMinutes ?? 60
                    const config = {
                        bucketSize: value,
                        refillRate: value / (minutes * 60),
                    }

                    return config
```

### Issue 3 of 3
nodejs/src/ingestion/error-tracking/pre-cymbal-group-key.ts:32-35
The `|` separator in the message-only path can silently merge distinct exceptions into the same rate-limit bucket. For example, `type="A|B", value="C"` and `type="A", value="B|C"` both produce the string `"A|B|C"` and therefore the same hash. Using a null byte (as the per-frame fields already do) removes the ambiguity.

```suggestion
        return createHash('sha1')
            .update(String(exc.type ?? ''))
            .update('\0')
            .update(value)
            .digest('hex')
            .slice(0, 16)
```

_{Reviews (1): Last reviewed commit: "feat(error-tracking): add per-issue pre-..." | Re-trigger Greptile}

veria-ai · 2026-05-22T13:27:18Z

PR overview

The PR has worked down most findings, with 2 issues already addressed and 1 remaining open. The remaining concern is that the new per-issue pre-cymbal rate limiter can create high-cardinality Redis keys from attacker-controlled exception payloads before a team-wide cap is enforced, allowing a sender with a project token to drive unbounded Redis writes by varying stack traces. This should be gated by a bounded team/global admission check or another cap on per-team issue-key creation before writing per-issue buckets.

Open issues (1)

nodejs/src/ingestion/error-tracking/error-tracking-consumer.ts:301 — Medium: High-cardinality rate-limit keys preCymbalGroupKey(input.event) is derived from public exception payloads, and this per-issue

Fixed/addressed: 2 · PR risk: 6/10

cat-ph · 2026-05-22T13:26:00Z

+        return null
+    }
+
+    const frames = exc.stacktrace?.frames


for 'extra safety' should we handle invalid frames? something like frames being [null], maybe we can validate the frames are valid objects too, otherwise I think this fails the entire batch?

cat-ph · 2026-05-22T13:28:43Z

+
+/** Pre-Cymbal stack signature for per-issue rate limit buckets. */
+export function preCymbalGroupKey(event: PluginEvent): string | null {
+    const exc = event.properties?.$exception_list?.[0]


thinking out loud, would chained exceptions with the first one acting as a sort of "wrapper", swallow more than wanted, and throttle issues that are not part of it? thinking that we go through all exceptions in cymbal

Very good point. Fixing that

ablaszkiewicz · 2026-05-26T09:48:36Z

Consolidated into existing tests

Adds a per-issue token-bucket rate limiter that fires before Cymbal so a single hot issue can't dominate the consumer. Keyed by the pre-cymbal group key (team + exception fingerprint), wired through the shared KeyedRateLimiterService. - New step: pre-cymbal-group-key — derives the issue key for rate-limit decisions ahead of full Cymbal grouping. - Refactored keyed-rate-limiter step from per-key cost summation to per-input fan-out so partial batches can pass through (some inputs in a key group allowed, others dropped) rather than all-or-nothing. - Reporting mode flag so the limiter can be shadowed in prod before enforcement. Builds on the V3 lua denial-persistence fix so per-issue limits with sub-2 fillRate (e.g. 100/15min ≈ 0.11/s) can recover instead of wedging.

greptile-apps · 2026-05-26T11:45:32Z

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nodejs/src/ingestion/error-tracking/error-tracking-consumer.ts:289-308
**Per-issue limiter runs after project-global, so hot issues still burn the project budget**

The two specs are applied in array order — project-global first, per-issue second. When both limits are configured an event from a hot issue that passes the project-global check (consuming a project token) can still be dropped at the per-issue step. Those consumed tokens are then unavailable for other issues, which is exactly the crowd-out problem the per-issue limiter is meant to solve.

Reversing the order so per-issue runs first means a hot issue's overflow is dropped before it can consume any project tokens, and the remaining project budget is fully available to other issues. When only one limit is configured the other step's `getKey` returns `null` for all inputs (no-op), so the swap is transparent in single-limit deployments.

_{Reviews (2): Last reviewed commit: "feat(error-tracking): add per-issue pre-..." | Re-trigger Greptile}

Copilot

Pull request overview

Adds a per-issue (pre-Cymbal) rate limit to the error-tracking ingestion path to prevent a single noisy fingerprint from monopolizing consumer capacity, and reorders limiters so per-issue throttling happens before the team-global cap.

Changes:

Extend error-tracking settings loading to include per-issue rate limit value and bucket window.
Introduce a new preCymbalGroupKey hashing step and wire a keyed per-issue rate limiter into the consumer ahead of Cymbal grouping.
Update/expand consumer tests for both project-wide and per-issue rate limiting; remove an older ET rate-limiter E2E test.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
nodejs/src/utils/error-tracking-settings-manager.ts	Loads per-issue rate limit settings from `posthog_errortrackingsettings`.
nodejs/src/utils/error-tracking-settings-manager.test.ts	Updates settings upsert helper + expectations to include per-issue fields.
nodejs/src/ingestion/error-tracking/rate-limiter-e2e.test.ts	Removes prior Redis-backed E2E test for sustained traffic rate limiting.
nodejs/src/ingestion/error-tracking/pre-cymbal-group-key.ts	Adds pre-Cymbal stable signature hashing for exceptions.
nodejs/src/ingestion/error-tracking/pre-cymbal-group-key.test.ts	Adds unit tests covering hashing behavior and edge cases.
nodejs/src/ingestion/error-tracking/load-error-tracking-settings-step.test.ts	Updates test typings/fixtures to include per-issue settings.
nodejs/src/ingestion/error-tracking/error-tracking-consumer.ts	Adds per-issue keyed limiter before team-global limiter using the pre-Cymbal signature.
nodejs/src/ingestion/error-tracking/error-tracking-consumer.test.ts	Adds integration-ish tests verifying project and per-issue limiter behavior with Redis-backed limiter.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

marandaneto · 2026-05-26T12:40:59Z

curious why assign-reviewers-posthog[bot] is auto assigning client libs approvers as reviewers quite often

cat-ph · 2026-05-26T13:01:28Z

@marandaneto I think it's probably that one of the base branch changes caused the diff to actually contain a js version change (also noticed that in another PR), maybe there's a way to compare current PR's package.json against master in

posthog/.github/workflows/pr-posthog-js-reviewer.yml

Line 43 in d6577af

    
                             DIFF=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/files" --paginate --jq '.[] | select(.filename | endswith("package.json")) | .patch // empty')

/cc @PostHog/team-devex for thoughts

marandaneto · 2026-05-26T13:03:11Z

@marandaneto I think it's probably that one of the base branch changes caused the diff to actually contain a js version change (also noticed that in another PR), maybe there's a way to compare current PR's package.json against master in

posthog/.github/workflows/pr-posthog-js-reviewer.yml

Line 43 in d6577af

DIFF=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/files" --paginate --jq '.[] | select(.filename | endswith("package.json")) | .patch // empty')

/cc @PostHog/team-devex for thoughts

yep i figured, thx
#60076 so its a bit better now

cat-ph · 2026-05-26T13:41:51Z

+    const hash = createHash('sha1')
+    let hasContent = false
+
+    for (const exc of excList) {


there's probably a very limited edge case here where some SDKs sending captureException("") would bypass it, but IMO not something we need to explicitly handle here, just noting

veria-ai · 2026-05-26T15:31:35Z

+                    if (input.errorTrackingSettings?.perIssueRateLimitValue == null) {
+                        return null
+                    }
+                    const sig = preCymbalGroupKey(input.event)


Medium: High-cardinality rate-limit keys

preCymbalGroupKey(input.event) is derived from public exception payloads, and this per-issue limiter runs before the team-global limiter. A sender with the project token can vary stack frames to create one Redis bucket per synthetic issue, so globally rate-limited traffic still causes unbounded Redis writes; put a bounded team/global admission check before this per-issue bucket, or otherwise cap per-team issue-key creation before writing per-issue keys.

Ehhh that's true. I need to think about this

a thought: one benefit to having the team limiter run first is that we could actually move it further up the stack (using just the parsed headers), and drop events before we parse them (saving some compute cost) - we could then have the separate per group check after we've parsed the events.

IMO the fact that this would exhaust team-limit is a deal breaker.

A customer we are mainly designing this for is research gate (amongst top 100 most visited websites on the internet). Their exploding issue (issue which has uncontrollably high amount of exceptions) will basically eat team-limit in seconds probably effectively stopping ingestion of all other issues.

I was thinking about either:

hashing the exception hash to some limited space (like let's say even 10k hashes) - dead simple to implement but prone to clashes. We are sure that there is no way for potential attacker to use more than 10k slots in redis,

implementing something actually smart based on Redis and keeping track of how many distinct hashes we have recorded for a given team in a given time. If we are over this limit, we return null in issue-limit and then team-limit applies. This should NEVER happen when customer has a proper implementation. This is purely to prevent possible attacks. I see there are quite some options how to implement it - will research it tomorrow.

If you have any thoughts on above - please drop them here ❤️

i think option 2 makes sense there! option 1 can be tricky, and can possibly lead to false positives. starting with 2 (claude says to use redis hll) makes sense, we can also break it up further over time and have heavy traffic buckets vs a tail end bucket if needed.

z0br0wn

all in all looks good to me - have a thought about ordering the limiters but not a blocker. we should have some per step latency metrics so i'd be keen to see the impact there

deployment-status-posthog · 2026-05-27T13:34:12Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-27 13:34 UTC	Run
prod-us	✅ Deployed	2026-05-27 13:54 UTC	Run
prod-eu	✅ Deployed	2026-05-27 13:58 UTC	Run

assign-reviewers-posthog Bot requested a review from a team May 22, 2026 12:40

greptile-apps Bot reviewed May 22, 2026

View reviewed changes

Comment thread nodejs/src/ingestion/error-tracking/load-error-tracking-settings-step.ts

Comment thread nodejs/src/ingestion/error-tracking/error-tracking-consumer.ts Outdated

Comment thread nodejs/src/ingestion/error-tracking/pre-cymbal-group-key.ts Outdated

ablaszkiewicz force-pushed the feat/error-tracking-per-issue-rate-limit-v2 branch 3 times, most recently from 8037387 to 387bb41 Compare May 22, 2026 12:59

veria-ai Bot reviewed May 22, 2026

View reviewed changes

Comment thread nodejs/src/ingestion/error-tracking/keyed-rate-limiter-step.ts Outdated

cat-ph reviewed May 22, 2026

View reviewed changes

ablaszkiewicz marked this pull request as draft May 26, 2026 09:22

ablaszkiewicz force-pushed the feat/error-tracking-per-issue-rate-limit-v2 branch from 387bb41 to 3aa7234 Compare May 26, 2026 09:23

assign-reviewers-posthog Bot requested a review from a team May 26, 2026 09:23

ablaszkiewicz closed this May 26, 2026

ablaszkiewicz reopened this May 26, 2026

ablaszkiewicz changed the base branch from fix/redis-token-bucket-v3-persist-balance-on-denial to master May 26, 2026 09:30

ablaszkiewicz force-pushed the feat/error-tracking-per-issue-rate-limit-v2 branch from 3aa7234 to 03cbace Compare May 26, 2026 09:48

ablaszkiewicz commented May 26, 2026

View reviewed changes

ablaszkiewicz force-pushed the feat/error-tracking-per-issue-rate-limit-v2 branch from 03cbace to 20132f6 Compare May 26, 2026 10:11

ablaszkiewicz marked this pull request as ready for review May 26, 2026 11:40

ablaszkiewicz requested review from cat-ph and Copilot May 26, 2026 11:40

veria-ai Bot reviewed May 26, 2026

View reviewed changes

Comment thread nodejs/src/ingestion/error-tracking/pre-cymbal-group-key.ts

feat: swap places

153c408

ablaszkiewicz requested a review from benjackwhite May 26, 2026 11:59

ablaszkiewicz added 2 commits May 26, 2026 14:02

fix: non iterable case

5c0544c

fix: env

4856189

Copilot started reviewing on behalf of ablaszkiewicz May 26, 2026 12:19 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

Comment thread nodejs/src/ingestion/error-tracking/error-tracking-consumer.test.ts Outdated

Comment thread nodejs/src/ingestion/error-tracking/error-tracking-consumer.ts

fix: copy

00cb759

marandaneto mentioned this pull request May 26, 2026

chore(ci): sync posthog-js reviewer requests #60076

Merged

cat-ph approved these changes May 26, 2026

View reviewed changes

Merge branch 'master' into feat/error-tracking-per-issue-rate-limit-v2

cf57b6b

veria-ai Bot reviewed May 26, 2026

View reviewed changes

z0br0wn approved these changes May 26, 2026

View reviewed changes

ablaszkiewicz merged commit 3c3d43b into master May 27, 2026
175 checks passed

ablaszkiewicz deleted the feat/error-tracking-per-issue-rate-limit-v2 branch May 27, 2026 13:15

Conversation

ablaszkiewicz commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Publish to changelog?

🤖 Agent context

Uh oh!

greptile-apps Bot commented May 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

veria-ai Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Open issues (1)

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cat-ph May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented May 26, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

marandaneto commented May 26, 2026

Uh oh!

cat-ph commented May 26, 2026

Uh oh!

marandaneto commented May 26, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

veria-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ablaszkiewicz May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

z0br0wn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

ablaszkiewicz commented May 22, 2026 •

edited

Loading

veria-ai Bot commented May 22, 2026 •

edited

Loading

cat-ph May 22, 2026 •

edited

Loading

ablaszkiewicz May 26, 2026 •

edited

Loading

deployment-status-posthog Bot commented May 27, 2026 •

edited

Loading