feat(property-vals): emit-once seen cache in merger by andyzzhao · Pull Request #60733 · PostHog/posthog

andyzzhao · 2026-05-29T20:22:49Z

Problem

property-vals-rs re-produces a tuple once per flush window for as long as that tuple keeps appearing. Re-emission of recurring tuples across flushes contributes to increase write volume to clickhouse.

Changes

Adds a bounded "already emitted" cache to the merger. At flush, a tuple still resident in the cache is suppressed; a new one is emitted and then inserted into the cache only after the produce succeeds.

New SeenCache wraps a quick_cache LRU keyed on the full (team, type, key, value) tuple, with a custom eviction lifecycle. Same structure property-defs-rs uses for its dedup cache.
Controlled by MERGER_SEEN_CACHE_CAPACITY (default 0 = disabled). Capacity bounds memory directly; it is independent of total cardinality.

Trade-offs:

This degrades property_count to a presence-ish undercount, because a suppressed occurrence does not increment the count. I will likely remove this col as a follow up.
On a merger redeploy the cache starts cold and produced volume returns to the full rate until it re-warms.

How did you test this code?

I'm an agent. Added and ran automated unit tests for the cache path:

an already-cached tuple is suppressed (no second produce, aggregator still drained)
a tuple forgotten on produce failure is re-emitted on retry, not lost to the cache
distinct new values all emit while a repeat is suppressed

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

🤖 Agent context

Authored with Claude Code (Opus 4.8). The cache lives in the merger because the intermediate topic is hashed by the full tuple, so each tuple has a single owning merger pod and the per-pod caches do not overlap, which makes the dedup correct without coordination. Chose an exact LRU (quick_cache) over a bloom filter on purpose: the bloom is cheaper per element but its false positives drop genuinely-new values, whereas the LRU's only failure mode is re-emitting an evicted tuple, which is lossless. Sizing is against the active working set, which was measured to be small per pod, rather than the all-time cardinality. Ships disabled (capacity 0); enabling and capacity tuning are operational, guided by the hit/miss/eviction metrics.

Add an optional bounded "already emitted" cache to the merger so a tuple that is still resident is suppressed instead of re-produced every flush window. Re-emission of recurring tuples is the bulk of produced rows, so this drops produced volume toward the new-distinct-value arrival rate. The cache is a quick_cache LRU keyed on the full tuple, sized to the active working set rather than total cardinality, so memory is bounded by MERGER_SEEN_CACHE_CAPACITY (default 0 = disabled), not by the number of distinct values. It is lossless on values: an evicted or forgotten tuple is re-emitted and the AggregatingMergeTree absorbs the duplicate. On a produce failure the emitted set is forgotten so the retry re-emits it. Applied in the merger only; events and groups run with reduction disabled. Hits, misses, and evictions are exported per worker for sizing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-05-29T20:28:30Z

_{Reviews (1): Last reviewed commit: "feat(property-vals): emit-once seen cach..." | Re-trigger Greptile}

stamphog

Gates denied this PR because it adds a new Rust dependency (quick_cache) to Cargo.toml and Cargo.lock, which triggers the dependency/toolchain deny-list rule. This requires human review before auto-approval.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…cceeds Replace the optimistic insert + forget-on-failure with inserting into the seen cache only after the produce succeeds. The cache now only ever holds tuples that were actually emitted, so a produce failure just restores the batch to the aggregator with nothing to undo, and a tuple can never be suppressed without having been produced first. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

deployment-status-posthog · 2026-06-01T21:28:11Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-06-01 21:28 UTC	Run
prod-us	✅ Deployed	2026-06-02 10:18 UTC	Run
prod-eu	✅ Deployed	2026-06-01 21:53 UTC	Run

andyzzhao added the stamphog Request AI review from stamphog label Jun 1, 2026

stamphog Bot reviewed Jun 1, 2026

View reviewed changes

stamphog Bot removed the stamphog Request AI review from stamphog label Jun 1, 2026

chore(property-vals): drop comments in merger seen cache

3448a13

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

andyzzhao requested a review from a team June 1, 2026 17:06

aspicer approved these changes Jun 1, 2026

View reviewed changes

andyzzhao merged commit 10b3138 into master Jun 1, 2026
189 checks passed

andyzzhao deleted the andy/property-vals-merger-seen-cache branch June 1, 2026 20:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(property-vals): emit-once seen cache in merger#60733

feat(property-vals): emit-once seen cache in merger#60733
andyzzhao merged 3 commits into
masterfrom
andy/property-vals-merger-seen-cache

andyzzhao commented May 29, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 29, 2026

Uh oh!

stamphog Bot left a comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented Jun 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andyzzhao commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Automatic notifications

🤖 Agent context

Uh oh!

greptile-apps Bot commented May 29, 2026

Uh oh!

stamphog Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andyzzhao commented May 29, 2026 •

edited

Loading

deployment-status-posthog Bot commented Jun 1, 2026 •

edited

Loading