Skip to content

feat(property-vals): aggregate property values per event behind flags#61325

Merged
andyzzhao merged 3 commits into
masterfrom
andy/property-vals-value-len-config
Jun 3, 2026
Merged

feat(property-vals): aggregate property values per event behind flags#61325
andyzzhao merged 3 commits into
masterfrom
andy/property-vals-value-len-config

Conversation

@andyzzhao
Copy link
Copy Markdown
Contributor

@andyzzhao andyzzhao commented Jun 3, 2026

Problem

We want property-value autocomplete to be scopable to a specific event (filter "events where event = X and property = Y"). For that, the value catalog needs to know which event each value appeared on. This threads the source event name through the property-values aggregation pipeline so a later ClickHouse column can support per-event value lookups.

It is gated and ships dark, so it is safe to land and deploy ahead of the ClickHouse side.

Changes

Per-event aggregation (the main change):

  • The events worker stamps the source event name onto event-type values. Person and group values are never event-scoped (group properties are not even on the event stream), so they stay empty.
  • event_name becomes part of the in-memory aggregation key, the merger's emit-once seen cache, and the top-K cap, so event values aggregate and cap per (event, key) rather than per key.
  • Two independent flags, both default off:
    • AGGREGATE_BY_EVENT_NAME controls whether the event name enters the aggregation key. On, event values fragment per event (this drives the write amplification). Off, event values carry an empty name like person/group.
    • EMIT_EVENT_NAME controls whether the merger writes the event_name field on the output topic. The field is omitted when empty (serde(skip_serializing_if)), so with this off the messages to clickhouse_property_values are byte-identical to the pre-event_name format.

Splitting the two flags lets us exercise the write-amplification behavior before the ClickHouse column exists: turn AGGREGATE_BY_EVENT_NAME on with EMIT_EVENT_NAME off to produce the same fragmented record count (and throughput) the finished feature will, while the wire stays in the old format so the current ClickHouse kafka table keeps consuming it unchanged.

Supporting change:

  • The per-value length cap (previously a hard-coded 255) is now a MAX_PROPERTY_VALUE_LEN config field (default 255), so it can be tuned without a redeploy.

Rollout order: deploy the ClickHouse PR that adds event_name to the value table's sort key (#61323), deploy this service with both flags off, optionally flip AGGREGATE_BY_EVENT_NAME alone to load-test throughput, then flip EMIT_EVENT_NAME on.

One thing to weigh: aggregating per event lowers the collapse ratio, so write throughput to ClickHouse rises (roughly 2-6x at the flush cadence depending on a team's event diversity). Lengthening FLUSH_INTERVAL_SECS is the lever if that is too much.

How did you test this code?

I am an agent (Claude Code). No manual testing. On the property-vals-rs crate: cargo fmt, cargo clippy, and cargo test (55 passed). Tests cover:

  • event values get the source event name when aggregation is on; person values stay empty in the same event
  • all values are empty when aggregation is off
  • oversized event names (over 200 chars) are not stamped, so the values still land unscoped
  • the wire field is omitted when empty (matches the pre-event_name format) and round-trips back to empty
  • the merger carries event_name through from the intermediate message
  • the value-length cap is configurable: a 256-char value is dropped at cap 255 but kept at cap 300

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Docs update

No docs change.

🤖 Agent context

Authored with Claude Code (Read/Edit/Bash) while working through property-value search performance. This branch consolidates two changes: the configurable value-length cap and the per-event aggregation feature (the latter merged in via the stacked PR #61322).

Decisions: the two flags (AGGREGATE_BY_EVENT_NAME, EMIT_EVENT_NAME) are split so the per-event aggregation grain, and its write amplification, can be exercised on production before the consuming ClickHouse column exists, which decouples the two deploys. The output field is omitted when empty so flag-off messages stay byte-identical to the old format. Person and group are intentionally left unscoped because their values are not event-specific (groups are not on the event stream at all). The aggregation key, merger seen cache, and top-K cap all key on event_name so the per-event grain is consistent end to end. The ~2-6x write amplification was measured and is called out as a rollout consideration, not a blocker, since it only bites once aggregation is on.

Thread MAX_PROPERTY_VALUE_LEN through fan_out as a config-driven
parameter (MAX_PROPERTY_VALUE_LEN env var, default 255) so the cap can
be tuned without a redeploy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 3, 2026

Comments Outside Diff (1)

  1. rust/property-vals-rs/src/fan_out.rs, line 219 (link)

    P2 check_tuple_invariants still hardcodes MAX_PROPERTY_VALUE_LEN

    check_tuple_invariants asserts property_value.chars().count() <= MAX_PROPERTY_VALUE_LEN (255) regardless of the cap actually passed to fan_out. Any future proptest that supplies a cap larger than 255 would emit values between 256 and the new cap, causing this invariant assertion to fail spuriously. The helper should accept max_property_value_len: usize and use that value in the assertion to stay in sync with the cap under test.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: rust/property-vals-rs/src/fan_out.rs
    Line: 219
    
    Comment:
    **`check_tuple_invariants` still hardcodes `MAX_PROPERTY_VALUE_LEN`**
    
    `check_tuple_invariants` asserts `property_value.chars().count() <= MAX_PROPERTY_VALUE_LEN` (255) regardless of the cap actually passed to `fan_out`. Any future proptest that supplies a cap larger than 255 would emit values between 256 and the new cap, causing this invariant assertion to fail spuriously. The helper should accept `max_property_value_len: usize` and use that value in the assertion to stay in sync with the cap under test.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
rust/property-vals-rs/src/fan_out.rs:219
**`check_tuple_invariants` still hardcodes `MAX_PROPERTY_VALUE_LEN`**

`check_tuple_invariants` asserts `property_value.chars().count() <= MAX_PROPERTY_VALUE_LEN` (255) regardless of the cap actually passed to `fan_out`. Any future proptest that supplies a cap larger than 255 would emit values between 256 and the new cap, causing this invariant assertion to fail spuriously. The helper should accept `max_property_value_len: usize` and use that value in the assertion to stay in sync with the cap under test.

### Issue 2 of 2
rust/property-vals-rs/src/fan_out.rs:223-251
**No test exercises the configurable cap with a non-default value**

Every test call threads through `MAX_PROPERTY_VALUE_LEN` (255), so no test actually demonstrates that changing the cap changes the filtering behaviour. A simple deterministic test — e.g., a 256-char value is dropped when `max_property_value_len = 255` but accepted when `max_property_value_len = 300` — would directly prove the new parameter works end-to-end, and would also be a better fit for the project's preference for parameterised tests.

Reviews (1): Last reviewed commit: "feat(property-vals): make property value..." | Re-trigger Greptile

Comment thread rust/property-vals-rs/src/fan_out.rs
@andyzzhao andyzzhao added the stamphog Request AI review from stamphog label Jun 3, 2026
github-actions[bot]
github-actions Bot previously approved these changes Jun 3, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, low-risk refactor that threads a new env-configurable cap through existing filtering logic with a backward-compatible default of 255. No production behavior changes unless the env var is explicitly set.

…#61322)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot dismissed their stale review June 3, 2026 05:37

New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.

Copy link
Copy Markdown

@stamphog stamphog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot comment flagging a missing test for the configurable cap with a non-default value is valid and unaddressed — every test call still threads MAX_PROPERTY_VALUE_LEN (255), so the core advertised feature (a non-default cap changing behavior) has zero direct test coverage. Request a human review or add a targeted test before re-requesting.

@stamphog stamphog Bot removed the stamphog Request AI review from stamphog label Jun 3, 2026
@andyzzhao andyzzhao force-pushed the andy/property-vals-value-len-config branch from 8f9f980 to e76e232 Compare June 3, 2026 05:42
Table-driven test proving the cap parameter changes filtering: a
256-char value is dropped at cap 255 but kept at cap 300.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@andyzzhao andyzzhao force-pushed the andy/property-vals-value-len-config branch from e76e232 to 2950b4f Compare June 3, 2026 05:45
@andyzzhao andyzzhao added the stamphog Request AI review from stamphog label Jun 3, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot review concerns about missing test coverage for the configurable cap were addressed in a subsequent commit — the current diff includes a parameterized value_length_cap_is_configurable test that exercises both default and non-default caps. All changes are backward-compatible: defaults preserve existing behavior, empty event names are omitted from the wire format, and deserialization of old messages is handled via #[serde(default)].

@andyzzhao andyzzhao changed the title feat(property-vals): make property value length cap configurable feat(property-vals): aggregate property values per event behind flags Jun 3, 2026
@andyzzhao andyzzhao merged commit 75423d2 into master Jun 3, 2026
221 checks passed
@andyzzhao andyzzhao deleted the andy/property-vals-value-len-config branch June 3, 2026 05:58
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented Jun 3, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-06-03 07:15 UTC Run
prod-us ✅ Deployed 2026-06-03 07:27 UTC Run
prod-eu ✅ Deployed 2026-06-03 07:30 UTC Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stamphog Request AI review from stamphog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant