feat(property-vals): aggregate property values per event behind flags #61325
Thread MAX_PROPERTY_VALUE_LEN through fan_out as a config-driven parameter (MAX_PROPERTY_VALUE_LEN env var, default 255) so the cap can be tuned without a redeploy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
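A minimal sketch of how an env-driven cap with a default of 255 could be read. The function names and the fall-back-on-parse-error behavior are assumptions for illustration, not the crate's actual config code; the real service may well fail fast on a malformed value instead.

```rust
/// Default cap from the commit message above (255).
const DEFAULT_MAX_PROPERTY_VALUE_LEN: usize = 255;

/// Hypothetical helper: turn the raw env value into a cap.
/// Assumption: an unset or unparsable value falls back to the default.
fn parse_max_property_value_len(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MAX_PROPERTY_VALUE_LEN)
}

/// Hypothetical wrapper that reads MAX_PROPERTY_VALUE_LEN from the environment.
fn max_property_value_len_from_env() -> usize {
    let raw = std::env::var("MAX_PROPERTY_VALUE_LEN").ok();
    parse_max_property_value_len(raw.as_deref())
}
```

Keeping the parsing separate from the environment read makes the default and override paths checkable without mutating process state.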
…#61322) Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.
The bot comment flagging a missing test for the configurable cap with a non-default value is valid and unaddressed — every test call still threads MAX_PROPERTY_VALUE_LEN (255), so the core advertised feature (a non-default cap changing behavior) has zero direct test coverage. Request a human review or add a targeted test before re-requesting.
8f9f980 to
e76e232
Table-driven test proving the cap parameter changes filtering: a 256-char value is dropped at cap 255 but kept at cap 300. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
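The shape of such a table-driven check might look roughly like the sketch below. The names `within_cap` and `cap_cases` are hypothetical, and counting length in chars (rather than bytes) is an assumption; only the 256-at-255 and 256-at-300 cases come from the commit message.

```rust
/// Hypothetical filter: keep a value only if its length is within the cap.
/// Assumption: length is counted in chars; the real crate may count bytes.
fn within_cap(value: &str, max_len: usize) -> bool {
    value.chars().count() <= max_len
}

/// Table of (value length, cap, expected to be kept).
/// The 255-at-255 row is an assumed boundary case (inclusive cap).
fn cap_cases() -> Vec<(usize, usize, bool)> {
    vec![
        (255, 255, true),  // exactly at the default cap: kept (assumed inclusive)
        (256, 255, false), // one over the default cap: dropped
        (256, 300, true),  // same value, larger cap: kept
    ]
}

/// Runs every row of the table and returns true if all match.
fn all_cap_cases_pass() -> bool {
    cap_cases()
        .into_iter()
        .all(|(len, cap, expected)| within_cap(&"a".repeat(len), cap) == expected)
}
```

The point of the third row is that the same input flips from dropped to kept purely because the cap parameter changed, which is the behavior the earlier review asked to see covered.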
e76e232 to
2950b4f
The bot review concerns about missing test coverage for the configurable cap were addressed in a subsequent commit — the current diff includes a parameterized value_length_cap_is_configurable test that exercises both default and non-default caps. All changes are backward-compatible: defaults preserve existing behavior, empty event names are omitted from the wire format, and deserialization of old messages is handled via #[serde(default)].
Problem
We want property-value autocomplete to be scopable to a specific event (filter "events where event = X and property = Y"). For that, the value catalog needs to know which event each value appeared on. This threads the source event name through the property-values aggregation pipeline so a later ClickHouse column can support per-event value lookups.
It is gated and ships dark, so it is safe to land and deploy ahead of the ClickHouse side.
Changes
Per-event aggregation (the main change):
- event_name becomes part of the in-memory aggregation key, the merger's emit-once seen cache, and the top-K cap, so event values aggregate and cap per (event, key) rather than per key.
- AGGREGATE_BY_EVENT_NAME controls whether the event name enters the aggregation key. On, event values fragment per event (this drives the write amplification). Off, event values carry an empty name like person/group.
- EMIT_EVENT_NAME controls whether the merger writes the event_name field on the output topic. The field is omitted when empty (serde(skip_serializing_if)), so with this off the messages to clickhouse_property_values are byte-identical to the pre-event_name format.
Splitting the two flags lets us exercise the write-amplification behavior before the ClickHouse column exists: turn AGGREGATE_BY_EVENT_NAME on with EMIT_EVENT_NAME off to produce the same fragmented record count (and throughput) the finished feature will, while the wire stays in the old format so the current ClickHouse kafka table keeps consuming it unchanged.
Supporting change:
- MAX_PROPERTY_VALUE_LEN config field (default 255), so it can be tuned without a redeploy.
Rollout order: deploy the ClickHouse PR that adds event_name to the value table's sort key (#61323), deploy this service with both flags off, optionally flip AGGREGATE_BY_EVENT_NAME alone to load-test throughput, then flip EMIT_EVENT_NAME on.
One thing to weigh: aggregating per event lowers the collapse ratio, so write throughput to ClickHouse rises (roughly 2-6x at the flush cadence depending on a team's event diversity). Lengthening FLUSH_INTERVAL_SECS is the lever if that is too much.
How did you test this code?
I am an agent (Claude Code). No manual testing. On the property-vals-rs crate: cargo fmt, cargo clippy, and cargo test (55 passed). Tests cover:
- … event_name format) and round-trips back to empty
- … event_name through from the intermediate message
Automatic notifications
Docs update
No docs change.
🤖 Agent context
Authored with Claude Code (Read/Edit/Bash) while working through property-value search performance. This branch consolidates two changes: the configurable value-length cap and the per-event aggregation feature (the latter merged in via the stacked PR #61322).
Decisions: the two flags (AGGREGATE_BY_EVENT_NAME, EMIT_EVENT_NAME) are split so the per-event aggregation grain, and its write amplification, can be exercised on production before the consuming ClickHouse column exists, which decouples the two deploys. The output field is omitted when empty so flag-off messages stay byte-identical to the old format. Person and group are intentionally left unscoped because their values are not event-specific (groups are not on the event stream at all). The aggregation key, merger seen cache, and top-K cap all key on event_name so the per-event grain is consistent end to end. The ~2-6x write amplification was measured and is called out as a rollout consideration, not a blocker, since it only bites once aggregation is on.
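To make the fragmentation concrete, here is a small sketch of how a flag-gated aggregation key changes the number of distinct records. The `AggKey` struct, `agg_key`, and `distinct_records` are hypothetical names chosen for illustration and are not taken from the crate; the sketch ignores the seen cache and the top-K cap.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical aggregation key. event_name is left empty when per-event
/// aggregation is off, mirroring how person/group values are described.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct AggKey {
    event_name: String,
    property_key: String,
}

/// Builds the key; the flag decides whether the event name enters it.
fn agg_key(event_name: &str, property_key: &str, aggregate_by_event_name: bool) -> AggKey {
    AggKey {
        event_name: if aggregate_by_event_name {
            event_name.to_string()
        } else {
            String::new()
        },
        property_key: property_key.to_string(),
    }
}

/// Counts the distinct (key, value) records a batch of
/// (event, property key, value) observations collapses to.
fn distinct_records(observations: &[(&str, &str, &str)], aggregate_by_event_name: bool) -> usize {
    let mut agg: HashMap<AggKey, HashSet<String>> = HashMap::new();
    for &(event, key, value) in observations {
        agg.entry(agg_key(event, key, aggregate_by_event_name))
            .or_default()
            .insert(value.to_string());
    }
    agg.values().map(|values| values.len()).sum()
}
```

With observations of "Chrome" on both a pageview and a click event plus "Firefox" on the click event, all under the same property key, the flag-off grain collapses to two records while the flag-on grain yields three. That lower collapse ratio is the write amplification described above, scaled down to a toy batch.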