Skip to content

feat(property-vals): add EXCLUDED_PROPERTY_KEYS env var to skip keys at fan_out#60160

Merged
andyzzhao merged 5 commits into
masterfrom
andy/property-vals-rs-excluded-keys
May 27, 2026
Merged

feat(property-vals): add EXCLUDED_PROPERTY_KEYS env var to skip keys at fan_out#60160
andyzzhao merged 5 commits into
masterfrom
andy/property-vals-rs-excluded-keys

Conversation

@andyzzhao
Copy link
Copy Markdown
Contributor

@andyzzhao andyzzhao commented May 26, 2026

Problem

Identifier-shaped properties ($insert_id, $session_id, $window_id, distinct_id, etc.) amplify write throughput resulting in consumer lag because the table stores each row as a unique (team_id, property_type, property_key, property_value) tuple — so a property whose value is essentially unique per event (UUIDs, timestamps, request IDs) generates a new row for every event it appears on, with no aggregation collapse to absorb the volume.

Confirmed empirically by joining property_values row counts against autocomplete search frequency from system.query_log for team 2 over 30 days:

Property key % of rows Avg value count Searches (30d)
$insert_id 14.49% 1.2 0
$sent_at 14.22% 3.0 0
$feature_flag_request_id 13.02% 2.5 0
$time 12.80% 1.4 0
$feature_flag_evaluated_at 9.55% 1.6 0

24 of the top 25 keys (by row count) have zero searches in 30 days. The top 5 alone account for ~64% of all stored rows yet receive no autocomplete traffic — they're pure write amplification.

The exceptions to leave alone: $session_id (258 searches, real autocomplete usage despite high cardinality), $current_url (high cardinality but also high real search demand), $set_once (modest searches but worth preserving).

Avg value count is avg(property_count) per row — close to 1 means each value appears ~once (every event is a new row), which is the identifier signature. Compare to $current_url=50.4 or $set_once=65.7 where each unique value gets emitted dozens of times before a new one appears (real aggregation happening).

Changes

  • config.rs: add EXCLUDED_PROPERTY_KEYS env var, parsed via new ExcludedPropertyKeys type into an Arc<HashSet<String>>. Empty by default.
  • fan_out.rs: fan_out and fan_out_group take the exclusion set as a parameter; emit_from_blob skips emission for keys in the set.
  • main.rs: wraps fan_out and fan_out_group in closures that capture the config's exclusion set, passes them to worker_loop.

Tunable without code deploys — change the chart values, Argo syncs, pods pick it up on restart.

Companion charts PR adds the env var to apps/property-vals-rs/values.yaml so per-environment overrides can set the list.

How did you test this code?

I am an agent. Local automated checks only:

  • cargo test -p property-vals-rs --lib — all 38 tests pass (34 existing + 4 new)
  • 4 new tests verify exclusion behavior:
    • excluded_event_property_keys_are_skipped — drops $insert_id / distinct_id / $session_id, only $browser survives
    • excluded_person_property_keys_are_skipped — filters $session_id from person properties, leaves email and plan
    • excluded_group_property_keys_are_skipped — same for group properties
    • empty_exclusion_list_is_a_noop — verifies ExcludedPropertyKeys::default() and "".parse() are both no-ops (production behavior preserved when env var unset)

Publish to changelog?

no

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Comments Outside Diff (1)

  1. rust/property-vals-rs/src/main.rs, line 76-84 (link)

    P2 Missing startup log for the exclusion list

    The config loaded log statement records most tunable knobs but omits excluded_property_keys. Because the whole point of this feature is to silently drop certain property keys, an operator who misconfigures the list (wrong casing, typo, etc.) will see no evidence in logs. PropertyKeyList currently has no Display or Debug impl, which is what blocks adding it here. Adding impl fmt::Display for PropertyKeyList in config.rs (e.g. sorted join of self.keys) would let you include excluded_property_keys = %config.excluded_property_keys in this log call and make deployments self-documenting at startup.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: rust/property-vals-rs/src/main.rs
    Line: 76-84
    
    Comment:
    **Missing startup log for the exclusion list**
    
    The `config loaded` log statement records most tunable knobs but omits `excluded_property_keys`. Because the whole point of this feature is to silently drop certain property keys, an operator who misconfigures the list (wrong casing, typo, etc.) will see no evidence in logs. `PropertyKeyList` currently has no `Display` or `Debug` impl, which is what blocks adding it here. Adding `impl fmt::Display for PropertyKeyList` in `config.rs` (e.g. sorted join of `self.keys`) would let you include `excluded_property_keys = %config.excluded_property_keys` in this log call and make deployments self-documenting at startup.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
rust/property-vals-rs/src/main.rs:76-84
**Missing startup log for the exclusion list**

The `config loaded` log statement records most tunable knobs but omits `excluded_property_keys`. Because the whole point of this feature is to silently drop certain property keys, an operator who misconfigures the list (wrong casing, typo, etc.) will see no evidence in logs. `PropertyKeyList` currently has no `Display` or `Debug` impl, which is what blocks adding it here. Adding `impl fmt::Display for PropertyKeyList` in `config.rs` (e.g. sorted join of `self.keys`) would let you include `excluded_property_keys = %config.excluded_property_keys` in this log call and make deployments self-documenting at startup.

### Issue 2 of 2
rust/property-vals-rs/src/fan_out.rs:300-333
**Repeated exclusion test structure; prefer parameterised tests**

`excluded_event_property_keys_are_skipped`, `excluded_person_property_keys_are_skipped`, and `excluded_group_property_keys_are_skipped` all exercise the same logical invariant — "keys in the exclusion list are not emitted" — across the three property types. Per the team's preference for parameterised tests, a single macro or a helper that iterates over `(input_fn, exclusion_key, expected_surviving_keys)` tuples would express this OnceAndOnlyOnce and make it easier to add new property types in the future.

Reviews (1): Last reviewed commit: "feat(property-vals): add EXCLUDED_PROPER..." | Re-trigger Greptile

Comment on lines +300 to +333
#[test]
fn excluded_event_property_keys_are_skipped() {
let blob = r#"{"$insert_id":"abc-123","$browser":"Chrome","distinct_id":"u1","$session_id":"s1"}"#;
let exclusions = excluded(&["$insert_id", "distinct_id", "$session_id"]);
let tuples = fan_out(&event(blob), &exclusions);
assert_eq!(tuples.len(), 1, "only $browser should survive exclusion");
assert_eq!(tuples[0].property_key, "$browser");
assert_eq!(tuples[0].property_value, "Chrome");
}

#[test]
fn excluded_person_property_keys_are_skipped() {
let ev = Event {
team_id: 2,
properties: None,
person_properties: Some(
r#"{"email":"a@b.com","$session_id":"s1","plan":"enterprise"}"#.to_string(),
),
};
let tuples = fan_out(&ev, &excluded(&["$session_id"]));
assert_eq!(tuples.len(), 2);
let keys: Vec<&str> = tuples.iter().map(|t| t.property_key.as_str()).collect();
assert!(keys.contains(&"email"));
assert!(keys.contains(&"plan"));
assert!(!keys.contains(&"$session_id"));
}

#[test]
fn excluded_group_property_keys_are_skipped() {
let g = group_identify(0, r#"{"plan":"enterprise","internal_id":"g-42"}"#);
let tuples = fan_out_group(&g, &excluded(&["internal_id"]));
assert_eq!(tuples.len(), 1);
assert_eq!(tuples[0].property_key, "plan");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Repeated exclusion test structure; prefer parameterised tests

excluded_event_property_keys_are_skipped, excluded_person_property_keys_are_skipped, and excluded_group_property_keys_are_skipped all exercise the same logical invariant — "keys in the exclusion list are not emitted" — across the three property types. Per the team's preference for parameterised tests, a single macro or a helper that iterates over (input_fn, exclusion_key, expected_surviving_keys) tuples would express this OnceAndOnlyOnce and make it easier to add new property types in the future.

Prompt To Fix With AI
This is a comment left during a code review.
Path: rust/property-vals-rs/src/fan_out.rs
Line: 300-333

Comment:
**Repeated exclusion test structure; prefer parameterised tests**

`excluded_event_property_keys_are_skipped`, `excluded_person_property_keys_are_skipped`, and `excluded_group_property_keys_are_skipped` all exercise the same logical invariant — "keys in the exclusion list are not emitted" — across the three property types. Per the team's preference for parameterised tests, a single macro or a helper that iterates over `(input_fn, exclusion_key, expected_surviving_keys)` tuples would express this OnceAndOnlyOnce and make it easier to add new property types in the future.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

andyzzhao and others added 3 commits May 26, 2026 18:33
…at fan_out

Property keys to filter out before they hit the aggregator are now config-driven via a comma-separated env var. Useful for identifier-shaped keys ($insert_id, $session_id, $window_id, distinct_id, etc.) that nobody type-aheads in autocomplete but dominate property_values row count.

Data point from a recent property_values dump on team 2: the top 5 keys by row count ($insert_id, $sent_at, $feature_flag_request_id, $time, $feature_flag_evaluated_at) accounted for 64.6% of all rows but 0.56% of total appearances. Pure write amp, no read benefit.

The new PropertyKeyList type parses the env var into an Arc<HashSet<String>>; fan_out and fan_out_group take it as a parameter and skip emission for keys in the set. Empty list is a no-op (current production behavior preserved). The set can be tuned without redeploying the binary — change the chart values, Argo syncs, pods pick it up on restart.

Tests:
- excluded_event_property_keys_are_skipped: drops $insert_id / distinct_id / $session_id, only $browser survives
- excluded_person_property_keys_are_skipped: filters $session_id from person properties, leaves email and plan
- excluded_group_property_keys_are_skipped: same for group properties
- empty_exclusion_list_is_a_noop: PropertyKeyList::default() and "".parse() both produce empty sets and don't affect output

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reads more honestly — the type only ever expresses an exclusion. Unlike TeamList (reused for both allowed_teams and blocked_teams), this type has one job and the name should say so.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@andyzzhao andyzzhao requested a review from a team May 26, 2026 22:34
@andyzzhao andyzzhao force-pushed the andy/property-vals-rs-excluded-keys branch from abb9a8d to b5994e5 Compare May 26, 2026 22:34
@andyzzhao andyzzhao added the stamphog Request AI review from stamphog label May 26, 2026
github-actions[bot]
github-actions Bot previously approved these changes May 26, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, backward-compatible feature addition in a Rust service. The Arc<HashSet<String>> approach is safe for sharing across threads, the default (empty string → empty set) preserves existing behavior, and all call sites are updated consistently. The bot's comment about parameterized tests is a style preference, not a substantive concern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot dismissed their stale review May 26, 2026 22:39

New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.

github-actions[bot]
github-actions Bot previously approved these changes May 26, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, backward-compatible Rust feature addition. Arc sharing is safe across threads, the empty-string default preserves existing behavior, and all call sites are correctly updated with new tests covering the exclusion logic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot dismissed their stale review May 26, 2026 22:49

New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, backward-compatible Rust feature addition. The Arc<HashSet<String>> approach is correct for shared-read concurrency, the empty-string default preserves existing behavior, and all call sites are consistently updated. The only inline comment (parameterized tests) is a style preference already flagged as non-substantive by the automated review.

@andyzzhao andyzzhao merged commit 1efbf41 into master May 27, 2026
173 checks passed
@andyzzhao andyzzhao deleted the andy/property-vals-rs-excluded-keys branch May 27, 2026 12:50
webjunkie pushed a commit that referenced this pull request May 27, 2026
…at fan_out (#60160)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented May 27, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-05-27 13:23 UTC Run
prod-us ✅ Deployed 2026-05-27 13:36 UTC Run
prod-eu ✅ Deployed 2026-05-27 13:44 UTC Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stamphog Request AI review from stamphog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants