feat(property-vals): add EXCLUDED_PROPERTY_KEYS env var to skip keys at fan_out by andyzzhao · Pull Request #60160 · PostHog/posthog

andyzzhao · 2026-05-26T21:26:13Z

Problem

Identifier-shaped properties ($insert_id, $session_id, $window_id, distinct_id, etc.) amplify write throughput resulting in consumer lag because the table stores each row as a unique (team_id, property_type, property_key, property_value) tuple — so a property whose value is essentially unique per event (UUIDs, timestamps, request IDs) generates a new row for every event it appears on, with no aggregation collapse to absorb the volume.

Confirmed empirically by joining property_values row counts against autocomplete search frequency from system.query_log for team 2 over 30 days:

Property key	% of rows	Avg value count
`$insert_id`	14.49%	1.2
`$sent_at`	14.22%	3.0
`$feature_flag_request_id`	13.02%	2.5
`$time`	12.80%	1.4
`$feature_flag_evaluated_at`	9.55%	1.6

24 of the top 25 keys (by row count) have zero searches in 30 days. The top 5 alone account for ~64% of all stored rows yet receive no autocomplete traffic — they're pure write amplification.

The exceptions to leave alone: $session_id (258 searches, real autocomplete usage despite high cardinality), $current_url (high cardinality but also high real search demand), $set_once (modest searches but worth preserving).

Avg value count is avg(property_count) per row — close to 1 means each value appears ~once (every event is a new row), which is the identifier signature. Compare to $current_url=50.4 or $set_once=65.7 where each unique value gets emitted dozens of times before a new one appears (real aggregation happening).

Changes

config.rs: add EXCLUDED_PROPERTY_KEYS env var, parsed via new ExcludedPropertyKeys type into an Arc<HashSet<String>>. Empty by default.
fan_out.rs: fan_out and fan_out_group take the exclusion set as a parameter; emit_from_blob skips emission for keys in the set.
main.rs: wraps fan_out and fan_out_group in closures that capture the config's exclusion set, passes them to worker_loop.

Tunable without code deploys — change the chart values, Argo syncs, pods pick it up on restart.

Companion charts PR adds the env var to apps/property-vals-rs/values.yaml so per-environment overrides can set the list.

How did you test this code?

I am an agent. Local automated checks only:

cargo test -p property-vals-rs --lib — all 38 tests pass (34 existing + 4 new)
4 new tests verify exclusion behavior:
- excluded_event_property_keys_are_skipped — drops $insert_id / distinct_id / $session_id, only $browser survives
- excluded_person_property_keys_are_skipped — filters $session_id from person properties, leaves email and plan
- excluded_group_property_keys_are_skipped — same for group properties
- empty_exclusion_list_is_a_noop — verifies ExcludedPropertyKeys::default() and "".parse() are both no-ops (production behavior preserved when env var unset)

Publish to changelog?

no

greptile-apps · 2026-05-26T21:29:22Z

Comments Outside Diff (1)

rust/property-vals-rs/src/main.rs, line 76-84 (link)

Missing startup log for the exclusion list

The config loaded log statement records most tunable knobs but omits excluded_property_keys. Because the whole point of this feature is to silently drop certain property keys, an operator who misconfigures the list (wrong casing, typo, etc.) will see no evidence in logs. PropertyKeyList currently has no Display or Debug impl, which is what blocks adding it here. Adding impl fmt::Display for PropertyKeyList in config.rs (e.g. sorted join of self.keys) would let you include excluded_property_keys = %config.excluded_property_keys in this log call and make deployments self-documenting at startup.

Prompt To Fix With AI

This is a comment left during a code review.
Path: rust/property-vals-rs/src/main.rs
Line: 76-84

Comment:
**Missing startup log for the exclusion list**

The `config loaded` log statement records most tunable knobs but omits `excluded_property_keys`. Because the whole point of this feature is to silently drop certain property keys, an operator who misconfigures the list (wrong casing, typo, etc.) will see no evidence in logs. `PropertyKeyList` currently has no `Display` or `Debug` impl, which is what blocks adding it here. Adding `impl fmt::Display for PropertyKeyList` in `config.rs` (e.g. sorted join of `self.keys`) would let you include `excluded_property_keys = %config.excluded_property_keys` in this log call and make deployments self-documenting at startup.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
rust/property-vals-rs/src/main.rs:76-84
**Missing startup log for the exclusion list**

The `config loaded` log statement records most tunable knobs but omits `excluded_property_keys`. Because the whole point of this feature is to silently drop certain property keys, an operator who misconfigures the list (wrong casing, typo, etc.) will see no evidence in logs. `PropertyKeyList` currently has no `Display` or `Debug` impl, which is what blocks adding it here. Adding `impl fmt::Display for PropertyKeyList` in `config.rs` (e.g. sorted join of `self.keys`) would let you include `excluded_property_keys = %config.excluded_property_keys` in this log call and make deployments self-documenting at startup.

### Issue 2 of 2
rust/property-vals-rs/src/fan_out.rs:300-333
**Repeated exclusion test structure; prefer parameterised tests**

`excluded_event_property_keys_are_skipped`, `excluded_person_property_keys_are_skipped`, and `excluded_group_property_keys_are_skipped` all exercise the same logical invariant — "keys in the exclusion list are not emitted" — across the three property types. Per the team's preference for parameterised tests, a single macro or a helper that iterates over `(input_fn, exclusion_key, expected_surviving_keys)` tuples would express this OnceAndOnlyOnce and make it easier to add new property types in the future.

_{Reviews (1): Last reviewed commit: "feat(property-vals): add EXCLUDED_PROPER..." | Re-trigger Greptile}

greptile-apps · 2026-05-26T21:29:29Z

+    #[test]
+    fn excluded_event_property_keys_are_skipped() {
+        let blob = r#"{"$insert_id":"abc-123","$browser":"Chrome","distinct_id":"u1","$session_id":"s1"}"#;
+        let exclusions = excluded(&["$insert_id", "distinct_id", "$session_id"]);
+        let tuples = fan_out(&event(blob), &exclusions);
+        assert_eq!(tuples.len(), 1, "only $browser should survive exclusion");
+        assert_eq!(tuples[0].property_key, "$browser");
+        assert_eq!(tuples[0].property_value, "Chrome");
+    }
+
+    #[test]
+    fn excluded_person_property_keys_are_skipped() {
+        let ev = Event {
+            team_id: 2,
+            properties: None,
+            person_properties: Some(
+                r#"{"email":"a@b.com","$session_id":"s1","plan":"enterprise"}"#.to_string(),
+            ),
+        };
+        let tuples = fan_out(&ev, &excluded(&["$session_id"]));
+        assert_eq!(tuples.len(), 2);
+        let keys: Vec<&str> = tuples.iter().map(|t| t.property_key.as_str()).collect();
+        assert!(keys.contains(&"email"));
+        assert!(keys.contains(&"plan"));
+        assert!(!keys.contains(&"$session_id"));
+    }
+
+    #[test]
+    fn excluded_group_property_keys_are_skipped() {
+        let g = group_identify(0, r#"{"plan":"enterprise","internal_id":"g-42"}"#);
+        let tuples = fan_out_group(&g, &excluded(&["internal_id"]));
+        assert_eq!(tuples.len(), 1);
+        assert_eq!(tuples[0].property_key, "plan");
+    }


Repeated exclusion test structure; prefer parameterised tests

excluded_event_property_keys_are_skipped, excluded_person_property_keys_are_skipped, and excluded_group_property_keys_are_skipped all exercise the same logical invariant — "keys in the exclusion list are not emitted" — across the three property types. Per the team's preference for parameterised tests, a single macro or a helper that iterates over (input_fn, exclusion_key, expected_surviving_keys) tuples would express this OnceAndOnlyOnce and make it easier to add new property types in the future.

Prompt To Fix With AI

This is a comment left during a code review. Path: rust/property-vals-rs/src/fan_out.rs Line: 300-333 Comment: **Repeated exclusion test structure; prefer parameterised tests** `excluded_event_property_keys_are_skipped`, `excluded_person_property_keys_are_skipped`, and `excluded_group_property_keys_are_skipped` all exercise the same logical invariant — "keys in the exclusion list are not emitted" — across the three property types. Per the team's preference for parameterised tests, a single macro or a helper that iterates over `(input_fn, exclusion_key, expected_surviving_keys)` tuples would express this OnceAndOnlyOnce and make it easier to add new property types in the future. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

…at fan_out Property keys to filter out before they hit the aggregator are now config-driven via a comma-separated env var. Useful for identifier-shaped keys ($insert_id, $session_id, $window_id, distinct_id, etc.) that nobody type-aheads in autocomplete but dominate property_values row count. Data point from a recent property_values dump on team 2: the top 5 keys by row count ($insert_id, $sent_at, $feature_flag_request_id, $time, $feature_flag_evaluated_at) accounted for 64.6% of all rows but 0.56% of total appearances. Pure write amp, no read benefit. The new PropertyKeyList type parses the env var into an Arc<HashSet<String>>; fan_out and fan_out_group take it as a parameter and skip emission for keys in the set. Empty list is a no-op (current production behavior preserved). The set can be tuned without redeploying the binary — change the chart values, Argo syncs, pods pick it up on restart. Tests: - excluded_event_property_keys_are_skipped: drops $insert_id / distinct_id / $session_id, only $browser survives - excluded_person_property_keys_are_skipped: filters $session_id from person properties, leaves email and plan - excluded_group_property_keys_are_skipped: same for group properties - empty_exclusion_list_is_a_noop: PropertyKeyList::default() and "".parse() both produce empty sets and don't affect output Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reads more honestly — the type only ever expresses an exclusion. Unlike TeamList (reused for both allowed_teams and blocked_teams), this type has one job and the name should say so. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions

Clean, backward-compatible feature addition in a Rust service. The Arc<HashSet<String>> approach is safe for sharing across threads, the default (empty string → empty set) preserves existing behavior, and all call sites are updated consistently. The bot's comment about parameterized tests is a style preference, not a substantive concern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.

github-actions

Clean, backward-compatible Rust feature addition. Arc sharing is safe across threads, the empty-string default preserves existing behavior, and all call sites are correctly updated with new tests covering the exclusion logic.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.

github-actions

Clean, backward-compatible Rust feature addition. The Arc<HashSet<String>> approach is correct for shared-read concurrency, the empty-string default preserves existing behavior, and all call sites are consistently updated. The only inline comment (parameterized tests) is a style preference already flagged as non-substantive by the automated review.

…at fan_out (#60160) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

deployment-status-posthog · 2026-05-27T13:23:37Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-27 13:23 UTC	Run
prod-us	✅ Deployed	2026-05-27 13:36 UTC	Run
prod-eu	✅ Deployed	2026-05-27 13:44 UTC	Run

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

andyzzhao and others added 3 commits May 26, 2026 18:33

chore: drop verbose doc comment on ExcludedPropertyKeys

b5994e5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

andyzzhao requested a review from a team May 26, 2026 22:34

andyzzhao force-pushed the andy/property-vals-rs-excluded-keys branch from abb9a8d to b5994e5 Compare May 26, 2026 22:34

andyzzhao added the stamphog Request AI review from stamphog label May 26, 2026

github-actions Bot previously approved these changes May 26, 2026

View reviewed changes

chore: cargo fmt

8cee200

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions Bot previously approved these changes May 26, 2026

View reviewed changes

chore: clippy fix — use to_vec() instead of iter().copied().collect()

75e4700

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions Bot approved these changes May 26, 2026

View reviewed changes

robbie-c approved these changes May 27, 2026

View reviewed changes

andyzzhao merged commit 1efbf41 into master May 27, 2026
173 checks passed

andyzzhao deleted the andy/property-vals-rs-excluded-keys branch May 27, 2026 12:50

webjunkie pushed a commit that referenced this pull request May 27, 2026

feat(property-vals): add EXCLUDED_PROPERTY_KEYS env var to skip keys …

fefce31

…at fan_out (#60160) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(property-vals): add EXCLUDED_PROPERTY_KEYS env var to skip keys at fan_out#60160

feat(property-vals): add EXCLUDED_PROPERTY_KEYS env var to skip keys at fan_out#60160
andyzzhao merged 5 commits into
masterfrom
andy/property-vals-rs-excluded-keys

andyzzhao commented May 26, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 26, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 26, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andyzzhao commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

How did you test this code?

Publish to changelog?

Uh oh!

greptile-apps Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andyzzhao commented May 26, 2026 •

edited

Loading

greptile-apps Bot commented May 26, 2026 •

edited

Loading

deployment-status-posthog Bot commented May 27, 2026 •

edited

Loading