feat(logs): allow multiple distinct_id attribute keys with COALESCE matching #61312
Draft
DanielVisca wants to merge 2 commits into
…atching

`TeamLogsConfig.logs_distinct_id_attribute_key` (singular CharField) becomes `logs_distinct_id_attribute_keys: ArrayField[CharField]`. Teams can now configure an ordered list of OTel log attributes used to identify which person a log belongs to. For each log row, the first configured key present on that row is matched against the person's distinct_ids: priority-ordered fallback via SQL `coalesce()`, no OR-group recursion in the query runner.

Motivating real-world example: Circleback's backend emits `posthog.distinct_id` (OTel dotted_snake), while the JS SDK auto-attaches `posthogDistinctId` (camelCase). With the single-key model these teams have to pick one and lose the other. With multi-key they configure both and every source surfaces on person profiles.

Changes:
- `products/logs/backend/models.py` + `migrations/0015_*`: switch to ArrayField with default `["posthogDistinctId"]`. Single forward migration mirrors `posthog/migrations/0032_team_multiple_app_urls.py` (add → backfill → remove), `elidable=True` on the RunPython.
- `posthog/api/team.py`: serializer becomes `ListField(child=CharField)`, min_length=1, max_length=10, validates non-empty/non-blank entries, dedupes preserving order, trims whitespace.
- `posthog/schema.py` + `frontend/src/types.ts`: new optional `keys: list[str]` on `LogPropertyFilter` (alternative to `key`). When set, the runner compiles to `coalesce(attributes[k1__suffix], attributes[k2__suffix], …) <op> value` via a new `_multikey_log_attribute_to_expr` helper. Single-key path is unchanged.
- `products/logs/backend/logs_query_runner.py`: detect multi-key filters in the LogAttribute branch and route them through the new helper. Single compile-site change: no nested OR-group recursion, no `property_to_expr` changes.
- `products/logs/frontend/logsConfigLogic.ts`: plural shape + `DEFAULT_LOGS_DISTINCT_ID_ATTRIBUTE_KEYS = ['posthogDistinctId']`.
- `frontend/src/scenes/settings/environment/LogsDistinctIdAttributeKey.tsx`: swap `LemonInput` for `LemonInputSelect` (`mode="multiple"`, `allowCustomValues`, `sortable`): free-form strings, drag-to-reorder, comma-paste-to-add. Save validates ≥1 entry, ≤200 chars per entry, dedupes/trims. Admin gating unchanged.
- `frontend/src/scenes/persons/PersonLogsTab.tsx`: pinned filter now carries `keys` array; scope hint renders all configured keys with commas and surfaces a prominent "Link logs to a person →" link to the settings page (replaces the previous discreet gear icon; easier to discover).

Tests:
- Serializer: 23/23 passing (parametrized across `/api/projects/` and `/api/environments/`, covers empty/blank/dedup/order/overlength).
- Runner: 3 new tests for `_multikey_log_attribute_to_expr` validate that multi-key compiles to a `coalesce(...)` Call wrapping per-key Field lookups with the `__str` type suffix, single-key skips the coalesce wrapper, and IsNot maps to NotIn. Tests bypass the parser-shadow pipeline (which has a pre-existing local divergence on unrelated HogQL) and exercise the helper directly.

Out of scope (deliberate):
- True OR-across-keys semantics (a log links if any key at any position matches a distinct_id): see plan file; requires nested OR-group support in the query runner. Rejected in favor of the smaller COALESCE change.
- Renaming `posthogDistinctId` to `posthog.distinct_id` (OTel-canonical): separate SDK + default-config change.

Generated-By: PostHog Code
Task-Id: a0cd1bfc-9b19-4b30-9123-6488d22b3f6a
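The priority-ordered fallback described in this commit message can be illustrated outside SQL. The sketch below is a plain-Python emulation, not the query runner's code: `first_present_value` and `log_links_to_person` are hypothetical names, and it assumes a log row's attributes behave like a dict in which an absent key plays the role of SQL NULL.

```python
from typing import Mapping, Optional, Sequence, Set


def first_present_value(attributes: Mapping[str, str], keys: Sequence[str]) -> Optional[str]:
    """Emulate coalesce(attributes[k1], attributes[k2], ...) over a dict.

    Assumption: a key missing from the dict stands in for SQL NULL.
    """
    for key in keys:
        value = attributes.get(key)
        if value is not None:
            return value
    return None


def log_links_to_person(
    attributes: Mapping[str, str], keys: Sequence[str], distinct_ids: Set[str]
) -> bool:
    """A log links only if the FIRST present key's value is one of the person's IDs."""
    value = first_present_value(attributes, keys)
    return value is not None and value in distinct_ids


keys = ["posthogDistinctId", "posthog.distinct_id"]
person_ids = {"user-123"}

# Only the first key is present and it matches.
print(log_links_to_person({"posthogDistinctId": "user-123"}, keys, person_ids))  # True
# First key absent, so the second key is used as the fallback.
print(log_links_to_person({"posthog.distinct_id": "user-123"}, keys, person_ids))  # True
# Both present: the first key wins even though the second would have matched.
print(
    log_links_to_person(
        {"posthogDistinctId": "anon-9", "posthog.distinct_id": "user-123"}, keys, person_ids
    )
)  # False
```

The last call shows how this differs from the out-of-scope OR-across-keys semantics, under which that row would link.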
Contributor
Size Change: 0 B
Total Size: 81.1 MB
Contributor
Migration SQL Changes
Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
Contributor
🔍 Migration Risk Analysis
We've analyzed your migrations for potential risks.
Summary: 0 Safe | 0 Needs Review | 1 Blocked
❌ Blocked: Causes locks or breaks compatibility
📚 How to Deploy These Changes Safely
RemoveField: Multi-phase column drop:
See the migration safety guide
RunPython: Use batching for large data migrations:
See the migration safety guide
Last updated: 2026-06-03 00:55 UTC (673a267)
Problem
Today `TeamLogsConfig.logs_distinct_id_attribute_key` is a single OTel log attribute key per project, and real teams emit logs from multiple services that use different shapes for the person identifier. Concrete real-world example:
- `posthogDistinctId` (camelCase).
- `posthog.distinct_id` (OTel-canonical dotted snake).
- `user.id` or a plain `distinct_id`.

With the single-key model, these teams have to pick one source to surface on person profiles and lose linkage on the others.
Changes
Adds priority-ordered fallback support: configure an ordered list of attribute keys, and for any log row, the first configured key present on that row is matched against the person's distinct IDs. Implemented as a single SQL `COALESCE(...)` per pinned filter: no nested OR-group support in the query runner, no restructuring of how pinned filters compose with user filters (see the plan-file rationale).

Backend
- `products/logs/backend/models.py` + `migrations/0015_*`: `logs_distinct_id_attribute_key: CharField` → `logs_distinct_id_attribute_keys: ArrayField(CharField(max_length=200), default=lambda: ["posthogDistinctId"])`. Migration mirrors `posthog/migrations/0032_team_multiple_app_urls.py` exactly: add → backfill (`RunPython`, `elidable=True`) → remove.
- `posthog/api/team.py`: serializer becomes `ListField(child=CharField)`, `min_length=1`, `max_length=10`. Validates non-blank entries, dedupes preserving order, trims whitespace.
- `posthog/schema.py` + `frontend/src/types.ts`: new optional `keys: list[str]` on `LogPropertyFilter` (alternative to `key`).
- `products/logs/backend/logs_query_runner.py`: new `_multikey_log_attribute_to_expr` helper compiles a `keys`-bearing filter to `coalesce(attributes[k1__suffix], attributes[k2__suffix], ...) <op> value`. Single compile-site change in the LogAttribute branch; `property_to_expr` and the nested-group walk are untouched.

Frontend
- `products/logs/frontend/logsConfigLogic.ts`: plural shape + `DEFAULT_LOGS_DISTINCT_ID_ATTRIBUTE_KEYS = ['posthogDistinctId']`.
- `frontend/src/scenes/settings/environment/LogsDistinctIdAttributeKey.tsx`: swap `LemonInput` for `LemonInputSelect` (`mode="multiple"`, `allowCustomValues`, `sortable`). Drag-to-reorder, comma-paste to add multiple at once. Save validates ≥1 entry, ≤200 chars per entry, dedupes/trims. Admin gating unchanged. Helper text explains priority order.
- `frontend/src/scenes/persons/PersonLogsTab.tsx`: pinned filter carries the `keys` array. Scope hint renders all configured keys with commas, with a prominent inline "Link logs to a person →" link to the settings page (replaces the previous discreet gear icon; easier to discover).

Tests
- `posthog/api/test/test_team_logs_config.py`: 23 parametrized tests, plural shape. Covers empty/blank/over-length/duplicate/whitespace input, order preservation, dual-route parity, env isolation.
- `products/logs/backend/test/test_logs_query_runner.py`: 3 new tests validate that `_multikey_log_attribute_to_expr` produces a `coalesce(...)` `Call` with per-key `Field` lookups (suffixed `__str`) and `In`/`NotIn` comparison ops. Tests call the helper directly to bypass a pre-existing local divergence in the HogQL parser-shadow check.

How did you test this code?
I'm an agent (PostHog Code / Opus 4.7). The user reviewed and approved the design (in particular the COALESCE-over-OR choice) via the plan file at `flickering-humming-honey.md`. Verified locally:
- `pytest posthog/api/test/test_team_logs_config.py`: 23/23 passing.
- `pytest products/logs/backend/test/test_logs_query_runner.py -k multikey`: 3/3 passing.
- `tsc --noEmit` clean on all touched frontend files.

Manual smoke test pending user verification: the parser-shadow local divergence blocks running the existing integration tests against the local stack from this worktree (pre-existing, not introduced here).
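For reviewers who want the serializer rules in one place, the behaviour the 23 serializer tests are described as covering (trim whitespace, reject blank entries, dedupe while preserving order, 1 to 10 entries, at most 200 characters each) can be sketched as a standalone function. This is an illustrative sketch, not the code in `posthog/api/team.py`: `normalize_attribute_keys` and the two constants are hypothetical names, and applying the count limit after deduplication is an assumption.

```python
from typing import List, Sequence

MAX_KEYS = 10         # assumed to mirror the serializer's max_length=10
MAX_KEY_LENGTH = 200  # assumed to mirror CharField(max_length=200)


def normalize_attribute_keys(raw_keys: Sequence[str]) -> List[str]:
    """Trim, validate, and dedupe attribute keys, keeping first-seen order."""
    seen = set()
    cleaned: List[str] = []
    for raw in raw_keys:
        key = raw.strip()
        if not key:
            raise ValueError("attribute keys must not be blank")
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"attribute key exceeds {MAX_KEY_LENGTH} characters")
        if key not in seen:  # drop later duplicates, keep the first position
            seen.add(key)
            cleaned.append(key)
    if not cleaned:
        raise ValueError("at least one attribute key is required")
    if len(cleaned) > MAX_KEYS:  # assumption: limit checked after dedup
        raise ValueError(f"at most {MAX_KEYS} attribute keys are allowed")
    return cleaned


print(normalize_attribute_keys([" posthogDistinctId ", "posthog.distinct_id", "posthogDistinctId"]))
# ['posthogDistinctId', 'posthog.distinct_id']
```

Order preservation matters here because the list order is the match priority.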
Notes for reviewers
- […] `posthog-code/logs-settings-pivot-attribute`, the admin-gated settings UI). Merging "feat(logs): admin-gated settings UI for the person pivot attribute" #61301 first will let this rebase cleanly onto master.
- A log with `posthogDistinctId = X` (not a distinct_id) AND `user.id = Y` (IS a distinct_id) does NOT link: the first configured key's value wins on each row. In practice a log emits one identifier under one key, so the case is rare. Documented as a known semantic in the public docs follow-up.

Automatic notifications
Docs update
Follow-up docs PR in `PostHog/posthog.com` will update:
- `contents/docs/logs/link-person.mdx`: the "Customizing the attribute key" section becomes "Customizing the attribute keys"; add a worked example with multiple keys and explain priority/COALESCE semantics.
- `contents/docs/logs/troubleshooting.mdx`: mention the priority-fallback edge case (a log with a non-matching first-key value won't link even if a later key would have matched; suggest reordering).

🤖 Agent context
PostHog Code (Opus 4.7). The first draft of this proposed true OR semantics with nested OR-group recursion in the query runner; the user pushed back on the complexity, hinting "AND is okay" and "any key can be the pivot". The revised plan landed on COALESCE/priority, which keeps the pinned filter as a single ANDed condition (no query-runner restructure needed). The motivating real-world example for multi-key (`posthog.distinct_id` vs `posthogDistinctId`) came from a Circleback log screenshot the user shared mid-implementation.

Created with PostHog Code