Swap out person_distinct_id in queries with subquery #3828

EDsCODE · 2021-03-31T19:11:20Z

Changes


Instead of querying person_distinct_id directly, use a subquery so that only the latest row for each distinct_id is considered.

Checklist

All querysets/queries filter by Organization, by Team, and by User
Django backend tests
Jest frontend tests
Cypress end-to-end tests


EDsCODE · 2021-03-31T19:41:51Z

I did some manual testing across the analytics queries. The one I had to skip, because of significantly degraded performance, was the lifecycle query.

EDsCODE · 2021-03-31T19:47:51Z

ee/clickhouse/sql/person.py

@@ -65,6 +65,13 @@
 {query}
 """

+GET_LATEST_PERSON_DISTINCT_ID_SQL = """
+SELECT * FROM person_distinct_id JOIN (
+    SELECT distinct_id, max(_offset) as _offset FROM person_distinct_id WHERE team_id = %(team_id)s GROUP BY distinct_id


I'm using _offset because it has more precision than _timestamp, which doesn't consider seconds.
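The diff excerpt above is cut off, so the following is only a minimal sketch of the general pattern it appears to use: group by distinct_id, take max(_offset), and join back to the table to keep the newest row per distinct_id. It runs against an in-memory SQLite database rather than ClickHouse, uses SQLite's `:team_id` placeholders instead of the `%(team_id)s` style in the diff, and assumes a `person_id` column and sample rows purely for illustration.

```python
import sqlite3

# Illustrative schema only: person_id and the sample rows are assumptions,
# and SQLite stands in for ClickHouse here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_distinct_id "
    "(distinct_id TEXT, person_id INTEGER, team_id INTEGER, _offset INTEGER)"
)
conn.executemany(
    "INSERT INTO person_distinct_id VALUES (?, ?, ?, ?)",
    [
        ("anon-1", 1, 2, 10),   # older mapping for anon-1
        ("anon-1", 7, 2, 42),   # newer mapping for anon-1 (higher _offset)
        ("user-9", 3, 2, 11),
        ("anon-1", 5, 99, 50),  # same distinct_id, different team
    ],
)

# Keep only the row with the highest _offset for each distinct_id in a team.
LATEST_SQL = """
SELECT pdi.distinct_id, pdi.person_id
FROM person_distinct_id AS pdi
JOIN (
    SELECT distinct_id, max(_offset) AS _offset
    FROM person_distinct_id
    WHERE team_id = :team_id
    GROUP BY distinct_id
) AS latest
  ON pdi.distinct_id = latest.distinct_id AND pdi._offset = latest._offset
WHERE pdi.team_id = :team_id
ORDER BY pdi.distinct_id
"""


def latest_distinct_ids(connection, team_id):
    """Return (distinct_id, person_id) pairs using only the newest row per id."""
    return connection.execute(LATEST_SQL, {"team_id": team_id}).fetchall()


rows = latest_distinct_ids(conn, 2)
print(rows)
```

In this sketch the older `anon-1` row for team 2 is ignored in favour of the one with the higher `_offset`, and the row belonging to team 99 never enters the result because both the subquery and the outer query filter by team.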

fuziontech

This looks good.

I'm just curious whether we could test the impact of changing these queries to optionally use FINAL in the select, to see if that would be faster.
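One way to make that comparison easy is to put the FINAL modifier behind a flag, so the same query can be benchmarked both ways. The helper names below are hypothetical and not part of this PR, the `person_id` column is assumed, and FINAL only applies to MergeTree-family engines that support it; whether it beats the subquery is exactly the open question.

```python
def person_distinct_id_source(use_final: bool = False) -> str:
    """Return the table reference, optionally with ClickHouse's FINAL modifier."""
    return "person_distinct_id FINAL" if use_final else "person_distinct_id"


def build_distinct_id_query(use_final: bool = False) -> str:
    """Build a query string (hypothetical helper) for benchmarking both variants."""
    return (
        "SELECT distinct_id, person_id "
        f"FROM {person_distinct_id_source(use_final)} "
        "WHERE team_id = %(team_id)s"
    )


plain_query = build_distinct_id_query()
final_query = build_distinct_id_query(use_final=True)
```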

swap out distinct_id table in queries with a subquery that will only …

dec1b12

…consider latest distinct_ids

timgl temporarily deployed to posthog-pr-3828 March 31, 2021 19:19 Inactive

wrong import

46b5d1a

EDsCODE temporarily deployed to posthog-pr-3828 March 31, 2021 19:20 Inactive

fix missin params

4f0c530

EDsCODE temporarily deployed to posthog-pr-3828 March 31, 2021 19:27 Inactive

EDsCODE requested review from timgl and fuziontech March 31, 2021 19:41


more missing params

3ffb131

EDsCODE temporarily deployed to posthog-pr-3828 March 31, 2021 20:20 Inactive

Merge branch 'master' into swap-distinct-id-querying

084ea8a

EDsCODE temporarily deployed to posthog-pr-3828 April 1, 2021 13:41 Inactive

EDsCODE marked this pull request as ready for review April 1, 2021 16:07

fuziontech approved these changes Apr 1, 2021


EDsCODE merged commit 954069b into master Apr 1, 2021

EDsCODE deleted the swap-distinct-id-querying branch April 1, 2021 16:56
