Swap out person_distinct_id in queries with subquery #3828

Merged (5 commits) on Apr 1, 2021

@EDsCODE EDsCODE commented Mar 31, 2021

Changes

  • Instead of querying person_distinct_id directly, use a subquery so that the distinct_ids under consideration are always the latest.

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests

EDsCODE commented Mar 31, 2021

I did some manual testing across the analytics queries. The one I had to skip, because its performance degraded significantly, was the lifecycle query.

@@ -65,6 +65,13 @@
{query}
"""

GET_LATEST_PERSON_DISTINCT_ID_SQL = """
SELECT * FROM person_distinct_id JOIN (
SELECT distinct_id, max(_offset) as _offset FROM person_distinct_id WHERE team_id = %(team_id)s GROUP BY distinct_id
EDsCODE (Member, Author) commented on this line:

I'm using _offset because it has more precision than _timestamp, since _timestamp doesn't consider seconds.
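
The idea in the hunk above, keeping only the row with the largest _offset for each distinct_id, can be sketched roughly as follows. This is an illustration rather than the PR's actual query: it runs against an in-memory SQLite table instead of ClickHouse, the hunk is cut off before the join condition so the ON clause here is an assumption, and the person_id column, the sample rows, and the helper names are hypothetical.

```python
import sqlite3

# Same shape as the query in the hunk: join the table against a
# per-distinct_id max(_offset) subquery. The ON clause and the outer
# team filter are assumptions, since the hunk ends before them.
LATEST_PERSON_DISTINCT_ID_SQL = """
SELECT p.distinct_id, p.person_id, p._offset
FROM person_distinct_id AS p
JOIN (
    SELECT distinct_id, max(_offset) AS _offset
    FROM person_distinct_id
    WHERE team_id = :team_id
    GROUP BY distinct_id
) AS latest
  ON p.distinct_id = latest.distinct_id AND p._offset = latest._offset
WHERE p.team_id = :team_id
ORDER BY p.distinct_id
"""


def build_demo_db():
    """Create an in-memory table with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person_distinct_id "
        "(team_id INTEGER, distinct_id TEXT, person_id TEXT, _offset INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person_distinct_id VALUES (?, ?, ?, ?)",
        [
            (1, "anon-1", "person-a", 10),
            (1, "anon-1", "person-b", 42),  # later write for the same distinct_id
            (1, "anon-2", "person-c", 11),
            (2, "anon-1", "person-z", 99),  # belongs to another team
        ],
    )
    return conn


def latest_distinct_ids(conn, team_id):
    """Return one row per distinct_id for the team: the one with the highest _offset."""
    return conn.execute(LATEST_PERSON_DISTINCT_ID_SQL, {"team_id": team_id}).fetchall()
```

With the sample rows, team 1 gets "anon-1" mapped to "person-b" (the row at _offset 42) rather than the older "person-a" row, and the team 2 row is left out.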

@EDsCODE EDsCODE marked this pull request as ready for review April 1, 2021 16:07
@fuziontech fuziontech left a comment

This looks good.

I'm just curious whether we could test the impact of changing these queries to optionally use FINAL in the SELECT, in case that turns out to be faster.
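
One way such an opt-in could look is a small query builder that appends ClickHouse's FINAL modifier only when asked. This is only a sketch: the helper name, the use_final flag, and the selected columns are hypothetical, and whether FINAL applies at all depends on the table's engine, which this thread does not state. Whether it is faster would have to be measured.

```python
def person_distinct_id_query(use_final: bool = False) -> str:
    """Build a SELECT over person_distinct_id, optionally with FINAL.

    Hypothetical helper: in ClickHouse, FINAL goes right after the table
    name and asks merge-family engines to collapse rows at query time.
    """
    final = " FINAL" if use_final else ""
    return (
        "SELECT distinct_id, person_id "  # column list is an assumption
        f"FROM person_distinct_id{final} "
        "WHERE team_id = %(team_id)s"
    )
```

Both variants keep the same %(team_id)s placeholder, so they could be run side by side with identical parameters to compare timings.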

@EDsCODE EDsCODE merged commit 954069b into master Apr 1, 2021
@EDsCODE EDsCODE deleted the swap-distinct-id-querying branch April 1, 2021 16:56