Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new person matieralized #1944

Merged
merged 17 commits into from Oct 22, 2020
Merged

Conversation

EDsCODE
Copy link
Member

@EDsCODE EDsCODE commented Oct 20, 2020

Changes

Please describe.

  • using aggregation to keep the person table up to date on queries. (This does what the replacingMT would do but in time)
  • splits out person properties into key value stores just like we do for events. however, a further view is used to always take the latest key/value props for a given id
  • fix person endpoint so it handles all the possible filtering correctly

If this affects the front-end, include screenshots.

Checklist

  • All querysets/queries filter by Team (if this PR affects any querysets/queries)
  • Backend tests (if this PR affects the backend)
  • Cypress E2E tests (if this PR affects the front and/or backend)

@timgl timgl temporarily deployed to posthog-ch-person-mater-bgx5vp October 20, 2020 15:07 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 08:39 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 09:41 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 09:47 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 10:03 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 10:57 Inactive
@EDsCODE EDsCODE marked this pull request as ready for review October 21, 2020 10:57
)
ARRAY JOIN array_property_keys, array_property_values
) ep
WHERE {filters}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goodbye monster query

)

MAT_PERSONS_WITH_PROPS_TABLE_SQL = """
CREATE MATERIALIZED VIEW persons_with_array_props_mv
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to manually push the persons table into this materialized view when we deploy on prod because this is a TO table so it doesn't have auto-populating capabilities with the POPULATE

@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 11:08 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 11:13 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 11:33 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 13:47 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 21, 2020 14:59 Inactive
@EDsCODE EDsCODE requested review from fuziontech and timgl and removed request for fuziontech October 21, 2020 15:14
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 22, 2020 11:50 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 22, 2020 13:53 Inactive
Comment on lines +50 to +70
def _destroy_person_tables(self):
sync_execute(DROP_PERSON_VIEW_SQL)
sync_execute(DROP_PERSON_MATERIALIZED_SQL)
sync_execute(DROP_PERSON_TABLE_SQL)
sync_execute(DROP_PERSON_DISTINCT_ID_TABLE_SQL)

sync_execute(DROP_PERSONS_PROP_UP_TO_DATE_VIEW_SQL)
sync_execute(DROP_MAT_PERSONS_PROP_TABLE_SQL)
sync_execute(DROP_MAT_PERSONS_WITH_ARRAY_PROPS_TABLE_SQL)
sync_execute(DROP_PERSONS_WITH_ARRAY_PROPS_TABLE_SQL)

def _create_person_tables(self):
sync_execute(PERSONS_TABLE_SQL)
sync_execute(PERSONS_DISTINCT_ID_TABLE_SQL)
sync_execute(PERSONS_UP_TO_DATE_MATERIALIZED_VIEW)
sync_execute(PERSONS_UP_TO_DATE_VIEW)

sync_execute(PERSONS_WITH_PROPS_TABLE_SQL)
sync_execute(MAT_PERSONS_WITH_PROPS_TABLE_SQL)
sync_execute(MAT_PERSONS_PROP_TABLE_SQL)
sync_execute(PERSONS_PROP_UP_TO_DATE_VIEW)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is scary

# queryset = queryset.filter(
# Filter(data={"properties": json.loads(request.GET["properties"])}).properties_to_Q(team_id=team.pk)
# )
print(PEOPLE_BY_TEAM_SQL.format(filters=all_filters))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🖨️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

@EDsCODE EDsCODE temporarily deployed to posthog-ch-person-mater-bgx5vp October 22, 2020 15:03 Inactive
Comment on lines 85 to 89
ENGINE = AggregatingMergeTree() ORDER BY (
id,
team_id,
updated_at
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah just realized as I was migrating this:
You want the sort order to be setup so that you can find groups easily, usually with broad to fine granularity.

For instance on this id, team_id, updated_at everything will be first sorted by id which will make the other order keys pretty much useless since id's will be unique.

The best order really depends on how you are going to query the table. In this case I would think that maybe team_id, id might be the best ordering if we are going to be picking persons out of the db by team and id. If we are looking to count the number of persons by period then we will want to do team_id, updated_at, id so that you can say for team x I want to look at people between a and b times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is aggregatingMT we have to have all non agg fields in the sort so the last one is prob best.

@fuziontech fuziontech temporarily deployed to posthog-ch-person-mater-bgx5vp October 22, 2020 20:15 Inactive
@fuziontech fuziontech merged commit 51105ac into master Oct 22, 2020
@fuziontech fuziontech deleted the ch-person-materialized-aggregate branch October 22, 2020 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants