-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new person matieralized #1944
Merged
Merged
Changes from all commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
05fc249
add new table migrations and change table names
EDsCODE eb710eb
include necessaray config for new tables in tests
EDsCODE 3937a1c
fix tests and table
EDsCODE 805e345
fix table name param
EDsCODE f93f0ce
add populate clause
EDsCODE c63a2a8
added table for key value person props
EDsCODE f5a86fe
adjust person filtering to use new table
EDsCODE 7d75527
.
EDsCODE 8d191ee
merge master
EDsCODE 8115b14
add ordering on updated_at
EDsCODE 5481cc5
add back all the condition handling on persons filtering endpoint
EDsCODE 19019b2
fix typgin
EDsCODE d3e283e
merge master
EDsCODE f367380
Merge branch 'ch-person-materialized-aggregate' of https://github.com…
EDsCODE 6b12caf
Merge branch 'master' into ch-person-materialized-aggregate
EDsCODE a4d8ca7
remove print
EDsCODE 5afbefb
re-order sort key for persons_up_to_date
fuziontech File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
from infi.clickhouse_orm import migrations # type: ignore | ||
|
||
from ee.clickhouse.sql.person import ( | ||
MAT_PERSONS_PROP_TABLE_SQL, | ||
MAT_PERSONS_WITH_PROPS_TABLE_SQL, | ||
PERSONS_PROP_UP_TO_DATE_VIEW, | ||
PERSONS_UP_TO_DATE_MATERIALIZED_VIEW, | ||
PERSONS_UP_TO_DATE_VIEW, | ||
PERSONS_WITH_PROPS_TABLE_SQL, | ||
) | ||
|
||
operations = [ | ||
migrations.RunSQL(PERSONS_UP_TO_DATE_MATERIALIZED_VIEW), | ||
migrations.RunSQL(PERSONS_UP_TO_DATE_VIEW), | ||
migrations.RunSQL(PERSONS_WITH_PROPS_TABLE_SQL), | ||
migrations.RunSQL(MAT_PERSONS_WITH_PROPS_TABLE_SQL), | ||
migrations.RunSQL(MAT_PERSONS_PROP_TABLE_SQL), | ||
migrations.RunSQL(PERSONS_PROP_UP_TO_DATE_VIEW), | ||
] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,30 @@ | |
DROP TABLE person_distinct_id | ||
""" | ||
|
||
DROP_PERSON_MATERIALIZED_SQL = """ | ||
DROP TABLE persons_up_to_date | ||
""" | ||
|
||
DROP_PERSON_VIEW_SQL = """ | ||
DROP VIEW persons_up_to_date_view | ||
""" | ||
|
||
DROP_PERSONS_WITH_ARRAY_PROPS_TABLE_SQL = """ | ||
DROP TABLE persons_with_array_props_view | ||
""" | ||
|
||
DROP_MAT_PERSONS_WITH_ARRAY_PROPS_TABLE_SQL = """ | ||
DROP TABLE persons_with_array_props_mv | ||
""" | ||
|
||
DROP_MAT_PERSONS_PROP_TABLE_SQL = """ | ||
DROP TABLE persons_properties_view | ||
""" | ||
|
||
DROP_PERSONS_PROP_UP_TO_DATE_VIEW_SQL = """ | ||
DROP VIEW persons_properties_up_to_date_view | ||
""" | ||
|
||
PERSONS_TABLE = "person" | ||
|
||
PERSONS_TABLE_BASE_SQL = """ | ||
|
@@ -37,7 +61,7 @@ | |
) | ||
|
||
KAFKA_PERSONS_TABLE_SQL = PERSONS_TABLE_BASE_SQL.format( | ||
table_name="kafka_" + PERSONS_TABLE, engine=kafka_engine(KAFKA_PERSON), extra_fields="" | ||
table_name="kafka_" + PERSONS_TABLE, engine=kafka_engine(KAFKA_PERSON), extra_fields="", | ||
) | ||
|
||
PERSONS_TABLE_MV_SQL = """ | ||
|
@@ -56,9 +80,101 @@ | |
table_name=PERSONS_TABLE | ||
) | ||
|
||
PERSONS_UP_TO_DATE_MATERIALIZED_VIEW = """ | ||
CREATE MATERIALIZED VIEW persons_up_to_date | ||
ENGINE = AggregatingMergeTree() ORDER BY ( | ||
team_id, | ||
updated_at, | ||
id | ||
) | ||
POPULATE | ||
AS SELECT | ||
id, | ||
argMaxState(team_id, created_at) team_id, | ||
argMaxState(is_identified, created_at) is_identified, | ||
argMaxState(properties, created_at) properties, | ||
minState(created_at) created_at_, | ||
maxState(created_at) updated_at | ||
FROM {table_name} | ||
GROUP BY id | ||
""".format( | ||
table_name=PERSONS_TABLE | ||
) | ||
|
||
PERSONS_UP_TO_DATE_VIEW = """ | ||
CREATE VIEW persons_up_to_date_view | ||
AS | ||
SELECT | ||
id, | ||
minMerge(created_at_) as created_at, | ||
argMaxMerge(team_id) as team_id, | ||
argMaxMerge(properties) as properties, | ||
argMaxMerge(is_identified) as is_identified, | ||
maxMerge(updated_at) as updated_at | ||
FROM persons_up_to_date | ||
GROUP BY id | ||
""" | ||
|
||
PERSONS_WITH_PROPS_TABLE_SQL = """ | ||
CREATE TABLE persons_with_array_props_view | ||
( | ||
id UUID, | ||
created_at DateTime64, | ||
team_id Int64, | ||
properties VARCHAR, | ||
is_identified Boolean, | ||
array_property_keys Array(VARCHAR), | ||
array_property_values Array(VARCHAR), | ||
_timestamp UInt64, | ||
_offset UInt64 | ||
) ENGINE = {engine} | ||
PARTITION BY toYYYYMM(created_at) | ||
ORDER BY (team_id, toDate(created_at), id) | ||
SAMPLE BY id | ||
{storage_policy} | ||
""".format( | ||
engine=table_engine("persons_with_array_props_view", "_timestamp"), storage_policy=STORAGE_POLICY | ||
) | ||
|
||
MAT_PERSONS_WITH_PROPS_TABLE_SQL = """ | ||
CREATE MATERIALIZED VIEW persons_with_array_props_mv | ||
TO persons_with_array_props_view | ||
AS SELECT | ||
id, | ||
created_at, | ||
team_id, | ||
properties, | ||
is_identified, | ||
arrayMap(k -> toString(k.1), JSONExtractKeysAndValuesRaw(properties)) array_property_keys, | ||
arrayMap(k -> toString(k.2), JSONExtractKeysAndValuesRaw(properties)) array_property_values, | ||
_timestamp, | ||
_offset | ||
FROM person | ||
""" | ||
|
||
MAT_PERSONS_PROP_TABLE_SQL = """ | ||
CREATE MATERIALIZED VIEW persons_properties_view | ||
ENGINE = MergeTree() | ||
ORDER BY (team_id, key, value, id) | ||
AS SELECT | ||
id, | ||
team_id, | ||
array_property_keys as key, | ||
array_property_values as value, | ||
created_at | ||
from persons_with_array_props_view | ||
ARRAY JOIN array_property_keys, array_property_values | ||
""" | ||
|
||
PERSONS_PROP_UP_TO_DATE_VIEW = """ | ||
CREATE VIEW persons_properties_up_to_date_view | ||
AS | ||
SELECT * FROM persons_properties_view WHERE (id, created_at) IN (SELECT id, maxMerge(updated_at) as latest FROM persons_up_to_date GROUP BY id) | ||
""" | ||
|
||
|
||
GET_PERSON_SQL = """ | ||
SELECT * FROM person WHERE team_id = %(team_id)s | ||
SELECT * FROM persons_up_to_date_view WHERE team_id = %(team_id)s | ||
""" | ||
|
||
PERSONS_DISTINCT_ID_TABLE = "person_distinct_id" | ||
|
@@ -87,7 +203,7 @@ | |
) | ||
|
||
KAFKA_PERSONS_DISTINCT_ID_TABLE_SQL = PERSONS_DISTINCT_ID_TABLE_BASE_SQL.format( | ||
table_name="kafka_" + PERSONS_DISTINCT_ID_TABLE, engine=kafka_engine(KAFKA_PERSON_UNIQUE_ID), extra_fields="" | ||
table_name="kafka_" + PERSONS_DISTINCT_ID_TABLE, engine=kafka_engine(KAFKA_PERSON_UNIQUE_ID), extra_fields="", | ||
) | ||
|
||
PERSONS_DISTINCT_ID_TABLE_MV_SQL = """ | ||
|
@@ -114,7 +230,7 @@ | |
""" | ||
|
||
GET_PERSON_BY_DISTINCT_ID = """ | ||
SELECT p.* FROM person as p inner join person_distinct_id as pid on p.id = pid.person_id where team_id = %(team_id)s AND distinct_id = %(distinct_id)s | ||
SELECT p.* FROM persons_up_to_date_view as p inner join person_distinct_id as pid on p.id = pid.person_id where team_id = %(team_id)s AND distinct_id = %(distinct_id)s | ||
""" | ||
|
||
GET_PERSONS_BY_DISTINCT_IDS = """ | ||
|
@@ -168,6 +284,10 @@ | |
ALTER TABLE person DELETE where id = %(id)s | ||
""" | ||
|
||
DELETE_PERSON_MATERIALIZED_BY_ID = """ | ||
ALTER TABLE persons_up_to_date DELETE where id = %(id)s | ||
""" | ||
|
||
DELETE_PERSON_DISTINCT_ID_BY_PERSON_ID = """ | ||
ALTER TABLE person_distinct_id DELETE where person_id = %(id)s | ||
""" | ||
|
@@ -181,61 +301,41 @@ | |
""" | ||
|
||
PEOPLE_THROUGH_DISTINCT_SQL = """ | ||
SELECT id, created_at, team_id, properties, is_identified, groupArray(distinct_id) FROM person INNER JOIN ( | ||
SELECT id, created_at, team_id, properties, is_identified, groupArray(distinct_id) FROM persons_up_to_date_view INNER JOIN ( | ||
SELECT DISTINCT person_id, distinct_id FROM person_distinct_id WHERE distinct_id IN ({content_sql}) | ||
) as pdi ON person.id = pdi.person_id GROUP BY id, created_at, team_id, properties, is_identified | ||
) as pdi ON persons_up_to_date_view.id = pdi.person_id GROUP BY id, created_at, team_id, properties, is_identified | ||
LIMIT 200 OFFSET %(offset)s | ||
""" | ||
|
||
PEOPLE_SQL = """ | ||
SELECT id, created_at, team_id, properties, is_identified, groupArray(distinct_id) FROM person INNER JOIN ( | ||
SELECT id, created_at, team_id, properties, is_identified, groupArray(distinct_id) FROM persons_up_to_date_view INNER JOIN ( | ||
SELECT DISTINCT person_id, distinct_id FROM person_distinct_id WHERE person_id IN ({content_sql}) | ||
) as pdi ON person.id = pdi.person_id GROUP BY id, created_at, team_id, properties, is_identified | ||
) as pdi ON persons_up_to_date_view.id = pdi.person_id GROUP BY id, created_at, team_id, properties, is_identified | ||
LIMIT 200 OFFSET %(offset)s | ||
""" | ||
|
||
PEOPLE_BY_TEAM_SQL = """ | ||
SELECT id, created_at, team_id, properties, is_identified, groupArray(distinct_id) FROM person INNER JOIN ( | ||
SELECT id, created_at, team_id, properties, is_identified, groupArray(distinct_id) FROM persons_up_to_date_view INNER JOIN ( | ||
SELECT DISTINCT person_id, distinct_id FROM person_distinct_id WHERE team_id = %(team_id)s | ||
) as pdi ON person.id = pdi.person_id | ||
) as pdi ON persons_up_to_date_view.id = pdi.person_id | ||
WHERE team_id = %(team_id)s {filters} | ||
GROUP BY id, created_at, team_id, properties, is_identified | ||
LIMIT 100 OFFSET %(offset)s | ||
""" | ||
|
||
GET_PERSON_TOP_PROPERTIES = """ | ||
SELECT key, count(1) as count FROM ( | ||
SELECT | ||
array_property_keys as key, | ||
array_property_values as value | ||
from ( | ||
SELECT | ||
arrayMap(k -> toString(k.1), JSONExtractKeysAndValuesRaw(properties)) AS array_property_keys, | ||
arrayMap(k -> toString(k.2), JSONExtractKeysAndValuesRaw(properties)) AS array_property_values | ||
FROM person WHERE team_id = %(team_id)s | ||
) | ||
ARRAY JOIN array_property_keys, array_property_values | ||
) GROUP BY key ORDER BY count DESC LIMIT %(limit)s | ||
SELECT key, count(1) as count FROM | ||
persons_properties_up_to_date_view | ||
WHERE team_id = %(team_id)s | ||
GROUP BY key ORDER BY count DESC LIMIT %(limit)s | ||
""" | ||
|
||
|
||
GET_DISTINCT_IDS_BY_PROPERTY_SQL = """ | ||
SELECT distinct_id FROM person_distinct_id WHERE person_id {negation}IN | ||
( | ||
SELECT id FROM ( | ||
SELECT | ||
id, | ||
array_property_keys as key, | ||
array_property_values as value | ||
from ( | ||
SELECT | ||
id, | ||
arrayMap(k -> toString(k.1), JSONExtractKeysAndValuesRaw(properties)) AS array_property_keys, | ||
arrayMap(k -> toString(k.2), JSONExtractKeysAndValuesRaw(properties)) AS array_property_values | ||
FROM person WHERE team_id = %(team_id)s | ||
) | ||
ARRAY JOIN array_property_keys, array_property_values | ||
) ep | ||
WHERE {filters} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Goodbye monster query |
||
) AND team_id = %(team_id)s | ||
SELECT id | ||
FROM persons_properties_up_to_date_view AS ep | ||
WHERE {filters} AND team_id = %(team_id)s | ||
) | ||
""" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to manually push the persons table into this materialized view when we deploy on prod because this is a
TO
table so it doesn't have auto-populating capabilities with thePOPULATE