feat(cohorts): Backwards compatibility of groups and properties #9462

Merged
merged 119 commits into from
Apr 27, 2022
119 commits
a2cfcc6
add fields to property
EDsCODE Apr 14, 2022
551fad8
add validatoin
EDsCODE Apr 14, 2022
b13f25e
fix naming'
EDsCODE Apr 14, 2022
c1a5884
fix errors
EDsCODE Apr 14, 2022
75d70aa
example
EDsCODE Apr 14, 2022
e1069b4
example implementations
EDsCODE Apr 14, 2022
2f976f8
remove none
EDsCODE Apr 14, 2022
59892a6
more typing
EDsCODE Apr 14, 2022
593c14e
change condition
EDsCODE Apr 15, 2022
693e25a
add terrible draft of lifecycle query
neilkakkar Apr 15, 2022
5e76c2c
move around date query
neilkakkar Apr 15, 2022
7383efa
one random test to satisfy
neilkakkar Apr 15, 2022
6534f00
add funnel persons subquery
neilkakkar Apr 15, 2022
2a43740
basic func
EDsCODE Apr 18, 2022
fe85668
use key as event
EDsCODE Apr 18, 2022
33a1d07
use key as event
EDsCODE Apr 18, 2022
bf4b76b
merge base branch
EDsCODE Apr 18, 2022
2dc7eca
change condition
EDsCODE Apr 18, 2022
db1c6ff
change to countif
EDsCODE Apr 18, 2022
cd44a29
add base query conditions
EDsCODE Apr 18, 2022
8dc1e59
add comments
EDsCODE Apr 18, 2022
c98cd41
condition building
EDsCODE Apr 18, 2022
f245c61
person props
EDsCODE Apr 19, 2022
10c5d4d
param cleanup
EDsCODE Apr 19, 2022
27b492e
basic test
EDsCODE Apr 19, 2022
37fe717
stub tests
EDsCODE Apr 19, 2022
f91ddba
remove unnecessary funcs
EDsCODE Apr 19, 2022
2cf01b5
merge new
EDsCODE Apr 19, 2022
fe3b652
adjust typing
EDsCODE Apr 19, 2022
fe685cd
wip
neilkakkar Apr 20, 2022
71420cf
Merge branch 'new-behavioral-filter-types' of github.com:PostHog/post…
neilkakkar Apr 20, 2022
6e36510
add filters to cohorts
neilkakkar Apr 20, 2022
09079e4
add migration, start refactoring property types
neilkakkar Apr 20, 2022
069e914
proof of concept of property refactor
neilkakkar Apr 20, 2022
968690a
add migration too
neilkakkar Apr 20, 2022
e145e86
Merge branch 'master' of github.com:PostHog/posthog into new-behavior…
neilkakkar Apr 20, 2022
02cf63a
Merge branch 'new-behavioral-filter-types' of github.com:PostHog/post…
neilkakkar Apr 20, 2022
da32c54
Merge branch 'new-behavioral-filter-types' of github.com:PostHog/post…
neilkakkar Apr 20, 2022
aa7b033
clean up property types
neilkakkar Apr 20, 2022
f30cd44
slight more clean up, add comments
neilkakkar Apr 20, 2022
990f247
removed optimizer
EDsCODE Apr 20, 2022
661bf17
add comment
EDsCODE Apr 20, 2022
681fb8e
support performed_event_regularly
rcmarron Apr 20, 2022
f69a868
Merge branch 'feat/cohort-query' of github.com:PostHog/posthog into f…
EDsCODE Apr 20, 2022
0f583ea
remove unneeded
EDsCODE Apr 20, 2022
75e9137
sql params on regular query
rcmarron Apr 20, 2022
3a26ed5
more tests
rcmarron Apr 21, 2022
e352f25
Merge branch 'master' into feat/cohort-query
EDsCODE Apr 21, 2022
0a799e1
Merge branch 'feat/cohort-query' of github.com:PostHog/posthog into f…
EDsCODE Apr 21, 2022
ca7833c
Merge branch 'master' into new-behavioral-filter-types
EDsCODE Apr 21, 2022
1612680
merge base
EDsCODE Apr 21, 2022
46e49ac
typing
EDsCODE Apr 21, 2022
ea969d7
some more clean up, enable person property push downs
neilkakkar Apr 21, 2022
a106247
add snapshots too
neilkakkar Apr 21, 2022
0e6f2c9
clean up types and validation
neilkakkar Apr 21, 2022
33a7738
fix typing
neilkakkar Apr 21, 2022
1971e12
remove unused snapshot
neilkakkar Apr 21, 2022
d528644
resolve conflicts
neilkakkar Apr 21, 2022
a9861ee
merge master
neilkakkar Apr 21, 2022
3ff1445
fix bug
neilkakkar Apr 21, 2022
c45d72b
Update ee/clickhouse/queries/cohort_query.py
EDsCODE Apr 21, 2022
27d02dd
merge master
neilkakkar Apr 21, 2022
d801974
oops
neilkakkar Apr 21, 2022
9ae5021
update snapshots
EDsCODE Apr 21, 2022
d564c60
use materialized person props
EDsCODE Apr 21, 2022
efba0ba
update fields
EDsCODE Apr 21, 2022
7c51246
add date value filter and tests
EDsCODE Apr 21, 2022
a299a3a
event sequence join
EDsCODE Apr 22, 2022
1feebc7
more join magic
EDsCODE Apr 22, 2022
e3228b9
typing
EDsCODE Apr 22, 2022
8ef465e
build fields
EDsCODE Apr 22, 2022
971219f
Merge branch 'feat/cohort-query' of github.com:PostHog/posthog into c…
neilkakkar Apr 22, 2022
d738685
Merge branch 'master' of github.com:PostHog/posthog into cohorts-new-…
neilkakkar Apr 22, 2022
1cc4f10
see which tests fail
neilkakkar Apr 22, 2022
2eb4916
resolve migrations conflict
neilkakkar Apr 22, 2022
be1cb1e
add some more tests
rcmarron Apr 22, 2022
6abf6ea
fix first_time and event_restarted query
rcmarron Apr 22, 2022
bc71d59
add negation handling
EDsCODE Apr 22, 2022
58de4b2
update snapshots
EDsCODE Apr 22, 2022
d3c6245
add tests for multiple events case
rcmarron Apr 22, 2022
6dce787
update stopped performing implementation
rcmarron Apr 22, 2022
61fd3aa
regularly period validation
rcmarron Apr 22, 2022
dac4b4e
old cohort support
EDsCODE Apr 22, 2022
ff4dd04
Merge branch 'feat/cohort-query' of github.com:PostHog/posthog into f…
EDsCODE Apr 22, 2022
9e1a837
clean up + some tests
rcmarron Apr 22, 2022
7ffc73f
fix some cohort tests issues
neilkakkar Apr 25, 2022
e0cd0b4
Merge branch 'master' of github.com:PostHog/posthog into feat/cohort-…
neilkakkar Apr 25, 2022
e2539be
Merge branch 'feat/cohort-query' of github.com:PostHog/posthog into c…
neilkakkar Apr 25, 2022
9c5a1ea
remove migrations from this pr
neilkakkar Apr 25, 2022
3ca8d4e
clean up
neilkakkar Apr 25, 2022
95ad465
support negations
neilkakkar Apr 25, 2022
1106af0
update snapshots
EDsCODE Apr 25, 2022
588fa62
Merge branch 'feat/cohort-query' of github.com:PostHog/posthog into c…
neilkakkar Apr 25, 2022
dcde45a
more tests
neilkakkar Apr 25, 2022
8c1b3d8
merge master
EDsCODE Apr 25, 2022
ac09c0a
did all tests just pass?
neilkakkar Apr 26, 2022
8c1cb1f
fix new failing tests
neilkakkar Apr 26, 2022
e495fdb
fix tests, try new format for cohort insertions
neilkakkar Apr 26, 2022
1cdae2d
inter-test dependency resolve
neilkakkar Apr 26, 2022
994c97c
gracefully handle empty cohorts
neilkakkar Apr 26, 2022
aec7f12
add type annotation
neilkakkar Apr 26, 2022
7d2dee4
address comments
neilkakkar Apr 26, 2022
97ae9d5
remove properties select in cohort removal query
neilkakkar Apr 26, 2022
4ea6ac0
address comment
neilkakkar Apr 26, 2022
0d835fa
update snapshots
EDsCODE Apr 26, 2022
a019949
add freeze time
EDsCODE Apr 26, 2022
80c2e63
fix some tests
neilkakkar Apr 27, 2022
00807bb
remove the Xs
neilkakkar Apr 27, 2022
3337d9c
more test fixes and clean up
neilkakkar Apr 27, 2022
a6f6a9c
raise on cyclic dependencies instead
neilkakkar Apr 27, 2022
3775859
fixes
neilkakkar Apr 27, 2022
1b3052f
test waters with parallel execution
neilkakkar Apr 27, 2022
b5310c1
merge master resolve conflicts
neilkakkar Apr 27, 2022
18feee6
fixes
neilkakkar Apr 27, 2022
d25a85a
update tests
neilkakkar Apr 27, 2022
68bacf2
final test
neilkakkar Apr 27, 2022
452ba39
clean up
neilkakkar Apr 27, 2022
3937274
more test fixes
neilkakkar Apr 27, 2022
0f23eb4
gahhhhh
neilkakkar Apr 27, 2022
110 changes: 82 additions & 28 deletions ee/clickhouse/models/cohort.py
@@ -1,6 +1,6 @@
 import uuid
 from datetime import datetime, timedelta
-from typing import Any, Dict, List, Optional, Tuple, Union
+from typing import Any, Dict, List, Optional, Set, Tuple, Union

 import structlog
 from dateutil import parser
@@ -38,38 +38,57 @@


 def format_person_query(
-    cohort: Cohort, index: int, *, custom_match_field: str = "person_id"
+    cohort: Cohort,
+    index: int,
+    *,
+    custom_match_field: str = "person_id",
+    cohorts_seen: Optional[Set[int]] = None,
+    using_new_query: bool = False,
 ) -> Tuple[str, Dict[str, Any]]:
-    filters = []
-    params: Dict[str, Any] = {}
-
     if cohort.is_static:
         return format_static_cohort_query(cohort.pk, index, prepend="", custom_match_field=custom_match_field)

-    or_queries = []
-    groups = cohort.groups
+    if using_new_query:
+        if not cohort.properties.values:
+            # No person can match an empty cohort
+            return "0 = 19", {}

-    if not groups:
-        # No person can match a cohort that has no match groups
-        return "0 = 19", {}
+        from ee.clickhouse.queries.cohort_query import CohortQuery

-    for group_idx, group in enumerate(groups):
-        if group.get("action_id") or group.get("event_id"):
-            entity_query, entity_params = get_entity_cohort_subquery(cohort, group, group_idx)
-            params = {**params, **entity_params}
-            filters.append(entity_query)
+        query, params = CohortQuery(
+            Filter(data={"properties": cohort.properties}), cohort.team, cohort_pk=cohort.pk, cohorts_seen=cohorts_seen
+        ).get_query()

-        elif group.get("properties"):
-            prop_query, prop_params = get_properties_cohort_subquery(cohort, group, group_idx)
-            or_queries.append(prop_query)
-            params = {**params, **prop_params}
+        return f"{custom_match_field} IN ({query})", params
+
+    else:
+        filters = []
+        params = {}
+
+        or_queries = []
+        groups = cohort.groups
+
+        if not groups:
+            # No person can match a cohort that has no match groups
+            return "0 = 19", {}
+
+        for group_idx, group in enumerate(groups):
+            if group.get("action_id") or group.get("event_id"):
+                entity_query, entity_params = get_entity_cohort_subquery(cohort, group, group_idx)
+                params = {**params, **entity_params}
+                filters.append(entity_query)
+
+            elif group.get("properties"):
+                prop_query, prop_params = get_properties_cohort_subquery(cohort, group, group_idx)
+                or_queries.append(prop_query)
+                params = {**params, **prop_params}

-    if len(or_queries) > 0:
-        query = "AND ({})".format(" OR ".join(or_queries))
-        filters.append("{} IN {}".format(custom_match_field, GET_LATEST_PERSON_ID_SQL.format(query=query)))
+        if len(or_queries) > 0:
+            query = "AND ({})".format(" OR ".join(or_queries))
+            filters.append("{} IN {}".format(custom_match_field, GET_LATEST_PERSON_ID_SQL.format(query=query)))

-    joined_filter = " OR ".join(filters)
-    return joined_filter, params
+        joined_filter = " OR ".join(filters)
+        return joined_filter, params


 def format_static_cohort_query(
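
For orientation, here is a minimal sketch of the dispatch this hunk introduces: the legacy `groups` path stays the default, and `using_new_query=True` routes through the new property-based query. `FakeCohort`, `build_new_query`, `format_person_filter`, and the placeholder SQL are hypothetical stand-ins for illustration, not PostHog code. Only the `"0 = 19"` always-false predicate and the `IN (...)` wrapping are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Always-false predicate the diff returns for cohorts nobody can match.
NO_MATCH = "0 = 19"


@dataclass
class FakeCohort:
    """Hypothetical stand-in for the Cohort model (not the real class)."""

    pk: int
    groups: List[Dict[str, Any]] = field(default_factory=list)  # legacy format
    properties: List[Dict[str, Any]] = field(default_factory=list)  # new format


def build_new_query(cohort: FakeCohort) -> Tuple[str, Dict[str, Any]]:
    # Placeholder for CohortQuery(...).get_query(); the real SQL is far richer.
    return "SELECT person_id FROM new_style_matches", {"cohort_pk": cohort.pk}


def format_person_filter(
    cohort: FakeCohort, *, match_field: str = "person_id", using_new_query: bool = False
) -> Tuple[str, Dict[str, Any]]:
    if using_new_query:
        if not cohort.properties:
            return NO_MATCH, {}
        query, params = build_new_query(cohort)
        return f"{match_field} IN ({query})", params

    # Legacy path: one placeholder clause per group, OR-ed together.
    if not cohort.groups:
        return NO_MATCH, {}
    clauses = [f"{match_field} IN (legacy_group_{i})" for i in range(len(cohort.groups))]
    return " OR ".join(clauses), {}
```

Keeping the flag off by default means existing callers keep the old behavior until they opt in, which appears to be the backwards-compatibility goal named in the PR title.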
@@ -239,8 +258,16 @@ def is_precalculated_query(cohort: Cohort) -> bool:
         return False


-def format_filter_query(cohort: Cohort, index: int = 0, id_column: str = "distinct_id") -> Tuple[str, Dict[str, Any]]:
-    person_query, params = format_cohort_subquery(cohort, index)
+def format_filter_query(
+    cohort: Cohort,
+    index: int = 0,
+    id_column: str = "distinct_id",
+    cohorts_seen: Optional[Set[int]] = None,
+    using_new_query: bool = False,
+) -> Tuple[str, Dict[str, Any]]:
+    person_query, params = format_cohort_subquery(
+        cohort, index, cohorts_seen=cohorts_seen, using_new_query=using_new_query
+    )

     person_id_query = CALCULATE_COHORT_PEOPLE_SQL.format(
         query=person_query,
@@ -250,12 +277,24 @@ def format_filter_query(cohort: Cohort, index: int = 0, id_column: str = "distin
     return person_id_query, params


-def format_cohort_subquery(cohort: Cohort, index: int, custom_match_field="person_id") -> Tuple[str, Dict[str, Any]]:
+def format_cohort_subquery(
+    cohort: Cohort,
+    index: int,
+    custom_match_field="person_id",
+    cohorts_seen: Optional[Set[int]] = None,
+    using_new_query: bool = False,
+) -> Tuple[str, Dict[str, Any]]:
     is_precalculated = is_precalculated_query(cohort)
     person_query, params = (
         format_precalculated_cohort_query(cohort.pk, index, custom_match_field=custom_match_field)
         if is_precalculated
-        else format_person_query(cohort, index, custom_match_field=custom_match_field)
+        else format_person_query(
+            cohort,
+            index,
+            custom_match_field=custom_match_field,
+            cohorts_seen=cohorts_seen,
+            using_new_query=using_new_query,
+        )
     )
     return person_query, params
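
`cohorts_seen` is threaded through `format_filter_query`, `format_cohort_subquery`, and `format_person_query` down to `CohortQuery`, whose use of it is not part of this file. One plausible reading, in line with the commit titled "raise on cyclic dependencies instead", is a visited-set guard against cohorts that reference each other. The sketch below assumes that reading; `resolve`, `CyclicCohortError`, and the dependency map are hypothetical and do not mirror the real implementation.

```python
from typing import Dict, List, Optional, Set


class CyclicCohortError(ValueError):
    """Hypothetical error raised when a cohort ends up referencing itself."""


def resolve(cohort_id: int, deps: Dict[int, List[int]], cohorts_seen: Optional[Set[int]] = None) -> List[int]:
    """Return cohort ids in visiting order, raising if the current path loops."""
    # Copy the set so sibling branches do not see each other's visits:
    # a cohort reused by two branches (a diamond) is fine, a loop is not.
    seen = set() if cohorts_seen is None else set(cohorts_seen)
    if cohort_id in seen:
        raise CyclicCohortError(f"cohort {cohort_id} depends on itself")
    seen.add(cohort_id)

    order = [cohort_id]
    for child in deps.get(cohort_id, []):
        order.extend(resolve(child, deps, seen))
    return order
```

Defaulting the parameter to `None` rather than a mutable `set()` avoids sharing state between unrelated top-level calls, which matches the `Optional[Set[int]] = None` signature in the diff.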

@@ -296,6 +335,21 @@ def insert_static_cohort(person_uuids: List[Optional[uuid.UUID]], cohort_id: int
     sync_execute(INSERT_PERSON_STATIC_COHORT, persons)


+def recalculate_cohortpeople_with_new_query(cohort: Cohort) -> Optional[int]:
+    cohort_filter, cohort_params = format_person_query(cohort, 0, custom_match_field="id", using_new_query=True)
+
+    count = sync_execute(
+        f"""
+        SELECT COUNT(1)
+        FROM person
+        WHERE {cohort_filter}
+        """,
+        {**cohort_params, "team_id": cohort.team_id, "cohort_id": cohort.pk},
+    )[0][0]
+
+    return count
+
+
 def recalculate_cohortpeople(cohort: Cohort) -> Optional[int]:
     cohort_filter, cohort_params = format_person_query(cohort, 0, custom_match_field="id")
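
A small sketch of how `recalculate_cohortpeople_with_new_query` assembles its call: the filter fragment is interpolated into the `WHERE` clause, and the parameter dict is merged so that `team_id` and `cohort_id` are set last. `count_matching_persons` and the injected `execute` callable are hypothetical, introduced here so the example runs without a database; the function in the diff calls `sync_execute` directly.

```python
from typing import Any, Callable, Dict, List, Tuple

Executor = Callable[[str, Dict[str, Any]], List[Tuple[Any, ...]]]


def count_matching_persons(
    cohort_filter: str,
    cohort_params: Dict[str, Any],
    team_id: int,
    cohort_id: int,
    execute: Executor,
) -> int:
    # The filter is a SQL fragment built by the query layer, not user input.
    sql = f"""
        SELECT COUNT(1)
        FROM person
        WHERE {cohort_filter}
    """
    # Keys listed later override same-named keys from cohort_params.
    params = {**cohort_params, "team_id": team_id, "cohort_id": cohort_id}
    # The executor returns rows of tuples; the count is the first cell.
    return execute(sql, params)[0][0]
```

Because later keys win in a dict merge, a `team_id` already present in `cohort_params` is replaced by the cohort's own team.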

164 changes: 140 additions & 24 deletions ee/clickhouse/models/test/__snapshots__/test_cohort.ambr
@@ -31,16 +31,30 @@
# name: TestCohort.test_cohortpeople_action_count.10
'

SELECT count(*)
FROM
(SELECT 1
FROM cohortpeople
WHERE team_id = 2
AND cohort_id = 2
GROUP BY person_id,
cohort_id,
team_id
HAVING sum(sign) > 0)
SELECT COUNT(1)
FROM person
WHERE id IN
(SELECT behavior_query.person_id AS id
FROM
(SELECT pdi.person_id AS person_id,
countIf(timestamp > now() - INTERVAL 3 day
AND timestamp < now()
AND ((event = '$pageview'))) = 1 AS performed_event_multiple_condition_level_level_level_0
FROM events e
INNER JOIN
(SELECT distinct_id,
argMax(person_id, version) as person_id
FROM person_distinct_id2
WHERE team_id = 2
GROUP BY distinct_id
HAVING argMax(is_deleted, version) = 0) AS pdi ON e.distinct_id = pdi.distinct_id
WHERE team_id = 2
AND event IN ['$pageview']
AND timestamp <= now()
AND timestamp >= now() - INTERVAL 3 day
GROUP BY person_id) behavior_query
WHERE 1 = 1
AND (((performed_event_multiple_condition_level_level_level_0))) )
'
---
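
The `pdi` subquery in this snapshot picks, for each `distinct_id`, the `person_id` from the row with the highest `version`, and drops the mapping if that latest row is marked deleted. A plain-Python emulation of that selection, as a sketch only: `Row` and `latest_person_ids` are hypothetical names, and ties on `version` are resolved here by keeping the first row seen, whereas `argMax` makes no such promise.

```python
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    distinct_id: str
    person_id: str
    version: int
    is_deleted: int


def latest_person_ids(rows: Iterable[Row]) -> Dict[str, str]:
    """Emulate argMax(person_id, version) ... HAVING argMax(is_deleted, version) = 0."""
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row.distinct_id)
        if current is None or row.version > current.version:
            latest[row.distinct_id] = row
    # Keep only ids whose most recent row is not a deletion marker.
    return {d: r.person_id for d, r in latest.items() if r.is_deleted == 0}
```

For example, a `distinct_id` that was first mapped to `p1` and later re-mapped to `p2` resolves to `p2`, while one whose newest row has `is_deleted = 1` disappears from the result.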
# name: TestCohort.test_cohortpeople_action_count.11
@@ -50,14 +64,7 @@
where cohort_id = 2
'
---
# name: TestCohort.test_cohortpeople_action_count.2
'
SELECT person_id
FROM cohortpeople
where cohort_id = 2
'
---
# name: TestCohort.test_cohortpeople_action_count.3
# name: TestCohort.test_cohortpeople_action_count.12
'

SELECT count(*)
@@ -72,7 +79,7 @@
HAVING sum(sign) > 0)
'
---
# name: TestCohort.test_cohortpeople_action_count.4
# name: TestCohort.test_cohortpeople_action_count.13
'

SELECT count(*)
@@ -87,14 +94,79 @@
HAVING sum(sign) > 0)
'
---
# name: TestCohort.test_cohortpeople_action_count.5
# name: TestCohort.test_cohortpeople_action_count.14
'

SELECT COUNT(1)
FROM person
WHERE id IN
(SELECT behavior_query.person_id AS id
FROM
(SELECT pdi.person_id AS person_id,
countIf(timestamp > now() - INTERVAL 3 day
AND timestamp < now()
AND ((event = '$pageview'))) > 0 AS performed_event_condition_level_level_level_0
FROM events e
INNER JOIN
(SELECT distinct_id,
argMax(person_id, version) as person_id
FROM person_distinct_id2
WHERE team_id = 2
GROUP BY distinct_id
HAVING argMax(is_deleted, version) = 0) AS pdi ON e.distinct_id = pdi.distinct_id
WHERE team_id = 2
AND event IN ['$pageview']
AND timestamp <= now()
AND timestamp >= now() - INTERVAL 3 day
GROUP BY person_id) behavior_query
WHERE 1 = 1
AND (((performed_event_condition_level_level_level_0))) )
'
---
# name: TestCohort.test_cohortpeople_action_count.15
'
SELECT person_id
FROM cohortpeople
where cohort_id = 2
'
---
# name: TestCohort.test_cohortpeople_action_count.6
# name: TestCohort.test_cohortpeople_action_count.2
'

SELECT COUNT(1)
FROM person
WHERE id IN
(SELECT behavior_query.person_id AS id
FROM
(SELECT pdi.person_id AS person_id,
countIf(timestamp > now() - INTERVAL 3 day
AND timestamp < now()
AND ((event = '$pageview'))) >= 2 AS performed_event_multiple_condition_level_level_level_0
FROM events e
INNER JOIN
(SELECT distinct_id,
argMax(person_id, version) as person_id
FROM person_distinct_id2
WHERE team_id = 2
GROUP BY distinct_id
HAVING argMax(is_deleted, version) = 0) AS pdi ON e.distinct_id = pdi.distinct_id
WHERE team_id = 2
AND event IN ['$pageview']
AND timestamp <= now()
AND timestamp >= now() - INTERVAL 3 day
GROUP BY person_id) behavior_query
WHERE 1 = 1
AND (((performed_event_multiple_condition_level_level_level_0))) )
'
---
# name: TestCohort.test_cohortpeople_action_count.3
'
SELECT person_id
FROM cohortpeople
where cohort_id = 2
'
---
# name: TestCohort.test_cohortpeople_action_count.4
'

SELECT count(*)
@@ -109,7 +181,7 @@
HAVING sum(sign) > 0)
'
---
# name: TestCohort.test_cohortpeople_action_count.7
# name: TestCohort.test_cohortpeople_action_count.5
'

SELECT count(*)
@@ -124,13 +196,57 @@
HAVING sum(sign) > 0)
'
---
# name: TestCohort.test_cohortpeople_action_count.8
# name: TestCohort.test_cohortpeople_action_count.6
'

SELECT COUNT(1)
FROM person
WHERE id IN
(SELECT behavior_query.person_id AS id
FROM
(SELECT pdi.person_id AS person_id,
countIf(timestamp > now() - INTERVAL 3 day
AND timestamp < now()
AND ((event = '$pageview'))) <= 1 AS performed_event_multiple_condition_level_level_level_0
FROM events e
INNER JOIN
(SELECT distinct_id,
argMax(person_id, version) as person_id
FROM person_distinct_id2
WHERE team_id = 2
GROUP BY distinct_id
HAVING argMax(is_deleted, version) = 0) AS pdi ON e.distinct_id = pdi.distinct_id
WHERE team_id = 2
AND event IN ['$pageview']
AND timestamp <= now()
AND timestamp >= now() - INTERVAL 3 day
GROUP BY person_id) behavior_query
WHERE 1 = 1
AND (((performed_event_multiple_condition_level_level_level_0))) )
'
---
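
Across these snapshots the `countIf(...)` aggregate is compared with `= 1`, `>= 2`, and `<= 1` in the `performed_event_multiple` conditions, and with `> 0` in the `performed_event` condition. A hedged sketch of how such a column could be rendered with bound parameters follows; the operator names `eq`, `gte`, `lte`, the function `count_condition`, and the parameter naming scheme are assumptions for illustration, not taken from `CohortQuery`.

```python
from typing import Any, Dict, Tuple

# Hypothetical operator names mapped to the comparators seen in the snapshots.
COMPARATORS = {"eq": "=", "gte": ">=", "lte": "<="}


def count_condition(event: str, days: int, operator: str, count: int, alias: str) -> Tuple[str, Dict[str, Any]]:
    """Render a countIf(...) column plus the parameters it binds.

    `alias` is interpolated into the SQL, so it must be a trusted identifier.
    """
    if operator not in COMPARATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    sql = (
        f"countIf(timestamp > now() - INTERVAL %({alias}_days)s day "
        f"AND timestamp < now() "
        f"AND event = %({alias}_event)s) "
        f"{COMPARATORS[operator]} %({alias}_count)s AS {alias}"
    )
    params = {f"{alias}_days": days, f"{alias}_event": event, f"{alias}_count": count}
    return sql, params
```

Rejecting unknown operators up front keeps arbitrary strings out of the generated SQL; values travel as parameters rather than being formatted into the query text.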
# name: TestCohort.test_cohortpeople_action_count.7
'
SELECT person_id
FROM cohortpeople
where cohort_id = 2
'
---
# name: TestCohort.test_cohortpeople_action_count.8
'

SELECT count(*)
FROM
(SELECT 1
FROM cohortpeople
WHERE team_id = 2
AND cohort_id = 2
GROUP BY person_id,
cohort_id,
team_id
HAVING sum(sign) > 0)
'
---
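
The membership count above groups `cohortpeople` rows and keeps groups where `sum(sign) > 0`. The `sign` column suggests insert/cancel row pairs in the style of ClickHouse's CollapsingMergeTree family; that reading is an assumption, since the table definition is not part of this diff. Under that assumption, a plain-Python emulation for rows already filtered to one team and cohort might look like this (`current_members` is a hypothetical name):

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def current_members(rows: Iterable[Tuple[str, int]]) -> Set[str]:
    """Return person ids whose signs sum to a positive number.

    Each row is (person_id, sign), where sign is +1 for an insertion
    and -1 for a row cancelling an earlier insertion.
    """
    totals: Dict[str, int] = defaultdict(int)
    for person_id, sign in rows:
        totals[person_id] += sign
    return {person_id for person_id, total in totals.items() if total > 0}
```

Summing signs at query time gives the right answer even before background merges have physically collapsed the cancelling pairs, which is why the aggregate appears in the query instead of a plain `count(*)`.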
# name: TestCohort.test_cohortpeople_action_count.9
'

@@ -163,4 +279,4 @@
WHERE cohort_id = %(_cohort_id_0)s
AND team_id = %(team_id)s)
'
---
---