Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cohorts): Backwards compatibility of groups and properties #9462

Merged
merged 119 commits into from
Apr 27, 2022

Conversation

neilkakkar
Copy link
Collaborator

Two things in this PR:

  1. Proof of concept of properties refactor. Instead of having a new property type for each kind of behavioural filter, I want to incorporate these into one. The reason is that earlier, everywhere in the app that wanted to discard a property type simply had to do something like: prop.type != 'person'. Now, this basically has to be prop.type not in (all-possible-behavioural-filters-list), which is a bit annoying and hard to maintain. Thus, changing this to prop.type != 'behavioural', which will always encompass all behavioural types we add.

  2. Retire groups for cohorts, use property groups instead. This needs to interlink with the new query to be complete, but the current slice can go in independently. Basically, groups will keep working as before, but our backend should start using cohort.properties for all computations. We can test that everything is okay when removing cohort.groups doesn't break anything (however, we won't remove it actually, to support migration of frontend)

Changes

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

How did you test this code?

Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions to address. Otherwise lgtm

@neilkakkar
Copy link
Collaborator Author

Suspiciously flakey, this test: ee/clickhouse/models/test/test_cohort.py::TestCohort::test_cohortpeople_prop_changed -- https://github.com/PostHog/posthog/runs/6179098641?check_suite_focus=true

I thought it was an inter-dependency issue, but it failed again once on newer CH. Hmmm, want to get to the bottom of this before I merge this PR

Copy link
Contributor

@rcmarron rcmarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! A few thoughts in the comments

time_value=group.get("days"),
operator=group.get("count_operator"),
operator_value=group.get("count"),
negation=group.get("count") == 0 and group.get("count_operator") in ["lte", "eq"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we aren't fully backward compatible, we should think about how we should fail here. Right now, there's validation in the query class that raises on the following cases:

  • The negation field is only if it's in an AND group and not the first item
  • The "count" value is never allowed to be 0. This applies to all operators (=, >=, and <=)

Do we want to make a best effort at recreating the old groups with properties and let the query fail and front-end validation show the error? If so, I'd lean toward just setting the operators + values and not touching the negation, but if we want to touch the negation, then do we also need to reverse the operators (e.g. =0 -> !performed_event)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, sure thing, I won't touch negation here then, since there's no way, given new constraints, it will work for any existing old group, as far as I can tell.

Hmm, does raise this question for me, that why do we have these new restrictions on negations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is that it allows users to do confusing things that they probably don't intend - and the result is a giant cohort (which isn't great).

For example, if you say "users who never performed an insight analyzed" on our account, you're going to get a result that includes every user who ever visited the marketing page. While this is technically correct, it probably isn't what the user wants.

The user can still get to the same result by saying "users did $pageview AND who never performed an insight analyzed", but they have to take the one further step showing they understand what they're asking for.

posthog/models/cohort.py Show resolved Hide resolved
posthog/models/cohort.py Show resolved Hide resolved
posthog/models/property.py Show resolved Hide resolved
posthog/test/test_cohort_model.py Show resolved Hide resolved
@@ -34,14 +34,16 @@ def test_insert_by_distinct_id_or_email(self):
@pytest.mark.ee
def test_calculating_cohort_clickhouse(self):
person1 = Person.objects.create(
distinct_ids=["person1"], team_id=self.team.pk, properties={"$some_prop": "something"}
distinct_ids=["person1"], team_id=self.team.pk, properties={"$some_propX": "something"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 What's going on here? Does the test pass if the prop is $some_prop?

posthog/test/test_feature_flag.py Outdated Show resolved Hide resolved
ee/clickhouse/queries/cohort_query.py Outdated Show resolved Hide resolved
@EDsCODE
Copy link
Member

EDsCODE commented Apr 26, 2022

Suspiciously flakey, this test: ee/clickhouse/models/test/test_cohort.py::TestCohort::test_cohortpeople_prop_changed -- https://github.com/PostHog/posthog/runs/6179098641?check_suite_focus=true

I thought it was an inter-dependency issue, but it failed again once on newer CH. Hmmm, want to get to the bottom of this before I merge this PR

  • I added a freezetime clause using relative time to try to address this because that was the main difference with the test on master. It was explicitly a day apart before.
  • I also noticed something that could be noteworthy. If you look at the generated query before these changes and after, we're missing a few joins that might have been helping ensure the result is correct. For example, previously, the spaghetti of JOINs included distinct_id2 joins and a "person_max" join.

I need to think more on this as I'm not quite sure how either point above can cause the flakiness yet

@rcmarron rcmarron self-requested a review April 27, 2022 19:59
@rcmarron rcmarron dismissed their stale review April 27, 2022 19:59

Addressed

@EDsCODE EDsCODE merged commit e531d0d into master Apr 27, 2022
@EDsCODE EDsCODE deleted the cohorts-new-model branch April 27, 2022 20:29
fuziontech added a commit that referenced this pull request Apr 28, 2022
* master: (137 commits)
  feat(cohorts): add cohort filter grammars (#9540)
  feat(cohorts): Backwards compatibility of groups and properties (#9462)
  perf(ingestion): unsubscribe from buffer topic while no events are produced to it (#9556)
  fix: Fix `Loading` positioning and `LemonButton` disabled state (#9554)
  test: Speed up backend tests (#9289)
  fix: LemonSpacer -> LemonDivider (#9549)
  feat(funnels): Highlight significant deviations in new funnel viz (#9536)
  docs(storybook): Lemon UI (#9426)
  feat: add support for list of teams to enable the conversion buffer for (#9542)
  chore(onboarding): cleanup framework grid experiment (#9527)
  fix(signup): domain provisioning on cloud (#9515)
  chore: split out async migrations ci (#9539)
  feat(ingestion): enable json ingestion for self-hosted by default (#9448)
  feat(cohort): add all cohort filter selectors to Storybook (#9492)
  feat(ingestion): conversion events buffer consumer (#9432)
  ci(run-backend-tests): remove CH version default (#9532)
  feat: Add person info to events (#9404)
  feat(ingestion): produce to buffer partitioned by team_id:distinct_id (#9518)
  fix: bring latest_migrations.manifest up to date (#9525)
  chore: removes unused feature flag (#9529)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants