Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #169 break down by cohort #690

Merged
merged 18 commits into from
May 11, 2020
Merged

Closes #169 break down by cohort #690

merged 18 commits into from
May 11, 2020

Conversation

timgl
Copy link
Collaborator

@timgl timgl commented Apr 29, 2020

Changes

  • Add multiple selection of cohorts in breakdown
  • Add support in API

Todo:

  • Precalculate people ids (or distinct_ids) of cohort

Checklist

  • All querysets/queries filter by Team (if applicable)
  • Backend tests (if applicable)

@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb April 29, 2020 21:01 Inactive
@timgl
Copy link
Collaborator Author

timgl commented Apr 29, 2020

@EDsCODE @mariusandra I think we need to pre-calculate which people are in cohorts (every 30 mins or so maybe?) to make breakdown by cohorts perform acceptably. I’m torn between storing person_id or distinct_id directly. Advantage of distinct_id is that it will save another query on every loop of events (which could be millions) so should make the /trends page a bit faster. Person_id just feels a bit more logical, especially in person.py for example. Thoughts?

Edit: another thought, maybe we should store person_id against the event? That would definitely speed funnels/paths etc up as you don't have to connect distinct_ids with the person first. Then we could precalculate person_ids here too.

@mariusandra
Copy link
Collaborator

@timgl tricky situation. I'm not sure if storing person_id directly with events (and going through the trouble of updating them with alias events) will speed things up that much.

Regarding cohorts, yeah, finding people who belong in a cohort is a massive query right now. Denormalising this is definitely something to consider strongly. 30 minutes sounds rather slow though, is there any way to have this near-real-time, for example in a background job that runs when any (or last) action that makes up a cohort is run?

Is it actually also possible that people could be removed from cohorts? Only with the passage of time or the cohort being changed I guess?
image

@EDsCODE
Copy link
Member

EDsCODE commented Apr 30, 2020

Going off both of your points, I agree we should have calculated table of the cohorts where it's just a table of people indexed by cohort. I'm thinking instead of doing a cron job type system where it calculates at preset intervals, to achieve close to real time, we could have that anytime an action happens, a job is dispatched (maybe using the worker system that we have and only dispatch if some cohort is using it as Marius said) and the job will go determine if the user needs to be added into the cohort related to the action. If yes, then it would be a simple addition of a row into the above-mentioned table.

Drawback is that it adds a fair bit of complexity and there will be a lot of unnecessary checking since a job is dispatched every single time an event happens so. If we could figure out how to reduce that it could work (can also batch the processing). Also, if cohort properties are changed, we would trigger a full recalculation. The benefit is that as long as the worker queue doesn't get stuck or backed up, retrieving cohort people should be really simple with the table and almost always up to date

@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb April 30, 2020 11:11 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb April 30, 2020 11:16 Inactive
@timgl timgl mentioned this pull request Apr 30, 2020
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb April 30, 2020 11:21 Inactive
@timgl
Copy link
Collaborator Author

timgl commented Apr 30, 2020

Thanks both, I've moved this discussion to #696. I suggest we split out the precalculating from this PR as it's already chunky. That means this PR is ready for review :)

@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb April 30, 2020 22:05 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 1, 2020 09:40 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 4, 2020 10:46 Inactive
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic for breaking down looks good. Missing logic for returning people and a few NIT comments below.

UX here could be better if the dropdown doesn't auto open onclick.
Screen Shot 2020-05-04 at 1 34 44 PM

frontend/src/scenes/trends/BreakdownFilter.js Outdated Show resolved Hide resolved
frontend/src/scenes/trends/BreakdownFilter.js Outdated Show resolved Hide resolved
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 5, 2020 14:08 Inactive
@EDsCODE EDsCODE self-requested a review May 6, 2020 15:19
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One bug found below.

I was testing cohorts that were created by filter but something thing I found which may have to do with me not fully understanding how the breakdown is supposed to work, but if I create a cohort of users with action $pageview and I attempt to breakdown by that cohort with entity $pageviews selected I get nothing. Seems like it should have something to show because all the users within the cohort have done a $pageview
Screen Shot 2020-05-06 at 1 33 09 PM

posthog/api/action.py Outdated Show resolved Hide resolved
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 7, 2020 17:58 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 7, 2020 20:54 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 7, 2020 21:09 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 7, 2020 21:24 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 8, 2020 09:49 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 8, 2020 10:11 Inactive
@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 8, 2020 20:13 Inactive
@timgl
Copy link
Collaborator Author

timgl commented May 11, 2020

@EDsCODE Should be ready for re-QA

Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool! QA'd and changes since last review look good. a few merge conflicts that should be trivial

@timgl timgl temporarily deployed to posthog-169-breakdown-c-8wizhb May 11, 2020 20:53 Inactive
@timgl
Copy link
Collaborator Author

timgl commented May 11, 2020

Re-tested that migration just to be sure, looks good. Thanks for QA in!

@timgl timgl merged commit 8e6b4f5 into master May 11, 2020
@timgl timgl deleted the 169-breakdown-cohorts branch May 11, 2020 21:06
fuziontech added a commit to fuziontech/posthog that referenced this pull request May 11, 2020
* upstream/master:
  Closes PostHog#169 break down by cohort (PostHog#690)
  703 multiple dashboards (PostHog#740)
  Use person_id instead of distinct_id for unique count (PostHog#734)
  new contributors (PostHog#739)
  Update Trends dotted line UX (PostHog#735)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants