Closes #169 break down by cohort #690
Conversation
@EDsCODE @mariusandra I think we need to pre-calculate which people are in cohorts (every 30 mins or so, maybe?) to make breakdown by cohorts perform acceptably. I'm torn between storing person_id and distinct_id directly. The advantage of distinct_id is that it saves another query on every loop of events (which could be millions), so it should make the /trends page a bit faster. person_id just feels a bit more logical, especially in person.py for example. Thoughts?

Edit: another thought, maybe we should store person_id against the event? That would definitely speed up funnels/paths etc., as you don't have to connect distinct_ids with the person first. Then we could precalculate person_ids here too.
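The precalculation idea above can be sketched with an in-memory sqlite3 example (all table and column names here are hypothetical simplifications, not PostHog's actual schema): a background job fills a `cohort_people` table keyed by `(cohort_id, person_id)`, so the breakdown query becomes a plain join instead of re-evaluating the cohort definition per event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Hypothetical simplified schema: events carry a distinct_id, and
# person_distinct_id maps each distinct_id to a person_id.
c.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY);
CREATE TABLE person_distinct_id (distinct_id TEXT PRIMARY KEY, person_id INTEGER);
CREATE TABLE events (id INTEGER PRIMARY KEY, event TEXT, distinct_id TEXT);
-- Precalculated membership, refreshed by a periodic background job.
CREATE TABLE cohort_people (cohort_id INTEGER, person_id INTEGER,
                            PRIMARY KEY (cohort_id, person_id));
""")

c.executemany("INSERT INTO person VALUES (?)", [(1,), (2,)])
c.executemany("INSERT INTO person_distinct_id VALUES (?, ?)",
              [("anon-a", 1), ("anon-b", 2)])
c.executemany("INSERT INTO events (event, distinct_id) VALUES (?, ?)",
              [("$pageview", "anon-a"), ("$pageview", "anon-b"),
               ("signup", "anon-a")])

# Background job: recalculate cohort 7 = "people who did signup".
c.execute("""
INSERT OR REPLACE INTO cohort_people (cohort_id, person_id)
SELECT 7, pdi.person_id
FROM events e JOIN person_distinct_id pdi ON pdi.distinct_id = e.distinct_id
WHERE e.event = 'signup'
""")

# Breakdown query: count $pageview events by cohort membership with a
# simple join, instead of running the full cohort filter per request.
rows = c.execute("""
SELECT COUNT(*) FROM events e
JOIN person_distinct_id pdi ON pdi.distinct_id = e.distinct_id
JOIN cohort_people cp ON cp.person_id = pdi.person_id AND cp.cohort_id = 7
WHERE e.event = '$pageview'
""").fetchone()
print(rows[0])  # pageviews by members of cohort 7 -> 1
```

Note the extra `person_distinct_id` join in the breakdown query: storing distinct_id in the membership table instead would drop that join, which is the trade-off described above.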
@timgl tricky situation. I'm not sure that storing person_id directly with events (and going through the trouble of updating them with alias events) will speed things up that much.

Regarding cohorts, yeah, finding people who belong in a cohort is a massive query right now. Denormalising this is definitely something to consider strongly. 30 minutes sounds rather slow though; is there any way to make this near-real-time, for example a background job that runs whenever any (or the last) action that makes up a cohort occurs? Is it also possible for people to be removed from cohorts, or only with the passage of time or the cohort definition being changed, I guess?
Going off both of your points, I agree we should have a calculated table for cohorts: just a table of people indexed by cohort. Instead of a cron-style system that recalculates at preset intervals, to get close to real time we could dispatch a job whenever an action happens (maybe using the worker system we already have, and only dispatching if some cohort uses that action, as Marius said). The job would determine whether the user needs to be added to the cohort related to that action; if so, it's a simple insert of a row into the above-mentioned table.

The drawback is that this adds a fair bit of complexity, and there will be a lot of unnecessary checking since a job is dispatched every single time an event happens. If we could figure out how to reduce that, it could work (we can also batch the processing). Also, if cohort properties are changed, we would trigger a full recalculation. The benefit is that, as long as the worker queue doesn't get stuck or backed up, retrieving cohort people should be really simple with the table, and it would almost always be up to date.
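A minimal sketch of the dispatch-and-batch idea (all names here are hypothetical, and a real worker would re-verify the cohort's filters before inserting): events only enqueue a check when some cohort actually uses the action, and pending checks are deduplicated into a set so repeated events from the same person collapse into one job.

```python
from collections import defaultdict

# Hypothetical mapping: action name -> cohort_ids that use it.
COHORTS_BY_ACTION = {"signup": {7}, "upgrade": {7, 9}}

pending_checks = set()            # deduplicated (cohort_id, person_id) pairs
cohort_people = defaultdict(set)  # cohort_id -> set of person_ids

def on_event(action, person_id):
    """Called on every ingested event; enqueues a membership check
    only if some cohort actually uses this action."""
    for cohort_id in COHORTS_BY_ACTION.get(action, ()):
        pending_checks.add((cohort_id, person_id))

def flush_pending():
    """Worker job: process the batch, adding people to cohorts.
    A real implementation would re-check the cohort's full definition here."""
    while pending_checks:
        cohort_id, person_id = pending_checks.pop()
        cohort_people[cohort_id].add(person_id)

# Many events, including repeats, produce only a handful of checks.
for _ in range(1000):
    on_event("signup", person_id=1)
on_event("upgrade", person_id=2)
on_event("$pageview", person_id=3)  # no cohort uses this action: no job

print(len(pending_checks))       # 3 deduplicated checks, not 1002
flush_pending()
print(sorted(cohort_people[7]))  # [1, 2]
```

Batching like this keeps the queue from being flooded, at the cost of the membership table lagging by one flush interval.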
Thanks both, I've moved this discussion to #696. I suggest we split the precalculation out of this PR, as it's already chunky. That means this PR is ready for review :)
One bug found below.
I was testing cohorts created by filter, and I found something that may just be me not fully understanding how the breakdown is supposed to work: if I create a cohort of users with the action $pageview and attempt to break down by that cohort with the entity $pageview selected, I get nothing. It seems like it should have something to show, because all the users within the cohort have done a $pageview.
@EDsCODE Should be ready for re-QA.
Very cool! QA'd, and the changes since the last review look good. There are a few merge conflicts that should be trivial to resolve.
Re-tested that migration just to be sure, looks good. Thanks for QAing!
* upstream/master:
  - Closes PostHog#169 break down by cohort (PostHog#690)
  - 703 multiple dashboards (PostHog#740)
  - Use person_id instead of distinct_id for unique count (PostHog#734)
  - new contributors (PostHog#739)
  - Update Trends dotted line UX (PostHog#735)