feat(performance): implement sampling for lifecycle queries #14283

yakkomajuri · 2023-02-16T19:13:20Z

Problem

#12908

Changes

Adds end-to-end sampling support for lifecycle queries, mostly to get feedback at this point.

Why lifecycle? It was just the easiest query to work with.

How did you test this code?

Manually

yakkomajuri · 2023-02-16T19:18:09Z

Not a WIP, in the sense that this works and could safely be merged, but it is certainly an initial approach meant to gather feedback.

macobo · 2023-02-17T08:43:19Z

posthog/models/filters/mixins/common.py

+    @cached_property
+    def sample_factor(self) -> Optional[float]:
+        factor = None
+        if self._data.get("sample_results", False):


Q: Why not send the sample factor from the FE?

I'm still unsure what the approach should be with sampling. Should we let the client decide or should we make decisions for them?

@mariusandra seems to like the idea of the client deciding too. From my side, I think I still have non-technical users in mind (i.e. building for PMs), but maybe I should drop that and think about devs first instead. If that's the case, then yeah, I'd send this from the frontend and maybe make it a slider-like control, as Marius mentioned.

Note that we can hard-code the value in the frontend if we want and still keep the API flexible :)
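
To make the two options in this thread concrete, here is a minimal sketch of a resolver that accepts an explicit factor from the client and otherwise falls back to a server-chosen value when only the `sample_results` flag is set. It is not the code in this PR: the `sample_factor` request key, the `resolve_sample_factor` helper, and the `DEFAULT_SAMPLE_FACTOR` value are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical server-side default, used when the client only opts in via the flag.
DEFAULT_SAMPLE_FACTOR = 0.1


def resolve_sample_factor(data: Dict[str, Any]) -> Optional[float]:
    """Return the fraction of data to sample, or None when sampling is off."""
    # Option 1: the client sends an explicit factor (e.g. from a slider).
    # The "sample_factor" key is an assumption, not part of the diff above.
    raw = data.get("sample_factor")
    if raw is not None:
        factor = float(raw)
        if not 0 < factor <= 1:
            raise ValueError("sample_factor must be in the interval (0, 1]")
        return factor
    # Option 2: the client only sets the boolean flag and the server decides.
    if data.get("sample_results", False):
        return DEFAULT_SAMPLE_FACTOR
    return None
```

With this shape, the frontend could start by hard-coding a single value and later expose a slider without any further API change.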

macobo · 2023-02-17T08:44:56Z

posthog/queries/trends/util.py

@@ -85,6 +85,8 @@ def parse_response(stats: Dict, filter: Filter, additional_values: Dict = {}) ->
    counts = stats[1]
    labels = [item.strftime("%-d-%b-%Y{}".format(" %H:%M" if filter.interval == "hour" else "")) for item in stats[0]]
    days = [item.strftime("%Y-%m-%d{}".format(" %H:%M:%S" if filter.interval == "hour" else "")) for item in stats[0]]
+    if filter.sample_factor:
+        counts = [c * (1 / filter.sample_factor) for c in counts]


Nit: Probably worth creating a correct_for_sampling function.
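
A minimal sketch of what such a helper might look like, assuming it only needs to wrap the scaling step from the diff above. The signature is an assumption; only the name `correct_for_sampling` comes from this comment.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def correct_for_sampling(counts: List[Number], sample_factor: Optional[float]) -> List[Number]:
    """Scale counts from a sampled query up to an estimate of the full totals."""
    # No sampling applied (None or 0): return the counts unchanged.
    if not sample_factor:
        return counts
    # Same arithmetic as the diff: a factor of 0.5 doubles every count.
    return [c * (1 / sample_factor) for c in counts]
```

In `parse_response` the call would then read `counts = correct_for_sampling(counts, filter.sample_factor)`. Note that the corrected values are floats and are estimates rather than exact counts.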

yakkomajuri · 2023-02-17T13:26:32Z

Will move fast/iteratively with this project and evolve the API & UI as I go along

yakkomajuri added 6 commits February 16, 2023 15:25

working lifecycle sampling

30db63a

add test and snapshot

9b825d4

update tooltip

e4c0a0a

add feature flag

f3cf4ae

update wording

8a89f25

add result postprocessing

e17c229

yakkomajuri requested a review from macobo February 16, 2023 19:13

macobo reviewed Feb 17, 2023

macobo approved these changes Feb 17, 2023

update json schema

baa41c6

python schema

7c0f63a

yakkomajuri enabled auto-merge (squash) February 17, 2023 13:59

Merge branch 'master' into sampling

b8ca6c7

yakkomajuri merged commit 4c2a3fb into master Feb 17, 2023

yakkomajuri deleted the sampling branch February 17, 2023 14:32
