Multivariate testing - why so many Nones? #6043

kpthatsme · 2021-09-20T20:40:55Z

Bug description

I'm not sure if this is a bug, I'm trying to better understand our flag setup.

Last week when I got back from vacation, I created a multivariate flag with a 4 variants and a mix of audience.

The audience sums to 100% and there are no other targeting rules.

Expected behavior

I expect every Pageview since the launch of the flag (mid-last week) to have a variant associated with it.

How to reproduce

View this chart and observe the large bucket of Nones

Environment

PostHog Cloud
self-hosted PostHog (ClickHouse-based), version/commit: please provide
self-hosted PostHog (Postgres-based, legacy), version/commit: please provide

Additional context

I'd like to understand if this is a bug in our flags or if this is something else in how we're using feature flags.

Thank you for your bug report – we love squashing them!

kpthatsme · 2021-09-20T20:42:25Z

@samwinslow I know you have some other growth work on your plate right now - can you take charge of the initial triage here (i.e. determine if it's a bug) and if there are deeper problems we can queue them up for growth in a future sprint or for the core team to address sooner?

samwinslow · 2021-09-20T22:04:56Z

Hmm... the most likely case here is a floating point arithmetic problem. Seeing as the variants are assigned by a hash value (float) between 0 and 1, there could be cases where the hash is in between variant domains.

Actually, no. Different bug. Floating point problems should not happen for the same person, because their distinct ID does not normally change. Some events are coming in without $feature/ properties being set at all.

samwinslow · 2021-09-21T17:45:09Z

The question is really: why so many users without $feature/ properties set? Here's the same graph with a global filter applied, to only show users with 4267-event-property-taxonomy (which is another flag rolled out to all users).

None then becomes a tiny fraction of users explainable by some corner case.

@mariusandra any thoughts, since we worked on this together? I've been sleuthing on it for a while. Seems to happen most often on pageview events. Perhaps pageviews are sent before feature flags are calculated?

kpthatsme · 2021-09-21T18:30:34Z

The question is really: why so many users without $feature/ properties set? Here's the same graph with a global filter applied, to only show users with 4267-event-property-taxonomy (which is another flag rolled out to all users).

None then becomes a tiny fraction of users explainable by some corner case.

Looking at unique users, I think there's still too many Nones, "d" is set to an allocation of 6%.

I wonder if it could be older libraries or something like that? Anyway good call on looping in Marius, maybe you guys can go through it on your next pairing session. For now it's marked alpha and we aren't releasing it, so we're covered on that front.

mariusandra · 2021-09-22T05:32:59Z

I'm not sure what's happening here to be honest.

Break the users where this flag is not set down by the posthog-js library version and you'll get this:

Where the flag is set:

... so it's not that.

Breaking down by host reveals something else. Not going to copy/paste results due to PII, but in short the cases where this flag is not defined are mostly on posthog.com and some self-hosted domains, none of the cases appears to be on app.

Why this is happening is still a bit of a mystery though. The only guess I have is that /decide for some reason didn't return this flag to the users... either because it was never called, or had some other issue.

In any case, nothing really obvious comes to mind here. Breakdown by browser doesn't reveal anything interesting...

kpthatsme · 2021-09-24T21:05:45Z

Thanks for digging @mariusandra – looks like that answers the question I had around library state.

Do you think this could be in some way representative of overall failures from /decide? I think it's pretty important for us to have a conclusion here (even if we don't fix it).

Is this something you think the core team can investigate? Or do you want us to pick this up on growth.

As an aside I'm running a similar test with a bool flag to see if this is a deeper issue or not. I think the issue with the bool test is going to be we either send back True or nothing from the API, so Nones and "falses" are basically the same thing.

samwinslow · 2021-09-29T01:32:49Z

I spent some time collecting info about this issue, and gathered some findings into my test dashboard here.

All graphs are based on pageviews, with a filter such that active feature flags includes "kunal-test-mv".

For posthog.com pageviews, the proportion of Nones is very small.
For app pageviews, there are no Nones at all.
For non-PostHog pageviews, there are many Nones.
Above posthog-js version 1.13, the proportion of Nones is very small, probably explained by # 1 above.
Outside of PostHog, a fair number of people are using posthog-js < 1.13.

It seems like, besides a little bit of noise around posthog.com pageviews, these numbers are mostly because of outdated JS libraries. From an API design perspective, I believe we chose to include multivariate flag keys in active_feature_flags even if reading their values wouldn't be supported by the client library; that could be confusing.

Updated the feature flag docs to make it even more clear that an up-to-date library is needed to use multivariate flags — see PostHog/posthog.com#2111

samwinslow · 2021-10-01T14:38:50Z

Marius and I paired up to diagnose this today.

For posthog.com pageviews, the proportion of Nones is very small

-> small in this case being about 5% of traffic, which is too much, especially since we know posthog.com has the latest JS library installed and a proper PostHog setup.

It looks like the Nones are caused by events coming through before /decide is resolved. We verified this by adding an arbitrary delay to the /decide handler and saw events sent from the client before the network request resolved.

Because of this, the first few events in a session may have out-of-date flags, or no flags at all if it's a first-time user with no flags persisted in LocalStorage. My proposal to fix this was to add a queue for captured events and send them only after /decide resolves (or times out), but a naive implementation of such a queue would undo the work of PostHog/posthog-js#295.

So, we need to get smarter about this — unless we have a better idea, I'll implement a queue but also post all queued events immediately if the user navigates away from the page (using the onbeforeunload handler).

mariusandra · 2021-10-05T09:19:44Z

We actually have code in the backend, which adds the $active_feature_flags and $feature/* properties, but only if no $active_feature_flags is found.

Thus the problem is only not with people who visit the site for the first time (they send 0 feature flags which then get "backfilled"), but with people who have cached an old set of flags.

To get around this we could:

Not send flags from localstorage with posthog-js. They could be used as a local cache, but not put inside the API.
Improve documentation and explain the conditions when you may get empty feature flags.

posthog-contributions-bot · 2021-10-05T09:19:44Z

This issue has 1070 words. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:

Write some code and submit a pull request! Code wins arguments
Have a sync meeting to reach a conclusion
Create a Request for Comments and submit a PR with it to the meta repo or product internal repo

samwinslow · 2021-10-05T12:41:26Z

Not send flags from localstorage with posthog-js. They could be used as a local cache, but not put inside the API.

I like this idea, but it could be said that the flags are "correct" at time of sending, in that they are consistent with what the user sees. On some session, if there is no cached entry for new-feature, the client experience will be as if new-feature is off, and you could argue that it should be that way.

There also is a legitimate case for including cached flags. If the user navigates to a different page in the client webapp and there is no fancy page routing, their initial events on the new page may be sent with cached flags. If the flags have not changed between pageloads there is no reason to discard them.

We could send an additional property such as feature_flags_from_cache: true or feature_flags_source: 'cache' and leave it up to the person writing the query to decide whether to exclude events where that property is set.

I think doc updates are warranted in any case, especially to note that:

isFeatureEnabled / getFlags / getFlagVariants read from persistent cache and may be stale or empty
onFeatureFlags should be used to make sure flags are up-to-date

mariusandra · 2021-10-05T12:51:58Z

I mean just not to send the flag properties in events when a /decide request is pending. So not to send flags we get from localStorage, but only freshly fetched flags.

samwinslow · 2021-10-05T13:47:11Z

I mean just not to send the flag properties in events when a /decide request is pending. So not to send flags we get from localStorage, but only freshly fetched flags.

Yep! Makes total sense. My only concern is that we would end up discarding a lot of flag metadata before ingestion (everything in red in the first scenario below). We could end up making it very confusing to run analytics on an app that has "hard" browser navigation events.

In the second scenario we keep all the flag metadata attached but add a property that can be used at query time to exclude indeterminate/cached flag metadata. A user would set up that exclusion if they know they're looking for differences before and after enabling a flag, and they want to make sure only the most authoritative data is used.

Does that make more sense? It's a query semantic I would personally use, but not sure if other users would see the value in it.

to be clear in the attached diagram: "send events" should be "include flag metadata on sent events"

mariusandra · 2021-10-06T07:48:14Z

By adding another property feature_flags_from_cache or whatnot, I'm worried about pushing complexity down onto the user when it doesn't need to be. When using feature flags, I really don't want to be thinking about frontend caching strategies, I have more relevant things taking up space in my active memory.

For me this issue would be solved if we just explain that some Nones are expected when querying flags, because the flags just hadn't loaded/refreshed by then and it's easiest to just ignore these. "Breakdown by feature where feature is not none".

neilkakkar · 2021-12-02T10:54:09Z

In practice: the first pageview of a user returning after 3 days may contain a very different set of flags than all the following events in that session.

&

it could be said that the flags are "correct" at time of sending, in that they are consistent with what the user sees.

Both these statements are correct, yes?

I was working on capturing these from all sources: is there a reason we only backfill for events from posthog-js ?

kpthatsme · 2021-12-02T20:07:39Z

hey @neilkakkar could you elaborate a bit on the backfill behavior that's in posthog-js?

neilkakkar · 2021-12-02T22:05:08Z

It's what Marius mentioned above:

posthog/posthog/api/capture.py

Line 107 in 2ae020d

    
           def _ensure_web_feature_flags_in_properties(event: Dict[str, Any], team: Team, distinct_id: str):

If we get an event from posthog-js that doesn't have feature-flags set, we query for the flags and set them on the event. I want to do this for events coming from everywhere, not just posthog-js, so we can guarantee that every event has the $active_feature_flags property.

mariusandra · 2021-12-03T07:33:05Z

Both these statements are correct, yes?

Indeed

I was working on capturing these from all sources: is there a reason we only backfill for events from posthog-js ?
I added just the multivariate support there. The == '$web' part was there before.

I understand the patch for $web, as that library sends all flags to us, and we just improve the data. However to patch this onto any event from any library sounds unexpected. We're modifying the properties in an unexpected way, making a lot of queries to fit the flags in the first place. See #7115

kpthatsme added the bug Something isn't working right label Sep 20, 2021

kpthatsme assigned samwinslow Sep 20, 2021

mariusandra mentioned this issue Oct 1, 2021

Send initial pageview immediately PostHog/posthog-js#295

Merged

2 tasks

mariusandra mentioned this issue Dec 1, 2021

Experimentation MVP Things to Do #7462

Closed

5 tasks

macobo added the feature/feature-flags Feature Tag: Feature flags label Dec 7, 2021

neilkakkar mentioned this issue Jan 12, 2022

Feature Flag rollout %s are not accurate #8001

Closed

3 tasks

neilkakkar closed this as completed Oct 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multivariate testing - why so many Nones? #6043

Multivariate testing - why so many Nones? #6043

kpthatsme commented Sep 20, 2021 •

edited

Loading

kpthatsme commented Sep 20, 2021

samwinslow commented Sep 20, 2021 •

edited

Loading

samwinslow commented Sep 21, 2021

kpthatsme commented Sep 21, 2021

mariusandra commented Sep 22, 2021

kpthatsme commented Sep 24, 2021

samwinslow commented Sep 29, 2021 •

edited

Loading

samwinslow commented Oct 1, 2021

mariusandra commented Oct 5, 2021

posthog-contributions-bot bot commented Oct 5, 2021

samwinslow commented Oct 5, 2021

mariusandra commented Oct 5, 2021

samwinslow commented Oct 5, 2021 •

edited

Loading

mariusandra commented Oct 6, 2021

neilkakkar commented Dec 2, 2021

kpthatsme commented Dec 2, 2021

neilkakkar commented Dec 2, 2021

mariusandra commented Dec 3, 2021 •

edited

Loading

Multivariate testing - why so many Nones? #6043

Multivariate testing - why so many Nones? #6043

Comments

kpthatsme commented Sep 20, 2021 • edited Loading

Bug description

Expected behavior

How to reproduce

Environment

Additional context

Thank you for your bug report – we love squashing them!

kpthatsme commented Sep 20, 2021

samwinslow commented Sep 20, 2021 • edited Loading

samwinslow commented Sep 21, 2021

kpthatsme commented Sep 21, 2021

mariusandra commented Sep 22, 2021

kpthatsme commented Sep 24, 2021

samwinslow commented Sep 29, 2021 • edited Loading

samwinslow commented Oct 1, 2021

mariusandra commented Oct 5, 2021

posthog-contributions-bot bot commented Oct 5, 2021

samwinslow commented Oct 5, 2021

mariusandra commented Oct 5, 2021

samwinslow commented Oct 5, 2021 • edited Loading

mariusandra commented Oct 6, 2021

neilkakkar commented Dec 2, 2021

kpthatsme commented Dec 2, 2021

neilkakkar commented Dec 2, 2021

mariusandra commented Dec 3, 2021 • edited Loading

kpthatsme commented Sep 20, 2021 •

edited

Loading

samwinslow commented Sep 20, 2021 •

edited

Loading

samwinslow commented Sep 29, 2021 •

edited

Loading

samwinslow commented Oct 5, 2021 •

edited

Loading

mariusandra commented Dec 3, 2021 •

edited

Loading