Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multivariate testing - why so many Nones? #6043

Closed
1 of 3 tasks
kpthatsme opened this issue Sep 20, 2021 · 18 comments
Closed
1 of 3 tasks

Multivariate testing - why so many Nones? #6043

kpthatsme opened this issue Sep 20, 2021 · 18 comments
Assignees
Labels
bug Something isn't working right feature/feature-flags Feature Tag: Feature flags

Comments

@kpthatsme
Copy link
Contributor

kpthatsme commented Sep 20, 2021

Bug description

I'm not sure if this is a bug, I'm trying to better understand our flag setup.

Last week when I got back from vacation, I created a multivariate flag with a 4 variants and a mix of audience.

The audience sums to 100% and there are no other targeting rules.

Expected behavior

I expect every Pageview since the launch of the flag (mid-last week) to have a variant associated with it.

How to reproduce

  1. View this chart and observe the large bucket of Nones

Environment

  • PostHog Cloud
  • self-hosted PostHog (ClickHouse-based), version/commit: please provide
  • self-hosted PostHog (Postgres-based, legacy), version/commit: please provide

Additional context

I'd like to understand if this is a bug in our flags or if this is something else in how we're using feature flags.

Thank you for your bug report – we love squashing them!

@kpthatsme kpthatsme added the bug Something isn't working right label Sep 20, 2021
@kpthatsme
Copy link
Contributor Author

@samwinslow I know you have some other growth work on your plate right now - can you take charge of the initial triage here (i.e. determine if it's a bug) and if there are deeper problems we can queue them up for growth in a future sprint or for the core team to address sooner?

@samwinslow
Copy link
Contributor

samwinslow commented Sep 20, 2021

Hmm... the most likely case here is a floating point arithmetic problem. Seeing as the variants are assigned by a hash value (float) between 0 and 1, there could be cases where the hash is in between variant domains.

Actually, no. Different bug. Floating point problems should not happen for the same person, because their distinct ID does not normally change. Some events are coming in without $feature/ properties being set at all.

@samwinslow
Copy link
Contributor

The question is really: why so many users without $feature/ properties set? Here's the same graph with a global filter applied, to only show users with 4267-event-property-taxonomy (which is another flag rolled out to all users).

None then becomes a tiny fraction of users explainable by some corner case.

Screen Shot 2021-09-21 at 1 38 37 PM

@mariusandra any thoughts, since we worked on this together? I've been sleuthing on it for a while. Seems to happen most often on pageview events. Perhaps pageviews are sent before feature flags are calculated?

@kpthatsme
Copy link
Contributor Author

The question is really: why so many users without $feature/ properties set? Here's the same graph with a global filter applied, to only show users with 4267-event-property-taxonomy (which is another flag rolled out to all users).

None then becomes a tiny fraction of users explainable by some corner case.

Looking at unique users, I think there's still too many Nones, "d" is set to an allocation of 6%.

I wonder if it could be older libraries or something like that? Anyway good call on looping in Marius, maybe you guys can go through it on your next pairing session. For now it's marked alpha and we aren't releasing it, so we're covered on that front.

@mariusandra
Copy link
Collaborator

I'm not sure what's happening here to be honest.

Break the users where this flag is not set down by the posthog-js library version and you'll get this:

image

Where the flag is set:

image

... so it's not that.

Breaking down by host reveals something else. Not going to copy/paste results due to PII, but in short the cases where this flag is not defined are mostly on posthog.com and some self-hosted domains, none of the cases appears to be on app.

Why this is happening is still a bit of a mystery though. The only guess I have is that /decide for some reason didn't return this flag to the users... either because it was never called, or had some other issue.

In any case, nothing really obvious comes to mind here. Breakdown by browser doesn't reveal anything interesting...

@kpthatsme
Copy link
Contributor Author

Thanks for digging @mariusandra – looks like that answers the question I had around library state.

Do you think this could be in some way representative of overall failures from /decide? I think it's pretty important for us to have a conclusion here (even if we don't fix it).

Is this something you think the core team can investigate? Or do you want us to pick this up on growth.

As an aside I'm running a similar test with a bool flag to see if this is a deeper issue or not. I think the issue with the bool test is going to be we either send back True or nothing from the API, so Nones and "falses" are basically the same thing.

@samwinslow
Copy link
Contributor

samwinslow commented Sep 29, 2021

I spent some time collecting info about this issue, and gathered some findings into my test dashboard here.

All graphs are based on pageviews, with a filter such that active feature flags includes "kunal-test-mv".

  1. For posthog.com pageviews, the proportion of Nones is very small.
  2. For app pageviews, there are no Nones at all.
  3. For non-PostHog pageviews, there are many Nones.
  4. Above posthog-js version 1.13, the proportion of Nones is very small, probably explained by # 1 above.
  5. Outside of PostHog, a fair number of people are using posthog-js < 1.13.

It seems like, besides a little bit of noise around posthog.com pageviews, these numbers are mostly because of outdated JS libraries. From an API design perspective, I believe we chose to include multivariate flag keys in active_feature_flags even if reading their values wouldn't be supported by the client library; that could be confusing.

Updated the feature flag docs to make it even more clear that an up-to-date library is needed to use multivariate flags — see PostHog/posthog.com#2111

@samwinslow
Copy link
Contributor

Marius and I paired up to diagnose this today.

For posthog.com pageviews, the proportion of Nones is very small

-> small in this case being about 5% of traffic, which is too much, especially since we know posthog.com has the latest JS library installed and a proper PostHog setup.

It looks like the Nones are caused by events coming through before /decide is resolved. We verified this by adding an arbitrary delay to the /decide handler and saw events sent from the client before the network request resolved.

Because of this, the first few events in a session may have out-of-date flags, or no flags at all if it's a first-time user with no flags persisted in LocalStorage. My proposal to fix this was to add a queue for captured events and send them only after /decide resolves (or times out), but a naive implementation of such a queue would undo the work of PostHog/posthog-js#295.

So, we need to get smarter about this — unless we have a better idea, I'll implement a queue but also post all queued events immediately if the user navigates away from the page (using the onbeforeunload handler).

@mariusandra
Copy link
Collaborator

We actually have code in the backend, which adds the $active_feature_flags and $feature/* properties, but only if no $active_feature_flags is found.

Thus the problem is only not with people who visit the site for the first time (they send 0 feature flags which then get "backfilled"), but with people who have cached an old set of flags.

To get around this we could:

  1. Not send flags from localstorage with posthog-js. They could be used as a local cache, but not put inside the API.
  2. Improve documentation and explain the conditions when you may get empty feature flags.

@posthog-contributions-bot
Copy link
Contributor

This issue has 1070 words. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:

  1. Write some code and submit a pull request! Code wins arguments
  2. Have a sync meeting to reach a conclusion
  3. Create a Request for Comments and submit a PR with it to the meta repo or product internal repo

@samwinslow
Copy link
Contributor

  1. Not send flags from localstorage with posthog-js. They could be used as a local cache, but not put inside the API.

I like this idea, but it could be said that the flags are "correct" at time of sending, in that they are consistent with what the user sees. On some session, if there is no cached entry for new-feature, the client experience will be as if new-feature is off, and you could argue that it should be that way.

There also is a legitimate case for including cached flags. If the user navigates to a different page in the client webapp and there is no fancy page routing, their initial events on the new page may be sent with cached flags. If the flags have not changed between pageloads there is no reason to discard them.

We could send an additional property such as feature_flags_from_cache: true or feature_flags_source: 'cache' and leave it up to the person writing the query to decide whether to exclude events where that property is set.


I think doc updates are warranted in any case, especially to note that:

  • isFeatureEnabled / getFlags / getFlagVariants read from persistent cache and may be stale or empty
  • onFeatureFlags should be used to make sure flags are up-to-date

@mariusandra
Copy link
Collaborator

I mean just not to send the flag properties in events when a /decide request is pending. So not to send flags we get from localStorage, but only freshly fetched flags.

@samwinslow
Copy link
Contributor

samwinslow commented Oct 5, 2021

I mean just not to send the flag properties in events when a /decide request is pending. So not to send flags we get from localStorage, but only freshly fetched flags.

Yep! Makes total sense. My only concern is that we would end up discarding a lot of flag metadata before ingestion (everything in red in the first scenario below). We could end up making it very confusing to run analytics on an app that has "hard" browser navigation events.

image

In the second scenario we keep all the flag metadata attached but add a property that can be used at query time to exclude indeterminate/cached flag metadata. A user would set up that exclusion if they know they're looking for differences before and after enabling a flag, and they want to make sure only the most authoritative data is used.

Does that make more sense? It's a query semantic I would personally use, but not sure if other users would see the value in it.

to be clear in the attached diagram: "send events" should be "include flag metadata on sent events"

@mariusandra
Copy link
Collaborator

By adding another property feature_flags_from_cache or whatnot, I'm worried about pushing complexity down onto the user when it doesn't need to be. When using feature flags, I really don't want to be thinking about frontend caching strategies, I have more relevant things taking up space in my active memory.

For me this issue would be solved if we just explain that some Nones are expected when querying flags, because the flags just hadn't loaded/refreshed by then and it's easiest to just ignore these. "Breakdown by feature where feature is not none".

@neilkakkar
Copy link
Collaborator

In practice: the first pageview of a user returning after 3 days may contain a very different set of flags than all the following events in that session.

&

it could be said that the flags are "correct" at time of sending, in that they are consistent with what the user sees.

Both these statements are correct, yes?


I was working on capturing these from all sources: is there a reason we only backfill for events from posthog-js ?

@kpthatsme
Copy link
Contributor Author

hey @neilkakkar could you elaborate a bit on the backfill behavior that's in posthog-js?

@neilkakkar
Copy link
Collaborator

It's what Marius mentioned above:

def _ensure_web_feature_flags_in_properties(event: Dict[str, Any], team: Team, distinct_id: str):

If we get an event from posthog-js that doesn't have feature-flags set, we query for the flags and set them on the event. I want to do this for events coming from everywhere, not just posthog-js, so we can guarantee that every event has the $active_feature_flags property.

@mariusandra
Copy link
Collaborator

mariusandra commented Dec 3, 2021

Both these statements are correct, yes?

Indeed

I was working on capturing these from all sources: is there a reason we only backfill for events from posthog-js ?
I added just the multivariate support there. The == '$web' part was there before.

I understand the patch for $web, as that library sends all flags to us, and we just improve the data. However to patch this onto any event from any library sounds unexpected. We're modifying the properties in an unexpected way, making a lot of queries to fit the flags in the first place. See #7115

@macobo macobo added the feature/feature-flags Feature Tag: Feature flags label Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right feature/feature-flags Feature Tag: Feature flags
Projects
None yet
Development

No branches or pull requests

5 participants