-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multivariate testing - why so many Nones? #6043
Comments
@samwinslow I know you have some other growth work on your plate right now - can you take charge of the initial triage here (i.e. determine if it's a bug) and if there are deeper problems we can queue them up for growth in a future sprint or for the core team to address sooner? |
Actually, no. Different bug. Floating point problems should not happen for the same person, because their distinct ID does not normally change. Some events are coming in without |
The question is really: why so many users without
@mariusandra any thoughts, since we worked on this together? I've been sleuthing on it for a while. Seems to happen most often on pageview events. Perhaps pageviews are sent before feature flags are calculated? |
Looking at unique users, I think there's still too many Nones, "d" is set to an allocation of 6%. I wonder if it could be older libraries or something like that? Anyway good call on looping in Marius, maybe you guys can go through it on your next pairing session. For now it's marked alpha and we aren't releasing it, so we're covered on that front. |
I'm not sure what's happening here to be honest. Break the users where this flag is not set down by the posthog-js library version and you'll get this: Where the flag is set: ... so it's not that. Breaking down by host reveals something else. Not going to copy/paste results due to PII, but in short the cases where this flag is not defined are mostly on posthog.com and some self-hosted domains, none of the cases appears to be on app. Why this is happening is still a bit of a mystery though. The only guess I have is that In any case, nothing really obvious comes to mind here. Breakdown by browser doesn't reveal anything interesting... |
Thanks for digging @mariusandra – looks like that answers the question I had around library state. Do you think this could be in some way representative of overall failures from Is this something you think the core team can investigate? Or do you want us to pick this up on growth. As an aside I'm running a similar test with a bool flag to see if this is a deeper issue or not. I think the issue with the bool test is going to be we either send back True or nothing from the API, so Nones and "falses" are basically the same thing. |
I spent some time collecting info about this issue, and gathered some findings into my test dashboard here. All graphs are based on pageviews, with a filter such that active feature flags includes "kunal-test-mv".
It seems like, besides a little bit of noise around posthog.com pageviews, these numbers are mostly because of outdated JS libraries. From an API design perspective, I believe we chose to include multivariate flag keys in Updated the feature flag docs to make it even more clear that an up-to-date library is needed to use multivariate flags — see PostHog/posthog.com#2111 |
Marius and I paired up to diagnose this today.
-> small in this case being about 5% of traffic, which is too much, especially since we know posthog.com has the latest JS library installed and a proper PostHog setup. It looks like the Nones are caused by events coming through before Because of this, the first few events in a session may have out-of-date flags, or no flags at all if it's a first-time user with no flags persisted in LocalStorage. My proposal to fix this was to add a queue for captured events and send them only after So, we need to get smarter about this — unless we have a better idea, I'll implement a queue but also post all queued events immediately if the user navigates away from the page (using the |
We actually have code in the backend, which adds the Thus the problem is only not with people who visit the site for the first time (they send 0 feature flags which then get "backfilled"), but with people who have cached an old set of flags. To get around this we could:
|
This issue has 1070 words. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:
|
I like this idea, but it could be said that the flags are "correct" at time of sending, in that they are consistent with what the user sees. On some session, if there is no cached entry for There also is a legitimate case for including cached flags. If the user navigates to a different page in the client webapp and there is no fancy page routing, their initial events on the new page may be sent with cached flags. If the flags have not changed between pageloads there is no reason to discard them. We could send an additional property such as I think doc updates are warranted in any case, especially to note that:
|
I mean just not to send the flag properties in events when a |
Yep! Makes total sense. My only concern is that we would end up discarding a lot of flag metadata before ingestion (everything in red in the first scenario below). We could end up making it very confusing to run analytics on an app that has "hard" browser navigation events. In the second scenario we keep all the flag metadata attached but add a property that can be used at query time to exclude indeterminate/cached flag metadata. A user would set up that exclusion if they know they're looking for differences before and after enabling a flag, and they want to make sure only the most authoritative data is used. Does that make more sense? It's a query semantic I would personally use, but not sure if other users would see the value in it. to be clear in the attached diagram: "send events" should be "include flag metadata on sent events" |
By adding another property For me this issue would be solved if we just explain that some Nones are expected when querying flags, because the flags just hadn't loaded/refreshed by then and it's easiest to just ignore these. "Breakdown by feature where feature is not none". |
&
Both these statements are correct, yes? I was working on capturing these from all sources: is there a reason we only backfill for events from |
hey @neilkakkar could you elaborate a bit on the backfill behavior that's in posthog-js? |
It's what Marius mentioned above: posthog/posthog/api/capture.py Line 107 in 2ae020d
If we get an event from posthog-js that doesn't have feature-flags set, we query for the flags and set them on the event. I want to do this for events coming from everywhere, not just posthog-js, so we can guarantee that every event has the |
Indeed
I understand the patch for $web, as that library sends all flags to us, and we just improve the data. However to patch this onto any event from any library sounds unexpected. We're modifying the properties in an unexpected way, making a lot of queries to fit the flags in the first place. See #7115 |
Bug description
I'm not sure if this is a bug, I'm trying to better understand our flag setup.
Last week when I got back from vacation, I created a multivariate flag with a 4 variants and a mix of audience.
The audience sums to 100% and there are no other targeting rules.
Expected behavior
I expect every Pageview since the launch of the flag (mid-last week) to have a variant associated with it.
How to reproduce
Environment
Additional context
I'd like to understand if this is a bug in our flags or if this is something else in how we're using feature flags.
Thank you for your bug report – we love squashing them!
The text was updated successfully, but these errors were encountered: