Feature Flags aren't consistent when distinctIDs can change #9547

neilkakkar · 2022-04-27T08:20:55Z

Bug description

When someone logs out / switches to a new browser and then tries logging in, they get a new distinct ID, which can mean that sometimes the feature flags can flip, because feature flags are based on distinctID, not personID.

Then once they're logged in and move to their existing distinctID, until the /decide call returns, we're using the 'wrong' feature flag value. Further, if they keep using the $anon_distinct_id, our feature flag is permanently flipped, which is not great for experiments.

Additional context

While there's not much we can do when a new unidentified user pops up, is there something we can do to minimise lag between logging in & new feature flags being loaded?
What are the problems / advantages with moving to personID as the feature flag identifier, for cases when you'd want multiple distinctIDs belonging to the same person to go to the same bucket?

Thank you for your bug report – we love squashing them!

pauldambra · 2022-04-27T08:36:50Z

> While there's not much we can do when a new unidentified user pops up, is there something we can do to minimise lag between logging in & new feature flags being loaded?

~~The server could write the feature flags as JSON into a script tag in the HTML of the page for the JS to load~~

Doh, which would only work for us 🤦

AtifSS · 2022-04-29T23:41:10Z

I think right from when
a) a user opens a web browser
b) after login
c) to logout
d) closing and then reopening the browser
e) logging-in again.

During all these steps and beyond, the feature flags with variants should not change their value, right?

neilkakkar · 2022-05-02T12:29:27Z

Between (d) and (e), if you go incognito / clear caches / disable cookies or cache, then there's no way to know if you are the same person who logged in earlier, so it's very much possible for feature flags to flip.

neilkakkar · 2022-05-04T15:33:32Z

> to minimise lag between logging in & new feature flags being loaded?

There's an existing issue addressing this problem: #7115

> they get a new distinct ID, which can mean that sometimes the feature flags can flip

Using personID as the hash identifier solves this problem

macobo · 2022-05-24T11:11:18Z

Note that feature flags can be (and I have in the past) used as authorization-type gates: can the user do X?

This means that if you don't change the flag as the user logs in, you also create problems.

neilkakkar · 2022-06-06T12:13:59Z

I'm looking into fixing this issue this week. To have all context and make informed decisions, here's all the use cases we know of: ( @marcushyett-ph - could use some help filling out anything I've missed)

Feature Flags use cases

As a rollout gate: I want to release a new feature, but only to a set % of people.
As a precise rollout gate: I want to release a new feature, but only to a set % of people, where being precise matters (ex: running an experiment). We've chosen not to support this use case (use multivariates here instead).
As a rollout gate on person properties: I want to release a new feature, but only to a set % of people, with browser = Chrome
As an authorization gate: I want to enable this feature for people with property X. No rollout nonsense.
As an A/B/X Test: I want to split variants between different people.
As a 'persistent' A/B/X Test: I want to split variants between different people, where the first variant they see (anonymous or not) remains throughout their journey.
As a 'persistent' A/B/X Test with person properties: I want to split variants between different people with property X, where the first variant they see (anonymous or not) remains throughout their journey.
As a 'persistent' A/B/X Test with person properties and a rollout to reduce scope of experiment: I want to split variants between different people with property X, where the first variant they see (anonymous or not) remains throughout their journey.

And in all of these cases, person & people can be replaced by group.

Problems with above use cases

Wherever rollout %s or variants are involved, we have a nasty issue: distinctIDs belonging to the same person can get a different value, since the hash is based on the distinctID.
Groups evade this problem by hashing on the group_key, instead of distinctID.
We don't yet do the same with persons because this borks client side evaluation, and makes turbo-mode impossible.
Also, note that this only occurs in a flow involving identify: If you're already identified / anonymous throughout, your distinctID remains the same. However, this is a common enough use-case that it's worth solving for.
So, the open question here is which cases require consistency, and whether it makes sense to expose this as an option to users.
The same problem arises in a different form where 'persistence' is required. Here, a variant is determined based on an anonymous distinct ID, which flips when a new distinct ID is identified.
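
To make the hashing problem concrete, here is a minimal sketch of hash-based rollout bucketing. The `bucket` and `is_enabled` helpers, the SHA-1 scheme, and the example IDs are all illustrative assumptions, not necessarily what PostHog does internally. The only point is that the result depends entirely on whichever ID is fed into the hash.

```python
import hashlib


def bucket(flag_key: str, hash_id: str) -> float:
    """Map (flag, id) to a stable float in [0, 1). Hypothetical scheme."""
    digest = hashlib.sha1(f"{flag_key}.{hash_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def is_enabled(flag_key: str, hash_id: str, rollout_percentage: float) -> bool:
    """The flag is on when the id's bucket falls below the rollout fraction."""
    return bucket(flag_key, hash_id) < rollout_percentage / 100


# Same human, two distinct IDs: one anonymous, one after identify.
anon_id = "anon-device-123"         # hypothetical anonymous distinct ID
identified_id = "neil@posthog.com"  # distinct ID after login

# Each call is deterministic on its own, but nothing ties the two results
# together, so the flag may flip at login.
print(is_enabled("new-onboarding", anon_id, 50))
print(is_enabled("new-onboarding", identified_id, 50))
```

Hashing on a group key avoids this only because the group key does not change when the user identifies.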

Before I jump into solutions, does this make sense? Am I missing any use cases? (cc: @macobo @mariusandra @Twixes @EDsCODE @rcmarron . No requirement to respond, just pinging to confirm I've covered all bases)

neilkakkar · 2022-06-06T12:58:27Z

Constraints to keep in mind:

We want to solve this issue not just using our libraries, but for people directly accessing the APIs as well (since we'll probably never support all possible libraries)
We don't want to stop / close the road to moarr client-side evaluation of FFs.
Existing features should still work: If I come back to reduce rollout %s, things should still work as expected. Or, in an experiment case, if I switch rollout % for a variant to 100%, the old 'persistent' variant should change as well.
???

neilkakkar · 2022-06-07T12:38:57Z

I think there are 3 ways to go about solving this, but one seems superior to the others.

Firstly, since consistency is not a requirement for all sorts of feature flags, and only plays a role when the flag has to go through an identify / login / distinct_id change event, we should make this optional, and default to false.

For the special case when consistency is switched on (we need a better name though), we need some extra work to make things work. All these options can't be evaluated completely on the client-side.

Not great option 1: Use person_id as the key for hashing

This seems to work, except for the fact that personIDs may merge when identify is called for the first time, which leads to deleting old person IDs, which can just as easily cause the same problems, although in a smaller number of cases.

Implementing this, however, is pretty straightforward.

Not great option 2: Use a feature flag value override

Whenever we have a distinctID which is assigned a variant / is considered part of a rollout for the first time, we set an override for the person to stick to this variant. When persons are merged, the override carries forward to all merged persons.

Thus, no matter the change in distinctID, we can be sure that the person always sees the same variant. The problem here, though, is how this handles adjustments to rollout %s. If, say, at the end of the experiment, I come and roll out the winning variant to 100%, it simply won't work because the persons are all overridden to this old value.
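
A rough sketch of why the value override gets stuck, using a boolean rollout instead of variants to keep it short. The in-memory `value_overrides` store, the function names, and the hashing scheme are hypothetical, purely for illustration:

```python
import hashlib

# (person_id, flag_key) -> frozen flag value; hypothetical in-memory store
value_overrides = {}


def bucket(flag_key: str, hash_id: str) -> float:
    """Map (flag, id) to a stable float in [0, 1). Hypothetical scheme."""
    digest = hashlib.sha1(f"{flag_key}.{hash_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def evaluate_with_value_override(person_id, distinct_id, flag_key, rollout_percentage):
    key = (person_id, flag_key)
    if key not in value_overrides:
        # First evaluation: compute the value once and freeze it.
        value_overrides[key] = bucket(flag_key, distinct_id) < rollout_percentage / 100
    return value_overrides[key]


# First seen while the flag is at 0%: the person gets False, and it is frozen.
first = evaluate_with_value_override("person-1", "anon-1", "new-checkout", 0)
# Later the flag is rolled out to 100%, but the frozen value still wins.
later = evaluate_with_value_override("person-1", "neil@posthog.com", "new-checkout", 100)
print(first, later)  # False False
```

The value stays consistent across distinct IDs, but it also ignores every later change to the flag definition.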

Option 3: Use a hash key override

Whenever we have a distinctID which is assigned a variant / is considered part of a rollout for the first time, we set an override for hash key to be used for calculating variants / rollout. When persons are merged, the override carries forward to all merged persons.

This ensures that future changes correctly change the behaviour for the feature flag, while also ensuring it remains consistent when distinct IDs change.
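
As a sketch of the idea (not the actual implementation): instead of freezing the value, we freeze which ID gets hashed. The `hash_key_overrides` store, the function names, and the hashing scheme below are assumptions for illustration. For brevity the override is written on first evaluation here, which simplifies when it would really be set.

```python
import hashlib

# (person_id, flag_key) -> distinct ID to hash on; hypothetical in-memory store
hash_key_overrides = {}


def bucket(flag_key: str, hash_id: str) -> float:
    """Map (flag, id) to a stable float in [0, 1). Hypothetical scheme."""
    digest = hashlib.sha1(f"{flag_key}.{hash_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def evaluate(person_id, distinct_id, flag_key, rollout_percentage):
    # Set-once: the first distinct ID seen for this person/flag becomes the hash key.
    hash_id = hash_key_overrides.setdefault((person_id, flag_key), distinct_id)
    return bucket(flag_key, hash_id) < rollout_percentage / 100


anon_result = evaluate("person-1", "anon-1", "new-checkout", 50)
identified_result = evaluate("person-1", "neil@posthog.com", "new-checkout", 50)
# Both evaluations hash on "anon-1", so the value cannot flip at login.
print(anon_result == identified_result)  # True

# Changing the rollout still takes effect, because only the key is frozen.
print(evaluate("person-1", "neil@posthog.com", "new-checkout", 100))  # True
print(evaluate("person-1", "neil@posthog.com", "new-checkout", 0))    # False
```

Because the flag definition is re-applied on every evaluation, rolling a variant out to 100% behaves as expected, unlike with the value override in option 2.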

Open to hearing about other options, but option 3 seems best to me so far. I considered going the opposite route, where the client keeps the anon distinctID as the 'main' one, but that has other problems and doesn't fit in well with the existing architecture, so it's not worth reversing this.

More detailed spec of how option 3 will work:

Client side has an anon distinct ID
Client side triggers identify with a new distinct ID
The API request to decide has a new field for anon_distinct_id, which shall be used for the feature flag overrides. This has set once semantics: it only works the first time & if no override already exists for the feature flag - personID combo.
1. The update-and-get happens synchronously, so we get the right feature flag values as well as update the overrides.
2. TBD: How do person merges on identify affect this?
3. We don't want to turn this into a separate API call, keeping in mind clients & async partitioning issues.
This does mean more cruft on feature flag evaluation, and I'm thinking of ways to optimise this / achieve option 3 using a different route.
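
The flow above could look roughly like this. It is a sketch under stated assumptions: `decide` here is a plain function standing in for the API endpoint, and `FLAGS`, `PERSON_OF`, `hash_key_overrides`, and the hashing scheme are hypothetical in-memory stand-ins. Person merges (the TBD above) are not modelled.

```python
import hashlib
from typing import Optional

FLAGS = {"new-checkout": 50}                  # flag key -> rollout %, hypothetical
PERSON_OF = {"neil@posthog.com": "person-1"}  # identified distinct ID -> person ID
hash_key_overrides = {}                       # (person_id, flag_key) -> hash key


def bucket(flag_key: str, hash_id: str) -> float:
    """Map (flag, id) to a stable float in [0, 1). Hypothetical scheme."""
    digest = hashlib.sha1(f"{flag_key}.{hash_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def decide(distinct_id: str, anon_distinct_id: Optional[str] = None) -> dict:
    """Update overrides and evaluate flags in one synchronous call."""
    person_id = PERSON_OF.get(distinct_id)
    result = {}
    for flag_key, rollout in FLAGS.items():
        hash_id = distinct_id
        if person_id is not None:
            if anon_distinct_id is not None:
                # Set-once: only written if no override exists for this pair.
                hash_key_overrides.setdefault((person_id, flag_key), anon_distinct_id)
            hash_id = hash_key_overrides.get((person_id, flag_key), distinct_id)
        result[flag_key] = bucket(flag_key, hash_id) < rollout / 100
    return result


anonymous = decide("anon-1")                                       # before login
at_login = decide("neil@posthog.com", anon_distinct_id="anon-1")   # identify
later = decide("neil@posthog.com")                                 # subsequent calls
new_browser = decide("neil@posthog.com", anon_distinct_id="anon-2")
print(anonymous == at_login == later == new_browser)  # True
```

The last call shows the set-once behaviour: a second anonymous ID from a new browser does not replace the override stored at the first login.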

Effectively, option 3 is a more general form of option 1. Choosing between option 1 and 3 also depends on how the ingestion side works when identifying persons, which I'm looking into next. Maybe the cases where the person_id changes are actually much smaller (or that's my suspicion), which can help us get rid of the cruft.

neilkakkar · 2022-06-09T15:31:37Z

Option 1 was a no-go because of how person merges affect identify.

Consider this very standard case:

We create a new feature flag with experience consistency enabled.
neil@posthog.com is an existing user.
An anonymous user, John Doe, comes to the website, does random stuff, gets their own person created, has feature flag values set based on their personID.
John Doe now identifies as neil@posthog.com, which deletes John Doe's old person, and the new personID is neil's personID, which means the feature flag can flip!

Contrast this with option 3, where the override is added at this point, ensuring that neil@posthog.com keeps hashing on John Doe's distinctID. Option 3 could've overridden personIDs instead, but that would just needlessly complicate things. Using distinct IDs also keeps the way open for single-client client-side consistent feature flags.

neilkakkar · 2022-07-20T17:17:59Z

This is now fixed and rolling out slowly to everyone

neilkakkar added the bug, feature/feature-flags, and feature/experimentation labels Apr 27, 2022

neilkakkar mentioned this issue Jun 8, 2022

feat(feature-flags): Enable experience continuity with feature flags #10196

Closed

neilkakkar mentioned this issue Jun 12, 2022

Have an option to send feature variants with the .capture(...) calls. PostHog/posthog-python#64

Closed

neilkakkar closed this as completed Jul 20, 2022

neilkakkar mentioned this issue Nov 8, 2022

feat(persons): Allow selecting IDs to see more accurate feature flag … #12646

Merged
