
Added feature variations #297

Closed · wants to merge 5 commits

Conversation

@elhoyos (Contributor) commented Jan 30, 2018

Here's a proposal adding variations in a feature toggle to support further experiments.

Any suggestion or guidance is very welcome.

@coveralls commented Jan 30, 2018

Coverage increased (+0.06%) to 91.958% when pulling 6a58861 on elhoyos:variations into 4c85c6d on Unleash:master.

@vsandvold (Contributor) commented Jan 30, 2018

Hi Juan!

Thanks for using Unleash and for contributing to the project. We are currently exploring how to add A/B and multi-variant testing support to Unleash, and your suggestions are very welcome.

So far we have an experimental implementation developed in-house on top of the current Unleash client libraries, both for Java and Node. When in use, it looks something like this:

Variation variation = unleash.startExperiment("someNewFeature"); // Activate toggle and initiate tracking
if (variation.isEnabled("someVariant")) {
    // Serve some variant
} else if (variation.isEnabled("anotherVariant")) {
    // Serve another variant
} else {
    // Serve control/disabled
}

The idea is to provide a separate startExperiment method on the Unleash instance, with a Variation object return value that can be queried for the outcome. User data is provided by the Unleash context as usual, and special experiment strategies handle the allocation of users to different variants.
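For the Node client, the same calling shape might look roughly like this. This is only a sketch: `startExperiment` and the `Variation` object are the experimental in-house API described above, and the stub below fakes the variant resolution (which would really be driven by strategies and the Unleash context) just to demonstrate the shape:

```javascript
// Stand-in for the experimental startExperiment API described above.
// In a real client the variant would be resolved by experiment strategies
// from the Unleash context; here it is passed in so the shape is testable.
function startExperiment(toggleName, resolvedVariant) {
  return {
    // Variation object that can be queried for the outcome
    isEnabled: (variantName) => variantName === resolvedVariant,
  };
}

const variation = startExperiment('someNewFeature', 'someVariant');

if (variation.isEnabled('someVariant')) {
  console.log('serve some variant');
} else if (variation.isEnabled('anotherVariant')) {
  console.log('serve another variant');
} else {
  console.log('serve control/disabled');
}
```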

The biggest challenge is how to manage the tracking side-effect and reporting of experimentation metrics. Different companies use different tracking and reporting solutions, and we think it is important that people can plug in the solution of their choice.

Experiments may eventually become a whole new section in the Unleash UI, fully supported throughout the whole stack. How is still an open question, and maybe @ivarconr can share some of his thoughts on that.

@ivarconr (Member)

I love that you ignited this discussion via a pull request!

From your suggestion it is a bit unclear to me how variants and strategies are connected. And how did you imagine the variants would be used in the client SDKs? It also feels a bit intrusive to always require a toggle to have at least two variants.

As @vsandvold points out, we have been using Unleash to control our multi-variate experiments internally for some time now, in Java and more recently in Node projects. We have been using a custom strategy and wrapped client SDKs to enable experiments.

What we have learned so far is that experiments are not quite the same as feature toggles, and they are probably used in a different phase than a controlled feature rollout.

The biggest difference is the requirements around how you allocate (and potentially segment) users, and how you expose this data to the analytics tool in use. There are also statistical requirements on the number of users in each variant.

It does make sense to be able to control experiments with Unleash, and I believe, as @vsandvold points out, that experiments belong in a different section. Possibly you would also define some configurable options used to define new variants on the fly, without having to change the implementation in the client applications. I also imagine that the strategies concept can work for experiments too, but you would probably have experiment-specific strategies.

@elhoyos (Contributor, Author) commented Jan 30, 2018

Thanks for the input @vsandvold & @ivarconr.

The biggest challenge is how to manage the tracking side-effect and reporting of experimentation metrics. Different companies use different tracking and reporting solutions, and we think it is important that people can plug in the solution of their choice.

I think your vision here is great. Let me explain what I'm trying to achieve on that side, which could also answer @ivarconr's questions:

I'm targeting Google Optimize server-side experimentation by borrowing their experimentId/variations[] in unleash-server and pushing them all the way to unleash-client. Once there, Unleash strategies are applied and a variation is chosen. Later in the pipeline, a web server's middleware takes care of persisting variations and signaling events to Google Optimize (e.g. fflip-express).

From the perspective of a person setting up an A/B test:

  • In Google Optimize: create experiments and variations, set goals, start/stop experiments, and check reports
  • In the Unleash app: manage toggles, create/set strategies, set variations (manually or via the API)
  • In your own code: instantiate Unleash, check variations, set up the middleware
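To make the middleware step concrete, here is a minimal sketch of what such a web-server middleware could look like, assuming an Express-style (req, res, next) signature and a hypothetical unleash.experiment method; none of these names come from the actual libraries mentioned:

```javascript
// Hypothetical middleware: persist the chosen variation in a cookie and
// expose it so the page can signal Google Optimize. Names are illustrative.
function variationMiddleware(unleash, experimentKey) {
  return function (req, res, next) {
    const cookieName = 'exp_' + experimentKey;
    // Reuse a previously persisted variation for stickiness, else resolve one.
    let variation = req.cookies && req.cookies[cookieName];
    if (!variation) {
      variation = unleash.experiment(experimentKey, { userId: req.userId });
      res.cookie(cookieName, variation);
    }
    // Downstream handlers/templates can read the assignment, e.g. to emit
    // ga('set', 'exp', experimentId + '.' + variation) on the client side.
    res.locals.variation = variation;
    next();
  };
}
```

Persisting the assignment in a cookie also gives a crude form of stickiness: repeat requests reuse the stored variation instead of re-running the strategies.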

It also feels a bit intrusive to always require a toggle to have at least two variants.

My intention is that toggles should either have more than one variant or no variants at all. That's how an experiment toggle could optionally be identified.

What we have learned so far is that experiments are not quite the same as feature toggles, and they are probably used in a different phase than a controlled feature rollout.

Maybe you're using toggles in a special way? From a conceptual model's perspective, I see experiments as a special case of feature toggles. Possibly I got it from LaunchDarkly. I agree they are meant to serve different purposes, but the commonalities are evident from this point of view:

  • Variations = Values (variations/enabled in unleash jargon)
  • Target = Segments = Allocation (strategies in unleash jargon)

I have not put much thought into a statistical model and a reporting component for Unleash, but I see a lot of potential in the current framework. Having an experiments manager in a different section makes sense from a user-experience perspective.

If you think I may be cramming too many ideas into a god model, please stop me.

... There are also statistical requirements on the number of users in each variant.

🤔 Do you mean other than simple weighting strategies? Do you have any example in mind?

@@ -96,6 +98,9 @@ class FeatureToggleStore {
enabled: data.enabled ? 1 : 0,
archived: data.archived ? 1 : 0,
strategies: JSON.stringify(data.strategies),
variations: data.variations
? JSON.stringify(data.variations)
: '[]',
Inline review comment from a Member:

I would prefer to have the database migration script (which is missing from this changeset) default this column to an empty array.

@ivarconr (Member)

...Once there unleash strategies are applied and a variation is chosen...

Thanks, this was the missing link for me. Thinking about this as a pipeline really makes sense. I think this can work very well with the modified client API explained by @vsandvold.

What do you think?

If you think I may be cramming too many ideas into a god model, please stop me.

No, not really. It's just code, and we are meant to iterate on it. And I'm very interested in finding ways to add first-class support for experimentation in Unleash. What we have done internally might not be the best approach. I have already discussed with @vsandvold making this the priority for v4 of Unleash. All outside help is highly appreciated!

My intention is that toggles should either have more than one variant or no variants at all. That's how an experiment toggle could optionally be identified.

Would it make sense to have a 'default' variation if none is defined, in order to make it more predictable for the client code?

Do you mean other than simple weighting strategies? Do you have any example in mind?

There are two critical factors I'm thinking about:

  1. User stickiness to a variation. In our company we sometimes also run experiments across platforms (web, iOS, Android) and want our users to have the same experience.
  2. Equal size for each variation (one variant should be your control). "If the control and treatment are expected to have the same distribution, there is an important recommendation we can make: ensure that the control and treatment are equally sized" (from Seven Rules of Thumb for Web Site Experimenters).

Also, we have found the need to be able to specify/override variations for specific users. This makes it much easier for the teams to verify that the various variations work as expected.

In the future it might also be interesting to make sure that the same user is not assigned to multiple experiments at the same time that might push the same KPIs. But this should not be the focus of the first iteration, imho. Do you know whether Google Optimize has the ability to discover users attending multiple experiments at the same time?

Summary

  • I think the suggestion in this PR is interesting, and thinking of this as a pipeline makes sense, where clients first apply strategies and then pick variations when applicable.
  • I think having a separate method for experiments in the clients, startExperiment, makes sense. The suggested approach in this PR supports that, with the added bonus that all toggles will just work as today via the isEnabled call.
  • Thinking of strategies as a way to do segmentation/allocation makes sense!

Still a bit unclear to me:

  • How would we offer the ability to override variations for specific users? Is this a separate property on the variation object?
  • How do we offer variation stickiness in a general way?
  • Should the variation object also have an options map, making it possible to define new variations on the fly? An example could be to set an option color, and then define new experiments setting the value of the color option in each variation.

@elhoyos (Contributor, Author) commented Jan 31, 2018

What do you think?

I think @vsandvold's explanation is a good starting point. However, I would simplify it, something like how Optimizely does it. Maybe the following way could allow more comparison flexibility:

const variation = unleash.experiment('an.experiment.key', context, ...); // run strategies
if (variation === 'someVariant') {
    // Serve some variant
} else if (variation === 'anotherVariant') {
    // Serve another variant
} else {
    // Serve control/disabled
}

In any case, I'm fine with the signature you decide to go on.

Thinking of strategies as a way to segments/allocation makes sense!

Yes, strategies seem a very good option to give clients the right elements for both targeting and variation allocation. I believe experimentation strategies are highly reusable via parametrization, as it currently works. In this sense, let's have two types of strategies:

  • boolean: to determine if a toggle applies for a given context (i.e. targeting)
  • multi-value: to determine a variation value (i.e. allocation). Later, we could add the possibility of multi-value toggles that are not experiments.

For a plain toggle: boolean strategies are computed as usual to resolve unleash#isEnabled.
For an experiment toggle: boolean strategies are computed as usual first, and then multi-value strategies are executed in order; the first to resolve a non-null value determines the definitive value for unleash#experiment.
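A sketch of that two-phase resolution, with hypothetical strategy shapes (isEnabled for boolean strategies, resolve for multi-value ones):

```javascript
// Two-phase resolution as described above: boolean strategies gate the
// toggle (targeting), then multi-value strategies are tried in order
// (allocation); the first non-null result is the definitive variation.
function resolveExperiment(toggle, context) {
  // Phase 1: the toggle applies only if some boolean strategy matches.
  const targeted = toggle.booleanStrategies.some((s) => s.isEnabled(context));
  if (!targeted) return null;

  // Phase 2: first multi-value strategy to resolve a non-null value wins.
  for (const strategy of toggle.multiValueStrategies) {
    const variation = strategy.resolve(context);
    if (variation !== null) return variation;
  }
  return null; // no multi-value strategy resolved a variation
}
```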

Would it make sense to have a 'default' variation if none is defined, in order to make it more predictable for the client code?

I don't think it makes sense. However, you made me realize that the minimum number of variations, if any are added, should be one. We can assume that, if the experiment is on for a single variation, both a baseline and a variation will be tested.

Regarding the variation cases you expose:

  • "User stickiness to a variation". Each client shall implement and execute the same strategies in a deterministic way. Persistence is our friend here, be it a cookie or a user property from the db that should be saved on first allocation and then passed as context in subsequent calls. I don't see a general solution right now.
  • "Equal size for each variation". Computing weights in strategies based on a random or specific distribution shall hold even across environments. Thus, we as client users shall create deterministic strategies.
  • "Specify/override variations". Sure! Strategies should be aware of the variations they can resolve to (possibly injecting these in the params object served to the strategy). Have a devsExperiment multi-value strategy that uses a developerKey:variation list param, and set it up in the first position of the experiment toggle with the specific values. Pass and evaluate the appropriate context in unleash.experiment, and keep it special for your devs in the implementation of the multi-value strategy.
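On the stickiness and deterministic-weights points, one common technique in feature-flag tools (a sketch, not necessarily what Unleash would adopt) is hash-based bucketing: hash the userId plus toggle name into a stable bucket and map it onto the configured weights, so the same user always lands in the same variation without any persistence:

```javascript
// Tiny stable string hash (FNV-1a-style); real clients may use a better one.
function bucket(userId, toggleName) {
  const s = userId + ':' + toggleName;
  let h = 2166136261;
  for (let i = 0; i < s.length; i++) {
    h = (h ^ s.charCodeAt(i)) >>> 0;
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h % 100; // stable bucket in 0..99
}

// variants example: [{ name: 'control', weight: 90 }, { name: 'variant1', weight: 10 }]
function allocate(userId, toggleName, variants) {
  const b = bucket(userId, toggleName);
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (b < cumulative) return v.name; // weights are percentages summing to 100
  }
  return null;
}
```

Because the bucket depends only on the inputs, the allocation is deterministic across clients and environments, which addresses cross-platform stickiness without a shared store.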

Do you know whether Google Optimize has the ability to discover users attending multiple experiments at the same time?

You mean a user on the same page that runs multiple experiments? I have not tried it, but I would guess they do. It should be a matter of signaling each experiment-variation pair:

ga('set', 'exp', 'experiment1.variationA');
ga('set', 'exp', 'experiment2.anotherVariation');
...
// Events
ga('send', 'pageview');

Should the variation object also have an options map, making it possible to define new variations on the fly? An example could be to set an option color, and then define new experiments setting the value of the color option in each variation.

Yes! That could be a useful feature when your experiment needs more information than just a simple variation name. We may need to warn users against modifying these values after an experiment toggle is turned on.

@ivarconr (Member) commented Feb 2, 2018

Maybe the following way could allow more comparison flexibility

I think it makes sense to return a more complex object than just a string. This would allow us to add more features later. It also makes sense, from our experience, to advise the user to dedicate one variant as the control group. This will make it easier to compare metrics from the control against the variants.

Something in the line of:

Variant variant = unleash.experiment("an.experiment.key", context, ...);
if (variant.getName().equals("control")) {
    // Serve control
} else if (variant.getName().equals("someVariant")) {
    // Serve variant
} else if (variant.getName().equals("otherVariant")) {
    // Serve other variant
} else {
    // Serve default; could behave the same as control, but no experiment-related metrics tracking will be enabled.
}

I think your suggestion in this PR is plausible and could be the way forward to introduce real support for experimental toggles. I also believe our internal A/B testing approach based on strategies could benefit and be simplified if variants were part of the toggle definition.

Next steps:

  1. Write a small PoC using the enhanced toggle definition to implement support for the experiment API in at least two client SDKs (Java and Node).
  2. Figure out a simple way for the user to register a custom "ExperimentTracker", responsible for picking up which experiments a user takes part in and the variants assigned. The experiment tracker should probably also tell Unleash if the current user is already assigned a specific variation. The simplest version of the experiment tracker will probably use a cookie for storage.
  3. Decide how to differentiate "simple" toggles vs. "experiment" toggles. I think this can just be a flag on the toggle itself, which should also unlock a new section in the UI where the user can configure variants. This should be pretty straightforward.
  4. Unleash counters: should we tag the counters reported back to Unleash with details of which variant was served? This would probably provide confidence that Unleash is set up as expected.
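The cookie-backed "ExperimentTracker" from step 2 could have roughly this shape (class and method names are hypothetical; the cookie jar is a plain object stand-in for real cookie handling):

```javascript
// Hypothetical pluggable tracker: remembers which variant a user was
// assigned per experiment, backed here by a plain-object "cookie jar".
class CookieExperimentTracker {
  constructor(cookieJar) {
    this.cookies = cookieJar; // e.g. backed by req/res cookies in a web app
  }
  // Tell Unleash if the current user is already assigned a variant.
  getAssignedVariant(experimentKey) {
    return this.cookies['exp_' + experimentKey] || null;
  }
  // Record the assignment so analytics tooling can pick it up later.
  trackAssignment(experimentKey, variantName) {
    this.cookies['exp_' + experimentKey] = variantName;
  }
}
```

Keeping this behind an interface is what lets different companies plug in their own tracking and reporting solution, as discussed earlier in the thread.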

Also: I would like to make a formal v3 release before settling this PR. I hope to do this in a few days.

@ivarconr (Member) commented Feb 7, 2018

@vsandvold, would you have capacity to do some PoC-ing with the Java client, to see if our current internal version could play well with the suggested format from this PR?

@vsandvold (Contributor)

@ivarconr Not for the next few weeks, I'm afraid, unless I find some time in between other tasks at work. I think it has a lot of promise though, so please keep it up :-)

@elhoyos (Contributor, Author) commented Feb 8, 2018

There's a scenario I've been unable to solve while implementing this in the Node.js client, and I believe it needs the appropriate discussion.

Say you want to target and allocate according to the following:

  1. devsExperiment (as shown above) OR
  2. 10% visitors [gradual rollout] AND
  3. (NORWAY OR COLOMBIA) [geo] AND
  4. DESKTOP [device] AND
  5. 90%: control, 10%: variant1 [weighted allocation]

If you were to implement this with the current proposal, you would need two strategies: one for line 1 (A) and one for lines 2-5 (B). The problem with this approach is that you cannot reuse the inner AND strategies, so developers would need to create new complex strategies (e.g. B) every time they need a slightly different set of conditions.

Do you guys have any idea on how to solve it?

I have one but don't want to bias you.

@vsandvold (Contributor)

@elhoyos That's a very relevant use case. We could treat A and B as different segments, the first targeting developers and the second a specific group of users. That may be a suitable abstraction for OR and AND combinations, one that ideally should be reinforced visually by the UI.

Another way of grouping strategies could follow a stricter segment definition like "team, organization, rest of the world", with a more predefined setup of rules for each segment (like 100% team rollout, 50% organization, and 2% external).

But I'm just throwing ideas around. Please tell us what you have in mind :-)

@ivarconr (Member) commented Feb 9, 2018

Do you guys have any idea on how to solve it?

Is this not exactly what grouping of strategies would solve, as discussed in #229? There we talk about introducing "strategy groups". A strategy group could have one or more strategies. Within a strategy group all strategies are ANDed, effectively scoping the targeted users for all strategies added to the group. Strategy groups should be ORed, as they should not affect each other.
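The group semantics described here fit in a few lines (strategy objects are hypothetical stubs): strategies within a group are ANDed, and the groups themselves are ORed:

```javascript
// "Strategy groups" as discussed in #229: AND within a group, OR across groups.
function isEnabledWithGroups(groups, context) {
  return groups.some((group) =>
    group.every((strategy) => strategy.isEnabled(context))
  );
}
```

The earlier example then becomes two groups, [devsExperiment] OR [gradualRollout, geo, device], and the inner strategies stay individually reusable.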

Your example works if there exists only one variant, but what happens if there are 3 variants? Which one should be used in dev? As you suggested earlier, it might be required to have some kind of "special" strategy where one can override the variant for specific context attributes such as userId. Then the first group could be:

  1. devsExperiment AND overrideVariantForUser

@elhoyos (Contributor, Author) commented Feb 12, 2018

Thank you both for your input. I was thinking of having something similar to the "ANDing" strategy you both suggest. Following the referenced issue is the way to go.

@sveisvei (Contributor)

I will have a go at adding AND to clients and UI.

@SimenB (Contributor) commented Nov 21, 2018

What's the status here? 🙏

@ivarconr (Member)

This is scheduled for the v4.x release. I finally seem to have some more time to dedicate to this project.

I am in the process of figuring out how to handle "multiple environments" / "grouping of strategies". I think this also will affect how we solve variations.

@batjko commented Jan 22, 2019

Multi-variants and environments definitely seem to be the last big key features that would bring Unleash up to par with the big players.
Looking forward to v4 then, any rough timelines by any chance?

@ivarconr (Member)

I will work on variants in the coming weeks. Environment support will come after that.

@batjko commented Jan 23, 2019

I will work on variants in the coming weeks. Environment support will come after that.

Thank you. Appreciate the effort you're putting into this!

@ivarconr mentioned this pull request Jan 23, 2019
@ivarconr (Member)

Needed to do a lot of re-basing and to adjust the format a bit; work continues in #379.

@ivarconr ivarconr closed this Jan 24, 2019