Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Flag Spec for integration libraries #10459

Closed
neilkakkar opened this issue Jun 23, 2022 · 6 comments
Closed

Feature Flag Spec for integration libraries #10459

neilkakkar opened this issue Jun 23, 2022 · 6 comments
Labels
enhancement New feature or request feature/feature-flags Feature Tag: Feature flags

Comments

@neilkakkar
Copy link
Collaborator

neilkakkar commented Jun 23, 2022

All of our client and server side libraries should support experience continuity and multivariate feature flags. This is the spec for functions we'd want to see:

Client Side Spec

  • A mechanism to send a POST request to /decide/?v=2 with the following data:

    // Content-Type: application/json
    {
        token: <posthog_project_api_key>,
        distinct_id: <user_distinct_id>
    }
    

    The response will contain an object at response['featureFlags'] with the flags that are enabled for this user, and the variant value, if any. The distinct ID should be already available from existing methods implemented in this lib.
    The response should be cached and refreshed periodically (for simple flags only. Doesn't apply (yet) to multivariate support.

  • Experience continuity: New feature to "persist flag across authentication steps"

    Whenever an $identify call is made for the first time for an anonymous user, we also want to send $anon_distinct_id as part of the decide request. See the posthog-js implementation

  • A method getFeatureFlag

    This method takes a feature flag key as an input and returns the variant value if multivariate flag, or otherwise a boolean indicating if the given flag is on or off for the user based on the response from /decide. See posthog-js.
    Whenever this method is called, we also send a $feature_flag_called event.

  • A method isFeatureEnabled
    This method takes a feature flag key as an input and returns a boolean indicating if the given flag is on or off for the user based on the response from /decide. See posthog-js.
    Whenever this method is called, we also send a $feature_flag_called event.

  • Set feature flag values on capture
    Whenever an event is captured, the library also sets the following two properties on every event:

    1. $feature/<feature-flag-name>, for each feature flag, with the value as the output of getFeatureFlag.
    2. $active_feature_flags, which is a list of all active feature flags.

Server Side Spec

On the server-side, the template to follow is posthog-python. The biggest difference is that server side libraries are stateless. We don't store distinctIDs, nor feature flags, because the same instance is often called for multiple users, depending on what requests are coming in.

  • Requests to /decide/?v=2 to get feature flags, the API contract is the same.

  • No experience continuity, since there's no anon_distinct_ids

  • A method getFeatureFlag, same as client side libraries, which also sends a $feature_flag_called event.

  • A method isFeatureEnabled, same as client side libraries, which also sends a $feature_flag_called event.

  • An option send_feature_flags on capture calls, which gets all feature flags for a given distinct ID, and adds them to the event, just like in client side libraries. See this PR

New addition: See comment #10459 (comment)

@neilkakkar neilkakkar added enhancement New feature or request feature/feature-flags Feature Tag: Feature flags labels Jun 23, 2022
@neilkakkar
Copy link
Collaborator Author

cc: @liyiy @alexkim205

@weyert
Copy link
Contributor

weyert commented Jun 23, 2022

You might want to have a look at the OpenFeature specification www.open-feature.dev

@neilkakkar
Copy link
Collaborator Author

neilkakkar commented Jul 25, 2022

Local evaluation for server side libraries

Flag definition loading

On initialization, the library makes one request:

  1. To /api/feature_flag/local_evaluation to get the flag definitions of all feature flags.

This request returns an object with two keys:

  1. flags, which is a list of flags with their definition.
    1. Note that: This endpoint parses cohorts that depend on person/group properties only to reflect the same.
  2. group_type_mapping, which is a mapping from group type index to the group type.

Experience continuity

If a flag has experience continuity enabled, then it's never evaluated locally.

Active flags

Consider this case today:
- flag is active, locally evaluated, but running into performance issues, so you want to disable it.
- one poll interval later, the flag is gone from local_evaluation, so now you’re falling back on /decide to evaluate it.
- this is not great, because you’ve just replaced a local evaluated flag with a remote one, which by default will cause some perf issues, since it adds 200-500ms to an operation that was taking 10ms.

Thus, when local evaluating, get non-active flags as well from /local_evaluation, and check if the flag is active or not. If not active, return False.

Debug logging

To make it easier for developers to debug when a flag is locally vs remotely evaluated, log this.

  1. Whenever we have an inconclusive match, i.e. can't make a decision about the flag locally, log the reason so people using the library don’t have to guess whether the code they are writing is going to /decide/ or not.
  2. And whenever we go to /decide, log that as well.

Person and Group properties

For local evaluation, we expect the client to provide person and group properties alongside any isFeatureEnabled, getFeatureFlag call.

This changes the interface from:

def get_feature_flag(self, key, distinct_id, groups={}):

to

def get_feature_flag(self, key, distinct_id, groups={}, person_properties={}, group_properties={}):

For the flag in question, if all required property keys are present, and the operator is not is_not_set, then the flag is evaluated locally.

In every other case, we fall back to a /decide API call.

Note: Whenever we fallback to /decide, we should pass all group_properties and person_properties that were called with the get_feature_flag call to `/decide.

Handling Cohorts

For cohorts, we fall back to /decide.

Hash Identifiers

If flag evaluation conditions above are met, we then determine if rollout percentage conditions are met. To do this, we generate a hash as follows:

### Non-multivariate flags

The hash key is

        hash_key = f"{feature_flag.key}.{distinctID or groupKey}"

and the value is

        hash_val = int(hashlib.sha1(hash_key.encode("utf-8")).hexdigest()[:15], 16)

divided by the largest float, float(0xFFFFFFFFFFFFFFF), to give a number between 0 and 1.

That is, the first 16 characters of a sha1 hash, parsed as an integer in base 16.

If this value is less than or equal to the rollout %, then the flag evaluates to True.

Multivariate flags

For multivariate flags, the value is calculated the same way.

The only two differences are the hash key, and which variant is assigned based on rollout %s.

The hash key is

        hash_key = f"{feature_flag.key}.{distinctID or groupKey}variant"

That is, there is a salt added to the key, called variant.

To determine which variant the flag belongs to, we generate a variant lookup table that looks like so:

value_min = 0
for variant in feature_flag.variants:
            value_max = value_min + variant["rollout_percentage"] / 100
            lookup_table.append({"value_min": value_min, "value_max": value_max, "key": variant["key"]})
            value_min = value_max

then to compute the variant, if hash_value >= value_min and hash_value < value_max, then it's the right variant.


Test Coverage

Since it's easy to make mistakes here, we also want a consistent test suite across all libraries. Here are some expected tests and the subsequent results:

General Recommendations

  1. get_feature_flag and is_feature_enabled should follow the exact same codepaths. This ensures that we don't have to double the amount of tests we write, testing for both.
  2. We are getting rid of the concept of is_simple_flag, and try and calculate all flags locally. This can fail whenever the properties we're looking for are not present, which then fall backs to /decide, failing which, it falls back to the default value.
  3. Remember to patch API calls, all these tests should be blazing fast. (On posthog-python, it takes less than 1 second to run all these tests)
  4. We don't need a personal_api_key to call /decide/, in any case! We only need it to download flag definitions, which is via the api/feature_flag/local_evaluation/ endpoint. Always remember to pass ?token={} param to this call, which ensures it gets the flags for the right project.
  5. While you're at this, ensure that $feature_flag_called is called only once per distinctID per flag
  6. The change in the API endpoint means this wouldn't work for self hosted users, which is a breaking change! Bump Major version, and add the changelog that you need ATLEAST PostHog 1.38 for this new major version.

Consistency Tests

These tests ensure that the hashes are computed correctly. These take the result from the server, and ensure that client libs match these results. Every library must have 2 of these tests, which replicates the server tests here.

  1. Simple flags consistency test
  2. Multivariate flags

Fallback test

  1. Ensure that flags ball back to /decide when results are inconclusive: Property type is cohort, or when all properties aren't present.

Default value Tests

We expect 2 tests here as well.

  1. Flag passes local evaluation, and uses the same value, no matter the default. Same for a second flag with results coming from decide.
  2. /decide errors out, so flag value is whatever the default value is.

Single Property Matching Tests

For each property type, ensure that the results are what we'd expect.

8 tests (one per operator)

Property Group Matching Tests

Feature flag definitions look something like:

groups: [
    { properties: [], rollout_percentage: k},
    { properties: [], rollout_percentage: k},
]

Here, all groups are OR'ed, while all properties in the arrays are AND'ed. Have a complex test to test all permutations here.

And some special edge cases:

  1. No property for one group, but property for a second group that passes. This shouldn't fall back on decide, even if not all properties for the flag are present

Person and Group property matching

1 test each for simple person and group property matching.

Experience continuity test

If flag has continuity enabled, never try to calculate locally.

Get All flags

  1. If no flags available locally, fall back to decide
  2. Response from decide overrides local response, if some flags were computed locally
  3. No $feature_flag_called capture events are fired
  4. If all flags are computed locally, no fallback

Memory Management

For long running libraries, the caching we do to check if we've sent a $feature_flag_called for this user can explode. Thus we need to make sure there's a size limit on this cache.

This event follows atleast once semantics, so it's fine if we send the event again for a single user, so just clear the cache when it reaches a certain size. Python library has this at 100,000 people right now.

Special Edge cases

  1. No property for one group, but property for a second group that passes

@neilkakkar
Copy link
Collaborator Author

To enable client-side libs to not wait for flags to parse, we can have an option to initialise the library with a distinctID, and corresponding flags. To make this seamless though, our server side libs need another method: get_all_flags(). I think we should add this now!

Spec looks like this:

Only for server-side libraries.

Function signature is: get_all_flags(distinctID, groups, person_properties, group_properties) , which is very similar to the signature of get_feature_flag.

It computes locally what it can, and then sends a request to /decide for the remaining flags.

The final return type is a dict of flag_key and values.

@neilkakkar
Copy link
Collaborator Author

neilkakkar commented Aug 2, 2022

After playing around a bit, I'd like to add a few more things ( and noticed a few more missing things):

  1. It can get very stressful to ensure that your flag is always computed locally (for example, maybe you made an error, updated feature flag definitions, etc. etc.). Specially in latency sensitive applications, adding a backend delay of upto 500ms because you changed a feature flag in PostHog can be disastrous. Thus, I want to give an option to only compute a given flag locally. That is, get_feature_flag(**existing_args, only_evaluate_locally = False).

  2. $feature_flag_called should get a locally_evaluated property. Also, should be sent not just once per person, but once per person per response value (i.e. if it returned true, and then false, we should send another event)

  3. $feature_flag_called should also pass the groups property coming to feature flags.

  4. Ensure that poll interval is customisable by client.

And one more questionable thing:
5. Have a configuration option for specific calls that determines whether $feature_flag_called events should be sent

@neilkakkar
Copy link
Collaborator Author

Default results parameter in the spec has been confusing users, specially around when it applies. So:

  1. We get rid of defaultResult in all our libraries.
  2. We return None | Nil | undefined when we can’t find the flag result / decide errors out / only evaluating locally & inconclusive match etc. etc.
  3. When decide is successful, we never return undefined, it’s either false, or true, or string.
  4. This way, clients can add their own defaults to the value, only when it returns undefined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature/feature-flags Feature Tag: Feature flags
Projects
None yet
Development

No branches or pull requests

2 participants