feat: Lazy identity-flag evaluation in local-eval mode #200
Conversation
gagantrivedi
left a comment
I don't know if I'm overthinking it, but this pull request certainly makes me feel conflicted. On one hand, it does make optimisations that make sense; on the other, it blurs the boundary between engine and client that we worked very hard to establish.
```python
        return flag

    def _resolve_flag(self, feature_name: str) -> Flag:
```
Lazy path skips identity-key enrichment (multivariate + percentage_split break)
`flagsmith/models.py:265–311` — `Flags._resolve_flag`
- L274–275: grabs `context = self._context` and `overrides_index = self._overrides_index` straight from the lazy `Flags` state.
- L280: `feature_context = context["features"][feature_name]` — reads the feature off the raw per-identity context that was stored in `from_evaluation_context`.
- L287–295: walks the segment overrides index and calls `is_context_in_segment(context, segment_context)` (L288). This is where `PERCENTAGE_SPLIT` rules silently fail: `context_matches_condition` in the engine falls back to `_get_identity_key(context)` (engine `evaluator.py:301`), and that key is never set on this context.
- L297–308: passes the same un-enriched context to `get_flag_result_from_context`. Inside the engine (`evaluator.py:179`, `key = _get_identity_key(context)`), `key` is `None`, so the `if key is not None and (variants := ...)` branch (engine `evaluator.py:183`) is skipped — multivariate always returns the control value.
`flagsmith/flagsmith.py:431–456` — `_get_identity_flags_from_document`
- L439–443: builds the per-identity context via `map_context_and_identity_data_to_context(...)`.
- L444: branches on `self.lazy_identity_evaluation`.
- L448–456 (lazy branch): hands `context` straight to `Flags.from_evaluation_context` — no `get_enriched_context` call.
- L457–459 (eager branch): goes through `engine.get_evaluation_result(context=context)`, which does call `get_enriched_context` at engine `evaluator.py:60`.
`flagsmith/mappers.py:88–99` — `map_context_and_identity_data_to_context`
- L93–98: returns `{**context, "identity": {"identifier": identifier, "traits": ...}}`. Note: no `key`. The eager path's enrichment fills this in; the lazy path doesn't.
Maybe we should bring back the testing harness that used to cover all SDKs (it was scoped down to just the engine during the last context value work) to catch issues like this.
Thanks — all of this is fixed by using `get_evaluation_result` against a trimmed context in 5a7d7dc.
> Maybe we should bring back the testing harness that used to cover all SDKs (it was scoped down to just the engine during the last context value work) to catch issues like this.
I'll have a think on this... there's a bit of a chicken & egg problem when it comes to remote evaluation.
```python
for override in segment_context.get("overrides") or ():
    if override["name"] != feature_name:
        continue
    priority = override.get("priority", float("inf"))
```
Should we consider importing `DEFAULT_PRIORITY` from the engine here instead?
We now are — this is once again the engine's concern as per 5a7d7dc.
``get_identity_flags`` now returns a ``Flags`` that holds the
evaluation context plus a precomputed segment-overrides reverse index,
and resolves each feature on first access via the engine primitives
(``is_context_in_segment`` + ``get_flag_result_from_context``) rather
than running a full bulk evaluation up-front.
In environments shaped like the Slack-report customer (420 features,
30 CSV-IN segments, hot loop reading one boolean flag) this takes
``get_identity_flags().is_feature_enabled(name)`` from ~430 µs to
~1.85 µs per call; 200-segment envs go from ~1200 µs to ~2 µs. The
``.all_flags()`` materialisation path is never slower than the
eager baseline in the bench matrix.
Back-compat:
* ``Flags`` public API unchanged (``is_feature_enabled``,
``get_feature_value``, ``get_flag``, ``all_flags``).
* ``FlagResult`` construction reuses the same engine helper as the
bulk path — identical output shape.
* New ``lazy_identity_evaluation`` constructor kwarg, default
``True``, lets operators flip back to the eager path if they hit
an unexpected regression.
Engine contract is untouched: the SDK consumes only already-public
``flag_engine.segments.evaluator`` symbols.
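The mechanism described in this commit can be sketched roughly as follows. Class and field names are illustrative stand-ins, and segment matching is reduced to a trivial membership check rather than the engine's real evaluator — a sketch of the pattern, not the SDK's implementation:

```python
def build_overrides_index(segments):
    # Map each feature name to the segments carrying an override for it,
    # so per-flag resolution only walks segments that can matter.
    index = {}
    for segment in segments:
        for override in segment.get("overrides") or ():
            index.setdefault(override["name"], []).append(segment)
    return index

class LazyFlags:
    def __init__(self, context, segments):
        self._context = context
        self._overrides_index = build_overrides_index(segments)  # precomputed once
        self._cache = {}                                         # per-flag results

    def is_feature_enabled(self, name):
        if name not in self._cache:
            # Stand-in for the engine call: an override wins if its segment
            # matches the identity, else the environment default applies.
            enabled = self._context["features"][name]["enabled"]
            for segment in self._overrides_index.get(name, ()):
                if self._context["identity"]["identifier"] in segment["members"]:
                    enabled = next(
                        o["enabled"] for o in segment["overrides"] if o["name"] == name
                    )
            self._cache[name] = enabled
        return self._cache[name]

context = {"identity": {"identifier": "alice"},
           "features": {"beta": {"enabled": False}}}
segments = [{"members": {"alice"},
             "overrides": [{"name": "beta", "enabled": True}]}]

flags = LazyFlags(context, segments)
assert flags.is_feature_enabled("beta") is True   # resolved on first access, then cached
```

The payoff is exactly the one the numbers above show: the up-front cost is building the index, and each hot-loop read touches only the handful of segments that could override the queried feature.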
Picks up the IN segment-condition evaluation speedup (Flagsmith/flagsmith-engine#295), which cuts per-IN-condition latency on segment walks by roughly 30%. Complementary to the lazy identity evaluation added in this PR — most customer envs will benefit from both.
…on_result
Per review: keep the engine/client boundary intact and let the engine
handle all evaluation correctness — instead of reaching into
``is_context_in_segment`` / ``get_flag_result_from_context`` directly,
``Flags._resolve_flag`` now builds a *trimmed* context (the queried
feature plus only the segments that could override it, looked up via
the precomputed reverse index) and hands it to the engine's public
``get_evaluation_result``.
Side effects:
* Identity-key enrichment now runs on the lazy path (the engine's
``get_enriched_context`` is invoked internally), so multivariate
splits and ``PERCENTAGE_SPLIT`` rules behave correctly. Previously
the lazy path silently degraded these.
* Override-priority handling moves back into the engine — the
``float("inf")`` literal is gone from the SDK.
* ``Flags.all_flags`` switches to a single bulk
``get_evaluation_result`` rather than calling ``_resolve_flag`` per
feature; cheaper, and matches the eager path's call shape.
Trim cost is ~0.8 µs per call, so the lazy path is now ~2.6 µs mean /
~3.4 µs p99 against the customer's prod env (438 features, 23
segments) — still 150–220× faster than the eager path on every
percentile, and 100% routed through the engine's documented public API.
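A rough sketch of the trimming step, under an assumed context shape (in the real change, `_resolve_flag` hands the trimmed context to the engine's public `get_evaluation_result`; the function below only shows the trim itself):

```python
def trim_context(context, overrides_index, feature_name):
    # Keep only the queried feature and the segments that carry an override
    # for it; the engine evaluates this small context exactly as it would
    # the full one, enrichment included.
    return {
        **context,
        "features": {feature_name: context["features"][feature_name]},
        "segments": {
            key: segment
            for key, segment in (context.get("segments") or {}).items()
            if key in overrides_index.get(feature_name, set())
        },
    }

full = {
    "identity": {"identifier": "alice"},
    "features": {"beta": {"enabled": False}, "gamma": {"enabled": True}},
    "segments": {
        "power_users": {"overrides": [{"name": "beta"}]},
        "eu": {"overrides": [{"name": "gamma"}]},
    },
}
index = {"beta": {"power_users"}, "gamma": {"eu"}}

trimmed = trim_context(full, index, "beta")
assert set(trimmed["features"]) == {"beta"}       # only the queried feature survives
assert set(trimmed["segments"]) == {"power_users"}  # only segments that can override it
```

Because the trim is a shallow dict rebuild, its cost stays roughly constant as the environment grows — consistent with the ~0.8 µs figure quoted above.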
05f6304 to 5a7d7dc
Lazy is now the only path through ``_get_identity_flags_from_document``. Per-flag resolution still goes via the engine's public ``get_evaluation_result`` (against a trimmed context), so callers get the perf win without the eager fallback or the kwarg surface. Tests pinned to the old eager path are reworked to mock ``flagsmith.models.engine.get_evaluation_result`` instead — that's where the call lands now — and trigger evaluation via ``.all_flags()`` when they need the bulk-context call shape.
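The patching pattern behind that test rework — patch the name where it is looked up, not where it is defined — can be shown with stand-in modules. Everything below is a self-contained illustration; the real tests target `flagsmith.models.engine.get_evaluation_result`:

```python
import types
from unittest import mock

# Stand-in for the engine module as imported by the models module.
engine = types.ModuleType("engine_stub")
engine.get_evaluation_result = lambda context: {"flags": {}}

models = types.ModuleType("models_stub")
models.engine = engine

def all_flags(context):
    # Bulk materialisation: the lookup goes through models.engine at call
    # time, which is why patching that attribute intercepts the call.
    return models.engine.get_evaluation_result(context)

with mock.patch.object(models.engine, "get_evaluation_result",
                       return_value={"flags": {"beta": True}}) as spy:
    result = all_flags({"features": {}})

assert result == {"flags": {"beta": True}}
assert spy.call_count == 1   # exactly one engine call per materialisation
spy.assert_called_once_with({"features": {}})
```

Patching the defining module instead would leave the already-bound reference inside the consumer untouched, which is exactly why the reworked tests patch `flagsmith.models.engine` rather than the engine package itself.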
Replaces the module-level helper with three fixtures:
* ``lazy_context_factory`` — keyword-only callable producing an
evaluation context, for tests that need a non-default shape.
* ``lazy_context`` — a default context (identity matches the
segment override).
* ``lazy_flags`` — a ``Flags`` built from the default context.
Tests now request whichever fixture suits their case instead of
calling the helper directly. While here, mark each test body with
Given / When / Then comments to match the rest of the file.
emyller
left a comment
Great work. Left a few non-blocking comments.
```diff
         PipelineAnalyticsProcessor
     ] = None
-    self._evaluation_context: typing.Optional[SDKEvaluationContext] = None
+    self.__evaluation_context: typing.Optional[SDKEvaluationContext] = None
```
note: possible, but harmless, misuse of Python name mangling.
```
In lazy mode, the caller has signalled they want every flag, so
we run the bulk evaluator once on the full context and copy the
results into the per-flag cache. Cheaper than asking the engine
for each feature one at a time.
```
note: This comment, and a few similar others, read a bit like deixis. This SDK exports no such thing as "lazy mode" — seemingly, it's the only mode now.
```python
if (
    self._context is not None
    and self._overrides_index is not None
    and feature_name in (self._context.get("features") or {})
):
```

Suggested change:

```diff
 if (
-    self._context is not None
     and self._overrides_index is not None
     and feature_name in (self._context.get("features") or {})
 ):
```
Because we need to release this sooner rather than later, I am dismissing @gagantrivedi's previous request for changes, but will leave threads open for any follow-ups.
Adds lazy identity-flag evaluation to local-eval mode, addressing the regression reported in #198.
`Flagsmith.get_identity_flags` returns a `Flags` that holds the evaluation context plus a precomputed segment-overrides reverse index, and resolves each feature on first access by handing a trimmed context (the queried feature plus only the segments that could override it) to `engine.get_evaluation_result`. The engine remains the only public API the SDK consumes — identity-key enrichment, multivariate hashing, percentage-split rules and override-priority handling all stay in the engine.

The reverse index is rebuilt inside the `_evaluation_context` setter, so it stays in sync with environment refreshes at zero hot-path cost.

**Bench against real customer env (Sevenrooms QA, 438 features, 23 segments)**

Hot loop of `get_identity_flags(...).is_feature_enabled(name)`, 20,000 iterations, single process, single identity. 100–200× across every percentile vs any released version, on the customer's actual environment shape.

**Back-compat**

* `Flags` public surface unchanged (`is_feature_enabled`, `get_feature_value`, `get_flag`, `all_flags`).
* `FlagResult` produced via `engine.get_evaluation_result` — identical output shape to the previous bulk-eval path.
* `Flagsmith.__init__` signature unchanged.

**Engine contract**

Untouched. The SDK only imports `flag_engine.engine.get_evaluation_result` and the `SegmentContext` type alias.

**Tests**

* `test_models.py`: override match/no-match, per-flag caching, `all_flags` materialisation, fallthrough to default handler, reverse-index correctness.
* `test_flagsmith.py`: `get_identity_flags` defers eval; touching one flag triggers exactly one engine call against a trimmed context (queried feature only).

**Follow-ups (not in this PR)**

A separate optimisation skips the env-doc re-parse on no-change refreshes via the `x-flagsmith-document-updated-at` response header — it eliminates the ~5 ms p99 GIL stall the polling thread otherwise imposes every refresh interval. That'll come as its own PR to keep this one focused.
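A sketch of what that follow-up might look like. The header name comes from the text above; the class, method names, and header values are purely illustrative:

```python
class EnvironmentPoller:
    def __init__(self):
        self._last_updated_at = None   # last seen header value (str or None)
        self.parse_count = 0           # stand-in for json.loads + model rebuild

    def on_response(self, headers):
        updated_at = headers.get("x-flagsmith-document-updated-at")
        if updated_at is not None and updated_at == self._last_updated_at:
            return  # document unchanged: skip the costly re-parse
        self._last_updated_at = updated_at
        self.parse_count += 1

poller = EnvironmentPoller()
poller.on_response({"x-flagsmith-document-updated-at": "2024-01-01T00:00:00Z"})
poller.on_response({"x-flagsmith-document-updated-at": "2024-01-01T00:00:00Z"})  # skipped
poller.on_response({"x-flagsmith-document-updated-at": "2024-01-02T00:00:00Z"})
assert poller.parse_count == 2
```

Note the `is not None` guard: a response without the header falls back to re-parsing, so the optimisation degrades safely against older API versions.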