feat: Lazy identity-flag evaluation in local-eval mode #200
Conversation
gagantrivedi
left a comment
I don't know if I'm overthinking it, but this pull request certainly makes me feel conflicted. On one hand, it does make optimisations that make sense; on the other, it blurs the boundary between engine and client that we worked very hard to establish.
```python
        return flag

    def _resolve_flag(self, feature_name: str) -> Flag:
```
Lazy path skips identity-key enrichment (multivariate + percentage_split break)
`flagsmith/models.py:265–311` — `Flags._resolve_flag`
- L274–275: grabs `context = self._context` and `overrides_index = self._overrides_index` straight from the lazy `Flags` state.
- L280: `feature_context = context["features"][feature_name]` — reads the feature off the raw per-identity context that was stored in `from_evaluation_context`.
- L287–295: walks the segment overrides index and calls `is_context_in_segment(context, segment_context)` (L288). This is where `PERCENTAGE_SPLIT` rules silently fail: `context_matches_condition` in the engine falls back to `_get_identity_key(context)` (engine `evaluator.py:301`), and that key is never set on this context.
- L297–308: passes the same un-enriched context to `get_flag_result_from_context`. Inside the engine (`evaluator.py:179`, `key = _get_identity_key(context)`), `key` is `None`, so the `if key is not None and (variants := ...)` branch (engine `evaluator.py:183`) is skipped — multivariate always returns the control value.
`flagsmith/flagsmith.py:431–456` — `_get_identity_flags_from_document`
- L439–443: builds the per-identity context via `map_context_and_identity_data_to_context(...)`.
- L444: branches on `self.lazy_identity_evaluation`.
- L448–456 (lazy branch): hands `context` straight to `Flags.from_evaluation_context` — no `get_enriched_context` call.
- L457–459 (eager branch): goes through `engine.get_evaluation_result(context=context)`, which does call `get_enriched_context` at engine `evaluator.py:60`.
`flagsmith/mappers.py:88–99` — `map_context_and_identity_data_to_context`
- L93–98: returns `{**context, "identity": {"identifier": identifier, "traits": ...}}`. Note: no `key`. The eager path's enrichment fills this in; the lazy path doesn't.
Maybe we should bring back the testing harness that used to cover all SDKs (it was scoped down to just the engine during the last context value work) to catch issues like this.
Thanks — all of this is fixed by using `get_evaluation_result` against a trimmed context in 5a7d7dc.
> Maybe we should bring back the testing harness that used to cover all SDKs (it was scoped down to just the engine during the last context value work) to catch issues like this.
I'll have a think on this... there's a bit of a chicken & egg problem when it comes to remote evaluation.
```python
for override in segment_context.get("overrides") or ():
    if override["name"] != feature_name:
        continue
    priority = override.get("priority", float("inf"))
```
Should we consider importing `DEFAULT_PRIORITY` from the engine here instead?
We now are — this is once again the engine's concern as per 5a7d7dc.
``get_identity_flags`` now returns a ``Flags`` that holds the
evaluation context plus a precomputed segment-overrides reverse index,
and resolves each feature on first access via the engine primitives
(``is_context_in_segment`` + ``get_flag_result_from_context``) rather
than running a full bulk evaluation up-front.
In environments shaped like the Slack-report customer (420 features,
30 CSV-IN segments, hot loop reading one boolean flag) this takes
``get_identity_flags().is_feature_enabled(name)`` from ~430 µs to
~1.85 µs per call; 200-segment envs go from ~1200 µs to ~2 µs. The
``.all_flags()`` materialisation path is never slower than the
eager baseline in the bench matrix.
Back-compat:
* ``Flags`` public API unchanged (``is_feature_enabled``,
``get_feature_value``, ``get_flag``, ``all_flags``).
* ``FlagResult`` construction reuses the same engine helper as the
bulk path — identical output shape.
* New ``lazy_identity_evaluation`` constructor kwarg, default
``True``, lets operators flip back to the eager path if they hit
an unexpected regression.
Engine contract is untouched: the SDK consumes only already-public
``flag_engine.segments.evaluator`` symbols.
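The mechanism described in this commit can be sketched roughly as follows. Class and field names are illustrative stand-ins, and segment matching is reduced to a trivial membership check rather than the engine's real evaluator — a sketch of the pattern, not the SDK's implementation:

```python
def build_overrides_index(segments):
    # Map each feature name to the segments carrying an override for it,
    # so per-flag resolution only walks segments that can matter.
    index = {}
    for segment in segments:
        for override in segment.get("overrides") or ():
            index.setdefault(override["name"], []).append(segment)
    return index

class LazyFlags:
    def __init__(self, context, segments):
        self._context = context
        self._overrides_index = build_overrides_index(segments)  # precomputed once
        self._cache = {}                                         # per-flag results

    def is_feature_enabled(self, name):
        if name not in self._cache:
            # Stand-in for the engine call: an override wins if its segment
            # matches the identity, else the environment default applies.
            enabled = self._context["features"][name]["enabled"]
            for segment in self._overrides_index.get(name, ()):
                if self._context["identity"]["identifier"] in segment["members"]:
                    enabled = next(
                        o["enabled"] for o in segment["overrides"] if o["name"] == name
                    )
            self._cache[name] = enabled
        return self._cache[name]

context = {"identity": {"identifier": "alice"},
           "features": {"beta": {"enabled": False}}}
segments = [{"members": {"alice"},
             "overrides": [{"name": "beta", "enabled": True}]}]

flags = LazyFlags(context, segments)
assert flags.is_feature_enabled("beta") is True   # resolved on first access, then cached
```

The payoff is exactly the one the numbers above show: the up-front cost is building the index, and each hot-loop read touches only the handful of segments that could override the queried feature.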
Picks up the IN segment-condition evaluation speedup (Flagsmith/flagsmith-engine#295), which cuts per-IN-condition latency on segment walks by roughly 30%. Complementary to the lazy identity evaluation added in this PR — most customer envs will benefit from both.
…on_result
Per review: keep the engine/client boundary intact and let the engine
handle all evaluation correctness — instead of reaching into
``is_context_in_segment`` / ``get_flag_result_from_context`` directly,
``Flags._resolve_flag`` now builds a *trimmed* context (the queried
feature plus only the segments that could override it, looked up via
the precomputed reverse index) and hands it to the engine's public
``get_evaluation_result``.
Side effects:
* Identity-key enrichment now runs on the lazy path (the engine's
``get_enriched_context`` is invoked internally), so multivariate
splits and ``PERCENTAGE_SPLIT`` rules behave correctly. Previously
the lazy path silently degraded these.
* Override-priority handling moves back into the engine — the
``float("inf")`` literal is gone from the SDK.
* ``Flags.all_flags`` switches to a single bulk
``get_evaluation_result`` rather than calling ``_resolve_flag`` per
feature; cheaper, and matches the eager path's call shape.
Trim cost is ~0.8 µs per call, so the lazy path is now ~2.6 µs mean /
~3.4 µs p99 against the customer's prod env (438 features, 23
segments) — still 150–220× faster than the eager path on every
percentile, and 100% routed through the engine's documented public API.
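A rough sketch of the trimming step, under an assumed context shape (in the real change, `_resolve_flag` hands the trimmed context to the engine's public `get_evaluation_result`; the function below only shows the trim itself):

```python
def trim_context(context, overrides_index, feature_name):
    # Keep only the queried feature and the segments that carry an override
    # for it; the engine evaluates this small context exactly as it would
    # the full one, enrichment included.
    return {
        **context,
        "features": {feature_name: context["features"][feature_name]},
        "segments": {
            key: segment
            for key, segment in (context.get("segments") or {}).items()
            if key in overrides_index.get(feature_name, set())
        },
    }

full = {
    "identity": {"identifier": "alice"},
    "features": {"beta": {"enabled": False}, "gamma": {"enabled": True}},
    "segments": {
        "power_users": {"overrides": [{"name": "beta"}]},
        "eu": {"overrides": [{"name": "gamma"}]},
    },
}
index = {"beta": {"power_users"}, "gamma": {"eu"}}

trimmed = trim_context(full, index, "beta")
assert set(trimmed["features"]) == {"beta"}       # only the queried feature survives
assert set(trimmed["segments"]) == {"power_users"}  # only segments that can override it
```

Because the trim is a shallow dict rebuild, its cost stays roughly constant as the environment grows — consistent with the ~0.8 µs figure quoted above.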
05f6304 to 5a7d7dc
Lazy is now the only path through ``_get_identity_flags_from_document``. Per-flag resolution still goes via the engine's public ``get_evaluation_result`` (against a trimmed context), so callers get the perf win without the eager fallback or the kwarg surface. Tests pinned to the old eager path are reworked to mock ``flagsmith.models.engine.get_evaluation_result`` instead — that's where the call lands now — and trigger evaluation via ``.all_flags()`` when they need the bulk-context call shape.
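The patching pattern behind that test rework — patch the name where it is looked up, not where it is defined — can be shown with stand-in modules. Everything below is a self-contained illustration; the real tests target `flagsmith.models.engine.get_evaluation_result`:

```python
import types
from unittest import mock

# Stand-in for the engine module as imported by the models module.
engine = types.ModuleType("engine_stub")
engine.get_evaluation_result = lambda context: {"flags": {}}

models = types.ModuleType("models_stub")
models.engine = engine

def all_flags(context):
    # Bulk materialisation: the lookup goes through models.engine at call
    # time, which is why patching that attribute intercepts the call.
    return models.engine.get_evaluation_result(context)

with mock.patch.object(models.engine, "get_evaluation_result",
                       return_value={"flags": {"beta": True}}) as spy:
    result = all_flags({"features": {}})

assert result == {"flags": {"beta": True}}
assert spy.call_count == 1   # exactly one engine call per materialisation
spy.assert_called_once_with({"features": {}})
```

Patching the defining module instead would leave the already-bound reference inside the consumer untouched, which is exactly why the reworked tests patch `flagsmith.models.engine` rather than the engine package itself.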
Replaces the module-level helper with three fixtures:
* ``lazy_context_factory`` — keyword-only callable producing an
evaluation context, for tests that need a non-default shape.
* ``lazy_context`` — a default context (identity matches the
segment override).
* ``lazy_flags`` — a ``Flags`` built from the default context.
Tests now request whichever fixture suits their case instead of
calling the helper directly. While here, mark each test body with
Given / When / Then comments to match the rest of the file.
emyller
left a comment
Great work. Left a few non-blocking comments.
```diff
         PipelineAnalyticsProcessor
     ] = None
-    self._evaluation_context: typing.Optional[SDKEvaluationContext] = None
+    self.__evaluation_context: typing.Optional[SDKEvaluationContext] = None
```
note: possible, but harmless, misuse of Python name mangling.
```
In lazy mode, the caller has signalled they want every flag, so
we run the bulk evaluator once on the full context and copy the
results into the per-flag cache. Cheaper than asking the engine
for each feature one at a time.
```
note: This comment, and a few similar others, read a bit like deixis. This SDK exports no such thing as "lazy mode" — seemingly, it's the only mode now.
```python
if (
    self._context is not None
    and self._overrides_index is not None
    and feature_name in (self._context.get("features") or {})
):
```

Suggested change:

```diff
 if (
-    self._context is not None
     and self._overrides_index is not None
     and feature_name in (self._context.get("features") or {})
 ):
```
Because we need to release this sooner rather than later, I am dismissing @gagantrivedi's previous request for changes, but will leave threads open for any follow-ups.
Adds lazy identity-flag evaluation to local-eval mode, addressing the regression reported in #198.
`Flagsmith.get_identity_flags` returns a `Flags` that holds the evaluation context plus a precomputed segment-overrides reverse index, and resolves each feature on first access by handing a trimmed context (the queried feature plus only the segments that could override it) to `engine.get_evaluation_result`. The engine remains the only public API the SDK consumes — identity-key enrichment, multivariate hashing, percentage-split rules and override-priority handling all stay in the engine.

The reverse index is rebuilt inside the `_evaluation_context` setter, so it stays in sync with environment refreshes at zero hot-path cost.

**Bench against real customer env (Sevenrooms QA, 438 features, 23 segments)**

Hot loop of `get_identity_flags(...).is_feature_enabled(name)`, 20,000 iterations, single process, single identity. 100–200× across every percentile vs any released version, on the customer's actual environment shape.

**Back-compat**

* `Flags` public surface unchanged (`is_feature_enabled`, `get_feature_value`, `get_flag`, `all_flags`).
* `FlagResult` produced via `engine.get_evaluation_result` — identical output shape to the previous bulk-eval path.
* `Flagsmith.__init__` signature unchanged.

**Engine contract**

Untouched. The SDK only imports `flag_engine.engine.get_evaluation_result` and the `SegmentContext` type alias.

**Tests**

* `test_models.py`: override match/no-match, per-flag caching, `all_flags` materialisation, fallthrough to default handler, reverse-index correctness.
* `test_flagsmith.py`: `get_identity_flags` defers eval; touching one flag triggers exactly one engine call against a trimmed context (queried feature only).

**Follow-ups (not in this PR)**

A separate optimisation skips the env-doc re-parse on no-change refreshes via the `x-flagsmith-document-updated-at` response header — it eliminates the ~5 ms p99 GIL stall the polling thread otherwise imposes every refresh interval. That'll come as its own PR to keep this one focused.
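A sketch of what that follow-up might look like. The header name comes from the text above; the class, method names, and header values are purely illustrative:

```python
class EnvironmentPoller:
    def __init__(self):
        self._last_updated_at = None   # last seen header value (str or None)
        self.parse_count = 0           # stand-in for json.loads + model rebuild

    def on_response(self, headers):
        updated_at = headers.get("x-flagsmith-document-updated-at")
        if updated_at is not None and updated_at == self._last_updated_at:
            return  # document unchanged: skip the costly re-parse
        self._last_updated_at = updated_at
        self.parse_count += 1

poller = EnvironmentPoller()
poller.on_response({"x-flagsmith-document-updated-at": "2024-01-01T00:00:00Z"})
poller.on_response({"x-flagsmith-document-updated-at": "2024-01-01T00:00:00Z"})  # skipped
poller.on_response({"x-flagsmith-document-updated-at": "2024-01-02T00:00:00Z"})
assert poller.parse_count == 2
```

Note the `is not None` guard: a response without the header falls back to re-parsing, so the optimisation degrades safely against older API versions.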