feat: add evaluate_flags() API for single-call flag evaluation#539
feat: add evaluate_flags() API for single-call flag evaluation#539
Conversation
Introduce posthog.evaluate_flags(distinct_id, ...) returning a FeatureFlagEvaluations snapshot. Branch on .is_enabled() / .get_flag() and pass the snapshot to capture() via a new flags option so events carry the exact values the code branched on, with no extra /flags request per capture. Filtering helpers .only_accessed() and .only([keys]) narrow the flag set attached to events. Pass flag_keys=[...] to evaluate_flags() to scope the underlying /flags request. Local evaluation is transparent. Deprecation (Phase 2 of the RFC): - feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) now emit DeprecationWarning. - The deprecated surface remains functional and will be removed in the next major version. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
`mock` is not in the project's test dependencies on CI. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
posthog-python Compliance ReportDate: 2026-04-28 18:50:54 UTC ✅ All Tests Passed!30/30 tests passed Capture Tests✅ 29/29 tests passed View Details
Feature_Flags Tests✅ 1/1 tests passed View Details
|
Phase 2 deprecation warnings on `feature_enabled`, `get_feature_flag`, `get_feature_flag_payload`, and `capture(send_feature_flags=...)` are moved to a separate PR so this minor ships only the new API. Gives users one minor to migrate before runtime warnings start. The deprecated methods are restored to their original implementations (no longer need to bypass each other to avoid cascading warnings). Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
Prompt To Fix All With AIThis is a comment left during a code review.
Path: posthog/test/test_evaluate_flags.py
Line: 42-44
Comment:
**Missing `tearDown` leaks Client background threads**
`TestEvaluateFlagsRemote` creates a `Client` in `setUp` but never calls `self.client.shutdown()`. The other two test classes in this file (`TestEvaluateFlagsFiltering`, `TestCaptureWithFlagsSnapshot`) both have correct `tearDown` implementations. Without shutdown, the client's background consumer thread is never joined, which can cause test-suite noise, delayed process exit, or flaky behaviour when mocks from one test bleed into the next.
```suggestion
class TestEvaluateFlagsRemote(unittest.TestCase):
def setUp(self):
self.client = Client(FAKE_TEST_API_KEY)
def tearDown(self):
self.client.shutdown()
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: posthog/feature_flag_evaluations.py
Line: 69-80
Comment:
**`flag_definitions_loaded_at` is accepted but never supplied**
`_record_access` (line 242) emits `$feature_flag_definitions_loaded_at` only when `self._flag_definitions_loaded_at is not None`, but `Client.evaluate_flags()` never passes this argument to the `FeatureFlagEvaluations` constructor — so the property is always `None` and the event property is never emitted for locally-evaluated flags. Either wire up the value from the poller or remove the dead parameter and the guarded block in `_record_access` until it's ready.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: posthog/test/test_evaluate_flags.py
Line: 42-321
Comment:
**Prefer parameterised tests**
The test suite exercises multiple flag types (boolean, variant, disabled, missing) through per-assertion branches inside single test methods — e.g. `test_is_enabled_returns_correct_values_and_fires_events` and `test_get_flag_returns_variant_or_bool_with_full_metadata`. The project's review standard is to prefer parameterised tests. Using `subTest` (or a library like `ddt`) would give each flag-type scenario its own named, independently-failing case and reduce the assertion density per method.
**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "revert: drop deprecation warnings (Phase..." | Re-trigger Greptile |
- Add `tearDown` to `TestEvaluateFlagsRemote` so the Client's background consumer thread is joined between tests, matching the pattern in the other test classes in this file. - Remove the dead `flag_definitions_loaded_at` constructor parameter from `FeatureFlagEvaluations`. The Python poller doesn't currently expose a definitions-loaded timestamp, so the parameter was always None and the gated branch in `_record_access` never fired. Trim it rather than leaving a confusing no-op; can be re-added with a real data source later. - Convert the two flag-type-variation tests to parameterized form. `test_is_enabled` and `test_get_flag_known_keys` now run as independently-named cases per flag type, and the missing-key behavior is split into its own focused test. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
dustinbyrne
left a comment
There was a problem hiding this comment.
Looks good! I have a few questions/considerations
| if not self._accessed: | ||
| self._host.log_warning( | ||
| "FeatureFlagEvaluations.only_accessed() was called before any flags were accessed — " | ||
| "attaching all evaluated flags as a fallback. See " | ||
| "https://posthog.com/docs/feature-flags/server-sdks for details." | ||
| ) | ||
| return self._clone_with(self._flags) |
There was a problem hiding this comment.
If this is a legitimate case of nothing being accessed, there's no way to avoid getting all flags. This could be more confusing than getting nothing?
E.g., I'm imaging a case where flags are first retrieved and consumed as needed. Should an early call to capture(flags=flags.only_accessed()) include all flags? I think it's contradictory with the name only_accessed
There was a problem hiding this comment.
Good catch — fixed in 95eb1e9. only_accessed() now honors its name and returns an empty snapshot when nothing has been accessed (no fallback, no warning). The previous behavior was a misguided safety net that ended up being more surprising than helpful for exactly the scenario you describe (early capture() before any branching). The Node-side equivalent has the same change queued.
Updated test: test_only_accessed_returns_empty_when_no_flags_accessed.
| ) | ||
| return to_flags_and_payloads(resp) | ||
|
|
||
| def get_flags_decision( |
There was a problem hiding this comment.
Should we deprecate the other methods?
There was a problem hiding this comment.
Reversed course on this — shipped Phase 2 in eda573d alongside Phase 1 in this PR (rather than splitting it into a follow-up).
feature_enabled, get_feature_flag, get_feature_flag_payload, and capture(send_feature_flags=...) now emit DeprecationWarnings pointing at evaluate_flags(). The methods continue to return the same values; users who pin warnings to errors will get a heads-up on first use, and the rest see them surface via pytest / python -W / IDE inspectors.
feature_enabled and get_feature_flag were restructured to call _get_feature_flag_result directly so a single user-level call emits exactly one warning instead of cascading three.
Phase 3 (removal in next major) is separate.
| local_result, fallback_to_server = self._get_all_flags_and_payloads_locally( | ||
| distinct_id, | ||
| groups=dict(groups), | ||
| person_properties=person_properties, | ||
| group_properties=group_properties, | ||
| flag_keys_to_evaluate=flag_keys, | ||
| ) |
There was a problem hiding this comment.
We should pass device_id here, as it may be present in the context (via tracing headers).
I don't think it's necessary to add it as a parameter of evaluate_flags as well, but I'll leave it up to you.
There was a problem hiding this comment.
Good catch — wired up in 95eb1e9. evaluate_flags() now resolves device_id from context (get_context_device_id()) at the top of the method, then forwards it to the get_flags_decision(...) call so it lands in the /flags request body. I went with your suggestion and didn't add it as a method parameter — context-only is cleaner and matches the existing distinct_id resolution pattern.
| if self._evaluated_at and not (flag and flag.locally_evaluated): | ||
| properties["$feature_flag_evaluated_at"] = self._evaluated_at | ||
| if flag is None: | ||
| properties["$feature_flag_error"] = "flag_missing" |
There was a problem hiding this comment.
Just noting that we're losing granularity of $feature_flag_error values here
There was a problem hiding this comment.
You're right — fixed in 95eb1e9. The snapshot now tracks response-level errors (errors_while_computing_flags, quota_limited) at construction and _record_access builds a comma-joined $feature_flag_error matching the single-flag path's granularity. So a missing flag during a quota-limited response now reports quota_limited,flag_missing instead of just flag_missing.
New test: test_errors_while_computing_flags_propagates_to_event covers both the standalone (errors_while_computing_flags) and combined (errors_while_computing_flags,flag_missing) cases.
Not covered yet: TIMEOUT/CONNECTION_ERROR/api_error_NNN. Those manifest as exceptions on the request and currently just log + leave the snapshot empty (so flags would surface as flag_missing). Could fold those into the snapshot in a follow-up if useful, but the most common cases (errors_while_computing and quota_limited) are now covered.
|
how does auto captured events and errors set |
i'd say we should deprecate and add a warning now otherwise theres multiple ways of doing the same thing and users and agents will be very confused |
Resolves a typing import conflict in client.py from main and bundles the review-feedback changes: - Fix changeset format per RELEASING.md (`pypi/posthog: minor`). - `only_accessed()` now returns empty when nothing was accessed, instead of falling back to all flags. The fallback contradicted the method name and surprised reviewers. - Propagate response-level errors (`errors_while_computing_flags`, `quota_limited`) into `\$feature_flag_called` events so each access carries the granular error code(s) the single-flag path emits. - Resolve `device_id` from context in `evaluate_flags()` and pass it through to the `/flags` request. Important for experience-continuity flag matching when the device id flows in via tracing headers. - Make the precedence between `flags` and `send_feature_flags` explicit: `flags` always wins, and we log a warning when both are passed. - Clarify the `flag_keys` doc on the module-level `evaluate_flags`. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
Per reviewer feedback, ship Phase 2 in this PR alongside Phase 1 instead of splitting it into a follow-up. The deprecated methods continue to work — they just emit a `DeprecationWarning` pointing at `evaluate_flags()`: - `feature_enabled()` - `get_feature_flag()` - `get_feature_flag_payload()` - `capture(send_feature_flags=...)` (only when truthy) `feature_enabled` and `get_feature_flag` are restructured to call `_get_feature_flag_result` directly instead of routing through each other, so a single user-level call emits exactly one warning instead of cascading. Tests cover each warning's emission and the no-cascade behavior. Existing tests that use the legacy methods will now generate DeprecationWarnings but otherwise pass unchanged. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
Per PR review feedback (manoel): manual exception captures should be able to attach a `FeatureFlagEvaluations` snapshot the same way `capture()` can, so `\$exception` events carry the same flag context as the rest of the request's events. `capture_exception` already accepted `**kwargs` and forwarded select ones to `capture()` — `flags` was just missing from the forwarded set. This doesn't yet solve the wider question of how *auto*-captured exceptions (sys.excepthook, context-block exception handler) attach flags — that requires a separate mechanism (likely context-stashed flags) and is a follow-up. Generated-By: PostHog Code Task-Id: b8a45b11-b41c-4995-8622-acea525e7703
Good question — the answer has two layers. Manual error captures: flags = posthog.evaluate_flags(distinct_id)
try:
risky_thing()
except Exception as e:
posthog.capture_exception(e, distinct_id=distinct_id, flags=flags)The same applies to anywhere the developer is the one calling Auto-captured exceptions (the case you're really asking about —
Happy to do (1) as a separate PR — felt out of scope for this one since it touches the context module and needs its own design pass. Worth tracking as a follow-up issue? |
You're right — reversed course on this and shipped Phase 2 in eda573d alongside Phase 1 in this PR. The PR description has been updated.
|
|
|
||
| The returned `FeatureFlagEvaluations` snapshot exposes `is_enabled()`, `get_flag()`, `get_flag_payload()` for branching and `only_accessed()` / `only([keys])` filter helpers. Pass `flag_keys=[...]` to `evaluate_flags()` to scope the underlying `/flags` request itself. | ||
|
|
||
| Deprecates `feature_enabled()`, `get_feature_flag()`, `get_feature_flag_payload()`, and `capture(send_feature_flags=...)`. They continue to work but now emit a `DeprecationWarning` pointing at `evaluate_flags()`. Removal is planned for the next major version. |
There was a problem hiding this comment.
Let's deprecate get_feature_flag_result as well. It lived a short life, but better to clean it up now.
|
@dustinbyrne you'll need to coordinate this with the @PostHog/team-docs-wizard as well otherwise the wizard will be lost or instrumenting new apps with deprecated APIs |
|
I think 'Stash flags on the context' makes sense |
Problem
Phase 1 + Phase 2 of the Server SDK Feature Flag Evaluations RFC for
posthog-python. Companion to the Node SDK PR (PostHog/posthog-js#3476).Today every flag check fires its own
/flagsrequest, andcapture(send_feature_flags=True)silently fires yet another on every captured event. The flag values on a captured event can diverge from the ones the code actually branched on when person/group properties differ between calls.send_feature_flagsalso attaches every evaluated flag to every event, which bloats properties on high-volume events.Changes
New API (Phase 1)
posthog.evaluate_flags(distinct_id, ...)returns aFeatureFlagEvaluationssnapshot:A single
/flagsrequest powers both branching and event enrichment.is_enabled()andget_flag()fire$feature_flag_calledevents (deduped through the existing cache) with the full metadata —$feature_flag_id,$feature_flag_version,$feature_flag_reason,$feature_flag_request_id— so experiment exposure tracking keeps working.Two layers of scoping
Network-level (
flag_keysoption): scopes the underlying/flagsrequest itself.Event-level (filter helpers): narrow which flags get attached to a captured event without re-fetching.
Deprecation warnings (Phase 2)
The legacy single-flag surface keeps working but now emits
DeprecationWarnings pointing atevaluate_flags():feature_enabled()get_feature_flag()get_feature_flag_payload()capture(send_feature_flags=...)(only when truthy)feature_enabledandget_feature_flagare restructured to call_get_feature_flag_resultdirectly instead of routing through each other, so a single user-level call emits exactly one warning instead of cascading.Phase 3 (removal in next major) ships separately.
Local evaluation
Transparent. When the poller resolves a flag, the snapshot carries
locally_evaluated=Trueand reason"Evaluated locally", matching whatget_feature_flag()emits today.Backwards compatibility
No breaking changes. All existing call paths return the same values they did before — the only behavior change is the new
DeprecationWarningemissions, which can be silenced via Python's standard warnings filter.Internals
_capture_feature_flag_calledwas refactored: the dedup + capture portion is extracted into_capture_feature_flag_called_if_needed, which is shared between the single-flag path and the newFeatureFlagEvaluationsobject. Both paths now dedupe identically.Response-level errors (
errors_while_computing_flags,quota_limited) are propagated into$feature_flag_calledevents from the snapshot, matching the granularity of the single-flag path.Tests
posthog/test/test_evaluate_flags.py— 27 tests covering remote evaluation, local evaluation, filtering helpers, capture integration,flag_keysround-trip, empty-distinct_id safety, error-granularity propagation, and deprecation warning emission (with no-cascade verification).Full suite: 489 passed.
ruff formatandruff checkclean.Created with PostHog Code