feat(flags): preload referenced cohorts in flags hypercache #52023
Pull request overview
This PR reduces /flags request latency and Postgres load in the Rust feature-flags service by preloading only the cohort definitions actually referenced by a team’s active feature flags (including transitive cohort-on-cohort dependencies) into the flags HyperCache payload written by Django.
Changes:
- Python: extracts referenced cohort IDs from active flag `filters.groups`, BFS-loads cohorts + transitive dependencies (depth-limited), serializes them into the flags hypercache payload, and adds cohort-change signals to invalidate the flags cache.
- Rust: extends the hypercache wrapper to optionally deserialize `cohorts` (backwards compatible), threads cohorts through fetch/filter flows, and consumes preloaded cohorts in `FeatureFlagMatcher` (falling back to `CohortCacheManager` when absent).
- Tests: adds/updates Python unit tests for extraction/loading/serialization + signal filtering, and Rust tests for cohort deserialization and round-trip helpers.
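As a rough illustration of the extraction step, the sketch below walks each active flag's `filters.groups` and collects cohort references. The property shape (`type: "cohort"` with the ID in `value`), the `active` key, and the function name `extract_cohort_ids` are assumptions made for this example, not taken from the PR.

```python
from typing import Any


def extract_cohort_ids(flags: list[dict[str, Any]]) -> set[int]:
    """Collect cohort IDs referenced in filters["groups"] of active flags.

    Assumed shape (not confirmed by the PR): each group has a "properties"
    list, and a cohort reference looks like {"type": "cohort", "value": <id>}.
    """
    cohort_ids: set[int] = set()
    for flag in flags:
        if not flag.get("active", True):
            continue  # inactive flags do not contribute cohorts
        groups = (flag.get("filters") or {}).get("groups") or []
        for group in groups:
            for prop in group.get("properties") or []:
                if prop.get("type") != "cohort":
                    continue
                try:
                    cohort_ids.add(int(prop["value"]))
                except (KeyError, TypeError, ValueError):
                    # Malformed reference: skip it rather than fail the cache write.
                    continue
    return cohort_ids


example_flags = [
    {"active": True, "filters": {"groups": [
        {"properties": [{"type": "cohort", "value": 7}, {"type": "person", "value": "x"}]},
        {"properties": [{"type": "cohort", "value": "1"}]},
    ]}},
    {"active": False, "filters": {"groups": [
        {"properties": [{"type": "cohort", "value": 99}]},
    ]}},
    {"active": True, "filters": {}},
]
referenced = extract_cohort_ids(example_flags)
```

Returning a set keeps the result de-duplicated, which matters when many flags reference the same cohort.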
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| rust/feature-flags/src/utils/test_graph_utils.rs | Updates test builders to include the new cohorts field in the wrapper/list structures. |
| rust/feature-flags/src/handler/tests.rs | Updates handler tests to include cohorts: None in constructed wrapper/list values. |
| rust/feature-flags/src/handler/flags.rs | Plumbs cohorts through fetch_and_filter so downstream evaluation can use it. |
| rust/feature-flags/src/handler/billing.rs | Updates billing tests for the expanded FeatureFlagList shape. |
| rust/feature-flags/src/flags/test_helpers.rs | Ensures redis/hypercache test helpers round-trip cohorts alongside flags + metadata. |
| rust/feature-flags/src/flags/test_flag_matching.rs | Updates flag matching tests to include cohorts: None. |
| rust/feature-flags/src/flags/flag_service.rs | Updates cache parsing to return (flags, evaluation_metadata, cohorts) and stores cohorts in FlagResult. |
| rust/feature-flags/src/flags/flag_models.rs | Adds optional cohorts to hypercache wrapper and adds cohorts to runtime-only FeatureFlagList. |
| rust/feature-flags/src/flags/flag_matching.rs | Consumes preloaded cohorts (via Option::take) in prepare_flag_evaluation_state, with PG fallback. |
| rust/feature-flags/src/flags/feature_flag_list.rs | Adds cohort deserialization support and tests for presence/absence of cohorts. |
| posthog/models/feature_flag/test/test_flags_cache.py | Adds tests for cohort extraction, BFS loading (deps/cycles/depth), serialization, and cache invalidation signal behavior. |
| posthog/models/feature_flag/flags_cache.py | Implements cohort extraction/BFS loading/serialization into the hypercache payload + cohort-change invalidation signals. |
Adds cohort definitions to the feature flags hypercache so the Rust feature-flags service can skip a separate CohortCacheManager PostgreSQL query per request. Python computes referenced cohorts (including transitive dependencies via BFS) at cache-write time and includes them in the Redis payload. Rust deserializes the preloaded cohorts and uses them with a graceful fallback to the existing PG path when absent.
Key changes:
- Extract cohort IDs from flag filters (groups + super_groups)
- BFS load cohorts with transitive deps (depth limit 20, cycle safe)
- Serialize cohorts into hypercache payload alongside flags
- Batch path collects all cohort IDs across teams, loads once with team_id__in
- Rust: optional cohorts field on HypercacheFlagsWrapper (#[serde(default)])
- Rust: FeatureFlagMatcher consumes preloaded cohorts via take(), falls back to CohortCacheManager
- Signal handler invalidates flags cache on cohort definition changes (skips recalculation-only saves)
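The BFS load with a depth limit and cycle safety could look roughly like the sketch below. It is a minimal in-memory version: `deps` (a mapping from cohort ID to the cohort IDs it references) stands in for the database, and the names `load_with_dependencies` and `MAX_DEPTH` are hypothetical. How the actual implementation counts depth is not stated here, so the level-by-level interpretation is an assumption.

```python
MAX_DEPTH = 20  # depth limit quoted in the commit message


def load_with_dependencies(
    root_ids: set[int],
    deps: dict[int, list[int]],
    max_depth: int = MAX_DEPTH,
) -> set[int]:
    """Level-by-level BFS over cohort dependencies.

    `deps` is a stand-in for the database: IDs missing from it are treated
    as cohorts that do not exist and are skipped.
    """
    loaded: set[int] = set()
    frontier = {cid for cid in root_ids if cid in deps}
    for _ in range(max_depth):
        if not frontier:
            break
        loaded |= frontier  # one "batch load" per BFS level
        next_frontier: set[int] = set()
        for cid in frontier:
            next_frontier.update(deps[cid])
        # Dropping already-loaded IDs is what makes cycles terminate.
        frontier = {c for c in next_frontier if c in deps and c not in loaded}
    return loaded


# A cycle (1 -> 2 -> 3 -> 1) and a reference to a missing cohort (5 -> 99).
cyclic = {1: [2], 2: [3], 3: [1], 5: [99]}
# A 31-cohort chain 0 -> 1 -> ... -> 30, deeper than the limit.
chain = {i: [i + 1] for i in range(30)}
chain[30] = []
```

Processing one level at a time maps naturally onto one batched query per level, which is why this sketch uses frontier sets rather than a per-node queue.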
Fix holdout key name in docstring and tighten BFS depth-limit test assertion to match actual implementation bound of 20 cohorts.
f978afb to
667cd8f
Updates internal documentation for PR #52023:
- Add `cohorts` field to cache payload structure
- Add Cohort preloading subsection (BFS, batch, rolling deploy safety)
- Update Cohort cache invalidation entry (recalculation-only saves skipped)
- Add Cohort signal handlers to signal handlers table
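The rule that recalculation-only saves are skipped can be sketched as a subset check on the `update_fields` passed to `save()`. The PR description names `_COHORT_RECALCULATION_FIELDS`; its contents below (`is_calculating`, `count`) and the helper name `should_invalidate_flags_cache` are assumptions for illustration.

```python
from collections.abc import Iterable
from typing import Optional

# Assumed contents; only the constant's name appears in the PR description.
_COHORT_RECALCULATION_FIELDS = frozenset({"is_calculating", "count"})


def should_invalidate_flags_cache(update_fields: Optional[Iterable[str]]) -> bool:
    """Decide whether a cohort save should invalidate the flags cache.

    `update_fields` mirrors the value Django passes to post_save handlers:
    None means a full save, so any field may have changed.
    """
    if update_fields is None:
        return True
    # Skip only when every saved field is a recalculation field.
    return not set(update_fields) <= _COHORT_RECALCULATION_FIELDS
```

A subset check (rather than an intersection check) means a save that touches both a recalculation field and a definition field still invalidates the cache.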
…2023)
- Add cohorts field to HyperCache JSON example
- Update backwards compatibility section to cover cohorts
- Add Cohort data source subsection explaining preloaded vs fallback paths
- Rename Cohort caching to Cohort caching (fallback) with context
- Update data fetching strategy to reflect preloading behavior
- Document flags_cohort_source_total metric
Problem
The Rust feature-flags service queries PostgreSQL via `CohortCacheManager` for every team's cohorts on each `/flags` request, even though the cohort definitions rarely change. This adds unnecessary latency and PG load, especially for teams with many cohort-based flags.
Changes
Preloads cohort definitions (including transitive dependencies) into the flags hypercache at cache-write time so the Rust service can skip the separate PG query.
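To make the payload contract concrete, here is a small Python sketch of a writer that adds a `cohorts` key and a reader that tolerates its absence. The real reader is the Rust service, so this only illustrates the intended behavior. The top-level keys `flags`, `evaluation_metadata`, and `cohorts` come from the test plan; the function names and the source labels are hypothetical.

```python
import json
from typing import Any, Optional


def build_payload(
    flags: list[dict[str, Any]],
    evaluation_metadata: dict[str, Any],
    cohorts: Optional[list[dict[str, Any]]] = None,
) -> str:
    """Serialize the cache payload; omit `cohorts` to mimic an older writer."""
    payload: dict[str, Any] = {"flags": flags, "evaluation_metadata": evaluation_metadata}
    if cohorts is not None:
        payload["cohorts"] = cohorts
    return json.dumps(payload)


def read_payload(raw: str) -> tuple[list[dict[str, Any]], Optional[list[dict[str, Any]]], str]:
    """Return (flags, cohorts, source); a missing `cohorts` key means fallback."""
    data = json.loads(raw)
    cohorts = data.get("cohorts")  # None when the key is absent
    source = "hypercache" if cohorts is not None else "postgres_fallback"
    return data["flags"], cohorts, source


new_raw = build_payload([{"key": "beta"}], {}, cohorts=[{"id": 7}])
old_raw = build_payload([{"key": "beta"}], {})
```

Note that an empty list and an absent key are distinct here: an empty `cohorts` list still counts as preloaded, so only a missing key triggers the fallback path.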
Python (cache write path):
- Extracts cohort IDs from flag filters (`groups` only: `super_groups` and `holdout_groups` cannot contain cohort properties)
- `post_save`/`post_delete` signal on `Cohort` to invalidate the flags cache on definition changes (skips recalculation-only saves via `_COHORT_RECALCULATION_FIELDS` subset check)
Rust (cache read path):
- `HypercacheFlagsWrapper` gains optional `cohorts` field with `#[serde(default)]` for backwards compatibility
- `FeatureFlagMatcher` consumes preloaded cohorts via `Option::take()`, falls back to `CohortCacheManager` when absent
- `EvaluationMetadata` gains `Default` and `PartialEq` derives
Rolling deploy safety:
- … `cohorts` key (no `deny_unknown_fields`)
- … `cohorts` to `None` and falls back to PG
How did you test this code?
Test plan
The automated unit tests cover extraction, BFS loading, serialization, and signal handler logic. This plan focuses on end-to-end scenarios that exercise the full Python → Redis → Rust pipeline.
Prerequisites
- … `hogli start`)
1. Python: cache write path
- `_get_feature_flags_for_service` returns `flags`, `evaluation_metadata`, and `cohorts`
- … `flag_count` and `cohort_count`
2. Python: signal handler (cache invalidation)
- … `is_calculating`) do NOT trigger cache invalidation
- … `name`) DO trigger cache invalidation
- … `count` + `name`) DO trigger invalidation
- … `post_delete`
- … `transaction.on_commit`, not direct task call
- … `FLAGS_REDIS_URL` is not configured
3. Rust: flag evaluation with preloaded cohorts
- … `/flags` with cohorts in cache evaluates correctly (cohort flags: `no_condition_match`, circular: `dependency_cycle_cohort`, plain: `condition_match`)
- … `cohorts` key falls back to `CohortCacheManager`, identical results
4. Rolling deploy safety
- … `cohorts`): silently ignores unknown key, correct results
- … `cohorts`): falls back to PG, correct results
Publish to changelog?
no