Skip to content

Added remote feature-flag override support to the labs service#28251

Draft
jonatansberg wants to merge 8 commits into
mainfrom
feature/remote-feature-flags
Draft

Added remote feature-flag override support to the labs service#28251
jonatansberg wants to merge 8 commits into
mainfrom
feature/remote-feature-flags

Conversation

@jonatansberg
Copy link
Copy Markdown
Member

Summary

Adds an optional remote override layer to the labs feature-flag service. When configured, Ghost periodically fetches a small JSON manifest of flag overrides from a URL and overlays it on the local flag state, so flag values can be changed for a managed instance without cutting a release.

Off by default and fully inert unless remoteFlags is configured with a URL (and the instance has a hostSettings:siteId), so self-hosted and development installs are completely unaffected.

How it works

  • Resolver (services/remote-flags/resolve.js): a pure function that turns a sparse JSON manifest into a flat {flag: boolean} map for this instance. Each entry is a bare boolean (full override) or {value, percent} (a deterministic, stable percentage ramp). Validated with Zod; only flags already known to labs are honored; invalid or unknown entries are skipped individually.
  • Overlay (shared/labs.js): getAll() applies the resolved overrides at precedence config.labs > remote > GA_FEATURES > DB settings. An explicit config.labs pin always wins, and members is never remote-overridable.
  • Poller (services/remote-flags/): a fail-open background poller (conditional GET with ETag, last-known-good on error, empty on 404, never throws out of the loop). Started fire-and-forget from boot and gated entirely behind the remoteFlags config block.

Safety

  • Off by default; gated on explicit config plus a site id.
  • Fail-open: a missing, unreachable, or invalid manifest never changes flag state in a way that can break a site — it falls back to local flag state.
  • config.labs stays the unbeatable backstop, so a locally pinned flag cannot be overridden remotely.
  • Only flags already defined in labs.js can be targeted; defining a brand-new flag still requires a release.

Tests

Unit tests cover the resolver (including a pinned golden bucket value to guard ramp stability), the labs overlay precedence, the poller fail-open/caching/timer behaviour, and the config gating — plus an end-to-end test that a remote kill of a GA flag flips it off through the real labs.getAllFlags() set.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 29700872-bae4-4d19-ba8a-cc33052122e7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/remote-feature-flags

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@rob-ghost rob-ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the main comment regarding the override store and who owns it, I think all new files should be TS so we can convert those in this PR before merge which might also tidy up some things.

Comment thread ghost/core/core/server/services/remote-flags/index.js Outdated
Comment thread ghost/core/core/server/services/remote-flags/index.js Outdated
Comment thread ghost/core/core/server/services/remote-flags/index.js Outdated
- a pure, dependency-free function that resolves a sparse JSON manifest of flag
  overrides into a flat {flag: boolean} map for a single instance, given the
  instance's site id and the set of known flags
- supports bare-boolean full overrides and {value, percent} ramps bucketed by a
  stable md5(flag:siteId) % 100 hash, so raising a percent only ever adds sites
- only honors keys in the supplied known-flags list and skips malformed/unknown
  entries individually: one bad entry can't discard the manifest, and the
  manifest can never introduce a flag core does not define
- never throws on hostile input (null options, non-array knownFlags, non-scalar
  siteId); a golden bucket value is pinned in tests to guard ramp stability
- standalone and inert until wired into the labs service in a later commit
- labs.getAll() now overlays a remote override map between GA_FEATURES and
  config.labs, giving precedence config.labs > remote > GA_FEATURES > DB settings
- a remote entry can kill a GA-forced flag or beat a DB value, but an explicit
  config.labs pin still wins, and members is recomputed after the overlay so
  remote can never change it
- the store is module-global, shallow-copied on set so callers can't mutate live
  flag state, and fully inert when empty: no behaviour change until a later commit
  pushes resolved overrides in via setRemoteOverrides
- GA kill-switches and members cannot persist into the DB because the settings
  input serializer filters writes to WRITABLE_KEYS_ALLOWLIST (disjoint from GA)
- a fail-open background poller that fetches a JSON manifest over HTTP, resolves
  it for this instance, and pushes the result into labs via an injected sink
- conditional GET (ETag/If-None-Match); keeps last-known-good on network error,
  bad status, or unparseable body; applies empty on 404; never throws out of the
  poll loop, with a defensive backstop around resolve/apply
- the ETag is committed only after a manifest is successfully applied, so an
  apply failure cannot poison the cache into a 304 that skips the retry
- emits a structured remote_flags.applied event only when the resolved set
  changes, keyed on a sorted-key serialization so a mere key reorder does not
  emit a burst of spurious logs
- self-scheduling jittered poll with an unref'd timer, idempotent start, a single
  outstanding timer, and a re-entrancy guard so refreshes never race the ETag
- HTTP client, known-flags source, timing and randomness are injected so the
  fetch/cache/fail-open logic is fully unit-tested without real I/O or timers
- not wired into boot yet; inert until constructed and started
- adds the remoteFlags config block (default enabled:false, url:null), a gated
  service index, and a hook in initBackgroundServices
- opt-in: init() constructs and starts the poller only when remoteFlags.enabled
  is true with a valid url and a hostSettings:siteId is present; otherwise it
  returns null and the feature is a complete no-op, so self-hosted and dev
  installs are untouched
- the manifest url is validated once at start so a misconfigured value fails
  loudly instead of warning on every poll
- start() is fire-and-forget so boot is never blocked on (or failed by) the first
  manifest fetch; the service is fail-open
- adds an end-to-end test that a remote kill of a GA flag flips it off through
  the actual labs.getAllFlags() set, guarding the kill-switch contract
…knobs

- the resolver now validates each manifest entry with a Zod schema
  (bool | {value, percent?}) instead of hand-rolled type checks, aligning with
  the rest of the codebase (zod@4 is already used across apps). The known-flags
  list is still supplied by the labs service (no second list), and invalid or
  unknown entries are still skipped per-entry
- behaviour is unchanged for valid JSON; the only difference is that a non-finite
  or non-numeric percent (Infinity/NaN/null/string) is now rejected and skipped
  rather than coerced, which is more correct and cannot occur in real JSON
- removed the pollInterval/jitter config reads in the service index: they were
  never set anywhere, so they were dead knobs that looked tunable but were not.
  The service keeps its constructor params (used by tests) and built-in defaults
- fixed a garbled doc comment on stop()
… to TS

Addresses review feedback on the remote feature-flags PR:

- Lifted the override state out of labs into a dedicated shared
  labs-flag-overrides store. labs now only reads it and the remote-flags
  service is the sole writer, so labs no longer exposes a write API that
  only remote-flags used.
- Converted the remote-flags source (resolve, service, index) to
  TypeScript. init() now builds and hands the service a ready URL object,
  removing the bare `new URL` and its eslint-disable.
- Injected config and labs into init() instead of importing them as
  module globals, keeping the gating logic a pure function of its inputs.
- Dropped the known-flag allowlist from the resolver. The manifest can now
  carry any flag key, including ones the backend does not define yet, so the
  frontend and backend can ship on different cadences. Per-entry validation
  stays, values remain boolean-only, and prototype-dangerous keys
  (__proto__/constructor/prototype) are skipped explicitly in place of the
  allowlist guard.
- init() no longer takes labs (it was only the allowlist source); config is
  still injected.
- Added a remoteFlags.pollInterval config knob (ms). It is validated as a
  positive finite number and falls back to the 5-minute default, warning on
  a bad value so a typo can neither stop polling nor hammer the CDN.
…uard

From an adversarial review of the implementation:

- Critical: the service handed a URL object to @tryghost/request, which
  validates with validator.isURL (strings only) and rejected every poll with
  URL_MISSING_INVALID. The poller silently applied nothing in production. Now
  the client is called with url.href, with a regression test asserting the arg
  is the string href. RequestFn is narrowed to string.
- Added a 60s floor for the configured pollInterval. A too-small or
  units-confused value (e.g. seconds-as-ms) is rejected with a warning rather
  than honored, so a typo cannot make ~30k containers hammer the CDN.
- Expanded the unsafe-key skip in the resolver from a hand-listed set to every
  Object.prototype member, so a manifest key like "toString" cannot shadow a
  built-in on the labs object and throw for callers.
- Clarified the JSON.stringify key-allowlist used for change detection.
@jonatansberg jonatansberg force-pushed the feature/remote-feature-flags branch from 1059a69 to 8f32423 Compare June 3, 2026 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants