
Add API-controlled rollout control plane #115

Merged
chrisbliss18 merged 8 commits into v2 from feature/api-driven-container-rollout on May 15, 2026

chrisbliss18 (Contributor) commented on May 15, 2026

Summary

Adds the API-driven rollout control plane needed for containerized v1-to-v2 migration where operators may not have shell access to the Monitor containers.

What changed

  • Adds ROLLOUT_MODE with active, standby, and api-controlled values so v2 Monitors can run safely without claiming buckets until they are explicitly activated through the API.
  • Adds rollout API schema migrations for durable sessions, range locks, per-bucket locks, job records, hashed short-lived confirmation tokens, persisted method-comparison deltas, and staged policy rollback history.
  • Adds admin-scoped rollout API endpoints for capabilities, sessions, preflight, read-only smoke checks, seed/adopt, final reconcile, activate/release, status, post-handoff gates, method comparison, and staged policy migration.
  • Makes preflight validate schema version, API-controlled mode, delivery guards, configured v2 Veriflier contract support, quorum-counted vantage.id coverage, duplicate-vantage safety, and active-discovery registry entries with static fallback semantics.
  • Runs sampled read-only HEAD + legacy smoke probes without writing incident state, runtime freshness, check history, WPCOM notifications, or legacy projection rows.
  • Runs sampled non-authoritative HEAD/GET comparison probes and persists deltas in jetmon_rollout_comparison_results for rollout analysis.
  • Executes staged policy changes through jetmon_site_check_config, requires explicit cohort size, records previous values in jetmon_rollout_policy_stage_rows, and supports pause, rollback-last-stage, and rollback-all modes.
  • Seeds v2 side tables and adopts existing v1 non-running projections into v2 event state without sending duplicate down notifications.
  • Expands jetmon2 api rollout guided with session creation, final reconcile, transcript logging, resume state, run/change references, rollback release flow, safer sample defaults, and primitive rollout commands.
  • Adds execute-step idempotency keys to the guided rollout flow so operators can safely retry after a lost HTTP response.
  • Binds confirmation tokens to the authenticated API key identity, not just the consumer name or typed phrase.
  • Warns on empty smoke/comparison ranges instead of treating an empty sample as proof that a range is healthy.
  • Blocks stale stage-policy execute attempts when the live plan has become blocked.
  • Makes API-controlled Monitors poll durable bucket locks and return to standby when a lock is released, including the streaming scheduler path.
  • Disables embedded delivery workers in standby and API-controlled rollout modes.
  • Updates the rollout docs, API guide, data model, config reference, and roadmap to describe the new API-driven container rollout path.
  • Consolidates the launch-critical prelaunch checklist and documents that synthetic canary tests are required before first production activation, even while API-native canary execution remains a follow-up.
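A minimal sketch of the ROLLOUT_MODE gating described above, written in Go to match the repository. RolloutMode, ParseRolloutMode, ClaimsBuckets, and RunsDeliveryWorkers are hypothetical names and do not come from the jetmon2 code. Treating an unset value as active is also an assumption. The rules follow the bullets: standby never claims buckets, api-controlled claims buckets only while it holds a durable API lock, and embedded delivery workers run only in active mode.

```go
package main

import (
	"fmt"
	"strings"
)

// RolloutMode is a hypothetical type mirroring the ROLLOUT_MODE values.
type RolloutMode string

const (
	ModeActive        RolloutMode = "active"
	ModeStandby       RolloutMode = "standby"
	ModeAPIControlled RolloutMode = "api-controlled"
)

// ParseRolloutMode validates a raw ROLLOUT_MODE value. Treating an empty
// value as "active" is an assumption made for this sketch.
func ParseRolloutMode(raw string) (RolloutMode, error) {
	switch m := RolloutMode(strings.ToLower(strings.TrimSpace(raw))); m {
	case "":
		return ModeActive, nil
	case ModeActive, ModeStandby, ModeAPIControlled:
		return m, nil
	default:
		return "", fmt.Errorf("unknown ROLLOUT_MODE %q", raw)
	}
}

// ClaimsBuckets reports whether a Monitor may own buckets. In api-controlled
// mode it may do so only while it holds a durable bucket lock from the API.
func (m RolloutMode) ClaimsBuckets(holdsAPILock bool) bool {
	switch m {
	case ModeActive:
		return true
	case ModeAPIControlled:
		return holdsAPILock
	default: // standby never claims buckets
		return false
	}
}

// RunsDeliveryWorkers reports whether embedded delivery workers start.
// They are disabled in standby and api-controlled modes.
func (m RolloutMode) RunsDeliveryWorkers() bool {
	return m == ModeActive
}

func main() {
	for _, raw := range []string{"active", "standby", "api-controlled"} {
		m, err := ParseRolloutMode(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-15s claims(no lock)=%v claims(lock)=%v delivery=%v\n",
			m, m.ClaimsBuckets(false), m.ClaimsBuckets(true), m.RunsDeliveryWorkers())
	}
}
```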

Example operator flow

./jetmon2 api rollout guided \
  --bucket-min=0 \
  --bucket-max=99 \
  --change-ref=SYSREQ-12345 \
  --allow-remote

Rollback uses the same guided surface:

./jetmon2 api rollout guided \
  --bucket-min=0 \
  --bucket-max=99 \
  --rollback \
  --allow-remote

After v2 owns the fleet and is stable, optional comparison and policy migration gates can be included:

./jetmon2 api rollout guided \
  --bucket-min=0 \
  --bucket-max=99 \
  --include-comparison \
  --include-policy-migration \
  --allow-remote

Safety notes

  • Mutating rollout endpoints require admin-scoped API keys.
  • Activation requires ROLLOUT_MODE=api-controlled.
  • Execute operations require confirmation tokens from matching dry-run plans.
  • Confirmation tokens are hashed at rest and bound to operation, range, run ID, authenticated API key identity, and request shape.
  • Guided execute requests include idempotency keys.
  • Synchronous smoke and comparison probes default to 100 samples and reject more than 1000 samples per request.
  • Stage-policy migration requires an explicit size so a missing flag cannot migrate the full eligible range.
  • A single Monitor owner may only hold one contiguous API-controlled range.
  • Synthetic canary execution remains a tracked follow-up until production canary URLs and expected states are finalized. The prelaunch docs now require those canary tests to be run through an approved external or manual path before the first production activation.
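The sample limits above can be expressed as a small validation helper. This is a sketch under stated assumptions. ResolveSampleSize and SampleWarnings are hypothetical names, a zero request is assumed to mean "use the default", and the warning text is illustrative. The helper also covers the empty-range rule from the change list: an empty sample produces a warning and does not count as a pass.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits from the safety notes; the constant names are hypothetical.
const (
	defaultProbeSamples = 100
	maxProbeSamples     = 1000
)

// ResolveSampleSize applies the documented default and upper bound.
// A request of 0 is assumed to mean "not specified".
func ResolveSampleSize(requested int) (int, error) {
	switch {
	case requested < 0:
		return 0, errors.New("sample size must not be negative")
	case requested == 0:
		return defaultProbeSamples, nil
	case requested > maxProbeSamples:
		return 0, fmt.Errorf("sample size %d exceeds limit of %d", requested, maxProbeSamples)
	default:
		return requested, nil
	}
}

// SampleWarnings flags an empty sample instead of treating it as healthy.
func SampleWarnings(sampled int) []string {
	if sampled == 0 {
		return []string{"range contained no sites to sample; this is not evidence the range is healthy"}
	}
	return nil
}

func main() {
	for _, req := range []int{0, 250, 1001} {
		n, err := ResolveSampleSize(req)
		fmt.Println(req, n, err)
	}
	fmt.Println(SampleWarnings(0))
}
```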

Validation

  • go test ./...
  • make rollout-docs-verify
  • git diff --check
  • JETMON_API_CONFIG=off go run ./cmd/jetmon2 api rollout guided --bucket-min=0 --bucket-max=2 --dry-run --include-comparison --include-policy-migration
  • JETMON_API_CONFIG=off go run ./cmd/jetmon2 api rollout guided --bucket-min=0 --bucket-max=2 --rollback --dry-run
  • JETMON_API_CONFIG=off go run ./cmd/jetmon2 api commands --output table

Chris Jean added 8 commits May 14, 2026 20:45
Reframe the v1-to-v2 production rollout around fresh containerized v2 Monitor and Veriflier fleets that can be controlled through the Monitor API without shell access to the Docker hosts. The migration runbook and quick reference now describe standby states, read-only smoke checks, sidecar state seeding, explicit bucket activation and release, rollback, and staged HEAD-to-GET policy migration.

Add operator-side API CLI config loading from ~/.config/jetmon2.conf or JETMON_API_CONFIG so a standalone jetmon2 binary can safely carry API URL, token-file, auth-policy, timeout, output, and remote-write defaults. Token-bearing config files and token files must be mode 0600, and environment variables and command flags still override the file.
Introduce `jetmon2 local-config` as the explicit manager for the local operator API config used by `jetmon2 api`. The command can print the resolved path, initialize a secure config file, show redacted file and effective settings, update supported keys, and remove keys without implying that it edits fleet or Monitor service configuration.

The command writes config files with 0600 permissions, keeps token material redacted in display output, supports token files relative to the config directory, and preserves the existing environment/flag override order. Rollout and API CLI documentation now use `local-config` so the container rollout workflow remains clear when operators run the standalone binary from a workstation or bastion.
Add a jetmon2 local-config keys subcommand so operators can discover the supported local API CLI config keys without opening the documentation. The command reports each key, type, accepted values, default behavior, sensitivity, and a short description in table or JSON output.

Update the API CLI and rollout docs to mention local-config keys alongside init and show, and add test coverage that verifies the key list includes the supported operator config fields and enum values.

Introduce jetmon2 api rollout guided as an operator-side state machine for the container rollout path. The flow walks health and identity checks, standby preflight, read-only HEAD/legacy smoke, seed/adopt dry-run and execute, the manual v1-stopped checkpoint, v2 activation, post-handoff gates, rollback release, and optional comparison or policy-migration planning.

The command uses typed confirmations for high-risk steps, refuses remote writes without --allow-remote, captures dry-run confirmation tokens before execute calls, and fails closed if an execute step would still contain an unresolved token placeholder.
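The fail-closed guards can be sketched as a check that runs before each mutating request. Step, CheckStep, FillToken, and the placeholder syntax are assumptions made for this example. So is the rule that only loopback hosts count as local. Only the --allow-remote flag comes from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Step is a hypothetical guided-flow request.
type Step struct {
	Name   string
	Method string
	Body   string
}

// tokenPlaceholder marks a token not yet captured from the dry run.
const tokenPlaceholder = "{{confirmation_token}}"

// isLocal treats loopback hosts as local; every other host is remote.
func isLocal(apiURL string) bool {
	u, err := url.Parse(apiURL)
	if err != nil {
		return false
	}
	switch u.Hostname() {
	case "localhost", "127.0.0.1", "::1":
		return true
	}
	return false
}

// CheckStep refuses a mutating request that targets a remote API without
// --allow-remote or that still carries an unresolved token placeholder.
func CheckStep(s Step, apiURL string, allowRemote bool) error {
	if s.Method == "GET" {
		return nil
	}
	if !allowRemote && !isLocal(apiURL) {
		return fmt.Errorf("%s: refusing remote write to %s without --allow-remote", s.Name, apiURL)
	}
	if strings.Contains(s.Body, tokenPlaceholder) {
		return fmt.Errorf("%s: confirmation token was not captured from the dry run", s.Name)
	}
	return nil
}

// FillToken substitutes the token returned by the matching dry run.
func FillToken(s Step, token string) (Step, error) {
	if token == "" {
		return s, errors.New("dry run returned no confirmation token")
	}
	s.Body = strings.ReplaceAll(s.Body, tokenPlaceholder, token)
	return s, nil
}

func main() {
	step := Step{Name: "activate", Method: "POST",
		Body: `{"confirmation_token":"` + tokenPlaceholder + `"}`}
	remote := "https://monitor.example.com"

	fmt.Println(CheckStep(step, remote, false)) // refused: no --allow-remote
	fmt.Println(CheckStep(step, remote, true))  // refused: placeholder unresolved

	filled, err := FillToken(step, "tok-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(CheckStep(filled, remote, true)) // <nil>
}
```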

Document the guided command as the preferred wrapper in the README, API CLI guide, API reference, quick reference, migration runbook, and roadmap. Add unit coverage for dry-run planning, the happy-path API request sequence, missing endpoint guidance, and missing confirmation-token refusal.
Introduce ROLLOUT_MODE so containerized monitors can stay in standby, wait for API bucket locks, and return to standby when a range is released. The streaming scheduler now polls API-controlled locks and drains instead of continuing with stale targets after rollback.

Add durable rollout sessions, range and bucket locks, job records, and short-lived confirmation tokens for the API rollout flow. Mutating rollout operations are admin-scoped, refuse blocked confirmation tokens, and activation requires api-controlled mode plus a single contiguous owner range.
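The activation check for a single contiguous owner range could be written as follows. CheckActivation, Lock, and BucketRange are hypothetical. The sketch assumes inclusive bucket ranges and treats adjacent ranges such as 0-49 and 50-99 as contiguous. It also rejects any overlap with another owner's lock.

```go
package main

import (
	"fmt"
	"sort"
)

// BucketRange is an inclusive range of bucket numbers.
type BucketRange struct{ Min, Max int }

// Lock is a hypothetical API-controlled range lock held by one Monitor.
type Lock struct {
	Owner string
	Range BucketRange
}

func overlaps(a, b BucketRange) bool { return a.Min <= b.Max && b.Min <= a.Max }

// CheckActivation validates a proposed lock before activation.
func CheckActivation(existing []Lock, proposed Lock, ownerMode string) error {
	if ownerMode != "api-controlled" {
		return fmt.Errorf("activation requires ROLLOUT_MODE=api-controlled, %s reports %q", proposed.Owner, ownerMode)
	}
	if proposed.Range.Min > proposed.Range.Max {
		return fmt.Errorf("invalid range %d-%d", proposed.Range.Min, proposed.Range.Max)
	}
	mine := []BucketRange{proposed.Range}
	for _, l := range existing {
		if l.Owner == proposed.Owner {
			mine = append(mine, l.Range)
			continue
		}
		if overlaps(l.Range, proposed.Range) {
			return fmt.Errorf("range %d-%d overlaps lock held by %s", proposed.Range.Min, proposed.Range.Max, l.Owner)
		}
	}
	// Merge the owner's ranges; any gap means more than one contiguous range.
	sort.Slice(mine, func(i, j int) bool { return mine[i].Min < mine[j].Min })
	end := mine[0].Max
	for _, r := range mine[1:] {
		if r.Min > end+1 {
			return fmt.Errorf("%s would hold non-contiguous ranges", proposed.Owner)
		}
		if r.Max > end {
			end = r.Max
		}
	}
	return nil
}

func main() {
	existing := []Lock{{Owner: "monitor-a", Range: BucketRange{0, 49}}}
	fmt.Println(CheckActivation(existing, Lock{"monitor-a", BucketRange{50, 99}}, "api-controlled")) // <nil>
	fmt.Println(CheckActivation(existing, Lock{"monitor-a", BucketRange{60, 99}}, "api-controlled"))
}
```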

Expand jetmon2 api rollout with primitive commands, guided session creation, final reconcile, local transcript and resume state, run/change references, and updated command catalog/help coverage.

Document the API-driven rollout path, operator config usage, new config mode, current smoke-planning boundary, and tracked follow-ups for full canary execution and staged policy workers.

Verified with go test ./..., make rollout-docs-verify, git diff --check, and a JETMON_API_CONFIG=off guided rollout dry run. A standalone validate-config smoke test was stopped after the local MySQL instance timed out.

The rollout API had several endpoints that intentionally started as durable job records and operator-facing scaffolding. This change turns those placeholders into real control-plane behavior so the guided container rollout can exercise the checks it describes.

Preflight now validates configured v2 Veriflier reachability, protocol support, quorum-counted vantage identity, duplicate-vantage safety, and active-discovery registry entries with static fallback semantics. The read-only smoke endpoint now runs sampled HEAD/legacy probes without writing incident state, runtime freshness, check history, WPCOM notifications, or legacy projection rows.

The method comparison endpoint now runs sampled non-authoritative probes for the requested HEAD/GET cohorts and stores durable delta rows for rollout analysis. The staged policy endpoint now updates jetmon_site_check_config cohorts, records previous values for rollback, requires an explicit size to prevent accidental whole-range changes, and supports pause, rollback-last-stage, and rollback-all modes.
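An in-memory sketch of the staged policy flow. A map stands in for jetmon_site_check_config, and a slice of stages stands in for jetmon_rollout_policy_stage_rows. Migrator, Stage, RollbackLast, and RollbackAll are hypothetical names. Picking the cohort in site-ID order is an assumption that keeps the example deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// stageRow records a site's previous check method for rollback.
type stageRow struct {
	SiteID int
	Prev   string
}

// Migrator is an in-memory stand-in for the policy tables.
type Migrator struct {
	Method map[int]string // site ID -> "HEAD" or "GET"
	Stages [][]stageRow   // one entry per executed stage, oldest first
	Paused bool
}

// Stage moves up to size HEAD sites to GET. The size must be explicit and
// positive so a missing flag can never migrate the whole eligible range.
func (m *Migrator) Stage(size int) (int, error) {
	if m.Paused {
		return 0, errors.New("policy migration is paused")
	}
	if size <= 0 {
		return 0, errors.New("an explicit cohort size is required")
	}
	var ids []int
	for id, method := range m.Method {
		if method == "HEAD" {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	if len(ids) > size {
		ids = ids[:size]
	}
	var rows []stageRow
	for _, id := range ids {
		rows = append(rows, stageRow{SiteID: id, Prev: m.Method[id]})
		m.Method[id] = "GET"
	}
	if len(rows) > 0 {
		m.Stages = append(m.Stages, rows)
	}
	return len(rows), nil
}

// RollbackLast restores the previous values recorded by the newest stage.
func (m *Migrator) RollbackLast() bool {
	n := len(m.Stages)
	if n == 0 {
		return false
	}
	for _, r := range m.Stages[n-1] {
		m.Method[r.SiteID] = r.Prev
	}
	m.Stages = m.Stages[:n-1]
	return true
}

// RollbackAll unwinds every stage, newest first.
func (m *Migrator) RollbackAll() {
	for m.RollbackLast() {
	}
}

func main() {
	m := &Migrator{Method: map[int]string{1: "HEAD", 2: "HEAD", 3: "HEAD"}}
	if _, err := m.Stage(0); err != nil {
		fmt.Println("refused:", err)
	}
	n, _ := m.Stage(2)
	fmt.Println("migrated", n, m.Method)
	m.RollbackAll()
	fmt.Println("after rollback-all", m.Method)
}
```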

Seed/adopt execution now also adopts existing v1 non-running projections into v2 event state without sending duplicate notifications, promoting already-open bootstrap events when the legacy projection indicates confirmed down. Documentation, roadmap state, data-model notes, CLI defaults, and focused helper tests were updated to match the executable behavior.
Bind rollout confirmation tokens to the authenticated API key identity so a dry-run token cannot be replayed by another key that shares the same consumer name. Add idempotency keys to guided execute steps so operators can safely retry after a lost HTTP response without reapplying the mutation.
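A sketch of how a bound, hashed-at-rest confirmation token and an idempotency cache could fit together. The Binding fields follow the safety notes. The digest layout, the 32-byte token, and the in-memory Idempotency map are assumptions. A real implementation would persist idempotency results durably and protect them against concurrent requests.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// Binding lists what a confirmation token is tied to.
type Binding struct {
	Operation string
	BucketMin int
	BucketMax int
	RunID     string
	APIKeyID  string // authenticated key identity, not the consumer name
	Shape     string // hypothetical canonical digest of the request body
}

// digest hashes the token together with every bound field.
func digest(token string, b Binding) []byte {
	h := sha256.New()
	fmt.Fprintf(h, "%s\x00%s\x00%d\x00%d\x00%s\x00%s\x00%s",
		token, b.Operation, b.BucketMin, b.BucketMax, b.RunID, b.APIKeyID, b.Shape)
	return h.Sum(nil)
}

// IssueToken returns the plaintext token for the dry-run response and the
// digest to store. Only the digest is kept at rest.
func IssueToken(b Binding) (string, []byte, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", nil, err
	}
	token := hex.EncodeToString(raw)
	return token, digest(token, b), nil
}

// VerifyToken succeeds only if every bound field matches the execute call.
func VerifyToken(token string, stored []byte, b Binding) bool {
	return subtle.ConstantTimeCompare(digest(token, b), stored) == 1
}

// Idempotency returns the first result recorded for a key, so a retry
// after a lost HTTP response does not reapply the mutation.
type Idempotency struct{ results map[string]string }

func (s *Idempotency) Do(key string, apply func() string) string {
	if s.results == nil {
		s.results = map[string]string{}
	}
	if r, ok := s.results[key]; ok {
		return r
	}
	r := apply()
	s.results[key] = r
	return r
}

func main() {
	b := Binding{Operation: "activate", BucketMin: 0, BucketMax: 99,
		RunID: "run-1", APIKeyID: "key-a", Shape: "shape-v1"}
	token, stored, err := IssueToken(b)
	if err != nil {
		panic(err)
	}
	replay := b
	replay.APIKeyID = "key-b" // same consumer name, different key
	fmt.Println(VerifyToken(token, stored, b))      // true
	fmt.Println(VerifyToken(token, stored, replay)) // false

	var s Idempotency
	apply := func() string { return "activated buckets 0-99" }
	fmt.Println(s.Do("exec-1", apply), s.Do("exec-1", apply))
}
```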

Also expose the confirmation token TTL in rollout capabilities, surface warnings for empty smoke/comparison ranges, keep stage-policy execute from running when the live plan is blocked, fill out the API command catalog, and correct rollout docs so primitive commands include required bucket ranges and release dry-run steps.

Verified with go test ./..., make rollout-docs-verify, git diff --check, API rollout guided forward/rollback dry-run simulations, and API command catalog output.

Document the synthetic canary checks that must happen before the first production activation, even while API-native canary execution remains a follow-up.

Consolidate the launch-critical checklist in the prelaunch readiness tracker and point the migration runbook, quick reference, API reference, and roadmap back to that required evidence.
chrisbliss18 merged commit 2f32fff into v2 on May 15, 2026
2 checks passed
