Add API-controlled rollout control plane #115
Merged
added 8 commits
May 14, 2026 20:45
Reframe the v1-to-v2 production rollout around fresh containerized v2 Monitor and Veriflier fleets that can be controlled through the Monitor API without shell access to the Docker hosts. The migration runbook and quick reference now describe standby states, read-only smoke checks, sidecar state seeding, explicit bucket activation and release, rollback, and staged HEAD-to-GET policy migration. Add operator-side API CLI config loading from ~/.config/jetmon2.conf or JETMON_API_CONFIG so a standalone jetmon2 binary can carry API URL, token-file, auth-policy, timeout, output, and remote-write defaults safely. Token-bearing config and token files are required to be mode 0600, and environment variables plus command flags still override the file.
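The override order described above can be sketched as a simple first-non-empty lookup. This is a minimal illustration, not the CLI's actual loader: the `resolve` helper and the example URLs are hypothetical, and ranking flags above environment variables is an assumption (the commit only states that both override the file).

```go
package main

import "fmt"

// resolve returns the first non-empty value in precedence order:
// command flag, then environment variable, then config file, then default.
// Ranking flags above environment variables is an assumption of this sketch.
func resolve(flagVal, envVal, fileVal, def string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return def
}

func main() {
	// Hypothetical values for an API URL setting.
	fmt.Println(resolve("", "https://env.example", "https://file.example", "http://localhost"))
	fmt.Println(resolve("", "", "https://file.example", "http://localhost"))
	fmt.Println(resolve("", "", "", "http://localhost"))
}
```

Keeping the lookup in one function makes it easy to show the file value and the effective value side by side.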
Introduce `jetmon2 local-config` as the explicit manager for the local operator API config used by `jetmon2 api`. The command can print the resolved path, initialize a secure config file, show redacted file and effective settings, update supported keys, and remove keys without implying that it edits fleet or Monitor service configuration. The command writes config files with 0600 permissions, keeps token material redacted in display output, supports token files relative to the config directory, and preserves the existing environment/flag override order. Rollout and API CLI documentation now use `local-config` so the container rollout workflow remains clear when operators run the standalone binary from a workstation or bastion.
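As a rough sketch of the 0600 rule and the redacted display, the helpers below write a secret file with owner-only permissions, reject any other mode, and mask token material. All function names are hypothetical, the sketch assumes Unix-style permission bits, and it is not taken from the `local-config` implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeSecret writes data and then forces mode 0600, because the mode passed
// to os.WriteFile is only applied when the file is newly created.
func writeSecret(path string, data []byte) error {
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return err
	}
	return os.Chmod(path, 0o600)
}

// checkSecretMode rejects any permission bits other than exactly 0600.
func checkSecretMode(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if perm := info.Mode().Perm(); perm != 0o600 {
		return fmt.Errorf("%s has mode %04o, want 0600", path, perm)
	}
	return nil
}

// redact hides token material in display output.
func redact(token string) string {
	if token == "" {
		return ""
	}
	return "[redacted]"
}

// roundTrip exercises the helpers in a temporary directory.
func roundTrip() error {
	dir, err := os.MkdirTemp("", "jetmon2-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "token")
	if err := writeSecret(path, []byte("s3cret\n")); err != nil {
		return err
	}
	if err := checkSecretMode(path); err != nil {
		return err
	}
	if err := os.Chmod(path, 0o644); err != nil {
		return err
	}
	if checkSecretMode(path) == nil {
		return fmt.Errorf("mode 0644 should have been rejected")
	}
	return nil
}

func main() {
	fmt.Println("round trip error:", roundTrip())
	fmt.Println("token:", redact("s3cret"))
}
```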
Add a jetmon2 local-config keys subcommand so operators can discover the supported local API CLI config keys without opening the documentation. The command reports each key, type, accepted values, default behavior, sensitivity, and a short description in table or JSON output. Update the API CLI and rollout docs to mention local-config keys alongside init and show, and add test coverage that verifies the key list includes the supported operator config fields and enum values.
Introduce jetmon2 api rollout guided as an operator-side state machine for the container rollout path. The flow walks health and identity checks, standby preflight, read-only HEAD/legacy smoke, seed/adopt dry-run and execute, the manual v1-stopped checkpoint, v2 activation, post-handoff gates, rollback release, and optional comparison or policy-migration planning. The command uses typed confirmations for high-risk steps, refuses remote writes without --allow-remote, captures dry-run confirmation tokens before execute calls, and fails closed if an execute step would still contain an unresolved token placeholder. Document the guided command as the preferred wrapper in the README, API CLI guide, API reference, quick reference, migration runbook, and roadmap. Add unit coverage for dry-run planning, the happy-path API request sequence, missing endpoint guidance, and missing confirmation-token refusal.
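The fail-closed behavior for unresolved tokens can be illustrated with a small substitution helper. This is only a sketch: the `<confirmation-token>` placeholder syntax, the `--confirm` flag, and the `prepareExecute` name are hypothetical and not confirmed by the commit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// placeholder is a hypothetical marker for a token not yet captured from a dry run.
const placeholder = "<confirmation-token>"

var errUnresolved = errors.New("execute step still contains an unresolved confirmation token placeholder")

// prepareExecute substitutes the token captured from the dry run into the
// execute arguments and refuses to proceed if any placeholder survives.
func prepareExecute(args []string, token string) ([]string, error) {
	out := make([]string, len(args))
	for i, a := range args {
		if token != "" {
			a = strings.ReplaceAll(a, placeholder, token)
		}
		if strings.Contains(a, placeholder) {
			return nil, errUnresolved
		}
		out[i] = a
	}
	return out, nil
}

func main() {
	step := []string{"activate", "--confirm=" + placeholder} // hypothetical flag

	_, err := prepareExecute(step, "")
	fmt.Println("without token:", err)

	args, err := prepareExecute(step, "tok-123")
	fmt.Println("with token:", args, err)
}
```

Returning an error instead of sending the literal placeholder means a missed dry run stops the flow before any mutating call is made.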
Introduce ROLLOUT_MODE so containerized monitors can stay in standby, wait for API bucket locks, and return to standby when a range is released. The streaming scheduler now polls API-controlled locks and drains instead of continuing with stale targets after rollback. Add durable rollout sessions, range and bucket locks, job records, and short-lived confirmation tokens for the API rollout flow. Mutating rollout operations are admin-scoped, refuse blocked confirmation tokens, and activation requires api-controlled mode plus a single contiguous owner range. Expand jetmon2 api rollout with primitive commands, guided session creation, final reconcile, local transcript and resume state, run/change references, and updated command catalog/help coverage. Document the API-driven rollout path, operator config usage, new config mode, current smoke-planning boundary, and tracked follow-ups for full canary execution and staged policy workers. Verified with go test ./..., make rollout-docs-verify, git diff --check, and a JETMON_API_CONFIG=off guided rollout dry run. A standalone validate-config smoke was stopped after local MySQL timed out.
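The activation precondition (api-controlled mode plus a single contiguous owner range) might look roughly like the check below. The `bucketRange` type and `canActivate` function are hypothetical, and treating adjacent ranges as one contiguous span is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// bucketRange is an inclusive range of bucket numbers held by one owner.
type bucketRange struct{ Min, Max int }

// canActivate checks the two preconditions named in the commit message.
// Adjacent ranges (for example 0-2 and 3-5) are treated as one contiguous
// span; gaps and overlaps are rejected.
func canActivate(mode string, owned []bucketRange) error {
	if mode != "api-controlled" {
		return fmt.Errorf("activation requires ROLLOUT_MODE=api-controlled, got %q", mode)
	}
	if len(owned) == 0 {
		return errors.New("owner holds no bucket range")
	}
	sorted := append([]bucketRange(nil), owned...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Min < sorted[j].Min })
	for i, r := range sorted {
		if r.Min > r.Max {
			return fmt.Errorf("invalid range %d-%d", r.Min, r.Max)
		}
		if i > 0 && r.Min != sorted[i-1].Max+1 {
			return errors.New("owner ranges are not a single contiguous range")
		}
	}
	return nil
}

func main() {
	fmt.Println(canActivate("api-controlled", []bucketRange{{0, 2}}))
	fmt.Println(canActivate("api-controlled", []bucketRange{{0, 2}, {5, 6}}))
	fmt.Println(canActivate("standby", []bucketRange{{0, 2}}))
}
```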
The rollout API had several endpoints that intentionally started as durable job records and operator-facing scaffolding. This change turns those placeholders into real control-plane behavior so the guided container rollout can exercise the checks it describes. Preflight now validates configured v2 Veriflier reachability, protocol support, quorum-counted vantage identity, duplicate-vantage safety, and active-discovery registry entries with static fallback semantics. The read-only smoke endpoint now runs sampled HEAD/legacy probes without writing incident state, runtime freshness, check history, WPCOM notifications, or legacy projection rows. The method comparison endpoint now runs sampled non-authoritative probes for the requested HEAD/GET cohorts and stores durable delta rows for rollout analysis. The staged policy endpoint now updates jetmon_site_check_config cohorts, records previous values for rollback, requires an explicit size to prevent accidental whole-range changes, and supports pause, rollback-last-stage, and rollback-all modes. Seed/adopt execution now also adopts existing v1 non-running projections into v2 event state without sending duplicate notifications, promoting already-open bootstrap events when the legacy projection indicates confirmed down. Documentation, roadmap state, data-model notes, CLI defaults, and focused helper tests were updated to match the executable behavior.
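The staged policy behavior (explicit size, recorded previous values, rollback of the last stage or of all stages) can be sketched in memory as follows. The `policy` and `stageRow` types are hypothetical stand-ins for the database-backed rows; the real endpoint works against `jetmon_site_check_config` and is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// stageRow records a site's previous check method so a stage can be undone.
type stageRow struct {
	siteID int
	prev   string
}

// policy is an in-memory stand-in for per-site check configuration.
type policy struct {
	method map[int]string // site ID -> check method
	stages [][]stageRow   // previous values, one slice per applied stage
}

// stage migrates at most size eligible sites to the target method.
// A missing or zero size is rejected instead of meaning "everything".
func (p *policy) stage(eligible []int, size int, to string) error {
	if size <= 0 {
		return errors.New("explicit cohort size is required")
	}
	var rows []stageRow
	for _, id := range eligible {
		if len(rows) == size {
			break
		}
		if p.method[id] == to {
			continue // already migrated
		}
		rows = append(rows, stageRow{id, p.method[id]})
		p.method[id] = to
	}
	p.stages = append(p.stages, rows)
	return nil
}

// rollbackLastStage restores the values recorded by the most recent stage.
func (p *policy) rollbackLastStage() bool {
	if len(p.stages) == 0 {
		return false
	}
	last := p.stages[len(p.stages)-1]
	for _, r := range last {
		p.method[r.siteID] = r.prev
	}
	p.stages = p.stages[:len(p.stages)-1]
	return true
}

// rollbackAll undoes every recorded stage, newest first.
func (p *policy) rollbackAll() {
	for p.rollbackLastStage() {
	}
}

func main() {
	p := &policy{method: map[int]string{1: "HEAD", 2: "HEAD", 3: "HEAD"}}
	fmt.Println(p.stage([]int{1, 2, 3}, 0, "GET")) // rejected: no size
	fmt.Println(p.stage([]int{1, 2, 3}, 2, "GET"), p.method)
	p.rollbackAll()
	fmt.Println(p.method)
}
```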
Bind rollout confirmation tokens to the authenticated API key identity so a dry-run token cannot be replayed by another key that shares the same consumer name. Add idempotency keys to guided execute steps so operators can safely retry after a lost HTTP response without reapplying the mutation. Also expose the confirmation token TTL in rollout capabilities, surface warnings for empty smoke/comparison ranges, keep stage-policy execute from running when the live plan is blocked, fill out the API command catalog, and correct rollout docs so primitive commands include required bucket ranges and release dry-run steps. Verified with go test ./..., make rollout-docs-verify, git diff --check, API rollout guided forward/rollback dry-run simulations, and API command catalog output.
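One way to picture the identity binding is a token store that remembers which API key and action each dry-run token was issued for. This is a sketch under stated assumptions: the `tokenStore` type, the single-use behavior, and the use of Unix seconds for the TTL are all hypothetical, not the service's actual design.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// grant ties a confirmation token to the key identity and action it was issued for.
type grant struct {
	keyID   string
	action  string
	expires int64 // Unix seconds
}

type tokenStore struct {
	ttl    int64 // token lifetime in seconds
	grants map[string]grant
}

func newTokenStore(ttlSeconds int64) *tokenStore {
	return &tokenStore{ttl: ttlSeconds, grants: make(map[string]grant)}
}

// issue creates a random token during a dry run.
func (s *tokenStore) issue(keyID, action string, now int64) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	s.grants[tok] = grant{keyID: keyID, action: action, expires: now + s.ttl}
	return tok, nil
}

// redeem accepts a token only from the same key identity, for the same
// action, before expiry. Tokens are single use in this sketch.
func (s *tokenStore) redeem(tok, keyID, action string, now int64) error {
	g, ok := s.grants[tok]
	switch {
	case !ok:
		return errors.New("unknown confirmation token")
	case now > g.expires:
		delete(s.grants, tok)
		return errors.New("confirmation token expired")
	case g.keyID != keyID || g.action != action:
		return errors.New("confirmation token was issued to a different key or action")
	}
	delete(s.grants, tok)
	return nil
}

func main() {
	s := newTokenStore(60)
	now := time.Now().Unix()
	tok, err := s.issue("key-a", "activate", now)
	if err != nil {
		panic(err)
	}
	fmt.Println("other key:", s.redeem(tok, "key-b", "activate", now))
	fmt.Println("same key:", s.redeem(tok, "key-a", "activate", now))
	fmt.Println("replay:", s.redeem(tok, "key-a", "activate", now))
}
```

Binding to the key identity rather than the consumer name is what stops a second key under the same consumer from replaying the token.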
Document the synthetic canary checks that must happen before the first production activation, even while API-native canary execution remains a follow-up. Consolidate the launch-critical checklist in the prelaunch readiness tracker and point the migration runbook, quick reference, API reference, and roadmap back to that required evidence.
Summary
Adds the API-driven rollout control plane needed for containerized v1-to-v2 migration where operators may not have shell access to the Monitor containers.
What changed
- Adds `ROLLOUT_MODE` with `active`, `standby`, and `api-controlled` so v2 Monitors can run safely without claiming buckets until explicitly activated through the API.
- Preflight validates `vantage.id` coverage, duplicate-vantage safety, and active-discovery registry entries with static fallback semantics.
- Read-only smoke runs sampled `HEAD`+`legacy` smoke probes without writing incident state, runtime freshness, check history, WPCOM notifications, or legacy projection rows.
- Method comparison runs sampled `HEAD`/`GET` comparison probes and persists deltas in `jetmon_rollout_comparison_results` for rollout analysis.
- Staged policy migration updates `jetmon_site_check_config`, requires explicit cohort `size`, records previous values in `jetmon_rollout_policy_stage_rows`, and supports `pause`, `rollback-last-stage`, and `rollback-all` modes.
- Expands `jetmon2 api rollout guided` with session creation, final reconcile, transcript logging, resume state, run/change references, rollback release flow, safer sample defaults, and primitive rollout commands.
Example operator flow
Rollback uses the same guided surface with the `--rollback` flag.
After v2 owns the fleet and is stable, optional comparison and policy migration gates can be included with `--include-comparison` and `--include-policy-migration`.
Safety notes
- Activation requires `ROLLOUT_MODE=api-controlled`.
- Staged policy changes require an explicit `size` so a missing flag cannot migrate the full eligible range.
Validation
- `go test ./...`
- `make rollout-docs-verify`
- `git diff --check`
- `JETMON_API_CONFIG=off go run ./cmd/jetmon2 api rollout guided --bucket-min=0 --bucket-max=2 --dry-run --include-comparison --include-policy-migration`
- `JETMON_API_CONFIG=off go run ./cmd/jetmon2 api rollout guided --bucket-min=0 --bucket-max=2 --rollback --dry-run`
- `JETMON_API_CONFIG=off go run ./cmd/jetmon2 api commands --output table`