perf(cluster): speed up local deploy loop and harden fast path #367
Draft
johntmyers wants to merge 5 commits into main
Improve local DX by adding profile-aware fast Docker builds, deploy/baseline reporting, and resilient cluster bootstrap/deploy behavior so iterative redeploys are faster and more predictable on macOS/Linux. Made-with: Cursor
Avoid unnecessary helm upgrades and gateway rollouts when an explicit gateway deploy produces the same digest, and keep explicit-target state writes deterministic so subsequent auto deploys do not drift. Made-with: Cursor
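The digest comparison described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `gateway_reconcile_decision` is a hypothetical helper, and the digests are placeholder strings. The two report field names are taken from the results reported in this PR.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: compare the digest of the freshly built gateway image
# with the digest already deployed, and print report fields describing the
# decision. An empty built digest never counts as a match.
gateway_reconcile_decision() {
  # $1 = digest of the image just built, $2 = digest currently deployed
  if [ -n "$1" ] && [ "$1" = "$2" ]; then
    echo "gateway_reconcile_skipped=1"
    echo "gateway_reconcile_skip_reason=digest_already_deployed"
  else
    echo "gateway_reconcile_skipped=0"
  fi
}

# Placeholder digests, for illustration only.
gateway_reconcile_decision "sha256:aaa" "sha256:aaa"   # same digest: skip
gateway_reconcile_decision "sha256:bbb" "sha256:aaa"   # new digest: reconcile
```

A caller would run the helm upgrade and rollout only when the helper reports `gateway_reconcile_skipped=0`.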
Stop touching gateway and supervisor main sources in Docker builds so unrelated edits keep incremental cache hits while retaining proto/build.rs invalidation safety. Made-with: Cursor
Make fast deploy paths use bounded readiness checks and rollout timeouts, and add deterministic supervisor pod reconcile behavior so local deploy outcomes are reliable without regressing steady-state no-op speed. Made-with: Cursor
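A bounded readiness check of the kind this commit describes might look like the sketch below. The two environment variable names appear in this PR. The default values, the `wait_until_ready` helper, and the timeout message are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Variable names come from the PR description; the defaults are assumptions.
timeout_s="${DEPLOY_FAST_READINESS_TIMEOUT_SECONDS:-30}"
poll_s="${DEPLOY_FAST_READINESS_POLL_SECONDS:-1}"

# Hypothetical helper: run the given probe command repeatedly until it
# succeeds, or return 1 once the time budget is spent.
wait_until_ready() {
  deadline=$(( $(date +%s) + timeout_s ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "readiness check timed out after ${timeout_s}s" >&2
      return 1
    fi
    sleep "$poll_s"
  done
}

# Trivial probe for demonstration; it succeeds on the first attempt.
wait_until_ready true && echo "ready"
```

In a deploy script the probe could be a short `kubectl rollout status ... --timeout=...` call against a project-specific resource. The point of the design is that a stuck cluster fails the deploy after a known bound instead of hanging.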
Extend fast deploy classification to track cluster infrastructure paths and route those changes through full cluster bootstrap so local redeploys remain deterministic while preserving fast no-op runs. Made-with: Cursor
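The classification step can be pictured as a fingerprint comparison against recorded deploy state. The sketch below is an assumption-heavy illustration: `fingerprint` and `classify_deploy` are hypothetical helpers, the state file layout is invented, and `cksum` is used only because it is available on both macOS and Linux. The real scripts may hash and store state differently.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: produce one checksum covering every file under the
# given paths. Sorting keeps the result independent of find's output order.
fingerprint() {
  find "$@" -type f -exec cksum {} + | LC_ALL=C sort | cksum | cut -d' ' -f1
}

# Hypothetical helper: print "bootstrap" when the current fingerprint differs
# from the one recorded in the state file, otherwise "incremental".
classify_deploy() {
  state_file="$1"; shift
  current="$(fingerprint "$@")"
  previous="$(cat "$state_file" 2>/dev/null || true)"
  if [ "$current" != "$previous" ]; then
    echo "bootstrap"
  else
    echo "incremental"
  fi
}

# Demonstration in a throwaway directory with a stand-in infra file.
workdir="$(mktemp -d)"
mkdir "$workdir/infra"
echo "v1" > "$workdir/infra/cluster-bootstrap.sh"

classify_deploy "$workdir/state" "$workdir/infra"   # no recorded state: prints bootstrap
fingerprint "$workdir/infra" > "$workdir/state"     # record state after a deploy
classify_deploy "$workdir/state" "$workdir/infra"   # unchanged: prints incremental
rm -rf "$workdir"
```

Recording the fingerprint only after a successful deploy means a failed bootstrap is retried on the next run rather than being mistaken for a no-op.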
Summary
This WIP PR improves local cluster DX by reducing redeploy latency and adding observability so local changes are faster and more predictable to deploy.
Related Issue
N/A (WIP tracking PR).
Changes
- .cache/deploy-reports/.local-fast) and wired profile-aware Docker builds for gateway/cluster/supervisor paths.
- mapfile replacements) in local scripts.
- supervisor-export Docker stage and switched supervisor extraction to avoid heavy filesystem export overhead.
- --reuse-ok and updated docs.

Phase 1 Perf Results
Warm baseline (mise run cluster:baseline:warm:full)
- Steps: rust/cli_debug, docker/gateway_image, docker/supervisor_stage, docker/cluster_image, deploy/cluster_task
- Reports: .cache/perf/cluster-baseline-20260316-100450.md, .cache/perf/cluster-baseline-20260316-105244.md

Real source-change simulations (local-fast)
- tx-sim-gateway-auto-20260316-105844-6443: builds=31,helm=2,rollout=10,total=43
- tx-sim-supervisor-auto2-20260316-110345-69: supervisor_build_deploy=17,total=17

Key Phase 1 outcome: no-change redeploys are now near-instant, and real source-change loops are bounded to ~43s (gateway path) and ~17s (supervisor path) on current local hardware/cache state.
Phase 2 Results (digest skip + state determinism)
Digest-aware gateway no-op skip
- tx-phase2-gateway-skipcheck-20260316-115158-12390: builds=2,helm=0,rollout=0,total=3
- gateway_reconcile_skipped=1, gateway_reconcile_skip_reason=digest_already_deployed.

Explicit-target state sync validation
- tx-phase2-state-explicit-20260316-115340-12548: supervisor_build_deploy=95,total=95
- tx-phase2-state-auto-20260316-115626-32596: builds=0,total=1

Phase 3 Results (cache-friendly invalidation)
Source-change simulations (local-fast)
- tx-20260316-120903-15216: builds=15,helm=1,rollout=12,total=28
- tx-20260316-121226-32566: builds=20,helm=0,rollout=0,total=20
- tx-20260316-120949-29654: supervisor_build_deploy=104,total=104
- tx-20260316-121257-28650: supervisor_build_deploy=13,total=14
- 43s) to 20s with digest-aware rollout skip.
- 17s) to 14s.
- baseline-warm scope): .cache/perf/cluster-baseline-20260316-122121.md with cli=11s, gateway=26s, supervisor=10s, cluster=11s, deploy=32s.

Phase 4 Results (bounded readiness + supervisor reconcile)
Reliability validation runs (local-fast)
- tx-20260316-133137-30850: supervisor_build_deploy=125,supervisor_reconcile=0,readiness_gate=1,total=126
- tx-20260316-133957-30862: builds=150,helm=1,rollout=15,readiness_gate=1,total=168
- tx-20260316-134945-29515: builds=0,helm=0,rollout=0,readiness_gate=1,total=1
- DEPLOY_FAST_READINESS_TIMEOUT_SECONDS, DEPLOY_FAST_READINESS_POLL_SECONDS, DEPLOY_FAST_SUPERVISOR_RECONCILE, and DEPLOY_FAST_SUPERVISOR_RECONCILE_TIMEOUT_SECONDS.
- ~1s.

Phase 5 Results (iter-04 classifier expansion)
Cluster-infra classification validation
- cluster-bootstrap.sh) auto-escalates to bootstrap
- tx-20260316-142747-16072: mode=fast,recreated_cluster=1,built_cluster_image=1,total=202
- tx-20260316-145203-17082: builds=0,helm=0,rollout=0,readiness_gate=1,total=2
- cluster-deploy-fast.sh now tracks a dedicated cluster_infra fingerprint and writes it to deploy state.
- cluster-bootstrap.sh fast instead of attempting incremental fast deploy.
- crates/openshell-bootstrap.

Testing
- mise run pre-commit passes
- cargo check -p openshell-sandbox
- mise run cluster:baseline:warm:full

Checklist