scripts: forward ADMIN_* env vars to remote SSH heredoc (follow-up to #669)#678
scripts: forward ADMIN_* env vars to remote SSH heredoc (follow-up to #669)#678
Conversation
The admin-flag plumbing I added in cadffe4 had a real bug: build_admin_flags lives inside the remote SSH heredoc (the `bash -s <<'REMOTE'` block in update_one_node), but the env block that forwards values from local to remote did not include any ADMIN_* variable. With `set -u` active inside the heredoc, the first access of `${ADMIN_ENABLED}` inside build_admin_flags would have crashed every rollout that landed on a node where ADMIN_ENABLED was unset on the remote — i.e., all of them, because the local defaults at line 172 only populate the local control shell, not the SSH target. The bug only fires when run_container is reached on a remote node (every rollout), so a deploy attempt with ADMIN_ENABLED=false also crashed: build_admin_flags was invoked unconditionally at line 800. The script was effectively unusable in this state. Fix is two-part: 1. **Forward every ADMIN_* through env**. Added 9 ADMIN_* variables to the env block in update_one_node, alongside the existing IMAGE / RAFT_PORT / EXTRA_ENV forwarding. Path-like values get printf %q quoting at the bottom of the local script (matches the existing _Q variants for RAFT_TO_REDIS_MAP, RAFT_TO_S3_MAP, etc.). The three boolean flags (ADMIN_ENABLED, ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK, ADMIN_ALLOW_INSECURE_DEV_COOKIE) are forwarded unquoted for readability — but only after a local validation pass that rejects anything other than the literal "true" / "false" so the unquoted forwarding stays metacharacter-safe. 2. **Defense-in-depth `:-` defaults inside build_admin_flags**. Even with explicit env forwarding, a future refactor that drops one of the variables would surface as an opaque "unbound variable" crash with no hint at which variable. The helper now reads each variable through `${VAR:-}` once at the top, then refers to the locals — so a missing forward would produce the targeted "ADMIN_* required" error instead. Smoke-tested with three cases: - ADMIN_ENABLED=invalid → "must be 'true' or 'false'" - ADMIN_ALLOW_PLAINTEXT_…=yes → same validator catches it - ADMIN_ENABLED=true (no key) → reaches the remote branch (where the remote build_admin_flags would 'aborting' on the missing signing key) `bash -n scripts/rolling-update.sh` passes.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 2m 12s —— View job Code Review
Overall: LGTM with one minor nit. The fix correctly identifies and resolves the The Core FixThe root cause analysis is accurate. Validation loop (
|
There was a problem hiding this comment.
Code Review
This pull request enhances the rolling-update.sh script by introducing validation for ADMIN_* boolean flags, applying shell-safe escaping to configuration variables, and updating the build_admin_flags function with local variable defaults to prevent unbound variable errors. The review feedback recommends further refactoring build_admin_flags to consistently use local variables for all ADMIN_* parameters, ensuring a uniform defense-in-depth approach.
| if [[ "${ADMIN_ENABLED:-false}" != "true" ]]; then | ||
| return 0 | ||
| fi | ||
|
|
||
| if [[ -z "${ADMIN_SESSION_SIGNING_KEY_FILE}" ]]; then | ||
| local signing_key="${ADMIN_SESSION_SIGNING_KEY_FILE:-}" | ||
| local full_keys="${ADMIN_FULL_ACCESS_KEYS:-}" | ||
| local read_only_keys="${ADMIN_READ_ONLY_ACCESS_KEYS:-}" | ||
| local previous_key="${ADMIN_SESSION_SIGNING_KEY_PREVIOUS_FILE:-}" | ||
| local admin_listen="${ADMIN_ADDRESS:-}" | ||
| local tls_cert="${ADMIN_TLS_CERT_FILE:-}" | ||
| local tls_key="${ADMIN_TLS_KEY_FILE:-}" |
There was a problem hiding this comment.
For consistency and to fully adhere to the "defense-in-depth" strategy described in the function's comment and the PR summary, all ADMIN_* variables should be read into local variables with defaults at the start of the function. This ensures that even if a variable is missing from the env block in a future refactor, the script fails gracefully with a clear error message instead of an unbound variable crash.
| if [[ "${ADMIN_ENABLED:-false}" != "true" ]]; then | |
| return 0 | |
| fi | |
| if [[ -z "${ADMIN_SESSION_SIGNING_KEY_FILE}" ]]; then | |
| local signing_key="${ADMIN_SESSION_SIGNING_KEY_FILE:-}" | |
| local full_keys="${ADMIN_FULL_ACCESS_KEYS:-}" | |
| local read_only_keys="${ADMIN_READ_ONLY_ACCESS_KEYS:-}" | |
| local previous_key="${ADMIN_SESSION_SIGNING_KEY_PREVIOUS_FILE:-}" | |
| local admin_listen="${ADMIN_ADDRESS:-}" | |
| local tls_cert="${ADMIN_TLS_CERT_FILE:-}" | |
| local tls_key="${ADMIN_TLS_KEY_FILE:-}" | |
| local enabled="${ADMIN_ENABLED:-false}" | |
| if [[ "$enabled" != "true" ]]; then | |
| return 0 | |
| fi | |
| local signing_key="${ADMIN_SESSION_SIGNING_KEY_FILE:-}" | |
| local full_keys="${ADMIN_FULL_ACCESS_KEYS:-}" | |
| local read_only_keys="${ADMIN_READ_ONLY_ACCESS_KEYS:-}" | |
| local previous_key="${ADMIN_SESSION_SIGNING_KEY_PREVIOUS_FILE:-}" | |
| local admin_listen="${ADMIN_ADDRESS:-}" | |
| local tls_cert="${ADMIN_TLS_CERT_FILE:-}" | |
| local tls_key="${ADMIN_TLS_KEY_FILE:-}" | |
| local allow_plaintext="${ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK:-false}" | |
| local insecure_cookie="${ADMIN_ALLOW_INSECURE_DEV_COOKIE:-false}" |
| if [[ "${ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK:-false}" == "true" ]]; then | ||
| _flags+=(--adminAllowPlaintextNonLoopback) | ||
| fi | ||
| if [[ "${ADMIN_ALLOW_INSECURE_DEV_COOKIE}" == "true" ]]; then | ||
| if [[ "${ADMIN_ALLOW_INSECURE_DEV_COOKIE:-false}" == "true" ]]; then | ||
| _flags+=(--adminAllowInsecureDevCookie) | ||
| fi |
There was a problem hiding this comment.
Use the local variables defined at the top of the function for consistency and to avoid direct references to potentially unbound global variables.
| if [[ "${ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK:-false}" == "true" ]]; then | |
| _flags+=(--adminAllowPlaintextNonLoopback) | |
| fi | |
| if [[ "${ADMIN_ALLOW_INSECURE_DEV_COOKIE}" == "true" ]]; then | |
| if [[ "${ADMIN_ALLOW_INSECURE_DEV_COOKIE:-false}" == "true" ]]; then | |
| _flags+=(--adminAllowInsecureDevCookie) | |
| fi | |
| if [[ "$allow_plaintext" == "true" ]]; then | |
| _flags+=(--adminAllowPlaintextNonLoopback) | |
| fi | |
| if [[ "$insecure_cookie" == "true" ]]; then | |
| _flags+=(--adminAllowInsecureDevCookie) | |
| fi |
Two gemini medium findings on PR #678 caught that the defense-in-depth localization I added was incomplete: ADMIN_SESSION_SIGNING_KEY_FILE et al got `${VAR:-}` defaults into locals at the top of the helper, but ADMIN_ENABLED and the two ADMIN_ALLOW_* booleans were still accessed directly from the calling environment further down in the function. The gap defeated the comment's own claim. If a future refactor ever drops one of those three booleans from the env forwarding, `set -u` would crash on `${ADMIN_ENABLED}` (and the code path below would silently fall through to defaults for the two ALLOW_* flags, masking the misconfiguration). The point of the local-with-default pattern is that every ADMIN_* reference goes through one place where the safety net is guaranteed. Localized all nine into `enabled`, `signing_key`, `full_keys`, `read_only_keys`, `previous_key`, `admin_listen`, `tls_cert`, `tls_key`, `allow_plaintext`, `insecure_cookie`. The two ALLOW_* check sites at the bottom now read the locals instead of re-fetching the globals — same value, but consistent with the rest of the helper and the comment's contract. No behaviour change for any valid input. Smoke-tested both boolean validators (`ADMIN_ENABLED=invalid` and `ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK=yes`) — local script-level errors still fire with the targeted message before reaching update_one_node.
|
@claude review Addressed both gemini medium findings (
The previous shape mixed localized path vars with direct global access on the booleans, which defeated the helper's own defense-in-depth comment — Latest commit: 3fe2c45. |
|
Claude finished @bootjp's task in 1m 11s —— View job Code Review
LGTM. The gemini findings are fully and correctly addressed. All 9 What the latest commit doesThe previous shape of
The result: all 9 One outstanding nit (cosmetic, carry-over from the previous review)
There are three boolean flags: This is cosmetic — it doesn't affect correctness. Fine to fold into this PR or defer. Result: Approve. The gemini findings are fully resolved; the defense-in-depth is now complete and consistent with its own documentation. |
## Summary State-refresh of `docs/design/2026_04_24_partial_admin_dashboard.md` now that all four phases of the original P1–P4 plan have shipped. | Phase | Was | Now | |---|---|---| | P1 | shipped | shipped (no change) | | P2 | partial — read-only #658 only | **shipped** (+ #669 writes + #673 forwarding) | | P3 | shipped | shipped (no change) | | P4 | mostly | **shipped** (+ operator doc #674, deployment runbook #669, script wiring #669+#678) | Also added the **AdminDeleteBucket TOCTOU** to Outstanding open items — coderabbitai flagged it during PR #669 review (pre-existing race, `s3.go:deleteBucket` inherits the same shape; recorded as a code comment on `AdminDeleteBucket` for the future fix). Doc stays at `_partial_` rather than promoting to `_implemented_` because two outstanding items remain: AdminForward acceptance criterion 5 (rolling-upgrade flag, explicitly deferred at design time) and the new TOCTOU. Updated the closing rename trigger to cover both. ## Test plan - [x] No design changes — this is a state refresh - [x] Status table entries cross-checked against the merge commits on main - [x] AdminDeleteBucket TOCTOU description matches the code comment in adapter/s3_admin.go and the operator-side workaround in docs/admin_deployment.md
Summary
Follow-up to #669 (now merged). The admin-flag plumbing I added in
that PR has a real bug I caught while re-reading the script: every
rollout would crash on the first remote node because the
ADMIN_*variables were never forwarded across the SSH heredoc.
What was broken
build_admin_flagslives inside the remote SSH heredoc inupdate_one_node(bash -s <<'REMOTE'), but theenvblock thatseeds the remote shell only forwarded the existing
IMAGE/RAFT_PORT/EXTRA_ENV/ etc. variables — noADMIN_*. Withset -uactive on the remote, the first access of${ADMIN_ENABLED}insidebuild_admin_flagscrashes the rolloutregardless of whether admin is enabled (the helper is invoked
unconditionally from
run_container).So an operator running this script after #669 with the default
ADMIN_ENABLED=falsewould have seenADMIN_ENABLED: unbound variableon the first node touched, leaving at most one node restarted but
otherwise the cluster intact (the per-node health check would
exit non-zero before moving on).
Fix
Forward all 9
ADMIN_*variables throughenv, alongsidethe existing forwarding pattern. Path-like values (
*_FILE,*_KEYS,ADDRESS) getprintf %qquoting at the bottom ofthe local script (matches the existing
RAFT_TO_REDIS_MAP_Qetc. pattern). The three boolean flags (
ADMIN_ENABLED,ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK,ADMIN_ALLOW_INSECURE_DEV_COOKIE)are forwarded unquoted for readability — but only after a
local validation pass that rejects anything other than the
literal
true/falseso the unquoted forwarding staysmetacharacter-safe.
Defense-in-depth
:-defaults insidebuild_admin_flags.Every
ADMIN_*reference inside the helper now reads through${VAR:-}once at the top into a local. A future refactorthat ever drops one of the forwarded variables will produce
the targeted "ADMIN_* required" error instead of an opaque
unbound variablecrash with no hint at which variable.Test plan
bash -n scripts/rolling-update.sh— passesADMIN_ENABLED=invalid bash scripts/rolling-update.sh→"must be 'true' or 'false', got 'invalid'"
ADMIN_ALLOW_PLAINTEXT_NON_LOOPBACK=yes→ same validator catches itADMIN_ENABLED=true(no signing key) → reaches the remote branchwhere build_admin_flags would 'aborting' on the missing key
ADMIN_ENABLED=true(operator to verify before merging intoa production deploy.env)
ADMIN_ENABLED=false(the previouslybroken default path)
Summary by CodeRabbit