Skip to content

ops(rolling-update): add GOMEMLIMIT=1800MiB + --memory=2500m defaults#617

Merged
bootjp merged 5 commits intomainfrom
ops/gomemlimit-memory-defaults
Apr 24, 2026
Merged

ops(rolling-update): add GOMEMLIMIT=1800MiB + --memory=2500m defaults#617
bootjp merged 5 commits intomainfrom
ops/gomemlimit-memory-defaults

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 24, 2026

Summary

Add two OOM-defense defaults to scripts/rolling-update.sh:

  • GOMEMLIMIT=1800MiB (via new DEFAULT_EXTRA_ENV, merged into the existing EXTRA_ENV plumbing)
  • --memory=2500m on the remote docker run (via new CONTAINER_MEMORY_LIMIT)

Both are env-var-controlled with empty-string opt-out (${VAR-default} so unset uses the default, but an explicit empty string disables it).

Motivation

2026-04-24 incident: all 4 live nodes were kernel-OOM-SIGKILLed 22-169 times in 24h under a traffic spike. Each kill risked WAL-tail truncation and triggered election storms, cascading into p99 GET spikes to 6-8s. The runtime defense was applied by hand during the incident; this PR makes it the script default so future rollouts inherit it.

  • GOMEMLIMIT — Go runtime GCs aggressively as heap approaches the limit, keeping RSS below the container ceiling.
  • --memory (cgroup hard limit) — if Go can't keep up (e.g. non-heap growth), the kill is scoped to the container, not host processes like qemu-guest-agent or systemd.

Behavior changes

Variable Default Opt-out
DEFAULT_EXTRA_ENV GOMEMLIMIT=1800MiB DEFAULT_EXTRA_ENV=""
CONTAINER_MEMORY_LIMIT 2500m CONTAINER_MEMORY_LIMIT=""

Operator-supplied EXTRA_ENV keys override matching keys in DEFAULT_EXTRA_ENV (e.g., EXTRA_ENV="GOMEMLIMIT=3000MiB" wins over the default).

Related

Companion PRs (defense-in-depth):

Test plan

  • bash -n scripts/rolling-update.sh passes
  • Deployed equivalents manually on all 4 live nodes during the incident (2026-04-24T07:44Z - 07:46Z); no OOM recurrence since
  • Next rolling-update invocation should produce docker run ... --memory=2500m ... -e GOMEMLIMIT=1800MiB ... on each node

Design doc reference

docs/design/2026_04_24_proposed_resilience_roadmap.md (item 1 — capacity/runtime defenses).

Prevents kernel OOM-SIGKILL cascades observed on 2026-04-24 where all
4 live nodes were killed 22-169 times in 24h under a traffic spike,
corrupting WAL tails and triggering election storms. GOMEMLIMIT makes
Go GC before the ceiling; --memory keeps any surviving OOM scoped to
the container (not qemu-guest-agent / systemd).
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 27 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 27 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 58bf572c-4b20-4bab-966f-5123aac29b57

📥 Commits

Reviewing files that changed from the base of the PR and between 538f4f6 and 75db3bf.

📒 Files selected for processing (2)
  • scripts/rolling-update.env.example
  • scripts/rolling-update.sh
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ops/gomemlimit-memory-defaults

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements OOM defenses by adding configurable container memory limits and Go runtime memory ceilings (GOMEMLIMIT) to the rolling update scripts. It introduces a merging mechanism for environment variables to ensure user overrides are respected. A potential compatibility issue was identified regarding the use of Bash associative arrays, which may fail on systems running Bash versions older than 4.0, such as macOS.

Comment thread scripts/rolling-update.sh Outdated
merge_extra_env() {
local defaults="$1"
local user="$2"
local -A seen=()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of associative arrays (local -A) requires Bash 4.0 or newer. While most modern Linux distributions ship with Bash 4+, the default version of Bash on macOS is still 3.2. If this script is intended to be run from a developer's local machine (which is common for rolling update scripts), this will cause a syntax error on macOS. Consider using a more compatible approach for de-duplication if macOS support is required, or explicitly document the Bash 4 requirement.

merge_extra_env used `local -A seen=()` which requires Bash 4+. macOS
ships Bash 3.2 by default, so operators invoking the script from a
Mac workstation hit a syntax error on source (Gemini MEDIUM feedback).

Replace the associative array with a space-padded string + substring
match (" KEY "). The EXTRA_ENV list is a handful of entries, so the
linear lookup cost is trivial.

Verified behaviourally unchanged: the five merge cases (defaults+user
disjoint, user overrides default, default only, user only, both empty)
all produce the same output as the Bash-4 implementation.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 24, 2026

Addressed review feedback in a3f0cf0:

  • MEDIUM: replaced local -A seen (Bash 4+ only, macOS default is 3.2) with a space-padded string + substring match. EXTRA_ENV is typically a handful of entries; the linear lookup cost is trivial.
  • Verified the 5 merge cases (defaults+user disjoint, user overrides default, default only, user only, both empty) produce identical output to the previous Bash-4 implementation.

/gemini review
@codex review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces OOM defenses for the elastickv service by implementing a soft memory limit via GOMEMLIMIT and a hard container memory limit using Docker's --memory flag. It adds a merge_extra_env utility to the rolling update script to manage environment variable overrides. Review feedback identifies potential script termination issues in specific Bash environments when using read -a with empty here-strings under set -e, providing suggestions to guard these operations.

Comment thread scripts/rolling-update.sh Outdated
local -a default_pairs=()
local pair key seen=" " merged=""

read -r -a user_pairs <<< "$user"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using read -a with a here-string (<<<) on an empty variable will return a non-zero exit status in some Bash versions (like 3.2 on macOS) when set -e is active, causing the script to terminate prematurely if EXTRA_ENV is unset. Since set -euo pipefail is active at the top of the script, this is a high-risk operation. It is safer to check if the variable is non-empty before attempting to read it into an array.

Suggested change
read -r -a user_pairs <<< "$user"
if [[ -n "$user" ]]; then
read -r -a user_pairs <<< "$user"
fi

Comment thread scripts/rolling-update.sh Outdated
seen+="${key} "
done

read -r -a default_pairs <<< "$defaults"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the user variable, reading from defaults via a here-string can trigger set -e if the variable is empty. While DEFAULT_EXTRA_ENV has a non-empty default in this PR, a user could explicitly set it to an empty string to opt out, which would then crash the script here.

Suggested change
read -r -a default_pairs <<< "$defaults"
if [[ -n "$defaults" ]]; then
read -r -a default_pairs <<< "$defaults"
fi

…et -e

Gemini HIGH (line 919 + 927): on Bash 3.2 (macOS default, which this
script must stay compatible with after the previous fix) `read -a`
on an empty here-string returns non-zero, and `set -euo pipefail`
then terminates the script. Skip the read when the source string is
empty — the empty array is the intended result either way.

Verified under `set -euo pipefail` with all five merge cases
(both empty, only default, only user, user overrides default,
disjoint user + default) — all produce the same output as before.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 24, 2026

Addressed in edd2a91:

  • Gemini HIGH (lines 919, 927): guarded both read -a here-strings with [[ -n "$var" ]] so empty inputs don't trip set -e under Bash 3.2. Verified all 5 merge cases still produce identical output under set -euo pipefail.

/gemini review
@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edd2a9106e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/rolling-update.sh
Comment on lines +937 to +938
[[ -n "$pair" ]] || continue
[[ "$pair" == *=* ]] || continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate malformed DEFAULT_EXTRA_ENV entries

merge_extra_env silently drops default tokens that don’t contain = and only forwards the remaining pairs for downstream validation, so a typo in DEFAULT_EXTRA_ENV (for example GOMEMLIMIT1800MiB) makes the rollout continue without the intended safeguard and without any error. This differs from EXTRA_ENV, where malformed entries fail fast, and can hide configuration mistakes in production rollouts.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces OOM (Out-Of-Memory) defenses for the elastickv service by implementing container-scoped memory limits and Go runtime memory limits (GOMEMLIMIT). It adds a merge_extra_env function to the rolling-update.sh script to handle merging default and user-provided environment variables while ensuring user overrides are respected. Feedback suggests explicitly setting the IFS variable when using read -a to ensure robust string splitting across different environments.

Comment thread scripts/rolling-update.sh Outdated
# read when the source string is empty — the empty array is the
# intended result either way.
if [[ -n "$user" ]]; then
read -r -a user_pairs <<< "$user"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The read -a command relies on the current value of IFS to split the input string. While the script does not appear to modify IFS globally, it is safer and more robust to explicitly set IFS locally for this command to ensure it always splits on whitespace as intended, especially since EXTRA_ENV is documented as a whitespace-separated list.

Suggested change
read -r -a user_pairs <<< "$user"
IFS=$' \t\n' read -r -a user_pairs <<< "$user"

Comment thread scripts/rolling-update.sh Outdated
done

if [[ -n "$defaults" ]]; then
read -r -a default_pairs <<< "$defaults"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the previous comment, explicitly setting IFS here ensures consistent behavior when parsing the default environment variables.

Suggested change
read -r -a default_pairs <<< "$defaults"
IFS=$' \t\n' read -r -a default_pairs <<< "$defaults"

…TRA_ENV

- Gemini Medium (lines 924, 934): set IFS=$' \t\n' per read -a call so
  the split is stable regardless of any surrounding IFS mutation.
- Codex P2 (line 938): validate DEFAULT_EXTRA_ENV entries explicitly.
  Unlike user-supplied EXTRA_ENV (forgivable typo, silently dropped
  for compatibility with existing deploy.env habits), DEFAULT_EXTRA_ENV
  is where we install safeguards like GOMEMLIMIT; a typo there means
  the safeguard is silently absent. Fail the merge with a clear
  message instead.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 24, 2026

Round 3 in 941ed32:

  • Gemini Medium ×2: explicit IFS=$' \t\n' on both read -a calls, insulating the split from any surrounding IFS.
  • Codex P2: DEFAULT_EXTRA_ENV malformed-entry check fails loudly. User EXTRA_ENV keeps its forgiving behaviour; DEFAULT_EXTRA_ENV is where we install safeguards so a typo must not silently drop them.

/gemini review
@codex review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Out-Of-Memory (OOM) defenses for the elastickv service by implementing a default GOMEMLIMIT and a configurable CONTAINER_MEMORY_LIMIT for Docker containers. The changes include updates to the environment example file, documentation in the usage section, and a new merge_extra_env Bash function to safely combine default and user-provided environment variables. Review feedback identified a potential issue where the read command could cause the script to exit prematurely under set -e if environment variables contain only whitespace; suggestions were provided to append || true to these commands.

Comment thread scripts/rolling-update.sh
# IFS is explicitly set per-read so a caller's surrounding IFS
# doesn't change how DEFAULT_EXTRA_ENV / EXTRA_ENV are split.
if [[ -n "$user" ]]; then
IFS=$' \t\n' read -r -a user_pairs <<< "$user"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Bash, the read command returns a non-zero exit code if it reaches EOF without reading any fields. When set -e is active (as it is here), this will cause the script to exit prematurely if EXTRA_ENV (or user) contains only whitespace (e.g., a trailing space or newline in the environment file). Since [[ -n "$user" ]] is true for a string containing only spaces, the read command will be executed and fail.

Appending || true (or || :) ensures the script continues with an empty array, which is the desired behavior for whitespace-only input.

Suggested change
IFS=$' \t\n' read -r -a user_pairs <<< "$user"
IFS=$' \t\n' read -r -a user_pairs <<< "$user" || true

Comment thread scripts/rolling-update.sh
done

if [[ -n "$defaults" ]]; then
IFS=$' \t\n' read -r -a default_pairs <<< "$defaults"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the issue with user_pairs, this read command will trigger an exit under set -e if DEFAULT_EXTRA_ENV is set to a non-empty string that contains only whitespace. While less likely for a default value, it's safer to handle the potential non-zero exit code.

Suggested change
IFS=$' \t\n' read -r -a default_pairs <<< "$defaults"
IFS=$' \t\n' read -r -a default_pairs <<< "$defaults" || true

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 941ed32dd8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/rolling-update.sh
Comment on lines +943 to +944
if [[ "$pair" != *=* ]]; then
echo "rolling-update: malformed DEFAULT_EXTRA_ENV entry '$pair' (expected KEY=VALUE)" >&2
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject empty-key DEFAULT_EXTRA_ENV entries early

The new DEFAULT_EXTRA_ENV validation only checks whether a token contains =, so an entry like =1800MiB is treated as valid here even though it is not a KEY=VALUE pair. That malformed default is then rejected later by run_container’s key regex check, but at that point the update has already stopped the node’s container, turning a config typo into rollout disruption. Validate a non-empty/valid key in this early guard so bad defaults fail before any node is touched.

Useful? React with 👍 / 👎.

- Codex P2 (rolling-update.sh:944, new angle over round 3): the prior
  validator only rejected entries without any equals sign; an entry
  like =1800MiB satisfied the *=* match and was let through, only to
  be rejected later by run_container key regex. Fail the merge
  explicitly for empty-key entries at the same point we fail on
  missing equals sign, so a malformed safeguard is caught with a
  clear message.

The round-3 Gemini Medium about read -a / set -e was verified against
bash 3.2 (macOS default): the existing [[ -n "$user" ]] guard prevents
the empty-here-string case and whitespace-only input reads successfully
into an empty array under bash 3.2. Not changing the || true shape the
reviewer suggested; the guard is sufficient and appending || true
would swallow real read failures.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 24, 2026

Round 4 in 75db3bf:

  • Codex P2 (rolling-update.sh:944, new angle over round 3): the prior validator only rejected entries without any =; an entry like =1800MiB satisfied the *=* match and was let through, only to be rejected later by run_container's key regex. Now failing the merge explicitly for empty-key entries at the same point we fail on missing =, so a malformed safeguard is caught with a clear message.

Declining the round-4 Gemini Medium comments on lines 926 and 936 (|| true for read -a under set -e). Verified against bash 3.2 (macOS default): the existing [[ -n "$user" ]] / [[ -n "$defaults" ]] guard prevents the empty-here-string case; whitespace-only input reads successfully into an empty array on both bash 3.2 and 5.2. Appending || true would also swallow real read failures, which is the opposite of what we want. No change.

Build: shell syntax check (bash -n) passes; sourced-and-called tests for =1800MiB, valid FOO=bar, empty defaults, and whitespace-only defaults all behave correctly under bash 3.2.

/gemini review
@codex review

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thank you for the detailed update and for verifying the behavior against Bash 3.2. I appreciate the thorough testing of the merge logic and the rationale for declining the || true suggestion. Given that the [[ -n "$user" ]] / [[ -n "$defaults" ]] guards effectively prevent the empty-here-string issue, your approach is robust and maintains the desired error-handling behavior. The implementation looks solid.

@bootjp bootjp merged commit 488250c into main Apr 24, 2026
8 checks passed
@bootjp bootjp deleted the ops/gomemlimit-memory-defaults branch April 24, 2026 16:28
bootjp added a commit that referenced this pull request Apr 24, 2026
… starvation (#619)

## Summary
Design doc (only — no code in this PR) for a four-layer
workload-isolation model, prompted by the 2026-04-24 incident's
afternoon phase.

**Problem:** Today, one client host with 37 connections running a tight
XREAD loop consumed 14 CPU cores on the leader via `loadStreamAt →
unmarshalStreamValue → proto.Unmarshal` (81% of CPU per pprof). Raft
goroutines couldn't get CPU → step_queue_full = 75,692 on the leader (vs
0-119 on followers) → Raft commit p99 jumped to 6-10s, Lua p99 stuck at
6-8s. Follower replication was healthy (applied-index within 34 of
leader); the damage was entirely CPU-scheduling on the leader.

**Gap:** elastickv has no explicit workload-class isolation. Go's
scheduler treats every goroutine equally; a single heavy command path
can starve unrelated paths (raft, lease, Lua, GET/SET).

## Four-layer defense model
- **Layer 1 — heavy-command worker pool**: gate XREAD / KEYS / SCAN /
Lua onto a bounded pool (~`2 × GOMAXPROCS`); reply `-BUSY` when full.
Cheap commands keep their own fast path.
- **Layer 2 — locked OS threads for raft**: `runtime.LockOSThread()` on
the Ready loop + dispatcher lanes so the Go scheduler can't starve them.
**Not v1** — only if measurement after Layer 1 + 4 still shows
`step_queue_full > 0`.
- **Layer 3 — per-client admission control**: per-peer-IP connection cap
(default 8). Extends, doesn't replace, roadmap item 6's global in-flight
semaphore.
- **Layer 4 — XREAD O(N) → O(new)**: entry-per-key layout
(`!redis|stream|<key>|entry|<id>`) with range-scan, dual-read migration
fallback, legacy-removal gated on
`elastickv_stream_legacy_format_reads_total == 0`. Hashes/sets/zsets
share the same one-blob pattern and are called out as follow-up.

## Recommended sequencing
Layer 4 (correctness bug, concentrated change) → Layer 1 (generic
defense for next unknown hotspot) → Layer 3 (reconcile with roadmap item
6) → Layer 2 (only if forced by measurement).

## Relationship to other in-flight work
- Complements (does not replace)
`docs/design/2026_04_24_proposed_resilience_roadmap.md` item 6
(admission control). This doc's Layer 3 focuses on per-client fairness;
the roadmap's item 6 is global in-flight capping. Both are needed.
- Consistent with memwatch (#612): Layer 3 admission threshold should
fire **before** memwatch's shutdown threshold — flagged as an open
question in the doc.
- Assumes WAL auto-repair (#613), GOMEMLIMIT defaults (#617) are landed
so the cluster survives long enough to matter.

## Open questions called out in the doc
- Static vs dynamic command classification (Layer 1)
- `-BUSY` backoff semantics — how do we avoid client retry spinning
becoming the new hot loop?
- Number of locked OS threads on variable-core hosts (Layer 2)
- Stream migration soak window before removing legacy-format fallback
(Layer 4, currently 30 days, arbitrary)

## Deliverable
`docs/design/2026_04_24_proposed_workload_isolation.md` — 446 lines,
dated-prefix / `**Status: Proposed**` convention matching the rest of
`docs/design/`. No code.

## Test plan
- [x] File paths and function references in the doc spot-checked against
`origin/main`
- [x] Cross-references to `2026_04_24_proposed_resilience_roadmap.md`
reconciled (complements, doesn't duplicate)
- [ ] Design review — decide on the open questions before implementing
Layer 4 (which blocks Layer 1 on XREAD specifically)
bootjp added a commit that referenced this pull request Apr 24, 2026
- Gemini Medium (design §2.6 line 146): the design doc contradicted
  the runbook by claiming dry-runs do NOT need approval. GitHub's
  environment-protection rules cannot be made conditional on workflow
  inputs, so `environment: production` pauses BOTH dry-run and
  non-dry-run executions in v1. Aligned the design-doc wording with
  the runbook and cross-referenced §4 for the second-environment
  upgrade path.
- Gemini Medium (runbook §6 line 164 env list): the list of
  reconstruction vars was incomplete. Listed every env var the
  workflow actually exports (IMAGE, DATA_DIR, RAFT_PORT, REDIS_PORT,
  S3_PORT, ENABLE_S3, NODES, SSH_TARGETS, EXTRA_ENV) and called out
  the script-level defaults for anything not overridden, plus noted
  GOMEMLIMIT / CONTAINER_MEMORY_LIMIT are propagated via EXTRA_ENV
  once PR #617 lands.
- Gemini Medium (runbook §6 line 178 idempotency): corrected the
  "stop+recreate every node regardless" claim. The script
  (scripts/rolling-update.sh:794-798) skips nodes whose running
  image id matches the target AND whose gRPC endpoint is healthy,
  so re-running after a partial roll is safe because already-rolled
  nodes are no-ops, not stops.

Declining again Gemini HIGH "workflow file missing" — the file IS
in this PR at .github/workflows/rolling-update.yml; this is the
fourth round the bot has flagged its own misread. See prior rounds
for rationale; no change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant