ops(rolling-update): add GOMEMLIMIT=1800MiB + --memory=2500m defaults#617
ops(rolling-update): add GOMEMLIMIT=1800MiB + --memory=2500m defaults#617
Conversation
Prevents kernel OOM-SIGKILL cascades observed on 2026-04-24 where all 4 live nodes were killed 22-169 times in 24h under a traffic spike, corrupting WAL tails and triggering election storms. GOMEMLIMIT makes Go GC before the ceiling; --memory keeps any surviving OOM scoped to the container (not qemu-guest-agent / systemd).
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 27 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements OOM defenses by adding configurable container memory limits and Go runtime memory ceilings (GOMEMLIMIT) to the rolling update scripts. It introduces a merging mechanism for environment variables to ensure user overrides are respected. A potential compatibility issue was identified regarding the use of Bash associative arrays, which may fail on systems running Bash versions older than 4.0, such as macOS.
| merge_extra_env() { | ||
| local defaults="$1" | ||
| local user="$2" | ||
| local -A seen=() |
There was a problem hiding this comment.
The use of associative arrays (local -A) requires Bash 4.0 or newer. While most modern Linux distributions ship with Bash 4+, the default version of Bash on macOS is still 3.2. If this script is intended to be run from a developer's local machine (which is common for rolling update scripts), this will cause a syntax error on macOS. Consider using a more compatible approach for de-duplication if macOS support is required, or explicitly document the Bash 4 requirement.
merge_extra_env used `local -A seen=()` which requires Bash 4+. macOS
ships Bash 3.2 by default, so operators invoking the script from a
Mac workstation hit a syntax error on source (Gemini MEDIUM feedback).
Replace the associative array with a space-padded string + substring
match (" KEY "). The EXTRA_ENV list is a handful of entries, so the
linear lookup cost is trivial.
Verified behaviourally unchanged: the five merge cases (defaults+user
disjoint, user overrides default, default only, user only, both empty)
all produce the same output as the Bash-4 implementation.
|
Addressed review feedback in a3f0cf0:
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces OOM defenses for the elastickv service by implementing a soft memory limit via GOMEMLIMIT and a hard container memory limit using Docker's --memory flag. It adds a merge_extra_env utility to the rolling update script to manage environment variable overrides. Review feedback identifies potential script termination issues in specific Bash environments when using read -a with empty here-strings under set -e, providing suggestions to guard these operations.
| local -a default_pairs=() | ||
| local pair key seen=" " merged="" | ||
|
|
||
| read -r -a user_pairs <<< "$user" |
There was a problem hiding this comment.
Using read -a with a here-string (<<<) on an empty variable will return a non-zero exit status in some Bash versions (like 3.2 on macOS) when set -e is active, causing the script to terminate prematurely if EXTRA_ENV is unset. Since set -euo pipefail is active at the top of the script, this is a high-risk operation. It is safer to check if the variable is non-empty before attempting to read it into an array.
| read -r -a user_pairs <<< "$user" | |
| if [[ -n "$user" ]]; then | |
| read -r -a user_pairs <<< "$user" | |
| fi |
| seen+="${key} " | ||
| done | ||
|
|
||
| read -r -a default_pairs <<< "$defaults" |
There was a problem hiding this comment.
Similar to the user variable, reading from defaults via a here-string can trigger set -e if the variable is empty. While DEFAULT_EXTRA_ENV has a non-empty default in this PR, a user could explicitly set it to an empty string to opt out, which would then crash the script here.
| read -r -a default_pairs <<< "$defaults" | |
| if [[ -n "$defaults" ]]; then | |
| read -r -a default_pairs <<< "$defaults" | |
| fi |
…et -e Gemini HIGH (line 919 + 927): on Bash 3.2 (macOS default, which this script must stay compatible with after the previous fix) `read -a` on an empty here-string returns non-zero, and `set -euo pipefail` then terminates the script. Skip the read when the source string is empty — the empty array is the intended result either way. Verified under `set -euo pipefail` with all five merge cases (both empty, only default, only user, user overrides default, disjoint user + default) — all produce the same output as before.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edd2a9106e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| [[ -n "$pair" ]] || continue | ||
| [[ "$pair" == *=* ]] || continue |
There was a problem hiding this comment.
Validate malformed DEFAULT_EXTRA_ENV entries
merge_extra_env silently drops default tokens that don’t contain = and only forwards the remaining pairs for downstream validation, so a typo in DEFAULT_EXTRA_ENV (for example GOMEMLIMIT1800MiB) makes the rollout continue without the intended safeguard and without any error. This differs from EXTRA_ENV, where malformed entries fail fast, and can hide configuration mistakes in production rollouts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces OOM (Out-Of-Memory) defenses for the elastickv service by implementing container-scoped memory limits and Go runtime memory limits (GOMEMLIMIT). It adds a merge_extra_env function to the rolling-update.sh script to handle merging default and user-provided environment variables while ensuring user overrides are respected. Feedback suggests explicitly setting the IFS variable when using read -a to ensure robust string splitting across different environments.
| # read when the source string is empty — the empty array is the | ||
| # intended result either way. | ||
| if [[ -n "$user" ]]; then | ||
| read -r -a user_pairs <<< "$user" |
There was a problem hiding this comment.
The read -a command relies on the current value of IFS to split the input string. While the script does not appear to modify IFS globally, it is safer and more robust to explicitly set IFS locally for this command to ensure it always splits on whitespace as intended, especially since EXTRA_ENV is documented as a whitespace-separated list.
| read -r -a user_pairs <<< "$user" | |
| IFS=$' \t\n' read -r -a user_pairs <<< "$user" |
| done | ||
|
|
||
| if [[ -n "$defaults" ]]; then | ||
| read -r -a default_pairs <<< "$defaults" |
…TRA_ENV - Gemini Medium (lines 924, 934): set IFS=$' \t\n' per read -a call so the split is stable regardless of any surrounding IFS mutation. - Codex P2 (line 938): validate DEFAULT_EXTRA_ENV entries explicitly. Unlike user-supplied EXTRA_ENV (forgivable typo, silently dropped for compatibility with existing deploy.env habits), DEFAULT_EXTRA_ENV is where we install safeguards like GOMEMLIMIT; a typo there means the safeguard is silently absent. Fail the merge with a clear message instead.
|
Round 3 in 941ed32:
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces Out-Of-Memory (OOM) defenses for the elastickv service by implementing a default GOMEMLIMIT and a configurable CONTAINER_MEMORY_LIMIT for Docker containers. The changes include updates to the environment example file, documentation in the usage section, and a new merge_extra_env Bash function to safely combine default and user-provided environment variables. Review feedback identified a potential issue where the read command could cause the script to exit prematurely under set -e if environment variables contain only whitespace; suggestions were provided to append || true to these commands.
| # IFS is explicitly set per-read so a caller's surrounding IFS | ||
| # doesn't change how DEFAULT_EXTRA_ENV / EXTRA_ENV are split. | ||
| if [[ -n "$user" ]]; then | ||
| IFS=$' \t\n' read -r -a user_pairs <<< "$user" |
There was a problem hiding this comment.
In Bash, the read command returns a non-zero exit code if it reaches EOF without reading any fields. When set -e is active (as it is here), this will cause the script to exit prematurely if EXTRA_ENV (or user) contains only whitespace (e.g., a trailing space or newline in the environment file). Since [[ -n "$user" ]] is true for a string containing only spaces, the read command will be executed and fail.
Appending || true (or || :) ensures the script continues with an empty array, which is the desired behavior for whitespace-only input.
| IFS=$' \t\n' read -r -a user_pairs <<< "$user" | |
| IFS=$' \t\n' read -r -a user_pairs <<< "$user" || true |
| done | ||
|
|
||
| if [[ -n "$defaults" ]]; then | ||
| IFS=$' \t\n' read -r -a default_pairs <<< "$defaults" |
There was a problem hiding this comment.
Similar to the issue with user_pairs, this read command will trigger an exit under set -e if DEFAULT_EXTRA_ENV is set to a non-empty string that contains only whitespace. While less likely for a default value, it's safer to handle the potential non-zero exit code.
| IFS=$' \t\n' read -r -a default_pairs <<< "$defaults" | |
| IFS=$' \t\n' read -r -a default_pairs <<< "$defaults" || true |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 941ed32dd8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if [[ "$pair" != *=* ]]; then | ||
| echo "rolling-update: malformed DEFAULT_EXTRA_ENV entry '$pair' (expected KEY=VALUE)" >&2 |
There was a problem hiding this comment.
Reject empty-key DEFAULT_EXTRA_ENV entries early
The new DEFAULT_EXTRA_ENV validation only checks whether a token contains =, so an entry like =1800MiB is treated as valid here even though it is not a KEY=VALUE pair. That malformed default is then rejected later by run_container’s key regex check, but at that point the update has already stopped the node’s container, turning a config typo into rollout disruption. Validate a non-empty/valid key in this early guard so bad defaults fail before any node is touched.
Useful? React with 👍 / 👎.
- Codex P2 (rolling-update.sh:944, new angle over round 3): the prior validator only rejected entries without any equals sign; an entry like =1800MiB satisfied the *=* match and was let through, only to be rejected later by run_container key regex. Fail the merge explicitly for empty-key entries at the same point we fail on missing equals sign, so a malformed safeguard is caught with a clear message. The round-3 Gemini Medium about read -a / set -e was verified against bash 3.2 (macOS default): the existing [[ -n "$user" ]] guard prevents the empty-here-string case and whitespace-only input reads successfully into an empty array under bash 3.2. Not changing the || true shape the reviewer suggested; the guard is sufficient and appending || true would swallow real read failures.
|
Round 4 in 75db3bf:
Declining the round-4 Gemini Medium comments on lines 926 and 936 ( Build: shell syntax check ( /gemini review |
|
Thank you for the detailed update and for verifying the behavior against Bash 3.2. I appreciate the thorough testing of the merge logic and the rationale for declining the |
… starvation (#619) ## Summary Design doc (only — no code in this PR) for a four-layer workload-isolation model, prompted by the 2026-04-24 incident's afternoon phase. **Problem:** Today, one client host with 37 connections running a tight XREAD loop consumed 14 CPU cores on the leader via `loadStreamAt → unmarshalStreamValue → proto.Unmarshal` (81% of CPU per pprof). Raft goroutines couldn't get CPU → step_queue_full = 75,692 on the leader (vs 0-119 on followers) → Raft commit p99 jumped to 6-10s, Lua p99 stuck at 6-8s. Follower replication was healthy (applied-index within 34 of leader); the damage was entirely CPU-scheduling on the leader. **Gap:** elastickv has no explicit workload-class isolation. Go's scheduler treats every goroutine equally; a single heavy command path can starve unrelated paths (raft, lease, Lua, GET/SET). ## Four-layer defense model - **Layer 1 — heavy-command worker pool**: gate XREAD / KEYS / SCAN / Lua onto a bounded pool (~`2 × GOMAXPROCS`); reply `-BUSY` when full. Cheap commands keep their own fast path. - **Layer 2 — locked OS threads for raft**: `runtime.LockOSThread()` on the Ready loop + dispatcher lanes so the Go scheduler can't starve them. **Not v1** — only if measurement after Layer 1 + 4 still shows `step_queue_full > 0`. - **Layer 3 — per-client admission control**: per-peer-IP connection cap (default 8). Extends, doesn't replace, roadmap item 6's global in-flight semaphore. - **Layer 4 — XREAD O(N) → O(new)**: entry-per-key layout (`!redis|stream|<key>|entry|<id>`) with range-scan, dual-read migration fallback, legacy-removal gated on `elastickv_stream_legacy_format_reads_total == 0`. Hashes/sets/zsets share the same one-blob pattern and are called out as follow-up. ## Recommended sequencing Layer 4 (correctness bug, concentrated change) → Layer 1 (generic defense for next unknown hotspot) → Layer 3 (reconcile with roadmap item 6) → Layer 2 (only if forced by measurement). ## Relationship to other in-flight work - Complements (does not replace) `docs/design/2026_04_24_proposed_resilience_roadmap.md` item 6 (admission control). This doc's Layer 3 focuses on per-client fairness; the roadmap's item 6 is global in-flight capping. Both are needed. - Consistent with memwatch (#612): Layer 3 admission threshold should fire **before** memwatch's shutdown threshold — flagged as an open question in the doc. - Assumes WAL auto-repair (#613), GOMEMLIMIT defaults (#617) are landed so the cluster survives long enough to matter. ## Open questions called out in the doc - Static vs dynamic command classification (Layer 1) - `-BUSY` backoff semantics — how do we avoid client retry spinning becoming the new hot loop? - Number of locked OS threads on variable-core hosts (Layer 2) - Stream migration soak window before removing legacy-format fallback (Layer 4, currently 30 days, arbitrary) ## Deliverable `docs/design/2026_04_24_proposed_workload_isolation.md` — 446 lines, dated-prefix / `**Status: Proposed**` convention matching the rest of `docs/design/`. No code. ## Test plan - [x] File paths and function references in the doc spot-checked against `origin/main` - [x] Cross-references to `2026_04_24_proposed_resilience_roadmap.md` reconciled (complements, doesn't duplicate) - [ ] Design review — decide on the open questions before implementing Layer 4 (which blocks Layer 1 on XREAD specifically)
- Gemini Medium (design §2.6 line 146): the design doc contradicted the runbook by claiming dry-runs do NOT need approval. GitHub's environment-protection rules cannot be made conditional on workflow inputs, so `environment: production` pauses BOTH dry-run and non-dry-run executions in v1. Aligned the design-doc wording with the runbook and cross-referenced §4 for the second-environment upgrade path. - Gemini Medium (runbook §6 line 164 env list): the list of reconstruction vars was incomplete. Listed every env var the workflow actually exports (IMAGE, DATA_DIR, RAFT_PORT, REDIS_PORT, S3_PORT, ENABLE_S3, NODES, SSH_TARGETS, EXTRA_ENV) and called out the script-level defaults for anything not overridden, plus noted GOMEMLIMIT / CONTAINER_MEMORY_LIMIT are propagated via EXTRA_ENV once PR #617 lands. - Gemini Medium (runbook §6 line 178 idempotency): corrected the "stop+recreate every node regardless" claim. The script (scripts/rolling-update.sh:794-798) skips nodes whose running image id matches the target AND whose gRPC endpoint is healthy, so re-running after a partial roll is safe because already-rolled nodes are no-ops, not stops. Declining again Gemini HIGH "workflow file missing" — the file IS in this PR at .github/workflows/rolling-update.yml; this is the fourth round the bot has flagged its own misread. See prior rounds for rationale; no change.
Summary
Add two OOM-defense defaults to
scripts/rolling-update.sh:GOMEMLIMIT=1800MiB(via newDEFAULT_EXTRA_ENV, merged into the existingEXTRA_ENVplumbing)--memory=2500mon the remotedocker run(via newCONTAINER_MEMORY_LIMIT)Both are env-var-controlled with empty-string opt-out (
${VAR-default}so unset uses the default, but an explicit empty string disables it).Motivation
2026-04-24 incident: all 4 live nodes were kernel-OOM-SIGKILLed 22-169 times in 24h under a traffic spike. Each kill risked WAL-tail truncation and triggered election storms, cascading into p99 GET spikes to 6-8s. The runtime defense was applied by hand during the incident; this PR makes it the script default so future rollouts inherit it.
GOMEMLIMIT— Go runtime GCs aggressively as heap approaches the limit, keeping RSS below the container ceiling.--memory(cgroup hard limit) — if Go can't keep up (e.g. non-heap growth), the kill is scoped to the container, not host processes likeqemu-guest-agentorsystemd.Behavior changes
DEFAULT_EXTRA_ENVGOMEMLIMIT=1800MiBDEFAULT_EXTRA_ENV=""CONTAINER_MEMORY_LIMIT2500mCONTAINER_MEMORY_LIMIT=""Operator-supplied
EXTRA_ENVkeys override matching keys inDEFAULT_EXTRA_ENV(e.g.,EXTRA_ENV="GOMEMLIMIT=3000MiB"wins over the default).Related
Companion PRs (defense-in-depth):
memwatch— graceful shutdown before kernel OOM (prevents WAL corruption in the first place)Test plan
bash -n scripts/rolling-update.shpassesdocker run ... --memory=2500m ... -e GOMEMLIMIT=1800MiB ...on each nodeDesign doc reference
docs/design/2026_04_24_proposed_resilience_roadmap.md(item 1 — capacity/runtime defenses).