
perf(raft): partition etcd dispatcher by message type to isolate heartbeats#577

Merged
bootjp merged 11 commits into main from perf/raft-dispatcher-lanes
Apr 22, 2026

Conversation

@bootjp
Owner

@bootjp bootjp commented Apr 21, 2026

Summary

  • Adds an opt-in 4-lane etcd raft dispatcher (heartbeat / replication / snapshot / other) gated on ELASTICKV_RAFT_DISPATCHER_LANES=1. Default behavior is unchanged.
  • Isolates MsgSnap on its own goroutine so a multi-MiB snapshot transfer can no longer stall subsequent MsgApps queued to the same peer; addresses the residual failure mode from the 324k-heartbeat-drop incident that PR #575 (perf(kv): serve ShardStore.GetAt reads via lease, not read-index) partially mitigated.
  • Keeps every existing correctness invariant: per-peer within-type ordering, dispatchDropCount accounting, and postDispatchReport firing after every dispatch attempt (so ReportUnreachable / ReportSnapshot still land in etcd/raft).

What changed

peerQueues already had two lanes (heartbeat + normal) from PR #522. With the flag on, it now carries four:

| Lane        | Message types                                                                           | Buffer                |
| ----------- | --------------------------------------------------------------------------------------- | --------------------- |
| heartbeat   | MsgHeartbeat / MsgHeartbeatResp, Vote / PreVote (+Resp), ReadIndex (+Resp), MsgTimeoutNow | 512                   |
| replication | MsgApp / MsgAppResp                                                                      | MaxInflightMsg (1024) |
| snapshot    | MsgSnap                                                                                  | 4                     |
| other       | anything unclassified                                                                    | 16                    |

Each lane gets its own goroutine, drained independently. Within-type ordering to a given peer is preserved because all of a peer's messages of one type share exactly one channel and one worker. See selectDispatchLane.
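The lane layout described above can be sketched as follows. This is a simplified model, not the actual engine.go code: the buffer constants come from the table, but field and helper names (newPeerQueues, laneCount) are illustrative.

```go
package main

import "fmt"

// Illustrative buffer sizes from the lane table; the real constants live in engine.go.
const (
	heartbeatLaneBuf   = 512
	replicationLaneBuf = 1024 // MaxInflightMsg
	snapshotLaneBuf    = 4
	otherLaneBuf       = 16
)

type dispatchRequest struct{ payload []byte }

// peerQueues sketch: in legacy 2-lane mode only heartbeat and normal are non-nil;
// with ELASTICKV_RAFT_DISPATCHER_LANES enabled, four dedicated lanes are created.
type peerQueues struct {
	heartbeat   chan dispatchRequest
	normal      chan dispatchRequest // legacy lane, nil in 4-lane mode
	replication chan dispatchRequest
	snapshot    chan dispatchRequest
	other       chan dispatchRequest
}

func newPeerQueues(fourLane bool) *peerQueues {
	pd := &peerQueues{heartbeat: make(chan dispatchRequest, heartbeatLaneBuf)}
	if !fourLane {
		pd.normal = make(chan dispatchRequest, replicationLaneBuf)
		return pd
	}
	pd.replication = make(chan dispatchRequest, replicationLaneBuf)
	pd.snapshot = make(chan dispatchRequest, snapshotLaneBuf)
	pd.other = make(chan dispatchRequest, otherLaneBuf)
	return pd
}

// laneCount reports how many lanes (and therefore worker goroutines) are wired up.
func (pd *peerQueues) laneCount() int {
	n := 0
	for _, ch := range []chan dispatchRequest{pd.heartbeat, pd.normal, pd.replication, pd.snapshot, pd.other} {
		if ch != nil {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(newPeerQueues(false).laneCount()) // 2
	fmt.Println(newPeerQueues(true).laneCount())  // 4
}
```

One worker goroutine per non-nil channel is what preserves within-type ordering: a peer's messages of a given type only ever traverse one channel and one drainer.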

When the flag is off we construct the old heartbeat + normal pair exactly as today — no behavior change, no goroutine count change, no allocation change.

Why behind a flag

The raft dispatch path is high blast radius: a bug here can drop heartbeats cluster-wide and trigger mass elections. Rather than swap the hot path in one shot, this lands the structural change plumbed but dormant so the default production footprint is unchanged.

Rollout plan

  1. Merge default-off (this PR).
  2. Flip ELASTICKV_RAFT_DISPATCHER_LANES=1 in staging, watch elastickv_raft_dispatch_drop_total, elastickv_raft_dispatch_errors_total, and heartbeat loss metrics for 24h under synthetic write load + forced snapshot transfer.
  3. Enable on one production replica, bake 24h, then fleetwide.
  4. Once soaked, delete the flag and the 2-lane branch in startPeerDispatcher / selectDispatchLane in a follow-up PR.

Test plan

  • go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... — passes (flag off)
  • ELASTICKV_RAFT_DISPATCHER_LANES=1 go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... — passes (flag on)
  • New targeted tests:
    • TestSelectDispatchLane_LegacyTwoLane — pins today's routing table
    • TestSelectDispatchLane_FourLane — pins new routing table
    • TestFourLaneDispatcher_SnapshotDoesNotBlockReplication — exercises the HOL-blocking invariant the flag is meant to fix
    • TestFourLaneDispatcher_RemovePeerClosesAllLanes — ensures no goroutine leak on peer removal in 4-lane mode
    • TestDispatcherLanesEnabledFromEnv — env parsing
  • Staging soak with flag enabled under write-heavy load (blocker for removing the flag in a follow-up)
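The head-of-line-blocking property that TestFourLaneDispatcher_SnapshotDoesNotBlockReplication exercises reduces to the channel-level sketch below (simplified; the real test drives the full dispatcher, and the function name here is made up for illustration):

```go
package main

import "fmt"

type msg struct{ kind string }

// deliveredWithSaturatedSnapshotLane saturates a tiny snapshot lane, then
// attempts n non-blocking replication sends and reports how many succeeded.
func deliveredWithSaturatedSnapshotLane(n int) int {
	snapshot := make(chan msg, 4)       // snapshot lane: tiny buffer, assumed stalled
	replication := make(chan msg, 1024) // replication lane: MaxInflightMsg-sized

	// Fill the snapshot lane as a stalled multi-MiB transfer would.
	for i := 0; i < cap(snapshot); i++ {
		snapshot <- msg{kind: "MsgSnap"}
	}

	// Replication sends still succeed: the two lanes share no channel
	// and no worker goroutine, so a full snapshot lane cannot block them.
	delivered := 0
	for i := 0; i < n; i++ {
		select {
		case replication <- msg{kind: "MsgApp"}:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	fmt.Println(deliveredWithSaturatedSnapshotLane(100)) // 100
}
```

In the legacy 2-lane layout both MsgSnap and MsgApp share pd.normal, so the same experiment would drop MsgApps once the shared buffer filled behind the snapshot.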

Summary by CodeRabbit

  • New Features

    • Optional 4-lane dispatch mode (toggled via environment flag) to route replication, snapshot, priority, and other traffic separately for reduced head-of-line blocking and improved responsiveness.
    • Safer peer removal behavior ensuring in-flight lane workers exit and post-removal messages are dropped cleanly.
  • Tests

    • Added unit and concurrency tests covering legacy vs. 4-lane routing, lifecycle, and env-var parsing.

perf(raft): partition etcd dispatcher by message type to isolate heartbeats

Adds an opt-in 4-lane dispatcher layout (heartbeat / replication / snapshot /
other) behind the ELASTICKV_RAFT_DISPATCHER_LANES feature flag. Default
behavior (flag unset or "0") is byte-for-byte identical to the current 2-lane
(heartbeat + normal) implementation introduced in PR #522.

When enabled, each per-peer peerQueues gains three extra channels and three
extra goroutines so that:

- MsgApp / MsgAppResp run on their own replication lane, independent of
  heartbeats and of MsgSnap.
- MsgSnap (bulky, rare) runs on its own snapshot lane and can no longer
  stall subsequent MsgApps behind a multi-MiB transfer.
- Heartbeat / vote / read-index traffic keeps its dedicated priority lane,
  so heartbeats still cannot be starved under write load.

Per-peer within-type ordering (the raft invariant that matters for MsgApp)
is preserved because a given peer's MsgApp stream still shares one lane and
one worker. dispatchDropCount and postDispatchReport firing semantics are
unchanged, and the existing drop paths have been factored to close whichever
subset of lanes is actually wired up.

Rollout plan: ship default-off, enable in staging with
ELASTICKV_RAFT_DISPATCHER_LANES=1 and watch dispatchDropCount +
dispatchErrorCount + Prometheus heartbeat-drop gauge for 24h before flipping
production. The flag can be removed once the 4-lane path has soaked.
@coderabbitai

coderabbitai Bot commented Apr 21, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d6c1b0d-e2e8-47de-acc3-9a5aecbeeb16

📥 Commits

Reviewing files that changed from the base of the PR and between 676f1d1 and 1494ea7.

📒 Files selected for processing (2)
  • internal/raftengine/etcd/engine.go
  • internal/raftengine/etcd/engine_test.go
📝 Walkthrough

Walkthrough

Added an environment-controlled feature flag to switch between legacy 2-lane per-peer dispatch and a new opt-in 4-lane per-peer dispatch; extended peer queues, lane selection, worker startup/cleanup, and tests to validate routing and concurrent behavior.

Changes

Dispatch Lane Architecture — internal/raftengine/etcd/engine.go
Added dispatcherLanesEnabled feature flag (env-driven), new selectDispatchLane routing, extended peerQueues with optional replication, snapshot, other channels, worker creation for configured lanes, closePeerLanes and dispatcherLanesEnabledFromEnv, and updated peer removal to close all non-nil lanes. Introduced buffer-sizing constants for 4-lane mode.

Lane Behavior Tests — internal/raftengine/etcd/engine_test.go
Added unit and concurrency tests: TestSelectDispatchLane_LegacyTwoLane, TestSelectDispatchLane_FourLane, TestFourLaneDispatcher_SnapshotDoesNotBlockReplication, TestFourLaneDispatcher_RemovePeerClosesAllLanes, and TestDispatcherLanesEnabledFromEnv to validate routing, isolation, lifecycle, and env parsing.

Sequence Diagram(s)

(omitted — changes are internal feature additions with limited multi-actor runtime sequencing)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 Four lanes hum beneath the code,
Heartbeats sprint on golden road,
Snapshots wander, replication runs,
A flag flips old to new in suns,
Tiny paws hopped in, and done! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main change: partitioning the etcd dispatcher by message type to isolate heartbeats, which is the core feature added in this PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements an opt-in 4-lane dispatcher for the Raft engine to prevent large snapshot transfers from blocking replication traffic. It introduces new message lanes for replication, snapshots, and other traffic, controlled by an environment variable. The implementation includes a new routing function, selectDispatchLane, and comprehensive tests for the multi-lane logic. Feedback suggests simplifying the routing logic by utilizing the existing isPriorityMsg helper to reduce code duplication.

Comment on lines +1512 to +1537
func (e *Engine) selectDispatchLane(pd *peerQueues, msgType raftpb.MessageType) chan dispatchRequest {
	if !e.dispatcherLanesEnabled {
		if isPriorityMsg(msgType) {
			return pd.heartbeat
		}
		return pd.normal
	}
	switch msgType {
	case raftpb.MsgHeartbeat, raftpb.MsgHeartbeatResp:
		return pd.heartbeat
	case raftpb.MsgApp, raftpb.MsgAppResp:
		return pd.replication
	case raftpb.MsgSnap:
		return pd.snapshot
	case raftpb.MsgVote, raftpb.MsgVoteResp,
		raftpb.MsgPreVote, raftpb.MsgPreVoteResp,
		raftpb.MsgReadIndex, raftpb.MsgReadIndexResp,
		raftpb.MsgTimeoutNow:
		// Election / read-index traffic is small and latency-sensitive but
		// rare; put it on the heartbeat lane so it keeps its priority
		// treatment and, like today, is never stuck behind MsgApp.
		return pd.heartbeat
	default:
		return pd.other
	}
}
Contributor


medium

The logic in selectDispatchLane can be simplified by leveraging the existing isPriorityMsg helper. Since all priority messages (heartbeats, votes, read-index, etc.) are routed to the heartbeat lane in both legacy and 4-lane modes, checking it first allows for a cleaner implementation and avoids duplicating the list of priority message types in the switch statement.

func (e *Engine) selectDispatchLane(pd *peerQueues, msgType raftpb.MessageType) chan dispatchRequest {
	if isPriorityMsg(msgType) {
		return pd.heartbeat
	}
	if !e.dispatcherLanesEnabled {
		return pd.normal
	}
	switch msgType {
	case raftpb.MsgApp, raftpb.MsgAppResp:
		return pd.replication
	case raftpb.MsgSnap:
		return pd.snapshot
	default:
		return pd.other
	}
}

Route priority control traffic through isPriorityMsg up front in both
legacy and 4-lane modes, dropping the duplicated message-type list in
the switch. Behaviour is identical: heartbeats, votes, read-index and
timeout-now still ride pd.heartbeat, MsgApp/MsgAppResp go to
pd.replication, MsgSnap to pd.snapshot, and everything else falls
through to pd.other (4-lane) or pd.normal (legacy).
@bootjp
Owner Author

bootjp commented Apr 21, 2026

Addressed the gemini-code-assist review (selectDispatchLane duplicating the priority-message list).

  • selectDispatchLane now routes priority traffic through isPriorityMsg up front in both legacy and 4-lane modes; the duplicated MsgVote/MsgPreVote/MsgReadIndex/MsgTimeoutNow/MsgHeartbeat* case is gone. MsgApp/MsgAppResp -> replication, MsgSnap -> snapshot, default -> other (4-lane) or normal (legacy). Behaviour is identical.
  • go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... passes.
  • ELASTICKV_RAFT_DISPATCHER_LANES=1 go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... passes.
  • golangci-lint run ./internal/raftengine/etcd/... shows the same 2 pre-existing exhaustive warnings as the baseline (no new lint issues).

Commit: 95370c7

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher layout for Raft messages, splitting traffic into heartbeat, replication, snapshot, and other categories to prevent bulky snapshots from stalling replication. The feature is controlled by the ELASTICKV_RAFT_DISPATCHER_LANES environment variable. Feedback was provided to use strconv.ParseBool for more standard environment variable parsing.

Comment on lines +2711 to +2714
func dispatcherLanesEnabledFromEnv() bool {
	v := os.Getenv(dispatcherLanesEnvVar)
	return v != "" && v != "0"
}
Contributor


medium

The current implementation of dispatcherLanesEnabledFromEnv is quite permissive, enabling the feature for any non-empty value other than "0". While this matches the PR description, using strconv.ParseBool would provide a more standard and predictable behavior for boolean environment variables (e.g., handling "true", "1", "false", "0" consistently). Additionally, if this dispatcher lanes feature is a prototype or limited-scope implementation, ensure that its limitations are explicitly documented and runtime validation is added to prevent misuse in a general context.

Suggested change
-func dispatcherLanesEnabledFromEnv() bool {
-	v := os.Getenv(dispatcherLanesEnvVar)
-	return v != "" && v != "0"
-}
+func dispatcherLanesEnabledFromEnv() bool {
+	v := os.Getenv(dispatcherLanesEnvVar)
+	enabled, _ := strconv.ParseBool(v)
+	return enabled
+}
References
  1. For prototype or limited-scope implementations, explicitly document the limitations (e.g., single-node only, blocking startup) and add runtime validation to prevent misuse in a general context.

Replace the permissive "non-empty and not 0" check in
dispatcherLanesEnabledFromEnv with strconv.ParseBool so the env var
accepts only the standard boolean tokens (1/t/T/TRUE/true/True and
0/f/F/FALSE/false/False). Unrecognized or empty values default to
false. Addresses Gemini review on PR #577.
@bootjp
Owner Author

bootjp commented Apr 22, 2026

/gemini review

Addressed the 2026-04-21T20:52:56Z medium-priority review:

  • dispatcherLanesEnabledFromEnv now uses strconv.ParseBool(strings.TrimSpace(...)). Unrecognized or empty values default to false; only the standard tokens (1/t/T/TRUE/true/True, 0/f/F/FALSE/false/False) are accepted.
  • Updated the doc comment to list accepted tokens.
  • Expanded TestDispatcherLanesEnabledFromEnv to cover empty, "0", "1", "true", "TRUE", "false", and the now-rejected "yes".
  • Tests pass both with the flag off and ELASTICKV_RAFT_DISPATCHER_LANES=1.

Commit: 6ff5fce


Contributor

Copilot AI left a comment


Pull request overview

This PR adds an opt-in, per-peer 4-lane outbound dispatcher to the etcd/raft engine to prevent head-of-line blocking between heartbeats, replication traffic, and large snapshot transfers, while keeping the default (legacy 2-lane) behavior unchanged.

Changes:

  • Introduces a 4-lane per-peer dispatcher (heartbeat / replication / snapshot / other) gated by ELASTICKV_RAFT_DISPATCHER_LANES.
  • Adds selectDispatchLane to centralize routing of raft message types to the appropriate lane.
  • Expands test coverage to pin lane selection behavior and validate that snapshot dispatch cannot block replication in 4-lane mode.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
internal/raftengine/etcd/engine.go Adds the opt-in 4-lane dispatcher plumbing (env flag, lane selection, per-peer lane creation, and lane-safe close on peer removal).
internal/raftengine/etcd/engine_test.go Adds tests for lane routing, snapshot vs replication independence, peer removal cleanup in 4-lane mode, and env-flag parsing.

Comment on lines +1650 to +1659
cases := map[raftpb.MessageType]chan dispatchRequest{
	raftpb.MsgHeartbeat:     pd.heartbeat,
	raftpb.MsgHeartbeatResp: pd.heartbeat,
	raftpb.MsgReadIndex:     pd.heartbeat,
	raftpb.MsgVote:          pd.heartbeat,
	raftpb.MsgTimeoutNow:    pd.heartbeat,
	raftpb.MsgApp:           pd.normal,
	raftpb.MsgAppResp:       pd.normal,
	raftpb.MsgSnap:          pd.normal,
}

Copilot AI Apr 22, 2026


TestSelectDispatchLane_LegacyTwoLane claims to pin the legacy routing table, but it doesn’t cover several message types that are treated as priority by isPriorityMsg (e.g., MsgReadIndexResp, MsgVoteResp, MsgPreVote, MsgPreVoteResp). Adding these cases would better prevent accidental routing regressions in legacy mode.

Copilot generated this review using guidance from repository custom instructions.
@bootjp
Owner Author

bootjp commented Apr 22, 2026

/gemini review

Addressed Copilot feedback: added MsgReadIndexResp, MsgVoteResp, MsgPreVote, MsgPreVoteResp cases to TestSelectDispatchLane_LegacyTwoLane so the priority-routing table is fully pinned for both legacy and 4-lane modes. Tests pass with and without ELASTICKV_RAFT_DISPATCHER_LANES=1 (-race). Commit: c385e3f.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher layout for Raft messages, designed to prevent bulky snapshot transfers from stalling log replication. It adds specific lanes for replication, snapshots, and other control messages, while maintaining the existing heartbeat lane for priority traffic. The feature is opt-in via the ELASTICKV_RAFT_DISPATCHER_LANES environment variable. The changes include updates to the Engine and peerQueues structures, message routing logic, and peer lifecycle management, supported by new unit tests verifying lane independence and environment variable parsing. I have no feedback to provide as there were no review comments.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
internal/raftengine/etcd/engine.go (1)

3045-3055: Optional: consider per-lane drop counters for the 4-lane rollout.

dispatchDropCount aggregates drops across all lanes, so dashboards can't distinguish a heartbeat-lane stall (cluster-liveness impacting) from a snapshot-lane stall (benign/expected under retries). Since the whole point of the 4-lane partitioning is isolation, per-lane counters (or at least a lane label on the existing counter, via msg.Type bucketed to {heartbeat, replication, snapshot, other}) would make the staging soak and single-replica production enable much easier to evaluate. Not a blocker for default-off merge, but worth adding before the fleetwide enable step in the rollout plan.
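The per-lane counter idea amounts to bucketing msg.Type into a lane label before incrementing. A minimal sketch, with a stand-in enum instead of raftpb.MessageType and hypothetical counter names (the engine's actual fields may differ):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// msgType is a stand-in for raftpb.MessageType in this sketch.
type msgType int

const (
	msgHeartbeat msgType = iota
	msgApp
	msgSnap
	msgProp
)

// laneFor buckets a message type into one of the four lane labels,
// mirroring the selectDispatchLane routing.
func laneFor(t msgType) string {
	switch t {
	case msgHeartbeat:
		return "heartbeat"
	case msgApp:
		return "replication"
	case msgSnap:
		return "snapshot"
	default:
		return "other"
	}
}

// dropCounters is a per-lane replacement for a single aggregate drop counter,
// so dashboards can tell a heartbeat-lane stall from a benign snapshot-lane one.
type dropCounters struct {
	byLane map[string]*atomic.Int64
}

func newDropCounters() *dropCounters {
	c := &dropCounters{byLane: map[string]*atomic.Int64{}}
	for _, lane := range []string{"heartbeat", "replication", "snapshot", "other"} {
		c.byLane[lane] = &atomic.Int64{}
	}
	return c
}

func (c *dropCounters) recordDrop(t msgType) { c.byLane[laneFor(t)].Add(1) }

func main() {
	c := newDropCounters()
	c.recordDrop(msgSnap)
	c.recordDrop(msgSnap)
	c.recordDrop(msgApp)
	fmt.Println(c.byLane["snapshot"].Load(), c.byLane["replication"].Load(), c.byLane["heartbeat"].Load())
}
```

With a metrics library the same shape becomes a single counter vector with a "lane" label rather than a map of atomics.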

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/engine.go` around lines 3045 - 3055,
recordDroppedDispatch currently increments a single aggregate counter
(dispatchDropCount) which hides which raft lane (heartbeat, replication,
snapshot, other) dropped a message; update recordDroppedDispatch to map msg.Type
(use its value in recordDroppedDispatch) into a lane bucket and increment a
per-lane counter instead (e.g. add e.dispatchDropCounts map[string]*atomic.Int64
or a metrics counter with a "lane" label and increment the appropriate bucket),
still call shouldLogDispatchEvent using the per-lane count or both
aggregate+lane as desired, and include the lane identifier in the slog.Warn
fields; update any initialization of dispatchDropCount to create the per-lane
counters and adjust uses of dispatchDropCount accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/raftengine/etcd/engine.go`:
- Around line 3045-3055: recordDroppedDispatch currently increments a single
aggregate counter (dispatchDropCount) which hides which raft lane (heartbeat,
replication, snapshot, other) dropped a message; update recordDroppedDispatch to
map msg.Type (use its value in recordDroppedDispatch) into a lane bucket and
increment a per-lane counter instead (e.g. add e.dispatchDropCounts
map[string]*atomic.Int64 or a metrics counter with a "lane" label and increment
the appropriate bucket), still call shouldLogDispatchEvent using the per-lane
count or both aggregate+lane as desired, and include the lane identifier in the
slog.Warn fields; update any initialization of dispatchDropCount to create the
per-lane counters and adjust uses of dispatchDropCount accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2af03391-3951-472f-93d3-53b9f7639ca3

📥 Commits

Reviewing files that changed from the base of the PR and between bbd7850 and c385e3f.

📒 Files selected for processing (2)
  • internal/raftengine/etcd/engine.go
  • internal/raftengine/etcd/engine_test.go

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread internal/raftengine/etcd/engine.go Outdated
@@ -65,11 +67,26 @@ const (
// upside is that a ~5 s transient pause (election-timeout scale)
// no longer drops heartbeats and force the peers' lease to expire.

Copilot AI Apr 22, 2026


Minor grammar in the constant comment: "no longer drops heartbeats and force the peers' lease to expire" should be "...and forces...".

Suggested change
// no longer drops heartbeats and force the peers' lease to expire.
// no longer drops heartbeats and forces the peers' lease to expire.

@bootjp
Owner Author

bootjp commented Apr 22, 2026

/gemini review

Addressed both review items in 676f1d1:

  1. exhaustive (engine_test.go:1723, also pre-existing engine.go:1523): added //nolint:exhaustive with rationale. The codebase convention (see proxy/sentry.go:178, kv/coordinator.go:87) is //nolint:exhaustive rather than enumerating every MsgXxx, since default-signifies-exhaustive is not enabled in .golangci.yaml. The test switch also got a default: branch.
  2. grammar (engine.go:68): "force" to "forces".

Verification:

  • make lint -> 0 issues
  • go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... PASS (8.35s)
  • ELASTICKV_RAFT_DISPATCHER_LANES=1 go test -race ... PASS (9.08s)


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/raftengine/etcd/engine.go (1)

1475-1490: ⚠️ Potential issue | 🟠 Major

Report dropped snapshots back to raft.

In 4-lane mode, MsgSnap can be dropped before it reaches transport when the snapshot lane is full. Only recordDroppedDispatch is called on pre-transport drops, so raft never receives ReportSnapshot(..., SnapshotFailure) and the peer remains stuck in snapshot progress. Add snapshot failure reporting on pre-transport drops.

Proposed fix
 	ch := e.selectDispatchLane(pd, msg.Type)
 	// Avoid the expensive deep-clone in prepareDispatchRequest when the channel
 	// is already full. The len/cap check is safe here because this function is
 	// only ever called from the single engine event-loop goroutine.
 	if len(ch) >= cap(ch) {
 		e.recordDroppedDispatch(msg)
+		e.reportDroppedSnapshot(msg)
 		return nil
 	}
 	dispatchReq := prepareDispatchRequest(msg)
 	select {
 	case ch <- dispatchReq:
 		return nil
 	default:
 		_ = dispatchReq.Close()
 		e.recordDroppedDispatch(msg)
+		e.reportDroppedSnapshot(msg)
 		return nil
 	}
 }
+
+func (e *Engine) reportDroppedSnapshot(msg raftpb.Message) {
+	if msg.Type != raftpb.MsgSnap || e.rawNode == nil {
+		return
+	}
+	e.rawNode.ReportSnapshot(msg.To, etcdraft.SnapshotFailure)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/engine.go` around lines 1475 - 1490, When a dispatch
is dropped before transport (both the len(ch) >= cap(ch) fast-path and the
default case after attempting to send), detect if msg.Type == MsgSnap and invoke
ReportSnapshot to notify raft of a snapshot failure; specifically, in the
pre-transport drop branches around
selectDispatchLane/prepareDispatchRequest/recordDroppedDispatch add a call like
ReportSnapshot(msg.To, msg.RegionID, SnapshotFailure) (or the local wrapper that
sends ReportSnapshot) before returning, so raft receives SnapshotFailure for
dropped MsgSnap messages.
🧹 Nitpick comments (1)
internal/raftengine/etcd/engine.go (1)

1523-1530: Avoid the new //nolint:exhaustive.

This routing is intentionally partial, so simple conditionals avoid the linter suppression without changing behavior.

♻️ Proposed refactor
-	switch msgType { //nolint:exhaustive // only MsgApp/MsgAppResp/MsgSnap need dedicated lanes; the rest falls through to pd.other
-	case raftpb.MsgApp, raftpb.MsgAppResp:
+	if msgType == raftpb.MsgApp || msgType == raftpb.MsgAppResp {
 		return pd.replication
-	case raftpb.MsgSnap:
+	}
+	if msgType == raftpb.MsgSnap {
 		return pd.snapshot
-	default:
-		return pd.other
 	}
+	return pd.other

As per coding guidelines, Go code should “avoid adding //nolint unless absolutely required; prefer refactoring.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/raftengine/etcd/engine.go` around lines 1523 - 1530, The switch uses
a `//nolint:exhaustive` suppression; remove that by replacing the switch with
explicit conditionals that express the same partial routing: check msgType
against raftpb.MsgApp || raftpb.MsgAppResp and return pd.replication, else if
msgType == raftpb.MsgSnap return pd.snapshot, otherwise return pd.other — update
the logic around the msgType variable and the
pd.replication/pd.snapshot/pd.other returns so behavior is identical but without
the nolint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@internal/raftengine/etcd/engine.go`:
- Around line 1475-1490: When a dispatch is dropped before transport (both the
len(ch) >= cap(ch) fast-path and the default case after attempting to send),
detect if msg.Type == MsgSnap and invoke ReportSnapshot to notify raft of a
snapshot failure; specifically, in the pre-transport drop branches around
selectDispatchLane/prepareDispatchRequest/recordDroppedDispatch add a call like
ReportSnapshot(msg.To, msg.RegionID, SnapshotFailure) (or the local wrapper that
sends ReportSnapshot) before returning, so raft receives SnapshotFailure for
dropped MsgSnap messages.

---

Nitpick comments:
In `@internal/raftengine/etcd/engine.go`:
- Around line 1523-1530: The switch uses a `//nolint:exhaustive` suppression;
remove that by replacing the switch with explicit conditionals that express the
same partial routing: check msgType against raftpb.MsgApp || raftpb.MsgAppResp
and return pd.replication, else if msgType == raftpb.MsgSnap return pd.snapshot,
otherwise return pd.other — update the logic around the msgType variable and the
pd.replication/pd.snapshot/pd.other returns so behavior is identical but without
the nolint.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aec7e83e-ff09-488e-9144-ddf64c77d8eb

📥 Commits

Reviewing files that changed from the base of the PR and between c385e3f and 676f1d1.

📒 Files selected for processing (2)
  • internal/raftengine/etcd/engine.go
  • internal/raftengine/etcd/engine_test.go
✅ Files skipped from review due to trivial changes (1)
  • internal/raftengine/etcd/engine_test.go

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an opt-in 4-lane dispatcher for Raft messages, allowing heartbeats, replication, snapshots, and other control traffic to operate on independent goroutines. This prevents large snapshot transfers from stalling critical replication or heartbeat messages. The feedback suggests improving the robustness of the message routing logic by explicitly handling all message types in the dispatcher's switch statement to avoid potential misclassification of future additions.

Comment thread internal/raftengine/etcd/engine.go Outdated
Comment on lines +1523 to +1530
switch msgType { //nolint:exhaustive // only MsgApp/MsgAppResp/MsgSnap need dedicated lanes; the rest falls through to pd.other
case raftpb.MsgApp, raftpb.MsgAppResp:
	return pd.replication
case raftpb.MsgSnap:
	return pd.snapshot
default:
	return pd.other
}
Contributor


medium

The switch statement in selectDispatchLane uses a //nolint:exhaustive comment. To avoid silently dropping or misclassifying entries, which could lead to state inconsistencies, it is better to explicitly handle all raftpb.MessageType entries. This ensures that any future additions are correctly categorized and not just lumped into a default lane, maintaining the integrity of the message dispatching logic.

References
  1. Avoid silently dropping entries during serialization or normalization to ensure they are preserved on disk and avoid state inconsistencies.

Address gemini medium review on PR #577: replace the //nolint:exhaustive
on selectDispatchLane with an exhaustive switch that lists every
raftpb.MessageType. Future additions to the upstream enum now fail the
exhaustive linter instead of being silently routed to pd.other.

Behaviour is preserved: MsgApp/MsgAppResp stay on the replication lane,
MsgSnap stays on the snapshot lane, and all local-only / feedback
messages continue to route to pd.other. MsgSnapStatus is grouped with
MsgSnap for readability (it is a local feedback message and is never
dispatched to peers in practice).
@bootjp

bootjp commented Apr 22, 2026

/gemini review

Addressed the medium review on selectDispatchLane: replaced the //nolint:exhaustive with an exhaustive switch that enumerates every raftpb.MessageType. Future additions to the upstream enum will now fail the exhaustive linter instead of being silently routed to pd.other.

  • MsgApp / MsgAppResp -> replication lane
  • MsgSnap / MsgSnapStatus -> snapshot lane (MsgSnapStatus grouped for readability; it is a local-only feedback message)
  • MsgHeartbeat / MsgHeartbeatResp / MsgVote(Resp) / MsgPreVote(Resp) / MsgReadIndex(Resp) / MsgTimeoutNow -> defensively listed on heartbeat lane (actually short-circuited by isPriorityMsg above)
  • MsgHup / MsgBeat / MsgProp / MsgUnreachable / MsgCheckQuorum / MsgTransferLeader / MsgStorageAppend(Resp) / MsgStorageApply(Resp) / MsgForgetLeader -> other

Tests: go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... passes with flag off and with ELASTICKV_RAFT_DISPATCHER_LANES=1. golangci-lint run ./... reports 0 issues.

Commit: dbe21a6


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher layout for Raft messages to prevent large snapshot transfers from stalling log replication. By enabling the ELASTICKV_RAFT_DISPATCHER_LANES environment variable, traffic is partitioned into heartbeat, replication, snapshot, and other lanes, each handled by a dedicated goroutine. The changes include updates to the Engine and peerQueues structures, new lane selection logic, and comprehensive unit tests. A review comment identifies unreachable message types in the selectDispatchLane switch statement that should be removed to improve maintainability.

Comment thread internal/raftengine/etcd/engine.go Outdated
Comment on lines +1529 to +1549
```go
switch msgType {
case raftpb.MsgApp, raftpb.MsgAppResp:
	return pd.replication
case raftpb.MsgSnap, raftpb.MsgSnapStatus:
	return pd.snapshot
case raftpb.MsgHeartbeat, raftpb.MsgHeartbeatResp,
	raftpb.MsgVote, raftpb.MsgVoteResp,
	raftpb.MsgPreVote, raftpb.MsgPreVoteResp,
	raftpb.MsgReadIndex, raftpb.MsgReadIndexResp,
	raftpb.MsgTimeoutNow:
	// Unreachable: isPriorityMsg already short-circuited these above.
	// Listed for exhaustiveness; keep them on the heartbeat lane so any
	// future refactor that removes the short-circuit stays correct.
	return pd.heartbeat
case raftpb.MsgHup, raftpb.MsgBeat, raftpb.MsgProp,
	raftpb.MsgUnreachable, raftpb.MsgCheckQuorum, raftpb.MsgTransferLeader,
	raftpb.MsgStorageAppend, raftpb.MsgStorageAppendResp,
	raftpb.MsgStorageApply, raftpb.MsgStorageApplyResp,
	raftpb.MsgForgetLeader:
	return pd.other
}
```


medium

The switch statement in selectDispatchLane includes several message types that are local to the raft node (e.g., MsgHup, MsgBeat, MsgProp, MsgCheckQuorum, MsgSnapStatus, MsgUnreachable). These cases are unreachable because these message types are filtered out by skipDispatchMessage earlier in the call chain. To improve code clarity and maintainability, these unreachable code paths should be removed.

References
  1. Remove dead or unreachable code paths, such as asynchronous handlers for tasks that are always performed synchronously, to improve code clarity and maintainability.

Remove cases for MsgHup, MsgBeat, MsgProp, MsgUnreachable, MsgSnapStatus,
MsgCheckQuorum and MsgStorage{Append,Apply}{,Resp} from selectDispatchLane.
All of these are unreachable in this code path:

- MsgHup/MsgBeat/MsgUnreachable/MsgSnapStatus/MsgCheckQuorum and the
  MsgStorage* family are local messages per etcd/raft's IsLocalMsg table,
  and skipDispatchMessage already drops them before enqueueDispatchMessage
  calls selectDispatchLane.
- MsgProp is never emitted outbound because DisableProposalForwarding is
  set and handleProposal rejects non-leader proposals, so no follower
  ever forwards a proposal to the leader.
- The priority control types (MsgHeartbeat/Resp, votes, read-index,
  MsgTimeoutNow) are short-circuited by the isPriorityMsg branch above
  the switch.

The switch now lists only the types that can actually reach it
(MsgApp/MsgAppResp, MsgSnap, MsgTransferLeader, MsgForgetLeader) plus a
fallback default for any future raftpb additions. The exhaustive linter
is silenced with a targeted //nolint:exhaustive that references
skipDispatchMessage and isPriorityMsg so future reviewers understand why
the switch isn't exhaustive.
@bootjp

bootjp commented Apr 22, 2026

Verified the claim about selectDispatchLane unreachable cases.

Confirmed unreachable (commit 3076cfb):

  1. MsgHup, MsgBeat, MsgUnreachable, MsgSnapStatus, MsgCheckQuorum and the MsgStorage* family are in upstream etcd/raft's isLocalMsg table (go.etcd.io/raft/v3@v3.6.0/util.go:29-39). skipDispatchMessage calls etcdraft.IsLocalMsg(msg.Type) at engine.go:1459, so these are dropped before enqueueDispatchMessage ever reaches selectDispatchLane at engine.go:1475.

  2. MsgProp is NOT in isLocalMsg (so skipDispatchMessage would let it through), but the engine never emits one outbound:

    • DisableProposalForwarding: true at engine.go:535, so followers don't forward proposals to the leader.
    • handleProposal at engine.go:1168-1172 rejects with errNotLeader unless State() == StateLeader, and a leader's own rawNode.Propose() doesn't emit outbound MsgProp.
  3. The priority types (MsgHeartbeat/Resp, votes, read-index, MsgTimeoutNow) are short-circuited by isPriorityMsg above the switch.

Action taken: removed the dead cases; the switch now lists only types that can actually reach it (MsgApp/MsgAppResp → replication, MsgSnap → snapshot, MsgTransferLeader/MsgForgetLeader → other) with a default: pd.other fallback for future raftpb additions. Silenced the exhaustive linter with a targeted //nolint:exhaustive that cross-references skipDispatchMessage and isPriorityMsg.

Tests pass in both modes:

  • go test -race -count=1 ./internal/raftengine/etcd/... (flag off): ok 8.004s
  • ELASTICKV_RAFT_DISPATCHER_LANES=1 go test -race -count=1 ./internal/raftengine/etcd/...: ok 7.889s
  • make lint: 0 issues.

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher layout for Raft messages, partitioned into heartbeat, replication, snapshot, and other lanes to prevent large snapshot transfers from blocking replication. The feature is opt-in via the ELASTICKV_RAFT_DISPATCHER_LANES environment variable. The implementation includes updated message routing logic, per-lane worker goroutines, and unit tests verifying lane independence. Review feedback suggests enhancing the robustness of the message routing switch statement by explicitly handling additional message types like MsgProp to guard against future library changes.

Comment thread internal/raftengine/etcd/engine.go Outdated
Comment on lines +1534 to +1541
```go
switch msgType { //nolint:exhaustive // filtered types handled by skipDispatchMessage + isPriorityMsg; see comment above.
case raftpb.MsgApp, raftpb.MsgAppResp:
	return pd.replication
case raftpb.MsgSnap:
	return pd.snapshot
case raftpb.MsgTransferLeader, raftpb.MsgForgetLeader:
	return pd.other
}
```


medium

The selectDispatchLane function uses a switch statement with nolint:exhaustive to route non-priority messages. While the comment explains that many types are filtered out earlier, explicitly handling raftpb.MsgProp (even if unreachable due to current configuration) or providing a more descriptive default case would improve robustness against future changes in the etcd/raft library or engine configuration.

Adds an explicit MsgProp case in selectDispatchLane that panics with a
message referencing DisableProposalForwarding. Today the case is
unreachable because DisableProposalForwarding=true prevents outbound
MsgProp, but the prior default fallback silently routed any future
MsgProp to pd.other. Turning that silent misrouting into a loud failure
flags the invariant if proposal forwarding is ever re-enabled upstream
or in our raft.Config without also updating this switch.
@bootjp

bootjp commented Apr 22, 2026

/gemini review

Addressed the latest medium on internal/raftengine/etcd/engine.go:1541.

Added an explicit case raftpb.MsgProp in selectDispatchLane that panics with a message referencing DisableProposalForwarding. This keeps behavior unchanged today (the case is unreachable because DisableProposalForwarding=true blocks outbound MsgProp), but converts the previous silent fallback into pd.other into a loud failure if proposal forwarding is ever re-enabled or etcd/raft adds a new MessageType that reaches this switch. The explicit case also documents the invariant at the call site.

Commit: 7d91662

Verification:

  • go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... (lane flag off): ok 8.398s
  • ELASTICKV_RAFT_DISPATCHER_LANES=1 go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/...: ok 7.932s
  • make lint: 0 issues


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher layout for Raft messages to prevent bulky snapshot transfers from stalling log replication. By opting in via the ELASTICKV_RAFT_DISPATCHER_LANES environment variable, the system partitions traffic into heartbeat, replication, snapshot, and other lanes, each with its own goroutine. The implementation includes updated peer queue structures, dynamic lane selection logic, and comprehensive tests verifying lane independence. Feedback was provided regarding the use of a panic for unexpected message types, suggesting a safer fallback to error logging and a default lane to maintain system availability.

Comment thread internal/raftengine/etcd/engine.go Outdated
Comment on lines +1541 to +1547
```go
case raftpb.MsgProp:
	// DisableProposalForwarding=true (see raft.Config construction in this
	// file) guarantees no outbound MsgProp is ever emitted, so this case
	// is unreachable today. If it does fire, proposal forwarding was
	// re-enabled and we must decide the correct lane explicitly rather
	// than silently lumping proposals into pd.other.
	panic("selectDispatchLane: outbound MsgProp requires explicit lane once DisableProposalForwarding is disabled")
```


medium

Panicking in a library or engine component due to an unexpected message type is generally discouraged, as it can crash the entire process. While DisableProposalForwarding is currently set to true, a future configuration change or a bug in the upstream etcd/raft library could trigger this path. It is safer to log a critical error and fall back to a default lane (like pd.other) to maintain system availability.

Suggested change

```diff
 case raftpb.MsgProp:
-	// DisableProposalForwarding=true (see raft.Config construction in this
-	// file) guarantees no outbound MsgProp is ever emitted, so this case
-	// is unreachable today. If it does fire, proposal forwarding was
-	// re-enabled and we must decide the correct lane explicitly rather
-	// than silently lumping proposals into pd.other.
-	panic("selectDispatchLane: outbound MsgProp requires explicit lane once DisableProposalForwarding is disabled")
+	slog.Error("selectDispatchLane: unexpected outbound MsgProp encountered", "node_id", e.nodeID, "to", pd.heartbeat)
+	return pd.other
```

Panicking inside a raft engine dispatcher goroutine would crash the
entire node. DisableProposalForwarding=true still guarantees outbound
MsgProp should not reach selectDispatchLane, but if it does (upstream
etcd/raft or our config changed), log loudly and route to pd.other
rather than taking the process down.

Per gemini review on PR #577.
@bootjp

bootjp commented Apr 22, 2026

/gemini review

Addressed medium: replaced panic in selectDispatchLane for unexpected outbound MsgProp with slog.Error + fallback to pd.other. A panic inside a raft engine dispatcher goroutine would crash the whole node; log-and-fallback keeps it up while still surfacing the regression loudly.

Added TestSelectDispatchLane_MsgPropFallsBackToOther to lock in the new behavior.

Commit: 767f40f
Tests: go test -race ./internal/raftengine/etcd/... passes (both with and without ELASTICKV_RAFT_DISPATCHER_LANES=1).
Lint: make lint clean (0 issues).


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher for Raft messages to isolate heartbeats, replication, and snapshots, preventing bulky transfers from blocking critical traffic. Controlled via the ELASTICKV_RAFT_DISPATCHER_LANES environment variable, the update includes a new message routing function and comprehensive tests. Review feedback suggests removing the handling logic for MsgProp messages to simplify the dispatcher's logic, as they are currently unreachable.

Comment thread internal/raftengine/etcd/engine.go Outdated
Comment on lines +1549 to +1552
```go
slog.Error("selectDispatchLane: unexpected outbound MsgProp encountered",
	slog.Uint64("raft_node_id", e.nodeID),
	slog.String("type", msgType.String()),
)
```


medium

The code indicates that MsgProp is expected to be unreachable due to DisableProposalForwarding=true. To improve code clarity and maintainability, dead or unreachable code paths should be removed rather than just logged. Consider removing this unreachable logic entirely from the engine's hot path.

References
  1. Remove dead or unreachable code paths, such as asynchronous handlers for tasks that are always performed synchronously, to improve code clarity and maintainability.

@bootjp

bootjp commented Apr 22, 2026

/gemini review

Addressed gemini's medium suggestion in cdad492: removed the unreachable case raftpb.MsgProp block from selectDispatchLane and now rely on the post-switch return pd.other fallback. Added a one-line comment above the switch referencing DisableProposalForwarding so the omission stays self-documenting. The existing test was repurposed as TestSelectDispatchLane_MsgPropReachesDefaultFallback to pin the "MsgProp hits the default fallback" behaviour.

  • go test -race -count=1 -timeout 300s ./internal/raftengine/etcd/... passes with the flag off and with ELASTICKV_RAFT_DISPATCHER_LANES=1.
  • make lint clean.

@bootjp bootjp enabled auto-merge April 22, 2026 13:18

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an optional 4-lane dispatcher for Raft messages, designed to prevent bulky snapshot transfers from stalling log replication. By partitioning traffic into heartbeat, replication, snapshot, and other lanes, the engine can maintain better throughput and stability under load. The feature is opt-in via the ELASTICKV_RAFT_DISPATCHER_LANES environment variable. Feedback was provided regarding a potential compilation error due to the use of a message type that may not exist in the standard Raft package.

I am having trouble creating individual review comments, so my feedback is inlined below.

internal/raftengine/etcd/engine.go (1532-1539)

high

The switch statement for message routing in 4-lane mode uses raftpb.MsgForgetLeader. However, this message type is not part of the standard go.etcd.io/raft/v3/raftpb package. If this is a custom extension or from a specific fork, please ensure it is correctly defined in the imported package to avoid compilation errors. If it was intended to be a different standard type, please correct it.

@bootjp

bootjp commented Apr 22, 2026

raftpb.MsgForgetLeader is in fact part of go.etcd.io/raft/v3/raftpb. Verified against the module already in go.sum:

```
$ go doc -all go.etcd.io/raft/v3/raftpb | grep -E "^\s+Msg[A-Z]"
MsgHup               MessageType = 0
MsgBeat              MessageType = 1
MsgProp              MessageType = 2
MsgApp               MessageType = 3
MsgAppResp           MessageType = 4
MsgVote              MessageType = 5
MsgVoteResp          MessageType = 6
MsgSnap              MessageType = 7
MsgHeartbeat         MessageType = 8
MsgHeartbeatResp     MessageType = 9
MsgUnreachable       MessageType = 10
MsgSnapStatus        MessageType = 11
MsgCheckQuorum       MessageType = 12
MsgTransferLeader    MessageType = 13
MsgTimeoutNow        MessageType = 14
MsgReadIndex         MessageType = 15
MsgReadIndexResp     MessageType = 16
MsgPreVote           MessageType = 17
MsgPreVoteResp       MessageType = 18
MsgStorageAppend     MessageType = 19
MsgStorageAppendResp MessageType = 20
MsgStorageApply      MessageType = 21
MsgStorageApplyResp  MessageType = 22
MsgForgetLeader      MessageType = 23
```

It was added to upstream etcd/raft for the leadership-transfer protocol; the branch builds cleanly and go test ./internal/raftengine/etcd/... passes with and without ELASTICKV_RAFT_DISPATCHER_LANES=1. No action needed on this item.

@bootjp bootjp merged commit e9a2bb9 into main Apr 22, 2026
8 of 9 checks passed
@bootjp bootjp deleted the perf/raft-dispatcher-lanes branch April 22, 2026 13:41
bootjp added a commit that referenced this pull request Apr 23, 2026
## Summary

Adds an `EXTRA_ENV` pass-through to `scripts/rolling-update.sh` so
operators can set container environment variables from `deploy.env`
without editing this script each time.

## Motivation

Immediate trigger: enabling `ELASTICKV_RAFT_DISPATCHER_LANES=1` (feature
flag added in PR #577). We want to flip it on via the deploy envelope,
not by rebuilding the image or hand-editing this script.

Post-#575 raft metrics are already healthy, so the 4-lane dispatcher is
being enabled as a defensive measure — it keeps heartbeats from being
starved by MsgApp / MsgSnap bursts under extreme write load, rather than
as a fix for anything actively broken.

## Change

- `run_container`: if `EXTRA_ENV` is set, split on whitespace and
forward each pair as a single `docker run -e KEY=VALUE` flag.
- Comment documents the whitespace-split semantics (pairs must not
contain whitespace; values may contain characters bash would otherwise
interpret).

## Usage

```bash
# deploy.env
EXTRA_ENV="ELASTICKV_RAFT_DISPATCHER_LANES=1 ELASTICKV_PEBBLE_CACHE_MB=512"
```

## Test plan

- [x] `bash -n scripts/rolling-update.sh` passes
- [x] `shellcheck` / `make lint` — 0 issues
- [ ] Rolling deploy with EXTRA_ENV set; verify via `docker inspect
elastickv --format '{{.Config.Env}}'` on each node


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added the ability to pass arbitrary additional environment variables
to containers via a new input; these are injected into container
runtime.
* **Improvements**
* Extra env entries are validated and safely normalized/escaped for
remote transport.
* Core runtime variables are now sent in escaped/quoted form to the
remote execution context for safer handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
bootjp added a commit that referenced this pull request Apr 23, 2026
…592)

## Root cause

Every committed raft entry triggered a `pebble.Sync` inside
`store.ApplyMutations` / `store.DeletePrefixAt` (see
`store/lsm_store.go:1056, 1108`), on top of the raft WAL fsync that
`etcd/raft` already performs per `Ready` batch in `persistReadyToWAL`
(`internal/raftengine/etcd/wal_store.go:376`).

The raft Ready loop in `drainReady`
(`internal/raftengine/etcd/engine.go:1389`) already batches multiple
entries per `Ready`, so the raft WAL fsync is fine. The hot fsync is the
FSM-side `b.Commit(pebble.Sync)` that the apply loop (`applyCommitted`)
hits once per entry. A prior cleanup (`docs/review_todo.md` section 3.4)
intentionally kept `ApplyMutations` on `pebble.Sync`; this CL makes that
choice tunable.

Microbenchmark (Apple M1 Max, APFS tempdir,
`BenchmarkApplyMutations_SyncMode`):

| mode   | ns/op    | allocs/op |
|--------|---------:|----------:|
| sync   | 16292899 | 7         |
| nosync |    16293 | 8         |

~1000x on this platform. Real hardware fsync latency varies, but the
sync/nosync ratio is consistently large on any WAL that enforces
platform durability.

## Durability argument

Pebble's FSM-commit fsync is redundant with the raft WAL under this
codebase's crash-recovery model:

1. Raft WAL (`etcd/raft`) fsyncs every committed entry via
`persist.Save` before `Advance`.
2. On restart, `newMemoryStorage`
(`internal/raftengine/etcd/persistence.go:352`) reloads the snapshot +
all WAL entries. `newRawNode` does not set `Config.Applied`, so
`etcdraft` defaults it to `snapshot.Metadata.Index`.
3. The engine sets `e.applied = maxAppliedIndex(LocalSnap)` and every
committed entry past the snapshot is re-delivered through
`CommittedEntries` on the first `Ready`.
4. `kv/fsm.applyCommitWithIdempotencyFallback` treats an
already-committed key (`LatestCommitTS >= commitTS`) as an idempotent
retry, so replaying an entry whose effect survived the crash is safe.
5. FSM snapshots are fsynced (`writeFSMSnapshotFile` then `f.Sync()` in
`fsm_snapshot_file.go`).

Therefore a crash that loses the unfsynced tail of Pebble's own WAL is
recoverable: raft replays from the last fsynced FSM snapshot onwards,
and the idempotent apply path re-materialises the lost state. Pebble on
the FSM commit path effectively becomes a volatile cache of applied
state whose durability boundary is the raft WAL.

Other `pebble.Sync` call sites (snapshot-batch commit, metadata-restore
writes, compaction `commitSnapshotBatch`) are untouched: those are
orthogonal durability boundaries (e.g. restore-directory swap) and are
not per-proposal cost.

## Env var + default

* `ELASTICKV_FSM_SYNC_MODE=sync` (default) - current behaviour.
* `ELASTICKV_FSM_SYNC_MODE=nosync` - `b.Commit(pebble.NoSync)` on the
FSM hot path. Raft WAL remains the durability boundary.

Unknown values fall back to `sync` (fail-safe toward durability).
Parsing is case-insensitive and whitespace-tolerant.

A Prometheus gauge `elastickv_fsm_apply_sync_mode{mode="sync"|"nosync"}`
is set at `NewRegistry` time via `store.FSMApplySyncModeLabel()`, so
dashboards can alert if a rolling deploy accidentally flips the
durability posture.

## Test plan

- [x] `go test ./store/... ./monitoring/... ./kv/...
./internal/raftengine/... -count=1`
- [x] Env var parsing: sync/nosync/mixed-case/whitespace/unknown then
sync default
- [x] Functional equivalence of sync vs nosync on a Pebble store
- [x] Clean-shutdown reopen visibility (NoSync + Close + reopen
preserves writes)
- [x] Prometheus gauge mutual exclusivity across successive
SetFSMApplySyncMode calls
- [ ] (Follow-up) Jepsen-style OS-level crash test for unfsynced-tail
recovery - tracked in `JEPSEN_TODO.md`

## Benchmark

```
go test ./store -run='^$' -bench='BenchmarkApplyMutations_SyncMode' -benchtime=2s -benchmem
BenchmarkApplyMutations_SyncMode/sync-10      141   16292899 ns/op   368 B/op   7 allocs/op
BenchmarkApplyMutations_SyncMode/nosync-10    129262     16293 ns/op   284 B/op   8 allocs/op
```

## Related

- Previous lever documented in `docs/review_todo.md` section 3.4
(ApplyMutations retained `pebble.Sync`).
- Pebble block cache default 256 MiB (#588)
- WAL retention / purge (#589)
- Lease read via AppliedIndex (#575)
- Raft dispatcher lanes (#577)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Configuration option for FSM commit durability mode with adjustable
sync and no-sync settings
* New Prometheus metric exposing the current durability mode on the
`/metrics` endpoint

* **Tests**
  * Unit tests validating FSM durability mode functionality
* Integration tests ensuring functional correctness across different
durability modes
* Performance benchmarks evaluating FSM commit operation characteristics

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
