feat(lease-read): engine-driven lease anchor via LastQuorumAck#561
Conversation
The caller-side lease (`kv.leaseState` + `refreshLeaseAfterDispatch`) sampled `time.Now()` BEFORE each slow-path `LinearizableRead` and used it as the extend base. When production `LinearizableRead` latency grew to ~1s (step-queue congestion) while `LeaseDuration` is 700ms, every extend landed ~300ms in the past -- the next read always missed the cache and the lease never warmed up, defeating the entire lease-read optimisation.

Switch the authority to the engine:

- `internal/raftengine`: add `LeaseProvider.LastQuorumAck()` returning the wall-clock instant at which a majority of followers most recently responded to the leader.
- `internal/raftengine/etcd`: implement `quorumAckTracker` and hook it from `handleStep`'s follower-response path (`MsgAppResp` / `MsgHeartbeatResp`). Reset the tracker on leader-loss transitions in `refreshStatus` so a future re-election cannot surface a stale instant. Single-node clusters short-circuit to `time.Now()`.
- `kv`: `Coordinate.LeaseRead` and `ShardedCoordinator.groupLeaseRead` consult `LastQuorumAck` FIRST, falling back to the existing caller-side lease and finally to `LinearizableRead`. The engine's lease is refreshed on every heartbeat independent of read latency, so a slow `LinearizableRead` no longer leaves the cache cold.

Safety: we record `time.Now()` when the leader observes the follower response, which is an upper bound on the follower's true ack time. `leaseSafetyMargin` (300ms) covers the resulting overshoot (one-way delay + scheduling slop + clock drift), keeping `lease_expiry` strictly inside `ack_time + electionTimeout`.

Caller-side lease machinery (`leaseState`, `leaseRefreshingTxn`, `RegisterLeaderLossCallback`) is retained for test affordance and as a secondary fast-path during rollout, but is no longer load-bearing.
/gemini review
**Warning: Rate limit exceeded.** Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 55 minutes and 39 seconds. ⌛
📝 Walkthrough
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Coordinator
    participant Engine
    participant AckTracker
    participant Followers
    Client->>Coordinator: LeaseRead(ctx)
    Coordinator->>Engine: State()
    Engine-->>Coordinator: StateLeader
    Coordinator->>Engine: LastQuorumAck()
    Engine->>AckTracker: load()
    AckTracker-->>Engine: lastQuorumAckTime
    Engine-->>Coordinator: lastQuorumAckTime
    alt Quorum Ack Fresh (within lease duration)
        Coordinator-->>Client: Fast-path success (appliedIndex)
    else Stale or Zero
        Coordinator->>Coordinator: LinearizableRead(ctx)
        Coordinator->>Engine: extend lease
        Coordinator-->>Client: Success
    end
    Note over Followers,AckTracker: Meanwhile - Leader receives follower responses
    Followers->>Engine: MsgAppResp / MsgHeartbeatResp
    Engine->>AckTracker: recordAck(peerID)
    AckTracker->>AckTracker: Update peer timestamp & recompute majority boundary
    AckTracker->>AckTracker: Publish quorumAckUnixNano atomically
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Code Review
This pull request implements an engine-driven lease mechanism to improve the performance and reliability of lease-based reads, particularly under network or scheduling congestion. By tracking the majority-ack instant directly within the Raft engine using follower heartbeat and append responses, the system can serve leader-local reads more efficiently. The review feedback highlights several critical improvement opportunities: ensuring time monotonicity by avoiding the conversion of timestamps to Unix nanoseconds, optimizing the ack-tracking logic to reduce memory allocations and lock contention on high-frequency paths, and properly handling cluster membership changes to prevent stale peer data from corrupting the quorum calculation.
```go
if followerQuorum <= 0 {
	return
}
now := time.Now().UnixNano()
```
Using time.Now().UnixNano() and then reconstructing the time via time.Unix(0, ns) strips the monotonic clock reading from the time.Time object. This makes subsequent time.Since(ack) calls in the lease-read path susceptible to wall-clock jumps (e.g., NTP adjustments). If the system clock is adjusted backwards, the lease could be held for longer than the safety window, which is unsafe.
Consider storing the time.Time object directly using an atomic.Pointer[time.Time] or atomic.Value to preserve monotonicity.
References
- For frequently accessed fields on hot paths, prefer atomic operations over mutexes to improve performance and maintain consistency.
```go
acks := make([]int64, 0, len(t.peerAcks))
for _, a := range t.peerAcks {
	acks = append(acks, a)
}
// Sort descending so acks[0] is the most recent. The followerQuorum-th
// entry (1-indexed) is the oldest ack among the top quorum -- i.e. the
// boundary instant by which majority liveness was confirmed.
sort.Slice(acks, func(i, j int) bool { return acks[i] > acks[j] })
t.quorumAckUnixNano.Store(acks[followerQuorum-1])
```
Allocating a new slice and sorting it on every heartbeat or append response can be inefficient, especially in clusters with many peers. Since followerQuorum is typically very small (e.g., 1 or 2), you can find the Nth-most-recent timestamp in a single linear pass, without allocating or sorting.
```go
// See docs/lease_read_design.md for the full argument.
type quorumAckTracker struct {
	mu       sync.Mutex
	peerAcks map[uint64]int64 // peer ID → last ack unix nano observed on leader
```
The peerAcks map is never pruned of peers that have been removed from the cluster. While reset() clears it on leader loss, a long-lived leader in a cluster with frequent membership changes will accumulate stale entries. More importantly, if the cluster size shrinks, an old timestamp from a removed peer might still be among the top followerQuorum acks, potentially inflating the majority-ack instant incorrectly.
```go
e.mu.RLock()
clusterSize := len(e.peers)
state := e.status.State
e.mu.RUnlock()
```
Acquiring e.mu.RLock() on the hot lease-read path to check clusterSize and state introduces unnecessary lock contention. Since these fields are updated infrequently compared to read volume, consider storing them in atomic variables (e.g., atomic.Int32) to allow truly lock-free peeking in LastQuorumAck.
References
- For frequently accessed fields that require monotonic updates (like a configuration index), prefer atomic operations (e.g., CAS) over mutexes to improve performance on hot paths.
```go
state := c.engine.State()
if state == raftengine.StateLeader {
	if ack := lp.LastQuorumAck(); !ack.IsZero() && time.Since(ack) < leaseDur {
		return lp.AppliedIndex(), nil
	}
}
```
There is redundant state checking and lock acquisition here. c.engine.State() is called at line 376, and then lp.LastQuorumAck() is called at line 378, which internally acquires the engine's RLock again to check the same state. This doubles the synchronization overhead on the fast path. Consider refactoring LastQuorumAck to return both the timestamp and the leader status, or pass the already-acquired state into it to ensure leadership is verified correctly relative to the lock.
References
- To prevent race conditions in read-modify-write operations that require leadership, verify leadership status after acquiring the lock that protects the operation.
Actionable comments posted: 1
🧹 Nitpick comments (2)
kv/lease_read_test.go (1)
24-24: Add coverage for the new engine-driven fast path.

`fakeLeaseEngine.LastQuorumAck()` is now wired via `lastQuorumAckUnixNano`, but none of the `TestCoordinate_LeaseRead_*` cases in this file actually exercise the new primary path (`state == Leader && !ack.IsZero() && time.Since(ack) < leaseDur`). Every current test still reaches the fast path via `c.lease.extend(...)` (i.e. the secondary/caller-side lease), so the main behavioral change of this PR -- that a fresh engine ack alone returns `AppliedIndex` without touching `LinearizableRead` or `c.lease` -- is untested at the Coordinate boundary. A regression that broke the engine-driven branch would not be caught here.

Suggested additions (table-driven friendly):

- Fresh ack, cold caller-side lease → fast path, 0 `LinearizableRead` calls, `c.lease.valid(now)` still false.
- Stale ack (`time.Since(ack) >= leaseDur`), cold caller-side lease → slow path.
- Fresh ack but `state != Leader` → slow path (and `LinearizableRead` invoked).
- Same expectations replicated for `groupLeaseRead` in `sharded_coordinator_test.go`.

Example helper on `fakeLeaseEngine`:

```go
func (e *fakeLeaseEngine) setQuorumAck(t time.Time) {
	if t.IsZero() {
		e.lastQuorumAckUnixNano.Store(0)
		return
	}
	e.lastQuorumAckUnixNano.Store(t.UnixNano())
}
```

As per coding guidelines: "Add test coverage for TTL/HLC behaviors when touching clocks, OCC, or replication logic in `store/` and `kv/` packages."

Also applies to: 68-74
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kv/lease_read_test.go` at line 24: Tests in kv/lease_read_test.go don't cover the new engine-driven fast path implemented via fakeLeaseEngine.LastQuorumAck() (backed by lastQuorumAckUnixNano), so add table-driven cases to the existing TestCoordinate_LeaseRead_* that exercise: (1) fresh engine ack with a cold caller-side lease to ensure the primary fast path returns immediately, does not call Coordinate.LinearizableRead, and leaves c.lease.valid(now) false; (2) stale ack with cold caller-side lease to ensure the slow path is taken; and (3) fresh ack but state != Leader to ensure the slow path and LinearizableRead are invoked; use a helper on fakeLeaseEngine (e.g. setQuorumAck) to set lastQuorumAckUnixNano for each case; repeat equivalent tests for groupLeaseRead in sharded_coordinator_test.go to cover both branches.

internal/raftengine/etcd/quorum_ack_test.go (1)

39-62: Test name claims more than it verifies.

`TestQuorumAckTracker_QuorumAckIsOldestOfTopN` only asserts monotonicity (`!third.Before(second)`), not that `load()` equals the boundary entry (the `followerQuorum`-th most recent ack). As written, an implementation bug that published the most recent ack instead of the oldest of the top N would still pass this test.

Consider asserting the exact value by capturing the individual ack times (e.g. record timestamps `t2`, `t3`, `t4` via `time.Now()` around each `recordAck`, then `require.Equal(t, t3.UnixNano(), tr.load().UnixNano())` since `t3` is the older of the top two after peer 4's ack lands).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/raftengine/etcd/quorum_ack_test.go` around lines 39-62: The test TestQuorumAckTracker_QuorumAckIsOldestOfTopN only checks monotonicity; fix it by capturing exact timestamps for each ack and asserting the quorum instant equals the boundary follower ack (use quorumAckTracker.recordAck to record after calling time.Now() into variables like t2, t3, t4 and then compare tr.load() to the expected boundary timestamp -- e.g. require.Equal on UnixNano between tr.load() and t3 for the case where t3 is the older of the top N); reference quorumAckTracker, recordAck, load, and followerQuorum when making the assertion so the test verifies the published instant is the followerQuorum-th most recent ack rather than merely non-regressing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/raftengine/etcd/quorum_ack.go`:
- Around line 42-82: The tracker retains removed peers in peerAcks causing stale
acks and unbounded growth; add a method quorumAckTracker.forgetPeer(peerID
uint64) that locks mu, deletes peerID from peerAcks, and (optionally) recomputes
quorumAckUnixNano or leaves recompute to the next recordAck; then call this new
forgetPeer from Engine.removePeer alongside the existing teardown so removed
peers no longer contribute to quorum computation and the map does not leak
entries; ensure symbols referenced are quorumAckTracker, peerAcks,
quorumAckUnixNano, recordAck, forgetPeer, reset, and Engine.removePeer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cff706b6-3bee-4ea5-b080-6144b39cc565
📒 Files selected for processing (7)
- `internal/raftengine/engine.go`
- `internal/raftengine/etcd/engine.go`
- `internal/raftengine/etcd/quorum_ack.go`
- `internal/raftengine/etcd/quorum_ack_test.go`
- `kv/coordinator.go`
- `kv/lease_read_test.go`
- `kv/sharded_coordinator.go`
Code Review
This pull request introduces an engine-driven lease mechanism to improve the performance and reliability of lease-based reads, particularly under congestion. It adds a LastQuorumAck method to the LeaseProvider interface and implements a quorumAckTracker to maintain the wall-clock time of the most recent majority acknowledgment. Feedback focuses on optimizing the hot path by reducing memory allocations and redundant locking through the use of stack-allocated buffers and atomic operations. Additionally, a concern was raised regarding the handling of peer removals in the acknowledgment tracker to prevent potential lease safety violations.
```go
acks := make([]int64, 0, len(t.peerAcks))
for _, a := range t.peerAcks {
	acks = append(acks, a)
}
// Sort descending so acks[0] is the most recent. The followerQuorum-th
// entry (1-indexed) is the oldest ack among the top quorum -- i.e. the
// boundary instant by which majority liveness was confirmed.
sort.Slice(acks, func(i, j int) bool { return acks[i] > acks[j] })
t.quorumAckUnixNano.Store(acks[followerQuorum-1])
```
This block executes inside the main Raft event loop for every MsgAppResp and MsgHeartbeatResp. Allocating a new slice and performing a reflective sort (sort.Slice) on every response can introduce significant overhead and GC pressure on high-throughput clusters. Since Raft clusters are typically small (3-7 nodes), we can use a stack-allocated buffer to avoid allocations in the common case.
```go
var buf [16]int64
acks := buf[:0]
if len(t.peerAcks) > len(buf) {
	acks = make([]int64, 0, len(t.peerAcks))
}
for _, a := range t.peerAcks {
	acks = append(acks, a)
}
// Sort descending so acks[0] is the most recent. The followerQuorum-th
// entry (1-indexed) is the oldest ack among the top quorum.
sort.Slice(acks, func(i, j int) bool { return acks[i] > acks[j] })
t.quorumAckUnixNano.Store(acks[followerQuorum-1])
```

```go
// the strictly-safe boundary by at most the one-way delay + scheduling
// slop -- which is exactly what leaseSafetyMargin is sized to cover.
// See docs/lease_read_design.md for the full argument.
type quorumAckTracker struct {
```
The quorumAckTracker currently lacks a mechanism to handle peer removal. If a peer is removed from the cluster, its last recorded acknowledgment remains in the peerAcks map indefinitely. This could allow a leader to incorrectly satisfy the quorum requirement using a "ghost" acknowledgment from a non-existent peer, potentially violating lease safety after a configuration change. Please add a removePeer(peerID uint64) method to prune the map and ensure it is called from Engine.removePeer.
References
- To maintain consistency between the application's peer list and the Raft ConfState, ensure the peer list remains complete and consistent across configuration changes.
```go
e.mu.RLock()
clusterSize := len(e.peers)
state := e.status.State
e.mu.RUnlock()
```
Acquiring e.mu.RLock() here introduces contention on the hot lease-read path. Since the caller (Coordinate.LeaseRead) likely already checked the engine state, this lock is often redundant. To achieve a truly lock-free fast path, consider mirroring the cluster size and Raft state in atomic variables (similar to how appliedIndex is handled).
References
- For frequently accessed fields that require monotonic updates (like a configuration index), prefer atomic operations (e.g., CAS) over mutexes to improve performance on hot paths.
```go
state := c.engine.State()
if state == raftengine.StateLeader {
	if ack := lp.LastQuorumAck(); !ack.IsZero() && time.Since(ack) < leaseDur {
		return lp.AppliedIndex(), nil
	}
}
```
This logic results in redundant locking and state checks. c.engine.State() acquires a read lock, and then lp.LastQuorumAck() acquires the same lock again. Since LastQuorumAck is only useful when the node is the leader, consider refactoring this to minimize lock acquisitions on the fast path, perhaps by making LastQuorumAck lock-free using atomic operations.
References
- For frequently accessed fields that require monotonic updates (like a configuration index), prefer atomic operations (e.g., CAS) over mutexes to improve performance on hot paths.
```go
state := engine.State()
if state == raftengine.StateLeader {
	if ack := lp.LastQuorumAck(); !ack.IsZero() && time.Since(ack) < leaseDur {
		return lp.AppliedIndex(), nil
	}
}
```
Similar to Coordinate.LeaseRead, this implementation performs redundant state checks and locking. Both engine.State() and lp.LastQuorumAck() acquire the engine's RWMutex. Optimizing LastQuorumAck to be lock-free would improve performance on this hot path, consistent with repository rules regarding atomic operations.
References
- For frequently accessed fields that require monotonic updates (like a configuration index), prefer atomic operations (e.g., CAS) over mutexes to improve performance on hot paths.
Addresses review feedback on PR #561.

Safety fix (gemini HIGH): `quorumAckTracker` now exposes `removePeer`, called from `Engine.removePeer` so a shrinking-then-growing cluster cannot keep a stale ack from a removed peer falsely satisfying the majority threshold. `recomputeLocked` factored out of `recordAck` to share the sort-and-publish path.

Hot-path fix (gemini MEDIUM x3): `LastQuorumAck` is now fully lock-free. The multi-node path reads `ackTracker.quorumAckUnixNano` atomically as before; the single-node leader path reads a new `singleNodeLeaderAckUnixNano` atomic, populated from `refreshStatus` every tick while leader of a 1-node cluster and cleared otherwise. The previous implementation held `e.mu.RLock` to check peer count and state, doubling the lock traffic with the caller's `engine.State()`.

Coverage (coderabbit nitpick):

- `TestCoordinate_LeaseRead_EngineAckFastPath`: fresh ack + cold caller-side lease -> fast path, 0 `LinearizableRead` calls.
- `TestCoordinate_LeaseRead_EngineAckStaleFallsThrough`: stale ack -> slow path.
- `TestCoordinate_LeaseRead_EngineAckIgnoredWhenNotLeader`: fresh ack with non-leader state -> slow path.
- `TestQuorumAckTracker_RemovedPeerCannotSatisfyQuorum`: removed peer's ack no longer counts toward the new cluster's quorum.
- `TestQuorumAckTracker_RemovePeerZeroQuorumKeepsCurrent`: `followerQuorum=0` preserves the published instant so the next `recordAck` refreshes it.
/gemini review
**Warning:** You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Pull request overview
This PR fixes a lease-read regression by shifting the lease “anchor time” authority from the caller (pre-read time.Now() sampling) to the Raft engine, using a majority-ack observation (LastQuorumAck) derived from follower responses to keep lease reads warm even when LinearizableRead is slow.
Changes:
- Extend `raftengine.LeaseProvider` with `LastQuorumAck() time.Time` and update KV lease-read logic to prefer this engine-driven anchor before falling back to the existing caller-side lease and then `LinearizableRead`.
- Implement quorum-ack tracking in the etcd Raft engine (per-peer ack timestamps, majority "oldest-of-top-N" computation), including reset on leader loss and peer removal handling.
- Add unit tests for quorum-ack tracking and KV coordinator behavior using the new fast path.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| `internal/raftengine/engine.go` | Adds `LastQuorumAck()` to `LeaseProvider` contract and documents required safety checks. |
| `internal/raftengine/etcd/engine.go` | Implements `LastQuorumAck()` plus leader-side ack tracking, reset on leader loss, and peer-removal pruning. |
| `internal/raftengine/etcd/quorum_ack.go` | Introduces `quorumAckTracker` to compute and publish the majority-ack instant. |
| `internal/raftengine/etcd/quorum_ack_test.go` | Adds unit tests for quorum ack semantics, reset, pruning, and concurrency. |
| `kv/coordinator.go` | Makes engine-driven quorum-ack the primary `LeaseRead` fast path, preserving the old caller-side lease as secondary. |
| `kv/sharded_coordinator.go` | Mirrors the engine-driven lease-read fast path for sharded groups. |
| `kv/lease_read_test.go` | Extends the fake engine and adds tests validating the new engine-ack fast path behavior. |
Comments suppressed due to low confidence (1)
internal/raftengine/etcd/engine.go:1262
- recordQuorumAck() is called before rawNode.Step(). If Step returns etcdraft.ErrStepPeerNotFound (e.g., response from a removed/unknown peer), the ack still gets recorded and can incorrectly advance LastQuorumAck / satisfy quorum after a config change. Consider moving recordQuorumAck after a successful Step, or explicitly ignoring acks from peers not present in e.peers (and/or only recording when Step does not return ErrStepPeerNotFound).
```go
e.recordLeaderContact(msg)
e.recordQuorumAck(msg)
if err := e.rawNode.Step(msg); err != nil {
	if errors.Is(err, etcdraft.ErrStepPeerNotFound) {
		return
	}
```
```go
// when the local node is not the leader. Single-node clusters
// report time.Now() unconditionally since self is the quorum.
```
The LeaseProvider doc says single-node clusters "report time.Now() unconditionally", but the etcd engine implementation can return zero (e.g., before the first refreshStatus tick) and otherwise returns the last tick time rather than calling time.Now at read time. Please either adjust the documentation to match the actual contract (e.g., "returns a recent non-zero time while leader in a single-node cluster") or change implementations to satisfy the documented behavior.
```diff
- // when the local node is not the leader. Single-node clusters
- // report time.Now() unconditionally since self is the quorum.
+ // when the local node is not the leader. In a single-node cluster,
+ // self is the quorum, so while leader this returns a recent non-zero
+ // timestamp maintained by the engine; callers must not assume it is
+ // computed via time.Now at the instant of the call.
```
```go
linearizableErr       error
linearizableCalls     atomic.Int32
state                 atomic.Value // stores raftengine.State; default Leader
lastQuorumAckUnixNano atomic.Int64 // 0 = no ack yet. Updated by ackNow().
```
The struct field comment mentions it is "Updated by ackNow()", but there is no ackNow helper on fakeLeaseEngine (the tests use setQuorumAck). Update the comment to match the actual helper/method so the test harness stays self-explanatory.
```diff
- lastQuorumAckUnixNano atomic.Int64 // 0 = no ack yet. Updated by ackNow().
+ lastQuorumAckUnixNano atomic.Int64 // 0 = no ack yet. Updated by setQuorumAck().
```
```go
// Recorder alternates between two peer IDs so a 3-node followerQuorum
// always has at least one entry and the sort path runs.
go func() {
	defer wg.Done()
	for {
		select {
		case <-stop:
			return
		default:
			tr.recordAck(2, 1)
			tr.recordAck(3, 1)
		}
	}
}()
go func() {
	defer wg.Done()
	for {
		select {
		case <-stop:
			return
		default:
			_ = tr.load()
		}
	}
```
TestQuorumAckTracker_ConcurrentRecordAndLoad spins in tight loops with a select default in two goroutines. In parallel test runs this can burn CPU and cause flakiness under constrained CI. Consider adding a small backoff (e.g., time.Sleep / runtime.Gosched) or using a time-bounded loop with a ticker to reduce busy-waiting while still exercising concurrent access.
Addresses copilot review on PR #561.

`recordQuorumAck` ran before `rawNode.Step`, so a late `MsgAppResp` / `MsgHeartbeatResp` from a just-removed peer (which `rawNode.Step` would reject with `ErrStepPeerNotFound`) still landed an ack in the tracker -- resurrecting the "ghost" entry that `removePeer` just pruned. Filter on `e.peers` membership before recording; the event loop is the sole writer to `e.peers`, so the map read is race-free.
/gemini review
**Warning:** You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
```go
state := c.engine.State()
if state == raftengine.StateLeader {
	if ack := lp.LastQuorumAck(); !ack.IsZero() && time.Since(ack) < leaseDur {
		return lp.AppliedIndex(), nil
	}
```
The engine-driven fast path uses time.Since(ack) < leaseDur. Because LastQuorumAck() is reconstructed from a unix-nano value (no monotonic component), a backwards wall-clock adjustment can make time.Since(ack) negative and incorrectly treat a very old ack as "fresh". To keep the lease safety argument intact, guard against ack.After(time.Now()) (treat as stale/force slow path), or compare using now := time.Now(); elapsed := now.Sub(ack); if elapsed < 0 { ... } before applying the duration check.
```go
// The lease is maintained inside the engine from ongoing
// MsgAppResp / MsgHeartbeatResp traffic, so callers do not sample
// time.Now() before the slow path to "extend" a lease afterwards.
// That earlier pre-read sampling was racy under congestion: if a
// LinearizableRead took longer than LeaseDuration, the extension
// would land already expired and the lease never warmed up. The
// engine-driven anchor is refreshed every heartbeat independent of
// read latency.
//
```
The doc comment says callers "do not sample time.Now() before the slow path to extend a lease afterwards", but this function still samples now := time.Now() before LinearizableRead and uses it to extend the secondary caller-side lease (c.lease.extend(now.Add(leaseDur), ...)). Please adjust the comment to clarify that only the primary path is engine-driven, while the legacy caller-side lease (with pre-read sampling) remains as a secondary fallback during rollout.
```diff
- // The lease is maintained inside the engine from ongoing
- // MsgAppResp / MsgHeartbeatResp traffic, so callers do not sample
- // time.Now() before the slow path to "extend" a lease afterwards.
- // That earlier pre-read sampling was racy under congestion: if a
- // LinearizableRead took longer than LeaseDuration, the extension
- // would land already expired and the lease never warmed up. The
- // engine-driven anchor is refreshed every heartbeat independent of
- // read latency.
- //
+ // The primary lease path is maintained inside the engine from ongoing
+ // MsgAppResp / MsgHeartbeatResp traffic, so that path does not rely on
+ // callers sampling time.Now() before the slow path to "extend" a lease
+ // afterwards. That earlier pre-read sampling was racy under
+ // congestion: if a LinearizableRead took longer than LeaseDuration,
+ // the extension would land already expired and the lease never warmed
+ // up. The engine-driven anchor is refreshed every heartbeat
+ // independent of read latency.
+ //
+ // During rollout, a legacy caller-side lease extension remains as a
+ // secondary fallback after a successful slow-path read, including the
+ // older pre-read time sampling used to anchor that fallback window.
+ //
```
```go
// Engine-driven lease anchor -- see Coordinate.LeaseRead for why
// this is the primary check.
state := engine.State()
if state == raftengine.StateLeader {
	if ack := lp.LastQuorumAck(); !ack.IsZero() && time.Since(ack) < leaseDur {
		return lp.AppliedIndex(), nil
	}
}
now := time.Now()
```
Same wall-clock skew issue as in Coordinate.LeaseRead: if LastQuorumAck() is in the future relative to time.Now() (e.g. due to a backwards clock step), time.Since(ack) becomes negative and this fast path can incorrectly accept a stale quorum ack. Add a guard to treat ack.After(now) as invalid/stale before applying the duration check.
```diff
- // Engine-driven lease anchor -- see Coordinate.LeaseRead for why
- // this is the primary check.
- state := engine.State()
- if state == raftengine.StateLeader {
- 	if ack := lp.LastQuorumAck(); !ack.IsZero() && time.Since(ack) < leaseDur {
- 		return lp.AppliedIndex(), nil
- 	}
- }
- now := time.Now()
+ now := time.Now()
+ // Engine-driven lease anchor -- see Coordinate.LeaseRead for why
+ // this is the primary check.
+ state := engine.State()
+ if state == raftengine.StateLeader {
+ 	if ack := lp.LastQuorumAck(); !ack.IsZero() && !ack.After(now) && now.Sub(ack) < leaseDur {
+ 		return lp.AppliedIndex(), nil
+ 	}
+ }
```
| // Safety: callers must verify the lease with | ||
| // time.Since(LastQuorumAck()) < LeaseDuration() && | ||
| // engine.State() == raftengine.StateLeader | ||
| // before serving a leader-local read. The LeaseDuration is bounded | ||
| // by electionTimeout - safety_margin, which guarantees that any new | ||
| // leader candidate cannot yet accept writes during that window. |
The interface contract recommends time.Since(LastQuorumAck()) < LeaseDuration() as the safety check. Since implementations may return a time.Time reconstructed from unix nanos (no monotonic clock), a backwards wall-clock adjustment can make time.Since(...) negative and incorrectly treat an old ack as fresh. Consider updating the contract to require treating LastQuorumAck() values later than time.Now() as invalid/stale (or otherwise specify a monotonic-safe comparison).
| // Safety: callers must verify the lease with | |
| // time.Since(LastQuorumAck()) < LeaseDuration() && | |
| // engine.State() == raftengine.StateLeader | |
| // before serving a leader-local read. The LeaseDuration is bounded | |
| // by electionTimeout - safety_margin, which guarantees that any new | |
| // leader candidate cannot yet accept writes during that window. | |
| // Safety: callers must verify the lease against a single `now := time.Now()` | |
| // and MUST treat any LastQuorumAck() later than `now` as invalid/stale | |
| // before serving a leader-local read, e.g. | |
| // ack := LastQuorumAck() | |
| // now := time.Now() | |
| // !ack.IsZero() && | |
| // !ack.After(now) && | |
| // now.Sub(ack) < LeaseDuration() && | |
| // engine.State() == raftengine.StateLeader | |
| // The LeaseDuration is bounded by electionTimeout - safety_margin, | |
| // which guarantees that any new leader candidate cannot yet accept | |
| // writes during that window. |
Addresses copilot review on PR #561. LastQuorumAck() returns a time reconstructed from UnixNano, which has no monotonic component. A backwards wall-clock step can leave now.Sub(ack) negative and let a stale ack pass the `<` duration comparison -- a lease-safety violation. Three related changes: 1. Coordinate.LeaseRead / groupLeaseRead: sample now := time.Now() once, then validate the engine-driven ack through a new engineLeaseAckValid helper that requires state==Leader, ack non-zero, !ack.After(now), and now.Sub(ack) < leaseDur. Treating future-dated acks as invalid forces the slow path instead of trusting a negative elapsed duration. Factoring the check into a helper keeps both call sites under the cyclomatic-complexity cap. 2. raftengine.LeaseProvider.LastQuorumAck doc: specify the skew-safe check (single-now sample + !ack.After(now)) so every implementation and every caller follows the same contract. 3. Coordinate.LeaseRead doc refreshed to clarify that only the PRIMARY engine-driven path skips pre-read sampling; the SECONDARY caller-side lease fallback still uses now.Add(leaseDur) during rollout.
|
/gemini review |
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
| func (t *quorumAckTracker) removePeer(peerID uint64, followerQuorum int) { | ||
| t.mu.Lock() | ||
| defer t.mu.Unlock() | ||
| if _, ok := t.peerAcks[peerID]; !ok { | ||
| return | ||
| } | ||
| delete(t.peerAcks, peerID) | ||
| if followerQuorum <= 0 { | ||
| return | ||
| } | ||
| t.recomputeLocked(followerQuorum) |
In quorumAckTracker.removePeer, returning early when peerID has no recorded entry skips recomputing quorumAckUnixNano. If membership changes reduce followerQuorum, the tracker can unnecessarily keep quorumAck at 0 (or stale) until the next recordAck, even though the remaining recorded peers may already satisfy the new quorum. Consider always recomputing when followerQuorum > 0 (and delete() on a nil/missing entry is safe), or providing an explicit recompute-on-config-change path.
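A self-contained sketch of the suggested fix, using illustrative field names rather than the real tracker's. Since `delete` on a missing key is a no-op, the unconditional recompute is safe and lets a reduced `followerQuorum` republish immediately:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// tracker is a minimal stand-in for quorumAckTracker.
type tracker struct {
	mu                sync.Mutex
	peerAcks          map[uint64]int64
	quorumAckUnixNano atomic.Int64
}

func (t *tracker) removePeer(peerID uint64, followerQuorum int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.peerAcks, peerID) // safe even when peerID was never recorded
	if followerQuorum <= 0 {
		return
	}
	t.recomputeLocked(followerQuorum) // runs even if peerID had no entry
}

func (t *tracker) recomputeLocked(followerQuorum int) {
	if len(t.peerAcks) < followerQuorum {
		t.quorumAckUnixNano.Store(0)
		return
	}
	acks := make([]int64, 0, len(t.peerAcks))
	for _, a := range t.peerAcks {
		acks = append(acks, a)
	}
	sort.Slice(acks, func(i, j int) bool { return acks[i] > acks[j] })
	t.quorumAckUnixNano.Store(acks[followerQuorum-1])
}

func main() {
	tr := &tracker{peerAcks: map[uint64]int64{2: 100, 3: 200}}
	// Peer 4 never acked; removing it with the reduced quorum (1) still
	// recomputes and publishes a quorum instant from the remaining peers.
	tr.removePeer(4, 1)
	fmt.Println(tr.quorumAckUnixNano.Load()) // 200
}
```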
| // when the local node is not the leader. Single-node clusters | ||
| // report time.Now() unconditionally since self is the quorum. |
LeaseProvider.LastQuorumAck docs say "Single-node clusters report time.Now() unconditionally since self is the quorum" while also stating the method returns zero when the local node is not leader. In practice the etcd engine only publishes a non-zero value while leader (and updates it periodically), so this wording is misleading/contradictory. Please tighten the contract wording (e.g., "single-node leaders may return a recent time.Now()"), so callers/other implementations don't interpret it as returning non-zero when not leader.
| // when the local node is not the leader. Single-node clusters | |
| // report time.Now() unconditionally since self is the quorum. | |
| // when the local node is not the leader. Single-node leaders may | |
| // return a recent time.Now() value since self is the quorum. |
Addresses copilot review on PR #561. - quorumAckTracker.removePeer no longer early-returns when peerID has no recorded entry. A shrink that reduces followerQuorum may let the remaining peers now satisfy the smaller threshold, and without an explicit recompute the published instant stays stale (or zero) until the next recordAck. delete() is safe on a missing key, so the unconditional path is harmless otherwise. - LeaseProvider.LastQuorumAck doc: the previous wording said "single-node clusters report time.Now() unconditionally" which contradicted the "zero when not leader" clause. Tighten to "single-node leaders may return a recent time.Now()" so callers and alternate engines don't interpret it as returning non-zero on a non-leader.
| func (e *Engine) LastQuorumAck() time.Time { | ||
| if e == nil { | ||
| return time.Time{} | ||
| } | ||
| if ns := e.singleNodeLeaderAckUnixNano.Load(); ns != 0 { | ||
| return time.Unix(0, ns) | ||
| } | ||
| return e.ackTracker.load() | ||
| } |
LastQuorumAck() currently returns ackTracker.load() unconditionally for multi-node clusters. That can return a non-zero time even when the node is follower/candidate (e.g., late MsgAppResp/MsgHeartbeatResp arriving after a step-down can repopulate ackTracker), which violates the LeaseProvider contract (“zero time when ... not the leader”) and can also leak stale quorum acks across re-elections. Consider gating the return value on leader state (or maintaining an atomic leader flag) so non-leaders always observe time.Time{} for this API.
| if !isFollowerResponse(msg.Type) { | ||
| return | ||
| } | ||
| if msg.From == 0 || msg.From == e.nodeID { | ||
| return | ||
| } | ||
| // Reject acks from peers not in the current membership. Without | ||
| // this filter, a late MsgAppResp from a just-removed peer (which | ||
| // rawNode.Step will immediately reject with ErrStepPeerNotFound) | ||
| // would still land an ack in the tracker -- resurrecting the | ||
| // "ghost" entry that removePeer just pruned. Since we run on the | ||
| // event-loop goroutine (the sole writer to e.peers), the map read | ||
| // here is race-free. | ||
| if _, ok := e.peers[msg.From]; !ok { | ||
| return | ||
| } | ||
| clusterSize := len(e.peers) | ||
| if clusterSize <= 1 { | ||
| return | ||
| } |
recordQuorumAck updates the tracker for follower-response messages without checking that the local node is currently leader. After a leader-loss transition (where ackTracker.reset() runs), late follower responses can still arrive and repopulate ackTracker while we are follower/candidate, making LastQuorumAck() non-zero and potentially reusing stale liveness on a later re-election. Suggest adding a leader-state gate here (or otherwise ensuring the tracker cannot be updated outside leader role).
| time.Sleep(2 * time.Millisecond) | ||
| tr.recordAck(3, 2) | ||
| second := tr.load() | ||
| require.False(t, second.IsZero()) | ||
|
|
||
| // Now peer 4 acks with a later timestamp. Quorum ack should still | ||
| // be the older of the top two (either 2 or 3, not 4) because the | ||
| // 5-node quorum is 3 including self (2 followers + self), and the | ||
| // OLDEST of the top two followers is still the limiting factor. | ||
| time.Sleep(2 * time.Millisecond) |
These time.Sleep(2 * time.Millisecond) calls aren’t required for the assertions (the test only checks non-zero/non-regression) and can add flakiness/latency to the test suite on slow CI. Consider removing them or rewriting the tracker to accept an injected timestamp in tests so ordering can be controlled deterministically.
| time.Sleep(2 * time.Millisecond) | |
| tr.recordAck(3, 2) | |
| second := tr.load() | |
| require.False(t, second.IsZero()) | |
| // Now peer 4 acks with a later timestamp. Quorum ack should still | |
| // be the older of the top two (either 2 or 3, not 4) because the | |
| // 5-node quorum is 3 including self (2 followers + self), and the | |
| // OLDEST of the top two followers is still the limiting factor. | |
| time.Sleep(2 * time.Millisecond) | |
| tr.recordAck(3, 2) | |
| second := tr.load() | |
| require.False(t, second.IsZero()) | |
| // Now peer 4 acks. Quorum ack should still be the older of the top | |
| // two (either 2 or 3, not 4) because the 5-node quorum is 3 | |
| // including self (2 followers + self), and the OLDEST of the top | |
| // two followers is still the limiting factor. |
Addresses copilot review on PR #561. A transport-level MsgAppResp / MsgHeartbeatResp can land at handleStep shortly after a step-down -- ackTracker.reset() has already fired, but the response is already in flight. Previously recordQuorumAck would admit the late ack and repopulate the tracker, and LastQuorumAck would surface the resulting non-zero instant. A subsequent re-election would then observe stale liveness from the prior term, violating the LeaseProvider contract ("zero time when the local node is not the leader") and potentially widening an already-expired lease window. Fix: atomic isLeader mirror on the Engine. - refreshStatus writes it every tick before the other lease-related atomics, so every role transition is observed before the next handleStep runs. - recordQuorumAck loads it and drops the response on the floor when not leader. Prevents the tracker from being repopulated after reset() and before the next refreshStatus pass. - LastQuorumAck returns time.Time{} when not leader, honoring the interface contract and covering the narrow window between recordQuorumAck's check and ack-tracker publication. Also drop the flaky 2ms time.Sleep calls from TestQuorumAckTracker_QuorumAckIsOldestOfTopN: the require.False(third.Before(second)) assertion holds trivially when timestamps are equal, so the sleeps added latency without exercising new behaviour and could flake on slow CI.
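The atomic leader mirror described above can be sketched as follows. Field and method names are simplified stand-ins for the real Engine; the point is that both the write path and the read path gate on the same atomic flag:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// engine is a stand-in for the etcd engine's lease-related state.
type engine struct {
	isLeader          atomic.Bool  // mirror written by refreshStatus each tick
	quorumAckUnixNano atomic.Int64 // published quorum-ack instant
}

// recordQuorumAck drops late follower responses while not leader, so a
// step-down cannot be followed by the tracker being repopulated.
func (e *engine) recordQuorumAck(nowUnixNano int64) {
	if !e.isLeader.Load() {
		return // late MsgAppResp after a step-down: ignore
	}
	e.quorumAckUnixNano.Store(nowUnixNano)
}

// LastQuorumAck returns the zero time for non-leaders, honouring the
// LeaseProvider contract even in the narrow window between the write-side
// check and publication.
func (e *engine) LastQuorumAck() time.Time {
	if !e.isLeader.Load() {
		return time.Time{}
	}
	ns := e.quorumAckUnixNano.Load()
	if ns == 0 {
		return time.Time{}
	}
	return time.Unix(0, ns)
}

func main() {
	e := &engine{}
	e.isLeader.Store(true)
	e.recordQuorumAck(time.Now().UnixNano())
	fmt.Println(e.LastQuorumAck().IsZero()) // false: leader with a fresh ack

	e.isLeader.Store(false) // step-down
	e.recordQuorumAck(time.Now().UnixNano())
	fmt.Println(e.LastQuorumAck().IsZero()) // true: non-leaders see zero
}
```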
| // Recorder alternates between two peer IDs so a 3-node followerQuorum | ||
| // always has at least one entry and the sort path runs. | ||
| go func() { | ||
| defer wg.Done() | ||
| for { | ||
| select { | ||
| case <-stop: | ||
| return | ||
| default: | ||
| tr.recordAck(2, 1) | ||
| tr.recordAck(3, 1) | ||
| } | ||
| } | ||
| }() | ||
| go func() { | ||
| defer wg.Done() | ||
| for { | ||
| select { | ||
| case <-stop: | ||
| return | ||
| default: | ||
| _ = tr.load() | ||
| } | ||
| } | ||
| }() |
This concurrency test uses two goroutines with tight default: loops and no backoff, which can busy-spin and consume significant CPU on CI (especially with t.Parallel()). Consider adding a small sleep/Gosched in the loops or using a time-bounded context/ticker-driven loop so the test still exercises concurrency without pegging a core.
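One way to keep the concurrency coverage without a busy-spin, sketched with an atomic counter standing in for `tr.recordAck` / `tr.load` (the `runConcurrent` name and threshold are illustrative):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runConcurrent runs a writer and a reader goroutine that yield with
// runtime.Gosched() between iterations, so the loops still interleave
// under the race detector without pegging a CI core.
func runConcurrent(iters int64) int64 {
	var (
		stop  atomic.Bool
		wg    sync.WaitGroup
		value atomic.Int64
	)
	wg.Add(2)
	go func() { // writer, analogous to tr.recordAck
		defer wg.Done()
		for !stop.Load() {
			value.Add(1)
			runtime.Gosched() // yield instead of busy-spinning
		}
	}()
	go func() { // reader, analogous to tr.load
		defer wg.Done()
		for !stop.Load() {
			_ = value.Load()
			runtime.Gosched()
		}
	}()
	for value.Load() < iters {
		runtime.Gosched()
	}
	stop.Store(true)
	wg.Wait()
	return value.Load()
}

func main() {
	fmt.Println(runConcurrent(1000) >= 1000) // true
}
```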
| func (t *quorumAckTracker) recomputeLocked(followerQuorum int) { | ||
| if len(t.peerAcks) < followerQuorum { | ||
| // Not enough peers have reported to form a majority yet. | ||
| t.quorumAckUnixNano.Store(0) | ||
| return | ||
| } | ||
| acks := make([]int64, 0, len(t.peerAcks)) | ||
| for _, a := range t.peerAcks { | ||
| acks = append(acks, a) | ||
| } | ||
| // Sort descending so acks[0] is the most recent. The followerQuorum-th | ||
| // entry (1-indexed) is the oldest ack among the top quorum -- i.e. | ||
| // the boundary instant by which majority liveness was confirmed. | ||
| sort.Slice(acks, func(i, j int) bool { return acks[i] > acks[j] }) | ||
| t.quorumAckUnixNano.Store(acks[followerQuorum-1]) | ||
| } |
quorumAckTracker.recomputeLocked allocates a fresh slice and sorts all peer ack timestamps on every follower response. Since recordAck is called for each MsgAppResp / MsgHeartbeatResp, this can become a hot/allocating path as cluster size or heartbeat frequency grows. Consider reducing allocations (e.g., reuse a buffer) and computing the Nth-most-recent timestamp without a full sort.
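The buffer-reuse half of the suggestion can be sketched as below (illustrative field names; the published atomic is replaced by a plain field for brevity). Re-slicing to zero length keeps the backing array, so once it has grown to the cluster size the steady-state allocation rate is zero:

```go
package main

import (
	"fmt"
	"sort"
)

// quorumTracker is a stand-in illustrating scratch-buffer reuse in the
// hot recompute path.
type quorumTracker struct {
	peerAcks map[uint64]int64
	ackBuf   []int64 // reused across recomputes; grows once, then stable
	quorum   int64   // published Nth-most-recent ack (stand-in for the atomic)
}

func (t *quorumTracker) recompute(followerQuorum int) {
	if len(t.peerAcks) < followerQuorum {
		t.quorum = 0
		return
	}
	t.ackBuf = t.ackBuf[:0] // reuse the backing array, no fresh allocation
	for _, a := range t.peerAcks {
		t.ackBuf = append(t.ackBuf, a)
	}
	sort.Slice(t.ackBuf, func(i, j int) bool { return t.ackBuf[i] > t.ackBuf[j] })
	t.quorum = t.ackBuf[followerQuorum-1]
}

func main() {
	t := &quorumTracker{peerAcks: map[uint64]int64{2: 300, 3: 100, 4: 200}}
	t.recompute(2) // top two are 300 and 200; the oldest of those is 200
	fmt.Println(t.quorum) // 200
}
```

Avoiding the full sort (e.g. a selection of the Nth element) is a further option, but for typical cluster sizes the allocation, not the O(n log n) sort, dominates.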
| if postRemovalClusterSize > 1 { | ||
| followerQuorum = postRemovalClusterSize / 2 //nolint:mnd | ||
| } | ||
| e.ackTracker.removePeer(nodeID, followerQuorum) |
ackTracker is only recomputed on peer removal. When postRemovalClusterSize <= 1, followerQuorum is passed as 0, and quorumAckTracker.removePeer intentionally retains the previously published quorum instant. If the cluster later grows (or any config change increases the required follower quorum) before new follower responses arrive, LastQuorumAck can expose a stale non-zero instant and incorrectly allow lease-read fast paths under the new membership. Consider clearing/resetting the published quorum instant (or recomputing against the new followerQuorum) on membership changes that reduce quorum to 0 and on additions/increases to the follower quorum, so a new configuration cannot inherit a quorum-ack from the prior one.
| e.ackTracker.removePeer(nodeID, followerQuorum) | |
| ackTrackerQuorum := followerQuorum | |
| if ackTrackerQuorum == 0 { | |
| // Force the tracker to recompute/clear any previously published | |
| // quorum instant instead of retaining stale state from the old | |
| // configuration when the cluster shrinks to a single node. | |
| ackTrackerQuorum = 1 | |
| } | |
| e.ackTracker.removePeer(nodeID, ackTrackerQuorum) |
… sort buffer Addresses copilot review on PR #561. - engine.removePeer: when the cluster shrinks to a single node the prior code passed followerQuorum=0 to quorumAckTracker.removePeer, which by design preserves the currently published instant so the next recordAck can refresh it. But if the cluster subsequently grew back without a recordAck arriving first, LastQuorumAck's multi-node fallback would surface the stale instant from the previous configuration. Route shrink-to-<=1 through ackTracker.reset() so any future multi-node membership starts fresh. - quorumAckTracker.recomputeLocked: reuse a per-tracker ackBuf across calls instead of allocating a fresh []int64 on every recordAck. Heartbeat frequency × cluster size keeps the allocation rate non-trivial otherwise; re-slicing in place + append-on-growth brings the steady-state alloc to zero. - quorum_ack_test concurrency loops now call runtime.Gosched() between iterations so the test exercises interleavings without pegging a CI core.
| func (e *fakeLeaseEngine) LastQuorumAck() time.Time { | ||
| ns := e.lastQuorumAckUnixNano.Load() | ||
| if ns == 0 { | ||
| return time.Time{} | ||
| } | ||
| return time.Unix(0, ns) | ||
| } |
fakeLeaseEngine.LastQuorumAck() doesn't honor the raftengine.LeaseProvider contract that non-leaders return a zero time. As written it can return a non-zero ack even when e.State() is follower, which could let tests accidentally pass if production code ever stops checking state before using LastQuorumAck. Consider gating on e.State()==raftengine.StateLeader (or the atomic state value) and returning time.Time{} otherwise.
| // Followers needed for majority = floor(clusterSize / 2): 1 for a | ||
| // 3-node cluster, 2 for 5-node, matching raft quorum semantics. | ||
| followerQuorum := clusterSize / 2 //nolint:mnd | ||
| e.ackTracker.recordAck(msg.From, followerQuorum) |
New //nolint:mnd on the quorum calculation looks avoidable here. To keep the code self-explanatory without suppressing the linter, consider introducing a small helper (e.g., followerQuorumForClusterSize) or a named constant for the divisor, and reuse it in both recordQuorumAck and removePeer.
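The suggested helper is small; a sketch using the `followerQuorumForClusterSize` name proposed above:

```go
package main

import "fmt"

// followerQuorumForClusterSize centralises the floor(clusterSize/2)
// formula: followers needed for a raft majority once the leader counts
// itself -- 1 for a 3-node cluster, 2 for 5-node.
func followerQuorumForClusterSize(clusterSize int) int {
	return clusterSize / 2
}

func main() {
	for _, n := range []int{1, 3, 5, 7} {
		fmt.Println(n, followerQuorumForClusterSize(n))
	}
}
```

Documenting the formula once in the helper also lets any lint suppression live in a single place instead of at every call site.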
| if postRemovalClusterSize <= 1 { | ||
| e.ackTracker.reset() | ||
| } else { | ||
| e.ackTracker.removePeer(nodeID, postRemovalClusterSize/2) //nolint:mnd | ||
| } |
New //nolint:mnd on the quorum calculation looks avoidable here as well. Consider reusing the same named helper/constant used in recordQuorumAck so the quorum math is centralized and the linter suppression isn't needed.
…der state Addresses copilot review on PR #561. - followerQuorumForClusterSize helper centralises the cluster/2 formula so recordQuorumAck and removePeer share a single point of truth. The //nolint:mnd suppression now lives only in the helper where the formula is documented in context, rather than sprinkled at each call site. - fakeLeaseEngine.LastQuorumAck now gates on State() == StateLeader, matching the production etcd engine and the LeaseProvider contract ("zero time when the local node is not the leader"). Without this mirror, a test could set a fresh ack plus a non-leader state and still take the fast path, masking a regression if production code ever stopped gating on engine.State() before consulting LastQuorumAck.
| // Single time.Now() sample so primary/secondary/extension all see | ||
| // the same instant. Clock-skew safety delegated to | ||
| // engineLeaseAckValid (see Coordinate.LeaseRead). | ||
| now := time.Now() | ||
| state := engine.State() | ||
| if engineLeaseAckValid(state, lp.LastQuorumAck(), now, leaseDur) { |
now is taken before LastQuorumAck(). If the engine updates the quorum-ack timestamp between these two reads, ack.After(now) can become true and the primary fast path will be skipped even though the ack is fresh. Consider reading ack first and then sampling now (reused for secondary/extend) to avoid spurious slow-path reads.
| // Single time.Now() sample so primary/secondary/extension all see | |
| // the same instant. Clock-skew safety delegated to | |
| // engineLeaseAckValid (see Coordinate.LeaseRead). | |
| now := time.Now() | |
| state := engine.State() | |
| if engineLeaseAckValid(state, lp.LastQuorumAck(), now, leaseDur) { | |
| // Read the quorum ack before sampling time so a concurrent ack | |
| // update cannot appear to be in the future relative to now and | |
| // spuriously disable the primary fast path. Reuse the same now | |
| // sample for primary/secondary/extension checks. | |
| ack := lp.LastQuorumAck() | |
| now := time.Now() | |
| state := engine.State() | |
| if engineLeaseAckValid(state, ack, now, leaseDur) { |
| // Publish leader state atomically so recordQuorumAck / LastQuorumAck | ||
| // can gate on it without acquiring e.mu. MUST run before the | ||
| // single-node ack store below, otherwise a brand-new leader tick | ||
| // could publish a ack instant while isLeader is still false. |
Minor grammar nit in the comment: "a ack instant" should be "an ack instant".
| // could publish a ack instant while isLeader is still false. | |
| // could publish an ack instant while isLeader is still false. |
Resolve conflicts from #561 (engine-driven lease) landing: - internal/raftengine/etcd/engine.go: keep both accessor groups -- LastQuorumAck from #561 and the DispatchDropCount / DispatchErrorCount / StepQueueFullCount accessors this branch introduced for the monitoring DispatchCollector. - kv/coordinator.go: the secondary caller-side lease fast path now consults the locally-cached `state` from the engineLeaseAckValid call instead of re-reading engine.State(), and observes the hit via c.leaseObserver. Also wire the engine-driven primary fast path through the observer so the lease-hit ratio panel counts every fast-path hit regardless of which tier produced it. Extract a one-line observeLeaseRead helper to keep LeaseRead under the cyclomatic-complexity cap after the extra branch. - kv/sharded_coordinator.go: mirror the Coordinate.LeaseRead restructure for groupLeaseRead -- use the shared `state`, observe hits on both tiers, and share an observeLeaseRead package helper to keep the function body under the cyclomatic-complexity cap.
Summary
Fixes the root cause of the production lease-read regression: under step-queue congestion, `LinearizableRead` took ~1s, exceeding the 700ms `LeaseDuration`. The caller-side lease sampled `time.Now()` BEFORE the slow read and used it as the extend base, so every extend landed ~300ms in the past and the next read always missed the cache. The lease never warmed up; the optimisation was effectively dead under load.

Switch the authority from caller-driven to engine-driven.

- `raftengine.LeaseProvider`: add `LastQuorumAck() time.Time`.
- etcd engine: `quorumAckTracker` records per-peer last-response time from `MsgAppResp` / `MsgHeartbeatResp` in `handleStep`. `LastQuorumAck()` returns the oldest ack among the top-`followerQuorum` peers, which is the wall-clock instant by which majority liveness was confirmed. Reset on leader-loss transitions so a re-election cannot surface a stale anchor. Single-node clusters short-circuit to `time.Now()`.
- kv: `Coordinate.LeaseRead` / `groupLeaseRead` consult `LastQuorumAck` first, fall back to the existing caller-side lease (preserved as a secondary fast path during rollout), and finally to `LinearizableRead`. The engine refreshes its anchor on every heartbeat independent of read latency, so a slow `LinearizableRead` no longer leaves the cache cold.

Safety

We record `time.Now()` when the leader observes the follower response, which is an upper bound on the follower's true ack time. `leaseSafetyMargin` (300ms) covers the resulting overshoot (one-way delay + scheduling slop + clock drift), keeping `lease_expiry` strictly inside `follower_ack_time + electionTimeout`.

Test plan

- `TestQuorumAckTracker_*` covers single-node / majority threshold / oldest-of-top-N semantics / reset / concurrent access
- `go test -race ./kv/... ./internal/raftengine/...` passes
- `linearizable_read_calls` rate drops while GET p99 stays low

Summary by CodeRabbit