perf(lua): verify leader once at startTS, use snapshot reads thereafter #546
Conversation
Previously `leaderAwareGetAt` called `VerifyLeader()` on every `redis.call()` inside a Lua script (up to 3× per GET: prefixed key + legacy TTL index + bare key). With p50 = 7 calls and `VerifyLeader` at ~500 ms each, this caused 5-10 s of total latency, matching the observed Grafana data.

Changes:
- `newLuaScriptContext` calls `coordinator.VerifyLeader()` once and errors early if the node is not the leader
- `leaderAwareGetAt` refactored into `doGetAt(verify bool)`; `snapshotGetAt` added as the no-verify variant
- `readRedisStringAt` chain (`decodePrefixedString`, `readBareLegacyString`, `readLegacyTTL`) parameterised with `rawGetFn`; `readRedisStringAtSnapshot` added using `snapshotGetAt`
- Lua string reads use `readRedisStringAtSnapshot`, reducing `VerifyLeader` calls from O(redis.call count) to O(1) per script invocation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Code Review
This pull request optimizes Redis Lua script execution by performing a single leadership verification at the start of the script context, allowing subsequent reads to bypass per-call verification. Key changes include refactoring string read helpers to support snapshot reads and updating the Lua script context initialization to handle leadership checks. Feedback suggests using context.Context for the leadership verification to respect timeouts and ensuring the read timestamp is acquired after the linearizable read fence for consistency. Additionally, a variable shadowing issue was noted in the Lua script execution loop.
/gemini review
⚠️ Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
…ompat Covers the gap identified in the 5-perspective code review: `EVAL`, `EVALSHA`, and `execLuaCompat` all propagate `LinearizableRead` errors to the client as Redis error replies, and the happy path returns the expected result.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
adapter/redis_lua_context.go (1)
193-205: ⚠️ Potential issue | 🟡 Minor

Clarify why the `LinearizableRead` return value is discarded in favor of `server.readTS()`.

`LinearizableRead(ctx)` returns a Raft log index (as documented in `internal/raftengine/engine.go`), which serves as a fence confirming the FSM has applied all committed entries. However, the subsequent call to `server.readTS()` obtains the MVCC timestamp independently. While the current comment explains the reasoning, adding an explicit note that the Raft index is intentionally not used for the snapshot (since the MVCC timestamp is obtained separately after FSM application) would clarify the design for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapter/redis_lua_context.go` around lines 193 - 205, The comment should explicitly state that the Raft log index returned by coordinator.LinearizableRead(ctx) is intentionally discarded because it only serves as a fence to ensure the FSM has applied committed entries; the code then reads the MVCC snapshot timestamp via server.readTS() for snapshotGetAt. Update the comment inside newLuaScriptContext to mention LinearizableRead is used solely for leader/fence verification and that startTS is derived from server.readTS() (not from the returned Raft index) before constructing luaScriptContext with startTS and readPin.
🧹 Nitpick comments (2)
adapter/distribution_server_test.go (1)
747-749: Mirror the stub's follower behavior in `LinearizableRead`.

This stub already returns `kv.ErrLeaderNotFound` from `VerifyLeader` when `leader` is false; `LinearizableRead` should do the same so future leader-check paths do not accidentally pass in follower-mode tests.

Proposed test-stub fix

```diff
 func (s *distributionCoordinatorStub) LinearizableRead(_ context.Context) (uint64, error) {
+	if !s.leader {
+		return 0, kv.ErrLeaderNotFound
+	}
 	return 0, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapter/distribution_server_test.go` around lines 747 - 749, The LinearizableRead method on distributionCoordinatorStub currently always returns (0, nil) which bypasses follower-mode checks; update distributionCoordinatorStub.LinearizableRead to mirror VerifyLeader's behavior by returning kv.ErrLeaderNotFound when the stub's leader flag is false (same as VerifyLeader) and otherwise perform the normal successful return, ensuring follower-mode tests fail leader-only paths consistently.

adapter/redis_lua_linearizable_read_test.go (1)
24-96: Add a test verifying the one-time leader-read contract for multi-call scripts.

These tests verify error propagation but miss the core perf regression guard. A Lua script with multiple `redis.call()` operations should verify that `LinearizableRead` is called exactly once upfront and subsequent operations do not trigger per-call `VerifyLeaderForKey` checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@adapter/redis_lua_linearizable_read_test.go` around lines 24 - 96, Add a test (e.g., TestEval_MultiCall_OneLeaderRead) that constructs a lua test server via newLuaTestServer but with a spy/fixture that increments a counter each time the LinearizableRead/VerifyLeaderForKey path is invoked, then run a multi-call script (e.g., "redis.call('GET','k1'); redis.call('GET','k2'); return 1") by calling r.eval with a recordingConn and the EVAL args, and finally assert the script returns successfully (conn.err empty) and that the spy counter == 1 to ensure VerifyLeaderForKey/LinearizableRead was performed only once for the whole script; reference newLuaTestServer, recordingConn, r.eval and the VerifyLeaderForKey/LinearizableRead hook when implementing the spy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 033d9567-7318-4457-8d0a-993c410e6017
📒 Files selected for processing (11)
- adapter/distribution_server_test.go
- adapter/dynamodb_test.go
- adapter/redis_compat_helpers.go
- adapter/redis_info_test.go
- adapter/redis_keys_pattern_test.go
- adapter/redis_lua.go
- adapter/redis_lua_context.go
- adapter/redis_lua_linearizable_read_test.go
- adapter/redis_retry_test.go
- adapter/s3_test.go
- kv/coordinator.go
```go
// readRedisStringAtSnapshot reads a string without re-verifying leadership on
// every sub-call. The caller must have already called coordinator.VerifyLeader()
// once before invoking this (e.g. at Lua script startTS acquisition time).
```
Update the snapshot-read precondition comments.
The Lua path now gates snapshot reads with LinearizableRead(ctx), not coordinator.VerifyLeader(). Keeping the old wording makes the safety contract ambiguous for future callers.
📝 Proposed comment update
```diff
-// every sub-call. The caller must have already called coordinator.VerifyLeader()
-// once before invoking this (e.g. at Lua script startTS acquisition time).
+// every sub-call. The caller must have already completed a LinearizableRead
+// before invoking this (e.g. at Lua script startTS acquisition time).
 func (r *RedisServer) readRedisStringAtSnapshot(key []byte, readTS uint64) ([]byte, *time.Time, error) {
 	return r.readRedisStringWith(key, readTS, r.snapshotGetAt)
 }
```

```diff
-// The caller must have already called coordinator.VerifyLeader() once (e.g. at
-// Lua script startTS acquisition time) before using this method.
+// The caller must have already completed a LinearizableRead (e.g. at Lua script
+// startTS acquisition time) before using this method.
 func (r *RedisServer) snapshotGetAt(key []byte, readTS uint64) ([]byte, error) {
 	return r.doGetAt(key, readTS, false)
 }
```

Also applies to: 467-469
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@adapter/redis_compat_helpers.go` around lines 383 - 385, Update the
precondition comments to reflect that snapshot reads are gated by
LinearizableRead(ctx) instead of coordinator.VerifyLeader(); specifically change
the comment above readRedisStringAtSnapshot to say the caller must have already
performed a LinearizableRead(ctx) (e.g., at Lua script startTS acquisition) and
make the same wording change in the other similar comment block around the
second snapshot-read helper (the block at the other location referenced in the
review). Ensure the revised comments mention LinearizableRead(ctx) as the
required precondition and keep the example context (Lua script startTS
acquisition).