
perf(lua): verify leader once at startTS, use snapshot reads thereafter#546

Merged
bootjp merged 3 commits into main from feat/trace-redis on Apr 19, 2026


@bootjp (Owner) commented Apr 19, 2026

Previously, leaderAwareGetAt called VerifyLeader() on every redis.call() inside a Lua script, up to 3× per GET (prefixed key + legacy TTL index + bare key). With a p50 of 7 calls per script and roughly 500 ms per VerifyLeader() call, this produced 5-10 s of total latency, matching the observed Grafana data.

Changes:

  • newLuaScriptContext calls coordinator.VerifyLeader() once and errors early if the node is not the leader
  • leaderAwareGetAt refactored into doGetAt(verify bool); snapshotGetAt added as the no-verify variant
  • readRedisStringAt chain (decodePrefixedString, readBareLegacyString, readLegacyTTL) parameterised with rawGetFn; readRedisStringAtSnapshot added using snapshotGetAt
  • Lua string reads use readRedisStringAtSnapshot, reducing VerifyLeader calls from O(redis.call count) to O(1) per script invocation

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced Lua script execution with proper linearizable read coordination.
  • Bug Fixes

    • Improved error propagation when linearizable reads fail during Lua script execution (EVAL, EVALSHA, and compatibility paths).
  • Refactor

    • Refactored Redis string read operations to support pluggable read implementations.
    • Centralized leadership verification logic to prevent unnecessary re-verification during snapshot reads.

@coderabbitai commented Apr 19, 2026

Walkthrough

This change introduces a new LinearizableRead() method to the Coordinator interface in the KV package. It integrates this method into Lua script context initialization for leadership verification, updates test doubles to implement the new interface, and refactors Redis string read helpers to support pluggable GetAt implementations with snapshot-based reads.

Changes

  • Coordinator Interface (kv/coordinator.go): Extended the Coordinator interface with a new LinearizableRead(ctx context.Context) (uint64, error) method; the Coordinate type already implements it.
  • Test Stubs & Doubles (adapter/distribution_server_test.go, adapter/dynamodb_test.go, adapter/redis_info_test.go, adapter/redis_keys_pattern_test.go, adapter/redis_retry_test.go, adapter/s3_test.go): Added LinearizableRead() stub implementations returning either (0, nil) or (0, kv.ErrLeaderNotFound) so the test doubles satisfy the updated interface.
  • Lua Script Context (adapter/redis_lua_context.go, adapter/redis_lua.go): Integrated a LinearizableRead() call into Lua script context initialization; context construction now accepts a context.Context and returns an error when leadership verification fails; retry paths handle context-creation errors.
  • Redis String Read Helpers (adapter/redis_compat_helpers.go): Refactored string reads through a new readRedisStringWith() helper that accepts a pluggable rawGetFn; introduced readRedisStringAtSnapshot() and snapshotGetAt() for snapshot reads; centralized leadership verification in doGetAt() behind an optional verification flag; legacy read paths now reuse the provided get function.
  • Lua LinearizableRead Tests (adapter/redis_lua_linearizable_read_test.go): New test file validating that the Lua execution paths (EVAL, EVALSHA, ExecLuaCompat) propagate LinearizableRead failures and succeed when no error occurs; includes test helpers and sentinel error patterns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A new reader joins the Linux crew,
With snapshot calls and verification true—
Lua scripts now await their turn,
Ensuring leaders lead (and we all learn),
Where once there was one path, now two! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 57.14%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title directly describes the main performance optimization: verifying leader once at script start and using snapshot reads thereafter, which is the core change addressing repeated leader verification latency in Lua scripts.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request optimizes Redis Lua script execution by performing a single leadership verification at the start of the script context, allowing subsequent reads to bypass per-call verification. Key changes include refactoring string read helpers to support snapshot reads and updating the Lua script context initialization to handle leadership checks. Feedback suggests using context.Context for the leadership verification to respect timeouts and ensuring the read timestamp is acquired after the linearizable read fence for consistency. Additionally, a variable shadowing issue was noted in the Lua script execution loop.

Comment threads (outdated): adapter/redis_lua_context.go, adapter/redis_lua.go
@bootjp (Owner, Author) commented Apr 19, 2026

/gemini review

@gemini-code-assist (Contributor)
Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

…ompat

Covers the gap identified in the 5-perspective code review: EVAL, EVALSHA,
and execLuaCompat all propagate LinearizableRead errors to the client as
Redis error replies, and the happy path returns the expected result.
@coderabbitai left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
adapter/redis_lua_context.go (1)

193-205: ⚠️ Potential issue | 🟡 Minor

Clarify why the LinearizableRead return value is discarded in favor of server.readTS().

LinearizableRead(ctx) returns a Raft log index (as documented in internal/raftengine/engine.go), which serves as a fence confirming the FSM has applied all committed entries. However, the subsequent call to server.readTS() obtains the MVCC timestamp independently. While the current comment explains the reasoning, adding an explicit note that the Raft index is intentionally not used for the snapshot (since the MVCC timestamp is obtained separately after FSM application) would clarify the design for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/redis_lua_context.go` around lines 193 - 205, the comment should
explicitly state that the Raft log index returned by
coordinator.LinearizableRead(ctx) is intentionally discarded because it only
serves as a fence to ensure the FSM has applied committed entries; the code then
reads the MVCC snapshot timestamp via server.readTS() for snapshotGetAt. Update
the comment inside newLuaScriptContext to mention LinearizableRead is used
solely for leader/fence verification and that startTS is derived from
server.readTS() (not from the returned Raft index) before constructing
luaScriptContext with startTS and readPin.
🧹 Nitpick comments (2)
adapter/distribution_server_test.go (1)

747-749: Mirror the stub’s follower behavior in LinearizableRead.

This stub already returns kv.ErrLeaderNotFound from VerifyLeader when leader is false; LinearizableRead should do the same so future leader-check paths do not accidentally pass in follower-mode tests.

Proposed test-stub fix
 func (s *distributionCoordinatorStub) LinearizableRead(_ context.Context) (uint64, error) {
+	if !s.leader {
+		return 0, kv.ErrLeaderNotFound
+	}
 	return 0, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/distribution_server_test.go` around lines 747 - 749, the
LinearizableRead method on distributionCoordinatorStub currently always returns
(0, nil) which bypasses follower-mode checks; update
distributionCoordinatorStub.LinearizableRead to mirror VerifyLeader’s behavior
by returning kv.ErrLeaderNotFound when the stub's leader flag is false (same as
VerifyLeader) and otherwise perform the normal successful return, ensuring
follower-mode tests fail leader-only paths consistently.
adapter/redis_lua_linearizable_read_test.go (1)

24-96: Add a test verifying the one-time leader-read contract for multi-call scripts.

These tests verify error propagation but miss the core perf regression guard. A Lua script with multiple redis.call() operations should verify that LinearizableRead is called exactly once upfront and subsequent operations do not trigger per-call VerifyLeaderForKey checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/redis_lua_linearizable_read_test.go` around lines 24 - 96, add a test
(e.g., TestEval_MultiCall_OneLeaderRead) that constructs a lua test server via
newLuaTestServer but with a spy/fixture that increments a counter each time the
LinearizableRead/VerifyLeaderForKey path is invoked, then run a multi-call
script (e.g., "redis.call('GET','k1'); redis.call('GET','k2'); return 1") by
calling r.eval with a recordingConn and the EVAL args, and finally assert the
script returns successfully (conn.err empty) and that the spy counter == 1 to
ensure VerifyLeaderForKey/LinearizableRead was performed only once for the whole
script; reference newLuaTestServer, recordingConn, r.eval and the
VerifyLeaderForKey/LinearizableRead hook when implementing the spy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/redis_compat_helpers.go`:
- Around line 383-385: Update the precondition comments to reflect that snapshot
reads are gated by LinearizableRead(ctx) instead of coordinator.VerifyLeader();
specifically change the comment above readRedisStringAtSnapshot to say the
caller must have already performed a LinearizableRead(ctx) (e.g., at Lua script
startTS acquisition) and make the same wording change in the other similar
comment block around the second snapshot-read helper (the block at the other
location referenced in the review). Ensure the revised comments mention
LinearizableRead(ctx) as the required precondition and keep the example context
(Lua script startTS acquisition).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 033d9567-7318-4457-8d0a-993c410e6017

📥 Commits

Reviewing files that changed from the base of the PR and between b2ff126 and 1aa4f96.

📒 Files selected for processing (11)
  • adapter/distribution_server_test.go
  • adapter/dynamodb_test.go
  • adapter/redis_compat_helpers.go
  • adapter/redis_info_test.go
  • adapter/redis_keys_pattern_test.go
  • adapter/redis_lua.go
  • adapter/redis_lua_context.go
  • adapter/redis_lua_linearizable_read_test.go
  • adapter/redis_retry_test.go
  • adapter/s3_test.go
  • kv/coordinator.go

Comment on lines +383 to +385
// readRedisStringAtSnapshot reads a string without re-verifying leadership on
// every sub-call. The caller must have already called coordinator.VerifyLeader()
// once before invoking this (e.g. at Lua script startTS acquisition time).

⚠️ Potential issue | 🟡 Minor

Update the snapshot-read precondition comments.

The Lua path now gates snapshot reads with LinearizableRead(ctx), not coordinator.VerifyLeader(). Keeping the old wording makes the safety contract ambiguous for future callers.

📝 Proposed comment update
-// every sub-call. The caller must have already called coordinator.VerifyLeader()
-// once before invoking this (e.g. at Lua script startTS acquisition time).
+// every sub-call. The caller must have already completed a LinearizableRead
+// before invoking this (e.g. at Lua script startTS acquisition time).
 func (r *RedisServer) readRedisStringAtSnapshot(key []byte, readTS uint64) ([]byte, *time.Time, error) {
 	return r.readRedisStringWith(key, readTS, r.snapshotGetAt)
 }
-// The caller must have already called coordinator.VerifyLeader() once (e.g. at
-// Lua script startTS acquisition time) before using this method.
+// The caller must have already completed a LinearizableRead (e.g. at Lua script
+// startTS acquisition time) before using this method.
 func (r *RedisServer) snapshotGetAt(key []byte, readTS uint64) ([]byte, error) {
 	return r.doGetAt(key, readTS, false)
 }

Also applies to: 467-469


@bootjp merged commit 19f2ebb into main on Apr 19, 2026
8 checks passed
@bootjp deleted the feat/trace-redis branch on April 19, 2026 at 16:42