refactor(kv): plumb caller context through write + verify-leader paths#749

Merged
bootjp merged 3 commits into main from fix/verify-leader-context-plumbing on May 10, 2026

Conversation

@bootjp
Owner

@bootjp bootjp commented May 9, 2026

Summary

Follow-up to #745. PR #745 capped verifyLeaderEngine's no-context call site at 5s as an incident hotfix; this PR plumbs the caller's context.Context end-to-end through the kv write path and the verify-leader path, so callers that hold a request deadline see their own deadline honoured, not the 5s safety bound.

The 5s bound stays in place as defense-in-depth for callers without an upstream context (LockResolver, HLC lease tick).
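For illustration, the no-ctx fallback keeps roughly this shape (a minimal sketch assuming the names used in this PR; the engine parameter type is illustrative):

func verifyLeaderEngine(e raftEngine) error {
    // Background callers get the 5s safety bound; callers that hold a
    // deadline go through verifyLeaderEngineCtx directly.
    ctx, cancel := context.WithTimeout(context.Background(), verifyLeaderTimeout)
    defer cancel()
    return verifyLeaderEngineCtx(ctx, e)
}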

What changed

Interface signatures (kv-internal):

type Transactional interface {
    Commit(ctx context.Context, reqs []*pb.Request) (*TransactionResponse, error)
    Abort(ctx context.Context, reqs []*pb.Request) (*TransactionResponse, error)
}

type Coordinator interface {
    // …
    VerifyLeader(ctx context.Context) error
    VerifyLeaderForKey(ctx context.Context, key []byte) error
}

type LeaderProbe interface {
    IsVerifiedLeader(ctx context.Context) bool
}
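As a caller-side sketch (illustrative handler, not code from this PR):

func handleWrite(ctx context.Context, c Coordinator, key []byte) error {
    // A caller-held 2s budget now reaches the ReadIndex round-trip instead
    // of being replaced by the 5s safety bound.
    ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
    defer cancel()
    if err := c.VerifyLeaderForKey(ctx, key); err != nil {
        return err
    }
    // ... dispatch the write under the same ctx ...
    return nil
}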

Plumbing chains (every step now carries ctx):

Dispatch path:
  ShardedCoordinator.Dispatch(ctx)
    → dispatchTxn(ctx) / dispatchSingleShardTxn(ctx) /
      commitPrimaryTxn(ctx) / commitSecondaryTxns(ctx) /
      abortPreparedTxn(ctx) / broadcastToAllGroups(ctx) /
      dispatchDelPrefixBroadcast(ctx)
    → Transactional.Commit/Abort(ctx)
    → leaseRefreshingTxn / LeaderProxy / TransactionManager / ShardRouter
    → applyRequests(ctx) → proposer.Propose(ctx)
    → verifyLeaderEngineCtx(ctx)

Lock-resolver / shard-store path:
  LockResolver.resolveExpiredLock(ctx)
    → applyTxnResolution(ctx) / applyScanLockResolutions(ctx) /
      tryAbortExpiredPrimary(ctx)
    → g.Txn.Commit(ctx)

Healthz:
  S3 / SQS / DynamoDB /healthz/leader handlers
    → isVerifiedSQSLeader(ctx, …) / isVerifiedDynamoLeader(ctx, …) /
      S3Server.isVerifiedS3Leader(ctx) /
      DistributionServer.verifyCatalogLeader(ctx)
    → coordinator.VerifyLeader(ctx) / VerifyLeaderForKey(ctx, key)

Admin LeaderProbe:
  /admin/healthz/leader handler → LeaderProbe.IsVerifiedLeader(r.Context())
    → coordinate.VerifyLeader(ctx)
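A minimal sketch of that wiring (the route and the LeaderProbe interface are from this PR; the handler body is illustrative):

func leaderHealthz(probe LeaderProbe) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // r.Context() ends when the client disconnects or times out, so the
        // probe no longer waits out the 5s safety bound on the client's behalf.
        if !probe.IsVerifiedLeader(r.Context()) {
            http.Error(w, "not leader", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}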

Also fixes the inline context.Background() at kv/transaction.go:152 (proposer.Propose) — same shape as the original verifyLeaderEngine bug, on the propose path.

Behaviour

For callers with upstream deadlines:

| caller | before | after |
| --- | --- | --- |
| Redis BLPOP timeout=2s on slow ReadIndex | fails after 5s (verifyLeaderTimeout) | fails after 2s (caller's deadline) |
| Caddy active probe timeout=1s | fails after 5s | fails after 1s |
| gRPC Dispatch(ctx) with explicit deadline | dropped to 5s | honours ctx deadline |

For background callers without an upstream ctx (LockResolver loops, HLC lease ticks), behaviour is unchanged: the no-arg path still applies the 5s verifyLeaderTimeout safety bound.

Batched-commit caveat

TransactionManager.commitRaw merges many callers into a single propose via a separate goroutine, so no single ctx can bound the underlying applyRequests. The fix:

  • Wait site in commitRaw selects between item.done and ctx.Done() so per-caller cancellation works (caller exits early; the propose still completes; other waiters in the same batch get their results normally); see the sketch after this list.
  • Goroutine's applyRequests call uses context.Background by design, documented in the code.
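A minimal sketch of that wait-site shape (the item fields and error wrapping are assumptions, not the verbatim commitRaw code):

select {
case <-item.done:
    // The batch goroutine finished the shared propose; read this caller's result.
    return item.resp, item.err
case <-ctx.Done():
    // This caller stops waiting; the shared propose keeps running and the
    // other waiters in the batch still receive their results.
    return nil, errors.WithStack(ctx.Err())
}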

Self-review (5 lenses)

  1. Data loss — none. ctx is value-passed; no persisted state changes. verifyLeaderEngineCtx is the existing path; only the caller of the no-arg variant changes.
  2. Concurrency — ctx flows through the existing locking unchanged. commitRaw goroutine intentionally uses Background to avoid mixing per-caller deadlines into a shared batched propose; documented.
  3. Performance — same call count, no extra allocations beyond the ctx value passed by reference. context.WithTimeout wrapping in the no-ctx fallback is unchanged.
  4. Data consistency — verify is a freshness check, not a write path. Shorter caller deadlines surface ErrLeaderNotFound earlier instead of waiting out the 5s safety bound.
  5. Test coverage — every Transactional / Coordinator / LeaderProbe stub gains the ctx parameter (currently _ context.Context, ready for future tests asserting cancel propagation). 16 stubs / signatures updated; an illustrative stub shape is sketched below.
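For example, a stub now has roughly this shape (illustrative; the real stubs listed in the commit message vary):

type stubTransactional struct {
    resp *TransactionResponse
}

// ctx is accepted but deliberately unused for now, leaving room for
// future tests that assert cancel propagation.
func (s *stubTransactional) Commit(_ context.Context, _ []*pb.Request) (*TransactionResponse, error) {
    return s.resp, nil
}

func (s *stubTransactional) Abort(_ context.Context, _ []*pb.Request) (*TransactionResponse, error) {
    return s.resp, nil
}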

Test plan

  • go test -race -count=1 -short ./kv ./adapter ./internal/admin — all green
    • kv: 6.6s
    • adapter: 565s (long Jepsen-shaped tests)
    • internal/admin: 1.4s
  • No semantic change beyond the ctx propagation; all existing tests pass without modification of their assertions.

Follow-up scope (not in this PR)

The audit also surfaced ~210 context.Background() usages in tree, mostly concentrated in:

| file | count | nature |
| --- | --- | --- |
| adapter/redis_compat_commands.go | 66 | dispatch helpers, internal commands |
| adapter/redis_lua_context.go | 25 | Lua script context loaders |
| adapter/redis.go | 25 | handler context fallback paths |
| proxy/* | 28 | proxy package, separate concern |

These are "easy win" candidates where a ctx is in scope but Background() was used instead. Not bundled here to keep the PR reviewable; a follow-up "audit pass" PR targeting Redis adapter helpers is the natural next step.
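Illustrative shape of one such easy win (hypothetical call site; the real helpers vary):

// Before: a per-command ctx is in scope but a fresh Background ctx is used.
resp, err := r.coordinator.Dispatch(context.Background(), reqs)

// After: the in-scope ctx carries the command's deadline downstream.
resp, err := r.coordinator.Dispatch(ctx, reqs)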

The single context.TODO() (kv/fsm.go) is structural — raft Apply() does not provide a ctx — and is left as is.

Design doc

docs/design/2026_05_10_proposed_kv_ctx_plumbing.md — included in this PR.

Summary by CodeRabbit

  • Refactor

    • Updated internal APIs to propagate request context through leader verification and transaction operations, enabling request deadlines to be respected instead of using fixed internal timeouts.
  • Documentation

    • Added design document outlining context plumbing through KV write paths and leader-probe verification across adapters.

Review Change Stack

Follow-up to #745 (verifyLeaderEngine 5s deadline incident hotfix).
PR #745 capped the no-context verify path at 5s as defense-in-depth;
this PR plumbs the caller's context.Context end-to-end through:

  Dispatch path:
    ShardedCoordinator.Dispatch(ctx) → dispatchTxn(ctx) →
      dispatchSingleShardTxn(ctx) /
      commitPrimaryTxn(ctx) /
      commitSecondaryTxns(ctx) /
      abortPreparedTxn(ctx) /
      broadcastToAllGroups(ctx) /
      dispatchDelPrefixBroadcast(ctx)
    → Transactional.Commit/Abort(ctx, reqs)
    → leaseRefreshingTxn / LeaderProxy / TransactionManager / ShardRouter
    → applyRequests(ctx) → proposer.Propose(ctx)
    → verifyLeaderEngineCtx(ctx)

  Lock-resolver / shard-store path:
    LockResolver.resolveExpiredLock(ctx) → applyTxnResolution(ctx) /
      ShardStore.applyScanLockResolutions(ctx) /
      ShardStore.tryAbortExpiredPrimary(ctx)
    → g.Txn.Commit(ctx)

  Leader-probe / healthz path:
    Coordinator.VerifyLeader(ctx) /
    Coordinator.VerifyLeaderForKey(ctx, key)
    LeaderProbe.IsVerifiedLeader(ctx)
    adapter S3/SQS/DynamoDB /healthz/leader handlers feed r.Context()

  Adapter healthz helpers:
    isVerifiedSQSLeader(ctx, coordinator)
    isVerifiedDynamoLeader(ctx, coordinator)
    S3Server.isVerifiedS3Leader(ctx)
    DistributionServer.verifyCatalogLeader(ctx)

  Admin LeaderProbe wiring:
    main_admin.go newAdminLeaderProbe → coordinate.VerifyLeader(ctx)

After: a Redis BLPOP timeout=2s whose dispatch lands on a slow
ReadIndex now fails after 2s (its own deadline), not after the 5s
verifyLeaderTimeout safety bound. A Caddy active health probe with a
1s budget likewise fails after 1s. The 5s bound stays in place as
defense-in-depth for the no-arg verifyLeaderEngine() — still hit by
LockResolver background loops, HLC lease ticks, and any future
internal caller that genuinely cannot inherit a deadline.

Also fixes proposer.Propose at kv/transaction.go:152 which used
context.Background() inline; same shape as the original
verifyLeaderEngine bug, just on the propose path. Now plumbed.

Batched commit caveat: TransactionManager.commitRaw merges many
callers into a single propose via a separate goroutine, so no
single ctx can bound the underlying applyRequests. The wait site
in commitRaw selects between item.done and ctx.Done so per-caller
cancellation still works (the propose continues, the abandoned
caller exits early); the goroutine's applyRequests call uses
context.Background by design and is documented as such.

Test stub updates (3 ripples through the Transactional interface,
6 ripples through Coordinator, 2 through LeaderProbe):
  kv/coordinator_txn_test.go::stubTransactional
  kv/coordinator_retry_test.go::scriptedTransactional
  kv/sharded_coordinator_txn_test.go::recordingTransactional
  kv/sharded_coordinator_abort_test.go::failingTransactional
  kv/sharded_lease_test.go::fixedTransactional
  kv/shard_router_test.go::fakeTM
  kv/shard_router_partition_test.go::fakeTxn
  kv/leader_routed_store_test.go::stubLeaderCoordinator
  adapter/redis_retry_test.go::retryOnceCoordinator
  adapter/redis_info_test.go::infoTestCoordinator
  adapter/redis_hello_test.go::helloTestCoordinator
  adapter/redis_keys_pattern_test.go::stubAdapterCoordinator
  adapter/distribution_server_test.go::distributionCoordinatorStub
  adapter/dynamodb_test.go::testCoordinatorWrapper
  adapter/s3_test.go::routeAwareS3Coordinator + followerS3Coordinator
  internal/admin/router_test.go::LeaderProbeFunc literal forms

Self-review (5 lenses):
1. Data loss — none. ctx is value-passed; no persisted state changes.
   verifyLeaderEngineCtx already existed and is what we now route to;
   only the caller of the no-arg variant changes (background loops
   only).
2. Concurrency — ctx flows through the existing locking unchanged.
   commitRaw goroutine intentionally uses Background to avoid mixing
   per-caller deadlines into a shared batched propose; documented.
3. Performance — same call count, no extra allocations beyond the
   ctx value passed by reference. context.WithTimeout wrapping in the
   no-ctx fallback is unchanged.
4. Data consistency — verify is a freshness check, not a write path.
   Shorter caller deadlines surface ErrLeaderNotFound earlier instead
   of waiting out the 5s safety bound.
5. Test coverage — every Transactional / Coordinator / LeaderProbe
   stub gains the ctx parameter (currently `_ context.Context`,
   ready for future tests asserting cancel propagation).

Test:
  go test -race -count=1 -short ./kv ./adapter ./internal/admin
  -- 9.6m total, all green (adapter is the long pole on Jepsen-shaped
  tests; kv 6.6s, admin 1.4s).

Design doc: docs/design/2026_05_10_proposed_kv_ctx_plumbing.md.
@coderabbitai

coderabbitai Bot commented May 9, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a40f421-3ce9-4c3f-98ba-b6f6d3f24f01

📥 Commits

Reviewing files that changed from the base of the PR and between d11258a and 13b8707.

📒 Files selected for processing (9)
  • adapter/dynamodb_test.go
  • adapter/redis.go
  • adapter/redis_compat_commands.go
  • adapter/redis_hello_test.go
  • adapter/redis_info_test.go
  • docs/design/2026_05_10_proposed_kv_ctx_plumbing.md
  • kv/raft_engine.go
  • kv/sharded_coordinator.go
  • kv/transaction.go
📝 Walkthrough

This PR threads context.Context through KV transaction commit/abort operations, leader verification checks, and admin leader-probe interfaces, ensuring request deadlines are respected instead of blocking on internal timeouts. Adapters, test stubs, and resolution paths are updated to propagate the context consistently.

Changes

Context Plumbing for KV Write and Leader Verification

| Layer | File(s) | Summary |
| --- | --- | --- |
| Interface Contracts | kv/transaction.go, kv/coordinator.go, internal/admin/router.go | Transactional.Commit/Abort, Coordinator.VerifyLeader/VerifyLeaderForKey, and LeaderProbe.IsVerifiedLeader now accept context.Context as the first parameter. |
| Transaction Manager | kv/transaction.go | TransactionManager.Commit and Abort propagate context through applyRequests to proposer.Propose; callers can abandon waiting for batched results via context cancellation while the batched propose continues in background. |
| Coordinator Dispatch | kv/coordinator.go, kv/sharded_coordinator.go | dispatchOnce and dispatch paths route context through dispatchTxn/dispatchRaw to transactionManager.Commit; VerifyLeader/VerifyLeaderForKey accept context; multi-shard PREPARE/COMMIT, primary/secondary commit, and abort paths all propagate context. |
| Leader Proxy & Routing | kv/leader_proxy.go, kv/shard_router.go, kv/leader_routed_store.go | LeaderProxy.Commit/Abort and ShardRouter.Commit/Abort accept context; leader-proxy retry budget derives its parent context from the caller context instead of context.Background(); leader-routed store passes context to VerifyLeaderForKey. |
| Admin Leader Probe | internal/admin/router.go, main_admin.go | LeaderProbe.IsVerifiedLeader becomes context-aware; /admin/healthz/leader handler passes r.Context() to the probe; newAdminLeaderProbe wires coordinator verification with request context. |
| Adapter Leader Verification | adapter/redis.go, adapter/redis_compat_commands.go, adapter/redis_compat_helpers.go, adapter/s3.go, adapter/s3_admin.go, adapter/sqs.go, adapter/sqs_admin.go, adapter/dynamodb.go, adapter/dynamodb_admin.go, adapter/distribution_server.go | Redis handler context and S3/SQS/DynamoDB/Distribution request contexts are passed into leader verification calls (VerifyLeader(ctx) or helper functions like isVerifiedS3Leader(ctx)). |
| Shard Transaction Resolution | kv/shard_store.go, kv/lock_resolver.go | Transaction and lock resolution helpers (applyTxnResolution, resolveScanLocks, resolveExpiredLock) updated to accept and propagate context through transaction commits. |
| Test Infrastructure | adapter/*_test.go, kv/*_test.go, internal/admin/router_test.go | All test stubs (distributionCoordinatorStub, testCoordinatorWrapper, stubLeaderCoordinator, fakeTxn, fakeTM, recordingTransactional, fixedTransactional, and S3 test doubles) updated to accept context; test callsites pass context.Background() where context was previously unused. |
| Admin Router Tests | internal/admin/router_test.go | All leader-probe callbacks updated to accept a context.Context parameter; test wiring for the /admin/healthz/leader endpoint and HTTP method handling updated accordingly. |
| Design Document | docs/design/2026_05_10_proposed_kv_ctx_plumbing.md | Describes the problem (blocking on internal timeouts instead of respecting request deadlines), goals (thread context through KV and leader paths while preserving the 5s no-ctx bound), implementation surface, and behavioral effects. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • bootjp/elastickv#618: Modifies coordinator dispatch and leader-proxy paths, overlapping with context propagation changes in this PR.
  • bootjp/elastickv#745: Bounds the no-context verifyLeaderEngine path with a 5s timeout, complementing this PR's context-aware leader verification.
  • bootjp/elastickv#539: Modifies S3 leader health check and leader verification logic that this PR updates to be context-aware.

Poem

🐰 Through the KV paths we hop and bound,
With context trails that flow around,
No more timeouts blocking the way,
Request deadlines lead the day!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.44%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: plumbing caller context through write and verify-leader paths in the KV module, which is the core objective of this refactoring PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request plumbs context.Context through the KV write and leadership verification paths, allowing request-specific deadlines to govern operations and preventing resource exhaustion during stalls. The changes update the Transactional, Coordinator, and LeaderProbe interfaces to accept context parameters across the system's adapters and internal managers. Feedback focuses on ensuring that transaction cleanup (Abort) operations use detached contexts to prevent orphaned locks when a caller's context has already expired. Additionally, it is recommended to enforce a default safety timeout within the VerifyLeader methods when the provided context lacks a deadline and to explicitly document the blocking nature of these interface methods.

Comment thread kv/transaction.go Outdated
 // abort cleanup on failure.
 if needsTxnCleanup(reqs) {
-	_, _err := t.Abort(reqs)
+	_, _err := t.Abort(ctx, reqs)

medium

The Abort call here uses the caller's context, which may already be cancelled or expired (causing the previous applyRequests to fail). If the context is cancelled, the Abort proposal will fail immediately, leaving transaction locks held until the lock resolver eventually cleans them up. Consider using a detached context with a short timeout for cleanup operations to ensure they have a chance to complete, aligning with the practice of using context.Context for managing deadlines.

References
  1. When designing interfaces, use context.Context for managing deadlines and cancellation.

Comment thread kv/sharded_coordinator.go Outdated
 primaryGid, maxIndex, err := c.commitPrimaryTxn(ctx, startTS, primaryKey, grouped, commitTS)
 if err != nil {
-	c.abortPreparedTxn(startTS, primaryKey, prepared, abortTSFrom(startTS, commitTS))
+	c.abortPreparedTxn(ctx, startTS, primaryKey, prepared, abortTSFrom(startTS, commitTS))

medium

Similar to the sequential commit path, abortPreparedTxn here uses the potentially cancelled caller context. This may prevent the abort from being successfully proposed to the shards, leading to prolonged lock contention. Using a detached context for cleanup is recommended to ensure the operation has a managed deadline via the context.

References
  1. When designing interfaces, use context.Context for managing deadlines and cancellation.

Comment thread kv/coordinator.go
-func (c *Coordinate) VerifyLeader() error {
-	return verifyLeaderEngine(c.engine)
+func (c *Coordinate) VerifyLeader(ctx context.Context) error {
+	return verifyLeaderEngineCtx(ctx, c.engine)

medium

To maintain the 'defense-in-depth' safety bound mentioned in the PR description, VerifyLeader should ideally enforce a default timeout (e.g., 5s) if the provided context does not already have a deadline. This prevents background callers that might pass context.Background() from blocking indefinitely if a ReadIndex round-trip stalls. Additionally, ensure this blocking semantic is explicitly stated in the documentation to avoid ambiguity.

References
  1. When designing interfaces, use context.Context for managing deadlines and cancellation.
  2. When an interface method is intended to block until a certain condition is met, explicitly state this blocking semantic in the documentation.

Comment thread kv/sharded_coordinator.go
 	return errors.WithStack(ErrLeaderNotFound)
 }
-	return verifyLeaderEngine(engineForGroup(g))
+	return verifyLeaderEngineCtx(ctx, engineForGroup(g))

medium

As with the single-shard coordinator, VerifyLeader should enforce a default timeout when the input context lacks a deadline to ensure the 5s safety bound is preserved for all callers. Furthermore, the blocking nature of this method should be explicitly documented in the interface description.

References
  1. When designing interfaces, use context.Context for managing deadlines and cancellation.
  2. When an interface method is intended to block until a certain condition is met, explicitly state this blocking semantic in the documentation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d11258a668


Comment thread adapter/redis.go Outdated

 if r.coordinator.IsLeader() {
-	if err := r.coordinator.VerifyLeader(); err != nil {
+	if err := r.coordinator.VerifyLeader(r.handlerContext()); err != nil {

P1: Reinstate bounded context for Redis KEYS leader checks

This call switched from VerifyLeader() (which used the 5s verifyLeaderEngine guard) to VerifyLeader(r.handlerContext()), but handlerContext() is rooted in context.WithCancel(context.Background()) and typically has no deadline. When ReadIndex/VerifyLeader stalls, KEYS can now block indefinitely instead of failing fast, tying up the command handler until client disconnect or server shutdown and reintroducing the unbounded wait behavior the 5s guard was meant to prevent.


Comment thread adapter/redis_compat_commands.go Outdated

 if err := r.retryRedisWrite(ctx, func() error {
-	if err := r.coordinator.VerifyLeader(); err != nil {
+	if err := r.coordinator.VerifyLeader(r.handlerContext()); err != nil {

P2: Pass FLUSHDB timeout context into VerifyLeader

flushDatabase creates a redisDispatchTimeout-bounded context and uses it for retries, but leader verification now uses r.handlerContext() instead of that ctx. If VerifyLeader blocks, the command can run past the intended dispatch timeout because cancellation from ctx no longer reaches the verify call. Using the same ctx here keeps FLUSHDB bounded as designed.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kv/transaction.go (1)

237-246: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a non-canceled context for transactional cleanup on commit failure.

At Line 243, Abort reuses the caller ctx. If the original failure is context deadline exceeded/canceled, cleanup can be skipped, leaving intents behind and prolonging lock contention.

💡 Proposed fix
 	if err != nil {
 		// Only attempt transactional cleanup for transactional batches. Raw request
 		// batches may partially succeed across shards by design. One-phase
 		// transactional requests do not leave intents behind, so they do not need
 		// abort cleanup on failure.
 		if needsTxnCleanup(reqs) {
-			_, _err := t.Abort(ctx, reqs)
+			cleanupCtx := ctx
+			var cancel context.CancelFunc
+			if cleanupCtx == nil || cleanupCtx.Err() != nil {
+				cleanupCtx, cancel = context.WithTimeout(context.Background(), 5*time.Second)
+				defer cancel()
+			}
+			_, _err := t.Abort(cleanupCtx, reqs)
 			if _err != nil {
 				return nil, errors.WithStack(errors.CombineErrors(err, _err))
 			}
 		}
 		return nil, errors.WithStack(err)
 	}

As per coding guidelines, kv/transaction.go: Ensure cross-shard transaction atomicity is maintained and OCC commit-ts ordering is correct.

🧹 Nitpick comments (1)
kv/transaction_batch_test.go (1)

84-154: ⚡ Quick win

Add a regression test for canceled-context commit behavior.

The API is now context-aware, but this suite doesn’t yet pin cancellation/deadline behavior (especially the batched raw-commit wait semantics). Adding one focused case would prevent silent contract drift.

Inline comments (threads collapsed below):

In adapter/dynamodb_test.go (lines 1834-1836): testCoordinatorWrapper.VerifyLeader discards the caller ctx and uses context.Background(); forward the passed ctx to w.inner.VerifyLeader(ctx) (matching how VerifyLeaderForKey forwards context) so caller deadlines and cancellation propagate through the wrapper to the inner coordinator.

In docs/design/2026_05_10_proposed_kv_ctx_plumbing.md (lines 90-98): the doc uses the pre-merge method names VerifyLeaderCtx and VerifyLeaderForKeyCtx; update all references to the merged API names VerifyLeader(ctx context.Context) error and VerifyLeaderForKey(ctx context.Context, key []byte) error (any no-arg wrappers remain the no-ctx entry points that call these with context.Background()) so tests and follow-up work are not misdirected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d54ecd57-dc97-4a30-a539-2ad77b01ed1b

📥 Commits

Reviewing files that changed from the base of the PR and between 07ff07d and d11258a.

📒 Files selected for processing (45)
  • adapter/distribution_server.go
  • adapter/distribution_server_test.go
  • adapter/dynamodb.go
  • adapter/dynamodb_admin.go
  • adapter/dynamodb_test.go
  • adapter/internal.go
  • adapter/redis.go
  • adapter/redis_compat_commands.go
  • adapter/redis_compat_helpers.go
  • adapter/redis_hello_test.go
  • adapter/redis_info_test.go
  • adapter/redis_keys_pattern_test.go
  • adapter/redis_retry_test.go
  • adapter/s3.go
  • adapter/s3_admin.go
  • adapter/s3_test.go
  • adapter/sqs.go
  • adapter/sqs_admin.go
  • docs/design/2026_05_10_proposed_kv_ctx_plumbing.md
  • internal/admin/router.go
  • internal/admin/router_test.go
  • kv/coordinator.go
  • kv/coordinator_leader_test.go
  • kv/coordinator_retry_test.go
  • kv/coordinator_txn_test.go
  • kv/leader_proxy.go
  • kv/leader_proxy_test.go
  • kv/leader_routed_store.go
  • kv/leader_routed_store_test.go
  • kv/lock_resolver.go
  • kv/lock_resolver_test.go
  • kv/shard_router.go
  • kv/shard_router_partition_test.go
  • kv/shard_router_test.go
  • kv/shard_store.go
  • kv/shard_store_txn_lock_test.go
  • kv/sharded_coordinator.go
  • kv/sharded_coordinator_abort_test.go
  • kv/sharded_coordinator_leader_test.go
  • kv/sharded_coordinator_sampler_test.go
  • kv/sharded_coordinator_txn_test.go
  • kv/sharded_lease_test.go
  • kv/transaction.go
  • kv/transaction_batch_test.go
  • main_admin.go

Comment thread adapter/dynamodb_test.go Outdated
Comment thread docs/design/2026_05_10_proposed_kv_ctx_plumbing.md Outdated
… bounded redis (PR #749 r1)

Round-1 review on commit d11258a:

P1 (Codex): adapter/redis.go::keys() switched from VerifyLeader()
(5 s verifyLeaderEngine guard) to VerifyLeader(r.handlerContext()), but
handlerContext is the server's long-lived baseCtx with no deadline.
A stalled ReadIndex on KEYS could now hang the command handler
indefinitely. Fixed at TWO layers:

  1. verifyLeaderEngineCtx (kv/raft_engine.go) now applies a default
     verifyLeaderTimeout when the inbound ctx has no deadline. This
     covers handlerContext, future Background-passing callers, and
     anything the audit missed. Callers with a tighter deadline keep
     theirs because context.WithTimeout returns the earlier of the
     two expirations (see the sketch after this list).

  2. keys() now wraps r.handlerContext() with redisDispatchTimeout
     (matching every other Redis dispatch path) so its overall
     command budget is the per-call timeout, not 5 s.
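
  A minimal sketch of that guard's shape (assumed, not the verbatim
  kv/raft_engine.go code; the engine parameter and the readIndex helper
  are illustrative):

    func verifyLeaderEngineCtx(ctx context.Context, e raftEngine) error {
        if _, ok := ctx.Deadline(); !ok {
            // Deadline-free callers (handlerContext, context.Background)
            // fall back to the 5s verifyLeaderTimeout bound; callers that
            // already carry a deadline keep their own budget.
            var cancel context.CancelFunc
            ctx, cancel = context.WithTimeout(ctx, verifyLeaderTimeout)
            defer cancel()
        }
        return readIndexUnderCtx(ctx, e) // ReadIndex round-trip under ctx
    }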

P2 (Codex): adapter/redis_compat_commands.go::flushDatabase had a
redisDispatchTimeout-bounded ctx for retries but called
VerifyLeader(r.handlerContext()) — leader verification escaped the
flush command's deadline. Now uses the per-call ctx so the entire
FLUSHDB path is bounded as designed.

Major (CodeRabbit): adapter/dynamodb_test.go::testCoordinatorWrapper
swallowed the caller ctx and substituted context.Background(),
silently undoing this PR's deadline propagation for any test
routing through the wrapper. Now forwards ctx unchanged.

Medium (gemini): commitSequential's Abort cleanup and dispatchTxn's
abortPreparedTxn cleanup both used the (likely already-cancelled)
caller ctx, risking a no-op cleanup that leaves intents until
LockResolver picks them up at a future tick. Both paths now use
context.WithTimeout(context.WithoutCancel(ctx), verifyLeaderTimeout)
so cleanup MUST go through, bounded at 5 s. Same pattern monitoring/*
uses for shutdown ctx.

Medium (gemini): VerifyLeader should enforce a default timeout when
the input ctx has no deadline — covered by the verifyLeaderEngineCtx
guard described above. The Coordinator.VerifyLeader / VerifyLeaderForKey
methods themselves stay clean (just forward ctx); the deadline
discipline is centralized in the helper.

Minor (CodeRabbit): docs/design/2026_05_10_proposed_kv_ctx_plumbing.md
described pre-merge VerifyLeaderCtx / VerifyLeaderForKeyCtx names.
Updated to match the merged surface (VerifyLeader(ctx) /
VerifyLeaderForKey(ctx, key)) and the new internal-guard mechanism.

reviewdog gci: adapter/redis_hello_test.go and redis_info_test.go
import groups merged into the project's single non-stdlib group via
golangci-lint fmt.

Caller audit (semantic change in verifyLeaderEngineCtx — adding
default-deadline wrap):
- All in-tree callers: kv/coordinator.go::VerifyLeader,
  kv/sharded_coordinator.go::VerifyLeader/ForKey,
  kv/leader_proxy.go::Commit/Abort.
- Behavior for callers that already pass a bounded ctx: unchanged.
- Behavior for callers passing Background / long-lived ctx
  (handlerContext, future audits): now bounded at 5 s — strict
  improvement, matches the pre-PR-749 default behaviour callers
  expected from the no-arg verifyLeaderEngine wrapper.
- No caller depends on "wait forever" semantics; the Redis KEYS
  call site that triggered Codex P1 is the canonical example.

Test:
  go test -race -count=1 -short ./kv ./internal/admin
  -- 7.7s, all green.
@bootjp
Owner Author

bootjp commented May 9, 2026

@claude review

@claude

claude Bot commented May 9, 2026

Claude finished @bootjp's task in 4m 1s


PR Review: refactor(kv): plumb caller context through write + verify-leader paths

Tasks:

  • Read PR context and understand the changes
  • Analyze key changed files
  • Check reviewer comments for validity
  • Review context propagation correctness
  • Check for missed edge cases
  • Post final review

Summary

The core goal of this PR — threading the caller's context.Context end-to-end through the kv write and leader-verify paths so request deadlines are honoured — is sound, well-scoped, and the batched-commit design (Background ctx in the batch goroutine, per-caller wait via select) is a correct solution to the inherent tension between batching and per-caller deadline propagation.

Two issues need addressing before merge: one CI blocker and one correctness gap.


Issues

1. CI Blocker — gci lint failures in two test files

The reviewdog bot flagged both adapter/redis_hello_test.go and adapter/redis_info_test.go for gci import formatting. These will block CI until fixed. Run:

make lint  # or: gci write adapter/redis_hello_test.go adapter/redis_info_test.go

The diff shows the files changed from aligned single-line methods to padded alignment for the wider VerifyLeader(context.Context) error signatures, which likely touched the import block during auto-formatting. Easiest fix is to re-run gci write on those two files.

2. Correctness gap — prewriteTxn inline abort paths still use caller ctx

The PR correctly applies context.WithoutCancel at both cleanup sites in dispatchTxn (for commitPrimaryTxn failure) and commitSequential (for t.Abort). However, the two inline abort calls inside prewriteTxn (kv/sharded_coordinator.go:494–495 and kv/sharded_coordinator.go:504–505) still pass the caller's ctx to abortPreparedTxn:

// sharded_coordinator.go:494
if _, err := g.Txn.Commit(ctx, []*pb.Request{req}); err != nil {
    c.abortPreparedTxn(ctx, startTS, primaryKey, prepared, abortTSFrom(startTS, commitTS))
    return nil, errors.WithStack(err)
}

// sharded_coordinator.go:504
if err := c.validateReadOnlyShards(ctx, groupedReadKeys, gids, startTS); err != nil {
    c.abortPreparedTxn(ctx, startTS, primaryKey, prepared, abortTSFrom(startTS, commitTS))
    return nil, err
}

When ctx expires (e.g., Redis client disconnects mid-prewrite), both the shard commit and the subsequent abortPreparedTxn will fail immediately with context.Canceled, leaving all already-prepared intents behind. These then linger until LockResolver picks them up. The defense-in-depth in verifyLeaderEngineCtx does not help here: it only applies the 5 s fallback when there is no deadline, but a cancelled context's cancellation propagates immediately regardless.

The same pattern as the dispatchTxn fix should be applied:

// Suggested fix — same shape as dispatchTxn's cleanup
if _, err := g.Txn.Commit(ctx, []*pb.Request{req}); err != nil {
    cleanupCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), verifyLeaderTimeout)
    c.abortPreparedTxn(cleanupCtx, startTS, primaryKey, prepared, abortTSFrom(startTS, commitTS))
    cancel()
    return nil, errors.WithStack(err)
}

// and:
if err := c.validateReadOnlyShards(ctx, groupedReadKeys, gids, startTS); err != nil {
    cleanupCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), verifyLeaderTimeout)
    c.abortPreparedTxn(cleanupCtx, startTS, primaryKey, prepared, abortTSFrom(startTS, commitTS))
    cancel()
    return nil, err
}

Observations (non-blocking)

rangeList and readValueAt use handlerContext() without a deadline bound (adapter/redis.go:3264, adapter/redis.go:3520). These are protected by the verifyLeaderEngineCtx defense-in-depth (the 5 s fallback applies when no deadline is set), but they're inconsistent with the keys fix (which explicitly bounds with redisDispatchTimeout). Low severity, could be cleaned up in the follow-up Redis adapter audit pass.

Reviewer comments addressed in the latest commit:

  • CodeRabbit's testCoordinatorWrapper.VerifyLeader drop-ctx bug (adapter/dynamodb_test.go:1834) — ✅ ctx is now correctly forwarded in the current diff
  • Design doc method names (VerifyLeaderCtx vs VerifyLeader) — ✅ doc already uses VerifyLeader(ctx context.Context) in the current state
  • Gemini's "enforce default timeout inside VerifyLeader" — ✅ addressed by the verifyLeaderEngineCtx defense-in-depth (applies 5 s when ctx has no deadline)
  • Codex P1 (KEYS unbounded block) — ✅ fixed with the explicit redisDispatchTimeout-bounded ctx in keys()
  • Codex P2 (FLUSHDB using handlerContext() instead of dispatch ctx) — ✅ fixed to use the retryRedisWrite ctx

The commitRaw batched-commit design is correct. Using context.Background() in the batch goroutine while exiting early via select { case <-ctx.Done() } at the wait site is the right model. Other waiters in the same batch are unaffected by one caller's cancellation.

forwardWithRetry parentCtx change (from context.Background() to callerCtx) is correct: context.WithDeadline(callerCtx, deadline) correctly picks the earlier of the two expirations, so a Redis client deadline shorter than leaderProxyRetryBudget now exits early.
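A small standard-library illustration of why that holds (generic example, not repo code):

parent, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
// WithDeadline never extends the parent: the child expires at the earlier
// of the parent's deadline and the requested one.
child, cancel2 := context.WithDeadline(parent, time.Now().Add(10*time.Second))
defer cancel2()
deadline, _ := child.Deadline() // roughly now+2s: the tighter caller deadline wins
_ = deadline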


Bottom line

Two required fixes:

  1. Run gci write on the two flagged test files to clear the CI lint blocker
  2. Apply context.WithoutCancel to the two abortPreparedTxn calls inside prewriteTxn so cancelled-ctx cases don't leave stale intents

Everything else is clean. The architecture of the change is correct and the PR description accurately describes what shipped.

Round-2 review on commit fe8162e from claude bot:

> Correctness gap — prewriteTxn inline abort paths still use
> caller ctx
>
> The PR correctly applies context.WithoutCancel at both cleanup
> sites in dispatchTxn (for commitPrimaryTxn failure) and
> commitSequential (for t.Abort). However, the two inline abort
> calls inside prewriteTxn still pass the caller's ctx to
> abortPreparedTxn:
>
>   * sharded_coordinator.go:494 (prepare-loop Commit failure)
>   * sharded_coordinator.go:504 (validateReadOnlyShards failure)
>
> When ctx expires (e.g., Redis client disconnects mid-prewrite),
> both the shard commit and the subsequent abortPreparedTxn fail
> immediately with context.Canceled, leaving prepared intents
> behind. The defense-in-depth in verifyLeaderEngineCtx does not
> help: it only applies the 5 s fallback when there is no
> deadline, but a cancelled context's cancellation propagates
> immediately regardless.

Fixed both prewriteTxn cleanup sites the same way as round-1
patched dispatchTxn / commitSequential:

  cleanupCtx, cancel := context.WithTimeout(
      context.WithoutCancel(ctx), verifyLeaderTimeout)
  c.abortPreparedTxn(cleanupCtx, ...)
  cancel()

Now any prewrite intent we already persisted on prior shards is
guaranteed an Abort attempt within 5 s, regardless of whether
the upstream cancellation cascade comes from the client (ctx
expired) or from a slow shard (Commit returned ctx.Err).

The other Claude bot finding (CI gci) was already addressed in
round 1 (fe8162e) — golangci-lint passes on the current commit.
The Jepsen test failure on the round-2 CI run is a Maven
dependency-resolution flake (com.cognitect.aws:dynamodb /
com.cognitect.aws:sqs / org.slf4j:slf4j-* checksum failures
during dependency download), unrelated to the diff.

Non-blocking observation acknowledged: rangeList / readValueAt
in adapter/redis.go still pass r.handlerContext() to
VerifyLeaderForKey rather than a per-call ctx. Those calls are
correctly bounded at 5 s by verifyLeaderEngineCtx's default-
deadline guard added in r1, so they are not a regression vs the
pre-PR-749 behaviour. Tightening them to the per-call dispatch
budget (consistent with the keys() fix) belongs to the
follow-up Redis-adapter audit pass already noted in the design
doc — not in scope for this PR.

Caller audit (semantic change in prewriteTxn cleanup):
- Only call sites of abortPreparedTxn that previously used a
  potentially-cancelled ctx now derive a detached cleanup ctx.
- abortPreparedTxn itself is unchanged; it just takes the ctx it
  is given and forwards to g.Txn.Commit. The caller-side
  detachment is the fix.
- No other prewriteTxn paths reach abortPreparedTxn.

Test:
  go test -race -count=1 -short ./kv -- 6.2s, all green.
@bootjp
Owner Author

bootjp commented May 9, 2026

@claude review

@claude

claude Bot commented May 9, 2026

Claude finished @bootjp's task in 4m 40s


PR Review: refactor(kv): plumb caller context through write + verify-leader paths

Tasks:

  • Read PR context and understand the changes
  • Review current state of key files (post r1 + r2 commits)
  • Check if previous review issues were addressed
  • Analyze remaining concerns and new code
  • Post final review

Summary

This is a re-review after two follow-up commits (fe8162e r1 and 13b8707 r2). All issues from the prior round have been addressed. The PR is ready to merge.


Issues addressed since round 1

✅ gci lint failures (redis_hello_test.go, redis_info_test.go) — fixed in r1

Both test files were reformatted. CI now passes golangci-lint.

✅ prewriteTxn abort cleanup with cancelled ctx — fixed in r2 (13b8707)

Both abortPreparedTxn call sites inside prewriteTxn (sharded_coordinator.go:501-503 and :517-519) now correctly use context.WithTimeout(context.WithoutCancel(ctx), verifyLeaderTimeout), matching the pattern already applied in dispatchTxn and commitSequential. This closes the stale-intent risk when a client disconnects mid-prewrite.

✅ KEYS unbounded ReadIndex block (Codex P1) — fixed in r1

keys() now wraps r.handlerContext() with redisDispatchTimeout before passing to VerifyLeader.

✅ FLUSHDB using handlerContext() for VerifyLeader (Codex P2) — fixed in r1

flushDatabase now passes the per-call ctx (already bounded by redisDispatchTimeout) to VerifyLeader.

✅ verifyLeaderEngineCtx defense-in-depth (Gemini) — fixed in r1

verifyLeaderEngineCtx (kv/raft_engine.go:67-71) now applies verifyLeaderTimeout as a fallback when the incoming ctx has no deadline, covering handlerContext, background loops, and any future caller that passes context.Background() directly.

✅ commitSequential Abort with cancelled ctx (CodeRabbit/Gemini) — fixed in r1

transaction.go:251-253 correctly uses context.WithoutCancel(ctx) + verifyLeaderTimeout for the transactional Abort cleanup path.

✅ testCoordinatorWrapper.VerifyLeader dropped ctx (CodeRabbit) — fixed in initial commit

adapter/dynamodb_test.go:1834-1835 now forwards the caller ctx to w.inner.VerifyLeader(ctx).

✅ Design doc stale method names (CodeRabbit) — fixed in r1

docs/design/2026_05_10_proposed_kv_ctx_plumbing.md uses the correct VerifyLeader(ctx context.Context) / VerifyLeaderForKey(ctx context.Context, key []byte) names throughout.


Observations (non-blocking)

rangeList and readValueAt use r.handlerContext() without per-call deadline (adapter/redis.go:3264 and :3520). These remain correctly bounded at 5 s by the verifyLeaderEngineCtx defense-in-depth added in r1 — no regression versus pre-PR-749 behavior. Tightening them to redisDispatchTimeout (consistent with keys()) belongs to the follow-up Redis adapter audit pass already scoped in the design doc.

commitSecondaryWithRetry uses the caller's ctx — this is correct by design. Secondary commits are best-effort; if ctx expires, lock resolution handles the remaining secondaries. No WithoutCancel pattern needed here.

applyRawBatch uses context.Background() (transaction.go:411) — intentional and correctly documented. The batch goroutine cannot inherit any single caller's ctx; per-caller cancellation is honoured at the wait site in commitRaw via select { case <-item.done / case <-ctx.Done() }.


Bottom line

All three mandatory fixes (gci lint blocker, prewriteTxn abort cleanup, KEYS/FLUSHDB deadline binding) are in. The architecture of the change is correct, the WithoutCancel pattern is applied consistently to every abort cleanup site, and the verifyLeaderEngineCtx defense-in-depth correctly covers the handlerContext gap. Ready to merge.

@bootjp bootjp merged commit 79e3545 into main May 10, 2026
9 of 10 checks passed
@bootjp bootjp deleted the fix/verify-leader-context-plumbing branch May 10, 2026 08:39