feat(redis): embed TTL inline in string values, remove TTL buffer by bootjp · Pull Request #521 · bootjp/elastickv

bootjp · 2026-04-17T13:32:07Z

Summary

Implements docs/design/ttl-inline-value.md (Phases 0–3): TTL is now embedded directly inside Redis string values rather than stored as a separate !redis|ttl|<key> MVCC key.
Removes the TTL buffer (redis_ttl_buffer.go) and its flusher goroutine entirely — they were a workaround for write conflicts that this design makes unnecessary.
All existing tests pass; Jepsen Redis, DynamoDB, and S3 workloads report {:valid? true}.

Problem

The previous design stored TTL in a separate key, causing three structural issues:

Read amplification — every read issued two store lookups per key (data + TTL).
Broken MVCC snapshot semantics — data and TTL could be committed in different Raft entries, so a point-in-time read might see a data version from one commit and a TTL from another.
TTL write conflicts — concurrent EXPIRE commands on the same key conflicted in the MVCC layer, requiring the in-memory buffer workaround.

Solution

String values now use a new binary encoding:

[0xFF 0x01][flags(1)][expireAtMs(8, optional)][user bytes]

The 0xFF 0x01 magic prefix distinguishes new-format values from legacy raw bytes, allowing zero-cost backward compatibility (legacy values decode as-is with no TTL).

Key changes:

File	Change
`redis_compat_types.go`	`encodeRedisStr` / `decodeRedisStr` / `isNewRedisStrFormat`; `ttlAt` reads embedded TTL for new-format strings
`redis_compat_helpers.go`	`readRedisStringAt` decodes header; `saveString` encodes with TTL
`redis.go`	`replaceWithStringTxn`, `buildKeyElems`, `txnContext.commit`, `applyExpire` updated; TTL buffer fields/goroutine removed
`redis_compat_commands.go`	`setExpire` performs atomic read-modify-write on string anchor; `INCR` encodes with `encodeRedisStr`
`redis_lua_context.go`	`stringCommitElems` encodes TTL in value; `nonStringTTLElems` handles non-string keys; `flushTTLForKeyToBuffer` removed
`redis_ttl_buffer.go`	Deleted
`redis_ttl_buffer_test.go`	Deleted

The !redis|ttl|<key> key is still written alongside the value as a scan index for TTL expiry sweeps, but it is no longer the authoritative source — the embedded value is.

Test plan

go test ./adapter/... -timeout 120s -count=1 — all pass
Jepsen unit tests (lein test) — 11 tests, 0 failures
Jepsen Redis workload — {:valid? true}
Jepsen DynamoDB workload — {:valid? true}
Jepsen S3 workload — {:valid? true}
golangci-lint run — 0 issues

Summary by CodeRabbit

Release Notes

New Features
- TTL operations are now atomic with data writes for improved consistency
- INCR command now preserves existing TTL on values
Bug Fixes
- Improved TTL handling efficiency by eliminating separate buffer mechanism
Refactor
- Simplified TTL persistence model through transactional data updates
Documentation
- Added design documentation for new TTL handling architecture

Architecture proposal to move TTL from a separate !redis|ttl|<key> MVCC key to an inline field in each type's anchor value. Problems with the current design: - Read amplification: every key-existence check issues two store reads - Broken MVCC snapshot semantics: data and TTL committed in separate Raft entries can yield mismatched versions at the same readTS - TTL write conflicts (the TTL buffer is a workaround, not a root fix) Proposed approach: - Embed expireAt in the anchor value for all types (string header flags, ListMeta/HashMeta/SetMeta/ZSetMeta struct field) - !redis|ttl|<key> retained only as a background expiry scan index - Four-phase migration plan for rolling upgrades without downtime - TTL buffer and background flush goroutine removed in Phase 3

Implements the design from docs/design/ttl-inline-value.md (Phases 0–3). Previously, TTL was stored as a separate !redis|ttl|<key> MVCC key, which caused three structural problems: 1. Read amplification — every read issued two store lookups per key. 2. Broken MVCC snapshot semantics — data and TTL could come from different commits, making point-in-time reads inconsistent. 3. TTL write conflicts — concurrent EXPIRE commands could conflict, motivating the TTL buffer workaround. Changes: - New binary encoding for Redis string values: [0xFF 0x01][flags(1)][expireAtMs(8, optional)][user bytes] The magic prefix allows backward compatibility: legacy raw bytes are decoded as-is (no TTL); new-format values carry the TTL embedded directly in the value. - encodeRedisStr / decodeRedisStr / isNewRedisStrFormat added to redis_compat_types.go; ttlAt updated to read the embedded TTL for new-format strings and fall back to !redis|ttl| for legacy data and non-string types. - readRedisStringAt decodes the header transparently; saveString, replaceWithStringTxn, and setExpire write data + TTL index in the same Raft entry (IsTxn=true for string keys, false for others). - txnContext.buildKeyElems encodes TTL into the string value and emits the scan-index key in the same batch; buildTTLElems handles non-string keys. commit includes both sets atomically. - luaScriptContext.stringCommitElems encodes TTL in the value; nonStringTTLElems handles non-string keys. commit includes both in a single Raft transaction. - TTL buffer (redis_ttl_buffer.go, redis_ttl_buffer_test.go, runTTLFlusher goroutine, cancelFlush/flushWG) removed entirely. - Tests updated to decode the new binary header before asserting user values read directly from the store. - applyExpire and setExpire refactored into smaller helpers to satisfy cyclop/gocognit lint limits. All unit tests pass; Jepsen Redis, DynamoDB, and S3 workloads all report {:valid? true}.

coderabbitai · 2026-04-17T13:32:16Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 24 minutes and 11 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 24 minutes and 11 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c369be1-f674-4476-8e82-439e46c3a7ed

📥 Commits

Reviewing files that changed from the base of the PR and between e65648d and c3738fa.

📒 Files selected for processing (7)

adapter/redis.go
adapter/redis_compat_commands.go
adapter/redis_compat_helpers.go
adapter/redis_compat_types.go
adapter/redis_lua_context.go
adapter/redis_ttl_compat_test.go
docs/design/ttl-inline-value.md

📝 Walkthrough

Walkthrough

This PR removes the Redis TTL buffer—an in-memory write buffer with periodic flushing—and refactors TTL handling to embed TTL directly in string values via encoding and in collection metadata via fields, while committing TTL changes in the same Raft transaction as data updates.

Changes

Cohort / File(s)	Summary
TTL Buffer Removal `adapter/redis.go`, `adapter/redis_ttl_buffer.go`, `adapter/redis_ttl_buffer_test.go`	Deleted entire TTL buffer infrastructure including `TTLBuffer` type, flusher goroutine, and `WithTTLFlushInterval` option; removed `ttlBuffer` and `ttlFlushInterval` fields from `RedisServer`.
String TTL Encoding `adapter/redis_compat_types.go`, `adapter/redis_compat_helpers.go`	Added `encodeRedisStr`/`decodeRedisStr` for embedding TTL in string values; modified `readRedisStringAt` to return `(value, *time.Time, error)` and `saveString` to encode TTL inline with the value.
Command Refactoring `adapter/redis_compat_commands.go`	Updated `INCR` to preserve existing TTL instead of clearing it; refactored `EXPIRE` into `doSetExpire`, `expireDeleteKey`, and `dispatchStringExpire` helpers; updated `GETDEL` to unpack new `readRedisStringAt` signature.
Transaction & Lua Handling `adapter/redis_lua_context.go`	Refactored transaction commit to combine data and TTL operations in a single `Dispatch(IsTxn=true)` call; added `nonStringTTLElems` helper and updated `stringCommitElems` to embed TTL in encoded values.
Test Updates `adapter/redis_lua_context_test.go`, `adapter/redis_ttl_compat_test.go`, `adapter/redis_retry_test.go`, `adapter/redis_txn_test.go`	Removed `flushTTLForKeyToBuffer` tests; renamed and inverted INCR TTL test expectation (clear→preserve); updated value decoding in retry and transaction tests to work with encoded string format.
Design Documentation `docs/design/ttl-inline-value.md`	New document specifying TTL embedding in string values and collection metadata, migration strategy, and removal of secondary TTL reads from the hot path.

Sequence Diagram(s)

The changes satisfy the conditions for a sequence diagram: they introduce a significant architectural modification to TTL handling with multiple component interactions (Client, Commands, Transaction Layer, MVCC/KV Store, Raft), multiple sequential operations (encode/commit/flush flow), and altered control flow. However, the core architectural change is more structural than sequential—the primary effect is moving TTL from a separate flushing mechanism to embedded encoding and inline transaction commits, which is better understood through code structure than interaction sequence. Thus, a sequence diagram would not meaningfully clarify the changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat(redis): eliminate TTL write conflicts via in-memory buffer with batch Raft flush #518: Adds and expands TTLBuffer usage with batched TTL writes; this PR removes the same TTLBuffer mechanism entirely, representing an opposite architectural direction for TTL persistence.

Poem

🐰 Hops and encodes with glee,
No more buffers in the tree,
TTL nestled in the string,
One transaction—everything!
Swift as carrots, clean and lean. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(redis): embed TTL inline in string values, remove TTL buffer' directly and clearly summarizes the main change: implementing TTL embedding in string values while removing the separate TTL buffer mechanism. It is concise, specific, and accurately reflects the primary objective of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/sad-hermann-0a3a28

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request embeds TTL metadata directly into Redis string values and collection anchor keys, eliminating the TTLBuffer and background flusher to reduce read amplification and ensure MVCC snapshot consistency. String values now use a versioned header for optional expiration times. Feedback was provided to refactor repeated type-checking logic into a shared helper function for better maintainability.

gemini-code-assist · 2026-04-17T13:35:15Z

-		keyPlans = append(keyPlans, plan)
+		// For non-string keys with dirty TTL: include !redis|ttl| in the same txn.
+		// String keys already have TTL embedded in the value via stringCommitElems.
+		if plan.finalType != redisTypeNone && plan.finalType != redisTypeString {


The condition plan.finalType != redisTypeNone && plan.finalType != redisTypeString is repeated. To improve maintainability and avoid code duplication, consider extracting this logic into a shared helper function like isNonStringCollectionType(typ redisValueType).

References

Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

- Keep TTL inline encoding; drop redis_ttl_buffer.go/test additions from main - Restore compactor field (added independently in main, unrelated to TTL buffer) - Remove redis_lua_context_test.go added in main that tested flushTTLForKeyToBuffer (the function no longer exists; TTL is now embedded in string values)

…uplicated condition

bootjp · 2026-04-17T13:57:22Z

/gemini review

gemini-code-assist

Code Review

This pull request implements a significant architectural change by embedding TTL metadata directly into Redis string values and collection metadata, as detailed in the new design document. By co-locating TTL with data, the change eliminates the need for the TTLBuffer and its background flusher, reduces read amplification by removing extra lookups for the !redis|ttl| key, and restores MVCC snapshot consistency. The !redis|ttl| key is now maintained solely as a secondary scan index for background expiration. Feedback was provided regarding potential race conditions in the dispatchStringExpire read-modify-write operation, which requires careful leadership verification and locking.

gemini-code-assist · 2026-04-17T14:00:06Z

+	elems := []*kv.Elem[kv.OP]{
+		{Op: kv.Put, Key: redisStrKey(key), Value: encoded},
+		{Op: kv.Put, Key: redisTTLKey(key), Value: encodeRedisTTL(expireAt)},
+	}
+	return r.dispatchElems(ctx, true, readTS, elems)


The dispatchStringExpire function performs a read-modify-write operation. To prevent race conditions, ensure that leadership status is verified after acquiring the lock that protects the operation. Additionally, do not release the lock between reading the state variable and performing the conditional check to prevent the use of stale data.

References

To prevent race conditions in read-modify-write operations that require leadership, verify leadership status after acquiring the lock that protects the operation.

When performing atomic operations based on a version or state variable, do not release the lock between reading the variable and performing the conditional check to prevent race conditions and the use of stale data.

…StringExpire

Re-apply deletion of adapter/redis_ttl_buffer.go that main modified. Auto-merged redis.go and redis_txn_test.go (new ttlStates-initialized OCC tests from main are preserved).

bootjp · 2026-04-17T14:13:15Z

/gemini review

gemini-code-assist

Code Review

This pull request implements a significant architectural change by embedding TTL metadata directly into Redis string values, as detailed in the new design documentation. This change eliminates the need for the TTLBuffer and its background flusher, simplifying the codebase and improving MVCC snapshot consistency. Key updates include a new versioned encoding for string values and modifications to transaction logic to handle atomic data and TTL updates. Feedback focuses on ensuring data integrity during the migration phase by using fallback-aware helpers and maintaining consistent concurrency control for non-string expiration.

gemini-code-assist · 2026-04-17T14:19:09Z

+	raw, readErr := r.store.GetAt(ctx, redisStrKey(key), readTS)
+	if readErr != nil && !cockerrors.Is(readErr, store.ErrKeyNotFound) {
+		return cockerrors.WithStack(readErr)
+	}
+	var userValue []byte
+	if readErr == nil {
+		userValue, _, readErr = decodeRedisStr(raw)
+		if readErr != nil {
+			return readErr
+		}
+	}


The current implementation of dispatchStringExpire performs a direct GetAt on the prefixed key (redisStrKey). If the key is a legacy bare string key (Phase 0/1 migration), this read will return ErrKeyNotFound, causing userValue to remain nil. The subsequent encodeRedisStr will then create an encoded value with an empty user payload, which is written to the prefixed key. This effectively shadows the legacy bare key with an empty value, leading to data loss for the reader.

You should use the fallback-aware readRedisStringAt helper to ensure the current value is correctly retrieved regardless of its storage format.

Suggested change

raw, readErr := r.store.GetAt(ctx, redisStrKey(key), readTS)

if readErr != nil && !cockerrors.Is(readErr, store.ErrKeyNotFound) {

return cockerrors.WithStack(readErr)

}

var userValue []byte

if readErr == nil {

userValue, _, readErr = decodeRedisStr(raw)

if readErr != nil {

return readErr

}

}

userValue, readErr := r.readRedisStringAt(key, readTS)

if readErr != nil && !cockerrors.Is(readErr, store.ErrKeyNotFound) {

return cockerrors.WithStack(readErr)

}

encoded := encodeRedisStr(userValue, &expireAt)

gemini-code-assist · 2026-04-17T14:19:09Z

+	elems := []*kv.Elem[kv.OP]{{Op: kv.Put, Key: redisTTLKey(key), Value: encodeRedisTTL(expireAt)}}
+	return 1, r.dispatchElems(ctx, false, 0, elems)


For non-string types, EXPIRE is now dispatched with IsTxn=false and startTS=0. While this matches the 'last-writer-wins' behavior of the removed TTLBuffer, it bypasses the Optimistic Concurrency Control (OCC) provided by the retryRedisWrite loop. If the key is deleted or its type changes between the prepareExpire/rawKeyTypeAt checks and the Dispatch call, a dangling !redis|ttl| key will be created. Although the background scanner can handle this, it would be more consistent with the new design to use IsTxn=true with readTS to ensure the key still exists and is of the correct type at commit time.

Suggested change

elems := []*kv.Elem[kv.OP]{{Op: kv.Put, Key: redisTTLKey(key), Value: encodeRedisTTL(expireAt)}}

return 1, r.dispatchElems(ctx, false, 0, elems)

elems := []*kv.Elem[kv.OP]{{Op: kv.Put, Key: redisTTLKey(key), Value: encodeRedisTTL(expireAt)}}

return 1, r.dispatchElems(ctx, true, readTS, elems)

- dispatchStringExpire now uses readRedisStringAt to dual-read prefixed and legacy bare string keys, avoiding data loss during migration. - Non-string EXPIRE now dispatches with IsTxn=true, StartTS=readTS so OCC catches type changes or deletions between the type check and commit, preventing dangling !redis|ttl| entries.

bootjp · 2026-04-17T14:48:13Z

/gemini review

gemini-code-assist

Code Review

This pull request implements inline TTL encoding for Redis string values, embedding the expiration timestamp directly within the data value to reduce read amplification and restore MVCC snapshot consistency. The changes include new encoding/decoding logic, updates to transaction and Lua script contexts, and the removal of the TTLBuffer mechanism. Feedback highlights several critical issues: the implementation incorrectly clears existing TTLs during INCR and other string modifications, and the removal of the TTLBuffer is considered premature since collection types have not yet been migrated to the inline format and still rely on separate TTL keys prone to write conflicts. Additionally, it is suggested that internal read functions return the decoded TTL to further optimize performance.

gemini-code-assist · 2026-04-17T14:50:49Z

+			var ttl *time.Time
+			if ttlSt := t.ttlStates[string(userKey)]; ttlSt != nil && ttlSt.dirty {
+				ttl = ttlSt.value
+			}
+			value = encodeRedisStr(tv.raw, ttl)


This logic clears the TTL for any string modification where the TTL state is not explicitly marked dirty (e.g., via EXPIRE or SET EX). In Redis, commands like INCR (when executed via MULTI/EXEC) or SETBIT must preserve the existing TTL. Since txnContext.load currently discards the TTL when reading from the store, and txnValue does not store it, any update to the value via txnContext will result in TTL loss. To fix this, txnValue should be updated to store the TTL found during load, and buildKeyElems should use that stored TTL as a fallback when ttlSt is not dirty.

gemini-code-assist · 2026-04-17T14:50:49Z

 	r := &RedisServer{
-		listen:           listen,
-		store:            store,
-		coordinator:      coordinate,
-		redisTranscoder:  newRedisTranscoder(),
-		redisAddr:        redisAddr,
-		relay:            relay,
-		leaderRedis:      leaderRedis,
-		leaderClients:    make(map[string]*redis.Client),
-		pubsub:           newRedisPubSub(),
-		scriptCache:      map[string]string{},
-		traceCommands:    os.Getenv("ELASTICKV_REDIS_TRACE") == "1",
-		ttlBuffer:        newTTLBuffer(),
-		ttlFlushInterval: defaultTTLFlushInterval,
+		listen:          listen,
+		store:           store,
+		coordinator:     coordinate,
+		redisTranscoder: newRedisTranscoder(),
+		redisAddr:       redisAddr,
+		relay:           relay,
+		leaderRedis:     leaderRedis,
+		leaderClients:   make(map[string]*redis.Client),
+		pubsub:          newRedisPubSub(),
+		scriptCache:     map[string]string{},
+		traceCommands:   os.Getenv("ELASTICKV_REDIS_TRACE") == "1",
 	}


Removing the TTLBuffer and its flusher goroutine is premature because the inline TTL design has not yet been implemented for collection types (Hash, Set, List, ZSet). As these types still use the separate !redis|ttl| key, concurrent EXPIRE operations on the same collection key will once again produce write conflicts in the MVCC layer. The TTLBuffer should be retained until all types have been migrated to the inline format as described in the design document.

gemini-code-assist · 2026-04-17T14:50:49Z

+	if typ == redisTypeString {
+		return 1, r.dispatchStringExpire(ctx, key, readTS, expireAt)
+	}
+	elems := []*kv.Elem[kv.OP]{{Op: kv.Put, Key: redisTTLKey(key), Value: encodeRedisTTL(expireAt)}}


For non-string types, the implementation still writes to the separate !redis|ttl| key. This contradicts the design document (§3.1, §3.3) which states that ListMeta, HashMeta, etc., should carry the expireAt field. More importantly, it means that write conflicts on the TTL key are not resolved for collection types, which makes the removal of the TTLBuffer premature for these types.

gemini-code-assist · 2026-04-17T14:50:49Z

+		// INCR clears any TTL (PERSIST semantics); encode value with no TTL.
+		encoded := encodeRedisStr([]byte(strconv.FormatInt(current, 10)), nil)


INCR must preserve the existing TTL of the key. The current implementation incorrectly clears it by passing nil to encodeRedisStr (as noted in the comment which incorrectly identifies this as PERSIST semantics).

// INCR must preserve the existing TTL. ttl, err := r.ttlAt(ctx, cmd.Args[1], readTS) if err != nil { return err } encoded := encodeRedisStr([]byte(strconv.FormatInt(current, 10)), ttl)

gemini-code-assist · 2026-04-17T14:50:50Z

 	v, err := r.readValueAt(redisStrKey(key), readTS)
 	if err == nil {
-		return v, nil
+		userValue, _, decErr := decodeRedisStr(v)


readRedisStringAt should return the decoded TTL along with the user value. This would allow callers (like incr or txnContext.load) to preserve the existing TTL during value updates without issuing an additional ttlAt store lookup, fulfilling the design goal of reducing read amplification.

Critical correctness fixes from Gemini PR review: - readRedisStringAt now also returns the decoded TTL, eliminating the extra ttlAt lookup callers would otherwise need. - txnValue caches the loaded TTL; buildKeyElems falls back to it when no EXPIRE/PERSIST ran in the transaction, so INCR/SETBIT/etc. inside MULTI/EXEC no longer silently clear TTL. - INCR now preserves the existing TTL (Redis semantics); the test that asserted PERSIST semantics was wrong and is updated. For non-string EXPIRE, OCC (IsTxn=true, StartTS=readTS) + retryRedisWrite already gives correct last-writer-wins under concurrent EXPIRE, so the TTLBuffer stays removed; inline TTL for collection metas remains a follow-up per docs/design/ttl-inline-value.md.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

adapter/redis.go (1)

2318-2339: ⚠️ Potential issue | 🟠 Major

MULTI/EXEC leaves orphaned TTL indexes for strings.

This path never deletes !redis|ttl|<key> for strings: not when a string is deleted, and not when a dirty string is re-encoded with ttl == nil. Because buildTTLElems skips string keys, those orphaned scan-index entries survive the transaction and can later expire a persistent or recreated value incorrectly.

Suggested fix

 		if tv.deleted {
 			elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Del, Key: storageKey})
+			if bytes.HasPrefix(storageKey, []byte(redisStrPrefix)) {
+				userKey := storageKey[len(redisStrPrefix):]
+				elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Del, Key: redisTTLKey(userKey)})
+			}
 			continue
 		}
 		value := tv.raw
 		// For string keys: embed TTL in the encoded value.
 		if bytes.HasPrefix(storageKey, []byte(redisStrPrefix)) {
@@
 			value = encodeRedisStr(tv.raw, ttl)
 			// Write !redis|ttl| scan index if TTL is set.
 			if ttl != nil {
 				elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Put, Key: redisTTLKey(userKey), Value: encodeRedisTTL(*ttl)})
+			} else {
+				elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Del, Key: redisTTLKey(userKey)})
 			}
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@adapter/redis.go` around lines 2318 - 2339, The code path for string keys
never removes the "!redis|ttl|<key>" index: when tv.deleted is true you must
append a kv.Del for redisTTLKey(userKey) before continuing, and when re-encoding
a string with ttl == nil (i.e. previously had a TTL but now cleared) you must
append a kv.Del for redisTTLKey(userKey) instead of leaving the old index;
locate the string handling block around storageKey/redisStrPrefix/userKey/tv.ttl
and add elems = append(... &kv.Elem{Op: kv.Del, Key: redisTTLKey(userKey)}) both
in the tv.deleted branch and in the branch where ttl==nil (while preserving the
existing kv.Put when ttl != nil and using encodeRedisTTL as before).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/redis_compat_commands.go`:
- Around line 393-404: dispatchStringExpire currently treats
store.ErrKeyNotFound as non-fatal and proceeds to write an empty-string anchor
and TTL; change it to treat ErrKeyNotFound as a no-op by returning early without
dispatching any kv.Elem so the key is not resurrected (update
dispatchStringExpire to detect cockerrors.Is(readErr, store.ErrKeyNotFound) and
return a “not applied” signal instead of encoding/putting), and plumb that “not
applied” result back to the caller path (doSetExpire) so the final reply becomes
0 when expire was not applied rather than unconditionally 1.

In `@adapter/redis_compat_helpers.go`:
- Around line 392-401: In saveString, when ttl is nil the code currently
overwrites the embedded value but leaves an existing TTL scan index (!redis|ttl|
via redisTTLKey) in place; change saveString (and its use of redisStrKey,
redisTTLKey, encodeRedisTTL, dispatchElems) to, when ttl==nil, append a kv.Elem
with Op: kv.Delete (or equivalent) for redisTTLKey(key) so the old TTL index is
removed; ensure dispatchElems is called with this delete element alongside the
Put for redisStrKey to atomically remove the stale TTL entry.

In `@adapter/redis_lua_context.go`:
- Around line 2562-2572: The Lua-string path currently only writes the TTL index
when finalTTL(...) returns non-nil, but it must also remove any existing TTL
index when ttl == nil to avoid stale !redis|ttl| entries; update the block in
adapter/redis_lua_context.go (around finalTTL, encodeRedisStr, redisStrKey,
redisTTLKey usage) so that when ttl == nil you append a kv.Elem with Op:
kv.Delete for redisTTLKey([]byte(key)) (instead of doing nothing), while
preserving the existing Put of the encoded value via encodeRedisStr and the Put
of the TTL when ttl != nil.

In `@adapter/redis_ttl_compat_test.go`:
- Around line 97-101: Read and store the key TTL before calling INCR (e.g.,
ttlBefore := rdb.TTL(ctx, "incr:key").Result()), then after INCR use the
existing ttlAfter check and add an assertion comparing ttlAfter to ttlBefore to
ensure the expiry was preserved (for example assert ttlAfter > 0 and ttlAfter <=
ttlBefore or assert ttlAfter is within a small delta of ttlBefore); reference
the rdb.TTL calls and the ttlAfter/ttlBefore variables in your change.

In `@adapter/redis.go`:
- Around line 742-748: replaceWithStringTxn currently only writes a TTL index
when ttl != nil, but when ttl == nil it must remove any existing
!redis|ttl|<key> entry to avoid leaving stale expiry markers; in the
replaceWithStringTxn block (where encoded := encodeRedisStr(...) and elems is
appended) add an elems append that deletes the TTL index when ttl == nil (use
the same redisTTLKey(key) and the kv delete operation consistent with other
code), while keeping the existing elems append for ttl != nil that writes
encodeRedisTTL(*ttl); then call r.dispatchElems as before.

In `@docs/design/ttl-inline-value.md`:
- Around line 316-320: The Phase 2 table row contains the literal key string
`!redis|ttl|` which Markdown treats as cell separators; update that cell to
escape the pipe characters (e.g., `!redis\|ttl\|`) or reword/move the key (e.g.,
put it in backticks outside the cell or describe it as "legacy key
`!redis|ttl|`") so the three-column table remains valid and the MD056 warning is
resolved.

---

Outside diff comments:
In `@adapter/redis.go`:
- Around line 2318-2339: The code path for string keys never removes the
"!redis|ttl|<key>" index: when tv.deleted is true you must append a kv.Del for
redisTTLKey(userKey) before continuing, and when re-encoding a string with ttl
== nil (i.e. previously had a TTL but now cleared) you must append a kv.Del for
redisTTLKey(userKey) instead of leaving the old index; locate the string
handling block around storageKey/redisStrPrefix/userKey/tv.ttl and add elems =
append(... &kv.Elem{Op: kv.Del, Key: redisTTLKey(userKey)}) both in the
tv.deleted branch and in the branch where ttl==nil (while preserving the
existing kv.Put when ttl != nil and using encodeRedisTTL as before).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f29bb93d-ef51-40f8-bb4f-7b0afbdf4739

📥 Commits

Reviewing files that changed from the base of the PR and between 5be1b5c and e65648d.

📒 Files selected for processing (12)

adapter/redis.go
adapter/redis_compat_commands.go
adapter/redis_compat_helpers.go
adapter/redis_compat_types.go
adapter/redis_lua_context.go
adapter/redis_lua_context_test.go
adapter/redis_retry_test.go
adapter/redis_ttl_buffer.go
adapter/redis_ttl_buffer_test.go
adapter/redis_ttl_compat_test.go
adapter/redis_txn_test.go
docs/design/ttl-inline-value.md

💤 Files with no reviewable changes (3)

adapter/redis_lua_context_test.go
adapter/redis_ttl_buffer_test.go
adapter/redis_ttl_buffer.go

bootjp · 2026-04-17T15:05:36Z

/gemini review

CodeRabbit review follow-ups: - All string write paths (saveString, replaceWithStringTxn, txnContext.buildKeyElems PERSIST, lua stringCommitElems) now emit a Del on redisTTLKey when the new TTL is nil, so the background sweeper cannot expire a key that was just made persistent. - dispatchStringExpire treats ErrKeyNotFound as a no-op and reports back whether the EXPIRE was applied; doSetExpire returns 0 in that race instead of resurrecting the key with an empty anchor. - INCR TTL-preservation test now asserts ttlAfter <= ttlBefore so a silent reset would be caught. - Escape pipes in docs/design table to restore 3-column layout. - Extract stringValueAndTTLElem helper to keep buildKeyElems under the cyclop limit.

gemini-code-assist

Code Review

This pull request implements a significant architectural change by embedding TTL metadata directly within Redis string values, moving away from a separate TTL key and the asynchronous TTLBuffer. This change aims to reduce read amplification and restore MVCC snapshot consistency. Key updates include a new versioned string encoding format, updated transaction logic to handle embedded TTLs, and the removal of the background TTL flusher. Feedback focuses on ensuring TTL preservation for legacy keys during the migration window, optimizing the read path to avoid redundant store lookups, and properly cleaning up the secondary TTL index when an expiration is removed.

I am having trouble creating individual review comments. Click here to see my feedback.

adapter/redis_compat_helpers.go (385-389)

The readRedisStringAt function returns a nil TTL for legacy (unprefixed) string keys. This causes commands that preserve TTL (such as INCR or string updates within a MULTI/EXEC block) to unintentionally strip the expiration from legacy keys during their first update after migration, as these commands rely on the returned TTL to re-encode the value. The function should explicitly fetch the TTL from the secondary index for legacy keys to ensure it is preserved during the transition to the new format.

	legacy, err := r.readValueAt(key, readTS)
	if err != nil {
		return nil, nil, err
	}
	ttl, err := r.ttlAt(context.Background(), key, readTS)
	if err != nil {
		return nil, nil, err
	}
	return legacy, ttl, nil

adapter/redis_compat_helpers.go (373-375)

This implementation introduces redundant read operations for string values. readRedisStringAt calls readValueAt, which invokes hasExpiredTTLAt -> ttlAt. For new-format strings, ttlAt performs a GetAt on the prefixed key to extract the embedded TTL, and then readValueAt performs another GetAt on the same key to retrieve the value. This double lookup on the hot path negates some of the performance benefits of embedding the TTL. Consider refactoring the read path to retrieve both the value and TTL in a single store operation.

adapter/redis.go (2335-2337)

When a string key's TTL is removed (e.g., via PERSIST or a plain SET), the secondary !redis|ttl| index is not deleted. This results in stale entries in the TTL index. While the embedded TTL is the authoritative source for reads, these leaked index entries increase storage overhead and cause unnecessary work for the background expiration scanner. The logic should be updated to emit a Del operation for the TTL key when the TTL is cleared, especially if a TTL was previously loaded.

			if ttl != nil {
				elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Put, Key: redisTTLKey(userKey), Value: encodeRedisTTL(*ttl)})
			} else if tv.ttl != nil {
				elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Del, Key: redisTTLKey(userKey)})
			}

Gemini PR review follow-ups: - readRedisStringAt now fetches the TTL from the !redis|ttl| index when falling back to a legacy bare-key read, so INCR / MULTI-EXEC writes on unmigrated keys don't strip the TTL on first rewrite. - stringValueAndTTLElem emits a Del on redisTTLKey whenever the effective TTL is nil and there was any prior TTL (loaded or dirty-cleared), not only the dirty-PERSIST case. The remaining double-read suggestion (ttlAt + readValueAt both touching the prefixed key) is left as a follow-up — the fix needs a new GetWithTTLAt helper on the store layer.

bootjp · 2026-04-17T16:00:52Z

/gemini review

gemini-code-assist

Code Review

This pull request implements a new storage format for Redis strings that embeds TTL metadata directly within the data value, addressing issues with read amplification and MVCC snapshot consistency. By including the expiry timestamp in the anchor key's value using a versioned header, the implementation eliminates the need for a separate TTL key lookup on the hot read path and allows data and expiry to be committed in a single Raft transaction. Consequently, the TTLBuffer and its background flusher have been removed. The changes include updated encoding/decoding logic, support for legacy data fallbacks, and adjustments to commands like INCR to preserve existing TTLs. Feedback was provided regarding an opportunity to further optimize the legacy fallback path in readRedisStringAt to reduce store read amplification.

gemini-code-assist · 2026-04-17T16:06:17Z

+func (r *RedisServer) readRedisStringAt(key []byte, readTS uint64) ([]byte, *time.Time, error) {
 	v, err := r.readValueAt(redisStrKey(key), readTS)
 	if err == nil {
-		return v, nil
+		userValue, ttl, decErr := decodeRedisStr(v)
+		if decErr != nil {
+			return nil, nil, decErr
+		}
+		return userValue, ttl, nil
 	}
 	if !errors.Is(err, store.ErrKeyNotFound) {
-		return nil, err
+		return nil, nil, err
+	}
+	legacy, err := r.readValueAt(key, readTS)
+	if err != nil {
+		return nil, nil, err
 	}
-	return r.readValueAt(key, readTS)
+	// Legacy bare-key data still stores its TTL in the !redis|ttl| index;
+	// fetch it so callers that re-encode (INCR, MULTI/EXEC writes) preserve it.
+	legacyTTL, err := r.ttlAt(context.Background(), key, readTS)
+	if err != nil {
+		return nil, nil, err
+	}
+	return legacy, legacyTTL, nil
 }


The current implementation of the legacy fallback path in this function introduces significant read amplification. For a legacy key, this function and its callees (readValueAt, ttlAt) may perform up to 6 store reads, including redundant lookups for !redis|str| and !redis|ttl|. Refactor this to check for the TTL once and then read the bare key value, reducing store reads to 3. When refactoring, ensure that the timestamp used for the MVCC read in readValueAt is acquired after the linearizable read fence has completed to guarantee a consistent snapshot view.

References

Linearizable reads in MVCC systems must acquire the read timestamp after the fence to prevent stale data, particularly for operations like readValueAt.

Gemini PR review: the legacy fallback path repeatedly re-entered readValueAt → hasExpiredTTLAt → ttlAt, which could issue up to 6 GetAt calls per string read (two redundant checks each on !redis|str| and !redis|ttl|). Refactor to call a new leaderAwareGetAt helper directly and perform expiration checks locally on the TTL we already decoded: - new-format path: 1 GetAt on !redis|str| (value + embedded TTL). - legacy path: 1 GetAt on !redis|ttl|, then 1 GetAt on the bare key. readTS is still acquired via snapshotTS before this function runs, so the linearizable-read fence remains intact.

Copilot

Pull request overview

This PR changes the Redis adapter’s TTL persistence model to embed TTL metadata directly inside Redis string values (new binary header format), and removes the prior in-memory TTL buffer + flusher workaround. This is intended to make TTL updates atomic with string value writes and reduce inconsistencies caused by separate TTL-key commits.

Changes:

Introduces a versioned binary encoding for Redis string values with optional embedded expireAtMs, plus encode/decode helpers.
Updates Redis command, Lua, and MULTI/EXEC transaction paths to read/modify/write embedded TTL for strings and to write !redis|ttl| only as a scan index.
Removes the TTL buffer implementation and its unit tests; updates existing tests to decode new-format string payloads.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
docs/design/ttl-inline-value.md	New design doc describing TTL-in-value approach and migration phases
adapter/redis_compat_types.go	Adds new string encoding/decoding helpers; updates `ttlAt` to read embedded TTL
adapter/redis_compat_helpers.go	Updates string read/save helpers to decode/encode embedded TTL and manage scan-index key
adapter/redis_compat_commands.go	Refactors `EXPIRE` to RMW for strings; updates `INCR` to preserve TTL
adapter/redis_lua_context.go	Updates Lua commit to encode embedded TTL for strings and include scan-index TTL elems in-txn
adapter/redis.go	Removes TTL buffer lifecycle; updates txn commit to embed TTL for strings and emit scan-index mutations
adapter/redis_ttl_compat_test.go	Updates TTL behavior tests (notably INCR semantics)
adapter/redis_txn_test.go	Adjusts assertions to decode stored string value format
adapter/redis_retry_test.go	Adjusts assertions to decode stored string value format
adapter/redis_ttl_buffer.go	Deleted (TTL buffer removed)
adapter/redis_ttl_buffer_test.go	Deleted (buffer tests removed)
adapter/redis_lua_context_test.go	Deleted (buffer-related Lua tests removed)

Copilot · 2026-04-17T16:08:43Z

 func (r *RedisServer) ttlAt(ctx context.Context, userKey []byte, readTS uint64) (*time.Time, error) {
-	// The buffer holds the most recent TTL write regardless of Raft flush state.
-	if expireAt, found := r.ttlBuffer.Get(userKey); found {
-		return expireAt, nil
+	// For string keys with new encoding: TTL is embedded in the string value.
+	// Trust only the embedded TTL; do not fall back to !redis|ttl| for new-format strings.
+	raw, err := r.store.GetAt(ctx, redisStrKey(userKey), readTS)
+	if err == nil {
+		if isNewRedisStrFormat(raw) {
+			_, expireAt, decErr := decodeRedisStr(raw)
+			return expireAt, decErr
+		}
+		// Legacy string format: fall through to !redis|ttl| key.
+	} else if !errors.Is(err, store.ErrKeyNotFound) {
+		return nil, errors.WithStack(err)
 	}
-	raw, err := r.store.GetAt(ctx, redisTTLKey(userKey), readTS)
+
+	// Fall back to !redis|ttl| key (non-string types and legacy string data).
+	raw, err = r.store.GetAt(ctx, redisTTLKey(userKey), readTS)
 	if err != nil {


ttlAt now always does a GetAt(!redis|str|) before consulting !redis|ttl|. For non-string keys this adds an extra store read (the !redis|str| lookup will almost always miss), which is a regression vs the previous single read of !redis|ttl|. Consider passing the known type into ttlAt/hasExpiredTTLAt (call sites already compute rawKeyTypeAt), or splitting into string/non-string TTL lookups so non-strings don’t pay the extra GetAt.

Copilot · 2026-04-17T16:08:44Z

+	// For string keys with new encoding: check embedded TTL.
+	if raw, err := r.tryLeaderGetAt(redisStrKey(key), 0); err == nil {
+		if isNewRedisStrFormat(raw) {
+			_, expireAt, _ := decodeRedisStr(raw)
+			if expireAt == nil {
+				return false
+			}
+			return !expireAt.After(time.Now())
+		}


isLeaderKeyExpired ignores decodeRedisStr errors for new-format strings (the error return is discarded). If a value is malformed or if a legacy payload accidentally matches the magic prefix, this will silently treat the key as non-expired and skip the TTL-key fallback. Handle the decode error explicitly (e.g., fall back to !redis|ttl| or treat it as expired/not-found) so existence checks don’t hinge on silent parse failures.

Copilot · 2026-04-17T16:08:44Z

+### 3.3 Collection metadata encoding
+
+`ListMeta`, `HashMeta`, `SetMeta`, `ZSetMeta` are fixed-width binary structs.
+Each gains an `ExpireAt uint64` field (0 = no TTL):
+
+```go
+// Example: ListMeta before
+type ListMeta struct {
+    Head int64
+    Len  int64
+    Tail int64   // computed field, not stored
+}
+
+// ListMeta after
+type ListMeta struct {
+    Head     int64
+    Len      int64
+    ExpireAt uint64  // ms since epoch; 0 = no TTL
+}
+```
+
+The on-disk format gains 8 bytes; the existing 24-byte format is distinguished
+from the new 32-byte format by length during decode (backward-compatible read).
+
+For HLL and stream types the TTL is appended to the serialised value using the
+same flags+expireAt header described in §3.2.
+


This design doc states that collection anchor metadata (ListMeta/HashMeta/SetMeta/ZSetMeta) gains an ExpireAt field and that TTL becomes part of each anchor value. In the current codebase, ListMeta (and related helpers) still uses the pre-existing 24-byte format with Head/Tail/Len only, and the adapter still relies on !redis|ttl| for non-string TTL. Please either scope the doc to “strings only” for this PR/phase, or update it to match the actually-implemented collection TTL representation.

Copilot · 2026-04-17T16:08:44Z

+A leading `flags` byte equal to `0x00` (no TTL) followed by raw bytes is fully
+backward-compatible with no-TTL reads once a migration is applied; the old raw
+bytes that start with `0x00` are rare in practice (binary data) but must be
+handled during migration (see §7).
+
+Alternatively, a two-byte magic + version prefix (`0xFF 0x01`) avoids the
+collision problem at the cost of two extra bytes per value:
+
+```
+┌─────────────────────────────────────────────────────┐
+│ 2 bytes magic+version  0xFF 0x01                    │
+│ 1 byte  flags                                       │
+│ 8 bytes expireAtMillis (present iff flags & 0x01)   │
+│ N bytes user value                                  │
+└─────────────────────────────────────────────────────┘
+```
+
+A raw value not starting with `0xFF 0x01` is treated as a legacy (no-TTL)
+encoding during the migration window.


The doc claims the 0xFF 0x01 magic prefix “avoids the collision problem”, but legacy Redis string payloads are arbitrary bytes and can still legitimately start with 0xFF 0x01, which would be misinterpreted as the new header. Please call out this remaining ambiguity explicitly and/or document the additional validation/escape strategy used to make decoding unambiguous during migration.

Copilot · 2026-04-17T16:08:45Z

+// path does a single GetAt on the prefixed key (value + TTL in one read);
+// the legacy path does at most two extra reads (TTL index, then bare key).
+// Expiration is checked locally from the TTL we just decoded.
+func (r *RedisServer) readRedisStringAt(key []byte, readTS uint64) ([]byte, *time.Time, error) {


readRedisStringAt currently returns (userValue, nil TTL) when the value exists under !redis|str| but is still in the legacy (no-magic) format. During the rolling-upgrade/phase-0 window, TTL for those legacy string values still lives in !redis|ttl|, so callers like INCR and MULTI/EXEC will silently drop the existing TTL when they re-encode and write back. Consider: when decodeRedisStr sees legacy bytes (or when !isNewRedisStrFormat(v)), fetch TTL via ttlAt(readTS) and return it so TTL is preserved until migration completes.

Suggested change

func (r *RedisServer) readRedisStringAt(key []byte, readTS uint64) ([]byte, *time.Time, error) {

}

if !isNewRedisStrFormat(v) {

// During the rolling-upgrade window, some values may already live under

// !redis|str| while still using the legacy payload encoding, with TTL

// remaining in the !redis|ttl| index. Fetch it so rewrites preserve TTL.

legacyTTL, ttlErr := r.ttlAt(context.Background(), key, readTS)

if ttlErr != nil {

return nil, nil, ttlErr

}

return userValue, legacyTTL, nil

}

Copilot · 2026-04-17T16:08:45Z

+// decodeRedisStr decodes a Redis string value.
+// For new format (magic prefix 0xFF 0x01): returns user value and optional expireAt.
+// For legacy format (no magic): returns raw bytes as user value with nil expireAt.
+func decodeRedisStr(raw []byte) (value []byte, expireAt *time.Time, err error) {
+	if isNewRedisStrFormat(raw) {
+		if len(raw) < redisStrBaseHeader {
+			return nil, nil, errors.New("invalid encoded string: too short")
+		}
+		flags := raw[2]
+		rest := raw[redisStrBaseHeader:]
+		if flags&redisStrHasTTL != 0 {
+			if len(rest) < redisUint64Bytes {
+				return nil, nil, errors.New("invalid encoded string: missing TTL bytes")
+			}
+			ms := min(binary.BigEndian.Uint64(rest[:redisUint64Bytes]), math.MaxInt64)
+			t := time.UnixMilli(int64(ms)) // #nosec G115
+			return rest[redisUint64Bytes:], &t, nil
+		}
+		return rest, nil, nil
+	}
+	// Legacy format: raw bytes with no TTL header.
+	return raw, nil, nil
+}
+
+// isNewRedisStrFormat reports whether raw uses the new magic+version prefix.
+func isNewRedisStrFormat(raw []byte) bool {
+	return len(raw) >= 2 && raw[0] == redisStrMagic && raw[1] == redisStrVersion
+}


The new string encoding detection is ambiguous for legitimate legacy Redis string values that begin with the chosen magic prefix (0xFF 0x01…). Those values will be mis-decoded (header stripped / TTL bytes interpreted) or hard-fail decode, which is user-visible data corruption. To make this safe, add stronger framing/validation (e.g., reject/ignore unknown flag bits and treat invalid headers as legacy, or add an integrity check such as a CRC/versioned header) so legacy payload bytes are never reinterpreted as metadata.

Copilot PR review follow-ups: - ttlAt: split off legacyIndexTTLAt and add hasExpired(nonStringOnly) so readValueAt on non-string internal keys skips the redundant GetAt on !redis|str|<key>. - isLeaderKeyExpired: handle decodeRedisStr errors for new-format strings by treating malformed payloads as expired rather than silently alive; extract leaderEmbeddedTTLExpired helper. - readRedisStringAt: when the !redis|str| payload is still in legacy format during rolling upgrade, consult the secondary !redis|ttl| index so INCR/MULTI-EXEC rewrites preserve TTL. Split into decodePrefixedString / readBareLegacyString helpers to keep the function under cyclop/nestif limits. - docs: scope collection-metadata encoding as future work; call out the 0xFF 0x01 magic-prefix ambiguity on legacy payloads and the mitigation (decode-error fallback, migration compactor).

bootjp · 2026-04-17T16:21:40Z

/gemini review

gemini-code-assist

Code Review

This pull request implements embedding TTL timestamps directly within Redis string values to reduce read amplification and improve MVCC snapshot consistency. It removes the asynchronous TTLBuffer mechanism, opting instead to commit TTL updates in the same transaction as the data. Key changes include a new versioned string encoding, updated read/write paths that prioritize embedded TTLs while maintaining backward compatibility with the legacy !redis|ttl| index, and a fix for INCR to correctly preserve existing TTLs. Feedback suggests refining leadership checks to use logical keys, stabilizing expiration timestamps across transaction retries, and ensuring consistent cleanup of legacy TTL indices during increment operations.

gemini-code-assist · 2026-04-17T16:26:11Z

+func (r *RedisServer) leaderAwareGetAt(key []byte, readTS uint64) ([]byte, error) {
+	if r.coordinator.IsLeaderForKey(key) {
+		if err := r.coordinator.VerifyLeaderForKey(key); err != nil {
+			return nil, errors.WithStack(err)
+		}
+		v, err := r.store.GetAt(context.Background(), key, readTS)
+		return v, errors.WithStack(err)
+	}
+	return r.tryLeaderGetAt(key, readTS)
 }


The leadership check in leaderAwareGetAt is performed using the physical key. However, leadership is partitioned by the logical user key. It is safer to extract the user key using extractRedisInternalUserKey before performing leadership checks. Additionally, ensure that leadership status is verified after acquiring any locks that protect the operation to prevent race conditions.

func (r *RedisServer) leaderAwareGetAt(key []byte, readTS uint64) ([]byte, error) { routingKey := key if userKey := extractRedisInternalUserKey(key); userKey != nil { routingKey = userKey } if r.coordinator.IsLeaderForKey(routingKey) { if err := r.coordinator.VerifyLeaderForKey(routingKey); err != nil { return nil, errors.WithStack(err) } v, err := r.store.GetAt(context.Background(), key, readTS) return v, errors.WithStack(err) } return r.tryLeaderGetAt(key, readTS) }

References

To prevent race conditions in read-modify-write operations that require leadership, verify leadership status after acquiring the lock that protects the operation.

gemini-code-assist · 2026-04-17T16:26:11Z

+	if ttl <= 0 {
+		return r.expireDeleteKey(ctx, key, readTS)
+	}
+	expireAt := time.Now().Add(time.Duration(ttl) * unit)


The expireAt timestamp is calculated using time.Now() inside the retryRedisWrite closure. If the transaction retries due to a write conflict, the expiration time will shift forward on each attempt. For consistency and to ensure all operations within the transaction share a consistent reference point, the expiration timestamp should be calculated once before entering the retry loop and passed into the closure.

References

In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.

gemini-code-assist · 2026-04-17T16:26:11Z

+		if existingTTL != nil {
+			elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Put, Key: redisTTLKey(cmd.Args[1]), Value: encodeRedisTTL(*existingTTL)})
+		}


When existingTTL is nil, the INCR command should explicitly emit a Del operation for the redisTTLKey scan index. While a key with no TTL shouldn't have an index entry, emitting an explicit Del (as done in replaceWithStringTxn and saveString) is safer to ensure any stale or legacy index entries are cleared when the key is updated.

if existingTTL != nil { elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Put, Key: redisTTLKey(cmd.Args[1]), Value: encodeRedisTTL(*existingTTL)}) } else { elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Del, Key: redisTTLKey(cmd.Args[1])}) }

…r INCR TTL index Gemini PR review follow-ups: - leaderAwareGetAt now strips the internal prefix before asking the coordinator so leadership is resolved on the logical user key. - setExpire computes expireAt once before the retryRedisWrite loop so OCC retries all write the same deadline instead of shifting it forward on each attempt. Adds the matching overflow guard. - INCR explicitly Dels the redisTTLKey scan index when the key has no TTL, defensively clearing any stale/legacy entry.

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Copilot · 2026-04-17T16:30:53Z

+func (r *RedisServer) doSetExpire(ctx context.Context, key []byte, ttl int64, expireAt time.Time, nxOnly bool) (int, error) {
+	readTS, eligible, err := r.prepareExpire(key, nxOnly)
+	if err != nil {
+		return 0, err
+	}
+	if !eligible {
+		return 0, nil
+	}
+	if ttl <= 0 {
+		return r.expireDeleteKey(ctx, key, readTS)
+	}
+	typ, err := r.rawKeyTypeAt(ctx, key, readTS)


doSetExpire treats every redisTypeString key as a plain Redis string and routes to dispatchStringExpire (read-modify-write on !redis|str|<key>). But rawKeyTypeAt also returns redisTypeString for HyperLogLog keys stored under !redis|hll|<key>, so EXPIRE on an HLL will fail to update TTL (and may incorrectly return 0). Consider distinguishing plain strings vs HLL (e.g., check ExistsAt(redisStrKey(key)) / bare key vs ExistsAt(redisHLLKey(key))), and for HLL keep using the !redis|ttl| index-only path until inline TTL for HLL is implemented.

Copilot · 2026-04-17T16:30:54Z

+	if typ == redisTypeString {
+		return t.markStringDirty(key)
+	}
+	return redisResult{typ: resultInt, integer: 1}, nil
+}
+


applyPositiveExpire calls markStringDirty for any key whose staged type is redisTypeString, which will load/write !redis|str|<key> so the embedded TTL can be updated. However, this codebase also reports HyperLogLog keys (!redis|hll|<key>) as redisTypeString; for those keys, markStringDirty will load a non-existent plain-string anchor and the transaction may end up creating an empty !redis|str|<key> value just to carry TTL. You likely need to only mark dirty for plain string anchors (prefixed/bare string), and for HLL keep TTL in !redis|ttl| (so buildTTLElems emits the TTL mutation).

Suggested change

if typ == redisTypeString {

return t.markStringDirty(key)

}

return redisResult{typ: resultInt, integer: 1}, nil

}

if typ == redisTypeString && isPlainStringExpireTarget(key) {

return t.markStringDirty(key)

}

return redisResult{typ: resultInt, integer: 1}, nil

}

func isPlainStringExpireTarget(key []byte) bool {

if bytes.HasPrefix(key, []byte("!redis|")) {

return bytes.HasPrefix(key, []byte("!redis|str|"))

}

return true

}

Copilot · 2026-04-17T16:30:54Z

+		// String keys encode TTL inside the value in buildKeyElems; skip them here.
+		if _, isString := t.working[string(redisStrKey([]byte(k)))]; isString {
+			continue
+		}
+		if st.value == nil {
+			elems = append(elems, &kv.Elem[kv.OP]{Op: kv.Del, Key: redisTTLKey([]byte(k))})
+		} else {


buildTTLElems skips emitting any !redis|ttl| mutations whenever the key has a !redis|str|<key> entry in t.working. That’s correct for string updates (since buildKeyElems emits the scan-index elem), but it breaks string deletes: buildKeyElems only deletes !redis|str|<key> and does not delete the !redis|ttl|<key> index, and buildTTLElems will currently skip it. This can leave a stale TTL index behind, which may later cause incorrect expiry behavior when the key is recreated as a non-string. Consider emitting a Del !redis|ttl|<key> when deleting a string key (either in buildKeyElems for the delete case, or by making buildTTLElems skip strings only when they are being put/updated, not when deleted).

Copilot · 2026-04-17T16:30:54Z

 				return err
 			}
+			existingTTL = ttl
 			current, err = strconv.ParseInt(string(raw), 10, 64)
 			if err != nil {
 				return fmt.Errorf("ERR value is not an integer or out of range")
 			}
 		}
 		current++


INCR only writes the !redis|ttl|<key> scan-index when existingTTL != nil. If a stale !redis|ttl|<key> entry exists (e.g., from legacy buffered TTL writes or partial cleanup), INCR on a missing key will create a new persistent string but leave the old TTL index in place, which could later cause background TTL sweeping (or any TTL-index-based logic) to expire the new value incorrectly. Consider explicitly deleting !redis|ttl|<key> when existingTTL == nil (or generally mirroring the SET behavior of clearing the index when making a key persistent).

…delete Copilot PR review follow-ups: - rawKeyTypeAt reports HLL as redisTypeString, but HLL payloads live under !redis|hll|<key> and do not carry inline TTL. doSetExpire and applyPositiveExpire now call isPlainRedisString to distinguish plain Redis strings from HLLs; HLLs keep TTL in the !redis|ttl| scan index so EXPIRE/PEXPIRE work correctly on HyperLogLogs. - buildKeyElems now emits Del on redisTTLKey when deleting a string anchor, so stale scan-index entries don't outlive the string and incorrectly expire a future key reincarnation.

Three follow-up issues from Codex on top of round 1. #619 Codex P1 -- assemble only manifest-declared parts. filterChunksForManifest previously matched on uploadID only; the manifest declares specific (partNo, partVersion) tuples, and S3's overwrite-then-async-cleanup window can leave older partVersion chunks present under the same (bucket, generation, object, uploadID). Mixing them produces corrupted bytes. s3ObjectState gains declaredParts map[s3PartKey]struct{}, populated by HandleObjectManifest. filterChunksForManifest takes the set and restricts emission to declared (partNo, partVersion) tuples. Test TestS3_StalePartVersionExcludedFromAssembledBody asserts a stale partVersion=7 cannot leak when the manifest declares partVersion=9. #497 Codex P2 -- preserve dot segments. safeJoinUnderRoot used filepath.Clean which collapses "a/../b" to "b". S3 treats those bytes literally; "a/../b" and "b" are distinct keys that would have silently merged into one output file. Now explicitly rejects any object key whose segments are "." or ".." with ErrS3MalformedKey. NUL bytes also rejected. Test TestS3_DotSegmentObjectKeyRejected covers the four forms ("a/../b", "a/./b", "..", "."). #521 Codex P2 -- cross-generation collision. s3BucketState gains activeGen captured from the bucket-meta record. flushBucket suppresses objects whose generation differs from activeGen (under --include-orphans, those flow to _orphans/; by default they're dropped with an s3_stale_generation_objects warning). Tests TestS3_StaleGenerationObjectExcluded. flushBucket cyclomatic complexity stayed under the cap by extracting flushBucketObjects.

bootjp added 2 commits April 17, 2026 01:52