Skip to content

perf(redis): fast-path HGET / HEXISTS / SISMEMBER wide-column lookups#565

Merged
bootjp merged 8 commits intomainfrom
perf/redis-type-specific-fastpath
Apr 21, 2026
Merged

perf(redis): fast-path HGET / HEXISTS / SISMEMBER wide-column lookups#565
bootjp merged 8 commits intomainfrom
perf/redis-type-specific-fastpath

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 20, 2026

Summary

Extends the GET / EXISTS / SET fast-path work (PRs #560, #562, #563) to three more hot type-specific reads.

Before, every HGET / HEXISTS / SISMEMBER paid the full ~17-seek rawKeyTypeAt probe before even calling loadHashAt / loadSetAt, even though the specific wide-column entry for the requested field / member is addressable in a single seek.

New flow per command

  1. Probe the wide-column entry directly:
    • HGETstore.GetAt(store.HashFieldKey(key, field))
    • HEXISTSstore.ExistsAt(store.HashFieldKey(key, field))
    • SISMEMBERstore.ExistsAt(store.SetMemberKey(key, member))
  2. On hit, apply hasExpiredTTLAt(key) and return the appropriate Redis response. Expired → nil / 0, matching the pre-optimisation keyTypeAt semantics.
  3. On miss, delegate to a per-command slow helper (hgetSlow / hexistsSlow / sismemberSlow) that preserves the exact legacy keyTypeAt + loadHashAt / loadSetAt path. Legacy-blob encodings and nil / WRONGTYPE disambiguation retain their prior behaviour.

Per-command effect on a live wide-column key

Command Before After
HGET ~17 seeks 2–3 seeks
HEXISTS ~17 seeks 2–3 seeks
SISMEMBER ~17 seeks 2–3 seeks

Legacy-blob-encoded hashes/sets go through the slow path (miss on the wide-column key → fallback) so behaviour there is unchanged.

Test plan

  • go test -race ./adapter/... -short passes
  • Existing HGET / HEXISTS / SISMEMBER tests cover: hit / miss / WRONGTYPE / nil-key / TTL-expired / legacy-blob
  • Production: deploy and verify per-command GetAt counters (once Grafana dashboard in obs(redis): add hot-path dashboard and metrics to verify PR #560 #564 lands) show HGET/HEXISTS/SISMEMBER drop into the same latency band as GET

Summary by CodeRabbit

  • Refactor

    • Improved performance and responsiveness for HGET, HEXISTS, and SISMEMBER commands via faster lookup paths and more efficient handling of type/TTL conditions.
  • Tests

    • Added extensive end-to-end tests covering fast-path hits, TTL-expired cases, misses, and type-mismatch behavior to ensure correctness and stability.

Extends the GET / EXISTS / SET fast-path work (PRs #560, #562, #563)
to the three most common type-specific read commands. Before, every
HGET / HEXISTS / SISMEMBER paid the full ~17-seek rawKeyTypeAt probe
before falling through to loadHashAt / loadSetAt, even though the
specific wide-column entry for the requested field/member is
addressable in a single seek.

Each handler now:

1. Probes the wide-column entry directly:
   - HGET      store.GetAt(store.HashFieldKey(key, field))
   - HEXISTS   store.ExistsAt(store.HashFieldKey(key, field))
   - SISMEMBER store.ExistsAt(store.SetMemberKey(key, member))
2. On hit, applies hasExpiredTTLAt(key) and returns the appropriate
   Redis protocol response. Expired -> nil / 0, matching the
   pre-optimisation keyTypeAt semantics.
3. On miss, delegates to a per-command slow helper (hgetSlow /
   hexistsSlow / sismemberSlow) that preserves the exact legacy
   keyTypeAt + loadHashAt / loadSetAt path, so legacy-blob encodings
   and nil / WRONGTYPE disambiguation retain their prior behaviour.

Effect per hit on a live wide-column key: ~17 seeks → 2-3 seeks
(1 field/member lookup + 1-2 for the TTL probe).

Split into fast/slow helpers per command so each handler body stays
under the cyclomatic-complexity cap.
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding fast-path optimizations for HGET, HEXISTS, and SISMEMBER wide-column lookups, which is the primary focus of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/redis-type-specific-fastpath

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

HGET / HEXISTS / SISMEMBER are type-specific: they only succeed on
hash / set keys. Call hasExpired(..., nonStringOnly=true) directly
instead of hasExpiredTTLAt, which defaults to probing !redis|str|
for an embedded TTL that those keys never carry. One guaranteed-miss
pebble GetAt saved per fast-path hit.

Also prepares #565 for #562 dropping the unused hasExpiredTTLAt
wrapper.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the Redis adapter’s existing “avoid keyTypeAt / rawKeyTypeAt probe storm” work by adding fast paths for three hot, type-specific reads (HGET, HEXISTS, SISMEMBER) when the data is stored in wide-column format.

Changes:

  • Add direct wide-column probes for HGET (GetAt(HashFieldKey)), HEXISTS (ExistsAt(HashFieldKey)), and SISMEMBER (ExistsAt(SetMemberKey)), with TTL filtering.
  • Introduce per-command slow-path helpers (hgetSlow, hexistsSlow, sismemberSlow) to preserve legacy behavior (nil vs WRONGTYPE, legacy-blob formats) when the fast probe misses.
  • Refactor the handlers to reuse a shared ctx, key, and field/member variables.

Comment on lines +1670 to +1682
func (r *RedisServer) hashFieldFastLookup(ctx context.Context, key, field []byte, readTS uint64) (raw []byte, hit, alive bool, err error) {
raw, err = r.store.GetAt(ctx, store.HashFieldKey(key, field), readTS)
if err != nil {
if errors.Is(err, store.ErrKeyNotFound) {
return nil, false, false, nil
}
return nil, false, false, cockerrors.WithStack(err)
}
expired, expErr := r.hasExpired(ctx, key, readTS, true)
if expErr != nil {
return nil, false, false, cockerrors.WithStack(expErr)
}
return raw, true, !expired, nil
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashFieldFastLookup returns a hash field value solely based on the wide-column field key existing. In the (documented) dual-encoding corruption scenario where the same user key has both a string/list encoding and leftover wide-column hash keys, rawKeyTypeAt/keyTypeAt would treat the key as string/list (string-first tiebreaker) and HGET should return WRONGTYPE (or nil if the winning type is TTL-expired). Consider adding a cheap guard before treating this as a hit (e.g., check for string/list encodings first, or fall back to the slow path when any higher-priority encoding is present) so HGET stays consistent with keyTypeAt semantics.

Copilot uses AI. Check for mistakes.
Comment thread adapter/redis_compat_commands.go Outdated
Comment on lines +1661 to +1662
conn.WriteBulkString(string(raw))
return
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fast path writes the value via WriteBulkString(string(raw)), which forces a []byte→string conversion/allocation. Since the value is already []byte from MVCC, prefer writing it as bulk bytes to avoid the extra allocation/copy on this hot path.

Copilot uses AI. Check for mistakes.
Comment on lines +1672 to +1675
if err != nil {
if errors.Is(err, store.ErrKeyNotFound) {
return nil, false, false, nil
}
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function uses errors.Is(err, store.ErrKeyNotFound) while most other store.ErrKeyNotFound checks in this file use cockerrors.Is. Standardizing on one (typically cockerrors.Is in this codebase) avoids subtle mismatches if error wrapping differs across layers.

Copilot uses AI. Check for mistakes.
Comment on lines +1237 to +1249
func (r *RedisServer) setMemberFastExists(ctx context.Context, key, member []byte, readTS uint64) (hit, alive bool, err error) {
exists, err := r.store.ExistsAt(ctx, store.SetMemberKey(key, member), readTS)
if err != nil {
return false, false, cockerrors.WithStack(err)
}
if !exists {
return false, false, nil
}
expired, expErr := r.hasExpired(ctx, key, readTS, true)
if expErr != nil {
return false, false, cockerrors.WithStack(expErr)
}
return true, !expired, nil
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setMemberFastExists treats any existing wide-column member row as an authoritative SISMEMBER hit. In the dual-encoding corruption scenario (string wins per rawKeyTypeAt’s documented tiebreaker), leftover set member rows alongside a live string key would cause SISMEMBER to return 1/0 instead of WRONGTYPE (or 0 if the winning string type is TTL-expired). Add a cheap guard/fallback (e.g., check for higher-priority string/list encodings before honoring the member-row hit) to keep behavior consistent with keyTypeAt.

Copilot uses AI. Check for mistakes.
Comment on lines +1933 to +1946
func (r *RedisServer) hashFieldFastExists(ctx context.Context, key, field []byte, readTS uint64) (hit, alive bool, err error) {
exists, err := r.store.ExistsAt(ctx, store.HashFieldKey(key, field), readTS)
if err != nil {
return false, false, cockerrors.WithStack(err)
}
if !exists {
return false, false, nil
}
expired, expErr := r.hasExpired(ctx, key, readTS, true)
if expErr != nil {
return false, false, cockerrors.WithStack(expErr)
}
return true, !expired, nil
}
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashFieldFastExists can return 1 based solely on the wide-column field row existing, even if the logical key would be classified as string/list under rawKeyTypeAt’s string-first tiebreaker (corrupt dual-encoding). That would change HEXISTS from WRONGTYPE to 1/0 in those cases, and it also bypasses embedded-TTL checks for strings. Consider guarding against higher-priority encodings (or falling back to the slow path) before treating the field-row as an authoritative hit.

Copilot uses AI. Check for mistakes.
bootjp added a commit that referenced this pull request Apr 20, 2026
…ursts

Under production congestion the etcd engine's inbound stepCh and
outbound heartbeat priority queue were both overflowing:
- leader log: dispatchDropCount=1.6M+ on MsgHeartbeat (priority
  queue cap=64 filled because the transport drained slower than
  heartbeat tick issuance under CPU starvation)
- follower log: RPC returned "etcd raft inbound step queue is full"
  (stepCh cap=256 filled because the follower's event loop was
  held up by adapter-side pebble seek storms)

The pebble-seek CPU work that drove the starvation is gone in #560
through #565. This change is defence-in-depth for bursts:

- defaultMaxInflightMsg: 256 → 1024 (sizes stepCh,
  dispatchReportCh, and the per-peer outbound "normal" queue). 4x
  safety margin; worst-case memory stays under a few MB per engine
  at defaultMaxSizePerMsg = 1 MiB.
- defaultHeartbeatBufPerPeer: 64 → 512 (outbound priority queue).
  Heartbeats are tiny (<100 B), so 512 × numPeers is ≪ 1 MB total;
  a ~5 s transient pause (election-timeout scale) no longer drops
  heartbeats and forces peers' leases to expire.

Both existing raftengine test suites (etcd + hashicorp) pass under
-race with no code changes required elsewhere.
bootjp added a commit that referenced this pull request Apr 20, 2026
…ursts (#566)

## Summary
Defence-in-depth bump for the Raft engine's step queue / heartbeat queue
capacities after the production step-queue-saturation incident.

### Production evidence
- **Leader logs** (192.168.0.210):
  ```
dropping etcd raft outbound message type=MsgHeartbeat
drop_count=1,600,000+
  ```
The priority queue (capacity 64) filled because the transport drained
slower than heartbeat tick issuance under CPU starvation.
- **Follower logs** (192.168.0.211):
  ```
etcd raft outbound dispatch failed err="etcd raft inbound step queue is
full"
  ```
Peer stepCh (capacity 256) filled because the event loop was held up by
adapter-side pebble seek storms.

### What changed
- `defaultMaxInflightMsg`: **256 → 1024** (sizes stepCh,
dispatchReportCh, and the per-peer outbound "normal" dispatch queue). 4×
safety margin; worst-case memory stays under a few MB per engine at
`defaultMaxSizePerMsg = 1 MiB`.
- `defaultHeartbeatBufPerPeer`: **64 → 512** (outbound priority queue).
Heartbeats are ~100 B, so `512 × numPeers` is ≪ 1 MB; a ~5 s transient
pause (election-timeout scale) no longer drops heartbeats and forces
peer leases to expire.

### Why this is only "defence in depth"
The primary fix for the starvation is #560 / #562 / #563 / #565, which
removed the adapter-side pebble seek storm. This PR widens the safety
margin against future bursts but does not attempt to paper over any
remaining throughput issue.

## Test plan
- [x] `go test -race ./internal/raftengine/... -short` passes
- [x] golangci-lint clean
- [ ] Production: after deploy, verify
`elastickv_raft_dispatch_dropped_total` (from PR #564) stays at zero
across a tick cycle and `elastickv_raft_step_queue_full_total` does not
increment under load
Addresses copilot review on PR #565.

rawKeyTypeAt (since #563) probes string prefixes first -- in a
dual-encoding corruption where the same user key carries both a
string row and a wide-column collection row, rawKeyTypeAt returns
String, and keyTypeAt-based HGET / HEXISTS / SISMEMBER return
WRONGTYPE (or nil if the string is TTL-expired).

The new fast paths treated the presence of the wide-column field /
member row as authoritative. That flipped the tiebreaker under
corruption (hash/set value returned instead of WRONGTYPE).

Add a cheap hasHigherPriorityStringEncoding() guard (single
ExistsAt on redisStrKey) in all three fast-path helpers. Hits the
slow path when a string encoding exists, preserving keyTypeAt
semantics. Non-corrupt collection data still has at most one
encoding per user key, so the guard is a guaranteed miss and only
adds 1 seek to the fast path.

Also address the two remaining copilot items on this PR:
- Replace stdlib errors.Is(..., store.ErrKeyNotFound) with
  cockerrors.Is for consistency with the surrounding file.
- Replace WriteBulkString(string(raw)) with WriteBulk(raw) to avoid
  the []byte → string copy on every HGET fast-path hit.

Rarer legacy corruption mixing the collection with HLL / bare /
list encodings will still let the fast path fire; those edge cases
are documented in hashFieldFastLookup's comment.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread adapter/redis_compat_commands.go Outdated
Comment on lines +1677 to +1689
// reports whether it is present and TTL-alive. Returns hit=false when
// the wide-column key is absent OR when a higher-priority encoding
// (string / list) exists on the same user key, so the caller falls
// through to hgetSlow and ends up with the keyTypeAt-consistent
// WRONGTYPE / nil response.
//
// The priority guard costs a single ExistsAt on redisStrKey, covering
// the most common dual-encoding corruption (SET on a key that
// previously held a hash, where cleanup did not run). Rarer legacy
// corruption mixing hash with HLL / bare-key / list encodings will
// still let this fast-path hit; those workloads should take the slow
// path too, but in normal operation at most one encoding exists per
// user key and the guard is a guaranteed miss.
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fast-path doc says it returns hit=false when a higher-priority encoding (string/list) exists, but the implementation only guards on redisStrKey(key) (via hasHigherPriorityStringEncoding). That means dual-encoded keys involving list meta/delta keys, legacy bare string keys, or HLL keys can still return a hash/set fast-path answer instead of the keyTypeAt-consistent WRONGTYPE/nil behavior. Consider expanding the guard to match rawKeyTypeAt’s actual priority checks (string types + list), or update the doc to reflect the narrower guarantee.

Suggested change
// reports whether it is present and TTL-alive. Returns hit=false when
// the wide-column key is absent OR when a higher-priority encoding
// (string / list) exists on the same user key, so the caller falls
// through to hgetSlow and ends up with the keyTypeAt-consistent
// WRONGTYPE / nil response.
//
// The priority guard costs a single ExistsAt on redisStrKey, covering
// the most common dual-encoding corruption (SET on a key that
// previously held a hash, where cleanup did not run). Rarer legacy
// corruption mixing hash with HLL / bare-key / list encodings will
// still let this fast-path hit; those workloads should take the slow
// path too, but in normal operation at most one encoding exists per
// user key and the guard is a guaranteed miss.
// reports whether it is present and TTL-alive. It returns hit=false
// when the wide-column key is absent, or when the string-encoding
// guard in hasHigherPriorityStringEncoding fires for the same user
// key so the caller falls through to hgetSlow.
//
// This fast path does not fully mirror rawKeyTypeAt/keyTypeAt
// priority checks. In particular, it only performs the redisStrKey
// existence probe described below; other mixed-encoding corruption
// cases (for example list / HLL / bare-key remnants) are not filtered
// here and may still hit the wide-column fast path instead of taking
// the slow path that produces keyTypeAt-consistent WRONGTYPE / nil
// behavior.
//
// The priority guard costs a single ExistsAt on redisStrKey, covering
// the most common dual-encoding corruption (SET on a key that
// previously held a hash, where cleanup did not run). In normal
// operation at most one encoding exists per user key and the guard is
// a guaranteed miss.

Copilot uses AI. Check for mistakes.
Comment thread adapter/redis_compat_commands.go Outdated
Comment on lines +1710 to +1721
// hasHigherPriorityStringEncoding returns true iff a string-encoded
// entry exists for key. Matches the "string wins" tiebreaker in
// rawKeyTypeAt (see adapter/redis_compat_helpers.go), so a corrupt
// dual-encoded key where both a string and a collection row are
// present will take the slow path and return WRONGTYPE / nil from
// keyTypeAt rather than the collection-specific fast-path answer.
func (r *RedisServer) hasHigherPriorityStringEncoding(ctx context.Context, key []byte, readTS uint64) (bool, error) {
exists, err := r.store.ExistsAt(ctx, redisStrKey(key), readTS)
if err != nil {
return false, cockerrors.WithStack(err)
}
return exists, nil
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasHigherPriorityStringEncoding currently only checks existence of !redis|str|<key>, but rawKeyTypeAt’s string priority includes !redis|hll|<key> and the legacy bare <key> as well (and list types are probed before wide-column collections). With dual-encoded keys, HGET/HEXISTS/SISMEMBER can return a collection-specific fast-path result where rawKeyTypeAt/keyTypeAt would classify the key as string/list and return WRONGTYPE/nil. To keep semantics aligned, the guard should mirror rawKeyTypeAt’s priority probes (or be renamed/docs adjusted if the behavior change is intentional).

Suggested change
// hasHigherPriorityStringEncoding returns true iff a string-encoded
// entry exists for key. Matches the "string wins" tiebreaker in
// rawKeyTypeAt (see adapter/redis_compat_helpers.go), so a corrupt
// dual-encoded key where both a string and a collection row are
// present will take the slow path and return WRONGTYPE / nil from
// keyTypeAt rather than the collection-specific fast-path answer.
func (r *RedisServer) hasHigherPriorityStringEncoding(ctx context.Context, key []byte, readTS uint64) (bool, error) {
exists, err := r.store.ExistsAt(ctx, redisStrKey(key), readTS)
if err != nil {
return false, cockerrors.WithStack(err)
}
return exists, nil
// hasHigherPriorityStringEncoding returns true iff keyTypeAt would
// classify the key as a higher-priority string-like encoding before a
// wide-column collection fast-path should answer. This mirrors the
// effective type-priority behavior of rawKeyTypeAt/keyTypeAt for
// explicit string keys, HLL-backed strings, legacy bare-key strings,
// and list encodings that outrank wide-column collection rows.
func (r *RedisServer) hasHigherPriorityStringEncoding(ctx context.Context, key []byte, readTS uint64) (bool, error) {
typ, err := r.keyTypeAt(ctx, key, readTS)
if err != nil {
return false, cockerrors.WithStack(err)
}
return typ == redisTypeString || typ == redisTypeList, nil

Copilot uses AI. Check for mistakes.
Addresses copilot review on PR #565.

The previous hashFieldFastLookup doc claimed the guard fires on
"string / list" higher-priority encodings, but the implementation
only probes redisStrKey. Expanding the guard to fully mirror
rawKeyTypeAt would add ~3-5 extra seeks per fast-path hit (string
variants + list meta + list delta scan), negating most of the gain
versus the ~17-seek keyTypeAt slow path.

Narrow the doc to describe the actual scope: only the common
SET-over-previous-hash corruption case is caught; HLL / bare /
list dual-encoding remnants still take the fast path. Call out the
trade-off explicitly so future readers don't assume stricter
alignment than we provide, and document that in normal operation at
most one encoding exists per user key so the gap is invisible.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines 1949 to +1964
func (r *RedisServer) hexists(conn redcon.Conn, cmd redcon.Command) {
if r.proxyToLeader(conn, cmd, cmd.Args[1]) {
return
}
key := cmd.Args[1]
field := cmd.Args[2]
readTS := r.readTS()
typ, err := r.keyTypeAt(context.Background(), cmd.Args[1], readTS)
ctx := context.Background()

// Fast path: direct wide-column field existence check. ExistsAt
// is cheaper than GetAt since we don't need the value payload.
hit, alive, err := r.hashFieldFastExists(ctx, key, field, readTS)
if err != nil {
conn.WriteError(err.Error())
return
}
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HEXISTS now has a wide-column fast path (hashFieldFastExists) with TTL filtering and a slow-path fallback. I couldn’t find any Go tests that directly exercise HEXISTS (including fast-path hit/miss, TTL-expired -> 0, nil key -> 0, and WRONGTYPE). Please add coverage for these cases to ensure the new branching preserves legacy behavior.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +1216 to +1224
key := cmd.Args[1]
member := cmd.Args[2]
readTS := r.readTS()
typ, err := r.keyTypeAt(context.Background(), cmd.Args[1], readTS)
ctx := context.Background()

hit, alive, err := r.setMemberFastExists(ctx, key, member, readTS)
if err != nil {
conn.WriteError(err.Error())
return
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SISMEMBER now has a fast path (setMemberFastExists) with new TTL/priority-guard behavior and a slow-path fallback. I couldn’t find targeted tests exercising the fast-path hit/miss and TTL-expired cases for SISMEMBER (current usage in tests appears limited to a basic happy-path). Please add/extend tests to lock in the intended semantics for wide-column hits, misses (fallback), WRONGTYPE, and TTL-expired keys.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +1653 to +1669
// Fast path: look the wide-column field up directly. Live
// wide-column hashes resolve here in 1 seek + TTL probe versus
// the ~17 seeks rawKeyTypeAt issues through keyTypeAt. Legacy-
// blob hashes miss the wide-column key and fall through.
raw, hit, alive, err := r.hashFieldFastLookup(ctx, key, field, readTS)
if err != nil {
conn.WriteError(err.Error())
return
}
if hit {
if !alive {
conn.WriteNull()
return
}
// WriteBulk sends the payload directly from the []byte backing
// store; WriteBulkString(string(raw)) would force a []byte →
// string copy on every fast-path hit.
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HGET now has a new wide-column fast path (hashFieldFastLookup) with TTL filtering and a corruption-priority guard before falling back to hgetSlow. I couldn’t find tests that specifically cover the fast-path behavior (wide-column hit, TTL-expired -> nil, miss -> slow path, and WRONGTYPE handling). Please add/extend tests to exercise these scenarios so the optimization can’t accidentally change Redis semantics.

Copilot generated this review using guidance from repository custom instructions.
Addresses copilot review on PR #565.

Pin the semantics of the new wide-column fast paths at the Redis
wire-protocol level. Each of HGET / HEXISTS / SISMEMBER gets five
cases driven by a real redis.Client against createNode(t, 3):

- FastPathHit: HSET/SAdd then read -> value / 1
- Miss: read on a nonexistent user key -> nil / 0
- UnknownField (or UnknownMember): collection exists but the
  requested field / member does not -> nil / 0 (exercises the miss
  branch of the fast-path lookup)
- WRONGTYPE: user key exists as a string -> WRONGTYPE error from
  the slow-path fallback
- TTLExpired: HSET + PEXPIRE with a short TTL, sleep, then read
  -> nil / 0 (exercises the !alive branch of the fast-path lookup)

Additionally, HGET_FastPathGuardDualEncodingSmoke drives the
string-priority guard (hasHigherPriorityStringEncoding) at the wire
level: HSET then SET on the same key must never surface the hash
field value from the fast path -- either WRONGTYPE or nil are
acceptable depending on cleanup timing, but the hash value leaking
through the string-priority tiebreaker is not.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment on lines +31 to +39
const collectionFastPathTTL = 80 * time.Millisecond

// waitForTTLExpiry sleeps past a short TTL so the next read sees the
// key as expired. A small headroom accounts for wall-clock jitter on
// CI; tests that are sensitive to this use short TTLs (~80 ms) so the
// delay stays well under a second.
func waitForTTLExpiry() {
time.Sleep(collectionFastPathTTL + 50*time.Millisecond)
}
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waitForTTLExpiry uses a fixed time.Sleep to wait out expiry. This tends to be flaky under CI scheduling jitter and clock granularity; other TTL tests in this repo use require.Eventually polling for redis.Nil/false instead. Consider replacing the fixed sleep with an Eventually loop (with a short poll interval and ~1s timeout) so the test only waits as long as needed and remains stable on slow machines.

Copilot uses AI. Check for mistakes.
Comment on lines +281 to +285
// readTS for this HGET. The test tolerates either "WRONGTYPE" or the
// old hash value being returned, since the exact race window is tiny,
// but it must NOT panic or silently leak wide-column data as a string
// hit: the guard is what keeps that outcome consistent with
// keyTypeAt semantics.
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above this test says it "tolerates either "WRONGTYPE" or the old hash value being returned", but the assertion below explicitly requires that "hashVal" is not returned. Please update the comment to match the actual expected outcomes (e.g., WRONGTYPE or nil/cleanup), otherwise it’s misleading when debugging failures.

Suggested change
// readTS for this HGET. The test tolerates either "WRONGTYPE" or the
// old hash value being returned, since the exact race window is tiny,
// but it must NOT panic or silently leak wide-column data as a string
// hit: the guard is what keeps that outcome consistent with
// keyTypeAt semantics.
// readTS for this HGET. The test tolerates either "WRONGTYPE" or a
// nil/cleanup result if the hash keys are already gone, but it must
// NOT return the old hash value, panic, or silently leak wide-column
// data as a string hit: the guard is what keeps that outcome
// consistent with keyTypeAt semantics.

Copilot uses AI. Check for mistakes.
Comment thread adapter/redis_compat_commands.go Outdated
Comment on lines +1714 to +1725
// hasHigherPriorityStringEncoding returns true iff a string-encoded
// entry exists for key. Matches the "string wins" tiebreaker in
// rawKeyTypeAt (see adapter/redis_compat_helpers.go), so a corrupt
// dual-encoded key where both a string and a collection row are
// present will take the slow path and return WRONGTYPE / nil from
// keyTypeAt rather than the collection-specific fast-path answer.
func (r *RedisServer) hasHigherPriorityStringEncoding(ctx context.Context, key []byte, readTS uint64) (bool, error) {
exists, err := r.store.ExistsAt(ctx, redisStrKey(key), readTS)
if err != nil {
return false, cockerrors.WithStack(err)
}
return exists, nil
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasHigherPriorityStringEncoding’s docstring says it returns true iff "a string-encoded entry exists for key" and that it matches the rawKeyTypeAt "string wins" tiebreaker, but the implementation only checks redisStrKey(key) (it does not check HLL or legacy bare string keys, which probeStringTypes treats as string too). Please tighten the comment (and/or rename the helper) so it accurately reflects the narrower guard it actually provides.

Copilot uses AI. Check for mistakes.
Addresses the Medium-severity miss-path / legacy-blob regression
flagged in the independent review of PR #565.

The priority guard (ExistsAt on redisStrKey) was run BEFORE the
wide-column probe, so every miss (nonexistent user key, legacy-blob
hash / set encoding, wrong-type key) paid the guard seek for no
benefit. For miss-heavy workloads and legacy-blob data the PR could
be net-slower than the pre-PR keyTypeAt path.

Reorder: probe the wide-column field / member key first. On miss,
return and let the caller take the slow path (1 seek total added,
down from 2). Only run the string-priority guard when we have a
real hit to defer to the slow path -- preserving the
dual-encoding-corruption consistency the original PR aimed for
while eliminating the negative-case tax.

Per-command effect on non-hit traffic:

  HGET / HEXISTS / SISMEMBER miss   now: 1 seek  (was: 2 seeks)
  HGET / HEXISTS / SISMEMBER hit    now: 3 seeks (was: 3 seeks)

All existing adapter tests pass unchanged.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
adapter/redis_collection_fastpath_test.go (1)

41-270: Consider consolidating the fast-path tests into a table-driven suite.

The fourteen HGET/HEXISTS/SISMEMBER cases share ~6 lines of identical setup (t.Parallel(), createNode, shutdown, NewClient, defer Close) and then differ only in the pre-state and the assertion. A single table-driven test per command would remove the ~80 lines of duplication and make it easier to add new cases (e.g. wrong-type variants for legacy-blob hashes).

As per coding guidelines: "Unit tests should be co-located with packages in *_test.go files; prefer table-driven test cases."

♻️ Sketch of a table-driven shape for the HGET group
func TestRedis_HGET_FastPath(t *testing.T) {
    t.Parallel()
    nodes, _, _ := createNode(t, 3)
    defer shutdown(nodes)
    rdb := redis.NewClient(&redis.Options{Addr: nodes[0].redisAddress})
    defer func() { _ = rdb.Close() }()
    ctx := context.Background()

    cases := []struct {
        name      string
        setup     func(t *testing.T)
        key       string
        field     string
        wantErr   string // "", "WRONGTYPE", "redis.Nil"
        wantValue string
    }{
        {name: "hit", setup: func(t *testing.T) {
            require.NoError(t, rdb.HSet(ctx, "h:fast", "field", "value").Err())
        }, key: "h:fast", field: "field", wantValue: "value"},
        {name: "miss_key", key: "h:missing", field: "field", wantErr: "redis.Nil"},
        // ...
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            if tc.setup != nil { tc.setup(t) }
            got, err := rdb.HGet(ctx, tc.key, tc.field).Result()
            switch tc.wantErr {
            case "":
                require.NoError(t, err)
                require.Equal(t, tc.wantValue, got)
            case "redis.Nil":
                require.ErrorIs(t, err, redis.Nil)
            default:
                require.Error(t, err)
                require.Contains(t, err.Error(), tc.wantErr)
            }
        })
    }
}

Subtests that need TTL setup can be kept as separate cases using a helper that PExpires + waits.

As per coding guidelines: **/*_test.go: "prefer table-driven test cases".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/redis_collection_fastpath_test.go` around lines 41 - 270, Consolidate
the duplicated tests (TestRedis_HGET_FastPathHit, TestRedis_HGET_Miss,
TestRedis_HGET_UnknownField, TestRedis_HGET_WRONGTYPE, TestRedis_HGET_TTLExpired
and the analogous HEXISTS and SISMEMBER tests) into three table-driven tests
(e.g. TestRedis_HGET_FastPath, TestRedis_HEXISTS_FastPath,
TestRedis_SISMEMBER_FastPath): move the common setup (t.Parallel(), createNode,
shutdown, redis.NewClient, defer Close, ctx) to the top of each table test,
define a cases slice with fields like name, setup func, key, field/member,
wantErr/wantValue, and loop with t.Run for subtests, invoking per-case setup to
create pre-state (including PExpire + waitForTTLExpiry for TTL cases) and then
assert according to wantErr/wantValue (handle redis.Nil via require.ErrorIs).
Ensure you reference the existing helpers createNode, shutdown, waitForTTLExpiry
and the redis client usage so the refactor replaces the duplicated blocks with
the table-driven loops.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/redis_collection_fastpath_test.go`:
- Around line 286-307: The smoke test
TestRedis_HGET_FastPathGuardDualEncodingSmoke doesn't reliably exercise the
guard because replaceWithStringTxn (the SET path) may atomically clean up hash
rows, so HGET observes nil/WRONGTYPE for unrelated reasons; either
rename/document the test as a "no-panic/no-leak regression" instead of a
guard-fires assertion, or add a deterministic unit test that bypasses SET's
cleanup by writing both HashFieldKey(...) and redisStrKey(...) directly through
the store interface (or lower-level helper) so hasHigherPriorityStringEncoding
and the guard path are reached predictably; update test name/comments or add the
new test harness exercising the guard explicitly.

---

Nitpick comments:
In `@adapter/redis_collection_fastpath_test.go`:
- Around line 41-270: Consolidate the duplicated tests
(TestRedis_HGET_FastPathHit, TestRedis_HGET_Miss, TestRedis_HGET_UnknownField,
TestRedis_HGET_WRONGTYPE, TestRedis_HGET_TTLExpired and the analogous HEXISTS
and SISMEMBER tests) into three table-driven tests (e.g.
TestRedis_HGET_FastPath, TestRedis_HEXISTS_FastPath,
TestRedis_SISMEMBER_FastPath): move the common setup (t.Parallel(), createNode,
shutdown, redis.NewClient, defer Close, ctx) to the top of each table test,
define a cases slice with fields like name, setup func, key, field/member,
wantErr/wantValue, and loop with t.Run for subtests, invoking per-case setup to
create pre-state (including PExpire + waitForTTLExpiry for TTL cases) and then
assert according to wantErr/wantValue (handle redis.Nil via require.ErrorIs).
Ensure you reference the existing helpers createNode, shutdown, waitForTTLExpiry
and the redis client usage so the refactor replaces the duplicated blocks with
the table-driven loops.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1fe881c4-e583-432d-b32c-bad81862611e

📥 Commits

Reviewing files that changed from the base of the PR and between 21425cf and 80df00c.

📒 Files selected for processing (2)
  • adapter/redis_collection_fastpath_test.go
  • adapter/redis_compat_commands.go

Comment on lines +286 to +307
func TestRedis_HGET_FastPathGuardDualEncodingSmoke(t *testing.T) {
t.Parallel()
nodes, _, _ := createNode(t, 3)
defer shutdown(nodes)

ctx := context.Background()
rdb := redis.NewClient(&redis.Options{Addr: nodes[0].redisAddress})
defer func() { _ = rdb.Close() }()

require.NoError(t, rdb.HSet(ctx, "dual:key", "field", "hashVal").Err())
require.NoError(t, rdb.Set(ctx, "dual:key", "stringVal", 0).Err())

// After the SET, HGET must either see WRONGTYPE (string-priority
// guard fires, slow path reports the key as string) or nil (the
// hash keys were cleaned up). It must NOT return "hashVal".
got, err := rdb.HGet(ctx, "dual:key", "field").Result()
if err != nil {
require.Contains(t, err.Error(), "WRONGTYPE")
return
}
require.NotEqual(t, "hashVal", got, "fast path must not leak collection data when a string has taken over the key")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Smoke test likely does not exercise the guard in practice.

If replaceWithStringTxn (the SET path's cleanup for pre-existing non-string encodings) removes the hash wide-column rows atomically with the SET commit, then any HGET at readTS >= SET.commitTS will never observe the dual-encoded state — the guard's ExistsAt on redisStrKey simply fires against an already-cleaned hash, and the test falls into the "nil or WRONGTYPE" branch for reasons unrelated to hasHigherPriorityStringEncoding.

As the comment at lines 281-285 acknowledges, the race window is tiny and this is a regression-safety smoke test rather than a direct guard assertion. Consider either:

  1. Renaming/documenting it as a "no-panic, no-leak" regression guard rather than a guard-fires test, or
  2. Adding a direct unit test that writes both HashFieldKey(...) and redisStrKey(...) through the store interface (bypassing SET's cleanup) so the guard path is deterministically reached.

Either way, hasHigherPriorityStringEncoding remains lightly covered — worth a follow-up if the dual-encoding corruption scenario is considered important to regression-test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/redis_collection_fastpath_test.go` around lines 286 - 307, The smoke
test TestRedis_HGET_FastPathGuardDualEncodingSmoke doesn't reliably exercise the
guard because replaceWithStringTxn (the SET path) may atomically clean up hash
rows, so HGET observes nil/WRONGTYPE for unrelated reasons; either
rename/document the test as a "no-panic/no-leak regression" instead of a
guard-fires assertion, or add a deterministic unit test that bypasses SET's
cleanup by writing both HashFieldKey(...) and redisStrKey(...) directly through
the store interface (or lower-level helper) so hasHigherPriorityStringEncoding
and the guard path are reached predictably; update test name/comments or add the
new test harness exercising the guard explicitly.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread adapter/redis_compat_commands.go Outdated
Comment on lines +1723 to +1734
// hasHigherPriorityStringEncoding returns true iff a string-encoded
// entry exists for key. Matches the "string wins" tiebreaker in
// rawKeyTypeAt (see adapter/redis_compat_helpers.go), so a corrupt
// dual-encoded key where both a string and a collection row are
// present will take the slow path and return WRONGTYPE / nil from
// keyTypeAt rather than the collection-specific fast-path answer.
func (r *RedisServer) hasHigherPriorityStringEncoding(ctx context.Context, key []byte, readTS uint64) (bool, error) {
exists, err := r.store.ExistsAt(ctx, redisStrKey(key), readTS)
if err != nil {
return false, cockerrors.WithStack(err)
}
return exists, nil
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for hasHigherPriorityStringEncoding says it “matches the 'string wins' tiebreaker in rawKeyTypeAt”, but the implementation only checks redisStrKey(key) and does not include the other string encodings rawKeyTypeAt considers (redisHLLKey(key) and the legacy bare key). This is misleading for future maintainers and for callers relying on the comment. Either expand the check to match probeStringTypes’ candidate set, or update the comment/name to explicitly say it only guards against the new-format string encoding.

Copilot uses AI. Check for mistakes.
Addresses copilot follow-up on PR #565.

The previous docstring claimed the guard "matches the 'string wins'
tiebreaker in rawKeyTypeAt", but rawKeyTypeAt's full tiebreaker also
covers redisHLLKey and the legacy bare key, while this guard only
probes redisStrKey. The wording was misleading for future
maintainers and for callers relying on the comment.

Rewrite to be explicit:
- name the narrow scope (new-format string encoding only)
- call out which dual-encoding cases still slip through (HLL, legacy
  bare, list encodings that outrank wide-column collections)
- justify the narrow scope: a complete mirror would add ~4-5 extra
  seeks per fast-path hit, undoing the miss-path reordering win from
  the previous follow-up commit
- point callers that need strict alignment at keyTypeAt
@bootjp bootjp enabled auto-merge April 21, 2026 06:31
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
adapter/redis_compat_commands.go (1)

1234-1234: Optional: wide-column misses still do a full-collection load.

When the fast-path ExistsAt/GetAt on the specific field/member misses but the key is a live wide-column hash/set (e.g. HGET k nonexistent_field on a 10k-field hash, or SISMEMBER k nonmember on a large set), the slow path runs keyTypeAtloadHashAt/loadSetAt, which loads the entire collection just to return nil/0. That is pre-existing behavior — not a regression from this PR — but it means the fast-path win is concentrated on hits only.

A cheap follow-up would be: after a fast-path miss, probe store.HashMetaKey(key) / store.SetMetaKey(key) — if present, the key is a live wide-column collection and you can short-circuit to WriteNull/WriteInt(0) without loadHashAt/loadSetAt. This keeps negative lookups at O(1) seeks for wide-column encodings and only falls through to the full slow path for legacy-blob / WRONGTYPE disambiguation. Defer if out of scope for this PR.

Also applies to: 1674-1674, 1991-1991

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/redis_compat_commands.go` at line 1234, After a fast-path miss in the
ExistsAt/GetAt flow (and in callers like sismemberSlow which currently fall
through to keyTypeAt → loadHashAt/loadSetAt), probe the wide-column meta keys
first: check store.HashMetaKey(key) for hash ops or store.SetMetaKey(key) for
set ops; if the meta key exists treat the key as a live wide-column collection
and short-circuit the slow path by emitting WriteNull (for missing hash field)
or WriteInt(0) (for SISMEMBER negative) instead of calling loadHashAt/loadSetAt;
keep falling through to keyTypeAt/load* only when the meta key is absent so
legacy blob/WRONGTYPE handling still occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@adapter/redis_compat_commands.go`:
- Line 1234: After a fast-path miss in the ExistsAt/GetAt flow (and in callers
like sismemberSlow which currently fall through to keyTypeAt →
loadHashAt/loadSetAt), probe the wide-column meta keys first: check
store.HashMetaKey(key) for hash ops or store.SetMetaKey(key) for set ops; if the
meta key exists treat the key as a live wide-column collection and short-circuit
the slow path by emitting WriteNull (for missing hash field) or WriteInt(0) (for
SISMEMBER negative) instead of calling loadHashAt/loadSetAt; keep falling
through to keyTypeAt/load* only when the meta key is absent so legacy
blob/WRONGTYPE handling still occurs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f853974c-df25-4ac6-861e-2fb8e65ccfcf

📥 Commits

Reviewing files that changed from the base of the PR and between 80df00c and d3323e0.

📒 Files selected for processing (1)
  • adapter/redis_compat_commands.go

@bootjp bootjp merged commit 363544c into main Apr 21, 2026
8 of 9 checks passed
@bootjp bootjp deleted the perf/redis-type-specific-fastpath branch April 21, 2026 06:37
bootjp added a commit that referenced this pull request Apr 21, 2026
…pts (#567)

## Summary
Follow-up to PR #565 (top-level HGET / HEXISTS / SISMEMBER fast-paths).
BullMQ and similar Lua-heavy workloads never saw the benefit because
`luaScriptContext.cmdHGet` etc. still went through `hashState` /
`setState`, which paid the full `keyTypeAt` probe (~8 seeks post-#563)
before loading the value.

### Effect on BullMQ-style scripts
A typical BullMQ script (`moveToActive`, `moveToCompleted`, etc.)
touches 5–10 keys. Before this PR, each key cost ~8 seeks just for type
classification, i.e. **40–80 wasted seeks per script invocation**.
After: 1–2 seeks per fast-path hit.

### Changes
Port #565's pattern verbatim into the Lua command handlers:

| Lua cmd       | Fast-path helper reused from #565 |
|---------------|-----------------------------------|
| `cmdHGet`     | `hashFieldFastLookup`             |
| `cmdHExists`  | `hashFieldFastExists`             |
| `cmdSIsMember`| `setMemberFastExists`             |

Each handler still consults `c.hashes` / `c.sets` **first**, so
in-script mutations (`HSET` followed by `HGET` in the same `Eval`) stay
visible — the fast path only fires when the key has not yet been loaded
into the script-local cache. Misses fall through to the legacy
`hashState` / `setState` loader so legacy-blob encodings, `WRONGTYPE`,
and TTL-expired keys behave identically to before.

Small helpers factored out (`hgetFromHashState`, `hexistsFromHashState`,
`sismemberFromSetState`) to keep each command body under the
cyclomatic-complexity cap.

## Test plan
- [x] `go test -race -short ./adapter/...` passes
- [x] `adapter/redis_lua_collection_fastpath_test.go` drives each
command through a real `redis.Eval`:
  - fast-path hit returns the value / 1
- in-script write (HSET / SADD) is visible to a subsequent HGET /
SISMEMBER
  - miss / unknown field / member → nil / 0 via slow-path fallback
  - TTL-expired key → nil / 0
  - WRONGTYPE propagates as script error

## Depends on
- PR #565 (not yet merged) — provides `hashFieldFastLookup`,
`hashFieldFastExists`, `setMemberFastExists`. This branch is based on
#565; once #565 merges, rebase to main will be clean.
bootjp added a commit that referenced this pull request Apr 21, 2026
BullMQ and similar Lua-heavy workloads call ZSCORE for delay-queue
position checks and job-existence checks; the legacy path went
through zsetState (~8-seek keyTypeAt + full zset state init) before
a direct GetAt on ZSetMemberKey.

Reuse PR #565 / #567's pattern for the single-member ZSET case:

- adapter/redis_compat_commands.go: new zsetMemberFastScore helper.
  Probes store.ZSetMemberKey directly. Probe-first-then-guard
  ordering matches the post-PR #565 helpers so missing / legacy-
  blob / wrong-type workloads do not pay the priority-guard seek.
  Narrow-scope priority guard covers the redisStrKey dual-encoding
  corruption case only, same scope as hashFieldFastLookup.
- adapter/redis_lua_context.go: cmdZScore rewrite.
  * Enforce exact arity (zscoreArgCount = 2) -- extras rejected,
    not silently ignored.
  * luaZSetAlreadyLoaded helper mirrors luaHashAlreadyLoaded /
    luaSetAlreadyLoaded (from PR #567 follow-up): a loaded zset
    state (even exists=false) short-circuits to the slow path so
    repeated ZSCORE on the same missing key does not re-probe.
  * cachedType defer: prior ZADD / ZREM in this Eval, explicit DEL
    tombstone, or type change via SET all go through the slow path
    so zsetState -> keyType -> cachedType produces the correct
    WRONGTYPE / nil / cached-score answer.
  * Fast path via zsetMemberFastScore when both guards pass.
  * zscoreFromSlowPath helper factored out to keep the function
    body under the cyclomatic-complexity cap.

Per-call effect on BullMQ position checks:
  Before: ~8 seeks (keyTypeAt) + scan / GetAt inside zsetState +
          GetAt inside memberScore
  After (fast-path hit): 3 seeks -- ZSetMemberKey GetAt +
          redisStrKey ExistsAt guard + TTL probe

Falls back to the legacy flow on miss so legacy-blob zsets and
WRONGTYPE paths retain their prior behaviour.

Tests: 10 new ZSCORE cases in redis_lua_collection_fastpath_test.go:
fast-path hit / in-script ZADD visible / miss / unknown member /
WRONGTYPE / TTL-expired / SET-then-ZSCORE (WRONGTYPE via cachedType
guard) / DEL-then-ZSCORE (nil via cachedType guard) / arity too few
(1 arg rejected) / arity too many (3 args rejected).
bootjp added a commit that referenced this pull request Apr 21, 2026
BullMQ and similar Lua-heavy workloads call ZSCORE for delay-queue
position checks and job-existence checks; the legacy path went through
zsetState (~8-seek keyTypeAt + full zset state init) before a direct
GetAt on ZSetMemberKey.

Reuse PR #565 / #567's pattern for the single-member ZSET case:

- adapter/redis_compat_commands.go: new zsetMemberFastScore helper.
Probes store.ZSetMemberKey directly. Probe-first-then-guard ordering
matches the post-PR #565 helpers so missing / legacy- blob / wrong-type
workloads do not pay the priority-guard seek. Narrow-scope priority
guard covers the redisStrKey dual-encoding corruption case only, same
scope as hashFieldFastLookup.
- adapter/redis_lua_context.go: cmdZScore rewrite.
* Enforce exact arity (zscoreArgCount = 2) -- extras rejected, not
silently ignored.
* luaZSetAlreadyLoaded helper mirrors luaHashAlreadyLoaded /
luaSetAlreadyLoaded (from PR #567 follow-up): a loaded zset state (even
exists=false) short-circuits to the slow path so repeated ZSCORE on the
same missing key does not re-probe.
* cachedType defer: prior ZADD / ZREM in this Eval, explicit DEL
tombstone, or type change via SET all go through the slow path so
zsetState -> keyType -> cachedType produces the correct WRONGTYPE / nil
/ cached-score answer.
  * Fast path via zsetMemberFastScore when both guards pass.
* zscoreFromSlowPath helper factored out to keep the function body under
the cyclomatic-complexity cap.

Per-call effect on BullMQ position checks:
  Before: ~8 seeks (keyTypeAt) + scan / GetAt inside zsetState +
          GetAt inside memberScore
  After (fast-path hit): 3 seeks -- ZSetMemberKey GetAt +
          redisStrKey ExistsAt guard + TTL probe

Falls back to the legacy flow on miss so legacy-blob zsets and WRONGTYPE
paths retain their prior behaviour.

Tests: 10 new ZSCORE cases in redis_lua_collection_fastpath_test.go:
fast-path hit / in-script ZADD visible / miss / unknown member /
WRONGTYPE / TTL-expired / SET-then-ZSCORE (WRONGTYPE via cachedType
guard) / DEL-then-ZSCORE (nil via cachedType guard) / arity too few (1
arg rejected) / arity too many (3 args rejected).
bootjp added a commit that referenced this pull request Apr 21, 2026
Production pprof after #565 / #567 / #568 showed
`cmdZRangeByScore` + `cmdZRangeByScoreAsc` holding 40 active
goroutines (top Lua-path signal), driven by BullMQ delay-queue
polls. The legacy path loaded ALL zset members via
zsetState -> ensureZSetLoaded before filtering by score bounds --
O(N) GetAts per ZRANGEBYSCORE, so an N=1000 delay queue cost
several seconds per call and was the dominant residual after the
previous keyTypeAt fast-paths landed.

Replace the load-then-filter flow with a bounded scan over the
pre-sorted score index:

- adapter/redis_compat_commands.go adds zsetRangeByScoreFast which
  translates [startKey, endKey) byte bounds + offset / limit /
  scoreFilter into a ScanAt / ReverseScanAt on
  ZSetScoreRangeScanPrefix. Legacy-blob zsets, string-priority
  corruption, and empty-result-but-zset-is-missing cases bail out
  with hit=false so the slow path's WRONGTYPE / disambiguation
  semantics are preserved. zsetRangeEmptyFastResult checks
  ZSetMetaKey to distinguish "empty range on a live zset" from "no
  zset here". Split into zsetFastPathEligible, zsetFastScanLimit,
  zsetScoreScan, decodeZSetScoreRange, finalizeZSetFastRange
  helpers so the main entrypoint stays under the cyclomatic-
  complexity cap.
- adapter/redis_lua_context.go cmdZRangeByScore consults
  luaZSetAlreadyLoaded + cachedType (mirrors PR #567 / #568 safety
  guards) before entering the fast path. The legacy full-load path
  lives in cmdZRangeByScoreSlow and still runs for in-script
  mutations / legacy-blob / type-change cases. zsetScoreScanBounds
  maps zScoreBound (-Inf / value-inc / value-exc / +Inf) to the
  byte range; exclusive edges are handled post-scan by
  scoreInRange.

Per-call effect on BullMQ-style delay-queue polls with LIMIT:
  Before: O(N) member GetAts + in-process filter
  After  (fast-path hit): 1 legacy-blob ExistsAt + a single bounded
           ScanAt (N_range <= offset + limit, clamped by
           maxWideScanLimit) + 1 priority-guard ExistsAt + 1 TTL
           probe.

Tests: 10 new cases in redis_lua_collection_fastpath_test.go --
fast-path hit / WITHSCORES / LIMIT offset / exclusive bounds /
empty range / missing key / WRONGTYPE / in-script ZADD visibility
/ ZREVRANGEBYSCORE / TTL-expired. All pass under -race along with
the existing adapter suite.
bootjp added a commit that referenced this pull request Apr 21, 2026
## Summary
Phase B of the ZSet fast-path plan. Production pprof after PRs #565 /
#567 / #568 showed `cmdZRangeByScore` + `cmdZRangeByScoreAsc` holding
**40 active goroutines** — the top residual Lua-path signal — driven by
BullMQ delay-queue polls. The legacy path loaded ALL zset members via
`zsetState` → `ensureZSetLoaded` before filtering by score bounds: O(N)
`GetAt`s per `ZRANGEBYSCORE`, so an N=1000 delay queue cost several
seconds per call.

Replace load-then-filter with a **bounded scan over the pre-sorted score
index** (`!zs|scr|<user>|<sortable-score>|<member>`).

### Changes

**`adapter/redis_compat_commands.go`** — new `zsetRangeByScoreFast`
helper plus small utilities:
- Legacy-blob zsets, string-priority corruption, and
empty-result-but-zset-missing cases return `hit=false` so the slow
path's WRONGTYPE / disambiguation semantics remain intact.
- `zsetRangeEmptyFastResult` checks `ZSetMetaKey` to separate "empty
range on a live zset" from "no zset here".
- Split into `zsetFastPathEligible` / `zsetFastScanLimit` /
`zsetScoreScan` / `decodeZSetScoreRange` / `finalizeZSetFastRange` to
stay under the cyclomatic-complexity cap.

**`adapter/redis_lua_context.go`** — `cmdZRangeByScore` gated on
`luaZSetAlreadyLoaded` + `cachedType` (mirrors PR #567 / #568 safety
guards):
- In-script ZADD / ZREM / DEL / SET / type-change still go through the
legacy full-load path (`cmdZRangeByScoreSlow`).
- `zsetScoreScanBounds` maps `zScoreBound` (-Inf / value-inc / value-exc
/ +Inf) to a byte range; exclusive edges are handled post-scan by
`scoreInRange`.

### Per-call effect on BullMQ-style delay-queue polls

| Scenario | Before | After (fast-path hit) |
|---|---|---|
| `ZRANGEBYSCORE key 0 now LIMIT 0 10` over N=1000 zset | ~1000 member
`GetAt`s | 1 legacy-blob `ExistsAt` + 1 bounded `ScanAt` (≤10 rows) + 1
priority-guard `ExistsAt` + 1 TTL probe |
| Empty range | ~1000 `GetAt`s | 1 legacy-blob + 1 `ZSetMetaKey` + 1 TTL
|
| In-script `ZADD` then `ZRANGEBYSCORE` | unchanged | unchanged (slow
path via `luaZSetAlreadyLoaded`) |

## Test plan
- [x] `go test -race -short ./adapter/...` passes
- [x] 10 new cases in `adapter/redis_lua_collection_fastpath_test.go`:
  - fast-path hit (ascending)
  - WITHSCORES
  - LIMIT + offset
  - exclusive bounds `(min (max`
  - empty range on live zset
  - missing key
  - WRONGTYPE (key holds a string)
  - in-script `ZADD` visible to subsequent `ZRANGEBYSCORE`
  - `ZREVRANGEBYSCORE` (reverse iteration)
  - TTL-expired zset
- [ ] Production: deploy and verify `cmdZRangeByScore` active goroutines
drop from ~20+20 back toward the keyTypeAt-style residual. Expect p50
redis.call latency inside Lua to fall from the current 3s toward
ms-scale as delay-queue polls stop pulling the whole zset.

## Follow-ups (out of scope)
- `ZRANGE` (by rank) streaming — needs the same pattern with positional
iteration.
- `ZPOPMIN` — writes; needs the streaming scan plus the delete-member
transaction plumbing.
- `HGETALL` / `HMGET` — separate wide-column prefix scans.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Optimized sorted-set range-by-score fast path for faster, paginated
queries with correct inclusive/exclusive bound handling.

* **Bug Fixes**
* Safer fallback to full-path when results may be incomplete; fixed
empty-range, truncation, type-mismatch and TTL/expiration behaviors;
negative LIMIT offsets now rejected and negative limits treated as “no
limit.”

* **Tests**
* Added extensive Lua-scripted tests covering range reads, WITHSCORES,
LIMIT semantics, exclusive bounds, in-script mutations, TTL, and
regressions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants