obs(redis-lua): per-cmd fast-path outcome counter by bootjp · Pull Request #571 · bootjp/elastickv

bootjp · 2026-04-21T13:12:45Z

Summary

Add elastickv_lua_cmd_fastpath_total{cmd,outcome} counter to measure how often Lua-side fast paths actually take the fast path vs fall back (hit / skip_loaded / skip_cached_type / fallback).
Wire LuaFastPathObserver through Registry → RedisServer option → luaScriptContext.observeFastPathOutcome.
Instrument cmdZRangeByScore as the first caller.

Motivation

Post-deploy CPU profile on a BullMQ-style workload shows cmdZRangeByScoreSlow consuming ~14x the CPU of zsetRangeByScoreFast (860ms vs 60ms in a 30 s profile), despite the fast path being eligible on paper. Without per-outcome instrumentation it is not possible to tell whether scripts are hitting luaZSetAlreadyLoaded (a prior ZCARD/ZCOUNT on the same key), cachedType (SET/DEL earlier in the script), or the server-side hit=false fallback (legacy-blob zset, truncation, large-offset short-circuit).

Shipping this counter first so we can decide whether the follow-up should loosen the luaZSetAlreadyLoaded guard, or target a different cause.

Test plan

go test ./adapter/... ./monitoring/... green
New TestLuaFastPathObserver* in monitoring/hotpath_test.go
Zero-value observer safe (already used by tests that do not wire a registry)
Deploy to prod, watch rate(elastickv_lua_cmd_fastpath_total{cmd=zrangebyscore}[5m]) by outcome for ~5 minutes

Summary by CodeRabbit

New Features
- Added Prometheus metrics to monitor Lua script fast-path execution performance, tracking per-command outcomes including successful hits, skipped operations from cached data, and fallback scenarios.
Tests
- Added tests validating Lua fast-path metrics accuracy and zero-value safety.

Instrument redis.call() fast-path decisions inside Lua scripts so operators can see how often a given command actually takes the fast path vs falls back. Triggered by a production profile showing cmdZRangeByScoreSlow consuming ~14x the CPU of zsetRangeByScoreFast on a BullMQ-style workload, despite the fast path being eligible on paper. - Add elastickv_lua_cmd_fastpath_total{cmd,outcome} counter, with outcome = hit | skip_loaded | skip_cached_type | fallback. - Wire LuaFastPathObserver through Registry, RedisServer option, and luaScriptContext.observeFastPathOutcome. - Instrument cmdZRangeByScore as the first caller; other fast-path commands (ZSCORE, HGET, HEXISTS, SISMEMBER) will follow once this metric validates the instrumentation shape.

coderabbitai · 2026-04-21T13:12:53Z

📝 Walkthrough

Walkthrough

The PR adds Lua fast-path monitoring instrumentation for ZRANGEBYSCORE operations. It introduces a new LuaFastPathObserver type to track outcomes (hit, skip_loaded, skip_cached_type, fallback), integrates the observer into the Redis server adapter, instruments the cmdZRangeByScore function with metric calls, and provides test coverage.

Changes

Cohort / File(s)	Summary
Redis Adapter Configuration `adapter/redis.go`, `adapter/redis_lua_context.go`	Adds `luaFastPathObserver` and `luaFastPathZRange` fields to `RedisServer`, introduces `WithLuaFastPathObserver` option function, and instruments `cmdZRangeByScore` with four observer outcome metrics (`ObserveHit`, `ObserveSkipLoaded`, `ObserveSkipCachedType`, `ObserveFallback`).
Monitoring Infrastructure `monitoring/hotpath.go`, `monitoring/hotpath_test.go`, `monitoring/registry.go`	Defines new `LuaFastPathObserver` and `LuaFastPathCmd` types with outcome label constants, adds Prometheus counter metric `luaFastPathTotal`, provides `ForCommand` method for per-command metric resolution, implements zero-safe observer methods, and adds unit tests for observer behavior and counter emission.
Server Integration `main.go`	Wires `LuaFastPathObserver` into Redis server initialization via `metricsRegistry.LuaFastPathObserver()`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

perf(redis-lua): stream ZRANGEBYSCORE from the score index #570: Implements the ZRANGEBYSCORE fast-path that this PR instruments with monitoring observability.
obs(redis): add hot-path dashboard and metrics to verify PR #560 #564: Extends the same monitoring infrastructure (hotpath.go, registry.go) to add observer types and counter metrics.

Poem

🐰 Fast paths shimmer in the Lua night,
Each hop tracked with metrics bright,
Hit or skip, cached type's call—
Observers count them all! 🎯✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'obs(redis-lua): per-cmd fast-path outcome counter' accurately and concisely summarizes the main change: adding per-command Lua fast-path outcome metrics instrumentation via a Prometheus counter.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/lua-fastpath-metrics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a monitoring framework to track fast-path execution outcomes for Redis commands within Lua scripts, starting with ZRANGEBYSCORE. It adds a new LuaFastPathObserver, defines Prometheus metrics for various outcomes like hits and fallbacks, and integrates these into the Redis adapter and monitoring registry. Feedback was provided regarding the performance impact of using Prometheus label lookups on hot paths, suggesting the use of atomic operations or pre-resolved counters to minimize mutex contention.

gemini-code-assist · 2026-04-21T13:21:01Z

+func (c *luaScriptContext) observeFastPathOutcome(cmd, outcome string) {
+	if c.server == nil {
+		return
+	}
+	c.server.luaFastPathObserver.ObserveLuaFastPath(cmd, outcome)
+}


The observeFastPathOutcome method is called on every ZRANGEBYSCORE execution within a Lua script. Frequent map lookups in Prometheus (WithLabelValues) involve internal mutexes and can introduce overhead on hot paths. Following repository guidelines for hot paths, consider using atomic operations or pre-resolving the counter to avoid mutex contention and improve performance.

References

For frequently accessed fields that require monotonic updates, prefer atomic operations (e.g., CAS) over mutexes to improve performance on hot paths.

Address gemini review on PR #571: CounterVec.WithLabelValues takes a mutex-guarded map lookup, which runs inside every redis.call() on the ZRANGEBYSCORE fast path. Resolve the four outcome counters once at server construction (WithLuaFastPathObserver) and call the pre-resolved handles from the Lua context so the hot path becomes a single atomic Inc per call. - LuaFastPathObserver.ForCommand(cmd) returns a LuaFastPathCmd bundle with pre-resolved Counter handles for hit / skip_loaded / skip_cached_type / fallback. - RedisServer stores the ZRANGEBYSCORE bundle and calls ObserveHit/etc. directly, no map lookups on the hot path. - Zero-value LuaFastPathCmd still safe: each ObserveXxx checks the handle and no-ops when nil (tests without a wired registry).

bootjp · 2026-04-21T13:25:20Z

Addressed gemini's review (1cd79de):

Medium — WithLabelValues mutex on hot path: the observer API now pre-resolves counter handles at server construction via LuaFastPathObserver.ForCommand(cmd), which returns a LuaFastPathCmd bundle with four resolved prometheus.Counter handles. cmdZRangeByScore calls .ObserveHit() / .ObserveSkipAlreadyLoaded() / .ObserveSkipCachedType() / .ObserveFallback() directly — each is a single atomic Inc, no map lookups on the redis.call() path.

Zero-value LuaFastPathCmd remains safe (each ObserveXxx nil-checks the handle), so tests that do not wire a registry continue to work without change.

/gemini review

gemini-code-assist

Code Review

This pull request introduces a LuaFastPathObserver to monitor the performance and hit ratios of Redis command fast paths within Lua scripts, specifically targeting ZRANGEBYSCORE and ZREVRANGEBYSCORE. The implementation includes new monitoring structures, pre-resolved Prometheus counters for hot-path efficiency, and integration within the Redis adapter. Review feedback focuses on correcting a documentation inaccuracy regarding method names and improving naming consistency between internal fields and metric labels.

gemini-code-assist · 2026-04-21T13:28:53Z

+// inside Lua scripts. The zero value is safe and silently drops
+// samples so tests can pass LuaFastPathObserver{} as a stub.
+//
+// Hot-path shape: each ObserveLuaFastPath-on-handle call is a single


This comment is inaccurate as it refers to a non-existent method ObserveLuaFastPath-on-handle. The correct observer methods on LuaFastPathCmd are named ObserveHit, ObserveSkip, etc. Updating this improves code clarity by removing references to non-existent paths. Additionally, ensure these methods are non-blocking as they are on the hot path; if any blocking behavior is intended, it must be explicitly documented in the interface description.

Suggested change

// Hot-path shape: each ObserveLuaFastPath-on-handle call is a single

// Hot-path shape: each Observe* call on a LuaFastPathCmd handle is a single

References

Removing references to non-existent code paths improves clarity. Interface methods intended to block must have their blocking semantics explicitly documented to avoid ambiguity.

gemini-code-assist · 2026-04-21T13:28:53Z

+// Observe* methods per redis.call(). Safe to copy.
+type LuaFastPathCmd struct {
+	hit             prometheus.Counter
+	skipAlreadyLoad prometheus.Counter


The name skipAlreadyLoad is verbose and inconsistent with the metric label "skip_loaded". Consider renaming the field to skipLoaded and updating the associated constants and methods for consistency. Note that using prometheus.Counter for these monotonic updates on a hot path is preferred over mutexes as it leverages atomic operations, which aligns with performance best practices for frequently accessed fields.

Suggested change

skipAlreadyLoad prometheus.Counter

skipLoaded prometheus.Counter

References

For frequently accessed fields requiring monotonic updates on hot paths, atomic operations are preferred over mutexes to improve performance.

Address gemini review on PR #571: - Fix the doc comment on LuaFastPathObserver that referenced a non-existent "ObserveLuaFastPath-on-handle" method; describe the actual Observe* call shape and note the counters are non-blocking atomic Inc (client_golang's default Counter uses sync/atomic). - Rename the skipAlreadyLoad field, LuaFastPathOutcomeSkipAlreadyLoad constant, and ObserveSkipAlreadyLoaded method to skipLoaded / LuaFastPathOutcomeSkipLoaded / ObserveSkipLoaded so every layer agrees with the metric label value "skip_loaded".

bootjp · 2026-04-21T13:31:42Z

Addressed gemini's review round 2 (e75442e):

Fixed the doc comment on LuaFastPathObserver that referenced a non-existent ObserveLuaFastPath-on-handle method. Now describes the actual Observe* call shape on LuaFastPathCmd and notes the counters are non-blocking atomic Inc (client_golang's default Counter uses sync/atomic internally).
Renamed the field skipAlreadyLoad / constant LuaFastPathOutcomeSkipAlreadyLoad / method ObserveSkipAlreadyLoaded → skipLoaded / LuaFastPathOutcomeSkipLoaded / ObserveSkipLoaded so every layer agrees with the metric label value skip_loaded.

/gemini review

gemini-code-assist

Code Review

This pull request introduces a monitoring infrastructure to track Lua fast-path outcomes for Redis commands, specifically implementing it for ZRANGEBYSCORE and ZREVRANGEBYSCORE. It adds a LuaFastPathObserver to the monitoring package, integrates it into the RedisServer adapter, and includes unit tests to ensure correct metric reporting and zero-value safety. I have no feedback to provide.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

adapter/redis.go (1)

269-346: ⚠️ Potential issue | 🟡 Minor

Keep the cmd label command-specific.

cmdZRevRangeByScore also reaches the Lua fast-path code, but this single bundle records it as cmd="zrangebyscore". That makes the new “per-command” metric conflate ZRANGEBYSCORE and ZREVRANGEBYSCORE, which can skew the production validation query.

Suggested direction

-	// luaFastPathZRange is the pre-resolved counter bundle for the
-	// ZRANGEBYSCORE / ZREVRANGEBYSCORE Lua fast path. Resolved once in
+	// luaFastPathZRangeByScore and luaFastPathZRevRangeByScore are
+	// pre-resolved counter bundles for the Lua fast path. Resolved once in
 	// WithLuaFastPathObserver so the hot path does not pay for
 	// CounterVec.WithLabelValues on every redis.call().
-	luaFastPathZRange monitoring.LuaFastPathCmd
+	luaFastPathZRangeByScore    monitoring.LuaFastPathCmd
+	luaFastPathZRevRangeByScore monitoring.LuaFastPathCmd
...
 	return func(r *RedisServer) {
 		r.luaFastPathObserver = observer
-		r.luaFastPathZRange = observer.ForCommand(luaFastPathCmdZRangeByScore)
+		r.luaFastPathZRangeByScore = observer.ForCommand(luaFastPathCmdZRangeByScore)
+		r.luaFastPathZRevRangeByScore = observer.ForCommand(luaFastPathCmdZRevRangeByScore)
 	}
 }
 
-// luaFastPathCmdZRangeByScore is the shared label for ZRANGEBYSCORE
-// and ZREVRANGEBYSCORE fast-path outcomes. Both directions take the
-// same branch through zsetRangeByScoreFast so sharing one label
-// keeps the counter cardinality bounded.
-const luaFastPathCmdZRangeByScore = "zrangebyscore"
+const (
+	luaFastPathCmdZRangeByScore    = "zrangebyscore"
+	luaFastPathCmdZRevRangeByScore = "zrevrangebyscore"
+)

Then select the appropriate bundle in cmdZRangeByScore based on reverse.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@adapter/redis.go` around lines 269 - 346, The fast-path counter currently
conflates ZRANGEBYSCORE and ZREVRANGEBYSCORE by always resolving
r.luaFastPathZRange to luaFastPathCmdZRangeByScore in WithLuaFastPathObserver;
add a distinct constant (e.g. luaFastPathCmdZRevRangeByScore) and stop using one
pre-resolved bundle for both. In practice, remove/stop relying on the single
luaFastPathZRange field and instead select the correct counter from
r.luaFastPathObserver inside cmdZRangeByScore based on the reverse boolean (call
r.luaFastPathObserver.ForCommand(reverse ? luaFastPathCmdZRevRangeByScore :
luaFastPathCmdZRangeByScore)), ensuring each command has its own label.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@adapter/redis.go`:
- Around line 269-346: The fast-path counter currently conflates ZRANGEBYSCORE
and ZREVRANGEBYSCORE by always resolving r.luaFastPathZRange to
luaFastPathCmdZRangeByScore in WithLuaFastPathObserver; add a distinct constant
(e.g. luaFastPathCmdZRevRangeByScore) and stop using one pre-resolved bundle for
both. In practice, remove/stop relying on the single luaFastPathZRange field and
instead select the correct counter from r.luaFastPathObserver inside
cmdZRangeByScore based on the reverse boolean (call
r.luaFastPathObserver.ForCommand(reverse ? luaFastPathCmdZRevRangeByScore :
luaFastPathCmdZRangeByScore)), ensuring each command has its own label.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 20cd12ec-978e-4359-a4ed-9967158c078f

📥 Commits

Reviewing files that changed from the base of the PR and between f1dcc14 and 138cb73.

📒 Files selected for processing (6)

adapter/redis.go
adapter/redis_lua_context.go
main.go
monitoring/hotpath.go
monitoring/hotpath_test.go
monitoring/registry.go

Production metric from PR #571 on n1 (leader) after 15 min of BullMQ traffic: fallback 1463 (96.4%) hit 54 (3.6%) skip_loaded 0 skip_cached_type 0 The Lua-layer guards (luaZSetAlreadyLoaded / cachedType) are not the issue; the fast path is being rejected server-side. The single "fallback" bucket is too coarse to tell whether that is the legacy-blob eligibility check, the truncation guard, the large-offset short-circuit, the empty-key missing-zset path, or the string-priority wrong-type guard. Subdivide the outcome label and thread the reason back from zsetRangeByScoreFast: - fallback_ineligible (legacy-blob zset exists) - fallback_large_offset (offset >= maxWideScanLimit short-circuit) - fallback_truncated (scan saturated) - fallback_missing_key (zsetRangeEmptyFastResult: zsetExists=false) - fallback_wrong_type (hasHigherPriorityStringEncoding=true) - fallback_other (error / fallback we didn't classify) All counters are pre-resolved at ForCommand() time (same hot-path shape as #571) so routing by reason is still a single atomic Inc per call. ObserveFallback(reason) dispatches via a small switch. The hypothesis from the metric pattern is that most fallbacks are "fallback_missing_key" (BullMQ polling empty delayed queues). Ship this first, measure in prod, then write the fix accordingly.

…572) ## Summary Follow-up to PR #571. The counter it added showed (on n1 / leader, 15 min of BullMQ traffic): ``` fallback 1463 (96.4%) hit 54 (3.6%) skip_loaded 0 skip_cached_type 0 ``` The Lua-layer guards are not the issue; the fast path is being rejected server-side. The current single `fallback` bucket is too coarse to tell *why*: legacy-blob eligibility, large-offset short-circuit, truncation, empty-key missing-zset, or wrong-type guard. This PR subdivides the `fallback` outcome label so the real cause is visible in 5 minutes of prod scrapes. ## Changes - New outcome labels (replacing single `fallback`): - `fallback_ineligible` — `zsetFastPathEligible` rejected (legacy-blob zset) - `fallback_large_offset` — `offset >= maxWideScanLimit` short-circuit - `fallback_truncated` — scan hit `scanLimit` with request unsatisfied - `fallback_missing_key` — `zsetRangeEmptyFastResult` saw `zsetExists=false` - `fallback_wrong_type` — `hasHigherPriorityStringEncoding=true` - `fallback_other` — error / unclassified - `zsetRangeByScoreFast` grows a `reason string` return (empty when `hit=true`); all `hit=false` sites populate a specific reason. - `zsetRangeEmptyFastResult` now returns `(hit, reason, err)`; the old `[]redisZSetEntry` return was always `nil`. - `LuaFastPathCmd` pre-resolves 6 new counters (same hot-path shape as #571: one atomic `Inc` per redis.call()). - `ObserveFallback(reason)` dispatches via a small switch; unknown reasons land on `fallback_other`. ## Motivation Pre-deploy hypothesis: most fallbacks are `fallback_missing_key` (BullMQ polling empty delayed queues). Ship this first, measure in prod, then write the actual fix accordingly. ## Test plan - [x] `go test -race -short ./adapter/...` (53s) green - [x] `go test -race ./monitoring/...` green - [x] `TestLuaFastPathObserverCountsByCmdAndOutcome` updated for new labels + unknown-reason dispatch - [ ] Deploy to prod; watch `sum by (outcome) (rate(elastickv_lua_cmd_fastpath_total{cmd="zrangebyscore"}[5m]))`  ## Summary by CodeRabbit * **New Features** * Improved query performance monitoring with categorized fallback reasons for sorted set operations, providing enhanced visibility into query optimization effectiveness and performance diagnostics.

gemini-code-assist Bot reviewed Apr 21, 2026

View reviewed changes

style: realign gofmt columns after skip_loaded rename

138cb73

coderabbitai Bot reviewed Apr 21, 2026

View reviewed changes

bootjp merged commit e067ec5 into main Apr 21, 2026
8 checks passed

bootjp deleted the feat/lua-fastpath-metrics branch April 21, 2026 14:22

bootjp mentioned this pull request Apr 21, 2026

obs(redis-lua): subdivide ZRANGEBYSCORE fast-path fallback by reason #572

Merged

4 tasks

	// Hot-path shape: each ObserveLuaFastPath-on-handle call is a single
	// Hot-path shape: each Observe* call on a LuaFastPathCmd handle is a single

	skipAlreadyLoad prometheus.Counter
	skipLoaded prometheus.Counter

Conversation

bootjp commented Apr 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bootjp commented Apr 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 21, 2026 •

edited

Loading

bootjp commented Apr 21, 2026 •

edited

Loading