Skip to content

obs(redis): expose unsupported-command names via bounded metric#594

Merged
bootjp merged 5 commits intomainfrom
obs/redis-unsupported-command-names
Apr 23, 2026
Merged

obs(redis): expose unsupported-command names via bounded metric#594
bootjp merged 5 commits intomainfrom
obs/redis-unsupported-command-names

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 23, 2026

Summary

  • Add a new Prometheus counter elastickv_redis_unsupported_commands_total{command} that records the actual name of Redis commands the adapter rejects at the route-miss path (adapter/redis.go). Today those are all collapsed into command="unknown", which leaves a large unnamed bar in Grafana's "Errors by Command".
  • The new metric is additive: the existing elastickv_redis_requests_total{command="unknown",outcome="error"} and elastickv_redis_errors_total{command="unknown"} counters are unchanged, so existing dashboards and alerts keep working.

Cardinality guarantee

  • Hard cap of 32 distinct command-name label values (maxUnsupportedCommandLabels = 32), guarded by a sync.Mutex on an in-memory seen-set in RedisMetrics.
  • Once the cap is reached, any further novel name folds into command="other". Already-seen names still record with their real label even after the cap is full.
  • Label values are uppercased, trimmed, and truncated to 64 chars (maxUnsupportedCommandLabelLen = 64) before use, so a pathological client cannot pollute label values via protocol abuse (e.g. sending a huge first argument).

Wiring

  • monitoring.RedisRequestReport gains an Unsupported bool field. The adapter sets it to true only at the route-miss path via a new helper observeRedisUnsupported; all other call sites are unchanged.
  • RedisMetrics.ObserveRedisRequest still runs the existing "unknown"-bucket path for unsupported commands, then additionally increments the new bounded counter when Unsupported is set.

Test plan

  • go test -race ./monitoring/...
  • go test -race ./adapter/...
  • golangci-lint run ./monitoring/... ./adapter/...
  • Unsupported command records its real (uppercased) name.
  • Two different unsupported commands produce two label values.
  • The 33rd distinct name folds into command="other" and further overflow increments accumulate there.
  • An already-seen name (after cap reached) still records with its real label, not "other".
  • Extreme-long (200-char) name is truncated to 64 chars.
  • Concurrent observers do not race (checked with -race).
  • Existing NOTACOMMAND test now asserts the new counter carries NOTACOMMAND alongside the unchanged unknown bucket.

Summary by CodeRabbit

  • New Features

    • Added dedicated monitoring for unsupported Redis commands with a new metric tracking command names and frequency, enabling clearer visibility into command rejections.
  • Tests

    • Expanded test coverage for unsupported command tracking, including label cardinality management and concurrent operation validation.

Add elastickv_redis_unsupported_commands_total{command} counter so
operators can see which commands clients are probing instead of a
single opaque "unknown" bar in Grafana.

Cardinality is capped at 32 distinct observed names (guarded by a
sync.Mutex); novel names observed after the cap are folded into
command="other". Label values are uppercased, trimmed, and truncated
to 64 chars so a pathological client cannot pollute labels via
protocol abuse.

Wiring is additive: RedisRequestReport gains an Unsupported bool that
the adapter sets only at the route-miss path. The existing
elastickv_redis_requests_total{command="unknown",outcome="error"} and
elastickv_redis_errors_total{command="unknown"} counters are
unchanged so existing dashboards and alerts keep working.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 52 minutes and 25 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 52 minutes and 25 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ddfac6ea-4660-4704-8830-fd9fa4926fb2

📥 Commits

Reviewing files that changed from the base of the PR and between 91e16e9 and 214d71d.

📒 Files selected for processing (3)
  • adapter/redis.go
  • adapter/redis_metrics_test.go
  • monitoring/redis.go
📝 Walkthrough

Walkthrough

A new monitoring capability distinguishes unsupported Redis commands from other errors through a dedicated observer path. The Redis adapter now routes unsupported commands to a specialized handler that integrates with the monitoring layer to emit a bounded-cardinality Prometheus counter with normalized, truncated command labels.

Changes

Cohort / File(s) Summary
Core Implementation
monitoring/redis.go, adapter/redis.go
Added Unsupported field to RedisRequestReport, implemented observeRedisUnsupported method, and introduced a new Prometheus counter elastickv_redis_unsupported_commands_total with command-label deduplication and truncation logic; labels are bounded by mapping excess commands to an "other" bucket.
Test Coverage
adapter/redis_metrics_test.go
Extended Redis metrics tests to validate unsupported command tracking, including label cardinality bounds, command name normalization and truncation, counter aggregation, and concurrent-call correctness.

Sequence Diagram

sequenceDiagram
    participant Client
    participant RedisServer
    participant Monitoring
    participant Prometheus

    Client->>RedisServer: Send unsupported command
    RedisServer->>RedisServer: Detect unsupported command
    RedisServer->>Monitoring: observeRedisUnsupported(command, duration)
    Monitoring->>Monitoring: Normalize & uppercase command
    Monitoring->>Monitoring: Truncate label to max length
    Monitoring->>Monitoring: Check command cardinality bound
    alt Within cardinality limit
        Monitoring->>Monitoring: Add to tracked commands set
    else Exceeds cardinality limit
        Monitoring->>Monitoring: Map to "other" bucket
    end
    Monitoring->>Prometheus: Emit elastickv_redis_unsupported_commands_total
    Prometheus->>Prometheus: Increment counter with command label
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Hops with glee!
Unsupported commands now dance in their own spotlight,
No longer lost in the error shadows,
Bounded labels keep the chaos at bay,
Prometheus counts them all with precision! 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing a new Prometheus metric to expose unsupported Redis command names with bounded cardinality.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch obs/redis-unsupported-command-names

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monitoring/redis.go`:
- Around line 217-237: The truncation in RedisMetrics.observeUnsupportedCommand
uses byte slicing which can split multi-byte UTF-8 sequences and cause
Prometheus to panic; fix by first ensuring name is valid UTF-8 (e.g., replace
invalid sequences or convert bytes to a valid string using utf8/bytes.Runes) and
then truncate on rune boundaries to maxUnsupportedCommandLabelLen (counting
runes, not bytes), preserve the existing redisCommandUnknown fallback and the
logic around m.unsupportedNames and m.unsupportedCommands.WithLabelValues;
update the truncation at the place where name is shortened (originally name =
name[:maxUnsupportedCommandLabelLen]) to perform UTF-8-safe rune truncation
before labeling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 372c888a-215c-4d3f-928f-54941d65bc28

📥 Commits

Reviewing files that changed from the base of the PR and between 306eb0b and 91e16e9.

📒 Files selected for processing (3)
  • adapter/redis.go
  • adapter/redis_metrics_test.go
  • monitoring/redis.go

Comment thread monitoring/redis.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91e16e9a5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread monitoring/redis.go Outdated
Comment on lines +222 to +223
if len(name) > maxUnsupportedCommandLabelLen {
name = name[:maxUnsupportedCommandLabelLen]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve UTF-8 when truncating unsupported command labels

Truncating name by raw bytes can split a multibyte rune, producing invalid UTF-8 label values; prometheus.(*CounterVec).WithLabelValues then panics during unsupported-command observation. A client can trigger this by sending an unsupported command name longer than 64 bytes containing non-ASCII characters (for example, 63 ASCII bytes plus é), which makes the Redis request path panic instead of returning an error metric. Please truncate by rune boundary (or validate/re-encode) before calling WithLabelValues.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new metric, elastickv_redis_unsupported_commands_total, to track unsupported Redis commands with bounded cardinality and label length. The implementation includes a mechanism to collapse excess command names into an "other" bucket and truncate long command names. Feedback was provided to address a potential Denial of Service (DoS) vulnerability by truncating raw input before performing expensive string operations and to ensure that string truncation does not result in invalid UTF-8 sequences.

Comment thread monitoring/redis.go
Comment on lines +217 to +224
func (m *RedisMetrics) observeUnsupportedCommand(raw string) {
name := strings.ToUpper(strings.TrimSpace(raw))
if name == "" {
name = redisCommandUnknown
}
if len(name) > maxUnsupportedCommandLabelLen {
name = name[:maxUnsupportedCommandLabelLen]
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The current truncation logic uses byte-slicing, which can cut through multi-byte UTF-8 characters and produce invalid strings, potentially causing Prometheus scraping to fail. Additionally, performing operations on the full input string before truncation presents a DoS vector. This change introduces a defensive limit on the input size to prevent unbounded processing, aligning with repository standards for handling external input to prevent Denial of Service attacks.

func (m *RedisMetrics) observeUnsupportedCommand(raw string) {
	// Defensive truncation to avoid expensive string operations on pathological input.
	if len(raw) > 256 {
		raw = raw[:256]
	}

	name := strings.ToUpper(strings.TrimSpace(raw))
	if name == "" {
		name = redisCommandUnknown
	}

	if len(name) > maxUnsupportedCommandLabelLen {
		name = name[:maxUnsupportedCommandLabelLen]
		// Ensure the truncated string is valid UTF-8.
		for len(name) > 0 && !utf8.ValidString(name) {
			name = name[:len(name)-1]
		}
	}
}
References
  1. When handling external requests or data that can grow from external input, apply fixed bounds or size limits to prevent Denial of Service (DoS) attacks and unbounded memory growth.

Comment thread monitoring/redis.go

import (
"strings"
"sync"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Add unicode/utf8 to the imports to support safe string truncation in observeUnsupportedCommand.

Suggested change
"sync"
"sync"
"unicode/utf8"

…d panic

The 64-byte byte-slice truncation in RedisMetrics.observeUnsupportedCommand
could split a multibyte UTF-8 rune, producing an invalid UTF-8 label value.
prometheus.(*CounterVec).WithLabelValues panics on invalid UTF-8, so a
client sending an unsupported command name >64 bytes containing non-ASCII
bytes could crash the goroutine handling the request.

Fix:
- Truncate on UTF-8 rune boundaries while preserving the 64-byte cap.
- Add an ingress guard that folds commands whose raw bytes are already
  invalid UTF-8 into an "invalid_utf8" sentinel, since strings.ToUpper
  silently rewrites invalid bytes to U+FFFD and would mask the problem.

Tests cover a long multibyte-terminated name (no panic, valid UTF-8,
byte length <= 64), a purely invalid-UTF-8 input (no panic, sentinel
label), and extend the existing length-cap test to assert UTF-8
validity.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 23, 2026

Addressed the CodeRabbit + codex P1 in 8539857.

Bug (attack vector)

RedisMetrics.observeUnsupportedCommand truncated to 64 bytes with name = name[:maxUnsupportedCommandLabelLen], a raw byte slice. If the name contained multibyte UTF-8 runes (e.g. 63 ASCII bytes + é), the cut could land mid-rune and produce invalid UTF-8. prometheus.(*CounterVec).WithLabelValues then panics — not errors — so a Redis client could crash the handling goroutine by sending an unsupported command name >64 bytes containing any non-ASCII byte.

Fix

  • Truncate on UTF-8 rune boundaries, preserving the 64-byte cap (semantics unchanged for ASCII, which is the expected path).
  • Add an ingress guard: if the raw command bytes are not valid UTF-8, fold to a fixed "invalid_utf8" sentinel. The guard runs before strings.ToUpper, because ToUpper silently rewrites invalid bytes to U+FFFD and would mask the problem.

Behaviour

  • Before: 63 ASCII + é... crashed the goroutine via prometheus panic. Raw binary blob as a command name did the same.
  • After: no panic in either case. Valid-UTF-8 inputs are truncated on a rune boundary with byte length <= 64; invalid-UTF-8 inputs are folded to invalid_utf8.

Tests (adapter/redis_metrics_test.go)

  • TestRedisMetricsUnsupportedCommandTruncatesOnRuneBoundary — 63 'A' + many 'é', asserts NotPanics, len <= 64, utf8.ValidString.
  • TestRedisMetricsUnsupportedCommandRejectsInvalidUTF8 — raw 0xff 0xfe 0xfd 0xc3 0x28, asserts sentinel label emitted.
  • TestRedisMetricsUnsupportedCommandTruncatesLongName — now also asserts utf8.ValidString on the label.

go test -race ./monitoring/... ./adapter/... passes for the redis-metrics tests (pre-existing Test_grpc_transaction flake on this branch is unrelated — same failure occurs with our change reverted). golangci-lint run ./monitoring/... ./adapter/... reports 0 issues.

/gemini review
@codex review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new metric, "elastickv_redis_unsupported_commands_total", to track Redis commands rejected as unsupported. The implementation includes robust safeguards such as cardinality capping, length truncation on rune boundaries, and UTF-8 validation to ensure stability and prevent metric explosion. The review feedback suggests optimizing performance under high load by switching to a "sync.RWMutex" for a fast-path check of previously seen commands and simplifying the truncation logic.

Comment thread monitoring/redis.go Outdated
errorsTotal *prometheus.CounterVec
unsupportedCommands *prometheus.CounterVec

unsupportedMu sync.Mutex
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To reduce lock contention on the error path, consider using a sync.RWMutex instead of a sync.Mutex. While this is an error path, it is externally triggerable and can be flooded by a malicious or misbehaving client. Once the cardinality cap is reached or the set of common unsupported commands is established, most requests will only need a read lock to verify the command name exists in the map.

Suggested change
unsupportedMu sync.Mutex
unsupportedMu sync.RWMutex

Comment thread monitoring/redis.go Outdated
Comment on lines +245 to +257
b := 0
cut := len(name)
for i, r := range name {
size := utf8.RuneLen(r)
if size < 0 {
size = 1
}
if b+size > maxUnsupportedCommandLabelLen {
cut = i
break
}
b += size
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The truncation logic can be simplified. The variable b is redundant because the range loop index i already provides the byte offset of the current rune. Additionally, since the string has been validated as UTF-8 (or is a known safe sentinel), utf8.RuneLen will return a positive value, making the negative check unnecessary.

cut := len(name)
		for i, r := range name {
			if i+utf8.RuneLen(r) > maxUnsupportedCommandLabelLen {
				cut = i
				break
			}
		}
		name = name[:cut]

Comment thread monitoring/redis.go Outdated
Comment on lines +261 to +269
label := redisUnsupportedCommandOther
m.unsupportedMu.Lock()
if _, seen := m.unsupportedNames[name]; seen {
label = name
} else if len(m.unsupportedNames) < maxUnsupportedCommandLabels {
m.unsupportedNames[name] = struct{}{}
label = name
}
m.unsupportedMu.Unlock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Leverage the RWMutex to implement a fast-path check for already-seen commands. This avoids acquiring a write lock for every unsupported command request once the command name is known to the monitoring layer.

label := redisUnsupportedCommandOther
	m.unsupportedMu.RLock()
	_, seen := m.unsupportedNames[name]
	m.unsupportedMu.RUnlock()
	if seen {
		label = name
	} else {
		m.unsupportedMu.Lock()
		if _, seen := m.unsupportedNames[name]; seen {
			label = name
		} else if len(m.unsupportedNames) < maxUnsupportedCommandLabels {
			m.unsupportedNames[name] = struct{}{}
			label = name
		}
		m.unsupportedMu.Unlock()
	}

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85398572e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go Outdated
r.traceCommandError(conn, name, cmd.Args[1:], "unsupported")
conn.WriteError("ERR unsupported command '" + string(cmd.Args[0]) + "'")
r.observeRedisError(name, time.Since(start))
r.observeRedisUnsupported(name, time.Since(start))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pass raw command text to unsupported-command observer

Run forwards the already-normalized name to observeRedisUnsupported, but name was built with strings.ToUpper(string(cmd.Args[0])) before routing. For malformed/binary command names, strings.ToUpper rewrites invalid UTF-8 into replacement runes, so observeUnsupportedCommand in monitoring/redis.go never sees invalid UTF-8 and cannot use the invalid_utf8 sentinel path introduced in this commit. Under hostile traffic this consumes real-label slots with synthetic �... values and can crowd out genuine unsupported command names.

Useful? React with 👍 / 👎.

The Redis adapter's Run() loop uppercased the command name with
strings.ToUpper before passing it to the unsupported-command observer.
ToUpper silently rewrites invalid UTF-8 bytes into the U+FFFD
replacement character, so by the time the observer runs its
utf8.ValidString check the value is already synthetically valid and the
invalid_utf8 sentinel path is unreachable. A hostile client sending
binary garbage as a command name could therefore burn through the
bounded distinct-name slots with replacement-character garbage,
displacing legitimate unsupported-command labels.

Pass the raw bytes of cmd.Args[0] to observeRedisUnsupported and
document the contract on both sides (adapter and monitoring). The
routing path still uses the existing uppercased name; only the metric
path takes the raw form so utf8.ValidString runs against the original
bytes.

Adds TestRedisMetricsUnsupportedCommandRawInvalidUTF8Bytes which feeds
three 0xff bytes through the observer and asserts the sentinel label
is incremented.
bootjp added a commit that referenced this pull request Apr 23, 2026
…595)

## Summary

Adds a regression-preventing end-to-end test that pins the observability
contract around `elastickv_redis_errors_total` /
`elastickv_redis_requests_total{outcome="error"}`: commands whose Redis
protocol semantics report a missing key via a **null / 0 / "none"**
reply
must be counted as `outcome="success"`, not as errors.

### Audit

Starting from `grep 'WriteError(err.Error())' adapter/*.go`, I
classified
every call site where `err` could plausibly be `store.ErrKeyNotFound`:

- **(a) Redis spec: null/0/none — FIX if leaked**
  GET, HGET, HMGET, HLEN, HEXISTS, HGETALL, LLEN, LINDEX, LRANGE, LPOP,
  RPOP, SCARD, SMEMBERS, SISMEMBER, ZCARD, ZSCORE, ZRANGE, XLEN, XRANGE,
  EXISTS, TYPE, TTL, PTTL.
  **Result:** already correct in the current codebase. Every handler
  either routes through `keyTypeAt` / `logicalExistsAt` (which return
  `redisTypeNone` / `false` for a missing key, not an error) or
  through helpers like `loadHashAt` / `loadSetAt` / `loadZSetAt` /
  `loadStreamAt` / `listPopClaim` / `hashFieldFastLookup`, all of
  which internalise `store.ErrKeyNotFound` and translate it to the
  caller-visible empty reply. No production-code change was needed.

- **(b) Redis spec: ERR reply — LEAVE AS-IS**
  RENAME (`cmdRename` in `redis_lua_context.go:1339`), LSET
  (`cmdLSet` in `redis_lua_context.go:2131`). Both return
  `errors.New("ERR no such key")` — not `store.ErrKeyNotFound` — so
  the Lua compatibility layer at `redis_lua.go:857` sends the error
  to the client, the metric correctly records `outcome="error"`, and
  the wire-format matches real Redis. RPOPLPUSH / LPOS / etc. route
  through the same Lua layer and inherit the same semantics.

- **(c) Not implemented — no call site**
  DUMP, OBJECT, RESTORE, COPY: no handler exists yet, nothing to audit.

### What changed

`adapter/redis_metrics_missing_key_test.go`: a single new test
(`TestRedisMetrics_MissingKeyNotCountedAsError`) with three subtests
that spin up a 3-node cluster, share a `monitoring.RedisRequestObserver`
across every node (so `proxyToLeader`-induced routing is captured too),
and:

1. Issue the 21 missing-key commands listed above and assert that
   `elastickv_redis_errors_total` does not move (with a per-command
   diagnostic dump on failure). GET is additionally checked against
   `elastickv_redis_requests_total{outcome="success"}` to guard against
   a regression that silently drops the success sample.
2. Issue RENAME on a missing key and assert the errors counter
   increments by exactly 1.
3. Issue LSET on a missing key and assert the errors counter
   increments by exactly 1.

The `monitoring/redis.go` / `RedisRequestReport` struct is left
untouched per the coordination note in the task brief (PR #594 owns
that file).

## Test plan

- [x] `go test -race -count=1 -run
'^TestRedisMetrics_MissingKeyNotCountedAsError$' ./adapter/` passes.
- [x] `go test -race -count=1 ./adapter/` passes (394 → 395 tests, +1
new with 3 subtests).
- [x] `golangci-lint run ./adapter/...` is clean.
- [x] No production-code file modified; behaviour change is zero.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Tests
* Added test suite validating Redis metrics behavior for missing key
operations, ensuring commands like GET, HGET, HLEN, LLEN, and ZCARD
don't incorrectly count as errors in metrics while verifying actual
errors are properly tracked.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…path

Three review items addressed at once because they all touch
observeUnsupportedCommand in monitoring/redis.go:

1. Security: cap the raw input length to 256 bytes BEFORE any O(n)
   string operations (utf8.ValidString, strings.ToUpper, rune
   iteration). A hostile client sending a multi-megabyte first
   argument would otherwise burn CPU on every unsupported-command
   observation. Cutting mid-rune is safe here: the UTF-8 validity
   check below rejects the result (producing the invalid_utf8
   sentinel), or the rune-aware truncation further down trims to a
   clean boundary.

2. Simpler rune truncation: replace the manual byte accumulator with
   the range-loop byte index. Since the upstream UTF-8 validity check
   guarantees utf8.RuneLen never returns -1, the redundant helper and
   its negative-size guard can go.

3. RWMutex fast path: switch unsupportedMu from sync.Mutex to
   sync.RWMutex. Observations for already-admitted names take a read
   lock and return without contending on a write lock; only novel
   names acquire the write lock, with a double-check to avoid races
   between RUnlock and Lock.

New tests:
- TestRedisMetricsUnsupportedCommandRejectsPathologicalLength: 1 MiB
  input must not panic and must still emit a 64-byte-capped valid-UTF-8
  label.
- TestRedisMetricsUnsupportedCommandRWMutexFastPath: 32 workers x 500
  iterations hitting a mix of pre-seeded and novel names, exercised
  under go test -race.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 23, 2026

Addressed the five review items from the prior round.

db4fa30 — fix(obs): preserve raw command bytes to reach invalid_utf8 sentinel
Codex P2. The adapter's Run() loop was handing the already-strings.ToUpper'd name to observeRedisUnsupported. strings.ToUpper silently rewrites invalid UTF-8 bytes to U+FFFD, so by the time observeUnsupportedCommand ran utf8.ValidString the value was synthetically "valid" and the invalid_utf8 sentinel path was unreachable — a hostile client sending binary garbage could burn through the 32 distinct-name slots with replacement-char strings. Now the adapter passes raw string(cmd.Args[0]); ToUpper/TrimSpace happen inside the observer, AFTER the UTF-8 check. Contract documented on both sides. Added TestRedisMetricsUnsupportedCommandRawInvalidUTF8Bytes.

cc41633 — perf(obs): cap raw input, simplify rune truncation, add RWMutex fast path
Three items bundled because they all touch observeUnsupportedCommand:

  • Gemini HIGH (security): early if len(raw) > 256 { raw = raw[:256] } before any O(n) string op. Cutting mid-rune is fine; the downstream utf8.ValidString or rune-aware truncation handles it. Guards against multi-megabyte first-argument CPU abuse.
  • Gemini Medium (simplify): rune-truncation loop now uses the range-loop byte index i and utf8.RuneLen(r) directly; the manual accumulator and negative-size guard are gone since the upstream UTF-8 validity check makes them dead code.
  • Gemini Medium (lock): unsupportedMu is now sync.RWMutex. Already-admitted names take the read lock and return; only novel names acquire the write lock with a double-check to avoid the RUnlock→Lock race.

unicode/utf8 was already imported (Item 5 no-op).

New tests: TestRedisMetricsUnsupportedCommandRejectsPathologicalLength (1 MiB input, no panic, label still 64-byte-capped valid UTF-8) and TestRedisMetricsUnsupportedCommandRWMutexFastPath (32 workers x 500 iters, mixed fast/slow paths, runs clean under -race).

Local verification:

  • go test -race -timeout 60s -run TestRedisMetrics ./adapter/... — pass.
  • go test -race ./monitoring/... — pass.
  • golangci-lint run ./monitoring/... ./adapter/... — 0 issues.

/gemini review
@codex review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new metric, elastickv_redis_unsupported_commands_total, to track the specific names of unsupported Redis commands while maintaining bounded cardinality. The implementation includes robust defensive measures against malformed or malicious input, such as raw byte length capping, UTF-8 validation to prevent Prometheus panics, and rune-aware truncation. It also utilizes a thread-safe mechanism with a read-lock fast path for recording previously seen command names. Comprehensive tests have been added to verify cardinality limits, invalid UTF-8 handling, and concurrent performance. I have no feedback to provide.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@bootjp bootjp enabled auto-merge April 23, 2026 16:06
@bootjp bootjp merged commit 830c3e5 into main Apr 23, 2026
7 checks passed
@bootjp bootjp deleted the obs/redis-unsupported-command-names branch April 23, 2026 16:13
bootjp added a commit that referenced this pull request Apr 23, 2026
Codex flagged that the "Unsupported Commands by Name" (Redis summary) and
"FSM Apply Sync Mode" (Pebble internals) panels query metrics not yet on
main, so during an incident operators could mistake the empty panels for
a scrape failure. Append a sentence to each panel's description calling
out the dependency and that the panel will populate automatically once
PR #594 / PR #592 merge.
bootjp added a commit that referenced this pull request Apr 23, 2026
…ctor Redis/Cluster dashboards (#597)

## Summary

- Add **Elastickv Pebble Internals** dashboard covering block cache hit
rate/capacity, L0 pressure, compactions, memtables, FSM apply sync mode,
and store write conflicts.
- Add **Unsupported Commands by Name** panel to the Redis dashboard
(driven by PR #594 `elastickv_redis_unsupported_commands_total`) and
annotate the legacy **Errors by Command** panel so operators know the
`unknown` bucket is broken down by real name in the new panel.
- Fold the standalone Redis hot-path (PR #560) dashboard into a
collapsed row at the bottom of the Redis summary dashboard, delete the
standalone file, and retitle the summary to **Elastickv Redis** so there
is a single Redis-facing dashboard.
- Rename the misnamed `elastickv-cluster-summary.json` (which was
already a pure DynamoDB dashboard) to `elastickv-dynamodb.json` /
**Elastickv DynamoDB** with `uid=elastickv-dynamodb`, and retitle
`elastickv-cluster-overview.json` to **Elastickv Cluster** now that it
is the sole cluster landing page.

## Before / After file list

| State | File |
| --- | --- |
| Added |
`monitoring/grafana/dashboards/elastickv-pebble-internals.json` |
| Renamed | `elastickv-cluster-summary.json` to
`elastickv-dynamodb.json` |
| Retitled | `elastickv-cluster-overview.json` (title now `Elastickv
Cluster`) |
| Modified | `elastickv-redis-summary.json` (title `Elastickv Redis`;
added Unsupported Commands panel and a collapsed Hot Path row) |
| Deleted | `monitoring/grafana/dashboards/elastickv-redis-hotpath.json`
|
| Unchanged | `monitoring/grafana/dashboards/elastickv-raft-status.json`
|

## What each dashboard now covers

- **Elastickv Cluster** - leader identity, cluster-wide latency/error
posture, per-node Raft health, and an embedded Redis row.
- **Elastickv DynamoDB** - per-operation throughput/latency and
per-table error/volume breakdown for the DynamoDB-compatible API.
- **Elastickv Raft Status** - unchanged: membership, leader changes,
failed proposals, node state, index drift, backlog, leader contact.
- **Elastickv Redis** - per-command throughput/latency/errors,
unsupported-commands breakdown, secondary-write and Lua diagnostics, and
a collapsible `Hot Path (legacy PR #560)` row with the 9 GET-fast-path
panels preserved verbatim.
- **Elastickv Pebble Internals** - block cache hit rate and
size-vs-capacity, hits/misses rate, L0 sublevels/files with write-stall
thresholds, compaction debt/progress/rate, memtable count/size/zombies,
FSM apply sync mode, store write conflicts by prefix.

## Dead-metric warnings

Two metrics referenced by panels in this PR are **not yet present in
`monitoring/*.go` on `main`**:

- `elastickv_redis_unsupported_commands_total` (PR #594) - the spec
asked for a panel targeting this and this PR adds one; it will stay
empty until #594 lands.
- `elastickv_pebble_fsm_apply_sync_mode` (PR #592) - same situation;
panel is in place and will populate once #592 lands.

Both are noted here so reviewers can decide whether to sequence this PR
after the upstream ones.

## Test plan

- [x] `python3 -c "import json; json.load(open('<file>'))"` passes on
every modified dashboard file.
- [x] Every dashboard exposes `schemaVersion`, `panels`, `time`,
`title`, `uid`.
- [x] Panel IDs are unique within each dashboard (including nested
hot-path panels).
- [x] No overlapping gridPos cells in the modified Redis dashboard
(checked with a programmatic sweep).
- [x] `go build ./...` succeeds after updating the four source comments
/ doc page that referenced the deleted hotpath JSON by path.
- [x] `go test ./...` passes, including `./monitoring/...`.
- [ ] Import each of the five JSON files into Grafana (10.x,
schemaVersion 41) via UI; every panel renders without errors and the Hot
Path row expands/collapses cleanly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant