feature: slowlog FT.* fix, vector index health metrics, commandstats time-series #111

Merged
KIvanow merged 10 commits into master from feature/vector-and-commandstats-backend on Apr 21, 2026

jamby77 (Collaborator) commented Apr 17, 2026

Project items addressed

This PR addresses these draft items from the project board:

| Item (databaseId) | Status after this PR | Files |
| --- | --- | --- |
| Normalize FT.* commands in COMMANDLOG pattern analyzer (175675683) | Fully delivered | apps/api/src/metrics/slowlog-analyzer.ts, apps/api/src/metrics/__tests__/slowlog-analyzer.spec.ts |
| Poll vector indexes and persist metrics (175680207) | Fully delivered (extends existing VectorSearchService rather than creating a parallel poller) | packages/shared/src/types/vector-index-snapshots.ts, apps/api/src/common/types/metrics.types.ts, apps/api/src/database/parsers/vector-index.parser.ts, apps/api/src/database/parsers/vector-index.parser.spec.ts, apps/api/src/vector-search/vector-search.service.ts, apps/api/src/vector-search/vector-search.module.ts, apps/api/src/vector-search/__tests__/vector-search.service.spec.ts, apps/api/src/prometheus/prometheus.service.ts |
| "Vector / AI" tab in the monitor UI (175680694) | Backend slice only — commandstats time-series that the tab's FT.SEARCH ops/sec and avg-latency charts consume. The tab itself is in PR #112. | apps/api/src/metrics/commandstats-parser.ts, apps/api/src/metrics/__tests__/commandstats-parser.spec.ts, apps/api/src/metrics/commandstats-poller.service.ts, apps/api/src/metrics/__tests__/commandstats-poller.service.spec.ts, apps/api/src/metrics/commandstats.controller.ts, apps/api/src/metrics/__tests__/commandstats.controller.spec.ts, apps/api/src/metrics/metrics.module.ts |

Shared between vector-index-snapshot extension and commandstats (touched by both items because the changes land in the same files):

  • apps/api/src/common/interfaces/storage-port.interface.ts — new StoredCommandStatsSample + CommandStatsHistoryQueryOptions types and three new StoragePort methods, alongside the extended VectorIndexSnapshot shape re-exported from @betterdb/shared
  • apps/api/src/storage/adapters/sqlite.adapter.ts, apps/api/src/storage/adapters/postgres.adapter.ts, apps/api/src/storage/adapters/memory.adapter.ts — new command_stats_samples table plus seven additional columns on vector_index_snapshots; migrations are idempotent (addColumnIfMissing / ADD COLUMN IF NOT EXISTS)
  • apps/api/src/storage/adapters/__tests__/vector-index-snapshots.spec.ts, apps/api/src/storage/adapters/__tests__/commandstats-samples.spec.ts — adapter-level round-trip specs for both features
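The idempotent column migrations mentioned in the list above can be pictured with a small planning helper. This is a minimal sketch, not the adapters' code: `planMissingColumns`, `ColumnSpec`, the snake_case column names, and the SQL types are all assumptions made for illustration. Only the seven field names come from this PR.

```typescript
// Hypothetical shape for one column to add; not taken from the adapters.
interface ColumnSpec {
  column: string;
  sqlType: string;
}

// The seven snapshot fields named in this PR. The snake_case column names
// and SQL types below are assumptions for illustration only.
const EXTENDED_SNAPSHOT_COLUMNS: ColumnSpec[] = [
  { column: "num_records", sqlType: "INTEGER" },
  { column: "num_deleted_docs", sqlType: "INTEGER" },
  { column: "indexing_failures", sqlType: "INTEGER" },
  { column: "indexing_failures_delta", sqlType: "INTEGER" },
  { column: "percent_indexed", sqlType: "REAL" },
  { column: "indexing_state", sqlType: "TEXT" },
  { column: "total_indexing_time", sqlType: "REAL" },
];

// Emits an ALTER statement only for columns that are not present yet,
// so planning against an already-migrated table yields no statements.
function planMissingColumns(
  table: string,
  existingColumns: string[],
  wanted: ColumnSpec[],
): string[] {
  return wanted
    .filter((spec) => existingColumns.indexOf(spec.column) === -1)
    .map((spec) => "ALTER TABLE " + table + " ADD COLUMN " + spec.column + " " + spec.sqlType);
}
```

With SQLite, the existing column list would typically come from `PRAGMA table_info(...)`. Postgres can skip the lookup because `ADD COLUMN IF NOT EXISTS` is already idempotent.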

Not tied to a project item:

  • 816c747 (vector-search.service.ts, __tests__/vector-search.service.spec.ts) — stale-label fix for the four betterdb_vector_index_* gauges when the index list becomes empty; flagged by Cursor Bugbot on this PR
  • cd8e63c, cbe519b, 3f8a52d — housekeeping: spec relocation into metrics/__tests__/, human-readable duration comments on the commandstats poller, and @ApiQuery OpenAPI documentation for the new history endpoint

Summary

Backend work for the upcoming Vector / AI monitor tab, plus an isolated slowlog bug fix.

  • Slowlog FT.* normalization (402c1d6) — exclude FT.* commands from byKeyPrefix aggregation (index names aren't keyspace keys), and sanitize example args with non-UTF-8 content, control characters, or >200 char length to <blob> to keep binary PARAMS data out of the UI
  • VectorIndexInfo extensions (12aee54) — surface numDeletedDocs and totalIndexingTime from FT.INFO (fields are RediSearch-specific; default to 0 on Valkey Search)
  • VectorIndexSnapshot extensions (df72120) — persist numRecords, numDeletedDocs, indexingFailures, indexingFailuresDelta, percentIndexed, indexingState, totalIndexingTime alongside existing fields. Idempotent ALTER migrations across SQLite / Postgres / Memory adapters. Delta is clamped at 0 to handle FT.DROPINDEX + recreate as a fresh baseline
  • Prometheus vector index gauges (e475289) — betterdb_vector_index_docs, _memory_bytes, _indexing_failures, _percent_indexed labeled by connection + index. Stale labels are removed when an index disappears between polls
  • Commandstats delta poller + history endpoint (c68a23c) — new CommandstatsPollerService polls INFO commandstats every 15 s, establishes a baseline on first poll, persists per-command deltas thereafter, and re-baselines on counter resets (current < previous). GET /metrics/commandstats/:command/history serves those deltas so the UI can derive ops/sec and avg latency client-side. New command_stats_samples table across the three adapters
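For readers unfamiliar with the input the poller consumes: each line of the `INFO commandstats` section has the form `cmdstat_<command>:calls=...,usec=...,usec_per_call=...`. The following is a minimal parsing sketch under that assumption. `parseCommandStats` and `CommandStatsEntry` are hypothetical names, and lowercasing the command name is an assumption, not something this PR states.

```typescript
// Hypothetical result shape: cumulative counters for one command.
interface CommandStatsEntry {
  command: string;
  calls: number;
  usec: number;
}

// Parses the text of an INFO commandstats reply into cumulative counters.
// Lines that are not cmdstat_ entries (section header, blanks) are skipped.
function parseCommandStats(info: string): CommandStatsEntry[] {
  const entries: CommandStatsEntry[] = [];
  for (const rawLine of info.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.indexOf("cmdstat_") !== 0) continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    // Assumption: normalize the command name to lowercase.
    const command = line.slice("cmdstat_".length, colon).toLowerCase();
    const fields: Record<string, string> = {};
    for (const pair of line.slice(colon + 1).split(",")) {
      const eq = pair.indexOf("=");
      if (eq > 0) fields[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
    const calls = Number(fields["calls"]);
    const usec = Number(fields["usec"]);
    // Entries missing either counter parse to NaN and are dropped.
    if (isFinite(calls) && isFinite(usec)) entries.push({ command, calls, usec });
  }
  return entries;
}
```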

Why this is one PR

Items A (slowlog fix), B (vector health metrics), and C (commandstats) are structurally different, but they are bundled because they share the same poller/adapter/Prometheus patterns and together unlock the follow-up Vector / AI tab PR without leaving backend-only half-state on master.

Test plan

  • Unit specs for slowlog analyzer, parser, snapshot round-trip, poller delta math, controller query shape — 125 new tests, all green
  • Integration tests (pnpm exec jest) — 1038 pass, 1 pre-existing failure in proprietary/licenses unrelated to these changes
  • Live smoke against Valkey with Search module loaded:
    • /vector-search/indexes/:name returns numDeletedDocs and totalIndexingTime (0 on Valkey Search as expected)
    • Snapshots round-trip all extended fields through SQLite
    • /metrics/commandstats/ft.search/history after 30 real FT.SEARCH calls: { callsDelta: 30, usecDelta: 1776, intervalMs: 15005 }; zero-delta commands are dropped from the batch
    • /prometheus/metrics exposes all four betterdb_vector_index_* gauges with connection + index labels
    • /metrics/slowlog/patterns shows zero FT.* entries in byKeyPrefix; a real FT.SEARCH with PARAMS blob renders as ['FT.SEARCH', 'idx:...', '...', 'DIALECT', '2', 'PARAMS', '2', 'vec', '<blob>']
    • Follow-up manual verification after stale-label fix: FT.DROPINDEX idx:products_vec_flat + 40 s poll → that index's labels disappear from all four betterdb_vector_index_* gauges; the other index's labels remain
    • Counter-reset safety: CONFIG RESETSTAT → next poll writes no new commandstats sample, logs commandstats counter reset on <connection>, re-baselining
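As a worked example of how a client could turn the smoke-test sample above into chart values, the sketch below applies the two derivations named in the follow-up UI commit (callsDelta/intervalSec and usecDelta/callsDelta). `deriveRates` and both interfaces are hypothetical names. Returning `null` latency for a sample with no calls is an assumption.

```typescript
// Field names match the sample shown in the smoke test.
interface DeltaSample {
  callsDelta: number;
  usecDelta: number;
  intervalMs: number;
}

// Hypothetical output shape for a chart point.
interface DerivedRates {
  opsPerSec: number;
  avgLatencyUsec: number | null;
}

function deriveRates(sample: DeltaSample): DerivedRates {
  const intervalSec = sample.intervalMs / 1000;
  return {
    // Guard against a zero-length interval instead of dividing by zero.
    opsPerSec: intervalSec > 0 ? sample.callsDelta / intervalSec : 0,
    // Average latency is undefined when nothing was called (assumption: null).
    avgLatencyUsec: sample.callsDelta > 0 ? sample.usecDelta / sample.callsDelta : null,
  };
}

const smokeRates = deriveRates({ callsDelta: 30, usecDelta: 1776, intervalMs: 15005 });
```

For that sample this works out to roughly 2 ops/sec (30 calls over 15.005 s) and 59.2 µs per call (1776 / 30).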

Note

Medium Risk
Adds new polling services, a new metrics API endpoint, and schema migrations across all storage adapters (new table + altered vector_index_snapshots), which could affect storage compatibility and runtime load if misconfigured.

Overview
Adds a new command-level time-series pipeline: CommandstatsPollerService polls INFO commandstats, computes per-interval deltas (with re-baselining on counter resets), persists samples via new StoragePort methods, and exposes GET /metrics/commandstats/:command/history for querying those deltas.
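The baseline, delta, and re-baseline behaviour described here can be sketched as a pure function over two counter snapshots. This is an illustration under stated assumptions, not the service's code: `diffCounters` and its types are hypothetical, any counter decrease is treated as a reset of the whole snapshot, and a command absent from the previous snapshot is treated as starting from zero. Timing (`intervalMs`) is left out for brevity.

```typescript
interface Counters {
  calls: number;
  usec: number;
}

interface CommandDelta {
  command: string;
  callsDelta: number;
  usecDelta: number;
}

type PollOutcome =
  | { kind: "baseline" }
  | { kind: "reset" }
  | { kind: "deltas"; deltas: CommandDelta[] };

function diffCounters(
  previous: Record<string, Counters> | undefined,
  current: Record<string, Counters>,
): PollOutcome {
  // First poll: nothing to subtract from, so only record a baseline.
  if (previous === undefined) return { kind: "baseline" };
  const deltas: CommandDelta[] = [];
  for (const command of Object.keys(current)) {
    const now = current[command];
    if (now === undefined) continue;
    // Assumption: a command not seen before starts from zero.
    const before = previous[command] ?? { calls: 0, usec: 0 };
    // current < previous means the server counters were reset.
    if (now.calls < before.calls || now.usec < before.usec) return { kind: "reset" };
    const callsDelta = now.calls - before.calls;
    if (callsDelta === 0) continue; // zero-delta commands are dropped
    deltas.push({ command, callsDelta, usecDelta: now.usec - before.usec });
  }
  return { kind: "deltas", deltas };
}
```

On `"baseline"` and `"reset"` the caller would store `current` as the new baseline and persist nothing, which matches the "writes no new commandstats sample" behaviour in the test plan.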

Extends vector index monitoring end-to-end by parsing additional FT.INFO fields (num_deleted_docs, total_indexing_time), persisting richer VectorIndexSnapshot records (including indexing failure deltas), and exporting per-(connection,index) Prometheus gauges with stale-label cleanup.
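The indexing failure delta mentioned here is tracked per (connectionId, indexName) and clamped at 0, per the commit notes. A minimal sketch of that bookkeeping follows. `FailureDeltaTracker` is a hypothetical name, and reporting 0 on the first observation of an index is an assumption.

```typescript
class FailureDeltaTracker {
  // Last observed failure count, keyed by connection and then by index name.
  private lastSeen: Record<string, Record<string, number>> = {};

  // Returns the increase in failures since the previous poll, never negative.
  record(connectionId: string, indexName: string, failures: number): number {
    const perConnection = this.lastSeen[connectionId] ?? {};
    const previous: number | undefined = perConnection[indexName];
    perConnection[indexName] = failures;
    this.lastSeen[connectionId] = perConnection;
    // Assumption: the first observation only sets a baseline.
    if (previous === undefined) return 0;
    // A drop (e.g. FT.DROPINDEX + recreate) is clamped to 0 and becomes
    // the fresh baseline for the next poll.
    return Math.max(0, failures - previous);
  }

  // Mirrors "clean up poller state when a connection is removed".
  forgetConnection(connectionId: string): void {
    delete this.lastSeen[connectionId];
  }
}
```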

Improves slowlog pattern output by excluding FT.* commands from key-prefix aggregation and sanitizing example command arguments (long/binary/control chars replaced with <blob>).
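The two slowlog rules summarized here can be sketched as follows. The 200-character limit, the Unicode replacement character check, the control-character check, and the `<blob>` placeholder come from the commit notes. The function names and the exact control-character range (U+0000 to U+001F plus U+007F) are assumptions.

```typescript
const MAX_ARG_LENGTH = 200;
const BLOB_PLACEHOLDER = "<blob>";

// U+FFFD is what a lossy UTF-8 decode leaves behind for invalid bytes.
// The control-character range used here is an assumption.
const UNSAFE_CHARS = /[\u0000-\u001f\u007f\ufffd]/;

// Replaces a single argument with the placeholder if it looks binary or is too long.
function sanitizeArg(arg: string): string {
  return arg.length > MAX_ARG_LENGTH || UNSAFE_CHARS.test(arg) ? BLOB_PLACEHOLDER : arg;
}

function sanitizeCommand(args: string[]): string[] {
  return args.map(sanitizeArg);
}

// FT.* arguments are index names, not keyspace keys, so they are left out
// of key-prefix grouping.
function countsTowardKeyPrefix(commandName: string): boolean {
  return commandName.toUpperCase().indexOf("FT.") !== 0;
}
```

Applied to an FT.SEARCH whose last argument is a raw vector, the result keeps every readable argument and ends in `<blob>`, the same shape as the example in the test plan.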

Reviewed by Cursor Bugbot for commit 8ec8631. Bugbot is set up for automated code reviews on this repo.

jamby77 added 5 commits April 17, 2026 10:58

- Skip FT.* commands when building byKeyPrefix aggregation (index
  names are not keyspace keys, so prefix grouping is meaningless)
- Replace examples[].fullCommand args with <blob> when they exceed
  200 chars, contain the Unicode replacement character, or contain
  non-printable control characters (binary PARAMS blobs in FT.SEARCH)
- Preserve FT.SEARCH pattern grouping and non-FT prefix aggregation

Extend VectorIndexInfo and parseVectorIndexInfo to surface
numDeletedDocs and totalIndexingTime so they can be persisted for
alerting on deleted-doc growth and indexing progress.

…elta

- Add numRecords, numDeletedDocs, indexingFailures, indexingFailuresDelta,
  percentIndexed, indexingState, totalIndexingTime to VectorIndexSnapshot
- Extend SQLite/Postgres/Memory adapters with schema and idempotent
  ALTER migrations for the new columns
- Compute indexingFailuresDelta per (connectionId, indexName) across
  consecutive polls; clamp negative deltas (e.g. after FT.DROPINDEX) to 0
- Clean up poller state when a connection is removed

- Register betterdb_vector_index_docs, betterdb_vector_index_memory_bytes,
  betterdb_vector_index_indexing_failures, betterdb_vector_index_percent_indexed
  gauges labeled by connection and index
- Track current index labels per connection and remove stale labels
  when an index disappears between polls
- Call updateVectorIndexMetrics from VectorSearchService.pollConnection
  after persisting snapshots

- Parse INFO commandstats section into per-command calls/usec samples
- CommandstatsPollerService polls every 15s, establishes a baseline
  on the first poll, persists deltas thereafter, and re-baselines on
  counter reset (current < previous)
- New command_stats_samples table + saveCommandStatsSamples /
  getCommandStatsHistory / pruneOldCommandStatsSamples on StoragePort
  with SQLite, Postgres, and Memory adapter implementations
- GET /metrics/commandstats/:command/history returns delta samples
  within a time window; client-side derives ops/sec and avg latency

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit c68a23c.

Comment thread apps/api/src/vector-search/vector-search.service.ts
pollConnection previously returned early when getVectorIndexList
returned an empty array or every getVectorIndexInfo call failed,
leaving previously-registered Prometheus gauge labels indefinitely
stale. Call updateVectorIndexMetrics with an empty array before the
early returns so the label-reconciliation path runs.

Reported by Cursor Bugbot on PR #111.
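
The label-reconciliation path this fix relies on can be sketched as below: remember which index labels were exported for a connection, and on every update, including one with an empty index list, remove any that are no longer present. This is an illustration only. `GaugeLike`, `InMemoryGauge`, and `VectorIndexGaugeUpdater` are hypothetical stand-ins, not the project's Prometheus service or a real metrics client API.

```typescript
interface IndexLabels {
  connection: string;
  index: string;
}

// Minimal gauge surface assumed for this sketch.
interface GaugeLike {
  set(labels: IndexLabels, value: number): void;
  remove(labels: IndexLabels): void;
}

// In-memory stand-in so the sketch runs without a metrics library.
class InMemoryGauge implements GaugeLike {
  series: Record<string, number> = {};
  set(labels: IndexLabels, value: number): void {
    this.series[labels.connection + "|" + labels.index] = value;
  }
  remove(labels: IndexLabels): void {
    delete this.series[labels.connection + "|" + labels.index];
  }
}

interface IndexDocs {
  indexName: string;
  numDocs: number;
}

class VectorIndexGaugeUpdater {
  private gauge: GaugeLike;
  // Index names exported on the previous update, per connection.
  private known: Record<string, string[]> = {};

  constructor(gauge: GaugeLike) {
    this.gauge = gauge;
  }

  update(connection: string, indexes: IndexDocs[]): void {
    const current = indexes.map((entry) => entry.indexName);
    const previous = this.known[connection] ?? [];
    // Remove labels for indexes that disappeared since the last update.
    for (const stale of previous) {
      if (current.indexOf(stale) === -1) this.gauge.remove({ connection, index: stale });
    }
    for (const entry of indexes) {
      this.gauge.set({ connection, index: entry.indexName }, entry.numDocs);
    }
    this.known[connection] = current;
  }
}
```

The key point is that `update(connection, [])` still runs the removal loop, which is the path the early returns used to skip.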
jamby77 added a commit that referenced this pull request Apr 17, 2026
- New /vector-ai route gated on hasVectorSearch capability
- Sidebar nav entry alongside existing Vector Search tab
- FT.SEARCH ops/sec and avg latency charts from commandstats
  delta samples (client-side derivation: callsDelta/intervalSec,
  usecDelta/callsDelta)
- Index health table with destructive styling for indexes that
  have hash_indexing_failures, indexing badge with percent progress
- Alert strip surfacing failures, backfilling, and accumulating
  deleted docs
- Extends web-side VectorIndexInfo with numDeletedDocs and
  totalIndexingTime to match backend shape from PR #111
jamby77 added 3 commits April 17, 2026 14:50
Matches the convention used by the other specs in the metrics module
(slowlog-analyzer, commandstats-poller.service, commandstats.controller).
Surfaces from/to/limit semantics (ms timestamps, defaults, caps) in
the generated OpenAPI spec and Swagger UI.
@jamby77 jamby77 requested a review from KIvanow April 17, 2026 12:10
Add the four betterdb_vector_index_* gauges introduced in
e475289 to docs/prometheus-metrics.md so Prometheus-first
readers can discover them without reading the Vector / AI
feature guide.
@KIvanow KIvanow merged commit d56100d into master Apr 21, 2026
3 checks passed
@KIvanow KIvanow deleted the feature/vector-and-commandstats-backend branch April 21, 2026 11:30
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 21, 2026