
feat: add persistence for latency events and memory stats #42

Merged
KIvanow merged 11 commits into master from improved-persistance on Mar 9, 2026

Conversation

@KIvanow
Member

KIvanow commented Mar 8, 2026

  • Add storage, polling, API endpoints, and frontend support for two new persisted data sources: latency snapshots (LATENCY LATEST) and memory snapshots (MEMORY STATS). Both poll at 60s intervals and support 7-day retention via the existing data-retention mechanism.

Note

Medium Risk
Adds new background pollers plus new DB tables/queries across Postgres/SQLite/in-memory storage, so schema/migration and retention behavior could impact production data and performance if incorrect.

Overview
Adds persisted latency and memory analytics to the API and UI.

On the backend, introduces LatencyAnalyticsModule and MemoryAnalyticsModule with 60s multi-connection pollers that store LATENCY LATEST snapshots + LATENCY HISTOGRAM data and periodic memory snapshots (including ops/sec and derived CPU deltas), exposes new read endpoints (/latency-analytics/*, /memory-analytics/snapshots), and extends StoragePort plus the Postgres/SQLite/memory adapters (new tables + queries + pruning; also drops legacy unique constraints for slow/command logs).

On the frontend, adds API clients/types and a date-range filter that switches Dashboard and Latency views from live polling to fetching stored snapshots/histograms; also fixes time-filter refetching to be connection-aware in SlowLog.

Written by Cursor Bugbot for commit e015d58.

 - Add storage, polling, API endpoints, and frontend support for two new persisted data sources: latency snapshots (LATENCY LATEST) and memory snapshots (MEMORY STATS). Both poll at 60s intervals and support 7-day retention via the existing data-retention mechanism.
…unit tests

 - Extend MemoryStats interface with optional fields (usedMemoryRss, memFragmentationRatio, maxmemory, allocatorFragRatio) to eliminate unsafe as any casts. Add parseOptionalInt to both analytics controllers to reject non-numeric query params with 400 errors.
 - Hydrate latency dedup state from stored snapshots on startup to prevent duplicate insertions after restart. Add 33 unit tests covering both services and controllers.
…x memory chart

- Fix memory chart showing zeros by switching from MEMORY STATS (broken
  dotted-key access) to INFO memory for all fields.
- Fix multi-section INFO calls by spreading args instead of joining.
- Extend memory snapshots with opsPerSec, cpuSys, cpuUser; wire up
  OpsChart and CpuChart to use stored data when date-filtered.
- Add latency histogram persistence (new table + adapters) so command
  latency charts populate from stored data when filtering.
- Add currentConnection?.id to useEffect deps for proper refetch on
  connection change.
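The "spreading args instead of joining" fix above comes down to how the INFO command's arguments are assembled before being sent to Redis; a minimal sketch of the difference (the `sendCommand`-style string-array shape here is a stand-in, not this project's actual client API):

```typescript
// Redis 7+ accepts multiple INFO sections as separate arguments.
// Joining them into one string produces a single argument that
// matches no section name, so Redis falls back to its default output.
type Command = string[];

function infoJoined(sections: string[]): Command {
  // BROKEN: sends one argument, "memory stats cpu"
  return ['INFO', sections.join(' ')];
}

function infoSpread(sections: string[]): Command {
  // CORRECT: each section travels as its own argument
  return ['INFO', ...sections];
}

console.log(infoJoined(['memory', 'stats', 'cpu'])); // 2 args total
console.log(infoSpread(['memory', 'stats', 'cpu'])); // 4 args total
```

The same spread-vs-join distinction applies regardless of which Redis client library is in use, since all of them ultimately serialize an argument array into the RESP protocol.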

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Int, guard latency polling with RuntimeCapabilityTracker
…ored data on connection switch, add histogram tests
```ts
      this.storage.pruneOldLatencyHistograms(cutoffTimestamp, connectionId),
    ]);
    return snapshots + histograms;
  }
```

Latency service pruneOldEntries returns sum but tests expect snapshot-only count

Low Severity

LatencyAnalyticsService.pruneOldEntries returns snapshots + histograms (combined count from both prune operations), but data-retention.service.ts already calls pruneOldLatencySnapshots and pruneOldLatencyHistograms separately on the storage layer. This means the data-retention service's latency_snapshots and latency_histograms entries in pruneOps won't use pruneOldEntries at all — they call storage directly. The pruneOldEntries method on the service is effectively unused by the retention system.
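One way to make the service-level prune method unambiguous for both callers and tests would be to return per-table counts rather than a combined sum; a minimal sketch, with the storage interface simplified to a hypothetical stand-in for the real StoragePort methods:

```typescript
// Sketch: prune both latency tables in parallel but report the counts
// separately, so tests can assert on snapshots and histograms
// independently instead of on an opaque sum.
interface LatencyStorage {
  pruneOldLatencySnapshots(cutoff: number): Promise<number>;
  pruneOldLatencyHistograms(cutoff: number): Promise<number>;
}

async function pruneOldEntries(
  storage: LatencyStorage,
  retentionMs: number, // e.g. 7 days for the existing retention policy
): Promise<{ snapshots: number; histograms: number }> {
  const cutoff = Date.now() - retentionMs;
  const [snapshots, histograms] = await Promise.all([
    storage.pruneOldLatencySnapshots(cutoff),
    storage.pruneOldLatencyHistograms(cutoff),
  ]);
  return { snapshots, histograms };
}
```

Alternatively, since the retention service already calls the two storage methods directly, the unused service method could simply be removed.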



…ined for API query params, fix pruneOldEntries test

```ts
protected async pollConnection(ctx: ConnectionContext): Promise<void> {
  try {
    const info = await ctx.client.getInfoParsed();
```

Memory polling fetches all INFO sections unnecessarily

Low Severity

pollConnection calls ctx.client.getInfoParsed() without specifying sections, which fetches the entire Redis INFO output (server, clients, memory, persistence, stats, replication, cpu, modules, keyspace, cluster, commandstats, errorstats, latencystats). Only memory, stats, and cpu are used. Since this runs every 60 seconds for every connection, it adds unnecessary network and Redis overhead. Passing ['memory', 'stats', 'cpu'] to getInfoParsed would reduce the response size significantly.
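For context on why limiting sections shrinks the work, the raw INFO reply is a flat text document of `# Section` headers and `key:value` lines, and only three of its sections feed the memory snapshot. A self-contained sketch of parsing that format (the project's own `getInfoParsed` may differ in shape):

```typescript
// Parse Redis INFO output into { section: { key: value } }.
// The input format ("# Section" headers, "key:value" lines) is the
// standard INFO reply; fetching only ['memory', 'stats', 'cpu']
// yields a much smaller document with the same usable fields.
function parseInfo(raw: string): Record<string, Record<string, string>> {
  const result: Record<string, Record<string, string>> = {};
  let section = '';
  for (const line of raw.split('\n')) {
    const trimmed = line.trim(); // also strips the \r of CRLF replies
    if (!trimmed) continue;
    if (trimmed.startsWith('#')) {
      section = trimmed.slice(1).trim().toLowerCase();
      result[section] = {};
      continue;
    }
    const idx = trimmed.indexOf(':');
    if (idx > 0 && section) {
      result[section][trimmed.slice(0, idx)] = trimmed.slice(idx + 1);
    }
  }
  return result;
}

const sample = '# Memory\nused_memory:1024\n\n# CPU\nused_cpu_sys:1.5\n';
const parsed = parseInfo(sample);
console.log(parsed.memory.used_memory); // '1024'
```

Dotted keys such as `mem_fragmentation_ratio` arrive as plain top-level keys within their section, which is consistent with the PR's switch away from MEMORY STATS "dotted-key access."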


…nstead of relying on DEFAULT gen_random_uuid(). This makes them consistent with the SQLite and memory adapters.

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

```ts
    }
  } catch (error) {
    this.logger.error(`Error capturing latency histogram for ${ctx.connectionName}: ${error instanceof Error ? error.message : 'Unknown error'}`);
  }
```

Histogram errors log at ERROR level every poll cycle

Medium Severity

The getLatencyHistogram() call (Redis 7+ only via LATENCY HISTOGRAM) catches errors and logs at logger.error level, but unlike the getLatestLatencyEvents handler, it never calls runtimeCapabilityTracker.recordFailure. For Redis/Valkey instances pre-7.0 that support LATENCY LATEST but not LATENCY HISTOGRAM, this produces an error-level log message every 60 seconds indefinitely, since the capability is never disabled. The events handler correctly integrates with the tracker to eventually suppress polling after repeated failures, but the histogram handler lacks this same mechanism.
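The suppression pattern the events handler already uses can be sketched as a small failure counter: after N consecutive failures the capability is treated as unsupported and polling skips it. This is a simplified stand-in for the project's actual RuntimeCapabilityTracker, whose API may differ:

```typescript
// Sketch: track consecutive failures per capability; once a threshold
// is crossed, isSupported() returns false and the poller can skip the
// command (and downgrade its logging) instead of erroring every 60s.
class CapabilityTracker {
  private failures = new Map<string, number>();

  constructor(private readonly maxFailures = 3) {}

  recordFailure(capability: string): void {
    this.failures.set(capability, (this.failures.get(capability) ?? 0) + 1);
  }

  recordSuccess(capability: string): void {
    // Any success resets the counter, so transient errors don't
    // permanently disable a capability the server actually supports.
    this.failures.delete(capability);
  }

  isSupported(capability: string): boolean {
    return (this.failures.get(capability) ?? 0) < this.maxFailures;
  }
}

const tracker = new CapabilityTracker(3);
tracker.recordFailure('latency-histogram');
tracker.recordFailure('latency-histogram');
console.log(tracker.isSupported('latency-histogram')); // true
tracker.recordFailure('latency-histogram');
console.log(tracker.isSupported('latency-histogram')); // false
```

Wiring the histogram handler's catch block to `recordFailure` (and gating the poll on `isSupported`) would give pre-7.0 Redis/Valkey servers the same graceful degradation the LATENCY LATEST handler already has.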


@KIvanow KIvanow merged commit 1b95ebf into master Mar 9, 2026
3 checks passed
@KIvanow KIvanow deleted the improved-persistance branch March 9, 2026 08:52
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2026
