perf: fix slow /api/packets and /api/channels on large stores #328
KpaBap merged 3 commits into Kpa-clawbot:master
MeshCore PR Review: #328

As the MeshCore PR Reviewer, I have analyzed this pull request based on the project's architectural guidelines and best practices.

Summary

The performance diagnosis is solid: the old … However, this PR cannot be merged as-is because it violates Rule 1 (no tests) and has a cache staleness bug that needs a one-line fix.

Changes Requested

1. No tests: Rule 1 violation (
- Add LatestSeen field to StoreTx, maintained in all three observation write paths (load, real-time ingest, poll). Eliminates the per-packet observation scan that was O(total_packets * avg_observations).
- Build grouped packet maps under read lock (correct), sort the local copy outside the lock (avoids holding lock during O(n log n) sort).
- Cache the full sorted result for 3 seconds keyed by filter params. Repeated requests within the TTL return instantly without re-sorting.

Fixes /packets?limit=50000&groupByHash=true taking 16s on large stores.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GetChannels was iterating all payload-type-5 packets and JSON-unmarshaling each one while holding s.mu.RLock(), blocking all concurrent reads. On stores with many channel messages this caused /api/channels to take 13s+.

- Copy only the needed fields under the read lock, release before unmarshal
- Cache the result for 15 seconds keyed by region param
- Invalidate cache on new packet ingestion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
f80974b to 3bf1df5
…channel cache

- TestLatestSeenMaintained: verifies StoreTx.LatestSeen is set >= FirstSeen and >= all observation timestamps after store load
- TestQueryGroupedPacketsSortedByLatest: verifies packets with more-recent observations sort before packets with newer first_seen but older observations
- TestQueryGroupedPacketsCacheReturnsConsistentResult: verifies cache returns consistent total and ordering on back-to-back calls
- TestGetChannelsCacheReturnsConsistentResult: verifies GetChannels cache returns same channel names on repeated calls
- TestGetChannelsNotBlockedByLargeLock: verifies GetChannels returns correct data (channel name, messageCount) after lock-copy-unmarshal refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review feedback addressed (commit `026e0ac`)

Added tests per AGENTS.md Rule #1:

All pass:
Kpa-clawbot left a comment
Code Review: perf fix for grouped packets and channels
Tests: All pass (go test ./... — 3.75s) ✅
What was reviewed
Read the full diff for store.go (+360/-79) and routes_test.go (+206) against master.
Algorithmic changes — Correct ✅
LatestSeen maintenance: Added to all three write paths (Load, IngestNewFromDB, IngestNewObservations) with consistent if obs.Timestamp > tx.LatestSeen logic. Initialized to FirstSeen on creation. This eliminates the O(observations) scan per packet at query time. The invariant (LatestSeen ≥ FirstSeen, LatestSeen ≥ max(obs.Timestamp)) is well-tested.
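As a rough illustration of the incremental maintenance described above, here is a minimal sketch. The `StoreTx` and `Observation` types are simplified stand-ins, and `newStoreTx` and `addObservation` are hypothetical helper names rather than functions from the PR. It also assumes timestamps are strings in a single fixed-width format, so that string comparison matches chronological order.

```go
package main

// Observation and StoreTx are simplified stand-ins for the store's types;
// only the fields needed for this sketch are included.
type Observation struct {
	Timestamp string // assumed fixed-width UTC, e.g. "2024-01-02T03:04:05Z"
}

type StoreTx struct {
	Hash         string
	FirstSeen    string
	LatestSeen   string
	Observations []Observation
}

// newStoreTx (hypothetical) initializes LatestSeen to FirstSeen so that a
// packet with no observations still has a usable sort key.
func newStoreTx(hash, firstSeen string) *StoreTx {
	return &StoreTx{Hash: hash, FirstSeen: firstSeen, LatestSeen: firstSeen}
}

// addObservation (hypothetical) appends obs and bumps LatestSeen when obs is
// newer. Comparing strings is only valid under the fixed-format assumption.
func (tx *StoreTx) addObservation(obs Observation) {
	tx.Observations = append(tx.Observations, obs)
	if obs.Timestamp > tx.LatestSeen {
		tx.LatestSeen = obs.Timestamp
	}
}
```

Each write costs one comparison, so the query path can read `LatestSeen` directly instead of walking every observation.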
Lock scope reduction: Both QueryGroupedPackets and GetChannels now copy data under the read lock and do expensive work (sort / JSON unmarshal) outside it. This is the right pattern — no correctness issues since the copied data is owned by the goroutine after unlock.
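The copy-then-work pattern can be sketched as follows. This is an illustration only: the `store` and `packet` types, the `DecodedJSON` field, the `channelCounts` function, and the `"channel"` JSON key are all assumptions made for the example, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"sync"
)

// packet and store are hypothetical, trimmed-down types for this sketch.
type packet struct {
	PayloadType int
	DecodedJSON string
}

type store struct {
	mu      sync.RWMutex
	packets []packet
}

// channelCounts counts messages per channel name.
func (s *store) channelCounts() map[string]int {
	// Phase 1: under the read lock, copy only the strings we need.
	// Go strings are immutable, so the copies are safe to use after unlock.
	s.mu.RLock()
	raw := make([]string, 0, len(s.packets))
	for _, p := range s.packets {
		if p.PayloadType == 5 {
			raw = append(raw, p.DecodedJSON)
		}
	}
	s.mu.RUnlock()

	// Phase 2: the expensive JSON decode runs with no lock held.
	counts := make(map[string]int)
	for _, r := range raw {
		var msg struct {
			Channel string `json:"channel"` // assumed field name
		}
		if err := json.Unmarshal([]byte(r), &msg); err != nil || msg.Channel == "" {
			continue // skip malformed or unnamed entries
		}
		counts[msg.Channel]++
	}
	return counts
}
```

The lock is held only for a linear copy of string headers, so writers and other readers are delayed for a small fraction of the total request time.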
Caching: Grouped packets get a 3s TTL cache keyed by all filter dimensions. Channels get a 15s TTL cache invalidated on ingestion. Both are sensible choices — grouped cache's short TTL makes explicit invalidation unnecessary, and channels cache IS invalidated on new packet ingestion in both IngestNewFromDB and IngestNewObservations.
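A TTL cache with optional explicit invalidation, along the lines described above, might look like the sketch below. The 3s and 15s values come from the review text; the `ttlCache` type, its method names, and the use of generics (Go 1.18+) are assumptions for illustration.

```go
package main

import (
	"sync"
	"time"
)

// TTLs taken from the review text; everything else here is illustrative.
const (
	groupedTTL  = 3 * time.Second
	channelsTTL = 15 * time.Second
)

type cacheEntry[V any] struct {
	value   V
	expires time.Time
}

// ttlCache has its own mutex, separate from the store's data lock.
type ttlCache[V any] struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry[V]
}

func newTTLCache[V any](ttl time.Duration) *ttlCache[V] {
	return &ttlCache[V]{ttl: ttl, entries: make(map[string]cacheEntry[V])}
}

// get returns the cached value if present and not yet expired.
func (c *ttlCache[V]) get(key string) (V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || !time.Now().Before(e.expires) {
		delete(c.entries, key) // no-op when the key is absent
		var zero V
		return zero, false
	}
	return e.value, true
}

// set stores v under key with the cache's TTL.
func (c *ttlCache[V]) set(key string, v V) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cacheEntry[V]{value: v, expires: time.Now().Add(c.ttl)}
}

// invalidate drops every entry; an ingest path would call this.
func (c *ttlCache[V]) invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = make(map[string]cacheEntry[V])
}
```

With a short TTL such as `groupedTTL`, staleness is bounded by the TTL itself, so calling `invalidate` on ingest is optional; with the longer `channelsTTL`, explicit invalidation keeps new messages from being hidden for up to 15 seconds.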
Edge cases — Handled ✅
- Empty store: filterPackets returns empty slice → empty entries → cache stores empty result → pagination returns {Packets: [], Total: 0}
- Single packet with no observations: LatestSeen = FirstSeen (set at creation), sort works fine
- Offset beyond total: pagePacketResult returns empty slice with correct total
- Cache key covers all filter dimensions including optional pointer fields (Type, Route)
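Two of the edge cases above, the cache key over optional pointer fields and the out-of-range offset, can be sketched like this. The `query` struct, `cacheKey`, and `page` are hypothetical names chosen for the example; the real `pagePacketResult` and key format are not shown in this PR page.

```go
package main

import (
	"fmt"
	"strconv"
)

// query is a hypothetical filter struct with two optional fields.
type query struct {
	Type   *int
	Route  *int
	Region string
}

// optInt renders a nil pointer as "-" so that "unset" and "0" yield
// different cache keys.
func optInt(p *int) string {
	if p == nil {
		return "-"
	}
	return strconv.Itoa(*p)
}

// cacheKey folds every filter dimension into one string.
func cacheKey(q query) string {
	return fmt.Sprintf("type=%s|route=%s|region=%q", optInt(q.Type), optInt(q.Route), q.Region)
}

// page returns items[offset:offset+limit] clamped to the slice bounds.
// An out-of-range offset yields an empty, non-nil slice, which
// encoding/json renders as [] rather than null.
func page[T any](items []T, offset, limit int) []T {
	if offset < 0 {
		offset = 0
	}
	if limit < 0 {
		limit = 0
	}
	if offset >= len(items) {
		return []T{}
	}
	end := offset + limit
	if end > len(items) || end < offset { // second test guards int overflow
		end = len(items)
	}
	return items[offset:end]
}
```

Dereferencing the pointers matters: formatting a `*int` directly would embed its address, so two identical filters would never share a cache entry.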
One minor observation (non-blocking)
pagePacketResult returns a sub-slice of the cached PacketResult.Packets. Since callers only JSON-serialize the result, there's no mutation risk. But if a future caller ever modified a returned map in-place, it would corrupt the cache. A comment noting this shared-reference would be defensive, but it's fine as-is given the current usage.
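If a future caller did need to mutate a returned page, one option would be to copy before handing it out. The sketch below assumes packets are represented as `map[string]any` values (the review only says "map") and uses a hypothetical `clonePage` helper; it copies one level deep, so nested maps or slices would still be shared.

```go
package main

// clonePage (hypothetical) returns a copy of page whose maps can be
// modified without touching the cached originals. Values inside each map
// are copied shallowly.
func clonePage(page []map[string]any) []map[string]any {
	out := make([]map[string]any, len(page))
	for i, m := range page {
		cp := make(map[string]any, len(m))
		for k, v := range m {
			cp[k] = v
		}
		out[i] = cp
	}
	return out
}
```

The copy costs an allocation per packet on every request, which is likely why returning the shared sub-slice is acceptable while callers only serialize the result.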
Test coverage — Good ✅
5 new tests covering:
- TestLatestSeenMaintained: invariant after Load
- TestQueryGroupedPacketsSortedByLatest: sort order correctness (old first_seen with recent obs sorts before new first_seen with old obs)
- TestQueryGroupedPacketsCacheReturnsConsistentResult: cache consistency
- TestGetChannelsCacheReturnsConsistentResult: cache consistency
- TestGetChannelsNotBlockedByLargeLock: lock-copy pattern produces correct results
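The sort-order property checked by TestQueryGroupedPacketsSortedByLatest can be illustrated with a small sketch. The `groupedPacket` type and `sortByLatestSeen` function are hypothetical, and timestamps are again assumed to be fixed-format strings that compare chronologically.

```go
package main

import "sort"

// groupedPacket is a hypothetical, trimmed-down result row.
type groupedPacket struct {
	Hash       string
	FirstSeen  string
	LatestSeen string
}

// sortByLatestSeen returns a copy of in ordered newest-first by LatestSeen.
// Sorting a copy leaves the caller's slice untouched, which is what lets
// the sort run outside the store's read lock.
func sortByLatestSeen(in []groupedPacket) []groupedPacket {
	out := make([]groupedPacket, len(in))
	copy(out, in)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].LatestSeen > out[j].LatestSeen
	})
	return out
}
```

A packet first seen long ago but observed again recently therefore sorts ahead of a packet that was first seen later but has had no newer observations.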
Code style — Consistent ✅
Follows existing patterns (separate cache mutex, same naming conventions, same map building style).
LGTM — clean performance fix with good test coverage.
Problem
Two endpoints were slow on larger installations:
- /packets?limit=50000&groupByHash=true: 16s+. QueryGroupedPackets did two expensive things on every request:
  1. Scanned each packet's observations to find its latest timestamp
  2. Held s.mu.RLock() during the O(n log n) sort, blocking all concurrent reads
- /channels: 13s+. GetChannels iterated all payload-type-5 packets and JSON-unmarshaled each one while holding s.mu.RLock(), blocking all concurrent reads for the full duration.

Fix

Packets (QueryGroupedPackets):
- Added LatestSeen string to StoreTx, maintained incrementally in all three observation write paths. Eliminates the per-packet observation scan at query time.

Channels (GetChannels):

Test plan

- [SLOW API] warnings gone for both endpoints

🤖 Generated with Claude Code