fix(node-ui): drop FTS5 log index that bloated node-ui.db to 9 GB #687
Closed
branarakic wants to merge 5 commits into
Production incident, rc.10 → rc.11 boundary (May 2026): a 12-day-old
testnet edge node accumulated a 9 GB node-ui.db, ~7 GB of which was
the FTS5 shadow tables (logs_fts_data/_idx/_docsize/_config) backing
free-text log search. SQLite eventually returned
"database disk image is malformed" on boot and the daemon refused to
start; recovery required moving node-ui.db aside and starting over
with a fresh DB.
Forensic findings showed the bloat was structural, not operational:
1. /api/logs?q= — the only HTTP consumer of the FTS5 index — had no
production wiring. The dashboard's actual log viewer (LogsTab in
Operations.tsx, PanelBottom live log) reads /api/node-log, which
tails daemon.log directly and supports the same q= substring
filter.
2. fetchLogs() — the client wrapper — was exported from src/ui/api.ts
but never imported by any React component (verified via grep).
Only its own unit test exercised it.
3. StructuredLogger — the "drop-in Logger that also writes to SQLite"
described in SPEC_NODE_DASHBOARD.md — was exported from index.ts
but never substituted for Logger in any daemon code path. The
live log capture went through Logger.setSink → dashDb.insertLog
in lifecycle.ts, which is preserved.
4. prune() ran on a 90-day retention cutoff and never deleted
anything from a 12-day-old DB.
5. An FTS5 index fragments unless `optimize` is run periodically,
and we never ran it.
V15 of DashboardDB cleans this up while preserving the one DB-backed
log feature that *is* in use — per-operation log correlation in
/api/operations/:id (OperationDetail panel) and the failed-ops list,
both served by simple `WHERE operation_id = ?` queries that don't
touch FTS5.
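As a rough illustration of why the per-operation lookup never needed the FTS5 index, the query shape can be sketched as below. The `LogRow` columns, the `QueryDb` driver surface, and the function name `getOperationLogs` are hypothetical, chosen only for this example; the PR text confirms only the `logs` table and the `WHERE operation_id = ?` predicate.

```typescript
// Hypothetical row shape: the real column names of `logs` are not shown in this PR.
interface LogRow {
  ts: number;
  level: string;
  message: string;
  operation_id: string | null;
}

// Hypothetical minimal driver surface, injected so the sketch has no dependencies.
interface QueryDb {
  all(sql: string, params: unknown[]): LogRow[];
}

// Plain equality lookup on operation_id; no MATCH and no logs_fts involved.
const OPERATION_LOGS_SQL =
  "SELECT ts, level, message, operation_id FROM logs " +
  "WHERE operation_id = ? ORDER BY ts ASC";

function getOperationLogs(db: QueryDb, operationId: string): LogRow[] {
  return db.all(OPERATION_LOGS_SQL, [operationId]);
}
```

An ordinary index on `operation_id` would be enough to serve this query, which is presumably why dropping `logs_fts` leaves the feature intact.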
What changes
------------
* DashboardDB SCHEMA_VERSION 14 → 15.
V15 migration: DROP TRIGGER logs_ai, DROP TRIGGER logs_ad,
DROP TABLE logs_fts (drops 4 shadow tables atomically), then a
one-shot VACUUM so existing nodes actually reclaim the GBs.
VACUUM is wrapped in try/catch — it requires an exclusive lock
and we never block startup on disk reclamation.
* DEFAULT_RETENTION_DAYS 90 → 14. Bounds worst-case growth of the
now-FTS5-less logs table to ~150 MB. Operators who want longer
retention can override via setRetentionDays(); the value is
persisted in `settings` and re-read on next boot.
* prune() now VACUUMs whenever it deletes >10k log rows (well above
test-suite noise, well below daily log volume on a busy edge node),
so disk is reclaimed periodically — not only at migration.
* Remove /api/logs HTTP handler.
* Remove DashboardDB.searchLogs() and the private searchLogsFts().
* Remove fetchLogs() client wrapper.
* Remove StructuredLogger class + its export + its test + spec/README
mentions (the class was dead code in production).
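The retention and threshold-VACUUM behaviour described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: the `PruneDb` interface, its method names, and the constant `VACUUM_ROW_THRESHOLD` are hypothetical, and wrapping the VACUUM in try/catch inside `prune()` is an assumption carried over from the migration's behaviour.

```typescript
// Hypothetical driver surface; the real DashboardDB internals are not shown in this PR.
interface PruneDb {
  // Deletes log rows older than cutoffMs and returns the number of rows removed.
  deleteLogsOlderThan(cutoffMs: number): number;
  vacuum(): void;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_RETENTION_DAYS = 14; // was 90 before this PR
const VACUUM_ROW_THRESHOLD = 10000; // hypothetical name for the ">10k rows" rule

function prune(
  db: PruneDb,
  nowMs: number,
  retentionDays: number = DEFAULT_RETENTION_DAYS,
): { deleted: number; vacuumed: boolean } {
  const cutoffMs = nowMs - retentionDays * DAY_MS;
  const deleted = db.deleteLogsOlderThan(cutoffMs);

  let vacuumed = false;
  // Only pay for a VACUUM when enough rows were deleted to make reclaiming worthwhile.
  if (deleted > VACUUM_ROW_THRESHOLD) {
    try {
      db.vacuum();
      vacuumed = true;
    } catch {
      // Assumption: disk reclamation is best-effort and never fails the prune.
    }
  }
  return { deleted, vacuumed };
}
```

With a 90-day cutoff, `deleteLogsOlderThan` would have returned 0 on a 12-day-old DB, which matches forensic finding 4.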
What is preserved
-----------------
* The `logs` table and DashboardDB.insertLog().
* The Logger.setSink → dashDb.insertLog pipeline in lifecycle.ts.
* /api/node-log file-tail endpoint (the actual production log viewer).
* DashboardDB.getOperation() and getFailedOperations() per-operation
log lookup (the one DB-backed log feature with a UI consumer).
Migration safety
----------------
* Fresh installs no longer create logs_fts at any version (V1 schema
CREATE block in this file is the canonical V15 shape).
* In-place upgrades from any version V<15 trigger the cleanup.
* Downgrade-safe in the sense that V14 code reading a V15-migrated DB
will see `user_version = 15` and refuse to start (the existing
`if (version >= SCHEMA_VERSION) return;` guard — never tries to
recreate dropped objects). A rollback requires reverting this PR
*and* manually resetting `user_version` back to 14.
Tests
-----
* New: V14 → V15 migration regression test. Builds a realistic V14
fixture (full schema via DashboardDB, then re-attaches FTS5/triggers
and backfills the index), reopens through DashboardDB, asserts
user_version=15, that all FTS5 objects + both triggers are gone,
that the pre-migration log row survives, and that subsequent
insertLog() does not trip on an orphaned trigger.
* Removed: searchLogs free-text / level / time-range / pagination
test cases (the method is gone; per-operation lookup is still
covered by the operation-detail tests above).
* Removed: structured-logger.test.ts (module deleted).
* Updated: ui-api-pure.test.ts drops fetchLogs import + test +
matching mock-server branch.
* Updated: 3 pre-existing user_version pin assertions bumped 14 → 15.
Storage impact on existing nodes
--------------------------------
* Immediately after upgrade: VACUUM reclaims ~98% of node-ui.db size on
nodes that accumulated the FTS5 bloat (verified on the incident
node: 8.98 GB → ~150 MB after manual VACUUM of an FTS5-stripped
copy).
* Steady-state going forward: logs table bounded at retentionDays *
daily-write-volume (~150 MB at 14d default for an edge node, vs
unbounded growth before).
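The steady-state bound is simple arithmetic, sketched below. The daily write volume is not stated in this PR; the value used here (~10.7 MB/day) is only what the quoted "~150 MB at 14d" figure implies, and the helper name `steadyStateMb` is hypothetical.

```typescript
// Steady-state size of the logs table: retention window times daily write volume.
function steadyStateMb(retentionDays: number, dailyWriteMb: number): number {
  return retentionDays * dailyWriteMb;
}

// Assumption: daily volume back-derived from "~150 MB at 14d" (about 10.7 MB/day).
const impliedDailyMb = 150 / 14;

// Same node at the new and the old default retention.
const atNewDefaultMb = steadyStateMb(14, impliedDailyMb);
const atOldDefaultMb = steadyStateMb(90, impliedDailyMb);
```

Under that assumed rate, the old 90-day default would have let the bare `logs` table grow to roughly 1 GB before the first prune removed anything, even without the FTS5 index.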
Co-authored-by: Cursor <cursoragent@cursor.com>
branarakic commented May 26, 2026
matic031 pushed a commit to KilianTrunk/dkg that referenced this pull request on Jun 2, 2026
Drops the FTS5 log index that bloated node-ui.db to 9 GB. Directly addresses the Miles cleanup root cause (Track 1.3 of the SWM-fanout plan): the 21 GB dashDb on Miles was 2.1M log rows + their FTS5 index, not oxigraph SWM cruft. Vacuum reclaimed 11.7 GB locally; this PR prevents the bloat from recurring on every node.
Incident motivating this PR
rc.10 → rc.11 upgrade on a 12-day-old testnet edge node, May 2026:
* `node-ui.db` had grown to 8.98 GB, of which ~7 GB was the FTS5 shadow tables (`logs_fts_data`/`_idx`/`_docsize`/`_config`) backing free-text log search.
* SQLite eventually returned `database disk image is malformed` on boot and the daemon refused to start.
* Recovery required moving `node-ui.db` aside and starting over with a fresh DB.
Forensic findings
The bloat was structural, not operational. Specifically:
1. `/api/logs?q=`, the only HTTP consumer of the FTS5 index, has no production wiring. The dashboard's actual log viewer (`LogsTab` in `Operations.tsx`, the live log panel in `PanelBottom.tsx`) reads `/api/node-log`, which tails `daemon.log` directly and supports the same `q=` substring filter.
2. `fetchLogs()`, the client wrapper, is exported from `src/ui/api.ts` but imported by zero React components (verified via grep across the entire `src/ui/` tree). Only its own unit test exercised it.
3. `StructuredLogger`, the "drop-in `Logger` that also writes to SQLite" advertised in `SPEC_NODE_DASHBOARD.md`, is exported from `index.ts` but never substituted for `Logger` in any daemon code path. The live log capture goes through `Logger.setSink → dashDb.insertLog` in `lifecycle.ts`, which this PR preserves.
4. `prune()` ran on a 90-day retention cutoff and never deleted anything from a 12-day-old DB.
5. An FTS5 index fragments unless `optimize` is run periodically, which we never did.
What this PR does
V15 of `DashboardDB` removes the dead FTS5 infrastructure while preserving the one DB-backed log feature that is in production use: per-operation log correlation in `/api/operations/:id` (the `OperationDetail` panel) and the failed-ops list, both served by simple `WHERE operation_id = ?` queries that don't touch FTS5.
Schema migration v14 → v15
```sql
DROP TRIGGER IF EXISTS logs_ai;
DROP TRIGGER IF EXISTS logs_ad;
DROP TABLE IF EXISTS logs_fts; -- drops 4 shadow tables atomically
VACUUM; -- one-shot reclaim, try/catch-wrapped
```
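A minimal sketch of how these statements might be driven from the migration code, including the version guard and the try/catch around VACUUM. The `SqlDb` interface, its method names, and `migrateToV15` are hypothetical stand-ins; the actual `DashboardDB` driver calls are not shown in this PR text.

```typescript
// Hypothetical minimal DB surface, injected so the sketch runs without a SQLite driver.
interface SqlDb {
  exec(sql: string): void;
  getUserVersion(): number; // stands in for PRAGMA user_version
  setUserVersion(v: number): void;
}

const SCHEMA_VERSION = 15;

function migrateToV15(db: SqlDb): { migrated: boolean; vacuumed: boolean } {
  // Guard quoted in the PR: skip when the DB is already at or past this schema.
  if (db.getUserVersion() >= SCHEMA_VERSION) {
    return { migrated: false, vacuumed: false };
  }

  // Drop the triggers first so later inserts into `logs` cannot reference logs_fts.
  db.exec("DROP TRIGGER IF EXISTS logs_ai");
  db.exec("DROP TRIGGER IF EXISTS logs_ad");
  // Dropping the FTS5 virtual table also removes its shadow tables.
  db.exec("DROP TABLE IF EXISTS logs_fts");
  db.setUserVersion(SCHEMA_VERSION);

  // VACUUM needs an exclusive lock; a failure here must never block startup.
  try {
    db.exec("VACUUM");
    return { migrated: true, vacuumed: true };
  } catch {
    return { migrated: true, vacuumed: false };
  }
}
```

In this sketch a failed VACUUM still leaves the schema at V15, so the space is simply reclaimed later, for example by the threshold VACUUM in `prune()`.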
Other changes
What is preserved
Tests
Full suite for `packages/node-ui`: 794 passed, 38 skipped, 0 failed.
Storage impact
Migration safety / rollback
Test plan
Made with Cursor