fix(node-ui): drop FTS5 log index that bloated node-ui.db to 9 GB by branarakic · Pull Request #687 · OriginTrail/dkg

branarakic · 2026-05-26T13:02:22Z

Incident motivating this PR

rc.10 → rc.11 upgrade on a 12-day-old testnet edge node, May 2026:

node-ui.db had grown to 8.98 GB, of which ~7 GB was the FTS5 shadow tables (logs_fts_data / _idx / _docsize / _config) backing free-text log search.
SQLite returned "database disk image is malformed" on boot.
The rc.11 daemon refused to start; recovery required moving node-ui.db aside and starting over with a fresh DB.

Forensic findings

The bloat was structural, not operational. Specifically:

/api/logs?q= — the only HTTP consumer of the FTS5 index — has no production wiring. The dashboard's actual log viewer (LogsTab in Operations.tsx, the live log panel in PanelBottom.tsx) reads /api/node-log, which tails daemon.log directly and supports the same q= substring filter.
fetchLogs() — the client wrapper — is exported from src/ui/api.ts but imported by zero React components (verified via grep across the entire src/ui/ tree). Only its own unit test exercised it.
StructuredLogger — the "drop-in Logger that also writes to SQLite" advertised in SPEC_NODE_DASHBOARD.md — is exported from index.ts but never substituted for Logger in any daemon code path. The live log capture goes through Logger.setSink → dashDb.insertLog in lifecycle.ts, which this PR preserves.
prune() ran on a 90-day retention cutoff and never deleted anything from a 12-day-old DB.
The FTS5 index fragments without a periodic optimize, which we never ran.

What this PR does

V15 of DashboardDB removes the dead FTS5 infrastructure while preserving the one DB-backed log feature that is in production use — per-operation log correlation in /api/operations/:id (the OperationDetail panel) and the failed-ops list, both served by simple WHERE operation_id = ? queries that don't touch FTS5.

Schema migration v14 → v15

```sql
DROP TRIGGER IF EXISTS logs_ai;
DROP TRIGGER IF EXISTS logs_ad;
DROP TABLE IF EXISTS logs_fts; -- drops 4 shadow tables atomically
VACUUM; -- one-shot reclaim, try/catch-wrapped
```
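The migration steps above could be wired up roughly as follows. This is a minimal sketch, not the code from this PR: `SqlHandle`, `V15_CLEANUP`, and `migrateToV15` are hypothetical names, and bumping `user_version` before the `VACUUM` is an assumption made so that a failed reclaim cannot leave the schema version behind.

```typescript
// Hypothetical minimal surface of a synchronous SQLite handle.
// The real DashboardDB wrapper is not shown in this PR description.
interface SqlHandle {
  exec(sql: string): void;
}

// The three cleanup statements from the v14 → v15 migration.
const V15_CLEANUP: readonly string[] = [
  "DROP TRIGGER IF EXISTS logs_ai",
  "DROP TRIGGER IF EXISTS logs_ad",
  // Dropping the FTS5 virtual table also removes its shadow tables.
  "DROP TABLE IF EXISTS logs_fts",
];

function migrateToV15(db: SqlHandle): { vacuumed: boolean } {
  for (const stmt of V15_CLEANUP) {
    db.exec(stmt);
  }
  // Assumption: record the new version before attempting the reclaim.
  db.exec("PRAGMA user_version = 15");

  try {
    // VACUUM needs an exclusive lock, so it may fail on a busy database.
    db.exec("VACUUM");
    return { vacuumed: true };
  } catch {
    // Disk reclamation is best-effort; startup must not be blocked by it.
    return { vacuumed: false };
  }
}
```

Returning a flag rather than throwing keeps the caller free to log or retry the reclaim later without treating it as a migration failure.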

Other changes

`DEFAULT_RETENTION_DAYS` `90 → 14`. Bounds worst-case growth of the (now FTS5-less) `logs` table to ~150 MB. Operators can override via `setRetentionDays()`; value persists in `settings`.
`prune()` now `VACUUM`s whenever it deletes >10k log rows, so disk is reclaimed periodically — not only at migration.
Remove `/api/logs` HTTP handler.
Remove `DashboardDB.searchLogs()` and private `searchLogsFts()`.
Remove `fetchLogs()` client wrapper.
Remove `StructuredLogger` class + its export + its test + spec/README mentions.
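The retention and reclaim behaviour described in this list can be sketched as below. The `LogStore` interface and the `prune` signature are hypothetical stand-ins for illustration; only the 14-day default, the 90-day predecessor, and the 10k-row threshold come from this PR.

```typescript
// Hypothetical storage surface; the real DashboardDB methods may differ.
interface LogStore {
  // Deletes log rows older than cutoffMs and returns how many were removed.
  deleteLogsOlderThan(cutoffMs: number): number;
  vacuum(): void;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_RETENTION_DAYS = 14;
const VACUUM_THRESHOLD_ROWS = 10000;

function prune(
  store: LogStore,
  nowMs: number,
  retentionDays: number = DEFAULT_RETENTION_DAYS,
): { deleted: number; vacuumed: boolean } {
  const cutoffMs = nowMs - retentionDays * DAY_MS;
  const deleted = store.deleteLogsOlderThan(cutoffMs);

  // Only pay for a VACUUM when a large batch of rows was actually removed.
  if (deleted > VACUUM_THRESHOLD_ROWS) {
    store.vacuum();
    return { deleted, vacuumed: true };
  }
  return { deleted, vacuumed: false };
}
```

The cutoff arithmetic also shows why the old default never helped on the incident node: with `retentionDays = 90`, the cutoff on a 12-day-old database falls before the oldest row, so nothing is ever deleted.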

What is preserved

The `logs` table itself + `DashboardDB.insertLog()`.
`Logger.setSink → dashDb.insertLog` pipeline in `lifecycle.ts` (unchanged).
`/api/node-log` (file-tail endpoint — the actual production log viewer).
`DashboardDB.getOperation()` and `getFailedOperations()` per-operation log lookup.

Tests

New: V14 → V15 migration regression test. Builds a realistic V14 fixture (full schema via DashboardDB, then re-attaches FTS5 virtual table + both triggers + backfills the index), reopens through DashboardDB, asserts `user_version = 15`, that all FTS5 objects + both triggers are gone, that the pre-migration log row survives, and that subsequent `insertLog()` does not trip on an orphaned trigger.
Removed: `searchLogs` free-text / level / time-range / pagination test cases (method is gone; per-operation lookup is still covered by the existing operation-detail tests).
Removed: `structured-logger.test.ts` (module deleted).
Updated: `ui-api-pure.test.ts` drops `fetchLogs` import + test + matching mock-server branch.
Updated: 3 pre-existing `user_version` pin assertions bumped 14 → 15.

Full suite for `packages/node-ui`: 794 passed, 38 skipped, 0 failed.

Storage impact

Immediately after upgrade: VACUUM reclaims ~99% of `node-ui.db` on nodes that accumulated the FTS5 bloat. Verified on the incident node: 8.98 GB → ~150 MB after manual VACUUM of an FTS5-stripped copy.
Steady-state: `logs` table bounded at `retentionDays * daily-write-volume` (~150 MB at the new 14d default for an edge node, vs unbounded growth before).
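As a back-of-the-envelope check of the steady-state bound, the sketch below applies the `retentionDays * daily-write-volume` formula. The daily write volume is not stated in this PR; the value used here is inferred from the ~150 MB at 14 days figure and assumes linear growth, so treat it as an illustration only.

```typescript
// Assumption: daily volume inferred from "~150 MB at the 14d default".
const ESTIMATED_MB_PER_DAY = 150 / 14;

// Steady-state size of the logs table once prune() keeps up with writes.
function steadyStateLogsMb(retentionDays: number, mbPerDay: number): number {
  return retentionDays * mbPerDay;
}

const atNewDefault = steadyStateLogsMb(14, ESTIMATED_MB_PER_DAY);
const atOldDefault = steadyStateLogsMb(90, ESTIMATED_MB_PER_DAY);
```

Under the same assumption, the previous 90-day default would have let the `logs` table alone approach roughly 964 MB on such a node, before counting any FTS5 overhead.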

Migration safety / rollback

Fresh installs no longer create `logs_fts` at any version — the V1 schema CREATE block is the canonical V15 shape.
In-place upgrades from any `version < 15` trigger the cleanup.
Rollback requires reverting this PR and manually resetting `user_version` back to 14 (the existing `if (version >= SCHEMA_VERSION) return;` guard in `migrate()` would otherwise short-circuit and never recreate the dropped objects).

Test plan

`pnpm --filter @origintrail-official/dkg-node-ui build` clean
`pnpm --filter @origintrail-official/dkg-node-ui test` — 794 passed
`pnpm --filter @origintrail-official/dkg build` clean (CLI transitively consumes node-ui)
`pnpm --filter @origintrail-official/dkg test` — same 2 pre-existing failures in `test/repro-issue-633.test.ts` (EPCIS SPARQL bindings, unrelated; verified identical failures on `origin/main`)
`pnpm --filter @origintrail-official/dkg-node-ui test:e2e` (suggested for reviewer; Playwright wasn't run locally)
Smoke test against a live testnet edge node: open dashboard, click an operation in Operations page, confirm per-op logs render; check Logs tab still tails `daemon.log`.

Made with Cursor

Production incident, rc.10 → rc.11 boundary (May 2026): a 12-day-old testnet edge node accumulated a 9 GB node-ui.db, ~7 GB of which was the FTS5 shadow tables (logs_fts_data/_idx/_docsize/_config) backing free-text log search. SQLite eventually returned "database disk image is malformed" on boot and the daemon refused to start; recovery required moving node-ui.db aside and starting over with a fresh DB.

Forensic findings showed the bloat was structural, not operational:

1. /api/logs?q= (the only HTTP consumer of the FTS5 index) had no production wiring. The dashboard's actual log viewer (LogsTab in Operations.tsx, PanelBottom live log) reads /api/node-log, which tails daemon.log directly and supports the same q= substring filter.
2. fetchLogs() (the client wrapper) was exported from src/ui/api.ts but never imported by any React component (verified via grep). Only its own unit test exercised it.
3. StructuredLogger (the "drop-in Logger that also writes to SQLite" described in SPEC_NODE_DASHBOARD.md) was exported from index.ts but never substituted for Logger in any daemon code path. The live log capture went through Logger.setSink → dashDb.insertLog in lifecycle.ts, which is preserved.
4. prune() ran on a 90-day retention cutoff and never deleted anything from a 12-day-old DB.
5. FTS5 fragments without periodic optimize, which we never called.

V15 of DashboardDB cleans this up while preserving the one DB-backed log feature that *is* in use: per-operation log correlation in /api/operations/:id (OperationDetail panel) and the failed-ops list, both served by simple `WHERE operation_id = ?` queries that don't touch FTS5.

What changes
------------

* DashboardDB SCHEMA_VERSION 14 → 15. V15 migration: DROP TRIGGER logs_ai, DROP TRIGGER logs_ad, DROP TABLE logs_fts (drops 4 shadow tables atomically), then a one-shot VACUUM so existing nodes actually reclaim the GBs. VACUUM is wrapped in try/catch: it requires an exclusive lock and we never block startup on disk reclamation.
* DEFAULT_RETENTION_DAYS 90 → 14. Bounds worst-case growth of the now-FTS5-less logs table to ~150 MB. Operators who want longer retention can override via setRetentionDays(); the value is persisted in `settings` and re-read on next boot.
* prune() now VACUUMs whenever it deletes >10k log rows (well above test-suite noise, well below daily log volume on a busy edge node), so disk is reclaimed periodically, not only at migration.
* Remove /api/logs HTTP handler.
* Remove DashboardDB.searchLogs() and the private searchLogsFts().
* Remove fetchLogs() client wrapper.
* Remove StructuredLogger class + its export + its test + spec/README mentions (the class was dead code in production).

What is preserved
-----------------

* The `logs` table and DashboardDB.insertLog().
* The Logger.setSink → dashDb.insertLog pipeline in lifecycle.ts.
* /api/node-log file-tail endpoint (the actual production log viewer).
* DashboardDB.getOperation() and getFailedOperations() per-operation log lookup (the one DB-backed log feature with a UI consumer).

Migration safety
----------------

* Fresh installs no longer create logs_fts at any version (V1 schema CREATE block in this file is the canonical V15 shape).
* In-place upgrades from any version V<15 trigger the cleanup.
* Downgrade-safe in the sense that V14 code reading a V15-migrated DB will see `user_version = 15` and refuse to start (the existing `if (version >= SCHEMA_VERSION) return;` guard never tries to recreate dropped objects). A rollback requires reverting this PR *and* manually resetting `user_version` back to 14.

Tests
-----

* New: V14 → V15 migration regression test. Builds a realistic V14 fixture (full schema via DashboardDB, then re-attaches FTS5/triggers and backfills the index), reopens through DashboardDB, asserts user_version=15, that all FTS5 objects + both triggers are gone, that the pre-migration log row survives, and that subsequent insertLog() does not trip on an orphaned trigger.
* Removed: searchLogs free-text / level / time-range / pagination test cases (the method is gone; per-operation lookup is still covered by the operation-detail tests above).
* Removed: structured-logger.test.ts (module deleted).
* Updated: ui-api-pure.test.ts drops fetchLogs import + test + matching mock-server branch.
* Updated: 3 pre-existing user_version pin assertions bumped 14 → 15.

Storage impact on existing nodes
--------------------------------

* Immediate after upgrade: VACUUM reclaims ~99% of node-ui.db size on nodes that accumulated the FTS5 bloat (verified on the incident node: 8.98 GB → ~150 MB after manual VACUUM of an FTS5-stripped copy).
* Steady-state going forward: logs table bounded at retentionDays * daily-write-volume (~150 MB at 14d default for an edge node, vs unbounded growth before).

Co-authored-by: Cursor <cursoragent@cursor.com>

branarakic · 2026-05-31T22:19:35Z

Already merged into release/rc.12 in 719ec13; reaches main via #716. Closing as superseded — the open state against main is a side effect of merging into a non-base branch.

Drops the FTS5 log index that bloated node-ui.db to 9 GB. Directly addresses the Miles cleanup root cause (Track 1.3 of the SWM-fanout plan): the 21 GB dashDb on Miles was 2.1M log rows + their FTS5 index, not oxigraph SWM cruft. Vacuum reclaimed 11.7 GB locally; this PR prevents the bloat from recurring on every node.