perf(db): drop redundant idx_edges_source and idx_edges_target#122
perf(db): drop redundant idx_edges_source and idx_edges_target#122andreinknv wants to merge 2 commits into
Conversation
Today every PR adding a schema migration claims `CURRENT_SCHEMA_VERSION = next` AND adds an array entry to `migrations: Migration[]` in src/db/migrations.ts. Two PRs both claiming the same version resolve as: "second PR's v4 silently no-ops on existing DBs" — a real silent-data-loss bug class (PR colbymchenry#113's reviewer caught one). After this refactor: Adding a new schema migration: 1. Pick the next free 3-digit prefix (`git ls-files 'src/db/migrations/[0-9]*.ts'` shows what's taken). 2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a `MIGRATION: MigrationModule` (description + up). 3. Add one import line and one entry to `src/db/migrations/index.ts`'s REGISTERED_MODULES array. Two PRs both creating `004-foo.ts` collide on the FILESYSTEM — the maintainer sees it instantly. No more silent skipped migrations. ## What's new - `src/db/migrations/types.ts` — `MigrationModule { description, up }` and `Migration extends MigrationModule { version }`. - `src/db/migrations/002-project-metadata.ts` — extracted v2 body verbatim. - `src/db/migrations/003-lower-name-index.ts` — extracted v3 body verbatim. - `src/db/migrations/index.ts` — central registry. Static-imports each migration, parses the version FROM THE FILENAME (no hand-typed version field that can drift), enforces strict `NNN-kebab-name.ts` shape, validates uniqueness/sort at module load (throws loudly on collision), exposes ALL_MIGRATIONS and CURRENT_SCHEMA_VERSION. - `src/db/migrations.ts` — refactored to a thin runner. Same exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion, runMigrations, needsMigration, getPendingMigrations, getMigrationHistory, Migration type) — every existing import keeps working unchanged. - `__tests__/migrations-registry.test.ts` — 8 invariant tests: registry non-empty, versions unique + strictly ascending, CURRENT_SCHEMA_VERSION matches max, every file matches the strict NNN-kebab-name pattern, no orphan files, no phantom registrations. ## Reviewer pass Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed: 1. Hand-typed `version` field in REGISTERED_MODULES could drift from filename. **Fixed**: removed the version field; registry now parses version from filename via FILENAME_PATTERN regex inside validateRegistered. 2. Filename-pattern test was lenient (allowed 4-digit or 1-digit prefixes). **Fixed**: new "every migration file matches the strict NNN-kebab-name.ts pattern" test catches malformed filenames as orphan-detection-bypassing offenders. 3. `getPendingMigrations` returned `readonly Migration[]`, breaking callers that typed the result as `Migration[]`. **Fixed**: returns a fresh mutable array via `.slice()`. 4. No throw-on-duplicate test for validateRegistered (module evaluation timing). Acknowledged; not added. ## Backward compat Every existing import works unchanged: - `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓ - `import { runMigrations } from './migrations'` ✓ - `import { needsMigration } from './migrations'` ✓ - `import { getMigrationHistory } from './migrations'` ✓ - `import { getPendingMigrations } from './migrations'` — returns mutable Migration[] (preserved) - `Migration` type — re-exported ## Affected open PRs Every migration-touching PR (colbymchenry#102 UNIQUE edges, colbymchenry#105 cochange, colbymchenry#108 perf db, colbymchenry#111 LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113 issue-history, colbymchenry#114 config-refs, colbymchenry#115 sql-refs) currently claims migration v4 and conflicts with each other on `migrations.ts`. After this lands they each become: - 1 new file: `src/db/migrations/<NNN>-<name>.ts` - 2 lines in registry.ts (import + array entry) Conflict shape changes from "next free version + array entry + CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1 new file" + 2-line registry edit. If two PRs target the same NNN, the filesystem collision surfaces immediately — no silent skipped migrations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Merge guide: see #120 for the full backlog merge order. Short version for this PR: ⏳ currently in monolithic |
These two narrow indexes are fully covered by the wider
idx_edges_source_kind and idx_edges_target_kind composite indexes
via SQLite's left-prefix scan. Keeping them costs DB size and bulk-
insert time without giving any query that the kind-prefixed indexes
don't already cover.
Empirical measurements on a 50K-node / 250K-edge synthesized DB
(scripts/spikes/spike-edge-indexes.mjs):
- DB size: -22.2% (34.7 MB -> 27.0 MB)
- Bulk insert: 1.37x faster (590ms -> 431ms)
- Source-only / target-only query latency: no regression
(EXPLAIN: SEARCH edges USING COVERING INDEX
idx_edges_source_kind (source=?))
Ported to the post-colbymchenry#118 file-based-migration form: migration body
lives in src/db/migrations/017-drop-redundant-edge-indexes.ts and
is registered in src/db/migrations/index.ts. CURRENT_SCHEMA_VERSION
is auto-derived from the registry — no constant bump.
Schema: removes the two CREATE INDEX statements on the narrow
indexes; the wider kind-prefixed indexes (which actually serve
queries) stay.
8b445b2 to
8c86079
Compare
|
Closing — the optimization itself is good, but this PR also pulls in the file-based migrations architecture from #118 (the entire Going to land the actual perf win directly on main as a small inline migration. The empirical numbers are persuasive — thanks for benchmarking. |
Both narrow indexes are fully covered by the existing (source, kind) and (target, kind) composites via SQLite's left-prefix scan, so they're dead weight on every write. Empirical measurements (from the spike script in PR #122 on a 50K-node / 250K-edge synthetic DB): - DB size: 34.7 MB → 27.0 MB (-22.2%) - Bulk insert (250K edges): 590ms → 431ms (1.37× faster) - source/target lookup latency: no regression Adds migration v4 to drop both on existing databases; fresh-DB schema no longer creates them. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Drops two redundant secondary indexes from the
edgestable:idx_edges_source— fully covered byidx_edges_source_kindvia SQLite's left-prefix scan.idx_edges_target— fully covered byidx_edges_target_kind.These indexes pre-date the
(source, kind)/(target, kind)composites added later. Once the wider indexes were introduced, the narrow ones became dead weight on every write and every disk page — but they were never removed.Empirical validation
Measured on a synthesized 50K-node / 250K-edge SQLite database that mirrors a realistic mid-size codebase. Reproducer:
scripts/spikes/spike-edge-indexes.mjs.WHERE source = ?latencyWHERE target = ?latencyEXPLAIN output on the stripped DB confirms the kind-prefixed indexes still cover source-only / target-only queries:
Run yourself with
node scripts/spikes/spike-edge-indexes.mjs.Why this is safe
WHERE source = ?idx_edges_sourceidx_edges_source_kind(left-prefix, covering)WHERE target = ?idx_edges_targetidx_edges_target_kind(left-prefix, covering)WHERE source = ? AND kind = ?idx_edges_source_kindidx_edges_source_kind(unchanged)WHERE target = ? AND kind = ?idx_edges_target_kindidx_edges_target_kind(unchanged)SQLite's query planner uses a B-tree's leading column(s) for prefix matches, so the composite indexes are a strict superset of the dropped ones for these queries.
Migration
Adds schema migration v4 (
CURRENT_SCHEMA_VERSION3 → 4):schema.sql).Test plan
npx tsc --noEmitcleannpx vitest run— 382/382 tests pass__tests__/pr19-improvements.test.ts:idx_edges_source/idx_edges_targetFiles changed
src/db/migrations.tsCURRENT_SCHEMA_VERSION3 → 4src/db/schema.sqlCREATE INDEXlines__tests__/foundation.test.ts__tests__/pr19-improvements.test.tsscripts/spikes/spike-edge-indexes.mjs🤖 Generated with Claude Code