feat(db): add missing sqlite indices to help with performance#9141
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing database performance by introducing a set of new SQLite indices. These additions aim to optimize common queries across several key tables, ensuring smoother and more efficient data retrieval. A new database migration has been added to seamlessly integrate these schema changes into existing installations. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces several new database indices to improve query performance. The changes are well-structured, including updates to the main schema file, a new database migration for existing installations, and an update to the database version constant. I've found one minor issue regarding a redundant index creation in the new migration, which I've detailed in a specific comment.
eliandoran
left a comment
There was a problem hiding this comment.
@perfectra1n , what's the size impact of these indices?
|
@eliandoran measured on a 787 MB production database after applying migration 235.
Index Sizes
Table Row Counts
|
|
That was using my own personal Trilium DB - if there's another example or something you'd like as an additional data point, just let me know |
eliandoran
left a comment
There was a problem hiding this comment.
@perfectra1n , what's the size impact of these indices?
Adds 7 new indices and replaces 1 existing index to improve query performance on frequently-accessed code paths. It also fixes a bug where
IDX_attachments_utcDateScheduledForErasureSincewas missing fromschema.sqlfor fresh installs.IDX_entity_changes_isSynced_idIDX_entity_changes_isErased_entityNameIDX_notes_isDeleted_utcDateModifiedIDX_branches_isDeleted_utcDateModifiedIDX_attributes_isDeleted_utcDateModifiedIDX_attachments_isDeleted_utcDateModifiedIDX_attachments_utcDateScheduledForErasureSinceIDX_branches_parentNoteId_isDeleted_notePositionIDX_branches_parentNoteIdIDX_entity_changes_isSynced_id, entity_changes (isSynced, id)Impact: High, sync is a core, frequently-executed operation.
The sync push loop is the hottest query path during synchronization. It runs repeatedly in a loop until all changes are pushed, processing 1000 rows at a time. The
entity_changestable grows unbounded (every edit appends a row), so full table scans become increasingly expensive over time.Queries served (5 across 4 files):
services/sync.tsSELECT * FROM entity_changes WHERE isSynced = 1 AND id > ? LIMIT 1000services/sync.tsSELECT EXISTS(SELECT 1 FROM entity_changes WHERE isSynced = 1 AND id > ?)routes/index.tsSELECT COALESCE(MAX(id), 0) FROM entity_changes WHERE isSynced = 1routes/api/sync.tsSELECT COALESCE(MAX(id), 0) FROM entity_changes WHERE isSynced = 1routes/api/login.tsSELECT COALESCE(MAX(id), 0) FROM entity_changes WHERE isSynced = 1Why this column order: The composite
(isSynced, id)allows SQLite to jump directly toisSynced = 1in the B-tree, then range-scan forward fromid > ?. This turns a full table scan + filter into a single contiguous index range scan. TheMAX(id)queries benefit similarly, SQLite can read the last entry in theisSynced = 1segment.IDX_entity_changes_isErased_entityName, entity_changes (isErased, entityName)Impact: Medium, used in consistency checks and entity change maintenance.
The existing unique index
(entityName, entityId)does not help whenisErasedis the leading filter condition. These queries run during consistency checks (periodic integrity validation) and duringfillAllEntityChanges(a repair/rebuild operation).Queries served (4 across 2 files):
services/entity_changes.tsDELETE FROM entity_changes WHERE isErased = 0 AND entityName = '...' AND entityId NOT IN (...)services/entity_changes.tsDELETE FROM entity_changes WHERE isErased = 0services/consistency_checks.tsSELECT ... FROM entity_changes WHERE entity_changes.isErased = 0 AND entity_changes.entityName = '...'services/consistency_checks.tsSELECT ... FROM entity_changes WHERE entity_changes.isErased = 1 AND entity_changes.entityName = '...'3–6.
IDX_{table}_isDeleted_utcDateModified, (isDeleted, utcDateModified) on notes, branches, attributes, attachmentsImpact: Medium-High, periodic cleanup that scans all four core tables.
The
eraseDeletedEntities()function inservices/erase.tsruns every 4 hours and once at startup (after a 5 minute delay). It finds soft-deleted entities older than a configurable threshold and permanently erases them. Without indices, each of these four queries performs a full table scan.Since
isDeleted = 1typically represents a tiny fraction of rows, the index is highly selective, SQLite jumps to the deleted rows in the B-tree, then range-scansutcDateModified <= ?within that subset.Queries served (4 in
services/erase.tslines 126–135):Note: These indices also partially serve the
eraseNotesWithDeleteId()queries (WHERE isDeleted = 1 AND deleteId = ?inservices/erase.tslines 143–152). TheisDeleted = 1prefix narrows to the small deleted set, making the subsequentdeleteIdfilter cheap even without a dedicated(isDeleted, deleteId)composite.IDX_attachments_utcDateScheduledForErasureSince, Bug Fix (schema.sql only)Impact: Correctness fix, this index was missing for fresh database installations.
Migration 219 (which introduced the attachments table) created this index for databases that upgrade. However, the index was accidentally omitted from
schema.sql, meaning any database created fresh (new installations) has been missing it. This migration does not re-create the index (migration 219 already handles existing databases); the fix is adding it toschema.sqlso new installs get it.Query served (
services/erase.tsline 173):This runs hourly to clean up attachments that are no longer referenced in note content.
IDX_branches_parentNoteId_isDeleted_notePosition, ReplacesIDX_branches_parentNoteIdImpact: High, the most frequently queried pattern on the branches table (11 queries across 7 files).
The old single-column
IDX_branches_parentNoteIdindex finds rows by parent but then must scan all matching rows to filterisDeletedand sort bynotePosition. The new composite(parentNoteId, isDeleted, notePosition)makes all of these operations a single contiguous B-tree range scan with no extra filtering or sorting required.The old index is dropped in this migration since the new composite index fully subsumes it (any query that used
parentNoteIdalone will use the leading column of the new composite).Queries served (11 across 7 files):
services/branches.tsSELECT MAX(notePosition) FROM branches WHERE parentNoteId = ? AND isDeleted = 0services/entity_changes.tsSELECT branchId, notePosition FROM branches WHERE isDeleted = 0 AND parentNoteId = ?services/sync.tsSELECT branchId, notePosition FROM branches WHERE parentNoteId = ? AND isDeleted = 0services/sql_init.tsSELECT noteId FROM branches WHERE parentNoteId = 'root' AND isDeleted = 0 ORDER BY notePositionservices/notes.tsUPDATE branches SET notePosition = notePosition + 10 WHERE parentNoteId = ? AND notePosition > ? AND isDeleted = 0services/notes.tsUPDATE branches SET notePosition = notePosition - 10 WHERE parentNoteId = ? AND notePosition < ? AND isDeleted = 0services/cloning.tsSELECT branchId FROM branches WHERE noteId = ? AND parentNoteId = ? AND isDeleted = 0services/cloning.tsUPDATE branches SET notePosition = notePosition + 10 WHERE parentNoteId = ? AND notePosition > ? AND isDeleted = 0routes/api/branches.tsUPDATE branches SET notePosition = notePosition + 10 WHERE parentNoteId = ? AND notePosition >= ? AND isDeleted = 0routes/api/branches.tsUPDATE branches SET notePosition = notePosition + 10 WHERE parentNoteId = ? AND notePosition > ? AND isDeleted = 0share/shaca/shaca_loader.tsSELECT ... FROM branches WHERE isDeleted = 0 AND parentNoteId IN (...) ORDER BY notePositionWas also with the help of olav: https://github.com/orgs/TriliumNext/discussions/9138