Importer perf: phase-level transaction.atomic + scoped SQLite PRAGMAs #631
Merged
ajslater merged 2 commits into v1.11-performance on Apr 28, 2026
The scribe daemon previously had no transaction.atomic anywhere: every bulk_create / bulk_update ran in autocommit mode. For a fresh 600k-comic import that's roughly 2,300 fsyncs across the create_and_update and link phases (50 FK creates + 600 comic creates + 200 M2M link batches + 1,500 comic updates). At SATA SSD fsync costs that's ~15-25s of pure wait time on the import critical path that the SQL itself doesn't need.

Wrap create_and_update and link in transaction.atomic to coalesce those commits into one fsync per phase. The existing abort_event checks remain inside the with-block; abort still returns cleanly, and Django's atomic() commits on normal exit (it rolls back only on an uncaught exception). The codex daemon already serializes writers via db_write_lock, so the long-write transaction does not starve other writers; readers under WAL never block on a writer regardless of transaction length.

Phase 1 of tasks/importer-perf/05-sqlite-tuning.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
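The commit semantics described above (one commit per phase, a clean abort still commits, an exception rolls back) can be sketched with the standard-library sqlite3 module so it runs without a Django project. This is a minimal illustration, not Codex's code: run_phase and the comic table are hypothetical, and explicit BEGIN/COMMIT/ROLLBACK stand in for transaction.atomic().

```python
import sqlite3
import threading
from typing import Iterable, Sequence


def run_phase(
    conn: sqlite3.Connection,
    batches: Iterable[Sequence[tuple]],
    abort_event: threading.Event,
) -> int:
    """Write every batch of one phase inside a single transaction.

    Hypothetical helper mirroring the atomic() exit rules described above:
    a normal exit, including an early break on abort, commits; an exception
    rolls back the whole phase.
    """
    written = 0
    conn.execute("BEGIN")
    try:
        for batch in batches:
            if abort_event.is_set():
                break  # clean abort: work done so far is still committed below
            conn.executemany("INSERT INTO comic(name) VALUES (?)", batch)
            written += len(batch)
    except BaseException:
        conn.execute("ROLLBACK")  # uncaught error: discard the whole phase
        raise
    conn.execute("COMMIT")  # one commit for the phase instead of one per batch
    return written


# isolation_level=None: autocommit unless we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE comic(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
batches = [[(f"comic-{i}-{j}",) for j in range(3)] for i in range(4)]
print(run_phase(conn, batches, threading.Event()))  # 12
```

An in-memory database never fsyncs, so this shows only the transaction boundaries; the fsync savings come from the same structure applied to an on-disk WAL database.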
The steady-state SQLite PRAGMAs in settings/__init__.py are tuned for concurrent readers plus a slow-drip writer (cache_size=-64000, wal_autocheckpoint=1000). A bulk import inverts that balance. A new importer_pragmas() context manager wraps the whole apply() run:

- Bumps cache_size to IMPORTER_SQLITE_CACHE_KB (default 512 MiB, configurable via importer.sqlite_cache_kb in codex.toml). The 600k-comic working set fits inside this, eliminating page-cache misses across the link-phase prune walk.
- Sets wal_autocheckpoint=0 to defer WAL checkpoints during the import. The default 1000-page autocheckpoint fires hundreds of times per large import, briefly stalling the writer each time.
- Hooks the connection_created signal so a mid-import reconnect (CONN_MAX_AGE recycle, pool grow) re-applies the override instead of silently inheriting the steady-state values.
- On exit: restores the steady-state PRAGMAs, force-checkpoints the WAL with TRUNCATE (otherwise the deferred frames stay resident until a reader transaction crosses them), and runs PRAGMA optimize so the planner sees fresh statistics post-import rather than planning the next browser query against pre-import stats.

Phase 2 of tasks/importer-perf/05-sqlite-tuning.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
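The scoped-PRAGMA pattern above can be sketched with the standard-library sqlite3 module. This is an illustration under stated assumptions, not the contents of pragmas.py: Django's connection_created signal is modeled by a sqlite3 connection factory (ImporterAwareConnection) that applies whatever overrides are active when a connection opens, and connect, _active_overrides and the module-level constants are hypothetical names. One deliberate difference from the order described above: the sketch runs PRAGMA optimize before the TRUNCATE checkpoint, so any statistics pages that optimize writes are checkpointed too.

```python
import contextlib
import os
import sqlite3
import tempfile

# Steady-state values quoted above (settings/__init__.py in Codex).
STEADY_STATE = {"cache_size": -64000, "wal_autocheckpoint": 1000}
# Hypothetical stand-in for importer.sqlite_cache_kb (512 MiB default).
IMPORTER_SQLITE_CACHE_KB = 512 * 1024

# Overrides currently in force; new connections consult this on open.
_active_overrides: dict = {}


def _apply(conn: sqlite3.Connection, pragmas: dict) -> None:
    for name, value in pragmas.items():
        conn.execute(f"PRAGMA {name}={int(value)}")


class ImporterAwareConnection(sqlite3.Connection):
    """Stand-in for a connection_created receiver: a connection opened
    mid-import (e.g. after a recycle) gets the importer overrides too."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _apply(self, STEADY_STATE)
        _apply(self, _active_overrides)


def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, factory=ImporterAwareConnection, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


@contextlib.contextmanager
def importer_pragmas(conn: sqlite3.Connection):
    # A negative cache_size is a size in KiB rather than a page count.
    overrides = {"cache_size": -IMPORTER_SQLITE_CACHE_KB, "wal_autocheckpoint": 0}
    _active_overrides.update(overrides)
    _apply(conn, overrides)
    try:
        yield conn
    finally:
        _active_overrides.clear()
        _apply(conn, STEADY_STATE)
        conn.execute("PRAGMA optimize")  # refresh planner statistics first...
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")  # ...then empty the WAL
```

Keeping the active overrides in one place that the connection factory reads is what makes the reconnect case safe: any connection opened while the context manager is active starts with the importer values, and any opened afterwards starts with the steady-state ones.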
Summary
Implements tasks/importer-perf/05-sqlite-tuning.md from PR #627. Fourth in the importer perf series after #628 (link prepare), #629 (create FKs) and #630 (query-prune).

Two independently revertible commits:
ce82a8d0: wrap create/link phases in transaction.atomic

The scribe daemon previously had no transaction.atomic anywhere: every bulk_create / bulk_update ran in autocommit. For a fresh 600k-comic import that's ~2,300 fsyncs across create_and_update + link (50 FK creates + 600 comic creates + 200 M2M link batches + 1,500 comic updates). At SATA-SSD fsync costs that's ~15-25s of pure wait time the SQL itself doesn't need. transaction.atomic wraps both phases so each phase's batches commit on a single fsync. The existing abort_event checks stay inside; abort returns cleanly (Django commits on normal exit and only rolls back on an uncaught exception). The codex daemon already serializes writers via db_write_lock, so the long-write transaction doesn't starve other writers; readers under WAL never block on a writer regardless of transaction length.

d6f6aca0: importer-scoped PRAGMAs + post-import checkpoint/optimize

New pragmas.py module exposes an importer_pragmas() context manager that wraps the entire apply() run:
- Bumps cache_size to IMPORTER_SQLITE_CACHE_KB (default 512 MiB, configurable via importer.sqlite_cache_kb). The 600k-comic working set fits, eliminating page-cache misses across the link-phase prune walk.
- Sets wal_autocheckpoint=0 to defer WAL checkpoints during the import. The default 1000-page autocheckpoint fires hundreds of times per large import, stalling the writer each time.
- Hooks the connection_created signal so a mid-import reconnect (CONN_MAX_AGE recycle, pool grow) re-applies the override instead of silently inheriting the steady-state values.
- On exit: restores the steady-state PRAGMAs, force-checkpoints the WAL with PRAGMA wal_checkpoint(TRUNCATE) (otherwise the deferred frames stay resident until a reader transaction crosses them), and runs PRAGMA optimize so the planner sees fresh statistics post-import.

Expected speedup
Order-of-magnitude estimate for a fresh 600k-comic import:
- atomic() (2,300 fsyncs → 5)
- wal_autocheckpoint=0 (no mid-import checkpoint stalls)

Cumulative with the surgical N+1 wins from #628/#629/#630.
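The fsync arithmetic behind these estimates can be checked in a few lines. The sketch assumes each autocommit costs one fsync, which holds for SQLite in WAL mode with synchronous=FULL (with synchronous=NORMAL, WAL commits are not synced individually and only checkpoints sync). The commit counts come from the ce82a8d0 message; note that they sum to 2,350, which the text rounds to ~2,300. The per-fsync latency is derived from the quoted 15-25s range, not measured.

```python
# Commit counts from the ce82a8d0 commit message (fresh 600k-comic import).
commits = {
    "FK creates": 50,
    "comic creates": 600,
    "M2M link batches": 200,
    "comic updates": 1_500,
}
fsyncs_before = sum(commits.values())  # autocommit: one commit per bulk_* call
fsyncs_after = 5                       # the "2,300 fsyncs -> 5" estimate
wait_range_s = (15, 25)                # quoted pure-wait range on SATA SSD

# Per-commit sync latency implied by the quoted wait range.
per_fsync_ms = [1000 * s / fsyncs_before for s in wait_range_s]
# Wait time left once the phases commit in a handful of transactions.
residual_ms = [fsyncs_after * ms for ms in per_fsync_ms]

print(fsyncs_before)                          # 2350
print([round(ms, 1) for ms in per_fsync_ms])  # [6.4, 10.6]
print([round(ms) for ms in residual_ms])      # [32, 53]
```

Under these assumptions the quoted range implies roughly 6-11 ms per sync, and the remaining handful of commits costs tens of milliseconds rather than tens of seconds.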
Test plan
- make fix clean
- make lint-python clean (0 errors, 0 warnings)
- pytest tests/importer/ tests/test_search_fts.py: 7 passed
- Path('codex.sqlite3-wal').stat().st_size < 4 KiB (one frame post-truncate)

🤖 Generated with Claude Code
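The WAL-size item in the test plan can be reproduced outside Codex with a small script. This is a minimal sketch: wal_is_truncated is a hypothetical helper, the 4 KiB threshold is taken from the test plan, and the throwaway database stands in for codex.sqlite3. It defers checkpoints the way the importer does, confirms the committed frames are still in the WAL, then runs a TRUNCATE checkpoint.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def wal_is_truncated(db_path: str, limit_bytes: int = 4 * 1024) -> bool:
    """True when the -wal sidecar is absent or smaller than limit_bytes."""
    wal = Path(f"{db_path}-wal")
    return not wal.exists() or wal.stat().st_size < limit_bytes


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "codex.sqlite3")
    conn = sqlite3.connect(db, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA wal_autocheckpoint=0")  # defer checkpoints, as during an import
    conn.execute("CREATE TABLE t(x)")
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2000)])
    conn.execute("COMMIT")
    before = wal_is_truncated(db)  # committed frames are still in the WAL
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    after = wal_is_truncated(db)   # TRUNCATE resets the WAL file to zero bytes
    conn.close()

print(before, after)  # False True
```

The check treats a missing WAL file as truncated, because SQLite normally deletes the -wal file when the last connection to the database closes.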