Importer perf: phase-level transaction.atomic + scoped SQLite PRAGMAs by ajslater · Pull Request #631 · ajslater/codex

ajslater · 2026-04-28T04:06:18Z

Summary

Implements tasks/importer-perf/05-sqlite-tuning.md from PR #627. Fourth in the importer perf series after #628 (link prepare), #629 (create FKs), #630 (query-prune).

Two independently revertable commits:

`ce82a8d0` — wrap create/link phases in `transaction.atomic`

The scribe daemon previously had no transaction.atomic anywhere — every bulk_create / bulk_update ran in autocommit. For a fresh 600k-comic import that's ~2,300 fsyncs across create_and_update + link (50 FK creates + 600 comic creates + 200 M2M link batches + 1,500 comic updates). At SATA-SSD fsync costs that's ~15-25s of pure wait time the SQL itself doesn't need.

transaction.atomic wraps both phases so all batches commit on one fsync each. The existing abort_event checks stay inside; abort returns out cleanly (Django commits on normal exit, only rolls back on uncaught exception). The codex daemon already serializes writers via db_write_lock, so the long-write transaction doesn't starve other writers; readers under WAL never block on a writer regardless of transaction length.
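The commit-on-early-return behaviour described above can be illustrated without Django. The following is a minimal sketch that uses the standard-library `sqlite3` connection context manager as a stand-in for `transaction.atomic`; both commit on normal exit and roll back on an uncaught exception. The function name `import_phase` and the `comic` table are hypothetical and are not taken from the codex source.

```python
import sqlite3
import threading


def import_phase(conn, batches, abort_event):
    """Write every batch inside one transaction; return the number of batches written.

    Stand-in for a phase wrapped in transaction.atomic (assumption: the real
    importer uses Django's ORM, not raw sqlite3).
    """
    written = 0
    # `with conn:` commits on normal exit, including an early `return`,
    # and rolls back only if an exception propagates out of the block.
    with conn:
        for batch in batches:
            if abort_event.is_set():
                # Abort returns out cleanly; work done so far is committed.
                return written
            conn.executemany("INSERT INTO comic (name) VALUES (?)", batch)
            written += 1
    return written
```

Because every batch shares one transaction, the phase ends with a single commit rather than one commit per batch.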

`d6f6aca0` — importer-scoped PRAGMAs + post-import checkpoint/optimize

New pragmas.py module exposes an importer_pragmas() context manager that wraps the entire apply() run:

- Bumps `cache_size` to `IMPORTER_SQLITE_CACHE_KB` (default 512 MiB, configurable via `importer.sqlite_cache_kb`). The 600k-comic working set fits, eliminating page-cache misses across the link-phase prune walk.
- Sets `wal_autocheckpoint=0` to defer WAL checkpoints during the import. The default 1000-page autocheckpoint fires hundreds of times per large import, stalling the writer each time.
- Hooks the `connection_created` signal so a mid-import reconnect (`CONN_MAX_AGE` recycle, pool grow) re-applies the override instead of silently inheriting the steady-state values.
- On exit: restores the steady-state PRAGMAs, runs `PRAGMA wal_checkpoint(TRUNCATE)` (otherwise the deferred frames stay resident until a reader transaction crosses them), and runs `PRAGMA optimize` so the planner sees fresh statistics post-import.
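The enter and exit steps listed above might look roughly like the sketch below. It is written against the standard-library `sqlite3` module rather than Django, so it leaves out the `connection_created` hook. The name `importer_pragmas_sketch` and the constants are illustrative assumptions; the steady-state values come from the commit message (`cache_size=-64000`, `wal_autocheckpoint=1000`). The actual `pragmas.py` may differ.

```python
import contextlib
import sqlite3

# Steady-state values quoted in the commit message; negative cache_size means KiB.
STEADY_CACHE_KB = 64_000
STEADY_AUTOCHECKPOINT = 1000
# Assumed importer default: 512 MiB expressed in KiB.
IMPORTER_CACHE_KB = 512 * 1024


@contextlib.contextmanager
def importer_pragmas_sketch(conn, cache_kb=IMPORTER_CACHE_KB):
    """Apply bulk-import PRAGMAs for the duration of the block, then restore."""
    conn.execute(f"PRAGMA cache_size=-{int(cache_kb)}")
    conn.execute("PRAGMA wal_autocheckpoint=0")  # defer checkpoints during the import
    try:
        yield conn
    finally:
        # Assumes the caller has committed; a checkpoint cannot run inside
        # an open write transaction on the same connection.
        conn.execute(f"PRAGMA cache_size=-{STEADY_CACHE_KB}")
        conn.execute(f"PRAGMA wal_autocheckpoint={STEADY_AUTOCHECKPOINT}")
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        conn.execute("PRAGMA optimize")
```

Both `cache_size` and `wal_autocheckpoint` are per-connection settings, which is why the PR re-applies them on reconnect.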

Expected speedup

Order-of-magnitude estimate for a fresh 600k-comic import:

| Optimization | Time saved |
| --- | --- |
| Phase-level `atomic()` (2,300 fsyncs → 5) | ~15-25 s |
| 512 MiB cache (eliminates link-phase page misses) | ~30-90 s |
| `wal_autocheckpoint=0` (no mid-import checkpoint stalls) | ~5-15 s |
| Combined | ~1-2 minutes off a 30-min import |
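As a sanity check on the `atomic()` row, the batch counts quoted in the summary can be added up and divided into the estimated wait time to get the per-fsync latency the estimate implies. This is only arithmetic on the PR's own figures; the variable and function names are illustrative.

```python
# Batch counts quoted in the PR summary for a fresh 600k-comic import.
batches = {
    "fk_creates": 50,
    "comic_creates": 600,
    "m2m_link_batches": 200,
    "comic_updates": 1500,
}
total_commits = sum(batches.values())  # the PR rounds this to ~2,300


def implied_fsync_ms(wait_seconds, commits):
    """Average per-commit fsync latency implied by a total wait time."""
    return wait_seconds * 1000 / commits


low_ms = implied_fsync_ms(15, total_commits)
high_ms = implied_fsync_ms(25, total_commits)
print(f"{total_commits} commits, {low_ms:.1f}-{high_ms:.1f} ms per fsync")
```

The estimate therefore assumes an average fsync cost of a few milliseconds to roughly ten milliseconds per autocommitted batch.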

Cumulative with the surgical-N+1 wins from #628/#629/#630.

Test plan

- `make fix` clean
- `make lint-python` clean (0 errors, 0 warnings)
- `pytest tests/importer/ tests/test_search_fts.py`: 7 passed
- WAL size check: after a 1k-comic test import, `Path('codex.sqlite3-wal').stat().st_size` < 4 KiB (one frame post-truncate)
- Wall-clock measurement on a real-scale fixture once one is available
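The WAL size check in the test plan could be wrapped in a small helper like the one below. This is a sketch only: the helper name `wal_is_small` and the treatment of a missing `-wal` file as passing are assumptions, and the 4 KiB limit is the threshold quoted above.

```python
from pathlib import Path


def wal_is_small(db_path, limit_bytes=4096):
    """Return True if the database's -wal sidecar is absent or under the limit."""
    wal = Path(f"{db_path}-wal")
    # Assumption: no WAL file at all counts as a pass.
    return (not wal.exists()) or wal.stat().st_size < limit_bytes
```

A call such as `wal_is_small("codex.sqlite3")` after a test import would then express the check as a single assertion.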

🤖 Generated with Claude Code

The scribe daemon previously had no transaction.atomic anywhere: every bulk_create / bulk_update ran in autocommit mode. For a fresh 600k-comic import that's roughly 2,300 fsyncs across the create_and_update and link phases (50 FK creates + 600 comic creates + 200 M2M link batches + 1,500 comic updates). Under SATA SSD fsync costs that's ~15-25s of pure wait time on the import critical path that the SQL itself doesn't need.

Wrap create_and_update and link in transaction.atomic to coalesce those commits into one fsync per phase. The existing abort_event checks remain inside the with-block; abort still returns out cleanly, and Django's atomic commits on normal exit (rolls back only on uncaught exception).

The codex daemon already serializes writers via db_write_lock, so the long-write transaction does not starve other writers; readers under WAL never block on a writer regardless of transaction length.

Phase 1 of tasks/importer-perf/05-sqlite-tuning.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The steady-state SQLite PRAGMAs in settings/__init__.py are tuned for concurrent readers + a slow-drip writer (cache_size=-64000, wal_autocheckpoint=1000). A bulk import inverts that balance.

New importer_pragmas() context manager wraps the whole apply() run:

- Bumps cache_size to IMPORTER_SQLITE_CACHE_KB (default 512 MiB, configurable via importer.sqlite_cache_kb in codex.toml). The 600k-comic working set fits inside this, eliminating page-cache misses across the link-phase prune walk.
- Sets wal_autocheckpoint=0 to defer WAL checkpoints during the import. The default 1000-page autocheckpoint fires hundreds of times per large import, briefly stalling the writer each time.
- Hooks the connection_created signal so a mid-import reconnect (CONN_MAX_AGE recycle, pool grow) re-applies the override instead of silently inheriting the steady-state values.
- On exit: restores the steady-state PRAGMAs, force-checkpoints the WAL with TRUNCATE (otherwise the deferred frames stay resident until a reader transaction crosses them), and runs PRAGMA optimize so the planner sees fresh statistics post-import rather than planning the next browser query against pre-import stats.

Phase 2 of tasks/importer-perf/05-sqlite-tuning.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ajslater and others added 2 commits April 27, 2026 21:05

ajslater merged commit 6043bf9 into v1.11-performance Apr 28, 2026
1 check failed

This was referenced Apr 28, 2026

Importer perf: chunk per-comic phases to bound peak memory #634

Merged

Librarian: release DB connection during long-idle waits #641

Merged

ajslater deleted the importer-sqlite-tuning branch May 2, 2026 22:39
