fix(store): keep WAL journal mode during bulk write to prevent DB corruption on crash #72
…ruption on crash

cbm_store_begin_bulk() was switching the SQLite journal mode from WAL to MEMORY for write throughput. If the process crashed mid-bulk-write, the in-memory rollback journal was lost, leaving the database file in a partially written, unrecoverable state.

WAL mode is inherently crash-safe: uncommitted WAL entries are discarded on the next open. The performance benefit of bulk mode is preserved via synchronous=OFF and an enlarged cache_size, both of which are safe under WAL.

Remove the PRAGMA journal_mode = MEMORY from cbm_store_begin_bulk and the matching PRAGMA journal_mode = WAL from cbm_store_end_bulk. Update the header comments to reflect the new invariant.

Add tests/test_store_bulk.c with three tests:
- bulk_pragma_wal_invariant: asserts journal_mode remains "wal" after cbm_store_begin_bulk via an independent read-only connection
- bulk_pragma_end_wal_invariant: asserts journal_mode remains "wal" after cbm_store_end_bulk
- bulk_crash_recovery: forks a child that enters bulk mode, opens an explicit transaction, writes data, then calls _exit() without committing; the parent verifies the database opens cleanly and baseline data survives

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hey thx for the contribution! Now I have time to review all of this :) Will be done by the weekend at the latest.
QA Round 1
- Wrap bulk_crash_recovery test and its POSIX includes with #ifndef _WIN32 guards to fix compilation failure on Windows (fork/waitpid unavailable)
- Add negative assertion that the uncommitted "crashed" row is absent after crash recovery, completing the test's correctness verification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
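QA Round 1: Fixes Applied

Addressed all confirmed findings from the Opus QA review. Commit: 675fdf3

Fixed
- [Critical] Windows compilation failure: wrapped the bulk_crash_recovery test and its POSIX includes in #ifndef _WIN32 guards
- [Minor] Missing negative assertion: added an assertion that the uncommitted "crashed" row is absent after crash recovery

Not changed
The …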
QA Round 2
- Validate child exit status (WIFEXITED + WEXITSTATUS) after waitpid so the test fails fast if the child couldn't open the store, rather than passing vacuously due to the "crashed" row never being written

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
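The exit-status check described in this commit can be illustrated with a minimal sketch. It is not the code from tests/test_store_bulk.c: the helper `run_in_crashing_child` and the two stand-in work functions are hypothetical names chosen for illustration, and the store access is replaced by a callback so the example stays self-contained. It assumes a POSIX system, which is why it sits behind the same `#ifndef _WIN32` guard mentioned in QA Round 1.

```c
#include <assert.h>
#include <stddef.h>

#ifndef _WIN32                     /* fork/waitpid are unavailable on Windows */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for "child opened the store and wrote data"
 * and "child could not open the store". */
static int child_work_ok(void *arg)   { (void)arg; return 0; }
static int child_work_fail(void *arg) { (void)arg; return 1; }

/* Hypothetical helper: runs `work` in a forked child that then terminates
 * via _exit() (no cleanup, no commit). Returns 0 only if the child exited
 * normally with status 0, and -1 in every other case. */
static int run_in_crashing_child(int (*work)(void *), void *arg)
{
    assert(work != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork itself failed */

    if (pid == 0) {
        /* Child: report setup failure through the exit status. */
        int rc = work(arg);
        _exit(rc == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* Fail fast unless the child exited normally AND reported success.
     * Without this check, a child that never wrote anything would make
     * the "row is absent" assertion pass vacuously. */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
#endif /* !_WIN32 */
```

In a test, the parent would call the helper first and only then inspect the database, so a child that failed during setup is reported as a test failure rather than as a successful recovery.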
QA Round 2: Fixes Applied (commit a0e809d)

- [Minor] Child exit status not validated: added a WIFEXITED + WEXITSTATUS check after waitpid
QA Round 3: Final Review ✅
Reviewer: Claude (Senior C Engineer)
Previous Findings: Verified Fixed
Round 3: Detailed Review
Store fix (
Header (
Test:
Test:
Test:
Build integration (
No Issues Found
All three rounds of findings have been addressed. The fix is minimal, well-documented, and the tests cover the two key properties:
Verdict: APPROVED, ready for merge.
Excellent work. This is one of the best-structured PRs we've received. The fix is minimal (3 lines of PRAGMA removal), the rationale is clearly documented, and the 3-round self-QA with fixes applied after each round shows real engineering discipline. The crash recovery test using fork()+_exit() is particularly well done: it validates the fix at the crash-safety level, not just the PRAGMA level. And the separate read-only connection for WAL verification isolates the check properly. Merged to main in b5b9c6b. All 2044 tests pass including your 3 new bulk tests. Thanks for the contribution!
Bug
`cbm_store_begin_bulk()` switched the SQLite journal mode from WAL to MEMORY (`PRAGMA journal_mode = MEMORY`).

If the process crashes during a bulk write (OOM, SIGKILL, power loss), the in-memory rollback journal is lost. The database file is left in a partially written state with no way to recover: it is corrupt on next open.
Fix
Remove the journal mode switch entirely. WAL mode is inherently crash-safe: uncommitted WAL entries are simply discarded on the next open. The performance benefit of bulk mode is fully preserved by `synchronous=OFF` and a 64 MB `cache_size`, both of which are safe under WAL.

Tests

`tests/test_store_bulk.c` (new file, 3 tests):

- `bulk_pragma_wal_invariant`: `journal_mode` is still `"wal"` after `begin_bulk`, verified via an independent read-only connection; deterministic proof of the fix
- `bulk_pragma_end_wal_invariant`: `journal_mode` is still `"wal"` after `end_bulk`
- `bulk_crash_recovery`: a forked child enters bulk mode, writes data, and calls `_exit()` without committing; parent verifies the database opens cleanly and baseline data survives

All 2033 tests pass.
Note
This is the C-rewrite equivalent of the same bug that existed in the prior Go implementation. The fix is identical in intent: never leave WAL mode during bulk writes.