Make explicit non-passive Sqlite WAL checkpoints#85

Merged
OpsBotPrime merged 3 commits into master from active-sqlite-wal-checkpointing
Mar 26, 2026
Conversation


@Qqwy Qqwy commented Mar 26, 2026

By default, SQLite in WAL mode performs checkpoints passively: it waits for a quiet moment in which there are no readers or writers.

Any read connection can block this passive WAL checkpointing from making progress. We have observed this in production under large workloads, with consumers working on hundreds of chunks concurrently.

While the WAL is not expected to grow much beyond 4 MiB (the threshold at which passive autocheckpointing kicks in), we saw WALs of more than 850 MiB. At that point, reads slow down significantly, which can cause a failure of the system as a whole.

To mitigate this, as per the SQLite docs, this PR:

  • Disables passive autocheckpointing (leaving it enabled conflicts with manual checkpointing, which can then result in a SQLITE_BUSY)
  • Runs an active checkpoint every second. This is very fast when there is little to no work to do, but under load we expect to hit the '1000 page mutations' threshold (i.e. the 4 MiB default WAL size) within a second.
  • Uses the 'RESTART' strategy together with a journal_file_limit that ensures the WAL is trimmed down to at most 4 MiB. That means we neither always fully truncate, nor keep the WAL at 'whatever the maximum happened to be'.

This checkpointing does mean that every second there is a tiny window in which both write tasks and all read tasks have to wait. This is unlikely to cause any problems. For good measure, we now explicitly configure the busy timeout to 5 seconds, which was the prior implicit default of SQLx/Rusqlite.
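The bullet points above map onto concrete SQLite pragmas. A minimal sketch using Python's built-in `sqlite3` module (the PR itself configures this through SQLx/Rusqlite in Rust; the database path and table here are hypothetical, and the WAL size cap pragma in SQLite proper is spelled `journal_size_limit`):

```python
import os
import sqlite3
import tempfile

# Hypothetical example database; the real project configures this via SQLx.
path = os.path.join(tempfile.mkdtemp(), "example.db")

# timeout=5.0 mirrors the explicit 5-second busy timeout set in this PR.
conn = sqlite3.connect(path, timeout=5.0)

conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA wal_autocheckpoint=0")        # disable passive autocheckpointing
conn.execute("PRAGMA journal_size_limit=4194304")  # trim the WAL back to at most 4 MiB

# Simulate some write load so the WAL contains frames to checkpoint.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
conn.commit()

# The active checkpoint the PR runs every second; RESTART waits for readers,
# then ensures the next writer restarts the WAL from the beginning.
busy, wal_pages, checkpointed = conn.execute(
    "PRAGMA wal_checkpoint(RESTART)"
).fetchone()
print(busy)  # 0 means the checkpoint was not blocked by another connection
```

Unlike TRUNCATE, RESTART leaves the WAL file on disk and only rewinds it; combined with the size-limit pragma, the file is additionally trimmed to at most 4 MiB, matching the "neither always fully truncate, nor keep it at the maximum" behaviour described above.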


Qqwy commented Mar 26, 2026

@ReinierMaas

@Qqwy force-pushed the active-sqlite-wal-checkpointing branch from 7894cc2 to 1f005b9 on March 26, 2026 at 11:47

Qqwy commented Mar 26, 2026

@OpsBotPrime merge and tag

@OpsBotPrime

Rebased as 9b4d5e2, waiting for CI …

OpsBotPrime added a commit that referenced this pull request Mar 26, 2026
Approved-by: Qqwy
Priority: Normal
Auto-deploy: false
@OpsBotPrime

CI job 🟡 started.

@OpsBotPrime

The build failed ❌.

If this is the result of a flaky test, then tag me again with the retry command. Otherwise, push a new commit and tag me again.


Qqwy commented Mar 26, 2026

@OpsBotPrime retry

Approved-by: Qqwy
Priority: Normal
Auto-deploy: false
@OpsBotPrime

Rebased as 5c25128, waiting for CI …

@OpsBotPrime

CI job 🟡 started.


@ReinierMaas left a comment


LGTM! Only a single comment; thanks for looking into this!

Comment on lines +246 to +247
/// We use the 'TRUNCATE' strategy, which will do the most work but will briefly block the writer *and* all readers
///

You use the RESTART strategy.

@OpsBotPrime

@Qqwy I tagged your PR with v46. Please wait for the build of 5c25128 to pass and don't forget to deploy it!

@OpsBotPrime OpsBotPrime merged commit 5c25128 into master Mar 26, 2026
7 checks passed