Skip to content

ovsqlite: Add direct reader mode for nnrpd with WAL #338

Open
kev009 wants to merge 7 commits intoInterNetNews:mainfrom
kev009:ovsqlite-direct-reader
Open

ovsqlite: Add direct reader mode for nnrpd with WAL #338
kev009 wants to merge 7 commits intoInterNetNews:mainfrom
kev009:ovsqlite-direct-reader

Conversation

@kev009
Copy link
Copy Markdown
Contributor

@kev009 kev009 commented May 2, 2026

This is the next step after #336, and brings ovsqlite similar to how bdb worked. The WAL PR without this change is a modest benefit so that PR still makes sense standalone but the commit is included here so this is self contained.

When WAL mode is enabled, nnrpd processes now open the overview database
directly with SQLITE_OPEN_READONLY instead of routing reads through
ovsqlite-server.  This eliminates the IPC round-trip and the
single-threaded server bottleneck for read operations, allowing read
performance to scale with the number of concurrent readers.

Direct reader mode activates automatically when walmode is true in
ovsqlite.conf AND the database is actually in WAL mode (verified via
pragma journal_mode).  Without WAL, behavior is unchanged -- readers
connect to ovsqlite-server as before to avoid EXCLUSIVE lock contention.

Key changes:
- ovsqlite.c: direct_open() opens the database read-only with schema
  version validation, journal mode verification, decompression support,
  and goto-based cleanup.  Read functions (groupstats, getartinfo,
  search) branch on direct_reader flag.  Write operations return errors
  in direct reader mode.  Falls back to server path gracefully.
- ovsqlite-server.c: WAL checkpoint at shutdown uses PASSIVE first
  (non-blocking, safe with readers connected) then TRUNCATE only if no
  readers hold pins.  Prevents server shutdown from hanging.
- ovsqlite.c: Fix pre-existing buffer leak in ovsqlite_close():
  request and response buffers from server_connect() were never freed.
- sql-read.sql: read-only prepared statements with busy_timeout=10000
  and query_only=1.
- New readercachesize config parameter (default 8 MB per reader).

Includes unit tests (WAL gating, write rejection) and integration tests
that start a real ovsqlite-server, write compressed data through it,
kill the server, and verify the direct reader can read it back correctly
including a server restart cycle.

kev009 added 2 commits April 29, 2026 13:50
When WAL mode is enabled, nnrpd processes now open the overview database
directly with SQLITE_OPEN_READONLY instead of routing reads through
ovsqlite-server.  This eliminates the IPC round-trip and the
single-threaded server bottleneck for read operations, allowing read
performance to scale with the number of concurrent readers.

Direct reader mode activates automatically when walmode is true in
ovsqlite.conf AND the database is actually in WAL mode (verified via
pragma journal_mode).  Without WAL, behavior is unchanged -- readers
connect to ovsqlite-server as before to avoid EXCLUSIVE lock contention.

Key changes:
- ovsqlite.c: direct_open() opens the database read-only with schema
  version validation, journal mode verification, decompression support,
  and goto-based cleanup.  Read functions (groupstats, getartinfo,
  search) branch on direct_reader flag.  Write operations return errors
  in direct reader mode.  Falls back to server path gracefully.
- ovsqlite-server.c: WAL checkpoint at shutdown uses PASSIVE first
  (non-blocking, safe with readers connected) then TRUNCATE only if no
  readers hold pins.  Prevents server shutdown from hanging.
- ovsqlite.c: Fix pre-existing buffer leak in ovsqlite_close():
  request and response buffers from server_connect() were never freed.
- sql-read.sql: read-only prepared statements with busy_timeout=10000
  and query_only=1.
- New readercachesize config parameter (default 8 MB per reader).

Includes unit tests (WAL gating, write rejection) and integration tests
that start a real ovsqlite-server, write compressed data through it,
kill the server, and verify the direct reader can read it back correctly
including a server restart cycle.
@kev009 kev009 changed the title Ovsqlite direct reader ovsqlite: Add direct reader mode for nnrpd with WAL May 2, 2026
@blgl
Copy link
Copy Markdown
Contributor

blgl commented May 3, 2026

Potential memory leak: you should call sqlite3_close_v2 when
sqlite3_open_v2 fails. sqlite3_open_v2 creates a connection object
even on failure (except when out of memory) to have somewhere
to store a specific error message.
This doesn't matter in ovsqlite-server since it dies if it can't open
the database, but your new client-side code keeps running.

@blgl
Copy link
Copy Markdown
Contributor

blgl commented May 3, 2026

I see a real risk of checkpoint starvation here. It might be necessary to add server code to checkpoint and truncate when the WAL size reaches some threshold, accepting that this will block readers for a while.

@Julien-Elie
Copy link
Copy Markdown
Contributor

Thanks for your review, Bo; it is greatly appreciated.

Kevin, it is quite a great improvement. Many thanks for the extensive patch, including documentation and a test suite!

I note extra indentation in lines 728-761 of ovsqlite.c starting with start_request(request_get_groupinfo);. Just telling you to be sure a block condition is not missing in the commit. I'll reformat it when committing if that's OK for you. Unless you plan on providing an updated patch taking into account Bo's suggestions?

@Julien-Elie Julien-Elie self-assigned this May 4, 2026
@Julien-Elie Julien-Elie added enhancement New feature or request C: overview Related to overview methods P: medium Medium priority labels May 4, 2026
@Julien-Elie Julien-Elie added this to the 2.7.4 milestone May 4, 2026
@kev009
Copy link
Copy Markdown
Contributor Author

kev009 commented May 5, 2026

Let me work through the reviews, I'll post follow up commits that can be squashed into one when it is ready for merging.

kev009 added 4 commits May 6, 2026 23:04
When WAL mode is enabled with direct readers, the WAL file can grow
without bound if checkpointing never occurs between commits.  Add a
configurable walcheckpointthreshold (default 1000 pages) that triggers
a TRUNCATE checkpoint during the server's idle period.

The checkpoint temporarily lowers the busy_timeout to 10 seconds
(from ~31 years) so the server waits for readers to finish their
current queries but doesn't stall indefinitely.  Readers use
autocommit with sub-millisecond snapshots, so the checkpoint
typically completes in milliseconds.

Also fix notice() messages being lost after daemon() by registering
message_handlers_notice, remove the unused checkpoint_wal prepared
statement from sql-main.sql, and update close_db() comment.
@kev009
Copy link
Copy Markdown
Contributor Author

kev009 commented May 7, 2026

@blgl I think checkpoints are addressed now but this is certainly tricky machinery. My understanding is the readers effectively never block with WAL, and use very short auto transactions, and this followup attempts to add bounded waits to the server and make forward progress.

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this pull request May 7, 2026
Move pragma journal_mode out of the sql-main.sql preamble and into
open_db() where it is set conditionally: WAL when walmode is true,
PERSIST otherwise.  Previously the preamble unconditionally set
PERSIST mode before open_db() re-set WAL, causing a pointless
WAL->PERSIST->WAL transition on every startup (or silently failing
if direct readers held the WAL open).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C: overview Related to overview methods enhancement New feature or request P: medium Medium priority

Development

Successfully merging this pull request may close these issues.

3 participants