commitlog: Direct I/O by kim · Pull Request #1079 · clockworklabs/SpacetimeDB

kim · 2024-04-11T15:53:30Z

Introduce page-aligned buffered reads + writes. This is required for direct I/O (O_DIRECT or equivalent), which is now the default.

Direct I/O essentially means bypassing the OS's page cache and operating directly on the device. It does not mean that data is automatically more durable once written; fsync is still required. Optionally, the patch allows enabling O_DSYNC, which blocks a write until the equivalent of fdatasync has occurred.
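To make the durability point concrete: whether or not a file is opened with O_DIRECT, a write is only known to be on stable storage after an explicit sync. Below is a minimal std-only sketch, not the PR's code. The `Durability` enum and `append` function are hypothetical names used for illustration. On Linux, std implements `File::sync_data` via `fdatasync`, which is roughly what O_DSYNC would do implicitly after every write.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical durability knob (not the PR's API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Durability {
    /// Bytes handed to the OS (or device); no durability guarantee yet.
    Written,
    /// Additionally wait until the data (not necessarily all metadata) is on stable storage.
    Synced,
}

/// Append `data` to the file at `path`, syncing if requested.
fn append(path: &Path, data: &[u8], mode: Durability) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(data)?;
    if mode == Durability::Synced {
        // Required with or without O_DIRECT; with O_DSYNC each write would behave
        // as if followed by this call.
        file.sync_data()?;
    }
    Ok(())
}
```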

Both options likely have a performance impact, which needs to be evaluated. There are also some possible performance optimizations of the presented implementation, e.g. re-using buffer allocations or scatter/gather ("vectored") I/O.
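As an illustration of the scatter/gather idea, here is a sketch (assumed, not taken from the PR) of writing a header and a payload with one `write_vectored` call instead of first copying both into a single buffer. `write_header_and_payload` is a hypothetical helper. `write_vectored` may perform a short write, so the remainder is finished with `write_all`. With O_DIRECT, each segment would presumably still have to satisfy the alignment rules.

```rust
use std::io::{self, IoSlice, Write};

/// Write `header` followed by `payload`, preferring a single vectored write.
fn write_header_and_payload<W: Write>(w: &mut W, header: &[u8], payload: &[u8]) -> io::Result<()> {
    let bufs = [IoSlice::new(header), IoSlice::new(payload)];
    // May write fewer bytes than requested; `n` counts bytes across both slices.
    let n = w.write_vectored(&bufs)?;
    if n < header.len() {
        // Header only partially written: finish it, then the whole payload.
        w.write_all(&header[n..])?;
        w.write_all(payload)?;
    } else {
        // Header complete: finish whatever is left of the payload (possibly nothing).
        w.write_all(&payload[n - header.len()..])?;
    }
    Ok(())
}
```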

Stacked on top of #985

Expected complexity level and risk

4

Testing

Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!

Run test suite under Linux
Check that the code compiles under occult operating systems (macOS / Windows)
Check that the code works under occult operating systems.

kim · 2024-04-11T15:56:39Z

crates/commitlog/src/dio/writer.rs

+/// necessary to preserve `fsync` semantics: a [`PagedWriter`] replaces the OS
+/// page cache, where the latter is flushed to the device when `fsync` is called.


Challenge this.

It seemed confusing to have to remember to flush first, even though BufWriter behaves like this -- except that a flush there only moves the buffered data to kernel space.

I think this makes sense for our use case. What exactly is the alternative?

If we made io::Write::flush only flush full pages (or maybe block-aligned chunks), we could save some I/O, because otherwise we need to rewrite a padded 512-byte block. Then we'd have to remember to call flush_all before sync_all to ensure everything written to the buffered writer is durable.

The downside is that this makes it quite hard to reason about when a Commit (which is another layer of buffering) has made it to disk, or whether we need to try again with a fresh segment.

Idk, maybe assembling Commit should be the caller's responsibility. Like, here's a list of transactions which committed around the same time; make this durable according to some parameters (now, later, flush only, flush + fsync).
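A minimal sketch of the "flush means everything" semantics from the doc comment, assuming a 512-byte page. `PaddedPageWriter` and `write_at` are hypothetical names, not the PR's `PagedWriter`. Full pages are written eagerly; `flush` pads the partial tail page with zeros and writes it, but keeps it buffered, so every subsequent flush rewrites that same page. That rewrite is the extra I/O the "flush only full pages + flush_all" alternative would avoid. The sketch ignores memory alignment (a `Vec<u8>` is not page-aligned) and the fact that the file ends up padded past its logical length.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Assumed page/block size; the real requirement depends on the device.
const PAGE: usize = 512;

/// Hypothetical writer that only issues whole-page writes at page-aligned offsets.
struct PaddedPageWriter<W> {
    inner: W,
    /// Bytes of the trailing, not-yet-full page (always shorter than PAGE).
    tail: Vec<u8>,
    /// Page-aligned offset at which `tail` starts.
    tail_offset: u64,
}

fn write_at<W: Write + Seek>(w: &mut W, offset: u64, buf: &[u8]) -> io::Result<()> {
    w.seek(SeekFrom::Start(offset))?;
    w.write_all(buf)
}

impl<W: Write + Seek> PaddedPageWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, tail: Vec::with_capacity(PAGE), tail_offset: 0 }
    }
}

impl<W: Write + Seek> Write for PaddedPageWriter<W> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        let old_len = self.tail.len();
        self.tail.extend_from_slice(data);
        // Eagerly write out every page that is now complete.
        let full = self.tail.len() / PAGE * PAGE;
        if full > 0 {
            if let Err(e) = write_at(&mut self.inner, self.tail_offset, &self.tail[..full]) {
                // Report nothing written: drop the bytes buffered by this call.
                self.tail.truncate(old_len);
                return Err(e);
            }
            self.tail.drain(..full);
            self.tail_offset += full as u64;
        }
        Ok(data.len())
    }

    /// Flush everything, including the partial tail page padded with zeros.
    fn flush(&mut self) -> io::Result<()> {
        if !self.tail.is_empty() {
            // Allocates on every flush; re-using a scratch buffer would avoid that.
            let mut page = self.tail.clone();
            page.resize(PAGE, 0);
            write_at(&mut self.inner, self.tail_offset, &page)?;
        }
        self.inner.flush()
    }
}
```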

kim · 2024-04-11T15:58:40Z

I might try to come up with an abstraction which uses the std BufReader/BufWriter when alignment is not needed, as that could be slightly more efficient.

kim · 2024-04-11T16:05:50Z

crates/commitlog/src/dio/mod.rs

@@ -0,0 +1,111 @@
+mod page;


Yes, I shall add some prose here explaining what direct I/O is and what the deal is with the alignment requirements.
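For reference on the alignment requirements: Linux's open(2) notes that O_DIRECT may impose alignment restrictions on the user buffer address, the transfer length, and the file offset, typically multiples of the device's logical block size (often 512 bytes, sometimes 4096). A small sketch of the helper arithmetic this implies; the function names are hypothetical.

```rust
/// Round `n` up to a multiple of `align` (must be a power of two).
fn align_up(n: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    (n + align - 1) & !(align - 1)
}

/// Round `n` down to a multiple of `align` (must be a power of two).
fn align_down(n: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    n & !(align - 1)
}

/// Whether a write of `buf` at `offset` meets all three constraints for `align`.
fn direct_io_ok(buf: &[u8], offset: u64, align: usize) -> bool {
    buf.as_ptr() as usize % align == 0
        && buf.len() % align == 0
        && offset % align as u64 == 0
}
```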

kim · 2024-04-11T16:12:51Z

crates/commitlog/src/segment.rs

-        writer.append([2; 32]).unwrap();
-        writer.append([2; 32]).unwrap();
-        writer.commit().unwrap();
+        {


Should move those blocks back to the left -- shouldn't depend on drop, actually.
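On not depending on drop: flushing in `Drop` cannot report errors. For example, std's `BufWriter` documents that errors while flushing on drop are ignored. A sketch using `BufWriter` as a stand-in for the segment writer (the PR's writer and its `commit()` are not reproduced here); `append_records` is a hypothetical helper mirroring the test's two `[2; 32]` appends.

```rust
use std::io::{self, BufWriter, Write};

/// Append two records and flush explicitly, so a failure is returned to the
/// caller rather than swallowed when the writer goes out of scope.
fn append_records<W: Write>(inner: W) -> io::Result<W> {
    let mut w = BufWriter::new(inner);
    w.write_all(&[2; 32])?;
    w.write_all(&[2; 32])?;
    // `into_inner` flushes and surfaces any flush error.
    w.into_inner().map_err(|e| e.into_error())
}
```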

kim · 2024-04-11T16:13:55Z

crates/commitlog/src/tests/partial.rs

    enable_logging();

-    let mut log = open_log::<[u8; 32]>(ShortMem::new(800));
+    let mut log = open_log::<[u8; 32]>(ShortMem::new(5120));


Idk why I changed the parameters, should change back.

gefjon · 2024-04-11T16:18:28Z

crates/commitlog/src/buf.rs

+/// A byte buffer of non-zero size and proper alignment.
+#[derive(Debug)]
+pub struct Aligned {
+    data: ptr::NonNull<u8>,
+    layout: alloc::Layout,
+    size: usize,
+}


Why is this parametric over the layout at runtime, rather than using const generics? Do we ever need to dynamically compute the layout?

Hadn’t thought of const generics. I need to check for zeroes then, no?
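A sketch of what the const-generic variant could look like; `AlignedBuf` is a hypothetical name and this is not the PR's `Aligned`. The zero/power-of-two checks move to compile time via an associated const containing assertions (const panics are stable since Rust 1.57), evaluated when `new` is instantiated. The trade-off is presumably that a runtime `Layout` allows choosing the alignment from the device at runtime, whereas const generics fix it at build time.

```rust
use std::alloc::{self, Layout};
use std::ptr::NonNull;

/// Zero-initialized heap buffer of SIZE bytes aligned to ALIGN.
pub struct AlignedBuf<const SIZE: usize, const ALIGN: usize> {
    data: NonNull<u8>,
}

impl<const SIZE: usize, const ALIGN: usize> AlignedBuf<SIZE, ALIGN> {
    // Invalid parameters fail the build when `new` is monomorphized.
    const VALID: () = {
        assert!(SIZE > 0, "SIZE must be non-zero");
        assert!(ALIGN.is_power_of_two(), "ALIGN must be a power of two");
        assert!(SIZE % ALIGN == 0, "SIZE must be a multiple of ALIGN");
    };

    fn layout() -> Layout {
        // VALID rules out zero size and non-power-of-two alignment.
        Layout::from_size_align(SIZE, ALIGN).expect("invalid layout")
    }

    pub fn new() -> Self {
        // Referencing the associated const forces its evaluation.
        let () = Self::VALID;
        let layout = Self::layout();
        // SAFETY: `layout` has non-zero size.
        let ptr = unsafe { alloc::alloc_zeroed(layout) };
        let data = NonNull::new(ptr).unwrap_or_else(|| alloc::handle_alloc_error(layout));
        Self { data }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `data` points to SIZE initialized bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.data.as_ptr(), SIZE) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.data.as_ptr(), SIZE) }
    }
}

impl<const SIZE: usize, const ALIGN: usize> Drop for AlignedBuf<SIZE, ALIGN> {
    fn drop(&mut self) {
        // SAFETY: allocated in `new` with exactly this layout.
        unsafe { alloc::dealloc(self.data.as_ptr(), Self::layout()) }
    }
}
```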

crates/commitlog/src/buf.rs

kazimuth

This looks good as always. Would honestly be nice as a little crate.

kazimuth · 2024-04-11T17:12:57Z

crates/commitlog/src/dio/writer.rs

+/// necessary to preserve `fsync` semantics: a [`PagedWriter`] replaces the OS
+/// page cache, where the latter is flushed to the device when `fsync` is called.


I think this makes sense for our use case. What exactly is the alternative?


CLAassistant · 2025-05-03T18:56:07Z

All committers have signed the CLA.

kim commented Apr 11, 2024


gefjon reviewed Apr 11, 2024


kazimuth reviewed Apr 11, 2024


kim force-pushed the kim/commitlog2/direct-io branch from cc7ffdb to 8c2fbdb April 11, 2024 17:37

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from eda25c7 to b02d291 April 12, 2024 05:57

kim force-pushed the kim/commitlog2/direct-io branch 2 times, most recently from 1438332 to 0083554 April 12, 2024 07:45

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from b02d291 to c2717ab April 12, 2024 07:48

kim force-pushed the kim/commitlog2/direct-io branch from 0083554 to e63477f April 12, 2024 07:48

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from c2717ab to 514a2e6 April 12, 2024 08:31

kim force-pushed the kim/commitlog2/direct-io branch from e63477f to 056ffe1 April 12, 2024 08:31

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from 514a2e6 to a847c35 April 12, 2024 08:43

kim force-pushed the kim/commitlog2/direct-io branch 2 times, most recently from fc7c3ba to 0f3bb19 April 12, 2024 08:47

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from a847c35 to 30077c8 April 12, 2024 09:17

kim force-pushed the kim/commitlog2/direct-io branch from 0f3bb19 to 10bca2a April 12, 2024 09:17

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from 30077c8 to 53a48fc April 12, 2024 10:05

kim force-pushed the kim/commitlog2/direct-io branch from 10bca2a to c6b6375 April 12, 2024 10:05

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from 53a48fc to c82b755 May 23, 2024 07:21

kim force-pushed the kim/commitlog2/direct-io branch from c6b6375 to 3d68c10 May 23, 2024 07:21

kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from c82b755 to ab51e1a May 23, 2024 08:13

kim force-pushed the kim/commitlog2/direct-io branch from 3d68c10 to 0c6bf2a May 23, 2024 08:13

cloutiertyler added release-0.10 release-1.0 and removed release-0.10 labels Jun 3, 2024

kim changed the base branch from kim/commitlog-panic-on-fsync-failure to master June 6, 2024 06:33

kim force-pushed the kim/commitlog2/direct-io branch from 0c6bf2a to 893f6b8 June 6, 2024 06:33

kim added 2 commits June 7, 2024 13:33

Add benchmarks (write path)

8c34942

Remove O_DSYNC from benchmarks -- it's just too slow to even consider.

62a40db

cloutiertyler removed the release-1.0 label Feb 8, 2025

kim wants to merge 3 commits into master from kim/commitlog2/direct-io

5 participants
