Skip to content

commitlog: Direct I/O#1079

Draft
kim wants to merge 3 commits intomasterfrom
kim/commitlog2/direct-io
Draft

commitlog: Direct I/O#1079
kim wants to merge 3 commits intomasterfrom
kim/commitlog2/direct-io

Conversation

@kim
Copy link
Copy Markdown
Contributor

@kim kim commented Apr 11, 2024

Introduce page-aligned buffered reads + writes. This is required for direct I/O (O_DIRECT or equivalent), which is now the default.

Direct I/O essentially means to bypass the OS's page cache and operate directly on the device. It does not mean that data is automatically more durable once written, fsync is still required. Optionally, the patch allows to enable O_DSYNC, which blocks a write until the equivalent of fdatasync has occurred.

Both options likely have a performance impact, which needs to be evaluated. There also some performance optimizations of the presented implemention possible, e.g. re-using buffer allocations or scatter/gather ("vectored") I/O.

Stacked on top of #985

Expected complexity level and risk

4

Testing

Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!

  • Run test suite under Linux
  • Check that the code compiles under occult operating systems (macOS / Windows)
  • Check that the code works under occult operating systems.

Comment on lines +28 to +29
/// necessary to preserve `fsync` semantics: a [`PagedWriter`] replaces the OS
/// page cache, where the latter is flushed to the device when `fsync` is called.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Challenge this.

It seemed confusing having to remember to flush first, even though BufWriter behaves like this -- except that a flush there only means to move the buffered data to kernel space.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense for our use case. What exactly is the alternative?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'd make io::Write::flush only flush full pages (or maybe block-aligned chunks), we could save some I/O -- because we need to rewrite a padded 512 block. Then, we'd remember to call flush_all before sync_all to ensure everything written to the buffered writer is durable.

The downside is that this makes it quite hard to reason about when a Commit (which is another layer of buffering) made it to disk, or else try again with a fresh segment.

Idk, maybe assembling Commit should be the caller's responsiblity. Like, here's a list of transactions which committed around the same time, make this durable according to some parameters (now, later, flush only, flush + fsync).

@kim
Copy link
Copy Markdown
Contributor Author

kim commented Apr 11, 2024

I might try to come up with an abstraction which uses the std BufReader/BufWriter when alignment is not needed, as that could be slightly more efficient.

@@ -0,0 +1,111 @@
mod page;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I shall add some prose here explaining what direct I/O is and what's the deal with the alignment requirements.

writer.append([2; 32]).unwrap();
writer.append([2; 32]).unwrap();
writer.commit().unwrap();
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move those blocks back to the left -- shouldn't depend on drop, actually.

enable_logging();

let mut log = open_log::<[u8; 32]>(ShortMem::new(800));
let mut log = open_log::<[u8; 32]>(ShortMem::new(5120));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk why I changed the parameters, should change back.

Comment on lines +8 to +14
/// A byte buffer of non-zero size and proper alignment.
#[derive(Debug)]
pub struct Aligned {
data: ptr::NonNull<u8>,
layout: alloc::Layout,
size: usize,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this parametric over the layout at runtime, rather than using const generics? Do we ever need to dynamically compute the layout?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadn’t thought of const generics. I need to check for zeroes then, no?

Copy link
Copy Markdown
Contributor

@kazimuth kazimuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good as always. Would honestly be nice as a little crate.

Comment on lines +28 to +29
/// necessary to preserve `fsync` semantics: a [`PagedWriter`] replaces the OS
/// page cache, where the latter is flushed to the device when `fsync` is called.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense for our use case. What exactly is the alternative?

@kim kim force-pushed the kim/commitlog2/direct-io branch from cc7ffdb to 8c2fbdb Compare April 11, 2024 17:37
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from eda25c7 to b02d291 Compare April 12, 2024 05:57
@kim kim force-pushed the kim/commitlog2/direct-io branch 2 times, most recently from 1438332 to 0083554 Compare April 12, 2024 07:45
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from b02d291 to c2717ab Compare April 12, 2024 07:48
@kim kim force-pushed the kim/commitlog2/direct-io branch from 0083554 to e63477f Compare April 12, 2024 07:48
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from c2717ab to 514a2e6 Compare April 12, 2024 08:31
@kim kim force-pushed the kim/commitlog2/direct-io branch from e63477f to 056ffe1 Compare April 12, 2024 08:31
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from 514a2e6 to a847c35 Compare April 12, 2024 08:43
@kim kim force-pushed the kim/commitlog2/direct-io branch 2 times, most recently from fc7c3ba to 0f3bb19 Compare April 12, 2024 08:47
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from a847c35 to 30077c8 Compare April 12, 2024 09:17
@kim kim force-pushed the kim/commitlog2/direct-io branch from 0f3bb19 to 10bca2a Compare April 12, 2024 09:17
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from 30077c8 to 53a48fc Compare April 12, 2024 10:05
@kim kim force-pushed the kim/commitlog2/direct-io branch from 10bca2a to c6b6375 Compare April 12, 2024 10:05
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from 53a48fc to c82b755 Compare May 23, 2024 07:21
@kim kim force-pushed the kim/commitlog2/direct-io branch from c6b6375 to 3d68c10 Compare May 23, 2024 07:21
@kim kim force-pushed the kim/commitlog-panic-on-fsync-failure branch from c82b755 to ab51e1a Compare May 23, 2024 08:13
@kim kim force-pushed the kim/commitlog2/direct-io branch from 3d68c10 to 0c6bf2a Compare May 23, 2024 08:13
Introduce page-aligned buffered reads + writes. This is required for
direct I/O (O_DIRECT or equivalent), which is now the default.

Direct I/O essentially means to bypass the OS's page cache and operate
directly on the device. It does not mean that data is automatically more
durable once written, `fsync` is still required. Optionally, the patch
allows to enable O_DSYNC, which blocks a write until the equivalent of
`fdatasync` has occurred.

Both options likely have a performance impact, which needs to be
evaluated. There also some performance optimizations of the presented
implemention possible, e.g. re-using buffer allocations or
scatter/gather ("vectored") I/O.
@kim kim changed the base branch from kim/commitlog-panic-on-fsync-failure to master June 6, 2024 06:33
@kim kim force-pushed the kim/commitlog2/direct-io branch from 0c6bf2a to 893f6b8 Compare June 6, 2024 06:33
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 3, 2025

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants