/// necessary to preserve `fsync` semantics: a [`PagedWriter`] replaces the OS
/// page cache, where the latter is flushed to the device when `fsync` is called.
Challenge this.
It seemed confusing to have to remember to flush first, even though BufWriter behaves like this, except that a flush there only means moving the buffered data to kernel space.
I think this makes sense for our use case. What exactly is the alternative?
If we made io::Write::flush only flush full pages (or maybe block-aligned chunks), we could save some I/O, because otherwise each flush has to rewrite a padded 512-byte block. We would then have to remember to call flush_all before sync_all to ensure everything written to the buffered writer is durable.
The downside is that this makes it quite hard to reason about whether a Commit (which is another layer of buffering) made it to disk, or whether we should try again with a fresh segment.
Idk, maybe assembling a Commit should be the caller's responsibility. Like: here's a list of transactions which committed around the same time, make this durable according to some parameters (now, later, flush only, flush + fsync).
I might try to come up with an abstraction which uses the std
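To make the alternative above concrete, here is a minimal sketch of the two flush levels. It is not the code in this PR: `Device`, the `BLOCK` constant, and this toy `PagedWriter` are hypothetical stand-ins, with an in-memory `Vec<u8>` playing the role of a device that only accepts block-aligned writes. `flush` writes full blocks only, while `flush_all` additionally writes the zero-padded tail.

```rust
use std::io::{self, Write};

/// Assumed block size; the real alignment requirement depends on the device.
const BLOCK: usize = 512;

/// Hypothetical in-memory "device" that rejects unaligned writes.
#[derive(Default)]
struct Device {
    data: Vec<u8>,
    writes: usize,
}

impl Device {
    fn write_at(&mut self, offset: usize, buf: &[u8]) {
        assert!(offset % BLOCK == 0 && buf.len() % BLOCK == 0, "unaligned write");
        let end = offset + buf.len();
        if self.data.len() < end {
            self.data.resize(end, 0);
        }
        self.data[offset..end].copy_from_slice(buf);
        self.writes += 1;
    }
}

/// Toy writer illustrating "flush = full blocks only".
struct PagedWriter {
    dev: Device,
    buf: Vec<u8>,
    /// Device offset of the first buffered byte; always block-aligned.
    offset: usize,
}

impl PagedWriter {
    fn new() -> Self {
        Self { dev: Device::default(), buf: Vec::new(), offset: 0 }
    }

    /// Write everything, including a zero-padded partial tail block.
    /// The tail stays buffered so that later writes can rewrite the same block.
    fn flush_all(&mut self) -> io::Result<()> {
        self.flush()?;
        if !self.buf.is_empty() {
            let mut block = self.buf.clone();
            block.resize(BLOCK, 0);
            self.dev.write_at(self.offset, &block);
        }
        Ok(())
    }
}

impl Write for PagedWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    /// Only writes the block-aligned prefix of the buffer.
    fn flush(&mut self) -> io::Result<()> {
        let full = self.buf.len() / BLOCK * BLOCK;
        if full > 0 {
            self.dev.write_at(self.offset, &self.buf[..full]);
            self.buf.drain(..full);
            self.offset += full;
        }
        Ok(())
    }
}
```

With this split, writing 700 bytes and calling `flush` puts one block on the device and keeps 188 bytes buffered; only `flush_all` makes the tail visible. That is exactly the "remember to call flush_all before sync_all" hazard discussed above.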
@@ -0,0 +1,111 @@
mod page;
Yes, I shall add some prose here explaining what direct I/O is and why it comes with alignment requirements.
crates/commitlog/src/segment.rs
Outdated
writer.append([2; 32]).unwrap();
writer.append([2; 32]).unwrap();
writer.commit().unwrap();
{
Should move those blocks back to the left -- shouldn't depend on drop, actually.
enable_logging();
let mut log = open_log::<[u8; 32]>(ShortMem::new(800));
let mut log = open_log::<[u8; 32]>(ShortMem::new(5120));
Idk why I changed the parameters, should change back.
/// A byte buffer of non-zero size and proper alignment.
#[derive(Debug)]
pub struct Aligned {
    data: ptr::NonNull<u8>,
    layout: alloc::Layout,
    size: usize,
}
Why is this parametric over the layout at runtime, rather than using const generics? Do we ever need to dynamically compute the layout?
Hadn’t thought of const generics. I need to check for zeroes then, no?
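For reference, a rough sketch of what the const-generic variant could look like, assuming size and alignment are always known at compile time. This is not the `Aligned` type from the PR; the `SIZE` and `ALIGN` parameter names and the `LAYOUT` associated constant are made up for illustration. The zero-size check moves into a constant, so an invalid combination fails to compile instead of needing a runtime check.

```rust
use std::alloc::{self, Layout};
use std::ptr::NonNull;

/// A zero-initialized buffer of `SIZE` bytes, aligned to `ALIGN` bytes.
pub struct Aligned<const SIZE: usize, const ALIGN: usize> {
    data: NonNull<u8>,
}

impl<const SIZE: usize, const ALIGN: usize> Aligned<SIZE, ALIGN> {
    /// Evaluated at compile time for each instantiation that uses it:
    /// a zero size or an invalid alignment becomes a compile error.
    const LAYOUT: Layout = {
        assert!(SIZE > 0, "size must be non-zero");
        match Layout::from_size_align(SIZE, ALIGN) {
            Ok(layout) => layout,
            Err(_) => panic!("invalid size/alignment combination"),
        }
    };

    pub fn new() -> Self {
        let layout = Self::LAYOUT;
        // SAFETY: `LAYOUT` is guaranteed to have a non-zero size.
        let ptr = unsafe { alloc::alloc_zeroed(layout) };
        match NonNull::new(ptr) {
            Some(data) => Self { data },
            None => alloc::handle_alloc_error(layout),
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `data` points to `SIZE` initialized bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.data.as_ptr(), SIZE) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.data.as_ptr(), SIZE) }
    }
}

impl<const SIZE: usize, const ALIGN: usize> Drop for Aligned<SIZE, ALIGN> {
    fn drop(&mut self) {
        // SAFETY: `data` was allocated in `new` with exactly this layout.
        unsafe { alloc::dealloc(self.data.as_ptr(), Self::LAYOUT) }
    }
}
```

The trade-off is that every distinct size/alignment pair becomes its own type, which only pays off if the layout never needs to be computed from a runtime value such as the device's reported block size.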
kazimuth left a comment
This looks good as always. Would honestly be nice as a little crate.
Introduce page-aligned buffered reads + writes. This is required for
direct I/O (O_DIRECT or equivalent), which is now the default.
Direct I/O essentially means bypassing the OS's page cache and operating
directly on the device. It does not mean that data is automatically more
durable once written: `fsync` is still required. Optionally, the patch
allows enabling O_DSYNC, which blocks each write until the equivalent of
`fdatasync` has occurred.
Both options likely have a performance impact, which needs to be
evaluated. There are also some possible performance optimizations of the
presented implementation, e.g. re-using buffer allocations or
scatter/gather ("vectored") I/O.
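As an illustration of the durability levels mentioned above, here is a small sketch using only the standard library. It does not open the file with O_DIRECT or O_DSYNC (that needs platform-specific flags), and the `Durability` enum and `append_durable` function are hypothetical names, not part of this patch. It only shows the distinction between handing bytes to the OS, `File::sync_data` (the `fdatasync` counterpart on Unix), and `File::sync_all` (the `fsync` counterpart).

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical durability levels a caller might ask for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Durability {
    /// Hand the bytes to the OS only; they may still be lost on power failure.
    FlushOnly,
    /// Also wait for the file data to reach the device (like `fdatasync`).
    DataSync,
    /// Also wait for data and metadata to reach the device (like `fsync`).
    FullSync,
}

/// Append `bytes` to the file at `path` and return the resulting file length.
pub fn append_durable(path: &Path, bytes: &[u8], durability: Durability) -> io::Result<u64> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(bytes)?;
    match durability {
        Durability::FlushOnly => file.flush()?,
        Durability::DataSync => file.sync_data()?,
        Durability::FullSync => file.sync_all()?,
    }
    Ok(file.metadata()?.len())
}
```

Opening with O_DSYNC would roughly correspond to paying the `DataSync` cost on every write, which is why its performance impact needs measuring.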
Stacked on top of #985
Expected complexity level and risk
4
Testing
Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!