make it easy to consume a `DmaStreamReader` without copying #474

HippoBaro · 2021-11-29T00:03:03Z

The `get_buffer_aligned` API could be better. It relies on user code to be
aware of where buffer boundaries fall. This makes user code more
complicated and brittle because the buffer size will generally be a
configurable knob, subject to change.

Fear not, however, for there is a better way!

Instead of failing when reading too much in get_buffer_aligned, read
whatever we can from the current buffer and return the number of
remaining bytes. This allows the following:

* If the read fits inside a buffer, the user can consume the result
  directly without any copy;
* If the read crosses buffer boundaries and the user can consume partial
  results, it can do so in a loop without any copies;
* If the read crosses buffer boundaries and the user needs a complete
  result, `get_buffer_aligned` may be used trivially in a loop that
  concatenates the results in a reusable user-allocated buffer.
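
To make the three cases concrete, here is a minimal, synchronous sketch. `ChunkedReader` is a hypothetical in-memory stand-in, not glommio's `DmaStreamReader`, and its `get_buffer_aligned` only mimics the behavior described above (a slice that never crosses a buffer boundary, plus the number of requested bytes still outstanding). The helper `read_exact_into` is likewise an illustrative name.

```rust
/// Hypothetical in-memory stand-in for a buffered stream reader.
/// This is NOT glommio's `DmaStreamReader`; it only imitates the
/// "serve what the current buffer holds" behavior described above.
struct ChunkedReader {
    data: Vec<u8>,
    buffer_size: usize,
    pos: usize,
}

impl ChunkedReader {
    fn new(data: Vec<u8>, buffer_size: usize) -> Self {
        assert!(buffer_size > 0, "buffer_size must be non-zero");
        Self { data, buffer_size, pos: 0 }
    }

    /// Returns a borrowed slice of at most `len` bytes that never crosses an
    /// internal buffer boundary, plus the count of requested bytes that were
    /// not served by this call.
    fn get_buffer_aligned(&mut self, len: usize) -> (&[u8], usize) {
        // End of the buffer that contains the current position.
        let buffer_end = (self.pos / self.buffer_size + 1) * self.buffer_size;
        let end = self
            .pos
            .saturating_add(len)
            .min(buffer_end)
            .min(self.data.len());
        let start = self.pos;
        self.pos = end;
        (&self.data[start..end], len - (end - start))
    }
}

/// Third case: assemble a complete result in a reusable, caller-owned buffer.
/// Returns how many bytes were actually gathered (fewer than `len` at EOF).
fn read_exact_into(reader: &mut ChunkedReader, len: usize, out: &mut Vec<u8>) -> usize {
    out.clear();
    let mut wanted = len;
    while wanted > 0 {
        let (chunk, remaining) = reader.get_buffer_aligned(wanted);
        if chunk.is_empty() {
            break; // end of data: stop instead of spinning forever
        }
        out.extend_from_slice(chunk);
        wanted = remaining;
    }
    out.len()
}
```

In the first case the caller uses the returned slice directly. Only the third case copies, and it copies into a buffer the caller can reuse across reads.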

This PR also replaces the ad-hoc `poll_read` implementation with a proxy
call to `get_buffer_aligned`, resulting in a very nice reduction in complexity.

HippoBaro · 2021-11-29T00:22:17Z

Reviews are welcome but let's not merge this right now: I want to give this a spin on some real workload first :)

HippoBaro · 2021-11-29T04:55:17Z

This code has been deployed for a little while now, and it seems to be working as expected. We have streams that go over every variable-sized item in a file. Their throughput has doubled, primarily because calls to memcpy have almost disappeared (some copying is still necessary when crossing buffers).

All in all pretty happy with the result!

glommer · 2021-11-29T16:26:01Z

glommio/src/io/dma_file_stream.rs

-    /// [`DmaStreamReaderBuilder`]) and act accordingly.
+    /// extra copy.
+    ///
+    /// This function returns a tuple of a [`ReadResult`] and a [`usize`]. The


I am a fan of the approach. Well done.

Not sure I am a fan of the unnamed tuple. Why do you think this is preferable to a specialized struct?

Right, I agree.

Come to think of it, this API should work like POSIX read, i.e., it should return what was read instead of what is left to read. What is left to read is actually very hard to know, given we could hit EOF at any time.

If I return what was read instead, then I don't need to return a tuple because, by definition, you can already get that with ReadResult::len(). That will simplify things even more. Nice!
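
A rough sketch of what the POSIX-`read`-style contract looks like from the caller's side. `ChunkedReader` is again a hypothetical in-memory stand-in rather than glommio's type, and treating an empty result as end of stream is an assumption borrowed from `read(2)`, not something confirmed here. The point is that the caller tracks progress from the length of what came back, so no tuple is needed.

```rust
/// Hypothetical in-memory stand-in for a buffered stream reader
/// (NOT glommio's `DmaStreamReader`).
struct ChunkedReader {
    data: Vec<u8>,
    buffer_size: usize,
    pos: usize,
}

impl ChunkedReader {
    fn new(data: Vec<u8>, buffer_size: usize) -> Self {
        assert!(buffer_size > 0, "buffer_size must be non-zero");
        Self { data, buffer_size, pos: 0 }
    }

    /// Returns what was read: at most `len` bytes, never crossing an internal
    /// buffer boundary. An empty slice means end of data (assumed convention).
    fn get_buffer_aligned(&mut self, len: usize) -> &[u8] {
        let buffer_end = (self.pos / self.buffer_size + 1) * self.buffer_size;
        let end = self
            .pos
            .saturating_add(len)
            .min(buffer_end)
            .min(self.data.len());
        let start = self.pos;
        self.pos = end;
        &self.data[start..end]
    }
}

/// Consumes up to `len` bytes piece by piece without copying them anywhere.
/// Returns the byte sum and the number of bytes actually consumed.
fn sum_bytes(reader: &mut ChunkedReader, len: usize) -> (u64, usize) {
    let mut sum = 0u64;
    let mut consumed = 0usize;
    while consumed < len {
        let chunk = reader.get_buffer_aligned(len - consumed);
        if chunk.is_empty() {
            break; // EOF can show up at any time; the caller just stops
        }
        sum += chunk.iter().map(|&b| u64::from(b)).sum::<u64>();
        consumed += chunk.len(); // progress comes from the result's length
    }
    (sum, consumed)
}
```

Because the loop never needs to know how much is left in the stream, hitting EOF early is handled the same way as any other short read.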

The `get_buffer_aligned` API could be better. It relies on user code to be aware of where buffer boundaries fall. This makes user code more complicated and brittle because the buffer size will generally be a configurable knob, subject to change.

Fear not, however, for there is a better way!

Instead of failing when reading too much in `get_buffer_aligned`, read whatever we can from the current buffer. This allows the following:

* If the read fits inside a buffer, the user can consume the result directly without any copy;
* If the read crosses buffer boundaries and the user can consume partial results, it can do so in a loop without any copies;
* If the read crosses buffer boundaries and the user needs a complete result, `get_buffer_aligned` may be used trivially in a loop that concatenates the results in a reusable user-allocated buffer.

The following commit will replace the `AsyncRead::poll_read` implementation with a simple loop calling `get_buffer_aligned`.

HippoBaro · 2021-11-29T18:02:29Z

An additional change I am considering would be to rename the two functions `poll_get_buffer_aligned` and `get_buffer_aligned` to `poll_read_direct` and `read_direct`. This makes sense because they are now direct replacements for the copying `AsyncRead` equivalents.

We could even create two traits: AsyncReadDirect and AsyncReadDirectExt, which would work much like the ones from futures_io do but would be copy-free. What do you guys think?

glommer · 2021-11-30T00:40:36Z

We can add the traits, but the biggest advantage of having traits in this case is if there are other implementations. So it is probably best to wait until the need shows up.

HippoBaro requested a review from glommer November 29, 2021 00:03

HippoBaro force-pushed the better_stream_reader branch from d53526b to cd12a8c Compare November 29, 2021 00:11

glommer reviewed Nov 29, 2021

HippoBaro force-pushed the better_stream_reader branch from cd12a8c to fb0cddd Compare November 29, 2021 17:47

HippoBaro added 3 commits November 29, 2021 18:58

enforce max_pos stream config in poll_get_buffer_aligned

e5c35a0

We need to make sure the stream can't return any bytes beyond the `max_pos` offset. Right now, `poll_read` respects this while `poll_get_buffer_aligned` doesn't. This commit makes sure we are consistent across the two.
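
A minimal sketch of the kind of clamping this commit message describes, assuming a single read is bounded by the end of the current buffer, by `max_pos`, and by the requested length. The function `clamped_len` and its parameters are hypothetical names for illustration and are not taken from glommio's source.

```rust
/// Hypothetical helper: how many bytes one read may return when the stream
/// is at `pos`, the caller asked for `requested` bytes, the current buffer
/// ends at `buffer_end`, and the stream must not go past `max_pos`.
fn clamped_len(pos: u64, requested: u64, buffer_end: u64, max_pos: u64) -> u64 {
    // The read may not cross the buffer boundary or the configured limit.
    let limit = buffer_end.min(max_pos);
    // `saturating_sub` yields 0 once `pos` has reached or passed the limit.
    limit.saturating_sub(pos).min(requested)
}
```

Routing both read paths through one such bound is what keeps them consistent: neither can return bytes past `max_pos`.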

use poll_get_buffer_aligned in DmaStreamReader::poll_read

93dfc17

Massive simplification; Oh Yeah!!

HippoBaro force-pushed the better_stream_reader branch from fb0cddd to 93dfc17 Compare November 29, 2021 17:58

HippoBaro requested a review from glommer November 29, 2021 20:56

glommer approved these changes Nov 30, 2021

glommer merged commit 0a00239 into DataDog:master Nov 30, 2021

HippoBaro deleted the better_stream_reader branch November 30, 2021 01:18
