Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make it easy to consume a DmaStreamReader without copying #474

Merged
merged 3 commits into from Nov 30, 2021

Conversation

HippoBaro
Copy link
Member

@HippoBaro HippoBaro commented Nov 29, 2021

The get_buffer_aligned API could be better. It relies on user code to be
aware of buffer boundaries locations. This makes user code more
complicated and brittle because the buffer size will generally be a
configurable knob, subject to change.

Fear not, however, for there is a better way!

Instead of failing when reading too much in get_buffer_aligned, read
whatever we can from the current buffer and return the number of
remaining bytes. This allows the following:

  • If the read fits inside a buffer, the user can consume the result
    directly without any copy;
  • If the read crosses buffer boundaries and the user can consume partial
    results, it can do so in a loop without any copies;
  • If the read crosses buffer boundaries and the user needs a complete
    result, get_buffer_aligned may be used trivially in a loop that
    concatenates the results in a reusable user-allocated buffer.

This PR also replaces the ad-hoc poll_read implementation for a proxy
call to get_buffer_aligned, resulting in a very nice reduction in complexity.

@HippoBaro
Copy link
Member Author

Reviews are welcome but let's not merge this right now: I want to give this a spin on some real workload first :)

@HippoBaro
Copy link
Member Author

This code has been deployed for a little while now, and it seems to be working as expected. We have streams that go over every variable-sized item in a file. Their throughput has doubled, primarily because calls to memcpy have almost disappeared (some copying is still necessary when crossing buffers).

All in all pretty happy with the result!

/// [`DmaStreamReaderBuilder`]) and act accordingly.
/// extra copy.
///
/// This function returns a tuple of a [`ReadResult`] and a [`usize`]. The
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a fan of the approach. Well done.

Not sure I am a fan of the unnamed tuple. Why do you think this is preferable to a specialized struct ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I agree.

Come to think of it, this API should work like POSIX read, i.e., it should return what was read instead of what is left to read. What is left to read is actually very hard to know, given we could hit EOF at any time.

If I return what was read instead, then I don't need to return a tuple because, by definition, you can already get that with ReadResult::len(). That will simplify things even more. Nice!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

The `get_buffer_aligned` API could be better. It relies on user code to
be aware of buffer boundaries locations. This makes user code more
complicated and brittle because the buffer size will generally be a
configurable knob, subject to change.

Fear not, however, for there is a better way!

Instead of failing when reading too much in `get_buffer_aligned,` read
whatever we can from the current buffer. This allows the following:
* If the read fits inside a buffer, the user can consume the result
  directly without any copy;
* If the read crosses buffer boundaries and the user can consume partial
  results, it can do so in a loop without any copies;
* If the read crosses buffer boundaries and the user needs a complete
  result, `get_buffer_aligned` may be used trivially in a loop that
  concatenates the results in a reusable user-allocated buffer.

The following commit will replace the `AsyncRead::poll_read`
implementation with a simple loop calling `get_buffer_aligned.`
We need to make sure the stream can't return any bytes beyond the
`max_pos` offset. Right now, `poll_read` respects this while
`poll_get_buffer_aligned` doesn't. This commit makes sure we are
consistent across the two.
@HippoBaro
Copy link
Member Author

An additional change I am considering making would be to change the name of the two functions poll_get_buffer_aligned and get_buffer_aligned in favor of poll_read_direct and read_direct. This makes sense because they are now direct replacements for the AsyncRead copying equivalents.

We could even create two traits: AsyncReadDirect and AsyncReadDirectExt, which would work much like the ones from futures_io do but would be copy-free. What do you guys think?

@glommer
Copy link
Collaborator

glommer commented Nov 30, 2021

We can add the traits, but the biggest advantage of having traits in this case is if there are other implementations. So probably best to wait until the need shows up

@glommer glommer merged commit 0a00239 into DataDog:master Nov 30, 2021
@HippoBaro HippoBaro deleted the better_stream_reader branch November 30, 2021 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants