refactor read_many to take a stream of iovecs as input #416

Merged

merged 6 commits into DataDog:master from fair_many_read on Sep 27, 2021

Conversation

HippoBaro
Member

Instead of taking an iterator of (offset, position) pairs, make it a stream.

The reason behind doing this is that using an iterator forces us to
schedule all the IO synchronously at call time (or keep it all in memory,
which isn't great either). While this is fine (and marginally more
efficient) when dealing with a small number of requests, it becomes
problematic as the number of IO requests increases.

Specifically, doing this may starve other tasks performing IO in the same
task queue, since the reactor submits IO requests to the device in FIFO
order. Therefore, if two concurrent tasks call read_many, they will
race, and all the IO submitted by the first task will be processed before
a single IO from the second reaches the IO ring. This is obviously wrong,
as we want some amount of fairness even within a common task queue.

Another use case is performing parallel calls to read_many with
the intention of zipping the resulting streams. For instance, if some
data is spread across many files, we may want to fetch it in parallel and
combine the results asynchronously.
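
As a minimal, self-contained sketch of that zipping use case (the glommio calls themselves are elided here; `futures::stream::iter` stands in for the two streams a pair of `read_many` calls would return):

```rust
use futures::stream::{self, StreamExt};

async fn combine_two_sources() {
    // Imagine these are the buffers yielded by `file_a.read_many(..)` and
    // `file_b.read_many(..)`, with IO for both files in flight concurrently.
    let from_file_a = stream::iter(vec![b"a0".to_vec(), b"a1".to_vec()]);
    let from_file_b = stream::iter(vec![b"b0".to_vec(), b"b1".to_vec()]);

    // Zip them so each pair of reads is combined as soon as both complete,
    // without waiting for either stream to finish entirely.
    let mut zipped = from_file_a.zip(from_file_b);
    while let Some((a, b)) = zipped.next().await {
        let _record = [a, b].concat(); // combine the two chunks asynchronously
    }
}
```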

The read_many stream now eagerly consumes the input stream until
that stream yields Poll::Pending or until we reach 128 in-flight sources.
This allows a maximum of 128 sources to run concurrently. After that, the
stream waits until the first source is fulfilled.
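
The same bounded-eagerness pattern can be sketched with stock stream combinators. This is only an analogy for how the stream behaves, not glommio's actual implementation; `issue_read` and the parameter types are made up for the example:

```rust
use futures::{Stream, StreamExt};

/// Chosen to match the depth of the io_uring submission queues.
const MAX_INFLIGHT: usize = 128;

// Stand-in for issuing a single read; in glommio this would create a
// `Source` for the (offset, size) pair and await its completion.
async fn issue_read(_offset: u64, size: usize) -> Vec<u8> {
    vec![0u8; size] // placeholder buffer
}

async fn read_many_sketch(iovs: impl Stream<Item = (u64, usize)> + Unpin) {
    // `buffered` eagerly pulls from `iovs`, keeping at most MAX_INFLIGHT
    // reads in flight and yielding results in submission order. Once the
    // limit is reached, it waits for the oldest read to finish before
    // admitting another one.
    let mut results = iovs
        .map(|(offset, size)| issue_read(offset, size))
        .buffered(MAX_INFLIGHT);

    while let Some(buf) = results.next().await {
        let _ = buf; // hand the buffer back to the caller
    }
}
```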

The limit of 128 was chosen to match the depth of our io_uring
submission queues. Although it is a constant right now, a further
improvement would be to make it dynamic so as to maximize throughput and
lower IO latency on the fly.

Baby steps!

Member

@duarten duarten left a comment


Looks very nice, some small comments.

Review threads (resolved): glommio/src/io/immutable_file.rs, glommio/src/io/bulk_io.rs
Collaborator

@glommer glommer left a comment


Looks great to me.

Once you and Duarte are on the same page on the comments, we can merge.

@HippoBaro HippoBaro force-pushed the fair_many_read branch 2 times, most recently from e4798ec to e01758e on September 27, 2021 17:19
The code performs less work and doesn't require the iovec to be `Copy`.
It also removes our dependency on the `itertools` crate, which is very
nice.

One caveat of the current state of the code is that the stream will
create only one `Source` at a time. Therefore, this refactoring
temporarily removes all IO concurrency within a `read_many` invocation.
The next commits will reintroduce concurrency.

It is never okay to delay IO requests, so if the IO request stream
yields Poll::Pending, try to flush the merger so that we avoid
starving the IO reactor.
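
A rough sketch of that rule, with an invented `Merger` and `MergingStream` standing in for glommio's actual request-merging machinery (the names and fields here are illustrative, not the real types):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Stream;

// Placeholder for the request merger: it accumulates iovecs so that
// adjacent requests can be coalesced before submission.
struct Merger {
    pending: Vec<(u64, usize)>,
}

impl Merger {
    fn flush(&mut self) {
        // Coalesce adjacent requests and submit them to the reactor (elided).
        self.pending.clear();
    }
}

struct MergingStream<S> {
    input: S,
    merger: Merger,
}

impl<S: Stream<Item = (u64, usize)> + Unpin> MergingStream<S> {
    fn poll_fill(&mut self, cx: &mut Context<'_>) {
        loop {
            match Pin::new(&mut self.input).poll_next(cx) {
                // More iovecs are available right now: keep accumulating so
                // adjacent requests can be merged before submission.
                Poll::Ready(Some(iov)) => self.merger.pending.push(iov),
                // The input is exhausted or returned Pending: never sit on
                // requests we already have; flush them to the reactor right
                // away instead of waiting for more input to merge with.
                Poll::Ready(None) | Poll::Pending => {
                    self.merger.flush();
                    return;
                }
            }
        }
    }
}
```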

Don't acquire a strong reference to the reactor each time we want to do
IO; do it once and keep the reference around until we are done with it.
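
A minimal sketch of that change, using std Rc/Weak with an invented `Reactor` placeholder (not glommio's real reactor type):

```rust
use std::rc::{Rc, Weak};

struct Reactor; // placeholder for the real reactor

struct ReadManyState {
    // Upgrade the weak handle once, when the stream is created...
    reactor: Rc<Reactor>,
}

impl ReadManyState {
    fn new(weak: &Weak<Reactor>) -> Self {
        Self {
            reactor: weak.upgrade().expect("reactor already dropped"),
        }
    }

    fn submit(&self, _offset: u64, _size: usize) {
        // ...and reuse that strong reference for every IO submission,
        // instead of calling `upgrade()` on each request.
        let _reactor = &self.reactor;
    }
}
```
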
@HippoBaro HippoBaro merged commit bac65bb into DataDog:master Sep 27, 2021
@HippoBaro HippoBaro deleted the fair_many_read branch September 27, 2021 19:31