-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add get_byte_ranges method to AsyncFileReader trait #2115
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2115 +/- ##
==========================================
- Coverage 83.76% 83.74% -0.03%
==========================================
Files 225 225
Lines 59457 59473 +16
==========================================
- Hits 49806 49805 -1
- Misses 9651 9668 +17
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some minor nits
parquet/src/arrow/async_reader.rs
Outdated
.get_byte_ranges(fetch_ranges) | ||
.await? | ||
.into_iter() | ||
.enumerate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.zip(update_chunks.iter_mut())
might be cleaner?
parquet/src/arrow/async_reader.rs
Outdated
let mut fetch_ranges = | ||
Vec::with_capacity(column_chunks.len()); | ||
|
||
let mut update_chunks: Vec<( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut says that it would be cleaner to just iterate through the column_chunks
and use filter_map
to extract the ranges, pass this to AsyncFileReader
. Convert the result to an iterator and then iterate the column_chunks
again, popping the next element from the iterator for each included column.
Not a big deal though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think this was cleaner
Benchmark runs are scheduled for baseline = 3096591 and contender = be0d34d. be0d34d is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2110
Rationale for this change
In certain cases it is better from a performance perspective to fetch data in parallel such as reading from object storage. This adds a hook into the
AsyncFileReader
trait to allow upstream consumers of this API to do that.What changes are included in this PR?
Add
get_byte_ranges(&mut self, ranges: Vec<Range<usize>>)
method toAsyncFileReader
trait with a default implementation that will fallback to callingget_bytes
serially for the provided ranges.Are there any user-facing changes?
No