Document ChunkReader (#4118) #4147
Conversation
```rust
pub trait ChunkReader: Length + Send + Sync {
    type T: Read + Send;
    /// Get a serially readable slice of the current reader
    /// This should fail if the slice exceeds the current bounds
```
This isn't actually true; the FileSource implementation doesn't do this.
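For context, here is a minimal sketch of a reader that does honor the documented bound check, written against the get_read(start, length) shape quoted above (the InMemoryChunkReader type is hypothetical, get_bytes is assumed from the trait as it stood at the time of this PR, and the signature has since been reworked):

```rust
use std::io::Cursor;

use bytes::Bytes;
use parquet::errors::{ParquetError, Result};
use parquet::file::reader::{ChunkReader, Length};

/// Hypothetical in-memory reader that enforces the documented bound check
struct InMemoryChunkReader {
    data: Bytes,
}

impl Length for InMemoryChunkReader {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemoryChunkReader {
    type T = Cursor<Bytes>;

    fn get_read(&self, start: u64, length: usize) -> Result<Self::T> {
        // Fail if the requested slice exceeds the current bounds,
        // which is what the doc comment above promises
        let end = start as usize + length;
        if end > self.data.len() {
            return Err(ParquetError::EOF(format!(
                "requested {start}..{end} but reader only has {} bytes",
                self.data.len()
            )));
        }
        Ok(Cursor::new(self.data.slice(start as usize..end)))
    }

    fn get_bytes(&self, start: u64, length: usize) -> Result<Bytes> {
        // Same bound check for the byte-range variant
        let end = start as usize + length;
        if end > self.data.len() {
            return Err(ParquetError::EOF(format!(
                "requested {start}..{end} but reader only has {} bytes",
                self.data.len()
            )));
        }
        Ok(self.data.slice(start as usize..end))
    }
}
```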
parquet/src/file/reader.rs (outdated)
```rust
/// Systems looking to mask high-IO latency through prefetching, such as encountered with
/// object storage, should consider fetching the relevant byte ranges into [`Bytes`]
```
Meaning that get_read could possibly read more than length bytes?
I more meant handling this outside of get_read, i.e. don't use ChunkReader for these use-cases 😅
Will see if I can't clarify the wording tomorrow
Done, PTAL
```rust
/// Systems looking to mask high-IO latency through prefetching, such as encountered with
/// object storage, should consider instead fetching the relevant parts of the file into
/// [`Bytes`], and then feeding this into the synchronous APIs, instead of implementing
/// [`ChunkReader`] directly. Arrow users can make use of the [async_reader] which
```
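To make the recommendation in the quoted doc comment concrete, a minimal sketch of the "fetch into Bytes, then use the synchronous API" route is shown below; the bytes are assumed to have already been fetched (for example from object storage), and Bytes already implements ChunkReader in this crate, so no custom implementation should be needed:

```rust
use bytes::Bytes;
use parquet::errors::Result;
use parquet::file::reader::{FileReader, SerializedFileReader};

/// Hypothetical helper: the data is assumed to have been prefetched already,
/// e.g. via a ranged GET against object storage
fn print_row_count(data: Bytes) -> Result<()> {
    // `Bytes` implements `ChunkReader`, so it can back the synchronous reader directly
    let reader = SerializedFileReader::new(data)?;
    println!("rows: {}", reader.metadata().file_metadata().num_rows());
    Ok(())
}
```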
Hmm, let me try to clarify this. So this basically means that systems that want prefetching should not rely on this "characteristic" of ChunkReader to implement prefetching, but should instead handle it themselves in front of the synchronous APIs?
Pretty much: if you want prefetching you can either use the async_reader, which does it for you, or implement something yourself 😄
One could reasonably ask why ChunkReader still exists, then; the answer is that I've not removed it yet (#1163) 😆
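For the async_reader route mentioned above, a minimal sketch might look like the following (assuming a tokio runtime, the parquet crate built with its async feature, and a hypothetical data.parquet file; the async reader takes care of fetching the needed byte ranges):

```rust
use futures::TryStreamExt;
use parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder;
use tokio::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // tokio::fs::File satisfies the async reader's input trait
    let file = File::open("data.parquet").await?;

    // Build a stream of Arrow record batches; the reader handles the IO scheduling
    let stream = ParquetRecordBatchStreamBuilder::new(file)
        .await?
        .with_batch_size(8192)
        .build()?;

    // Collect the decoded batches
    let batches: Vec<_> = stream.try_collect().await?;
    println!("read {} batches", batches.len());
    Ok(())
}
```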
Perhaps we should just remove the length parameter? What do you think?
Yeah, the length parameter is a bit confusing. Apart from serving as a hint, it doesn't look like it does much here.
Closing in favour of #4156
Which issue does this PR close?
Closes #4118
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?