Document ChunkReader (#4118) #4147
Conversation
```rust
pub trait ChunkReader: Length + Send + Sync {
    type T: Read + Send;
    /// Get a serially readable slice of the current reader
    /// This should fail if the slice exceeds the current bounds
```
This isn't actually true; the FileSource implementation doesn't do this.
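For context, here is a minimal sketch of a reader that does honor the documented bound check, written against the get_read(start, length) shape quoted above (the InMemoryChunkReader type is hypothetical, get_bytes is assumed from the trait as it stood at the time of this PR, and the signature has since been reworked):

```rust
use std::io::Cursor;

use bytes::Bytes;
use parquet::errors::{ParquetError, Result};
use parquet::file::reader::{ChunkReader, Length};

/// Hypothetical in-memory reader that enforces the documented bound check
struct InMemoryChunkReader {
    data: Bytes,
}

impl Length for InMemoryChunkReader {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemoryChunkReader {
    type T = Cursor<Bytes>;

    fn get_read(&self, start: u64, length: usize) -> Result<Self::T> {
        // Fail if the requested slice exceeds the current bounds,
        // which is what the doc comment above promises
        let end = start as usize + length;
        if end > self.data.len() {
            return Err(ParquetError::EOF(format!(
                "requested {start}..{end} but reader only has {} bytes",
                self.data.len()
            )));
        }
        Ok(Cursor::new(self.data.slice(start as usize..end)))
    }

    fn get_bytes(&self, start: u64, length: usize) -> Result<Bytes> {
        // Same bound check for the byte-range variant
        let end = start as usize + length;
        if end > self.data.len() {
            return Err(ParquetError::EOF(format!(
                "requested {start}..{end} but reader only has {} bytes",
                self.data.len()
            )));
        }
        Ok(self.data.slice(start as usize..end))
    }
}
```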
parquet/src/file/reader.rs (outdated)
```rust
/// Systems looking to mask high-IO latency through prefetching, such as encountered with
/// object storage, should consider fetching the relevant byte ranges into [`Bytes`]
```
Meaning that get_read could possibly read more than length bytes?
I more meant handling this outside of get_read, i.e. don't use ChunkReader for these use-cases 😅
Will see if I can't clarify the wording tomorrow
Done, PTAL
```rust
/// Systems looking to mask high-IO latency through prefetching, such as encountered with
/// object storage, should consider instead fetching the relevant parts of the file into
/// [`Bytes`], and then feeding this into the synchronous APIs, instead of implementing
/// [`ChunkReader`] directly. Arrow users can make use of the [async_reader] which
```
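To make the recommendation in the quoted doc comment concrete, a minimal sketch of the "fetch into Bytes, then use the synchronous API" route is shown below; the bytes are assumed to have already been fetched (for example from object storage), and Bytes already implements ChunkReader in this crate, so no custom implementation should be needed:

```rust
use bytes::Bytes;
use parquet::errors::Result;
use parquet::file::reader::{FileReader, SerializedFileReader};

/// Hypothetical helper: the data is assumed to have been prefetched already,
/// e.g. via a ranged GET against object storage
fn print_row_count(data: Bytes) -> Result<()> {
    // `Bytes` implements `ChunkReader`, so it can back the synchronous reader directly
    let reader = SerializedFileReader::new(data)?;
    println!("rows: {}", reader.metadata().file_metadata().num_rows());
    Ok(())
}
```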
Hmm, let me try to clarify this. So this basically means that systems that want prefetching should not rely on this "characteristic" of ChunkReader to implement prefetching, but should instead handle it themselves in front of the synchronous APIs?
Pretty much: if you want prefetching you can either use the async_reader, which does it for you, or implement something yourself 😄
One could reasonably ask why ChunkReader still exists, then; the answer is that I've not removed it yet (#1163) 😆
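For the async_reader route mentioned above, a minimal sketch might look like the following (assuming a tokio runtime, the parquet crate built with its async feature, and a hypothetical data.parquet file; the async reader takes care of fetching the needed byte ranges):

```rust
use futures::TryStreamExt;
use parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder;
use tokio::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // tokio::fs::File satisfies the async reader's input trait
    let file = File::open("data.parquet").await?;

    // Build a stream of Arrow record batches; the reader handles the IO scheduling
    let stream = ParquetRecordBatchStreamBuilder::new(file)
        .await?
        .with_batch_size(8192)
        .build()?;

    // Collect the decoded batches
    let batches: Vec<_> = stream.try_collect().await?;
    println!("read {} batches", batches.len());
    Ok(())
}
```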
Perhaps we should just remove the length parameter? What do you think?
Yeah, the length parameter is a bit confusing. Apart from serving as a hint, it doesn't look like it does much here.
Closing in favour of #4156
Which issue does this PR close?
Closes #4118
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?