Skip to content

feat(parquet): support object versions in ParquetObjectReader#9753

Open
ClSlaid wants to merge 1 commit intoapache:mainfrom
ClSlaid:issue-8568-parquet-object-reader-version
Open

feat(parquet): support object versions in ParquetObjectReader#9753
ClSlaid wants to merge 1 commit intoapache:mainfrom
ClSlaid:issue-8568-parquet-object-reader-version

Conversation

@ClSlaid
Copy link
Copy Markdown
Contributor

@ClSlaid ClSlaid commented Apr 17, 2026

Summary

  • add ParquetObjectReader::with_version and use ObjectStore::get_opts for suffix, single-range, and multi-range reads when a version is specified
  • ensure parquet metadata and data are read from the requested object revision instead of implicitly reading the latest object state
  • add a self-contained regression test covering the versioned reader path with a temporary local object store

Testing

Code: add with_version and use ObjectStore::get_opts for suffix, single-range, and multi-range reads when a version is set.

Test: add a self-contained regression test using a temporary local object store.

Fix: ensure parquet metadata and data are read from the requested object revision instead of the latest object state.
@github-actions github-actions Bot added the parquet Changes to the parquet crate label Apr 17, 2026
@ClSlaid
Copy link
Copy Markdown
Contributor Author

ClSlaid commented Apr 17, 2026

/cc @yeya24 PTAL.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @ClSlaid

Comment on lines +207 to +212
self.spawn(|store, path| {
async move {
let resp = store.get_opts(path, options).await?;
Ok::<_, ParquetError>(resp.bytes().await?)
}
.boxed()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the async closure here? Can we simplify this to something like this

                store.get_opts(path, options).await.boxed()

impl AsyncFileReader for ParquetObjectReader {
fn get_bytes(&mut self, range: Range<u64>) -> BoxFuture<'_, Result<Bytes>> {
self.spawn(|store, path| store.get_range(path, range).boxed())
if self.version.is_some() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am missing something but the version is not passed to the options... So how does it make it into the request?

I also think it would be easier to read this code if you use the same async closure and just changed the ptions

let mut options = self.get_opts(Some(GetRange::from(range)));
if let Some(version) = self.version.as_ref() {
  options = options.woth_version(version);
}
self.spawn(|store, path|  store.get_opts(path, options).boxed())

}

#[tokio::test]
async fn test_simple_with_version() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do these very the version is passed through? Do they fail if you revert the code changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Parquet] Support version in ParquetObjectReader

2 participants