Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose column index and offset index in ColumnChunkMetadata #1318

Merged
merged 9 commits into from
Feb 16, 2022

Conversation

shanisolomon
Copy link
Contributor

Which issue does this PR close?

Closes #1317.

Exposing the column index and offset index offsets and lengths so parquet engines could optimize their reads.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Feb 16, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #1318 (a31927e) into master (747e72a) will decrease coverage by 0.03%.
The diff coverage is 50.00%.

❗ Current head a31927e differs from pull request most recent head e309b01. Consider uploading reports for the commit e309b01 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1318      +/-   ##
==========================================
- Coverage   83.01%   82.98%   -0.04%     
==========================================
  Files         180      180              
  Lines       52810    52866      +56     
==========================================
+ Hits        43840    43870      +30     
- Misses       8970     8996      +26     
Impacted Files Coverage Δ
parquet/src/schema/printer.rs 68.57% <0.00%> (-2.98%) ⬇️
parquet/src/file/metadata.rs 88.53% <65.71%> (-2.87%) ⬇️
parquet/src/file/serialized_reader.rs 94.55% <100.00%> (+0.07%) ⬆️
arrow/src/array/transform/mod.rs 84.52% <0.00%> (ø)
parquet_derive/src/parquet_field.rs 66.21% <0.00%> (+0.22%) ⬆️
arrow/src/datatypes/datatype.rs 66.80% <0.00%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 747e72a...e309b01. Read the comment docs.

@shanisolomon
Copy link
Contributor Author

Will sync with https://github.com/apache/arrow-rs/pull/1320 and update the test after its merge.

@alamb
Copy link
Contributor

alamb commented Feb 16, 2022

#1320 has been merged FYI

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks @shanisolomon

cc @sunchao (mostly for visibility that the Rust implementation is beginning down the path of using more advanced parquet metadata features)

@shanisolomon
Copy link
Contributor Author

Thanks, @alamb! Could you help with merging please?

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb @shanisolomon ! yes it certainly will be VERY useful if parquet-rs has these advanced predicate pushdown features!

@sunchao sunchao merged commit 827cc3e into apache:master Feb 16, 2022
@sunchao
Copy link
Member

sunchao commented Feb 16, 2022

Just merged to master!

@alamb alamb changed the title Expose column index and offset index Expose column index and offset index in ColumnChunkMetadata Feb 16, 2022
@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose column and offset index metadata offset
4 participants