Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use InMemoryColumnChunkReader (~20% faster) #1956

Merged
merged 1 commit into from
Jun 29, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Part of #1163

Rationale for this change

#1810 added the InMemoryColumnChunkReader, but I stupidly neglected to actually hook it up 🤦

What changes are included in this PR?

Alters ParquetRecordBatchStream to use the InMemoryColumnChunkReader to avoid an unnecessary copy. For large column chunks, this represents a significant saving.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 28, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #1956 (9edb297) into master (464e8d1) will increase coverage by 0.00%.
The diff coverage is 0.00%.

❗ Current head 9edb297 differs from pull request most recent head 7e308a4. Consider uploading reports for the commit 7e308a4 to get more accurate results

@@           Coverage Diff           @@
##           master    #1956   +/-   ##
=======================================
  Coverage   83.47%   83.47%           
=======================================
  Files         221      221           
  Lines       57049    57046    -3     
=======================================
  Hits        47621    47621           
+ Misses       9428     9425    -3     
Impacted Files Coverage Δ
parquet/src/arrow/async_reader.rs 0.00% <0.00%> (ø)
arrow/src/datatypes/datatype.rs 65.42% <0.00%> (-0.38%) ⬇️
parquet_derive/src/parquet_field.rs 65.98% <0.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 464e8d1...7e308a4. Read the comment docs.

@tustvold tustvold merged commit 314d00f into apache:master Jun 29, 2022
@Ted-Jiang
Copy link
Member

Ted-Jiang commented Jun 30, 2022

@tustvold does this means we not support SerializedPageReader any more?
Or choose by user pick it up?

@tustvold
Copy link
Contributor Author

SerializedPageReader is still used by the sync APIs, i.e. SerializedFileReader, etc...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants