Support empty projection in ParquetRecordBatchReader #1560

Merged
merged 4 commits into from
Apr 15, 2022

Conversation

tustvold
Contributor

@tustvold tustvold commented Apr 13, 2022

Which issue does this PR close?

Closes #1537

Rationale for this change

See ticket

What changes are included in this PR?

See ticket

Are there any user-facing changes?

No, the only changes are to experimental modules.

@@ -214,12 +205,6 @@ impl ParquetRecordBatchReader {
        batch_size: usize,
        array_reader: Box<dyn ArrayReader>,
    ) -> Result<Self> {
        // Check that array reader is struct array reader
        array_reader
Contributor Author

This check does not seem necessary; all the reader cares about is that the array reader yields a StructArray.

/// but with row counts that correspond to the amount of data in the file
///
/// This is useful for when projection eliminates all columns within a collection
pub fn make_empty_array_reader(row_count: usize) -> Box<dyn ArrayReader> {
Contributor Author

I don't like this name, but it was the best I could come up with
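
To illustrate the idea in the doc comment above (a reader with no columns that still reports row counts matching the file), here is a minimal sketch. It does not use the parquet crate: `EmptyReader`, `next_batch`, and `batch_lengths` are hypothetical names invented for this example, and the real `make_empty_array_reader` returns a `Box<dyn ArrayReader>` rather than plain lengths.

```rust
/// Hypothetical stand-in for a reader that produces no columns, only row counts.
/// This is NOT the parquet crate's actual type.
struct EmptyReader {
    remaining_rows: usize,
}

impl EmptyReader {
    fn new(row_count: usize) -> Self {
        Self { remaining_rows: row_count }
    }

    /// Returns how many rows the next batch covers (at most `batch_size`),
    /// or 0 once every row has been accounted for.
    fn next_batch(&mut self, batch_size: usize) -> usize {
        let len = self.remaining_rows.min(batch_size);
        self.remaining_rows -= len;
        len
    }
}

/// Collects the length of every batch produced for `row_count` rows.
fn batch_lengths(row_count: usize, batch_size: usize) -> Vec<usize> {
    let mut reader = EmptyReader::new(row_count);
    let mut lengths = Vec::new();
    loop {
        let len = reader.next_batch(batch_size);
        if len == 0 {
            break;
        }
        lengths.push(len);
    }
    lengths
}
```

Under these assumptions, `batch_lengths(10, 4)` yields three batches of 4, 4, and 2 rows, so a consumer still sees the correct total row count even though no column data is read.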

@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 13, 2022
}

struct EmptyArrayReader {
    data_type: ArrowType,
Member

data_type does not seem necessary to keep here; it is always ArrowType::Struct(vec![]).

Contributor Author

It's needed for get_data_type(&self) -> &ArrowType, as that method must return a reference whose lifetime is tied to the object itself.
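
A minimal sketch of the lifetime point, using simplified stand-ins rather than the real crate types: the `ArrowType` enum and the one-method `ArrayReader` trait below are assumptions made for this example, not the actual arrow/parquet definitions.

```rust
/// Hypothetical stand-in for the Arrow data type enum; the real type has
/// many more variants and a different payload for `Struct`.
#[derive(Debug, PartialEq)]
enum ArrowType {
    Struct(Vec<String>),
}

/// Simplified stand-in for the part of the `ArrayReader` trait under discussion.
trait ArrayReader {
    fn get_data_type(&self) -> &ArrowType;
}

struct EmptyArrayReader {
    // Stored so that `get_data_type` has something owned by `self` to borrow.
    data_type: ArrowType,
}

impl EmptyArrayReader {
    fn new() -> Self {
        Self { data_type: ArrowType::Struct(vec![]) }
    }
}

impl ArrayReader for EmptyArrayReader {
    fn get_data_type(&self) -> &ArrowType {
        // The returned reference borrows from `self`, which satisfies the
        // elided lifetime in the signature. Returning a reference to an
        // `ArrowType::Struct(vec![])` built inside this function would be
        // rejected, because that temporary is dropped when the function returns.
        &self.data_type
    }
}
```

The field always holds the same value, but keeping it in the struct is what lets the method hand out a reference that lives as long as the reader does.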

Member

Yeah, I just found that and was about to comment, but you already replied. Never mind.

Member

@viirya viirya left a comment

lgtm, thanks @tustvold

@codecov-commenter

Codecov Report

Merging #1560 (1a1dda8) into master (0192872) will decrease coverage by 0.00%.
The diff coverage is 73.80%.

@@            Coverage Diff             @@
##           master    #1560      +/-   ##
==========================================
- Coverage   82.86%   82.85%   -0.01%     
==========================================
  Files         190      191       +1     
  Lines       55057    55085      +28     
==========================================
+ Hits        45622    45642      +20     
- Misses       9435     9443       +8     
Impacted Files                                   Coverage Δ
parquet/src/arrow/async_reader.rs                0.00% <0.00%> (ø)
parquet/src/arrow/array_reader/empty_array.rs    55.55% <55.55%> (ø)
parquet/src/arrow/array_reader.rs                86.98% <100.00%> (-0.10%) ⬇️
parquet/src/arrow/array_reader/builder.rs        68.01% <100.00%> (+0.09%) ⬆️
parquet/src/arrow/arrow_reader.rs                94.96% <100.00%> (+0.48%) ⬆️
arrow/src/array/array_struct.rs                  88.44% <0.00%> (-0.80%) ⬇️
arrow/src/array/transform/mod.rs                 86.46% <0.00%> (ø)
parquet/src/arrow/schema.rs                      85.83% <0.00%> (+0.11%) ⬆️
parquet/src/encodings/encoding.rs                93.56% <0.00%> (+0.18%) ⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@alamb alamb merged commit b109201 into apache:master Apr 15, 2022
@alamb alamb changed the title Support empty projection in ParquetRecordBatchReader Support empty projection in ParquetRecordBatchReader Apr 15, 2022
Labels
parquet Changes to the parquet crate
Development

Successfully merging this pull request may close these issues.

Support Empty Column Projection in ParquetRecordBatchReader
5 participants