Use Arrow take kernel within ListArrayReader #1490

Merged on Mar 29, 2022 (2 commits)

viirya (Member) commented Mar 27, 2022

Which issue does this PR close?

Closes #1482.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 27, 2022
@codecov-commenter

Codecov Report

Merging #1490 (b13bfa5) into master (6bf3b3a) will increase coverage by 0.03%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #1490      +/-   ##
==========================================
+ Coverage   82.72%   82.76%   +0.03%     
==========================================
  Files         188      188              
  Lines       54286    54241      -45     
==========================================
- Hits        44908    44891      -17     
+ Misses       9378     9350      -28     
Impacted Files                          Coverage Δ
parquet/src/arrow/array_reader.rs       86.79% <85.71%> (+2.43%) ⬆️
arrow/src/compute/kernels/take.rs       95.41% <0.00%> (+0.13%) ⬆️
parquet_derive/src/parquet_field.rs     65.98% <0.00%> (+0.22%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@viirya viirya changed the title Remove remove_indices Use Arrow take kernel Within ListArrayReader Mar 27, 2022
@viirya viirya changed the title Use Arrow take kernel Within ListArrayReader Use Arrow take kernel within ListArrayReader Mar 27, 2022
tustvold (Contributor) left a comment

The best kind of PR 🎉

Left some minor nits

let batch_values = match non_null_list_indices.len() {
l if l == def_levels.len() => next_batch_array.clone(),
_ => {
let indices = UInt32Array::from(non_null_list_indices);

You could remove the collect and then use from_iter_values. Aside from being faster, this would also avoid unnecessary work in the no-nulls case.
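One way to read this suggestion, sketched with the standard library only so it compiles without the arrow crate. `take_values` is a hypothetical stand-in for Arrow's `take` kernel, and the lazy iterator plays the role of what would be handed to `from_iter_values`. The names `take_values`, `non_null_values`, and `empty_def_level` are assumptions for illustration, not code from the PR.

```rust
// Hypothetical stand-in for Arrow's `take` kernel, written against plain
// slices. It consumes the indices as an iterator, never as a Vec<u32>.
fn take_values<T: Clone>(values: &[T], indices: impl Iterator<Item = u32>) -> Vec<T> {
    indices.map(|i| values[i as usize].clone()).collect()
}

// Keeps only the values whose definition level is above `empty_def_level`.
// Assumes `values` and `def_levels` have the same length.
fn non_null_values<T: Clone>(values: &[T], def_levels: &[i16], empty_def_level: i16) -> Vec<T> {
    // Fast path: nothing is filtered out, so no indices are built at all.
    if def_levels.iter().all(|def| *def > empty_def_level) {
        return values.to_vec();
    }
    // Slow path: indices are produced lazily and consumed directly,
    // without collecting them into an intermediate Vec<u32> first.
    let indices = def_levels
        .iter()
        .enumerate()
        .filter(|(_, def)| **def > empty_def_level)
        .map(|(index, _)| index as u32);
    take_values(values, indices)
}
```

In the no-nulls case the function returns before any index is computed, which is the saving the comment points at. The real reader would clone the Arrow array on the fast path rather than copy a slice.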

None
} else {
Some(index as u32)

This could be more concisely written as (*def > self.list_empty_def_level).then(|| index as u32)
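A minimal sketch of the two forms side by side, assuming the original branch compared `*def` against the empty-list level (only its `None` / `Some(index as u32)` arms are visible above). Both functions are hypothetical free functions; in the PR the threshold is the reader field `self.list_empty_def_level`.

```rust
// Concise form: `bool::then` returns Some(closure result) when the
// condition is true and None otherwise, so `filter_map` keeps only the
// indices whose definition level is above the threshold.
fn non_null_list_indices(def_levels: &[i16], list_empty_def_level: i16) -> Vec<u32> {
    def_levels
        .iter()
        .enumerate()
        .filter_map(|(index, def)| (*def > list_empty_def_level).then(|| index as u32))
        .collect()
}

// Longer if/else form, shown for comparison (assumed shape of the original).
fn non_null_list_indices_verbose(def_levels: &[i16], list_empty_def_level: i16) -> Vec<u32> {
    def_levels
        .iter()
        .enumerate()
        .filter_map(|(index, def)| {
            if *def <= list_empty_def_level {
                None
            } else {
                Some(index as u32)
            }
        })
        .collect()
}
```

`bool::then` takes a closure, so the cast is only evaluated for entries that pass the check.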

tustvold (Contributor) left a comment

Even better now 😄

viirya (Member, Author) commented Mar 29, 2022

Thank you @tustvold 😄

@alamb alamb merged commit 8feff35 into apache:master Mar 29, 2022
alamb (Contributor) commented Mar 29, 2022

Thank you @viirya and @tustvold

viirya (Member, Author) commented Mar 29, 2022

Thank you @alamb

Labels: parquet (Changes to the parquet crate)
Issue this pull request may close: Cleanup: Use Arrow take kernel Within parquet ListArrayReader
4 participants