Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid allocating vector of indices in lexicographical_partition_ranges #998

Merged
merged 3 commits into from
Dec 15, 2021

Conversation

jhorstmann
Copy link
Contributor

Which issue does this PR close?

Closes #997.

Draft for now since I still have to update the code comments and maybe add some more tests.

Are there any user-facing changes?

I haven't run any benchmarks yet but this should improve performance of window functions with high-cardinality partition-by in datafusion.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 3, 2021
@codecov-commenter
Copy link

codecov-commenter commented Dec 3, 2021

Codecov Report

Merging #998 (caa05c7) into master (e9be49d) will increase coverage by 0.00%.
The diff coverage is 97.36%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master     #998    +/-   ##
========================================
  Coverage   82.31%   82.32%            
========================================
  Files         168      168            
  Lines       48763    49060   +297     
========================================
+ Hits        40139    40388   +249     
- Misses       8624     8672    +48     
Impacted Files Coverage Δ
arrow/src/compute/kernels/partition.rs 97.45% <97.36%> (-0.21%) ⬇️
arrow/src/array/array_string.rs 97.08% <0.00%> (-0.83%) ⬇️
parquet/src/record/reader.rs 89.83% <0.00%> (-0.63%) ⬇️
parquet/src/schema/printer.rs 72.47% <0.00%> (-0.55%) ⬇️
arrow/src/datatypes/datatype.rs 65.95% <0.00%> (-0.43%) ⬇️
arrow/src/datatypes/field.rs 53.37% <0.00%> (-0.31%) ⬇️
arrow/src/datatypes/schema.rs 66.66% <0.00%> (-0.27%) ⬇️
parquet/src/arrow/arrow_array_reader.rs 77.87% <0.00%> (-0.15%) ⬇️
arrow/src/compute/kernels/take.rs 95.21% <0.00%> (-0.01%) ⬇️
arrow/src/buffer/immutable.rs 97.84% <0.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9be49d...caa05c7. Read the comment docs.

@jhorstmann jhorstmann marked this pull request as ready for review December 7, 2021 16:01
@jhorstmann
Copy link
Contributor Author

@jimexist I'd like to get your feedback on this change, especially whether the comment in LexicographicalPartitionIterator::next sounds correct.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this code carefully and structurally it looks equivalent to me. However, I agree it would be great to get @jimexist 's take on it as well

@@ -77,24 +74,51 @@ impl<'a> LexicographicalPartitionIterator<'a> {
/// see <https://en.wikipedia.org/wiki/Exponential_search>
#[inline]
fn exponential_search(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please document what start and end and bound are? (specifically how the relate to each other -- I think as written it seems like bound is some starting index and the search starts at start+bound and stops at end indexes?

@alamb
Copy link
Contributor

alamb commented Dec 14, 2021

I'll plan to merge this PR in later this week (arrow 6.5 would get created at the end of next week, so I think we have plenty of time to land this one). Perhaps @jimexist will have a chance to review by then

@alamb alamb merged commit fc343e7 into apache:master Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lexicographical_partition_ranges does not need to materialize indices
3 participants