
ARROW-11086: [Rust] Extend take implementation to more index types #9057

Closed
wants to merge 2 commits into from


Dandandan
Contributor

@Dandandan Dandandan commented Dec 31, 2020

Context

The context of this PR is that I want to experiment with a simplified implementation of the hash join in DataFusion that can index the build-side array directly instead of keeping a list of batches. This array could grow beyond 2^32 (about 4.3 billion) elements, so it would need indexes of type UInt64 rather than UInt32.
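To make the 32-bit limit concrete: a `UInt32` index can address positions 0 through `u32::MAX` (4,294,967,295), i.e. at most 2^32 rows. The helper below is a hypothetical check, not part of Arrow or DataFusion, that tests whether an array of a given length is still fully addressable with 32-bit indices.

```rust
/// Hypothetical helper (not an Arrow/DataFusion API): returns true if every
/// row position of an array with `len` rows fits in a `u32` index.
fn fits_u32_indices(len: u64) -> bool {
    // The largest row position is `len - 1`; it must not exceed u32::MAX.
    len == 0 || len - 1 <= u64::from(u32::MAX)
}
```

For example, `fits_u32_indices(1 << 32)` is true (the last position is exactly `u32::MAX`), while one additional row already needs a 64-bit index type such as `UInt64`.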

Implementation

In the PR I extend the public `take` to accept any `IndexType` that implements `ArrowNumericType` and `ToPrimitive`.
I am not sure what the original reasoning was for restricting `take` to `UInt32Array` only.
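The idea of making `take` generic over the index type can be modelled without Arrow itself. The std-only sketch below is a simplified illustration, not Arrow's implementation: plain slices stand in for Arrow arrays, `Option` stands in for the null bitmap, and the hypothetical `ToIndex` trait plays the role that `ToPrimitive` (its `to_usize` conversion) plays in the bound described above. Any integer index type that converts to `usize` works, so `u64` indices can address rows beyond the `u32` range.

```rust
/// Hypothetical stand-in for `num::ToPrimitive::to_usize`: converts an index
/// value to a `usize` row position, or returns `None` if it does not fit
/// (for example a negative signed index).
trait ToIndex: Copy {
    fn to_index(self) -> Option<usize>;
}

// Implement the conversion for several integer index types at once.
macro_rules! impl_to_index {
    ($($t:ty),*) => {
        $(
            impl ToIndex for $t {
                fn to_index(self) -> Option<usize> {
                    <usize as std::convert::TryFrom<$t>>::try_from(self).ok()
                }
            }
        )*
    };
}
impl_to_index!(u32, u64, i32, i64);

/// Simplified `take`: for each entry in `indices`, copy `values[i]` into the
/// output. A `None` (null) index produces a `None` output slot.
fn take<T: Clone, I: ToIndex>(
    values: &[T],
    indices: &[Option<I>],
) -> Result<Vec<Option<T>>, String> {
    indices
        .iter()
        .map(|&idx| -> Result<Option<T>, String> {
            match idx {
                None => Ok(None),
                Some(i) => {
                    // Reject indices that cannot become a usize position.
                    let pos = i
                        .to_index()
                        .ok_or_else(|| "index cannot be converted to usize".to_string())?;
                    // Reject positions past the end of `values`.
                    values
                        .get(pos)
                        .cloned()
                        .map(Some)
                        .ok_or_else(|| format!("index {} out of bounds", pos))
                }
            }
        })
        .collect()
}
```

In this sketch, an index that cannot be converted (such as a negative `i64`) or that points past the end is returned as an error rather than causing a panic, so widening the accepted index types does not make the function less safe than a `u32`-only version.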


@alamb
Contributor

alamb commented Dec 31, 2020

The full set of Rust CI tests did not run on this PR :(

Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do?

I apologize for the inconvenience.

@Dandandan
Contributor Author

Rebased

@codecov-io

codecov-io commented Dec 31, 2020

Codecov Report

Merging #9057 (571bd65) into master (4b7cdcb) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #9057   +/-   ##
=======================================
  Coverage   82.61%   82.61%           
=======================================
  Files         203      204    +1     
  Lines       50140    50140           
=======================================
  Hits        41422    41422           
  Misses       8718     8718           
Impacted Files Coverage Δ
rust/arrow/src/compute/kernels/take.rs 95.21% <100.00%> (ø)
rust/datafusion/src/physical_plan/hash_join.rs 89.53% <100.00%> (ø)
rust/arrow/src/csv/writer.rs 78.82% <0.00%> (-0.56%) ⬇️
rust/arrow/src/util/serialization.rs 100.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 4b7cdcb...571bd65.

Contributor

@alamb alamb left a comment


Looks good to me

@alamb
Contributor

alamb commented Jan 1, 2021

The clippy failures in https://github.com/apache/arrow/pull/9057/checks?check_run_id=1630788725 seem unrelated to your change -- let me check that out...

@alamb
Contributor

alamb commented Jan 1, 2021

Ah, I hadn't yet seen #9061 which appears to fix the clippy errors -- thanks @Dandandan !

@Dandandan
Contributor Author

FYI @jorgecarleitao @andygrove this (small) PR is needed to finish #9070

Member

@andygrove andygrove left a comment


LGTM

@jorgecarleitao
Member

Clippy missing xD

@Dandandan
Contributor Author

@jorgecarleitao restarted the CI 😄

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021

Closes apache#9057 from Dandandan/take_index

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>