-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-11086: [Rust] Extend take implementation to more index types #9057
Conversation
The full set of Rust CI tests did not run on this PR :( Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do? I apologize for the inconvenience. |
e644413
to
c0af0d9
Compare
Rebased |
Codecov Report
@@ Coverage Diff @@
## master #9057 +/- ##
=======================================
Coverage 82.61% 82.61%
=======================================
Files 203 204 +1
Lines 50140 50140
=======================================
Hits 41422 41422
Misses 8718 8718
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
The clippy failures in https://github.com/apache/arrow/pull/9057/checks?check_run_id=1630788725 seem unrelated to your change -- let me check that out... |
Ah, I hadn't yet seen #9061 which appears to fix the clippy errors -- thanks @Dandandan ! |
FYI @jorgecarleitao @andygrove this (small) PR is needed to finish #9070 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Clippy missing xD |
@jorgecarleitao restarted the CI 😄 |
## Context The context of this PR is that I want to experiment with a simplified implementation of the hash join in DataFusion which directly can index the build-side array instead of keeping a list of batches. This array could grow beyond 2 ^ 32 billion elements, so would need indexes of type `UInt64` rather than `UInt32`. ## Implementation In the PR I just extend the public `take` to take any `IndexType` which implements `ArrowNumericType` and `ToPrimitive`. I am not sure about the consideration before to restrict `take` to only `UInt32Array`. Closes apache#9057 from Dandandan/take_index Authored-by: Heres, Daniel <danielheres@gmail.com> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Context
The context of this PR is that I want to experiment with a simplified implementation of the hash join in DataFusion which directly can index the build-side array instead of keeping a list of batches. This array could grow beyond 2 ^ 32 billion elements, so would need indexes of type
UInt64
rather thanUInt32
.Implementation
In the PR I just extend the public
take
to take anyIndexType
which implementsArrowNumericType
andToPrimitive
.I am not sure about the consideration before to restrict
take
to onlyUInt32Array
.