-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in ha…
…sh join This applies some refactoring to `build_batch_from_indices` which is supposed to make further changes easier, e.g. solving https://issues.apache.org/jira/browse/ARROW-11030 * This starts handling right (1) batch and left (many) batches differently as for the right batches we can directly use `take` on it. This should be more efficient anyway, and also allows in the future to build the index array directly instead of doing extra copying. * Use `indices.len()` for the capacity parameter, rather than the number of rows at the left. This is of impact at larger sizes (e.g. SF 100), see: #9036 Rather than estimating it based on previous batches, this does it based on the (known) number of resulting rows. * Reuse "computed" right indices across multiple columns. * The refactoring makes it easier to apply changes needed for https://issues.apache.org/jira/browse/ARROW-11030 where we need to remove the n*n work that is done for the build side * The changes don't have a big impact locally on performance on TPC-H with small scale factor, but I believe it should have a similar effect as #9036 on SF=100 by using `indices.len()` rather than the number of rows in the build side. FYI @jorgecarleitao @andygrove Closes #9048 from Dandandan/join_right_refactor Authored-by: Heres, Daniel <danielheres@gmail.com> Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
- Loading branch information
1 parent
2440c9e
commit 25b7302
Showing
1 changed file
with
29 additions
and
28 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters