[Rust] [DataFusion] HashJoinExec slow with many batches #26949
Comments
Jorge Leitão / @jorgecarleitao: I agree that having an average of 20 rows per batch may be breaking some of the assumptions we make about the performance of arrays.
Andy Grove / @andygrove: It looks like we create a buffer and, for each input array, extend this buffer. Each time we extend it the buffer is larger, so the cost of extending it again gets higher each time. Is there a way we can compute upfront how much to extend it by and do a single extend operation?
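A minimal sketch of the idea above, using plain `Vec<u8>` as a stand-in for the real Arrow buffers (the function names here are illustrative, not the actual arrow-rs API): compute the total length first, allocate once, then copy each array in without any further reallocation.

```rust
// Hypothetical illustration, NOT arrow-rs code: contrasts growing a buffer
// incrementally with pre-sizing it from the known total length.

fn extend_repeatedly(arrays: &[Vec<u8>]) -> Vec<u8> {
    let mut buffer = Vec::new();
    for a in arrays {
        // Each call may reallocate and copy the ever-larger buffer.
        buffer.extend_from_slice(a);
    }
    buffer
}

fn extend_presized(arrays: &[Vec<u8>]) -> Vec<u8> {
    // Compute the final size upfront so the buffer is allocated exactly once.
    let total: usize = arrays.iter().map(|a| a.len()).sum();
    let mut buffer = Vec::with_capacity(total);
    for a in arrays {
        buffer.extend_from_slice(a); // capacity is sufficient; no reallocation
    }
    buffer
}

fn main() {
    let arrays: Vec<Vec<u8>> = (0..100).map(|i| vec![i as u8; 20]).collect();
    let a = extend_repeatedly(&arrays);
    let b = extend_presized(&arrays);
    assert_eq!(a, b);
    assert_eq!(b.len(), 2000);
    println!("ok: {} bytes", b.len());
}
```

Both variants produce the same bytes; only the allocation pattern differs.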
Daniël Heres / @Dandandan: I think part of a further speedup could come from moving the building of the left/build-side Vec<&ArrayData> so that it is created only once, instead of once per right batch in build_batch_from_indices. Currently, when the batch size is made smaller, the build-side Vec is built more times and also itself contains more (smaller) batches, which could explain (part of) the big, seemingly exponential slowdown with smaller batches.
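The restructuring suggested above can be sketched roughly as follows. This is a hedged, simplified stand-in (the `Batch` type and function names here are invented for illustration, not DataFusion's actual types): the build-side reference Vec is constructed once, outside the per-probe-batch loop.

```rust
// Illustrative stand-ins only: `Batch` models a RecordBatch column and
// `build_batch_from_indices` models the real DataFusion function of the
// same name, which receives the build-side data.
struct Batch(Vec<i64>);

fn join(build_batches: &[Batch], probe_batches: &[Batch]) -> usize {
    // Build the Vec of build-side references ONCE, outside the probe loop,
    // instead of rebuilding it for every probe-side batch.
    let build_side: Vec<&Batch> = build_batches.iter().collect();

    let mut output_rows = 0;
    for probe in probe_batches {
        // The per-batch call now only borrows the prebuilt Vec.
        output_rows += build_batch_from_indices(&build_side, probe);
    }
    output_rows
}

fn build_batch_from_indices(build_side: &[&Batch], probe: &Batch) -> usize {
    // Placeholder body: count probe values present on the build side.
    probe
        .0
        .iter()
        .filter(|&v| build_side.iter().any(|b| b.0.contains(v)))
        .count()
}

fn main() {
    let build = vec![Batch(vec![1, 2, 3]), Batch(vec![4])];
    let probe = vec![Batch(vec![2, 4, 9])];
    assert_eq!(join(&build, &probe), 2);
    println!("matched {} rows", join(&build, &probe));
}
```

With many small probe batches, hoisting the Vec construction turns a per-batch cost into a one-time cost.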
Andy Grove / @andygrove: The work is in https://github.com/andygrove/arrow/tree/coalesce-left
Jorge Leitão / @jorgecarleitao: I originally implemented the
Daniël Heres / @Dandandan: If the limit is 2^64, could we say that a higher limit is not realistic yet for an in-memory hash join? I think you could technically make it a Vec and group/append indices by the first (batch) index, keeping the "direct index" approach as mentioned while still utilizing take, etc. But if that is not technically possible, is it necessary?
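The grouping idea above can be sketched as follows. This is a simplified illustration (the function name and index widths are assumptions, not DataFusion's representation): instead of one flat index spanning all build-side batches, keep (batch, row) pairs and group rows per batch, so each batch can then be gathered with a single take-style operation.

```rust
use std::collections::BTreeMap;

/// Group (batch_index, row_index) pairs by batch index so that each
/// build-side batch can be processed with one take-style gather.
/// Illustrative sketch only; index types are assumed, not DataFusion's.
fn group_by_batch(indices: &[(u32, u32)]) -> BTreeMap<u32, Vec<u32>> {
    let mut grouped: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(batch, row) in indices {
        // Rows keep their original order within each batch's group.
        grouped.entry(batch).or_default().push(row);
    }
    grouped
}

fn main() {
    let indices = [(0, 5), (1, 0), (0, 2), (1, 7)];
    let grouped = group_by_batch(&indices);
    assert_eq!(grouped[&0], vec![5, 2]);
    assert_eq!(grouped[&1], vec![0, 7]);
    println!("{} batches referenced", grouped.len());
}
```

A (u32, u32) pair already addresses 2^32 batches of 2^32 rows each, which suggests narrow per-batch indices can go a long way before a 64-bit flat index is truly needed.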
Andy Grove / @andygrove: I am specifically thinking about the 32-bit variants, which will be
Performance of joins slows down dramatically with smaller batches.
The issue relates to the slow performance of MutableArrayData::new() when passed a large number of batches. All of the batches from the build side of the join are passed in, and this happens once per build-side join key for each probe-side batch.
It seems to get exponentially slower as the number of arrays increases, even though the total number of rows stays the same.
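A rough cost model (an illustration only, not a measurement of arrow's actual allocator behavior) shows why splitting the same rows into more arrays can blow up: if each concatenation step copies the already-accumulated data plus the new rows, total copy work grows quadratically in the number of arrays even at constant total row count.

```rust
/// Pessimistic model: each extend copies the existing buffer plus the
/// newly appended rows. Returns the total modeled row-copies.
/// This is a back-of-the-envelope sketch, not arrow-rs behavior.
fn copies_when_reextending(n_arrays: usize, rows_per_array: usize) -> usize {
    let mut buffer_len = 0;
    let mut total_copied = 0;
    for _ in 0..n_arrays {
        total_copied += buffer_len + rows_per_array; // copy old data + new rows
        buffer_len += rows_per_array;
    }
    total_copied
}

fn main() {
    let total_rows: usize = 131_072;
    // Same total rows, split into ever more (smaller) arrays.
    for n in [1usize, 8, 64] {
        println!(
            "{} arrays -> {} modeled copies",
            n,
            copies_when_reextending(n, total_rows / n)
        );
    }
    assert_eq!(copies_when_reextending(1, 100), 100);
    assert_eq!(copies_when_reextending(4, 25), 250); // 25 + 50 + 75 + 100
}
```

The modeled growth is quadratic rather than literally exponential, and the real slowdown likely also includes per-batch fixed costs, but it matches the direction of the numbers reported below.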
I modified hash_join.rs to have this debug code:
Batch size 131072:
Batch size 16384:
Batch size 4096:
Reporter: Andy Grove / @andygrove
Assignee: Daniël Heres / @Dandandan
PRs and other links:
Note: This issue was originally created as ARROW-11030. Please see the migration documentation for further details.