Optimizations for hash join operator

> Thanks! Besides optimizing the join order, either at planning time or dynamically (I think there are a couple of issues covering that), we can look at what makes the operator slow in more challenging scenarios.
> 
> Some optimizations come to mind that might improve the current hash join operator in certain scenarios while keeping the same algorithm:
> 
> * Reuse the allocation of the `Vec` indices between calls. This probably helps when the number of matching indices is low (compared to the batch size).
> * (Related): Keep building matching indices until `limit` rows have been reached, then use `interleave` to collect the batches. That probably makes the operator more cache efficient, because all accesses to the map / chain happen together, before output batches are produced from the input data. It also avoids the overhead of a later `CoalesceBatches`, which might help as well.
> * Instead of building indices for the right side, we can build a boolean mask / filter to mark match / no match. This reduces memory usage (somewhat), and a boolean filter is much faster for low selectivity (i.e. most of the right side matches). We should then use the coalesce kernel to produce the right side arrays.
> 
> I opened https://github.com/apache/datafusion/issues/18939 to explore using a different algorithm (radix hash joins), which should additionally improve the performance of our join operators by making the algorithm more cache efficient.

 _Originally posted by @Dandandan in [#17494](https://github.com/apache/datafusion/issues/17494#issuecomment-3581011421)_
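
To make the first and third ideas in the quoted list more concrete, here is a minimal, std-only sketch. It is not DataFusion's implementation: `ProbeBuffers`, `probe`, and `probe_mask` are hypothetical names, the build side is modelled as a plain `HashMap` from key to build-row indices, and the mask variant assumes each probe row produces at most one output row (for example, unique build keys), since a mask alone cannot express repeated rows.

```rust
use std::collections::HashMap;

/// Hypothetical probe-side state that owns the index buffers, so their
/// allocations can be reused across batches instead of being recreated.
#[derive(Default)]
pub struct ProbeBuffers {
    pub build_indices: Vec<u64>,
    pub probe_indices: Vec<u32>,
}

impl ProbeBuffers {
    /// Probe one batch of keys against the build-side map.
    /// `clear()` resets the length but keeps the capacity, so a batch with
    /// few matches does not pay for a fresh allocation.
    pub fn probe(
        &mut self,
        build: &HashMap<i64, Vec<u64>>,
        probe_keys: &[i64],
    ) -> (&[u64], &[u32]) {
        self.build_indices.clear();
        self.probe_indices.clear();
        for (row, key) in probe_keys.iter().enumerate() {
            if let Some(matches) = build.get(key) {
                for &build_row in matches {
                    self.build_indices.push(build_row);
                    self.probe_indices.push(row as u32);
                }
            }
        }
        (&self.build_indices, &self.probe_indices)
    }
}

/// Mask-based alternative for the probe side: one flag per probe row
/// (match / no match) instead of a list of row indices.
/// Assumption: each probe row matches at most one build row.
pub fn probe_mask(build: &HashMap<i64, Vec<u64>>, probe_keys: &[i64]) -> Vec<bool> {
    probe_keys.iter().map(|key| build.contains_key(key)).collect()
}
```

In this sketch a caller would keep a single `ProbeBuffers` alive for the whole probe phase and call `probe` once per input batch. The mask from `probe_mask` would then be handed to a filter-style kernel rather than a take-style kernel to produce the probe-side output.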

Optimizations for hash join operator #18942