Speed up hash join build phase [experiment] #18376

@Dandandan

Is your feature request related to a problem or challenge?

If the build side of the join is large, building the hash table can become a significant bottleneck.
We can explore some opportunities to speed up building this map.

Describe the solution you'd like

Core Idea

The slowest part of building the hash map is the per-element work: looking up the slot for each item and then inserting it (hash + offset) into the map.
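For readers unfamiliar with the build phase, here is a minimal sketch of the per-row insert described above. It is an illustration only: `ChainedHashMap`, `build`, and `rows_for_hash` are hypothetical names, and the standard library `HashMap` stands in for whatever table the real join implementation uses. It assumes row hashes are already computed, and it keys the map on the hash value alone, so a real probe would still need to compare the actual join keys.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a join hash table (hypothetical, for illustration).
/// `head` maps a hash to (1 + index of the most recently inserted row with that hash).
/// `next[i]` holds (1 + index of the previous row with the same hash), or 0 at the
/// end of the chain.
struct ChainedHashMap {
    head: HashMap<u64, u64>,
    next: Vec<u64>,
}

/// Build the table from precomputed row hashes of the build side.
fn build(hashes: &[u64]) -> ChainedHashMap {
    let mut head = HashMap::with_capacity(hashes.len());
    let mut next = vec![0u64; hashes.len()];
    for (row, &hash) in hashes.iter().enumerate() {
        // One find + insert per build-side row: the step the issue calls slowest.
        if let Some(prev) = head.insert(hash, row as u64 + 1) {
            // Link this row to the previous row that had the same hash.
            next[row] = prev;
        }
    }
    ChainedHashMap { head, next }
}

/// Walk the chain for one hash, returning build-side row indices (newest first).
fn rows_for_hash(table: &ChainedHashMap, hash: u64) -> Vec<usize> {
    let mut rows = Vec::new();
    let mut cur = table.head.get(&hash).copied().unwrap_or(0);
    while cur != 0 {
        let row = (cur - 1) as usize;
        rows.push(row);
        cur = table.next[row];
    }
    rows
}
```

In this sketch every build-side row pays for one random-access lookup into `head`, which is the cost this issue proposes to reduce; the `next` vector is the "chain" that links rows sharing a hash.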

We should be able to test the following:

If this doesn't cause any regressions, there are further opportunities to improve performance and simplify the join algorithm by using the sorted property to improve the "chain" data structure as well (I'll do some experiments on this later).

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request), performance (Make DataFusion faster)
