Merge hash table implementations and remove leftover utilities #7366
I think if this is the last usage, we should remove it from Cargo.toml as well.
I wonder whether you're seeing performance improvements in symmetric / streaming hash join with this PR?
I created a small benchmark for streaming using TPC-H data. The first query is:
SELECT
o_orderkey
FROM
orders,
lineitem
WHERE
o_orderdate = l_shipdate
AND l_orderkey >= o_orderkey - 10
AND l_orderkey < o_orderkey + 10
AND l_returnflag = 'R'
and the second one is:
SELECT
o_orderkey
FROM
orders,
lineitem
WHERE
o_orderstatus = l_linestatus
AND l_orderkey >= o_orderkey - 10
AND l_orderkey < o_orderkey + 10
AND l_returnflag = 'R'
LIMIT 10000;
The second query involves key pairs with low cardinality. While
Co-authored-by: Daniël Heres <danielheres@gmail.com>
ozankabak left a comment
This is good to go from my perspective. What do you think, @Dandandan?
I will go ahead and merge this PR after CI passes. We will file a follow-on PR in case any other suggestions come in after the merge.
Thanks @metesynnada @ozankabak 🙏
Looks great, thank you
Which issue does this PR close?
Continues #6679.
Rationale for this change
The current implementation of the JoinHashMap and SymmetricJoinHashMap types could benefit from being more generic and flexible. Specifically, it would be advantageous to support different types of list data structures for chaining and to handle resizing in a more idiomatic and efficient manner. This PR introduces the JoinHashMapType trait and implements it for both JoinHashMap and PruningJoinHashMap, which allows for more code reuse and a clearer separation of concerns.
In this PR, several unused hash join utilities are removed. Also, we can introduce a vectorized implementation of SymmetricHashJoin that includes hash collision checks.
What changes are included in this PR?
- Added the JoinHashMapType trait with methods for handling the mutable map and mutable list, as well as a method as_any_mut for dynamic downcasting.
- Implemented the JoinHashMapType trait for both JoinHashMap and PruningJoinHashMap.
- Changed the update_hash function to use the JoinHashMapType trait and only resize the list in the case of PruningJoinHashMap.
- Changed the build_equal_condition_join_indices function to use the JoinHashMapType trait and introduced an offset parameter.
I have removed smallvec completely from the code, but I am unsure whether or not to remove it from Cargo.toml.
Are these changes tested?
Yes, the changes are covered by the existing tests. No new tests were required as the new implementation preserves the existing functionality. All tests passed successfully after the changes were applied.
Are there any user-facing changes?
No, the changes made in this PR are internal and do not affect the public API or the functionality of the crate.
cc @Dandandan
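For readers who want a picture of the design listed under "What changes are included in this PR?", here is a minimal sketch of a chained hash map behind a shared trait. It is not the code in this PR: the names ChainedMap, FixedMap, GrowableMap, parts and rows_for are hypothetical, std::collections::HashMap stands in for the hash table used in the real implementation, and hash collisions are ignored (the real join compares the key values as well). Only the idea that one list type is pre-sized while the other is resized on each update is taken from the description above.

```rust
use std::collections::{HashMap, VecDeque};
use std::ops::IndexMut;

/// Hypothetical, simplified stand-in for the trait described above.
trait ChainedMap {
    /// List type that stores the "next row" link of every row.
    type Next: IndexMut<usize, Output = u64>;
    /// Grow the list by `len` zeroed slots (a no-op for pre-sized lists).
    fn extend_zero(&mut self, len: usize);
    /// Borrow the map and the chain list mutably at the same time.
    fn get_mut(&mut self) -> (&mut HashMap<u64, u64>, &mut Self::Next);
    /// Borrow the map and the chain list immutably.
    fn parts(&self) -> (&HashMap<u64, u64>, &Self::Next);
}

/// Pre-sized variant: the list is allocated once and never resized.
struct FixedMap {
    map: HashMap<u64, u64>,
    next: Vec<u64>,
}

impl FixedMap {
    fn with_capacity(rows: usize) -> Self {
        Self { map: HashMap::new(), next: vec![0; rows] }
    }
}

/// Growable variant: the list is extended on every update.
#[derive(Default)]
struct GrowableMap {
    map: HashMap<u64, u64>,
    next: VecDeque<u64>,
}

impl ChainedMap for FixedMap {
    type Next = Vec<u64>;
    fn extend_zero(&mut self, _len: usize) {}
    fn get_mut(&mut self) -> (&mut HashMap<u64, u64>, &mut Vec<u64>) {
        (&mut self.map, &mut self.next)
    }
    fn parts(&self) -> (&HashMap<u64, u64>, &Vec<u64>) {
        (&self.map, &self.next)
    }
}

impl ChainedMap for GrowableMap {
    type Next = VecDeque<u64>;
    fn extend_zero(&mut self, len: usize) {
        let new_len = self.next.len() + len;
        self.next.resize(new_len, 0);
    }
    fn get_mut(&mut self) -> (&mut HashMap<u64, u64>, &mut VecDeque<u64>) {
        (&mut self.map, &mut self.next)
    }
    fn parts(&self) -> (&HashMap<u64, u64>, &VecDeque<u64>) {
        (&self.map, &self.next)
    }
}

/// Insert one batch of row hashes. Rows are stored 1-based so that 0 can
/// mark the end of a chain; `offset` is the index of the batch's first row.
fn update_hash<M: ChainedMap>(m: &mut M, hashes: &[u64], offset: usize) {
    m.extend_zero(hashes.len());
    let (map, next) = m.get_mut();
    for (i, &hash) in hashes.iter().enumerate() {
        let row = offset + i;
        // The new row becomes the chain head; the old head becomes its link.
        let prev = map.insert(hash, (row + 1) as u64).unwrap_or(0);
        next[row] = prev;
    }
}

/// Collect every row stored under `hash`, newest first.
fn rows_for<M: ChainedMap>(m: &M, hash: u64) -> Vec<usize> {
    let (map, next) = m.parts();
    let mut out = Vec::new();
    let mut cur = map.get(&hash).copied().unwrap_or(0);
    while cur != 0 {
        let row = (cur - 1) as usize;
        out.push(row);
        cur = next[row];
    }
    out
}
```

Because update_hash and rows_for are generic over the trait, the same code serves both variants, and only the growable one pays for resizing. The offset argument lets a later batch continue the row numbering of an earlier one.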