-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-4266][VL] Support cross join type with Merge join and Hash join #4281
[GLUTEN-4266][VL] Support cross join type with Merge join and Hash join #4281
Conversation
Run Gluten Clickhouse CI |
80d7e25
to
2e42e86
Compare
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, queries like select * from TBL1 cross join TBL2 on TBL1.c1 == TBL2.c1 falls back to Spark.
Thanks for supporting this case. Shall we add a unit test maybe in TestOperator and check the query plan to make sure it won't fallback?
JoinRel.JoinType.UNRECOGNIZED | ||
} | ||
override protected lazy val substraitJoinType: JoinRel.JoinType = | ||
SubstraitUtil.toSubstrait(joinType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We exchange tables sometimes to adjust the build side of suffled hash join for performance optimization. For BHJ and SMJ, we may need to keep the original left and right tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted this change. Instead changed Inner
to InnerLike
which will work for both Inner
& Cross
joins.
2e42e86
to
864bebd
Compare
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Added test as suggested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
What changes were proposed in this pull request?
Added cross join type in hash and merge join:
Cross join type will only get converted to hash or merge join when there are equi join conditions. Cross join with equi join conditions is actually inner join only. Ideally, users should not even write such queries.
Currently, queries like
select * from TBL1 cross join TBL2 on TBL1.c1 == TBL2.c1
falls back to Spark. which can easily be supported by Gluten by converting cross join substrait inner join type.(Fixes: #4266)
How was this patch tested?
Tested using dummy unit tests