
[GLUTEN-4266][VL] Support cross join type with Merge join and Hash join #4281

Merged (2 commits) on Jan 9, 2024


Surbhi-Vijay (Contributor)

What changes were proposed in this pull request?

Added support for the cross join type in hash join and merge join.

A cross join is converted to a hash join or merge join only when it has equi-join conditions. A cross join with equi-join conditions is semantically just an inner join; ideally, users should not write such queries at all.

Currently, queries like `select * from TBL1 cross join TBL2 on TBL1.c1 == TBL2.c1` fall back to Spark. Gluten can easily support them by converting the cross join to the Substrait inner join type.
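To illustrate why a cross join with an equi-join condition is just an inner join, the toy sketch below compares a cross product filtered on the key against a simple hash inner join over in-memory rows. The tables, values, and the `hashInnerJoin` helper are made up for illustration; this is not how Gluten or Velox actually execute joins.

```scala
// Rows are (c1, payload) pairs; the data is invented for illustration.
val tbl1 = Seq((1, "a"), (2, "b"), (3, "c"))
val tbl2 = Seq((2, "x"), (3, "y"), (3, "z"), (4, "w"))

// Cross join (every pair of rows) followed by the equi condition TBL1.c1 == TBL2.c1.
val crossThenFilter: Seq[((Int, String), (Int, String))] =
  for {
    l <- tbl1
    r <- tbl2
    if l._1 == r._1
  } yield (l, r)

// Hypothetical hash inner join: build a hash table on one side, probe it with the other.
def hashInnerJoin[K, A, B](probe: Seq[(K, A)], build: Seq[(K, B)]): Seq[((K, A), (K, B))] = {
  val table: Map[K, Seq[(K, B)]] = build.groupBy(_._1)
  for {
    l <- probe
    r <- table.getOrElse(l._1, Seq.empty[(K, B)])
  } yield (l, r)
}

val viaHashJoin = hashInnerJoin(tbl1, tbl2)
println(viaHashJoin)
```

Both produce the same matching pairs, which is why the planner can hand such a query to an inner hash join (or, equivalently, a sort-merge join on the same keys).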

(Fixes: #4266)

How was this patch tested?

Tested using dummy unit tests


github-actions bot commented Jan 4, 2024

#4266


github-actions bot commented Jan 4, 2024

Run Gluten Clickhouse CI

Surbhi-Vijay force-pushed the SupportCrossJoinWithHashMergeJoin branch from 80d7e25 to 2e42e86 on January 4, 2024 13:05

rui-mo (Contributor) left a comment


> Currently, queries like `select * from TBL1 cross join TBL2 on TBL1.c1 == TBL2.c1` fall back to Spark.

Thanks for supporting this case. Shall we add a unit test, perhaps in TestOperator, that checks the query plan to make sure the query doesn't fall back?
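A plan check like the one requested could look roughly like the sketch below. It is a self-contained toy: `PlanNode`, `ScanTransformer`, `HashJoinTransformer`, and `VanillaSparkNode` are hypothetical stand-ins, not Gluten's real operator classes, and a real Gluten test would inspect the executed Spark plan instead of a hand-built tree.

```scala
// Hypothetical, simplified plan model; class names are illustrative only.
sealed trait PlanNode { def children: Seq[PlanNode] }
final case class ScanTransformer(table: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class HashJoinTransformer(left: PlanNode, right: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(left, right)
}
// Any operator that stayed on vanilla Spark, i.e. a fallback.
final case class VanillaSparkNode(name: String, children: Seq[PlanNode]) extends PlanNode

// Pre-order traversal of the whole plan tree.
def collectNodes(plan: PlanNode): Seq[PlanNode] =
  plan +: plan.children.flatMap(collectNodes)

def hasFallback(plan: PlanNode): Boolean =
  collectNodes(plan).exists(_.isInstanceOf[VanillaSparkNode])

def containsNativeHashJoin(plan: PlanNode): Boolean =
  collectNodes(plan).exists(_.isInstanceOf[HashJoinTransformer])

// Shape the test would expect after the change: the join is offloaded.
val offloaded = HashJoinTransformer(ScanTransformer("TBL1"), ScanTransformer("TBL2"))
// Shape before the change: the join operator stays in Spark.
val fellBack = VanillaSparkNode("SparkJoin", Seq(ScanTransformer("TBL1"), ScanTransformer("TBL2")))

println(s"offloaded plan falls back: ${hasFallback(offloaded)}")
```

The key design point is asserting both that a native join node is present and that no vanilla node remains, so the test fails if only part of the plan is offloaded.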

```scala
JoinRel.JoinType.UNRECOGNIZED
}
override protected lazy val substraitJoinType: JoinRel.JoinType =
  SubstraitUtil.toSubstrait(joinType)
```
Contributor


We sometimes swap the tables to adjust the build side of a shuffled hash join for performance. For BHJ (broadcast hash join) and SMJ (sort-merge join), we may need to keep the original left and right tables.
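The build-side concern can be sketched as follows. This is a hypothetical policy, not Gluten's code: `Side`, `chooseSides`, and `restoreOrder` are invented names. In this sketch only the shuffled hash join may exchange its inputs (to build on the smaller side), while BHJ and SMJ keep the original left/right order; when inputs are exchanged, each output row is reordered to restore the original (left, right) layout.

```scala
// Hypothetical model: each join input has a name and an estimated size in bytes.
final case class Side(name: String, estimatedBytes: Long)

sealed trait JoinKind
case object ShuffledHashJoin extends JoinKind
case object BroadcastHashJoin extends JoinKind
case object SortMergeJoin extends JoinKind

// Returns (streamed, build, swapped). In this sketch the right input builds by default;
// for SMJ the tuple simply records the original order.
// Only a shuffled hash join may exchange inputs, so that it builds on the smaller side.
def chooseSides(kind: JoinKind, left: Side, right: Side): (Side, Side, Boolean) =
  kind match {
    case ShuffledHashJoin if left.estimatedBytes < right.estimatedBytes =>
      (right, left, true)
    case _ =>
      (left, right, false)
  }

// If the inputs were exchanged, each joined row comes out as (right, left);
// swap it back so downstream operators still see (left, right).
def restoreOrder[A](row: (A, A), swapped: Boolean): (A, A) =
  if (swapped) row.swap else row

val small = Side("TBL1", 10L)
val large = Side("TBL2", 1000L)
println(chooseSides(ShuffledHashJoin, small, large))
println(chooseSides(BroadcastHashJoin, small, large))
```

Keeping BHJ and SMJ untouched avoids having to restore column order for joins whose sides were never meant to move.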

Contributor Author


Reverted this change. Instead, I changed `Inner` to `InnerLike`, which matches both `Inner` and `Cross` joins.
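The `InnerLike` change relies on Scala pattern matching over a sealed hierarchy: in Spark's catalyst, `Inner` and `Cross` both extend `InnerLike`, so a single `case _: InnerLike` branch covers both. The toy sketch below redefines that hierarchy locally to stay self-contained; `SketchJoinRelType` and this `toSubstrait` are hypothetical stand-ins, not Gluten's `SubstraitUtil` or the real Substrait `JoinRel.JoinType`.

```scala
// Simplified local mirror of Spark's join-type hierarchy, where
// Inner and Cross both extend InnerLike.
sealed abstract class JoinType
sealed abstract class InnerLike extends JoinType
case object Inner extends InnerLike
case object Cross extends InnerLike
case object LeftOuter extends JoinType
case object RightOuter extends JoinType

// Hypothetical stand-in for Substrait's join type enum.
object SketchJoinRelType extends Enumeration {
  val JoinInner, JoinLeft, Unrecognized = Value
}

def toSubstrait(joinType: JoinType): SketchJoinRelType.Value = joinType match {
  case _: InnerLike => SketchJoinRelType.JoinInner // matches Inner and Cross
  case LeftOuter    => SketchJoinRelType.JoinLeft
  case _            => SketchJoinRelType.Unrecognized // not handled in this sketch
}

// A branch written as `case Inner =>` would have sent Cross to Unrecognized.
Seq(Inner, Cross, LeftOuter, RightOuter).foreach { jt =>
  println(s"$jt -> ${toSubstrait(jt)}")
}
```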

Surbhi-Vijay force-pushed the SupportCrossJoinWithHashMergeJoin branch from 2e42e86 to 864bebd on January 8, 2024 12:59

github-actions bot commented Jan 8, 2024

Run Gluten Clickhouse CI


Surbhi-Vijay (Contributor Author)

> Currently, queries like `select * from TBL1 cross join TBL2 on TBL1.c1 == TBL2.c1` fall back to Spark.
>
> Thanks for supporting this case. Shall we add a unit test, perhaps in TestOperator, that checks the query plan to make sure the query doesn't fall back?

Added a test as suggested.

rui-mo (Contributor) left a comment


Thanks.

rui-mo merged commit b08063f into apache:main on Jan 9, 2024
19 checks passed
GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_4281_time.csv | log/native_master_01_08_2024_b81ecdef1_time.csv | difference | percentage |
|---|---:|---:|---:|---:|
| q1 | 33.11 | 33.20 | 0.088 | 100.26% |
| q2 | 25.97 | 25.74 | -0.233 | 99.10% |
| q3 | 37.82 | 38.04 | 0.221 | 100.59% |
| q4 | 39.27 | 39.01 | -0.256 | 99.35% |
| q5 | 73.20 | 71.87 | -1.325 | 98.19% |
| q6 | 6.99 | 8.18 | 1.190 | 117.04% |
| q7 | 88.56 | 85.51 | -3.052 | 96.55% |
| q8 | 85.35 | 85.98 | 0.630 | 100.74% |
| q9 | 123.19 | 122.63 | -0.561 | 99.54% |
| q10 | 44.41 | 43.71 | -0.697 | 98.43% |
| q11 | 19.46 | 20.14 | 0.679 | 103.49% |
| q12 | 29.29 | 27.54 | -1.758 | 94.00% |
| q13 | 47.93 | 46.35 | -1.576 | 96.71% |
| q14 | 15.00 | 18.04 | 3.036 | 120.24% |
| q15 | 28.91 | 28.07 | -0.845 | 97.08% |
| q16 | 15.42 | 15.33 | -0.087 | 99.44% |
| q17 | 158.28 | 157.40 | -0.875 | 99.45% |
| q18 | 196.67 | 196.16 | -0.507 | 99.74% |
| q19 | 16.49 | 16.55 | 0.060 | 100.36% |
| q20 | 30.62 | 29.10 | -1.519 | 95.04% |
| q21 | 226.15 | 226.46 | 0.314 | 100.14% |
| q22 | 14.09 | 14.01 | -0.077 | 99.45% |
| total | 1356.16 | 1349.01 | -7.150 | 99.47% |
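The report does not say how the last two columns are computed. From the printed numbers they appear to be difference = master - PR and percentage = master / PR × 100, which would mean a percentage above 100% marks a query where the PR run was faster. The sketch below recomputes two rows under that inferred reading; the `Row` type is introduced here for illustration, and since the bot probably worked from unrounded times, recomputed values can differ in the last printed digit.

```scala
// Two rows copied from the report above (times as printed).
final case class Row(query: String, pr: Double, master: Double)

val rows = Seq(Row("q6", 6.99, 8.18), Row("total", 1356.16, 1349.01))

// Inferred from the numbers, not documented by the bot.
def difference(r: Row): Double = r.master - r.pr
def percentage(r: Row): Double = r.master / r.pr * 100.0

rows.foreach { r =>
  println(f"${r.query}%-6s diff=${difference(r)}%.3f pct=${percentage(r)}%.2f%%")
}
```

Under this reading, the 99.47% total indicates the PR run was slightly slower overall than the master baseline, which is consistent with the report's "for reference only" framing.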


Successfully merging this pull request may close these issues.

[VL] Support cross join type with Merge join and Hash join
3 participants