[SPARK-41159][SQL] Optimize like any and like all expressions #38672
Closed
wankunde wants to merge 7 commits into apache:master from
339217c to 351a584
Can one of the admins verify this patch?
wankunde (author): Hi @beliefer @cloud-fan @wangyum, could you help review this PR? Thanks!
beliefer reviewed on Dec 7, 2022:
LikeSimplification already performs a similar optimization. Why is this class needed?
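The reviewer's point is that LikeSimplification already rewrites simple LIKE patterns into string predicates at planning time. As a rough model of that kind of rewrite applied to LIKE ANY, the Scala sketch below turns "abc", "abc%", "%abc" and "%abc%" into equality, prefix, suffix and substring predicates, ORs them together, and leaves every other pattern in a regex-based fallback. It assumes a toy expression tree: StartsWith, EndsWith, Contains, EqualTo, Or, RegexLikeAny and MultiLikeRewrite are hypothetical stand-ins, not Catalyst classes, and patterns such as "abc%def" are simply sent to the fallback here.

```scala
// Toy expression tree for illustration only; these are NOT Catalyst's classes.
sealed trait Expr
final case class StartsWith(prefix: String) extends Expr
final case class EndsWith(suffix: String) extends Expr
final case class Contains(infix: String) extends Expr
final case class EqualTo(value: String) extends Expr
// Patterns that could not be simplified stay in a regex-based LIKE ANY.
final case class RegexLikeAny(patterns: Seq[String]) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object MultiLikeRewrite {
  // True if the string has no LIKE wildcard ('%', '_') and no escape char ('\').
  private def isLiteral(s: String): Boolean =
    !s.exists(c => c == '%' || c == '_' || c == '\\')

  // Rewrite "abc", "abc%", "%abc" and "%abc%" into string predicates; None otherwise.
  def simplify(pattern: String): Option[Expr] = {
    val hasLead = pattern.startsWith("%")
    val rest = if (hasLead) pattern.drop(1) else pattern
    val hasTrail = rest.endsWith("%")
    val core = if (hasTrail) rest.dropRight(1) else rest
    if (!isLiteral(core)) None
    else Some((hasLead, hasTrail) match {
      case (false, false) => EqualTo(core)
      case (false, true)  => StartsWith(core)
      case (true, false)  => EndsWith(core)
      case (true, true)   => Contains(core)
    })
  }

  // LIKE ANY(p1, ..., pn) becomes simple_1 OR ... OR RegexLikeAny(remaining patterns).
  def rewriteLikeAny(patterns: Seq[String]): Option[Expr] = {
    val parsed = patterns.map(p => (p, simplify(p)))
    val simple: Seq[Expr] = parsed.collect { case (_, Some(e)) => e }
    val remaining = parsed.collect { case (p, None) => p }
    val fallback: Seq[Expr] = if (remaining.isEmpty) Nil else Seq(RegexLikeAny(remaining))
    (simple ++ fallback).reduceLeftOption[Expr](Or(_, _))
  }
}
```

With this shape the string predicates are chosen once at planning time, and only the leftover patterns still pay the regex cost per row.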
beliefer reviewed on Dec 7, 2022:
The benchmark does not demonstrate a performance improvement. Could you run it with and without MatchMultiHelper?
wankunde (author):
Before this PR:
[info] Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
[info] Multi like query:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Query with multi like                              1393           1469         119          0.0     1392586.7       1.0X
[info] Query with LikeAny simplification                  1244           1309          97          0.0     1244382.5       1.1X
[info] Query without LikeAny simplification                400            407           8          0.0      399924.3       3.5X

[info] Multi like query:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Query with multi like                              1476           1576         149          0.0     1475710.1       1.0X
[info] Query with LikeAny simplification                  1387           1429          37          0.0     1386669.1       1.1X
[info] Query without LikeAny simplification                430            470          35          0.0      430435.8       3.4X
After this PR:
[info] Multi like query:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Query with multi like                              1441           1516          78          0.0     1441335.8       1.0X
[info] Query with LikeAny simplification                  1401           1431          44          0.0     1400743.9       1.0X
[info] Query without LikeAny simplification                357            369          10          0.0      357419.8       4.0X

[info] Multi like query:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Query with multi like                              1524           1628         117          0.0     1524119.6       1.0X
[info] Query with LikeAny simplification                  1405           1418          18          0.0     1405258.7       1.1X
[info] Query without LikeAny simplification                362            372          12          0.0      361654.4       4.2X
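As a sanity check when reading these tables: Spark's benchmark harness prints Relative as the first case's best time divided by each case's best time, formatted with one decimal place. The Scala sketch below recomputes that column from the Best Time(ms) values above; RelativeSpeed is a hypothetical helper, and it uses the rounded millisecond values from the tables rather than the harness's unrounded ones.

```scala
import java.util.Locale

object RelativeSpeed {
  // Best Time(ms) per case, in table order, copied from the benchmark output:
  // (multi like, with LikeAny simplification, without LikeAny simplification).
  val before: Seq[Seq[Double]] = Seq(Seq(1393.0, 1244.0, 400.0), Seq(1476.0, 1387.0, 430.0))
  val after: Seq[Seq[Double]] = Seq(Seq(1441.0, 1401.0, 357.0), Seq(1524.0, 1405.0, 362.0))

  // Relative = best time of the first case / best time of this case,
  // printed with one decimal place and an "X" suffix (locale-independent).
  def relativeColumn(bestMs: Seq[Double]): Seq[String] =
    bestMs.map(ms => "%.1fX".formatLocal(Locale.ROOT, bestMs.head / ms))
}

// Recompute the Relative column for each of the four tables.
for ((label, tables) <- Seq("before" -> RelativeSpeed.before, "after" -> RelativeSpeed.after);
     table <- tables)
  println(s"$label: ${RelativeSpeed.relativeColumn(table).mkString(", ")}")
```

Running it prints 1.0X, 1.1X, 3.5X and 1.0X, 1.1X, 3.4X for the two "before" tables, and 1.0X, 1.0X, 4.0X and 1.0X, 1.1X, 4.2X for the two "after" tables, which matches the Relative column as reported.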
What changes were proposed in this pull request?
Optimize LIKE ANY and LIKE ALL expressions by matching simple patterns with startsWith, endsWith, contains and equality checks instead of regular expressions.
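A minimal runtime sketch of this idea, assuming non-null literal patterns and '\' as the escape character: each pattern is compiled once into a String => Boolean matcher that uses ==, startsWith, endsWith or contains when the pattern is a literal with an optional leading and/or trailing '%', and falls back to java.util.regex otherwise. MultiLikeMatcher, compile and PatternSet are hypothetical names; this is not the PR's MatchMultiHelper, and it ignores Spark's null semantics for LIKE ANY/ALL.

```scala
import java.util.regex.Pattern

object MultiLikeMatcher {
  // Translate a SQL LIKE pattern into a Java regex meant for Matcher.matches(),
  // which requires the whole string to match. '\' escapes the next character;
  // a trailing lone '\' is treated as a literal backslash in this sketch.
  def likeToRegex(pattern: String): String = {
    val sb = new StringBuilder("(?s)") // (?s): '.' also matches line terminators
    var i = 0
    while (i < pattern.length) {
      pattern.charAt(i) match {
        case '\\' if i + 1 < pattern.length =>
          sb.append(Pattern.quote(pattern.charAt(i + 1).toString))
          i += 1 // skip the escaped character
        case '%' => sb.append(".*")
        case '_' => sb.append(".")
        case c   => sb.append(Pattern.quote(c.toString))
      }
      i += 1
    }
    sb.toString
  }

  // Compile one pattern into a matcher, preferring plain string methods.
  def compile(pattern: String): String => Boolean = {
    val hasLead = pattern.startsWith("%")
    val rest = if (hasLead) pattern.drop(1) else pattern
    val hasTrail = rest.endsWith("%")
    val core = if (hasTrail) rest.dropRight(1) else rest
    val literal = !core.exists(c => c == '%' || c == '_' || c == '\\')
    if (literal) (hasLead, hasTrail) match {
      case (false, false) => s => s == core
      case (false, true)  => s => s.startsWith(core)
      case (true, false)  => s => s.endsWith(core)
      case (true, true)   => s => s.contains(core)
    } else {
      val regex = Pattern.compile(likeToRegex(pattern))
      s => regex.matcher(s).matches()
    }
  }

  // Patterns are compiled once and reused for every row.
  final class PatternSet(patterns: Seq[String]) {
    private val matchers = patterns.map(compile)
    def likeAny(s: String): Boolean = matchers.exists(m => m(s))
    def likeAll(s: String): Boolean = matchers.forall(m => m(s))
  }
}

val patterns = new MultiLikeMatcher.PatternSet(Seq("abc%", "%xyz", "a_c"))
println(patterns.likeAny("abcdef")) // true: "abc%" is checked with startsWith
println(patterns.likeAll("abcdef")) // false: "%xyz" does not match
```

Compiling per expression instead of per row keeps the regex engine out of the simple cases, and exists/forall stop at the first hit or miss respectively.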
Why are the changes needed?
Currently, LIKE ANY and LIKE ALL expressions are very slow whether the LikeSimplification rule is enabled or disabled.
Refer to org.apache.spark.sql.execution.benchmark.LikeAnyBenchmark.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.