-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE #33429
Conversation
cc @cloud-fan and @maryannxue can you take a look please? |
cc @ulysses-you too FYI |
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM (from my side). I agree with the reason of renaming mentioned by @HyukjinKwon .
let me rebase. seems like it couldn't detect my GitHub actions job. |
Kubernetes integration test starting |
Test build #141300 has finished for PR 33429 at commit
|
retest this please |
Kubernetes integration test status success |
@@ -139,21 +139,21 @@ class AdaptiveQueryExecSuite | |||
} | |||
} | |||
|
|||
private def checkNumLocalShuffleReaders( | |||
plan: SparkPlan, numShufflesWithoutLocalReader: Int = 0): Unit = { | |||
private def checkNumLocalShuffleReads( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"num readers" seems more natural?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or "num shuffles with local read", but it's longer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. The number of local shuffle reads that refers the number of (local) AQEShuffleReadExec
, the number of AQE read plans.
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
BTW, looks like GA changed something .. all tests fail apparently because of OOM .. |
Test build #141297 has finished for PR 33429 at commit
|
Kubernetes integration test status success |
Test build #141304 has finished for PR 33429 at commit
|
(rebased to resolve conflicts) |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141335 has finished for PR 33429 at commit
|
I'll merge in few days if there are no more comments. |
I am fine with "Rename OptimizeLocalShuffleReader rule to OptimizeShuffleWithLocalRead" But can still use "reader" for the physical node name, i.e., |
hmm I feel like that's inconsistent with the current SparkPlan naming rules. For example, there's no |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141555 has finished for PR 33429 at commit
|
Will merge tomorrow if there's no more comment @maryannxue |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #141636 has finished for PR 33429 at commit
|
thanks, merging to master/3.2! (otherwise backporting AQE changes will be very hard) |
…eReader in AQE ### What changes were proposed in this pull request? This PR proposes to rename: - Rename `*Reader`/`*reader` to `*Read`/`*read` for rules and execution plan (user-facing doc/config name remain untouched) - `*ShuffleReaderExec` ->`*ShuffleReadExec` - `isLocalReader` -> `isLocalRead` - ... - Rename `CustomShuffle*` prefix to `AQEShuffle*` - Rename `OptimizeLocalShuffleReader` rule to `OptimizeShuffleWithLocalRead` ### Why are the changes needed? There are multiple problems in the current naming: - `CustomShuffle*` -> `AQEShuffle*` it sounds like it is a pluggable API. However, this is actually only used by AQE. - `OptimizeLocalShuffleReader` -> `OptimizeShuffleWithLocalRead` it is the name of a rule but it can be misread as a reader, which is counterintuative - `*ReaderExec` -> `*ReadExec` Reader execution reads a bit odd. It should better be read execution (like `ScanExec`, `ProjectExec` and `FilterExec`). I can't find the reason to name it with something that performs an action. See also the generated plans: Before: ``` ... * HashAggregate (12) +- CustomShuffleReader (11) +- ShuffleQueryStage (10) +- Exchange (9) ... ``` After: ``` ... * HashAggregate (12) +- AQEShuffleRead (11) +- ShuffleQueryStage (10) +- Exchange (9) .. ``` ### Does this PR introduce _any_ user-facing change? No, internal refactoring. ### How was this patch tested? Existing unittests should cover the changes. Closes #33429 from HyukjinKwon/SPARK-36217. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 6e3d404) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks Wenchen! |
What changes were proposed in this pull request?
This PR proposes to rename:
*Reader
/*reader
to*Read
/*read
for rules and execution plan (user-facing doc/config name remain untouched)*ShuffleReaderExec
->*ShuffleReadExec
isLocalReader
->isLocalRead
CustomShuffle*
prefix toAQEShuffle*
OptimizeLocalShuffleReader
rule toOptimizeShuffleWithLocalRead
Why are the changes needed?
There are multiple problems in the current naming:
CustomShuffle*
->AQEShuffle*
it sounds like it is a pluggable API. However, this is actually only used by AQE.
OptimizeLocalShuffleReader
->OptimizeShuffleWithLocalRead
it is the name of a rule but it can be misread as a reader, which is counterintuative
*ReaderExec
->*ReadExec
Reader execution reads a bit odd. It should better be read execution (like
ScanExec
,ProjectExec
andFilterExec
). I can't find the reason to name it with something that performs an action. See also the generated plans:Before:
After:
Does this PR introduce any user-facing change?
No, internal refactoring.
How was this patch tested?
Existing unittests should cover the changes.