[SPARK-42813][K8S] Print application info when waitAppCompletion is false #40444
Conversation
cc @slothspot @dongjoon-hyun @yaooqinn, please take a look when you get time, thanks.
```diff
-  override def watchOrStop(sId: String): Boolean = if (conf.get(WAIT_FOR_APP_COMPLETION)) {
-    logInfo(s"Waiting for application ${conf.appName} with submission ID $sId to finish...")
+  override def watchOrStop(sId: String): Boolean = {
+    logInfo(s"Waiting for application ${conf.appId} with submission ID $sId to finish...")
```
This part is SPARK-24266 instead of SPARK-35174. This change looks risky to me.
Sorry, I don't get your point, would you please elaborate more about the risk?
IMO, this does not break SPARK-24266.
In the whole codebase, excluding test code, only one place invokes `watchOrStop`:
Lines 183 to 205 in 8860f69
```scala
if (conf.get(WAIT_FOR_APP_COMPLETION)) {
  val sId = Seq(conf.namespace, driverPodName).mkString(":")
  breakable {
    while (true) {
      val podWithName = kubernetesClient
        .pods()
        .inNamespace(conf.namespace)
        .withName(driverPodName)
      // Reset resource to old before we start the watch, this is important for race conditions
      watcher.reset()
      watch = podWithName.watch(watcher)
      // Send the latest pod state we know to the watcher to make sure we didn't miss anything
      watcher.eventReceived(Action.MODIFIED, podWithName.get())
      // Break the while loop if the pod is completed or we don't want to wait
      if (watcher.watchOrStop(sId)) {
        watch.close()
        break
      }
    }
  }
}
```
I claim it is related to SPARK-35174 because, after SPARK-35174, `conf.get(WAIT_FOR_APP_COMPLETION)` is always `true` at this call site, so the change is effectively:

```diff
- if (true) {
-   logic1
- } else {
-   logic2
- }
+ logic1
```
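To make the dead-branch argument concrete, here is a minimal standalone sketch (all names are illustrative stand-ins, not the real Spark classes): the caller only invokes `watchOrStop` after checking the flag itself, so the `else` branch inside `watchOrStop` can never run.

```scala
// Illustrative sketch only: names are stand-ins, not the real Spark code.
object DeadBranchSketch {
  // Simplified watchOrStop: returns true when the caller should stop watching.
  def watchOrStop(waitForAppCompletion: Boolean, podCompleted: Boolean): Boolean =
    if (waitForAppCompletion) {
      podCompleted // keep watching until the driver pod completes
    } else {
      true // unreachable via runApp below: the caller already checked the flag
    }

  // Mirrors the guarded call site shown above: the watch loop only starts
  // when the flag is true, so watchOrStop never sees it as false.
  def runApp(waitForAppCompletion: Boolean, podCompleted: Boolean): Option[Boolean] =
    if (waitForAppCompletion) Some(watchOrStop(waitForAppCompletion, podCompleted))
    else None
}
```

Under this structure, inlining `true` for the flag inside `watchOrStop` (the `if (true) logic1 else logic2` simplification above) does not change observable behavior.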
Please let me know if I misunderstand your comment.
Got it. Now, I understand what you meant.
Makes sense to me.
Do you think you can add a test case to prevent a future regression like before, @pan3793 ?
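Not the actual test added to the PR, but a hedged sketch of what such a regression test could assert, using plain assertions (the real suite uses Spark's own test harness, and all names here are hypothetical):

```scala
// Hypothetical regression-test sketch: verify the watcher stops immediately
// and the emitted message carries the application info when
// waitAppCompletion is false. Names and messages are illustrative.
object WatchOrStopTestSketch {
  // Minimal stand-in returning (shouldStop, loggedMessage).
  def watchOrStop(appId: String, appName: String, sId: String,
      waitForAppCompletion: Boolean): (Boolean, String) =
    if (waitForAppCompletion) {
      (false, s"Waiting for application $appName with application ID $appId " +
        s"and submission ID $sId to finish...")
    } else {
      (true, s"Deployed Spark application $appName with application ID $appId " +
        s"and submission ID $sId into Kubernetes")
    }

  def check(): Unit = {
    val (stop, msg) =
      watchOrStop("spark-app-1", "pi", "default:pi-driver", waitForAppCompletion = false)
    assert(stop, "should stop watching immediately when waitAppCompletion is false")
    assert(msg.contains("spark-app-1"), "message should contain the application ID")
    assert(msg.contains("pi"), "message should contain the application name")
  }
}
```

A test shaped like this would have caught the SPARK-35174 regression, where the "deployed" branch silently became unreachable.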
@dongjoon-hyun UT is added, please take a look again, thanks.
Kindly ping @dongjoon-hyun
I took a look at this PR once more, but I'm a little negative about one part of the proposal, which is to remove `appName`, because `appName` has been used since Apache Spark 3.0.0 (SPARK-28947). Could you keep `appName` in the existing way and add `appId` additionally if you want, @pan3793?
@dongjoon-hyun thank you, I updated the log message accordingly.
+1, LGTM. (Pending CIs).
Thank you, @dongjoon-hyun and @yaooqinn
What changes were proposed in this pull request?

On K8s cluster mode, when `spark.kubernetes.submission.waitAppCompletion=false`:

- print the application information on `spark-submit` exit, as it did before SPARK-35174
- add `appId` in the output message

Why are the changes needed?
On K8s cluster mode, when `spark.kubernetes.submission.waitAppCompletion=false`, before SPARK-35174 `spark-submit` exited quickly with the basic application information. After SPARK-35174, that part of the code is unreachable, so nothing is printed.

This PR also proposes to add `appId` in the output message, to make it consistent with the surrounding context (if you look at `LoggingPodStatusWatcherImpl`, this message is kind of an exception; `... application $appId ...` is used in the other places) and with YARN:

spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
Lines 1311 to 1318 in 8860f69
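For illustration only (this config fragment is not from the PR), a submission that exercises this code path could look like the following; the master URL, container image, and example jar path are placeholders:

```shell
# Placeholder values throughout: adjust master URL, image, and jar path.
spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=apache/spark:3.4.0 \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  local:///opt/spark/examples/jars/spark-examples.jar
```

With `waitAppCompletion=false`, `spark-submit` returns right after the driver pod is created instead of streaming the driver pod status until completion, which is why the one-shot application info message matters.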
Does this PR introduce any user-facing change?

Yes:

- when `spark.kubernetes.submission.waitAppCompletion=false`, the user can see the app information when `spark-submit` exits
- the `spark-submit` output contains the app id now, which is consistent with the context and other resource managers like YARN

How was this patch tested?
Pass CI.