[SPARK-42813][K8S] Print application info when waitAppCompletion is false #40444

Closed
wants to merge 5 commits into master

Conversation

@pan3793 (Member) commented Mar 15, 2023

What changes were proposed in this pull request?

In K8s cluster mode:

  1. when spark.kubernetes.submission.waitAppCompletion=false, print the application information when spark-submit exits, as it did before SPARK-35174 (see the sketch after this list)

  2. add appId to the output message
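
Below is a minimal, illustrative sketch of the intended behavior only, not the actual patch: the standalone method stands in for LoggingPodStatusWatcherImpl.watchOrStop, and waitAppCompletion, appName, appId, and sId stand in for the real config values and parameters.

object WatchOrStopSketch {
  // Illustrative only: in fire-and-forget mode (waitAppCompletion = false),
  // log the application info once and return true so the caller stops
  // watching; otherwise keep waiting for the driver pod as before.
  def watchOrStop(
      waitAppCompletion: Boolean, appName: String, appId: String, sId: String): Boolean = {
    if (waitAppCompletion) {
      println(s"Waiting for application $appName with application ID $appId " +
        s"and submission ID $sId to finish...")
      false // keep watching until the driver pod completes
    } else {
      println(s"Deployed Spark application $appName with application ID $appId " +
        s"and submission ID $sId into Kubernetes")
      true // stop watching immediately
    }
  }
}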

Why are the changes needed?

In K8s cluster mode, when spark.kubernetes.submission.waitAppCompletion=false, before SPARK-35174, spark-submit would exit quickly with the basic application information:

logInfo(s"Deployed Spark application ${conf.appName} with submission ID $sId into Kubernetes")

After SPARK-35174, that part of the code is unreachable, so nothing is printed.

This PR also proposes to add appId to the output message, to make it consistent with the surrounding context (if you look at LoggingPodStatusWatcherImpl, this message is kind of an exception; ... application $appId ... is used in the other places) and with YARN:

if (!launcherBackend.isConnected() && fireAndForget) {
  val report = getApplicationReport
  val state = report.getYarnApplicationState
  logInfo(s"Application report for $appId (state: $state)")
  logInfo(formatReportDetails(report, getDriverLogsLink(report)))
  if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
    throw new SparkException(s"Application $appId finished with status: $state")
  }
}

Does this PR introduce any user-facing change?

Yes, the changes include:

  1. when spark.kubernetes.submission.waitAppCompletion=false, the user can see the application information when spark-submit exits.
  2. the tail of the spark-submit output now contains the app ID, which is consistent with the surrounding context and with other resource managers like YARN.

How was this patch tested?

Pass CI.

@pan3793 (Member, Author) commented Mar 15, 2023

cc @slothspot @dongjoon-hyun @yaooqinn, please take a look when you get time, thanks.

- override def watchOrStop(sId: String): Boolean = if (conf.get(WAIT_FOR_APP_COMPLETION)) {
-   logInfo(s"Waiting for application ${conf.appName} with submission ID $sId to finish...")
+ override def watchOrStop(sId: String): Boolean = {
+   logInfo(s"Waiting for application ${conf.appId} with submission ID $sId to finish...")
Member

This part is SPARK-24266 instead of SPARK-35174. This change looks risky to me.

Member Author

Sorry, I don't get your point. Would you please elaborate on the risk?

IMO, this does not break SPARK-24266.

In the whole codebase, excluding test code, only one place invokes watchOrStop:

if (conf.get(WAIT_FOR_APP_COMPLETION)) {
  val sId = Seq(conf.namespace, driverPodName).mkString(":")
  breakable {
    while (true) {
      val podWithName = kubernetesClient
        .pods()
        .inNamespace(conf.namespace)
        .withName(driverPodName)
      // Reset resource to old before we start the watch, this is important for race conditions
      watcher.reset()
      watch = podWithName.watch(watcher)
      // Send the latest pod state we know to the watcher to make sure we didn't miss anything
      watcher.eventReceived(Action.MODIFIED, podWithName.get())
      // Break the while loop if the pod is completed or we don't want to wait
      if (watcher.watchOrStop(sId)) {
        watch.close()
        break
      }
    }
  }
}

I claim it is related to SPARK-35174 because, after SPARK-35174, conf.get(WAIT_FOR_APP_COMPLETION) is always true at the point where watchOrStop is invoked, so the change is effectively:

- if (true) {
-   logic1
- } else {
-   logic2
- }
+ logic1

Please let me know if I misunderstand your comment.

Member

Got it. Now I understand what you meant.

@yaooqinn (Member) left a comment

Makes sense to me.

@dongjoon-hyun (Member) left a comment

Do you think you can add a test case to prevent a future regression like before, @pan3793 ?

@pan3793 (Member, Author) commented Mar 17, 2023

@dongjoon-hyun UT is added, please take a look again, thanks.

@pan3793 (Member, Author) commented Mar 21, 2023

Kindly ping @dongjoon-hyun

@dongjoon-hyun (Member) left a comment

I took a look at this PR once more, but I'm a little negative about one part of the proposal, which is to remove appName, because appName has been used since Apache Spark 3.0.0 (SPARK-28947). Could you keep appName in the existing way and additionally add appId if you want, @pan3793?

@pan3793 (Member, Author) commented Mar 21, 2023

@dongjoon-hyun thank you, I updated the log message to:

"Application $appName with application ID $appId and submission ID $sId finished"

@dongjoon-hyun (Member) left a comment

+1, LGTM. (Pending CIs).

@dongjoon-hyun (Member)

Merged to master branch for Apache Spark 3.5.0.
Thank you, @pan3793 and @yaooqinn .

@pan3793 (Member, Author) commented Mar 21, 2023

Thank you, @dongjoon-hyun and @yaooqinn
