Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39617][CORE] Driver cores mult be a positive number fix #37016

Conversation

alexandrutofan
Copy link

What changes were proposed in this pull request?

Hello, according to this problem raised also by me, and highlighted by two other persons on Stack Overflow (this and this) and thanks to this feature we are currently unable to run Spark 3.2.x with Mesos in Cluster Mode.

From what I saw, it happends because of the MesosClusterDispatcher and its components which treat the driverCores variable as a Double and in the end it's passed to SparkSubmit as a double and this makes the check by SparkSubmitArguments to fail.

I'm proposing this quick fix because I think it's a quick & safe way to go with.

Why are the changes needed?

To be able to run Spark 3.2.x in Cluster Mode with Mesos

Does this PR introduce any user-facing change?

No

How was this patch tested?

It's a very small change, basically the Cast from String to Int is safe now.

Thank you!

@@ -253,7 +253,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
&& Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
error("Executor memory must be a positive number")
}
if (driverCores != null && Try(driverCores.toInt).getOrElse(-1) <= 0) {
if (driverCores != null && Try(driverCores.toDouble.toInt).getOrElse(-1) <= 0) {
error("Driver cores must be a positive number")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should be improved properly by pattern matching on the Try and printing the actual Failure.
Not just for driverCores but for all input arguments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to do that in general, but IMHO, in this case, the message is very easy to understand, and also I'd rather keep it consistent with the other checks..

@srowen
Copy link
Member

srowen commented Jun 28, 2022

Please file a JIRA and update the PR per https://spark.apache.org/contributing.html
Can we not just fix the Mesos component itself? this is hacky but not terrible.

Mesos support is probably going away in Spark 4, note.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@VeridionRO
Copy link

@srowen the issue is created and posted in the initial post: https://issues.apache.org/jira/browse/SPARK-39617

@srowen
Copy link
Member

srowen commented Jun 29, 2022

This needs to be connected by putting [SPARK-39617] in the title along with component -- see the contributing guide

@srowen
Copy link
Member

srowen commented Jun 29, 2022

Also, you opened this vs 3.2, but needs to be vs master.

@alexandrutofan alexandrutofan changed the base branch from branch-3.2 to master June 29, 2022 14:14
@alexandrutofan alexandrutofan changed the base branch from master to branch-3.2 June 29, 2022 14:15
@alexandrutofan alexandrutofan changed the base branch from branch-3.2 to master June 29, 2022 14:22
@alexandrutofan alexandrutofan force-pushed the bugfix/driverCores-string-to-int branch from 2a42f65 to 0ebc806 Compare June 29, 2022 14:25
@alexandrutofan alexandrutofan changed the title Driver cores mult be a positive number fix [SPARK-39617][CORE] Driver cores mult be a positive number fix Jun 29, 2022
@LuciferYang
Copy link
Contributor

@alexandrutofan can u re-trigger GitHub Actions, a UT failed

@LuciferYang
Copy link
Contributor

It is possible to add a new UT to mesos module?

@alexandrutofan
Copy link
Author

@LuciferYang I had a look on the Mesos side but unfortunately I don't have the time to do that

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 16, 2022
@github-actions github-actions bot closed this Oct 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants