
[WIP][SPARK-32899][CORE] Support submit application with user-defined cluster manager #29770

Closed
wants to merge 1 commit

Conversation

ConeyLiu
Contributor

What changes were proposed in this pull request?

Add the support to submit applications with user-defined cluster manager.

Why are the changes needed?

Users can already define a custom cluster manager via the ExternalClusterManager trait. However, an application using such a cluster manager cannot be submitted with SparkSubmit, nor can a user-defined master be set with pyspark. The reason is that SparkSubmit checks whether the master is one of the natively supported ones, while custom cluster managers are only checked in SparkContext. This patch fixes the problem.
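
For context, here is a simplified sketch (not the exact Spark code) of how SparkContext discovers a user-defined cluster manager today; SparkSubmit performs no equivalent lookup, which is the gap this patch addresses. Note that ExternalClusterManager is a private[spark] trait, so a real implementation must live under the org.apache.spark package; ClusterManagerLookup and findExternalClusterManager are made-up names for illustration.

import java.util.ServiceLoader

import org.apache.spark.scheduler.ExternalClusterManager

object ClusterManagerLookup {
  // Implementations are registered via META-INF/services and each one is
  // asked whether it can handle the given master URL.
  def findExternalClusterManager(master: String): Option[ExternalClusterManager] = {
    val loader = Thread.currentThread().getContextClassLoader
    val it = ServiceLoader.load(classOf[ExternalClusterManager], loader).iterator()
    while (it.hasNext) {
      val cm = it.next()
      if (cm.canCreate(master)) {
        return Some(cm)
      }
    }
    None
  }
}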

Does this PR introduce any user-facing change?

No

How was this patch tested?

New UT.

@AmplabJenkins

Can one of the admins verify this patch?

@KevinSmile
Contributor

nit:
Maybe there is some documentation to update, e.g. add a user-defined option here in SparkSubmitArguments:

|Options:
|  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
|                              k8s://https://host:port, or local (Default: local[*]).
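
For illustration, the updated usage text might read something like this (hypothetical wording, not part of this patch):

|Options:
|  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
|                              k8s://https://host:port, any master URL handled
|                              by a registered ExternalClusterManager, or local
|                              (Default: local[*]).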

@KevinSmile
Contributor

KevinSmile commented Sep 16, 2020

Also, I can see that SparkSubmit.scala is full of if...else... branches based on:

(args.deployMode, deployMode) match {
  case (null, CLIENT) => args.deployMode = "client"
  case (null, CLUSTER) => args.deployMode = "cluster"
  case _ =>
}
val isYarnCluster = clusterManager == YARN && deployMode == CLUSTER
val isMesosCluster = clusterManager == MESOS && deployMode == CLUSTER
val isStandAloneCluster = clusterManager == STANDALONE && deployMode == CLUSTER
val isKubernetesCluster = clusterManager == KUBERNETES && deployMode == CLUSTER
val isKubernetesClient = clusterManager == KUBERNETES && deployMode == CLIENT
val isKubernetesClusterModeDriver = isKubernetesClient &&
  sparkConf.getBoolean("spark.kubernetes.submitInDriver", false)

So can prepareSubmitEnvironment still build its 4-tuple return value correctly if a new user-defined option is added?

private[deploy] def prepareSubmitEnvironment(
    args: SparkSubmitArguments,
    conf: Option[HadoopConfiguration] = None)
    : (Seq[String], Seq[String], SparkConf, String) = {
  // Return values
  val childArgs = new ArrayBuffer[String]()
  val childClasspath = new ArrayBuffer[String]()
  val sparkConf = args.toSparkConf()
  var childMainClass = ""

e.g. childArgs is built differently in different modes:

if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}

I suspect it would be hard for a user-defined mode to produce the correct 4-tuple return value from prepareSubmitEnvironment.

@ConeyLiu
Contributor Author

@KevinSmile, thanks for the advice. Which case do you think cannot be handled correctly?

@ConeyLiu
Contributor Author

Hi @cloud-fan, could you help review this? Thanks a lot.

@KevinSmile
Contributor

KevinSmile commented Sep 16, 2020

I've only had a quick glance, so I'm not very sure about this...

Let's say we have a new user-defined cluster manager called my-Yarn, which behaves exactly like YARN, so we just copy the yarn-scheduler code to implement the new one.

But what should we do with the following if (isYarnCluster) snippet in SparkSubmit.scala? Copy-paste it and change it to if (isMyYarnCluster) to get the correct childArgs? And where would we paste it?

if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
  } else if (args.isR) {
    val mainFile = new Path(args.primaryResource).getName
    childArgs += ("--primary-r-file", mainFile)
    childArgs += ("--class", "org.apache.spark.deploy.RRunner")
  } else {
    if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
      childArgs += ("--jar", args.primaryResource)
    }
    childArgs += ("--class", args.mainClass)
  }
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}
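
For illustration only (MY_YARN and MY_YARN_CLUSTER_SUBMIT_CLASS are made-up names), with the current design such a manager would need its own hard-coded branch inside SparkSubmit itself:

// Hypothetical: a custom "my-Yarn" manager would need its own constant and
// branch in SparkSubmit, mirroring the isYarnCluster snippet above.
val isMyYarnCluster = clusterManager == MY_YARN && deployMode == CLUSTER
if (isMyYarnCluster) {
  childMainClass = MY_YARN_CLUSTER_SUBMIT_CLASS
  if (args.childArgs != null) {
    args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
  }
}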

@ConeyLiu
Contributor Author

ConeyLiu commented Sep 16, 2020

But what should we do with the following if (isYarnCluster) snippet in SparkSubmit.scala? Copy-paste it and change it to if (isMyYarnCluster)?

The childArgs are passed to start the cluster-mode driver. If users want to use a user-defined cluster manager in cluster mode, they should handle that logic in the Main class, right? The current ExternalClusterManager has no ability to handle it.

@KevinSmile
Contributor

KevinSmile commented Sep 16, 2020

But the Main class's childArgs is retrieved from SparkSubmit (the 4-tuple return value of prepareSubmitEnvironment)?

private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

  app.start(childArgs.toArray, sparkConf)

So it's another topic? Which means that the current design of SparkSubmit.scala and ExternalClusterManager is not so elegant: a user who wants to use a user-defined cluster manager must also modify SparkSubmit, in addition to writing the user-defined scheduler part?

@tgravescs
Contributor

This is an interesting issue. One of the problems is how spark-submit can properly know which arguments are supported by that cluster manager, and similarly which deployment modes are supported. There is a lot of cluster-manager-specific logic in here, and this may work for you for most things, but I would be surprised if it worked for everything.

Did you test this with both spark-submit and the interactive shells (spark-shell, pyspark, etc.)? I'm not sure whether your cluster manager supports full cluster mode or only running the driver locally.

I think if we officially want to support this we need something else; some parts would need to be pluggable. I think that would be a whole lot more change, though.

@ConeyLiu
Contributor Author

ConeyLiu commented Sep 16, 2020

But the Main class's childArgs is retrieved from SparkSubmit (the 4-tuple return value of prepareSubmitEnvironment)?

Looking at the code may make this clearer. The main method receives arguments as well, so in client mode the parsed args are passed straight into the main method and we do not need to append any special arguments. The extra arguments only need to be appended in cluster mode.
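
Roughly, simplified from prepareSubmitEnvironment (a sketch, not the verbatim code):

// In client mode the driver runs locally, so the user's main class and parsed
// arguments pass through unchanged; only cluster mode needs a wrapper submit
// class and the extra "--arg"-style arguments shown earlier.
if (deployMode == CLIENT) {
  childMainClass = args.mainClass
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}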

So it's another topic? Which means that the current design of SparkSubmit.scala and ExternalClusterManager is not so elegant: a user who wants to use a user-defined cluster manager must also modify SparkSubmit, in addition to writing the user-defined scheduler part?

Yes

@ConeyLiu
Contributor Author

ConeyLiu commented Sep 16, 2020

Hi @tgravescs, thanks for the advice. I have not tested all arguments; however, it works for us now. I agree with you that we need to redesign ExternalClusterManager to support the full set of arguments. Ideally, the special logic for each cluster manager should be handled inside that cluster manager, not in SparkSubmit. The current change is meant to be the smallest possible one. cc @carsonwang
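
A sketch of what such a redesign might look like (a hypothetical API for discussion, not part of this patch): each cluster manager contributes its own pieces of prepareSubmitEnvironment's return value, so SparkSubmit can delegate instead of branching on isYarnCluster / isMesosCluster / isKubernetesCluster.

// Hypothetical submit-side counterpart to ExternalClusterManager; SparkSubmit
// would discover implementations (e.g. via ServiceLoader) and delegate to the
// first one whose canHandle returns true.
trait SubmitEnvironmentContributor {
  // Whether this contributor handles the given master URL and deploy mode.
  def canHandle(master: String, deployMode: String): Boolean

  // The main class SparkSubmit should launch (a driver wrapper in cluster mode).
  def childMainClass(args: SparkSubmitArguments): String

  // Extra arguments to pass to that main class.
  def childArgs(args: SparkSubmitArguments): Seq[String]
}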

@KevinSmile
Contributor

Yes, the lack of pluggability is exactly the point.

@tgravescs
Contributor

I'm possibly OK with this one, but I don't really want to keep hacking on it. I can see this going in and then someone filing another JIRA saying it doesn't work for X or Y, and then we keep hacking at it. I would rather see it done properly if we are going to say it's supported.

@ConeyLiu
Contributor Author

@tgravescs, thanks for the suggestion. I'll mark this as WIP and improve it.

@ConeyLiu ConeyLiu changed the title [SPARK-32899][CORE] Support submit application with user-defined cluster manager [WIP][SPARK-32899][CORE] Support submit application with user-defined cluster manager Sep 17, 2020
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
