[Improvement][Spark] Support Local Spark Cluster #15548

Closed
2 of 3 tasks
git-blame opened this issue Jan 31, 2024 · 7 comments · Fixed by #15589
Assignees
Labels
improvement

Comments

@git-blame

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

When a Spark Task executes spark-submit, the 'cluster' and 'client' deploy modes map to --master yarn or --master k8s://.... I would like an option to use a local (standalone) Spark cluster. In other words, the equivalent spark-submit option is: --master spark://<hostname>:<port>
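
For illustration, a sketch of the command the Spark task would need to generate for a standalone cluster (the hostname, port, class, and jar below are placeholders, not taken from a real setup):

# sketch: submit to a standalone Spark master instead of yarn/k8s
${SPARK_HOME}/bin/spark-submit \
  --master spark://<spark-master-host>:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar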

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@git-blame added the improvement and Waiting for reply labels Jan 31, 2024
@SbloodyS added the good first issue label and removed the Waiting for reply label Feb 2, 2024
@pegasas
Contributor

pegasas commented Feb 2, 2024

I would like to have a try on this issue.

@git-blame
Author

My current workaround is to pass --master ... --deploy-mode cluster in the extra options. Since spark-submit uses the last value it is given for an option, this sends the task to the local cluster. For example, look at this log, where my own --master option overrides DolphinScheduler's --master local:

[INFO] 2024-02-02 14:27:38.934 +0000 - Final Shell file is : 
#!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
export SPARK_HOME=/opt/spark-3.5.0-bin-hadoop3
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
${SPARK_HOME}/bin/spark-submit --master local \
--class com.example.monitor.ScanMonitor --conf spark.driver.cores=1 --conf spark.driver.memory=512M \
--conf spark.executor.instances=2 --conf spark.executor.cores=2 \
--conf spark.executor.memory=2G \
--master spark://devel:7077 --deploy-mode cluster \
file:/opt/apache-dolphinscheduler-3.2.0-bin/standalone-server/files/default/resources/monitor-0.1-jdk11.jar producer
...
24/02/02 14:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20240202142754-0003
2024-02-02 14:28:00.038 +0000 -  -> 
24/02/02 14:27:59 INFO ClientEndpoint: State of driver-20240202142754-0003 is RUNNING
24/02/02 14:27:59 INFO ClientEndpoint: Driver running on 172.16.254.204:35595 (worker-20240202141308-172.16.254.204-35595)
24/02/02 14:27:59 INFO ClientEndpoint: spark-submit not configured to wait for completion, exiting spark-submit JVM.
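
In other words, the value placed in the task's extra options field is just the following (the host and port match my standalone cluster shown in the log above):

--master spark://devel:7077 --deploy-mode cluster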

@pegasas
Contributor

pegasas commented Feb 2, 2024

> Current workaround for me is to pass --master ... --deploy-mode cluster in the extra options. […]

Thanks @git-blame for the quick workaround; it does indeed work through the extra options. However, as mentioned, the master is an important parameter in Spark.

I will check with the community whether this is by design in previous discussions.

If not, I will add the parameter to the Spark task.


github-actions bot commented Mar 4, 2024

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions bot added the Stale label Mar 4, 2024
@pegasas
Contributor

pegasas commented Mar 4, 2024

still working

github-actions bot removed the Stale label Mar 5, 2024
@SbloodyS removed the good first issue label Mar 5, 2024

github-actions bot commented Apr 5, 2024

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions bot added the Stale label Apr 5, 2024
@pegasas
Contributor

pegasas commented Apr 5, 2024

still working
