[Improvement][Spark] Support Local Spark Cluster #15548

Closed
2 of 3 tasks
git-blame opened this issue Jan 31, 2024 · 7 comments · Fixed by #15589
Assignees
Labels
improvement

Comments

@git-blame

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

When a Spark Task executes spark-submit, the 'cluster' and 'client' deploy modes map to --master yarn or --master k8s://.... I would like an option to use a local (standalone) Spark cluster. In other words, the equivalent spark-submit option is: --master spark://<hostname>:<port>
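
For illustration, a sketch of the command the Spark task would need to generate for a standalone cluster (the hostname, port, class, and jar below are placeholders, not taken from a real setup):

# sketch: submit to a standalone Spark master instead of yarn/k8s
${SPARK_HOME}/bin/spark-submit \
  --master spark://<spark-master-host>:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar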

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@git-blame added the improvement and Waiting for reply labels Jan 31, 2024
@SbloodyS added the good first issue label and removed the Waiting for reply label Feb 2, 2024
@pegasas
Contributor

pegasas commented Feb 2, 2024

I would like to have a try on this issue.

@git-blame
Author

My current workaround is to pass --master ... --deploy-mode cluster in the extra options. Since spark-submit uses the last value it is given for an option, this sends the task to the local cluster. For example, look at this log, where my own --master option overrides DolphinScheduler's --master local:

[INFO] 2024-02-02 14:27:38.934 +0000 - Final Shell file is : 
#!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
export SPARK_HOME=/opt/spark-3.5.0-bin-hadoop3
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
${SPARK_HOME}/bin/spark-submit --master local \
--class com.example.monitor.ScanMonitor --conf spark.driver.cores=1 --conf spark.driver.memory=512M \
--conf spark.executor.instances=2 --conf spark.executor.cores=2 \
--conf spark.executor.memory=2G \
--master spark://devel:7077 --deploy-mode cluster \
file:/opt/apache-dolphinscheduler-3.2.0-bin/standalone-server/files/default/resources/monitor-0.1-jdk11.jar producer
...
24/02/02 14:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20240202142754-0003
2024-02-02 14:28:00.038 +0000 -  -> 
24/02/02 14:27:59 INFO ClientEndpoint: State of driver-20240202142754-0003 is RUNNING
24/02/02 14:27:59 INFO ClientEndpoint: Driver running on 172.16.254.204:35595 (worker-20240202141308-172.16.254.204-35595)
24/02/02 14:27:59 INFO ClientEndpoint: spark-submit not configured to wait for completion, exiting spark-submit JVM.
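
In other words, the value placed in the task's extra options field is just the following (the host and port match my standalone cluster shown in the log above):

--master spark://devel:7077 --deploy-mode cluster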

@pegasas
Contributor

pegasas commented Feb 2, 2024

> Current workaround for me is to pass --master ... --deploy-mode cluster in the extra options. […]

Thanks @git-blame for the quick workaround; it does indeed work through the extra options. However, as mentioned, the master is an important parameter in Spark.

I will check with the community whether this is by design in previous discussions.

If not, I will add the parameter to the Spark task.


github-actions bot commented Mar 4, 2024

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions bot added the Stale label Mar 4, 2024
@pegasas
Contributor

pegasas commented Mar 4, 2024

still working

github-actions bot removed the Stale label Mar 5, 2024
@SbloodyS removed the good first issue label Mar 5, 2024

github-actions bot commented Apr 5, 2024

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions bot added the Stale label Apr 5, 2024
@pegasas
Contributor

pegasas commented Apr 5, 2024

still working
