[Bug] [Task Plugin] seatunnel unable submit spark job to spark:// #14546

@Yhr-N

Description

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When using the SeaTunnel engine to submit Spark jobs, selecting Spark as the master did not concatenate the master address into the generated job submission script, resulting in job failure.

It generated a script like this:

[INFO] 2023-07-14 00:39:14.376 +0000 - SeaTunnel task command: ${SEATUNNEL_HOME}/bin/start-seatunnel-spark-3-connector-v2.sh --config /tmp/dolphinscheduler/exec/process/root/10215811485920/10215863853024_2/9/9/seatunnel_9_9.conf --deploy-mode client --master spark:// sparkhost:port

and reported this exception:

23/07/14 08:39:18 ERROR SparkContext: Error initializing SparkContext.
	org.apache.spark.SparkException: Invalid master URL: spark://
		at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2551)
		at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47)
		at org.apache.spark.deploy.client.StandaloneAppClient.$anonfun$masterRpcAddresses$1(StandaloneAppClient.scala:53)
		at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
		at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
		at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
		at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
		at scala.collection.TraversableLike.map(TraversableLike.scala:286)
		at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
		at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
		at org.apache.spark.deploy.client.StandaloneAppClient.<init>(StandaloneAppClient.scala:53)
		at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:123)
		at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:222)
		at org.apache.spark.SparkContext.<init>(SparkContext.scala:595)
		at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2714)
		at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
		at scala.Option.getOrElse(Option.scala:189)
		at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
		at org.apache.seatunnel.core.starter.spark.execution.SparkRuntimeEnvironment.prepare(SparkRuntimeEnvironment.java:113)
		at org.apache.seatunnel.core.starter.spark.execution.SparkRuntimeEnvironment.prepare(SparkRuntimeEnvironment.java:38)
		at org.apache.seatunnel.core.starter.execution.RuntimeEnvironment.initialize(RuntimeEnvironment.java:48)
		at org.apache.seatunnel.core.starter.spark.execution.SparkRuntimeEnvironment.<init>(SparkRuntimeEnvironment.java:60)
		at org.apache.seatunnel.core.starter.spark.execution.SparkRuntimeEnvironment.getInstance(SparkRuntimeEnvironment.java:173)
		at org.apache.seatunnel.core.starter.spark.execution.SparkExecution.<init>(SparkExecution.java:50)
		at org.apache.seatunnel.core.starter.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:59)
		at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
		at org.apache.seatunnel.core.starter.spark.SeaTunnelSpark.main(SeaTunnelSpark.java:35)
		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:498)
		at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
		at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
		at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
		at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
		at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
		at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
		at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
		at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	Caused by: java.net.URISyntaxException: Expected authority at index 8: spark://
		at java.net.URI$Parser.fail(URI.java:2847)
		at java.net.URI$Parser.failExpecting(URI.java:2853)
		at java.net.URI$Parser.parseHierarchical(URI.java:3101)
		at java.net.URI$Parser.parse(URI.java:3052)
		at java.net.URI.<init>(URI.java:588)
		at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2536)
		... 38 more

What you expected to happen

There should be no space in the master address, like this:

--master spark://sparkhost:port

How to reproduce

The code that generates the master args in org.apache.dolphinscheduler.plugin.task.seatunnel.spark.SeatunnelSparkTask#buildOptions() needs to be modified: if the master is spark or mesos, the master command needs to be concatenated with the URL, like this:

args.add(MASTER_OPTIONS);
if (MasterTypeEnum.SPARK.equals(master) || MasterTypeEnum.MESOS.equals(master)) {
    args.add(master.getCommand() + seatunnelParameters.getMasterUrl());
} else {
    args.add(master.getCommand());
}
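The proposed fix can be sketched as a self-contained example. Note this is a minimal sketch: the MasterTypeEnum values, their getCommand() strings, and the buildMasterArgs helper below are simplified stand-ins for the real classes in the dolphinscheduler-task-seatunnel plugin, shown only to illustrate the concatenation logic.

```java
import java.util.ArrayList;
import java.util.List;

public class BuildOptionsSketch {

    static final String MASTER_OPTIONS = "--master";

    // Simplified stand-in for the plugin's MasterTypeEnum.
    enum MasterTypeEnum {
        SPARK("spark://"),
        MESOS("mesos://"),
        YARN("yarn"),
        LOCAL("local");

        private final String command;

        MasterTypeEnum(String command) {
            this.command = command;
        }

        String getCommand() {
            return command;
        }
    }

    // For SPARK and MESOS the scheme prefix must be joined to the master URL
    // as a single argument (no space in between); other master types such as
    // yarn or local take no URL at all.
    static List<String> buildMasterArgs(MasterTypeEnum master, String masterUrl) {
        List<String> args = new ArrayList<>();
        args.add(MASTER_OPTIONS);
        if (MasterTypeEnum.SPARK.equals(master) || MasterTypeEnum.MESOS.equals(master)) {
            args.add(master.getCommand() + masterUrl);
        } else {
            args.add(master.getCommand());
        }
        return args;
    }

    public static void main(String[] args) {
        // Produces a single "--master spark://sparkhost:7077" argument pair.
        System.out.println(String.join(" ", buildMasterArgs(MasterTypeEnum.SPARK, "sparkhost:7077")));
        System.out.println(String.join(" ", buildMasterArgs(MasterTypeEnum.YARN, null)));
    }
}
```

With this change the scheme and host:port land in one argument, so spark-submit receives a valid master URL instead of the bare "spark://" that triggers the URISyntaxException above.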

Anything else

No response

Version

3.1.x

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
