
[Bug][SPARK] Spark program succeeds on YARN, but DolphinScheduler 1.3.5 shows some runs as successful and others as failed #5598

@nmz0324

Description


The Spark program's status on YARN is SUCCEEDED, but on DolphinScheduler some runs of the same task show success and others show failure. DolphinScheduler version: 1.3.5.
Worker log (work.log):

21/06/07 17:25:10 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://master:8020/user/hive/warehouse/llys.db/d_meter_info
	21/06/07 17:25:10 INFO spark.SparkContext: Invoking stop() from shutdown hook
	21/06/07 17:25:10 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.xxxxxxxx:4040
	21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
	21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
	21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
	21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Stopped
	21/06/07 17:25:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
	21/06/07 17:25:10 INFO storage.MemoryStore: MemoryStore cleared
	21/06/07 17:25:10 INFO storage.BlockManager: BlockManager stopped
	21/06/07 17:25:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
	21/06/07 17:25:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
	21/06/07 17:25:10 INFO spark.SparkContext: Successfully stopped SparkContext
	21/06/07 17:25:10 INFO util.ShutdownHookManager: Shutdown hook called
	21/06/07 17:25:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a62daf3e-951a-4906-8e69-2efcc7688362
	21/06/07 17:25:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-58e66c77-b8a1-40ef-a9bf-5d9ca39b418f
[INFO] 2021-06-07 17:25:11.098  - [taskAppId=TASK-943-539-746]:[125] - FINALIZE_SESSION
[INFO] 2021-06-07 17:25:11.109  - [taskAppId=TASK-943-539-746]:[431] - find app id: application_1623056438401_0003
[INFO] 2021-06-07 17:25:11.113 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - task instance id : 746,task final status : FAILURE
[INFO] 2021-06-07 17:25:11.116 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[161] - develop mode is: false
[INFO] 2021-06-07 17:25:11.119 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[179] - exec local path: /tmp/dolphinscheduler/exec/process/61/943/539/746 cleared.

YARN log:

Log Type: stderr

Log Upload Time: Mon Jun 07 17:25:13 +0800 2021

Log Length: 68880

Showing 4096 bytes of 68880 total.

_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:54 INFO yarn.YarnAllocator: Completed container container_1623056438401_0003_01_000004 on host: worker01 (state: COMPLETE, exit status: 1)
21/06/07 17:24:54 WARN yarn.YarnAllocator: Container marked as failed: container_1623056438401_0003_01_000004 on host: worker01. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1623056438401_0003_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
	at org.apache.hadoop.util.Shell.run(Shell.java:507)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1

21/06/07 17:24:57 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 11264 MB memory (including 1024 MB of overhead)
21/06/07 17:24:57 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/06/07 17:24:59 INFO yarn.YarnAllocator: Launching container container_1623056438401_0003_01_000007 on host master
21/06/07 17:24:59 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/06/07 17:24:59 INFO yarn.ExecutorRunnable: Preparing Local resources
21/06/07 17:24:59 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:25:10 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/06/07 17:25:10 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.xx.xx:60916
21/06/07 17:25:10 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. worker03:60916
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
21/06/07 17:25:10 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1623056438401_0003
21/06/07 17:25:10 INFO util.ShutdownHookManager: Shutdown hook called
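To make the mismatch easy to check across runs, the two final statuses can be pulled out of the logs side by side. The sketch below is a hypothetical triage helper (`compare_statuses` is not part of DolphinScheduler or Spark). Its regexes are copied from the log lines quoted above and may not match other versions, and treating YARN's `SUCCEEDED` as equivalent to DolphinScheduler's `SUCCESS` is an assumption.

```python
import re

# Hypothetical triage helper. The patterns are copied from the log lines quoted
# in this issue (DolphinScheduler 1.3.5, Spark on YARN) and may not match other versions.
APP_ID_RE = re.compile(r"find app id: (application_\d+_\d+)")
DS_STATUS_RE = re.compile(r"task final status : (\w+)")
YARN_STATUS_RE = re.compile(r"Final app status: (\w+), exitCode: (-?\d+)")


def compare_statuses(worker_log: str, yarn_am_log: str) -> dict:
    """Extract the YARN app id and both final statuses, and flag a mismatch."""
    app_id = APP_ID_RE.search(worker_log)
    ds = DS_STATUS_RE.search(worker_log)
    yarn = YARN_STATUS_RE.search(yarn_am_log)
    result = {
        "app_id": app_id.group(1) if app_id else None,
        "dolphinscheduler": ds.group(1) if ds else None,
        "yarn": yarn.group(1) if yarn else None,
        "yarn_exit_code": int(yarn.group(2)) if yarn else None,
    }
    if ds is None or yarn is None:
        # Cannot compare if either status line is missing from the logs.
        result["mismatch"] = None
    else:
        # Assumption: YARN "SUCCEEDED" corresponds to DolphinScheduler "SUCCESS".
        yarn_ok = result["yarn"] == "SUCCEEDED"
        ds_ok = result["dolphinscheduler"] == "SUCCESS"
        result["mismatch"] = yarn_ok != ds_ok
    return result


# Sample lines taken verbatim from the logs above (task instance 746).
WORKER_LOG = (
    "[INFO] 2021-06-07 17:25:11.109  - [taskAppId=TASK-943-539-746]:[431] - "
    "find app id: application_1623056438401_0003\n"
    "[INFO] 2021-06-07 17:25:11.113 org.apache.dolphinscheduler.server.worker."
    "runner.TaskExecuteThread:[141] - task instance id : 746,task final status : FAILURE\n"
)
YARN_AM_LOG = (
    "21/06/07 17:25:10 INFO yarn.ApplicationMaster: "
    "Final app status: SUCCEEDED, exitCode: 0\n"
)

if __name__ == "__main__":
    print(compare_statuses(WORKER_LOG, YARN_AM_LOG))
```

For task instance 746 this reports a mismatch: DolphinScheduler records FAILURE for application_1623056438401_0003, while the YARN ApplicationMaster reports SUCCEEDED with exit code 0.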

Screenshots: WeChat image_20210607175004, WeChat image_20210607180117, WeChat image_20210607180131
