
SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception #8963

Closed
dawany opened this issue May 22, 2020 · 9 comments
Labels: area:providers, kind:bug, pending-response, provider:cncf-kubernetes, stale

Comments


dawany commented May 22, 2020


Description

I use Airflow to schedule Spark jobs on Kubernetes with SparkSubmitOperator.
When a Spark job runs on Kubernetes for a long time (more than ~30 minutes), Airflow often marks the task as failed while the job is still running, even though the job eventually finishes successfully.

When this happens, the exception below usually appears in the task log at about the same time, though not always (background and sketches of the relevant logic follow the excerpt):

20/05/20 10:49:58 INFO TaskSetManager: Finished task 7.0 in stage 91.0 (TID 13313) in 34319 ms on 172.30.238.243 (executor 5) (10/20)
20/05/20 10:49:59 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 140189600 (140238574)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:307)
	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:222)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/05/20 10:50:19 INFO TaskSetManager: Finished task 10.0 in stage 91.0 (TID 13316) in 27478 ms on 172.30.244.131 (executor 1) (11/20)
20/05/20 10:50:19 INFO TaskSetManager: Finished task 12.0 in stage 91.0 (TID 13318) in 27052 ms on 172.30.244.131 (executor 1) (12/20)
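
For context: "too old resource version" is the Kubernetes watch API returning HTTP 410 Gone because the resourceVersion the client tried to resume from has already been compacted out of etcd. Spark's LoggingPodStatusWatcherImpl (Java, fabric8 client) does not recover from this, so the pod-status stream simply stops. As a minimal sketch of the usual client-side recovery, written with the official Python `kubernetes` client purely for illustration (this is not Spark's or Airflow's code), the fix is to re-list for a fresh resourceVersion and restart the watch:

```python
from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException

def watch_pods(namespace: str) -> None:
    """Watch pod events, resuming after 410 Gone ("too old resource version")."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    resource_version = ""  # empty string == start from "now"
    while True:
        try:
            for event in watch.Watch().stream(
                v1.list_namespaced_pod, namespace,
                resource_version=resource_version,
            ):
                # Remember the latest version so a reconnect can resume here.
                resource_version = event["object"].metadata.resource_version
                print(event["type"], event["object"].metadata.name)
        except ApiException as exc:
            if exc.status == 410:
                # Our bookmark was compacted away: re-list to get a fresh
                # resourceVersion, then restart the watch from it.
                resource_version = v1.list_namespaced_pod(
                    namespace
                ).metadata.resource_version
                continue
            raise
```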
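
On the Airflow side, SparkSubmitHook decides the task's fate in Kubernetes deploy mode by scanning the spark-submit output for the driver's final `Exit code: N` line. A simplified paraphrase of the relevant logic in spark_submit_hook.py (Airflow 1.10.x; trimmed for brevity):

```python
import re

def process_spark_submit_log(lines):
    """Simplified from SparkSubmitHook._process_spark_submit_log (1.10.x)."""
    driver_pod, spark_exit_code = None, None
    for line in lines:
        line = line.strip()
        # Identify the driver pod so it can be killed on task timeout.
        match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
        if match:
            driver_pod = match.groups()[0]
        # Capture the driver container's final exit code.
        match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
        if match_exit_code:
            spark_exit_code = int(match_exit_code.groups()[0])
    return driver_pod, spark_exit_code
```

After spark-submit returns, the hook raises roughly `if returncode or (self._is_kubernetes and self._spark_exit_code != 0): raise AirflowException(...)`. So when the fabric8 watch dies mid-run, the `Exit code:` line is never printed, the stored exit code stays `None`, and the task is marked failed even though the driver pod later terminates with exit code 0, which matches the behavior reported here.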

Environment

UAT environment
Airflow version: 1.10.10
Python: 3.6.8
Kubernetes server: 1.5
Kubernetes client: ?
Spark: 2.4.4-h2.7

When the same jobs are submitted manually with spark-submit, the driver logs read directly from Kubernetes look like this:

Scenario 1: job succeeds; log returned from Kubernetes:

2020-05-11 11:58:21 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-11 11:58:27 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212705263-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
	 creation time: 2020-05-11T15:58:27Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: N/A
	 start time: N/A
	 container images: N/A
	 phase: Pending
	 status: []

2020-05-11 11:58:27 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212705263-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
	 creation time: 2020-05-11T15:58:27Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: N/A
	 container images: N/A
	 phase: Pending
	 status: []

2020-05-11 11:58:28 INFO  Client:54 - Waiting for application emp_reporting to finish...
2020-05-11 11:58:48 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212705263-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
	 creation time: 2020-05-11T15:58:27Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: 2020-05-11T15:58:48Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Pending
	 status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]

2020-05-11 11:59:28 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212705263-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
	 creation time: 2020-05-11T15:58:27Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: 2020-05-11T15:58:48Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Running
	 status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]

2020-05-11 12:01:15 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212705263-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
	 creation time: 2020-05-11T15:58:27Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: 2020-05-11T15:58:48Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Succeeded
	 status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, exitCode=0, finishedAt=Time(time=2020-05-11T16:01:08Z, additionalProperties={}), message=null, reason=Completed, signal=null, startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 12:01:15 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:


	 Container name: spark-kubernetes-driver
	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 Container state: Terminated
	 Exit code: 0
2020-05-11 12:01:15 INFO  Client:54 - Application emp_reporting finished.
2020-05-11 12:01:15 INFO  ShutdownHookManager:54 - Shutdown hook called
2020-05-11 12:01:15 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-f3b0955d-fe7c-4858-a1c6-f70597d42104

Scenario 2: job fails; log returned from Kubernetes:

2020-05-11 11:54:10 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-11 11:54:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212454526-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
	 creation time: 2020-05-11T15:54:16Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: N/A
	 start time: N/A
	 container images: N/A
	 phase: Pending
	 status: []

2020-05-11 11:54:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212454526-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
	 creation time: 2020-05-11T15:54:16Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.186
	 start time: N/A
	 container images: N/A
	 phase: Pending
	 status: []

2020-05-11 11:54:17 INFO  Client:54 - Waiting for application emp_reporting to finish...
2020-05-11 11:54:17 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212454526-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
	 creation time: 2020-05-11T15:54:16Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.186
	 start time: 2020-05-11T15:54:17Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Pending
	 status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]

2020-05-11 11:55:04 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212454526-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
	 creation time: 2020-05-11T15:54:16Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.186
	 start time: 2020-05-11T15:54:17Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Running
	 status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]

2020-05-11 11:55:29 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: empreporting-1589212454526-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
	 creation time: 2020-05-11T15:54:16Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.186
	 start time: 2020-05-11T15:54:17Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Failed
	 status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, exitCode=1, finishedAt=Time(time=2020-05-11T15:55:27Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-11 11:55:29 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:


	 Container name: spark-kubernetes-driver
	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 Container state: Terminated
	 Exit code: 1
2020-05-11 11:55:29 INFO  Client:54 - Application emp_reporting finished.
2020-05-11 11:55:29 INFO  ShutdownHookManager:54 - Shutdown hook called
2020-05-11 11:55:29 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-8ba3c338-3eae-4bc7-9c25-3146642b9207

Airflow log examples

Scenario 1: job succeeds; Airflow reports the correct status (kubectl describe output for the driver pod, followed by the Airflow task log):

[dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589270449587-driver -n batch-pipeline-qa
Name:         rmtextract-1589270449587-driver
Namespace:    batch-pipeline-qa
Priority:     0
Node:         10.93.122.236/10.93.122.236
Start Time:   Tue, 12 May 2020 04:00:51 -0400
Labels:       spark-app-selector=spark-2f7caeac92c94710ab54cf40b5ff29e5
              spark-role=driver
Annotations:  kubernetes.io/psp: db2oltp-dev-psp
Status:       Succeeded
IP:           172.30.244.25
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6
    Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
    Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
      spark-internal
      0
      3
      8
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 12 May 2020 04:01:07 -0400
      Finished:     Tue, 12 May 2020 04:52:03 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  11Gi
    Requests:
      cpu:     1
      memory:  11Gi
    Environment:
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:           /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume (rw)
      /opt/spark/secrets from dswsecret-volume (rw)
      /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dswsecret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dswsecret
    Optional:    false
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rmtextract-1589270449587-driver-conf-map
    Optional:  false
  spark-token-fpqpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-fpqpz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:          <none>

*** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-12T04:00:00+00:00/1.log
[2020-05-12 03:00:43,970] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
[2020-05-12 03:00:44,074] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
[2020-05-12 03:00:44,074] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2020-05-12 03:00:44,074] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-12 03:00:44,074] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2020-05-12 03:00:44,173] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-12T04:00:00+00:00
[2020-05-12 03:00:44,176] {standard_task_runner.py:53} INFO - Started process 11051 to run task
[2020-05-12 03:00:44,619] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-12 03:00:45,043] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,043] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-12 03:00:45,071] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,071] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
[2020-05-12 03:00:47,353] {logging_mixin.py:112} INFO - [2020-05-12 03:00:47,352] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-12 03:00:49,644] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,644] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-12 03:00:50,889] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,888] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - status: []

[2020-05-12 03:00:50,912] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - status: []

[2020-05-12 03:00:51,163] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,163] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO Client: Waiting for application Rmt_Extract to finish...
[2020-05-12 03:00:51,406] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,406] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]

[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - 20/05/12 03:01:07 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T08:01:07Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]

[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
[2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - phase: Succeeded
[2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, exitCode=0, finishedAt=2020-05-12T08:52:03Z, message=null, reason=Completed, signal=null, startedAt=2020-05-12T08:01:07Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,882] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container state: Terminated
[2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Exit code: 0
[2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,884] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO Client: Application Rmt_Extract finished.
[2020-05-12 03:52:13,889] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,889] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Shutdown hook called
[2020-05-12 03:52:13,890] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,890] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Deleting directory /tmp/spark-8c6f511c-4c0f-451e-b686-7d0f444d8f59
[2020-05-12 03:52:14,006] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=Transformation_Rmt_Master_Load_Sequence_Adhoc_Main, task_id=Rmt_Extract, execution_date=20200512T040000, start_date=20200512T080043, end_date=20200512T085214
[2020-05-12 03:52:19,162] {logging_mixin.py:112} INFO - [2020-05-12 03:52:19,161] {local_task_job.py:103} INFO - Task exited with return code 0

Scenario 2: job fails; Airflow reports the correct status (kubectl describe output for the driver pod, followed by the Airflow task log):

[dawany@dawany-inf env_qa]$ kubectl describe pod merge-rptg-appl-dscr-1589248907865-driver -n batch-pipeline-qa
Name:         merge-rptg-appl-dscr-1589248907865-driver
Namespace:    batch-pipeline-qa
Priority:     0
Node:         10.74.200.186/10.74.200.186
Start Time:   Mon, 11 May 2020 22:01:52 -0400
Labels:       spark-app-selector=spark-1e9e33df2d5c488dbc0555a618577f40
              spark-role=driver
Annotations:  kubernetes.io/psp: db2oltp-dev-psp
Status:       Failed
IP:           172.30.21.6
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31
    Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
    Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      com.ibm.cio.dswim.ingest.CommonMerge
      spark-internal
      -t
      odsqa.shar1.rptg_appl_dscr
      -b
      dsw-data-drop-qa
      -d
      dashdb
      -k
      XREF
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 11 May 2020 22:02:19 -0400
      Finished:     Mon, 11 May 2020 22:05:03 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  896Mi
    Requests:
      cpu:     1
      memory:  896Mi
    Environment:
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:           /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume (rw)
      /opt/spark/secrets from dswsecret-volume (rw)
      /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dswsecret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dswsecret
    Optional:    false
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      merge-rptg-appl-dscr-1589248907865-driver-conf-map
    Optional:  false
  spark-token-fpqpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-fpqpz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:          <none>
[dawany@dawany-inf env_qa]$

*** Reading local file: /home/airflow/airflow/logs/Batch_xref.rptg_appl_dscr/merge_rptg_appl_dscr/2020-05-11T02:01:00+00:00/1.log
[2020-05-11 21:01:41,692] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
[2020-05-11 21:01:41,801] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
[2020-05-11 21:01:41,801] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2020-05-11 21:01:41,801] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-11 21:01:41,802] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2020-05-11 21:01:41,911] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): merge_rptg_appl_dscr> on 2020-05-11T02:01:00+00:00
[2020-05-11 21:01:41,919] {standard_task_runner.py:53} INFO - Started process 37536 to run task
[2020-05-11 21:01:42,508] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-11 21:01:43,042] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,042] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-11 21:01:43,055] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,055] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF
[2020-05-11 21:01:45,650] {logging_mixin.py:112} INFO - [2020-05-11 21:01:45,649] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-11 21:01:47,988] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,988] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-11 21:01:49,267] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,266] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,282] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:49,285] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,285] {spark_submit_hook.py:479} INFO - status: []

[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,291] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:49,294] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - status: []

[2020-05-11 21:01:49,550] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,550] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO Client: Waiting for application merge-rptg-appl-dscr to finish...
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:53 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]

[2020-05-11 21:02:20,393] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:479} INFO - 20/05/11 21:02:20 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T02:02:19Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]

[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
[2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
[2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - phase: Failed
[2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, exitCode=1, finishedAt=2020-05-12T02:05:03Z, message=null, reason=Error, signal=null, startedAt=2020-05-12T02:02:19Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Container state: Terminated
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Exit code: 1
[2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO Client: Application merge-rptg-appl-dscr finished.
[2020-05-11 21:06:00,968] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,968] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Shutdown hook called
[2020-05-11 21:06:00,969] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,969] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-88cb9622-28cd-42e4-aaea-48a2206ae942
[2020-05-11 21:06:01,098] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
    self._hook.submit(self._application)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit
    self._mask_cmd(spark_submit_cmd), returncode
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.
[2020-05-11 21:06:01,100] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY
[2020-05-11 21:06:03,316] {logging_mixin.py:112} INFO - [2020-05-11 21:06:03,313] {local_task_job.py:103} INFO - Task exited with return code 1

Scenario 3: the job succeeds, but the log returned through Airflow reports the wrong status (this is the scenario we want to analyze)

[dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589227268554-driver -n batch-pipeline-qa
Name:         rmtextract-1589227268554-driver
Namespace:    batch-pipeline-qa
Priority:     0
Node:         10.93.122.236/10.93.122.236
Start Time:   Mon, 11 May 2020 16:01:10 -0400
Labels:       spark-app-selector=spark-836f53b29a274eabbeba208d27e242de
              spark-role=driver
Annotations:  kubernetes.io/psp: db2oltp-dev-psp
Status:       Succeeded
IP:           172.30.244.35
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c
    Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
    Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
      spark-internal
      0
      3
      8
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 May 2020 16:01:31 -0400
      Finished:     Mon, 11 May 2020 16:50:14 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  11Gi
    Requests:
      cpu:     1
      memory:  11Gi
    Environment:
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:           /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume (rw)
      /opt/spark/secrets from dswsecret-volume (rw)
      /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dswsecret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dswsecret
    Optional:    false
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rmtextract-1589227268554-driver-conf-map
    Optional:  false
  spark-token-fpqpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-fpqpz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:          <none>

*** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-11T16:00:00+00:00/1.log
[2020-05-11 15:01:02,589] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]>
[2020-05-11 15:01:02,722] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]>
[2020-05-11 15:01:02,722] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2020-05-11 15:01:02,722] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-05-11 15:01:02,722] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2020-05-11 15:01:02,826] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-11T16:00:00+00:00
[2020-05-11 15:01:02,831] {standard_task_runner.py:53} INFO - Started process 39984 to run task
[2020-05-11 15:01:03,415] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [running]> kafka02.cloud.ibm.com
[2020-05-11 15:01:03,970] {logging_mixin.py:112} INFO - [2020-05-11 15:01:03,969] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
[2020-05-11 15:01:04,000] {logging_mixin.py:112} INFO - [2020-05-11 15:01:04,000] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
[2020-05-11 15:01:06,349] {logging_mixin.py:112} INFO - [2020-05-11 15:01:06,348] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2020-05-11 15:01:08,607] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
[2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
[2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,608] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-05-11 15:01:09,773] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,773] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,784] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - node name: N/A
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 15:01:09,787] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,787] {spark_submit_hook.py:479} INFO - status: []

[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,796] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - start time: N/A
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - container images: N/A
[2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 15:01:09,799] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - status: []

[2020-05-11 15:01:10,080] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,080] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO Client: Waiting for application Rmt_Extract to finish...
[2020-05-11 15:01:10,307] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - phase: Pending
[2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]

[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:32 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
[2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - service account name: spark
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
[2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - phase: Running
[2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-11T20:01:31Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]

[2020-05-11 15:49:57,933] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,933] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO -
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO -
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container state: Running
[2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container started at: 2020-05-11T20:01:31Z
[2020-05-11 15:49:57,935] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO Client: Application Rmt_Extract finished.
[2020-05-11 15:49:57,936] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,936] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Shutdown hook called
[2020-05-11 15:49:57,937] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,937] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-5bfba41b-8cab-429b-b64f-dcb4f52c4d3a
[2020-05-11 15:49:58,101] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
    self._hook.submit(self._application)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit
    self._mask_cmd(spark_submit_cmd), returncode
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0.
[2020-05-11 15:49:58,102] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY
[2020-05-11 15:50:00,600] {logging_mixin.py:112} INFO - [2020-05-11 15:50:00,599] {local_task_job.py:103} INFO - Task exited with return code 1

Submitting the job manually, as in Scenario 3, and reading the logs directly from k8s:

Scenario 3: log returned from k8s

[dawany@dawany-inf env_qa]$ kubectl describe pod -n batch-pipeline-qa  testquoterptfact-1589301844228-driver
Name:         testquoterptfact-1589301844228-driver
Namespace:    batch-pipeline-qa
Priority:     0
Node:         10.74.200.157/10.74.200.157
Start Time:   Tue, 12 May 2020 12:44:09 -0400
Labels:       spark-app-selector=spark-6ddf413798da4c5d83645a7bc760a925
              spark-role=driver
Annotations:  kubernetes.io/psp: db2oltp-dev-psp
Status:       Succeeded
IP:           172.30.0.6
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b
    Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
    Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      com.ibm.cio.dswim.qrf.job.QuoteRptFactJob
      spark-internal
      1
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 12 May 2020 12:44:37 -0400
      Finished:     Tue, 12 May 2020 14:52:57 -0400
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     4
      memory:  16896Mi
    Requests:
      cpu:     1
      memory:  16896Mi
    Environment:
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:           /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume (rw)
      /opt/spark/secrets/ from dswsecret-volume (rw)
      /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  dswsecret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dswsecret
    Optional:    false
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      testquoterptfact-1589301844228-driver-conf-map
    Optional:  false
  spark-token-fpqpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-fpqpz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:          <none>

2020-05-12 12:43:59 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-12 12:44:06 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: testquoterptfact-1589301844228-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
	 creation time: 2020-05-12T16:44:06Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: N/A
	 start time: N/A
	 container images: N/A
	 phase: Pending
	 status: []
2020-05-12 12:44:06 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: testquoterptfact-1589301844228-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
	 creation time: 2020-05-12T16:44:06Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: N/A
	 container images: N/A
	 phase: Pending
	 status: []
2020-05-12 12:44:07 INFO  Client:54 - Waiting for application test_quote_rpt_fact to finish...
2020-05-12 12:44:09 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: testquoterptfact-1589301844228-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
	 creation time: 2020-05-12T16:44:06Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: 2020-05-12T16:44:09Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Pending
	 status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2020-05-12 12:44:37 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
	 pod name: testquoterptfact-1589301844228-driver
	 namespace: batch-pipeline-qa
	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
	 creation time: 2020-05-12T16:44:06Z
	 service account name: spark
	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
	 node name: 10.74.200.157
	 start time: 2020-05-12T16:44:09Z
	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 phase: Running
	 status: [ContainerStatus(containerID=containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-12T16:44:37Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2020-05-12 13:43:51 INFO  WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2020-05-12 13:43:53 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:


	 Container name: spark-kubernetes-driver
	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 Container state: Running
	 Container started at: 2020-05-12T16:44:37Z
2020-05-12 13:43:53 INFO  Client:54 - Application test_quote_rpt_fact finished.
2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Shutdown hook called
2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578

The difference is here:

2020-05-12 13:43:51 INFO  WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2020-05-12 13:43:53 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:


	 Container name: spark-kubernetes-driver
	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
	 Container state: Running
	 Container started at: 2020-05-12T16:44:37Z
2020-05-12 13:43:53 INFO  Client:54 - Application test_quote_rpt_fact finished.
2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Shutdown hook called
2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578

conclusion:

Comparing the log details, the difference appears at the point where the log stream is terminated:

  1. airflow side & source code

     The log stream is interrupted and no 'Exit code' line is returned, so 'self._spark_exit_code' keeps its initial value 'None', and Airflow marks the job failed even though the job is actually still running in k8s (a possible guard is sketched after the Python snippet below).

     Here is a similar issue: https://issues.apache.org/jira/browse/AIRFLOW-6244

class SparkSubmitHook
    def submit(self, application="", **kwargs):
        """
        Remote Popen to execute the spark-submit job

        :param application: Submitted application, jar or py file
        :type application: str
        :param kwargs: extra arguments to Popen (see subprocess.Popen)
        """
        spark_submit_cmd = self._build_spark_submit_command(application)

        if hasattr(self, '_env'):
            env = os.environ.copy()
            env.update(self._env)
            kwargs["env"] = env

        self._submit_sp = subprocess.Popen(spark_submit_cmd,
                                           stdout=subprocess.PIPE,
                                           stderr=subprocess.STDOUT,
                                           bufsize=-1,
                                           universal_newlines=True,
                                           **kwargs)

        self._process_spark_submit_log(iter(self._submit_sp.stdout.readline, ''))
        returncode = self._submit_sp.wait()

        # Check spark-submit return code. In Kubernetes mode, also check the value
        # of exit code in the log, as it may differ.
        if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
            raise AirflowException(
                "Cannot execute: {}. Error code is: {}.".format(
                    self._mask_cmd(spark_submit_cmd), returncode
                )
            )
        ...

    def _process_spark_submit_log(self, itr):
        # Consume the iterator
        for line in itr:
            line = line.strip()

            # (the preceding 'if self._is_yarn ...' branch of the original source is elided here)
            # If we run Kubernetes cluster mode, we want to extract the driver pod id
            # from the logs so we can kill the application when we stop it unexpectedly
            elif self._is_kubernetes:
                match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
                if match:
                    self._kubernetes_driver_pod = match.groups()[0]
                    self.log.info("Identified spark driver pod: %s",
                                  self._kubernetes_driver_pod)

                # Store the Spark Exit code
                match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
                if match_exit_code:
                    self._spark_exit_code = int(match_exit_code.groups()[0])

            ...

            self.log.info(line)
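
A minimal standalone sketch (not the Airflow source, and not necessarily the eventual upstream fix): the failure check could treat a missing exit code as "unknown" instead of "failed". The function name and parameters below are illustrative only:

from typing import Optional

def spark_submit_failed(returncode, is_kubernetes, spark_exit_code: Optional[int]):
    """Return True only on positive evidence of failure.

    spark_exit_code is None when the log stream was interrupted before the
    'Exit code:' line appeared; in that case the driver pod status should be
    checked out-of-band instead of assuming failure.
    """
    if returncode != 0:
        return True
    if is_kubernetes and spark_exit_code is not None:
        return spark_exit_code != 0
    return False  # exit code unknown: caller should poll the driver pod

With this guard, Scenario 3 (watch interrupted, pod actually Succeeded) would no longer raise AirflowException with the misleading "Error code is: 0." message.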
  2. k8s side (k8s client)

    2020-05-12 12:51:40 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
    
    This is nothing to worry about; it is a known occurrence in Kubernetes and not an issue [0]. The API server ends watch requests when they are very old. The operator uses a client-go informer, which automatically re-lists the resource and then restarts the watch from the latest resource version.
    
    fabric8: https://github.com/fabric8io/kubernetes-client/issues/1075
    
    https://stackoverflow.com/questions/52910322/kubernetes-resource-versioning/52925973#52925973
    
private long nextReconnectInterval() {
    int exponentOfTwo = currentReconnectAttempt.getAndIncrement();
    if (exponentOfTwo > maxIntervalExponent)
        exponentOfTwo = maxIntervalExponent;
    long ret = reconnectInterval * (1 << exponentOfTwo);
    logger.debug("Current reconnect backoff is " + ret + " milliseconds (T" + exponentOfTwo + ")");
    return ret;
}
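
So the 'T0' in the observed log message is the first reconnect attempt: the backoff is reconnectInterval * 2^0 = 1000 ms, and each subsequent attempt doubles the interval until the configured maximum exponent is reached.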
  3. spark side

     http://mail-archives.apache.org/mod_mbox/spark-issues/201805.mbox/%3CJIRA.13158986.1526264708000.69213.1526330460039@Atlassian.JIRA%3E
    
     https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16474770#comment-16474770
    
     pull request: https://github.com/apache/airflow/pull/9081
    

ask for suggestions

So, are there any suggestions for avoiding this issue?

actions that may be considered

[airflow][source_code_change]

Modify the Airflow source code to poll the driver pod status every few seconds (the equivalent of 'kubectl describe pod xxxx -n xxxx') instead of relying only on the log stream; a sketch follows.
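
A minimal sketch of that polling approach, assuming the official kubernetes Python client is available; the function name, interval, and the pod/namespace values in the usage example are placeholders, not Airflow code:

import time

from kubernetes import client, config

def wait_for_driver_pod(name, namespace, interval=10):
    """Poll the driver pod until it reaches a terminal phase and return the
    exit code of its first terminated container."""
    config.load_kube_config()  # use config.load_incluster_config() inside k8s
    v1 = client.CoreV1Api()
    while True:
        pod = v1.read_namespaced_pod(name=name, namespace=namespace)
        phase = pod.status.phase  # Pending / Running / Succeeded / Failed
        if phase in ("Succeeded", "Failed"):
            for status in pod.status.container_statuses or []:
                if status.state.terminated is not None:
                    return status.state.terminated.exit_code
            return 0 if phase == "Succeeded" else 1
        time.sleep(interval)

# Example (placeholder names):
# exit_code = wait_for_driver_pod("rmtextract-1589227268554-driver", "batch-pipeline-qa")

Because this reads the pod status from the API server directly, a dropped watch/log stream no longer matters: the exit code comes from the same source that 'kubectl describe pod' shows.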
@dawany dawany added the kind:bug This is a clearly a bug label May 22, 2020
@boring-cyborg

boring-cyborg bot commented May 22, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@dawany
Author

dawany commented May 25, 2020

Now I am trying to have Airflow invoke the Kubernetes client directly instead of the shell command; working on it.

@stijndehaes
Contributor

@ywan2017 I also have a PR open on airflow to work with spark 3.0 #8730

@dawany
Author

dawany commented May 29, 2020

@ywan2017 I also have a PR open on airflow to work with spark 3.0 #8730

I saw you are trying to fix the Spark watcher on k8s, which is awesome! That problem affects Airflow scheduling too much. It's sad that I am using Spark 2.4.4, which is difficult to merge with your code change.

@stijndehaes
Contributor

@ywan2017 Yeah once this is merged I want to try to backport it to 2.4.x. But the code has been refactored a lot in 3.x so this will take a while.

@RenGeng
Contributor

RenGeng commented Jul 23, 2021

Hi guys, any update on this issue?

@eladkal eladkal added provider:cncf-kubernetes Kubernetes provider related issues provider:Apache labels Sep 15, 2021
@eladkal
Contributor

eladkal commented Sep 15, 2021

Hi guys, any update on this issue?

There was an attempt by the author to fix this issue in #9081, but the PR was abandoned.
If you are interested in fixing it, please open a PR.

@github-actions

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 11, 2021
@github-actions

This issue has been closed because it has not received response from the issue author.
