
error on _resolve_should_track_driver_status with spark standalone (cluster mode) #46334


Description

@rlebreto

Apache Airflow Provider(s)

apache-spark

Versions of Apache Airflow Providers

apache-airflow-providers-apache-spark 5.0.0
pyspark 3.5.4

Apache Airflow version

2.10.3

Operating System

Ubuntu Server 22.04

Deployment

Other

Deployment details

NA

What happened

Hi

SparkSubmitOperator fails when submitting to a Spark standalone master with deploy-mode=cluster.

Under these conditions, the _resolve_should_track_driver_status function in /usr/local/lib/python3.12/site-packages/airflow/providers/apache/spark/hooks/spark_submit.py returns True, and that is what triggers the error.

I have exactly the same problem as the one discussed here:
#21799

So an issue has already been opened for this in the past.
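
For context, here is a minimal paraphrase of the check as I understand it (not the exact provider source, and the parameters are simplified for illustration): driver-status tracking is enabled only when the master is a standalone spark:// URL and the deploy mode is cluster, which is exactly my setup.

# Paraphrased sketch (not the exact provider code) of the condition that,
# as far as I understand, enables driver-status tracking in the hook.
def resolve_should_track_driver_status(master: str, deploy_mode: str) -> bool:
    # With a standalone master ("spark://<host>:7077") and deploy_mode="cluster",
    # this returns True and the hook then polls the driver status after submit.
    return "spark://" in master and deploy_mode == "cluster"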

What you think should happen instead

The task errors out when _resolve_should_track_driver_status returns True.
If I modify the function to return False instead, testing the task passes, but then the final Spark job status is never confirmed by the Spark driver.
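
For reference, the change I tested by hand is roughly equivalent to the monkeypatch below (a sketch only, assuming _resolve_should_track_driver_status is still an instance method of SparkSubmitHook in provider 5.0.0; it is not a real fix, since it gives up driver-status confirmation):

# Rough sketch of the workaround I tested (not a proper fix): force the hook
# to skip driver-status tracking. The task then passes, but the final job
# status is no longer confirmed against the Spark driver.
from airflow.providers.apache.spark.hooks.spark_submit import SparkSubmitHook

SparkSubmitHook._resolve_should_track_driver_status = lambda self: False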

How to reproduce

cat sparkoperator_test.py
import airflow
from datetime import datetime, timedelta

from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    'owner': 'xxxxxxxx',
    'start_date': datetime(2025, 1, 31),
}

with airflow.DAG('sparkoperator_test',
                  default_args=default_args,
                  schedule_interval='@hourly',
                  description='test of sparkoperator in airflow by Omar and Regis',
                  ) as dag:

    spark_conf = {
        'spark.standalone.submit.waitAppCompletion': 'true',
    }

    spark_compute_pi = SparkSubmitOperator(
        task_id='spark-compute-pi',
        conn_id='spark_qa1',
        java_class='org.apache.spark.examples.SparkPi',
        application='/opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar',
        name='compute_pi_from_airflow',
        application_args=["30"],
        total_executor_cores=4,
        executor_memory='1g',
        executor_cores=1,
        verbose=True,
        conf=spark_conf,
        execution_timeout=timedelta(minutes=10),
        retries=0,
    )

    spark_compute_pi

Spark version: 3.5.3
A single node hosting the master plus one worker.

The error happens when testing the task with airflow tasks test ...

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

