DataprocSubmitJobOperator - sometimes doesn't detect job status COMPLETED #30253

@betaedu27

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

Composer version composer-2.1.9-airflow-2.4.3

apache-airflow-providers-common-sql==1.3.3
apache-airflow-providers-dbt-cloud==3.0.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==8.9.0
apache-airflow-providers-hashicorp==3.2.0
apache-airflow-providers-http==4.1.1
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-mysql==4.0.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.4.0

Apache Airflow version

airflow-2.4.3

Deployment

Google Cloud Composer

What happened

When running DataprocSubmitJobOperator, the job is successfully submitted to Dataproc and runs without any issues.

However, Airflow does not detect that the job has completed successfully, and no error appears in the logs.

This issue does not always occur. I am not sure when it occurs, but it appears more frequently when running several different jobs simultaneously.

Airflow log:
airflow_log.txt

Dataproc log:
dataproc_log.txt

Dataproc job rest api:
rest_api_dataproc_job.txt

How to reproduce

I have not found a reliable trigger, but the problem appears more frequently when several different jobs run simultaneously.
Each job runs for about 1 to 2 minutes.
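
For anyone trying to narrow this down, here is a minimal sketch of an independent status check that could be run alongside the task to confirm what state Dataproc reports for a job. It is not the operator's implementation. The `get_state` callable is hypothetical: it stands in for whatever returns the job's state string (for example, a lookup through the Dataproc jobs REST API). The terminal state names `DONE`, `ERROR`, and `CANCELLED` are assumed from the Dataproc job status values.

```python
import time
from typing import Callable

# Dataproc job states treated as terminal in this sketch (assumption).
TERMINAL_STATES = {"DONE", "ERROR", "CANCELLED"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires.

    get_state is a hypothetical callable supplied by the caller; it should
    return the current job state as a string such as "RUNNING" or "DONE".
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            # Fail loudly instead of waiting forever on a job that never
            # appears to finish.
            raise TimeoutError(f"job still in state {state!r} after {timeout} s")
        time.sleep(poll_interval)
```

With jobs that last 1 to 2 minutes, a short `poll_interval` and a `timeout` of a few minutes should be enough to show whether the job reaches a terminal state on the Dataproc side while the Airflow task is still waiting.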
