Skip to content

GlueJobOperator - resume_glue_job_on_retry doesn't seem to work on MWAA 2.11.0 #62353

@wilsonhooi86

Description

@wilsonhooi86

Apache Airflow version

2.11.X

If "Other Airflow 3 version" selected, which one?

MWAA 2.11.0

What happened?

Hi,

I tried to run the GlueJobOperator with resume_glue_job_on_retry=True but somehow during retry when the task failed, it creates a new glue job ID while the existing glue job ID is still running.

Sample of the GlueJobOperator:

 drug_program = GlueJobOperator(
            task_id="drug_program",
            job_name="gtm-core-dl-dev-euc1-glue-job-glo_informa_prd",
            verbose=False,
            script_args={
                "--name": var_glue_job_name,
                "--db_name": var_database_name,
                "--tbl_name": "drug_program",
            },
            stop_job_run_on_kill=False,
            deferrable=False,
            pool=var_glue_pool,
            botocore_config=botocore_overide_config,
            resume_glue_job_on_retry=True,
        )

1st Run:
Create a new glue job ID as per screenshot
Image

2nd Run (1st Retry):

Job failed due to internal errors, during retry, it didn't find back the existing glue job ID but created 1 new glue job ID. End results, 2 same glue jobs are running together.

Image

What you think should happen instead?

During task retry, it should check the existing glue job ID, if found, it should not create a new glue job ID and resume run status in Airflow Task

How to reproduce

  1. Install the apache-airflow-providers-amazon==9.22.0rc1 in MWAA Airflow 2.11 environment
Image

2.. Create GlueJobOperator Dag
Sample of the GlueJobOperator:

 drug_program = GlueJobOperator(
            task_id="drug_program",
            job_name="gtm-core-dl-dev-euc1-glue-job-glo_informa_prd",
            verbose=False,
            script_args={
                "--name": var_glue_job_name,
                "--db_name": var_database_name,
                "--tbl_name": "drug_program",
            },
            stop_job_run_on_kill=False,
            deferrable=False,
            pool=var_glue_pool,
            botocore_config=botocore_overide_config,
            resume_glue_job_on_retry=True,
        )
  1. Run the glue job task.
  2. During 1st run, it failed (to test, produce by marking the task as failed)
  3. Task went to for retry but somehow it creates a new glue job ID

Operating System

Amazon Linux 2023

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.22.0rc1

Deployment

Amazon (AWS) MWAA

Deployment details

MWAA 2.11.0 deployment with requirements to install apache-airflow-providers-amazon==9.22.0rc1

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions