TriggerDagRunOperator task fails with index out of range exception while trying to reset dag run #27299

Closed
1 of 2 tasks
hugowangler opened this issue Oct 26, 2022 · 4 comments
Labels: area:core-operators (Operators, Sensors and hooks within Core Airflow), good first issue, kind:bug (This is a clearly a bug)

hugowangler commented Oct 26, 2022

Apache Airflow version

2.4.2

What happened

An IndexError (list index out of range) is raised when triggering a run of another DAG using the TriggerDagRunOperator with reset_dag_run=True.

Note that although the TriggerDagRunOperator task in the triggering DAG fails, the target DAG run is actually cleared and rerun correctly.

[2022-10-26, 17:13:38 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: trigger_example.trigger manual__2022-10-26T17:13:33+00:00 [queued]>
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: trigger_example.trigger manual__2022-10-26T17:13:33+00:00 [queued]>
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1362} INFO - 
--------------------------------------------------------------------------------
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1364} INFO - 
--------------------------------------------------------------------------------
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1383} INFO - Executing <Task(TriggerDagRunOperator): trigger> on 2022-10-26 17:13:33+00:00
[2022-10-26, 17:13:38 UTC] {standard_task_runner.py:55} INFO - Started process 2181 to run task
[2022-10-26, 17:13:38 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'trigger_example', 'trigger', 'manual__2022-10-26T17:13:33+00:00', '--job-id', '920', '--raw', '--subdir', 'DAGS_FOLDER/dags/trigger-example-dag.py', '--cfg-path', '/tmp/tmpmg9ay0du']
[2022-10-26, 17:13:38 UTC] {standard_task_runner.py:83} INFO - Job 920: Subtask trigger
[2022-10-26, 17:13:38 UTC] {task_command.py:376} INFO - Running <TaskInstance: trigger_example.trigger manual__2022-10-26T17:13:33+00:00 [running]> on host airflow-worker-0.airflow-worker.airflow.svc.cluster.local
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1590} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=trigger_example
AIRFLOW_CTX_TASK_ID=trigger
AIRFLOW_CTX_EXECUTION_DATE=2022-10-26T17:13:33+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-10-26T17:13:33+00:00
[2022-10-26, 17:13:38 UTC] {trigger_dagrun.py:146} INFO - Clearing example on 2022-10-24T00:00:00+00:00
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/trigger_dagrun.py", line 136, in execute
    dag_run = trigger_dag(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/api/common/trigger_dag.py", line 124, in trigger_dag
    triggers = _trigger_dag(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/api/common/trigger_dag.py", line 78, in _trigger_dag
    raise DagRunAlreadyExists(
airflow.exceptions.DagRunAlreadyExists: A Dag Run already exists for dag id example at 2022-10-24T00:00:00+00:00 with run id manual__2022-10-24T00:00:00+00:00

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/trigger_dagrun.py", line 157, in execute
    dag_run = DagRun.find(dag_id=dag.dag_id, run_id=run_id)[0]
IndexError: list index out of range
[2022-10-26, 17:13:38 UTC] {taskinstance.py:1401} INFO - Marking task as FAILED. dag_id=trigger_example, task_id=trigger, execution_date=20221026T171333, start_date=20221026T171338, end_date=20221026T171338
[2022-10-26, 17:13:38 UTC] {standard_task_runner.py:100} ERROR - Failed to execute job 920 for task trigger (list index out of range; 2181)
[2022-10-26, 17:13:38 UTC] {local_task_job.py:164} INFO - Task exited with return code 1
[2022-10-26, 17:13:38 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check

What you think should happen instead

The DAG run should be cleared, since a run at the specified execution_date exists. If something else actually is wrong, it should be logged better so the user understands what's wrong with their DAG.

After some further testing I noticed that the DAG run is actually cleared and rerun at the specified execution_date, so the exception that occurs only causes the TriggerDagRunOperator task to fail. But I still expect this not to fail since it's actually working.
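As an illustration of the "logged better" point, here is a minimal sketch in plain Python (no Airflow import) of guarding a lookup instead of indexing `[0]` blindly. `RunLookupError` and `first_or_raise` are hypothetical names introduced only for this example; they are not part of Airflow, and this is not necessarily how the operator was or should be changed.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


class RunLookupError(RuntimeError):
    """Hypothetical error type that carries a readable explanation."""


def first_or_raise(matches: Sequence[T], description: str) -> T:
    """Return the first match, or fail with a message naming what was searched for."""
    if not matches:
        # An empty result becomes a descriptive error rather than a bare IndexError.
        raise RunLookupError(f"No match found for {description}")
    return matches[0]


try:
    # Simulate a lookup that returned nothing, as in the traceback above.
    first_or_raise([], "dag_id='example', run_id='manual__2022-10-24T00:00:00+00:00'")
except RunLookupError as err:
    message = str(err)
    print(message)
```

With a guard like this, the task log would name the dag_id and run_id that could not be found, rather than only reporting "list index out of range".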

How to reproduce

To reproduce, I used the following two DAGs:

example-dag.py

import pendulum
from airflow.decorators import task, dag
from airflow.operators.bash import BashOperator


@dag(
    dag_id="example",
    schedule="@daily",
    start_date=pendulum.datetime(2022, 10, 24, tz="UTC"),
    catchup=True,
)
def example():
    hello = BashOperator(task_id="hello", bash_command="echo hello")

    @task(task_id="airflow")
    def airflow():
        print("airflow")

    hello >> airflow()


dag = example()

trigger-example-dag.py

import pendulum
from airflow.decorators import dag, task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator


@dag(
    dag_id="trigger_example",
    schedule="@daily",
    start_date=pendulum.datetime(2022, 10, 25, tz="UTC"),
    catchup=False,
)
def trigger_example_dag():
    @task(task_id="dummy")
    def dummy():
        print("dummy")

    retry = TriggerDagRunOperator(
        task_id="trigger",
        trigger_dag_id="example",
        execution_date="20221024",
        reset_dag_run=True,
    )

    dummy() >> retry


dag = trigger_example_dag()

Steps

From the Airflow UI

  1. Enable the example DAG and let it catch up
  2. Note the Started timestamp of the example DAG run with RUN_ID=scheduled__2022-10-24T00:00:00+00:00
  3. Enable the trigger_example DAG

After this is done you should be able to see that the trigger task in trigger_example fails with the list index out of range exception (see the stack trace above). You will also be able to see that the example DAG has correctly been rerun at the specified execution_date (the Started timestamp should be different).
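One plausible reading of the failure, offered as an assumption rather than a confirmed diagnosis: the run noted in step 2 has the ID scheduled__2022-10-24T00:00:00+00:00, while the traceback shows the operator working with manual__2022-10-24T00:00:00+00:00. A lookup by that manual run ID would then return an empty list, and indexing it with [0] raises IndexError. The sketch below models this in plain Python without importing Airflow; `FakeDagRun`, `EXISTING_RUNS`, and `find_runs` are hypothetical stand-ins, not Airflow APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FakeDagRun:
    """Hypothetical stand-in for a DAG run record; not an Airflow class."""
    dag_id: str
    run_id: str
    execution_date: str


# Assumption: the only run for this date is the one the scheduler created.
EXISTING_RUNS = [
    FakeDagRun(
        dag_id="example",
        run_id="scheduled__2022-10-24T00:00:00+00:00",
        execution_date="2022-10-24T00:00:00+00:00",
    ),
]


def find_runs(
    dag_id: str,
    run_id: Optional[str] = None,
    execution_date: Optional[str] = None,
) -> List[FakeDagRun]:
    """Filter the in-memory runs by whichever criteria are given (illustrative only)."""
    return [
        run
        for run in EXISTING_RUNS
        if run.dag_id == dag_id
        and (run_id is None or run.run_id == run_id)
        and (execution_date is None or run.execution_date == execution_date)
    ]


# Lookup by the manual run ID from the traceback finds nothing.
by_run_id = find_runs("example", run_id="manual__2022-10-24T00:00:00+00:00")
# Lookup by execution date finds the scheduled run.
by_date = find_runs("example", execution_date="2022-10-24T00:00:00+00:00")

print(len(by_run_id))  # 0, so by_run_id[0] would raise IndexError
print(by_date[0].run_id)
```

If this reading is right, it would also be consistent with the target run still being cleared and rerun, since the run for that date does exist under a different ID.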

Operating System

debian 11 bullseye

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==6.0.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.4.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-docker==3.2.0
apache-airflow-providers-elasticsearch==4.2.1
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-google==8.4.0
apache-airflow-providers-grpc==3.0.0
apache-airflow-providers-hashicorp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-microsoft-azure==4.3.0
apache-airflow-providers-mysql==3.2.1
apache-airflow-providers-odbc==3.1.2
apache-airflow-providers-postgres==5.2.2
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==4.1.0
apache-airflow-providers-slack==6.0.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.2.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Using the Apache Airflow Helm Chart 1.6.0, but we have upgraded the Airflow version to 2.4.2.

Also using a self-deployed Postgres with pgbouncer enabled. The Postgres deployment has been working as expected.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@hugowangler hugowangler added area:core kind:bug This is a clearly a bug labels Oct 26, 2022

boring-cyborg bot commented Oct 26, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

hugowangler commented Oct 26, 2022

I noticed now that the example DAG is actually correctly triggered and rerun on the specified execution_date. So it's only the trigger task in trigger_example that "fails".

Would still consider it a bug but the actual trigger of the dag run still happens.

Will update the issue to reflect this.

@uranusjr uranusjr added area:core-operators Operators, Sensors and hooks within Core Airflow and removed area:core labels Oct 27, 2022
@Adityamalik123
Contributor

@potiuk I'd be interested to take this issue up. Can this be assigned to me?

eladkal commented Nov 16, 2022

fixed in #27635

@eladkal eladkal closed this as completed Nov 16, 2022