DAG run fails because the worker pod name cannot be fetched or is wrong #21945

@Amey2400

Apache Airflow version

2.1.3

What happened

The DAG run fails because some task instances fail on their first try, with either a "pod name must be provided" or a "pod not found" error.

On the second try, the task completes successfully and is marked as success. However, before that happens, its downstream tasks have already been marked as "upstream failed". They are not rerun, and the whole DAG run is marked as failed.

In other cases, the first try of a task shows no error when fetching logs, but the log is empty apart from the initial statement that the log file is being read.

On the second try, the full logs are shown and the task completes successfully. The same problem occurs here, though: the downstream tasks are marked as "upstream failed" before the second try completes, and the DAG run is marked as failed.

What you expected to happen

On the first try of a task instance, Airflow should be able to fetch the worker pod logs: the "pod name must be provided" error should not occur, and the logs should not be empty.
If the first try of a task fails but the task has retries configured, its downstream tasks should not be permanently marked as "upstream failed", whether or not the downstream tasks have the same number of retries configured. If a later try of the task succeeds, the downstream tasks should run and the DAG run should not be marked as failed.
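
The expected behavior above follows from how the default `all_success` trigger rule should treat an upstream task that is waiting to retry. What follows is a simplified, hypothetical model of that decision: `TIState` and `all_success_decision` are illustrative names, not Airflow's API, and Airflow's real logic (its `TriggerRuleDep` dependency check) handles more states and trigger rules. It assumes that a task whose first try failed but which still has retries left sits in the `up_for_retry` state, which is unfinished, so a downstream task should keep waiting rather than be marked "upstream failed".

```python
from enum import Enum


class TIState(str, Enum):
    """Hypothetical subset of task-instance states used by this sketch."""
    NONE = "none"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UPSTREAM_FAILED = "upstream_failed"
    SKIPPED = "skipped"
    UP_FOR_RETRY = "up_for_retry"


def all_success_decision(upstream_states):
    """Simplified model of an all_success rule for one downstream task.

    Returns "run", "wait", "upstream_failed" or "skipped".
    """
    states = list(upstream_states)
    # Only a terminal failure upstream should propagate UPSTREAM_FAILED.
    if any(s in (TIState.FAILED, TIState.UPSTREAM_FAILED) for s in states):
        return "upstream_failed"
    if any(s == TIState.SKIPPED for s in states):
        return "skipped"
    if all(s == TIState.SUCCESS for s in states):
        return "run"
    # Any unfinished upstream (RUNNING, UP_FOR_RETRY, ...) means keep waiting.
    return "wait"


# The situation in this issue: the upstream's first try failed, but a retry
# is pending, so its state is UP_FOR_RETRY rather than FAILED.
print(all_success_decision([TIState.UP_FOR_RETRY]))  # wait
print(all_success_decision([TIState.FAILED]))        # upstream_failed
```

Under this model, the behavior reported here (downstream tasks marked "upstream failed" while the upstream still had a retry pending) corresponds to the upstream being treated as `failed` instead of `up_for_retry` after its first try.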

How to reproduce

No response

Operating System

Debian GNU/Linux

Versions of Apache Airflow Providers

No response

Deployment

Other 3rd-party Helm chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
