Dynamic task mapping does not always create mapped tasks #29531
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
cc: @uranusjr, just drawing your attention to this: it seems that #28379 did not solve the issue, and another user also sees the same behaviour in 2.5.1, which was supposed to fix it: #28296 (comment) |
Hi @potiuk, I've prepared MR #30892 with a solution to this problem.
I propose a solution in which I check all previously mapped tasks for one task_id as a single dependency for the current mapped task. Dag runs can have different numbers of mapped tasks, so if the current dag run has more mapped tasks and the previous dag run has no such map_index, it is difficult to tell whether the previous dag run really lacks that map_index or its tasks simply haven't been expanded yet. Also, I think that tasks with the same map_index aren't guaranteed to do the same work, so it's better to wait until all mapped tasks in the past have finished. |
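To make the idea in the comment above concrete, here is a minimal sketch in plain Python. It is not the code from the MR and does not use Airflow's internals: `PastInstance`, `past_dependency_met`, and `DONE_STATES` are hypothetical names, and the choice of which states count as finished is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption for this sketch: these states count as "finished" for depends_on_past.
DONE_STATES = {"success", "skipped"}


@dataclass(frozen=True)
class PastInstance:
    """Hypothetical stand-in for a task instance from the previous dag run."""
    task_id: str
    map_index: int          # -1 means the task has not been expanded
    state: Optional[str]    # None means "no status"


def past_dependency_met(task_id: str, previous_run: List[PastInstance]) -> bool:
    """Treat every past instance of task_id as one combined dependency.

    The current mapped task may run only when all instances of the same
    task_id in the previous run are finished, regardless of map_index.
    """
    past = [ti for ti in previous_run if ti.task_id == task_id]
    if not past:
        # Assumption: no past instances at all means there is nothing to wait for.
        return True
    return all(ti.state in DONE_STATES for ti in past)
```

With this shape, a previous run that is still unexpanded (a single instance with map_index -1 and no state) blocks the current run, while a previous run whose mapped task was skipped does not.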
Hi @knab-analytics, I'm trying to follow the status of this issue, but I'm getting confused by all of the related issues and milestones. Is it your understanding that this issue is supposed to be fixed? I am still seeing this problem on my Airflow 2.5.1 installation using Cloud Composer. |
@spire-mike This is not fixed AFAIK, still happens on 2.6.1 using the code in the top post. |
Thanks @knab-analytics, that's too bad. I hope it gets resolved. As a workaround I set |
Apache Airflow version
2.5.1
What happened
Same problem as #28296, but it seems to happen nondeterministically, and it still happens when ignoring depends_on_past=True.
I've got a task that retrieves some filenames, which then creates dynamically mapped tasks to move the files, one per task.
I'm using a similar task across multiple DAGs. However, task mapping fails on some DAG runs: it happens inconsistently from one DAG run to the next, and some DAGs do not seem to be affected at all. These seem to be the DAGs where no task was ever mapped, so that the mapped task instance ended up in a Skipped state.
What happens is that multiple files are found, but only a single dynamically mapped task is created. This task never starts and has a map_index of -1. It can be found under the "List instances, all runs" menu, but the "Mapped Tasks" tab says "No Data found."
When I press the "Run" button when the mapped task is selected, the following error appears:
The previous task has run, however. No errors appeared in my Airflow logs.
When I try to run the task with Ignore All Deps enabled, I get the error:
This last bit is a contradiction: the task cannot be mapped and not mapped simultaneously.
If the number of mapped tasks is 0 while in this erroneous state, the mapped tasks will not be marked as skipped as expected.
What you think should happen instead
The mapped tasks should not get stuck with "no status".
The mapped tasks should be created and run successfully, or, if the upstream task outputs a 0-length list, they should be skipped.
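As a minimal sketch of that expectation, in plain Python rather than Airflow code: `expected_instances` is a hypothetical helper, and the state labels are illustrative only. It assumes that an unexpanded mapped task is represented by a single instance with map_index -1, as described above.

```python
from typing import List, Sequence, Tuple


def expected_instances(upstream_output: Sequence[str]) -> List[Tuple[int, str]]:
    """Return the (map_index, state) pairs expected once expansion has happened.

    - Non-empty upstream output: one instance per item, indexed from 0.
    - Empty upstream output: the single unexpanded instance (map_index -1)
      is marked as skipped instead of being left with no status.
    """
    if len(upstream_output) == 0:
        return [(-1, "skipped")]
    # "ready" is an illustrative label, not an Airflow state name.
    return [(index, "ready") for index in range(len(upstream_output))]
```

The faulty state described in this issue corresponds to neither branch: several files are found, yet only the map_index -1 instance exists and it has no status.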
How to reproduce
Run the DAG below. If it runs successfully, clear several tasks out of order. This may not reproduce the bug immediately, but after some task clearing it always ends up, for me, in the faulty state described above.
Operating System
Amazon Linux v2
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct