Tasks stuck in queued state when using Triggerer process along with max_active_tis_per_dag #34624
Comments
Thank you for reporting this issue. Indeed, most of the concurrency limits don't consider the deferred state. I will check all the problems you explained and see whether there is anything else to fix.
Hi @hussein-awala, here are the logs:
What is weird is that the deferrable tasks terminated successfully but remained in the queued state (the same happened before the patch, by the way).
Hi again @hussein-awala, regarding my previous comment: it seems this happened because of other problems, such as Kubernetes client timeouts, combined with the new deployment (with the patch applied) and tasks that were already running. Once these were resolved, everything has been stable, as mentioned earlier.
Apache Airflow version
2.7.1
What happened
When our company started leveraging the new triggerer process, one specific Airflow instance began to have tasks stuck in the queued state at random intervals, almost every day.
The only difference between this instance and the others is that it has a DAG with max_active_tis_per_dag=1 configured on a deferrable-enabled task, to avoid having multiple runs of the same task instance across all DAG runs.
Scheduler logs started being flooded with the following:
Not executing <TaskInstance: ... [scheduled]> since the task concurrency for this task has been reached.
What you think should happen instead
How to reproduce
A DAG with the following characteristics:
max_active_runs=4
schedule_interval
max_active_tis_per_dag=1
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
AKS
Anything else
After digging into the code, it seems that the following is happening:
concurrency_map gets filled with deferred tasks that have already completed, but these are treated as starved tasks, so the scheduler refuses to execute any more tasks even though the deferred tasks have finished: https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L530
To sum up, the fix would be to add the DEFERRED state to the EXECUTION_STATES structure, giving it behaviour similar to that of pools.
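To make the proposed change concrete, here is a minimal plain-Python model of a per-task concurrency check. It is not Airflow's scheduler code: State, build_concurrency_map, can_queue, and the DAG/task names are hypothetical, and it assumes that EXECUTION_STATES currently holds only RUNNING and QUEUED. It shows only what adding DEFERRED to the counted states changes for a task with max_active_tis_per_dag=1; it does not reproduce the stuck-in-queued symptom itself.

```python
from collections import Counter
from enum import Enum


class State(str, Enum):
    # Hypothetical subset of task-instance states, named after Airflow's state values.
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    DEFERRED = "deferred"


# Assumed current contents of EXECUTION_STATES, and the variant proposed in this issue.
EXECUTION_STATES_CURRENT = frozenset({State.RUNNING, State.QUEUED})
EXECUTION_STATES_PROPOSED = EXECUTION_STATES_CURRENT | {State.DEFERRED}


def build_concurrency_map(task_instances, states):
    """Count task instances per (dag_id, task_id) whose state is in `states` (hypothetical helper)."""
    return Counter(
        (dag_id, task_id) for dag_id, task_id, state in task_instances if state in states
    )


def can_queue(dag_id, task_id, max_active_tis_per_dag, concurrency_map):
    """True if one more instance of the task fits under the per-task limit (hypothetical helper)."""
    # Counter returns 0 for keys it has never seen.
    return concurrency_map[(dag_id, task_id)] < max_active_tis_per_dag


# One DAG run's instance is deferred; another run's instance of the same task is waiting.
tis = [
    ("my_dag", "wait_task", State.DEFERRED),
    ("my_dag", "wait_task", State.SCHEDULED),
]

print(can_queue("my_dag", "wait_task", 1, build_concurrency_map(tis, EXECUTION_STATES_CURRENT)))   # True
print(can_queue("my_dag", "wait_task", 1, build_concurrency_map(tis, EXECUTION_STATES_PROPOSED)))  # False
```

With the assumed current set, the deferred instance is invisible to the limit, so the second run's instance passes the check; with DEFERRED included, the deferred instance occupies the only slot.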
In addition, the pool logic should follow the same pattern and read from that EXECUTION_STATES structure instead of concatenating the states itself.
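The suggestion that pool accounting read from one shared set, rather than concatenating DEFERRED onto it locally, can be sketched in plain Python as below. The names (State, EXECUTION_STATES as defined here, occupied_slots, open_slots) and the (state, pool_slots) tuples are hypothetical illustrations, not Airflow's Pool implementation; the sketch assumes the issue's proposal that EXECUTION_STATES includes DEFERRED.

```python
from enum import Enum


class State(str, Enum):
    # Hypothetical subset of task-instance states.
    QUEUED = "queued"
    RUNNING = "running"
    DEFERRED = "deferred"
    SUCCESS = "success"


# Single shared set, assuming the proposal above: DEFERRED is part of EXECUTION_STATES.
EXECUTION_STATES = frozenset({State.RUNNING, State.QUEUED, State.DEFERRED})


def occupied_slots(task_instances, states=EXECUTION_STATES):
    """Sum the pool slots held by task instances whose state is in `states` (hypothetical helper)."""
    return sum(slots for state, slots in task_instances if state in states)


def open_slots(pool_size, task_instances):
    """Pool capacity left after subtracting occupied slots (hypothetical helper)."""
    return pool_size - occupied_slots(task_instances)


# (state, pool_slots) pairs for task instances assigned to one pool.
pool_tis = [(State.RUNNING, 1), (State.DEFERRED, 2), (State.SUCCESS, 1)]
print(open_slots(5, pool_tis))  # 2
```

Because both this pool check and a per-task concurrency check would read the same constant, a state added to it is counted consistently in both places instead of only in the one that happens to concatenate it.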
Are you willing to submit PR?
Code of Conduct