Airflow scheduling only 1 dag at a time leaving other dags in a queued state #18051
Comments
I am also observing the same issue. I can add that it is not limited to We have Airflow 2.1.3 deployed on k8s using the unofficial helm chart at https://github.com/airflow-helm/charts. We mostly have default config settings. Changes related to the scheduler are these:
For now, Increase @r-richmond, #17945 will resolve it but would introduce another bug that execution_order won't be followed. We are working on a better fix: #18061
Ty @ephraimbuddy for the update. I was about to proceed with cherry-picking #17945, and your comment saved me some extra experimentation work.
Going to close this out since I assume #18061 tackled it and I'm on 2.1.4 now. If this issue presents itself again I'll re-open. Thanks again @ephraimbuddy. Note: the only thing I still find strange is the error logs from the scheduler that I originally had, in particular the error
However, that hasn't come up again, so... fingers crossed it won't happen again?
@pcolladosoto after reading the linked issue I think you may be right. Thank you so much for your write-up and fix. It definitely looks like you found the root cause for the OSError I saw in the logs. Now I/we wait for 2.3 :)
Apache Airflow version
2.1.3 (latest released)
Operating System
Linux
Versions of Apache Airflow Providers
I don't think it is relevant but can provide them upon request.
Deployment
Other Docker-based deployment
Deployment details
No response
What happened
Completed an update to airflow 2.1.3 on Thursday and received an error report over the weekend that jobs were not being run. Upon investigating I discovered that only 1 DAG was running with everything else stuck in the queued state.
The one running DAG was a long-running backfill (30 days, ETA 2 weeks left, ~180 DAG runs) that has max_active_runs=1.
The logs on the Airflow worker were normal; however, the logs on the scheduler were displaying the following error:
raise OSError("handle is closed")
(See below for complete logs.) Anyway, restarting the scheduler and workers did nothing. However, upon turning off this long-running DAG (another one just like it started after turning that one off, too), all of the DAGs began scheduling normally.
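To make the symptom easier to reason about, here is a toy model of how a single DAG with a deep backlog and max_active_runs=1 could starve every other DAG. This is not Airflow's scheduler code and the mechanism is an assumption, not something confirmed in this report: it supposes the scheduler examines only a fixed-size batch of queued runs per pass, oldest first. The function name `schedule_once`, the batch size of 20, the fallback limit of 16, and the DAG ids are all made up for illustration.

```python
from collections import Counter


def schedule_once(queued, running, max_active_runs, batch_size):
    """Run one pass of a toy scheduler (assumed behaviour, not Airflow's).

    queued: list of (dag_id, run_number), in the order runs are examined.
    running: Counter mapping dag_id -> number of currently running runs.
    max_active_runs: dict of per-DAG limits; DAGs not listed get 16 here.
    Only the first `batch_size` queued runs are looked at in a pass.
    Returns the list of runs that were started.
    """
    started = []
    for run in queued[:batch_size]:
        dag_id = run[0]
        if running[dag_id] < max_active_runs.get(dag_id, 16):
            running[dag_id] += 1
            started.append(run)
    for run in started:
        queued.remove(run)
    return started


# ~180 queued backfill runs sit ahead of two other DAGs in the queue.
queue = [("backfill", i) for i in range(180)]
queue += [("daily_report", 0), ("hourly_sync", 0)]
limits = {"backfill": 1}

# One backfill run is already running, so its limit of 1 is reached.
blocked = schedule_once(list(queue), Counter({"backfill": 1}), limits, 20)
print("started with backfill enabled:", blocked)

# "Turning off" the backfill DAG: its runs are no longer considered.
without_backfill = [run for run in queue if run[0] != "backfill"]
unblocked = schedule_once(without_backfill, Counter(), limits, 20)
print("started with backfill paused:", unblocked)
```

In this model the whole batch is filled with backfill runs that cannot start, so the other DAGs are never examined until the backfill DAG is taken out of the queue, which matches what I saw. Whether the real scheduler behaves this way is for the maintainers to confirm.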
What you expected to happen
Airflow should be able to continue scheduling dags normally regardless of the existence of this dag that is slowly catching up.
Note this may be related to one of the following:
#17975
#13542
#17945
EDIT: Upon reading #17945 again it seems like this will resolve our issue. I'll mark this as closed once I can verify that change fixes the issue (i.e. a new version is released)
How to reproduce
Steps to attempt to reproduce (Apologies for not going this far).
Anything else
Airflow Scheduler Logs
Logs from after the issue started and after restarting the scheduler.
Are you willing to submit PR?
Code of Conduct