Make dag_run join as lazyload by default in TaskInstance #32619
haoch wants to merge 2 commits into apache:main
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
(Housekeeping notes) As the pull request template mentions, please put the description above the line, not below, to improve readability. I have modified the description for you.
Thanks @uranusjr
There are static code failures (see here for how to run those locally to catch and fix them before the PR) as well as many test failures because a new unexpected arg is being passed.
Sure, @o-nikolas, I will fix all the failures first.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
General problem:
joinedload on TaskInstance.dag_run is generally not needed except for the UI/API (which list all task instances and related fields at once), but the unnecessary JOIN is a killer of Airflow scheduling performance, especially when there are thousands of task instances in a single DagRun.
The real problem in some large-scale Airflow clusters:
We manage thousands of Airflow executors (nodes) in production, and each DagRun may have 10k task instances.
On the scheduler side, the impact is very obvious: after making the join lazy, we see a scheduling performance improvement of 3x or more.
On the executor side, when tasks are queued and preparing to execute, each one checks whether its dependencies are met against all finished task instances, and the large number of concurrent JOIN queries slows the database down. The effect is worst when the config content of a DagRun is large (1 MB): thousands of concurrent executors pull the same large DagRun at the same time, which causes huge network throughput from the database (~1 GB/second) and makes the centralized database, and with it the whole Airflow cluster, unhealthy.
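The data-volume argument above can be illustrated with a minimal sketch. It uses only the standard-library sqlite3 module and a hypothetical, heavily simplified two-table schema (the table and column names, CONF_SIZE, NUM_TASKS, and both helper functions are assumptions made for illustration; this is not Airflow's schema or the code changed in this PR). It compares how many bytes of DagRun conf the database returns when every task instance row is joined to its DagRun versus when the DagRun is fetched separately and only once.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (not Airflow's real tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, conf TEXT);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        dag_run_id INTEGER REFERENCES dag_run(id),
        state TEXT
    );
    """
)

CONF_SIZE = 10_000  # scaled-down stand-in for a large DagRun conf
NUM_TASKS = 1_000   # scaled-down stand-in for the task instances of one DagRun

conn.execute("INSERT INTO dag_run (id, conf) VALUES (1, ?)", ("x" * CONF_SIZE,))
conn.executemany(
    "INSERT INTO task_instance (dag_run_id, state) VALUES (1, ?)",
    [("success",)] * NUM_TASKS,
)


def eager_join_conf_bytes(conn):
    """Joined-load style: every task instance row also carries the DagRun conf."""
    rows = conn.execute(
        "SELECT ti.id, ti.state, dr.conf "
        "FROM task_instance ti JOIN dag_run dr ON dr.id = ti.dag_run_id"
    ).fetchall()
    return sum(len(conf) for _id, _state, conf in rows)


def lazy_conf_bytes(conn):
    """Lazy-load style: task instances first, each distinct DagRun at most once."""
    tis = conn.execute("SELECT id, state, dag_run_id FROM task_instance").fetchall()
    total = 0
    for run_id in {dag_run_id for _id, _state, dag_run_id in tis}:
        (conf,) = conn.execute(
            "SELECT conf FROM dag_run WHERE id = ?", (run_id,)
        ).fetchone()
        total += len(conf)
    return total


print("conf bytes returned with JOIN:", eager_join_conf_bytes(conn))
print("conf bytes returned lazily:   ", lazy_conf_bytes(conn))
```

In this sketch the joined query returns the conf once per task instance row, while the lazy variant returns it once per DagRun, and a caller that never touches the DagRun would skip the second query entirely. Real numbers depend on the database, the driver, and the ORM, so treat this as an illustration of the shape of the problem only.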