Make dag_run join as lazyload by default in TaskInstance#32619

Closed
haoch wants to merge 2 commits into apache:main from haoch:improve_scheduling_performance

Conversation

@haoch
Member

@haoch haoch commented Jul 14, 2023

General problem: the joinedload on TaskInstance.dag_run is generally not needed except in the UI/API (which list all task instances and related fields at once), but the unnecessary JOIN severely hurts Airflow scheduling performance, especially when there are thousands of task instances in a single DagRun.

The real problem in some large-scale Airflow clusters:
We manage thousands of Airflow executors (nodes) in production, and each DagRun may have 10k task instances.

  • On the scheduler side, the impact is very obvious: after making the relationship lazy, we see scheduling performance improve by 3x or more.

  • On the executor side, when tasks are queued and preparing to execute, each one checks whether its dependencies are met against all finished task instances, and the large number of concurrent JOIN queries slows down the database. Especially when the config content of a DagRun is large (1 MB), thousands of concurrent executors pull the same large DagRun at the same time, which causes huge network throughput (~1 GB/second) from the database and makes the centralized database and the whole Airflow cluster unhealthy.
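
To illustrate why the joined query is so much heavier, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema, the column names, and the sizes are simplified assumptions for illustration only; they are not Airflow's actual models and not the change made in this PR. The sketch compares how many bytes of DagRun config come back when every task instance row is joined to its DagRun versus when task instances are read alone and the DagRun is fetched once.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT Airflow's real tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, conf BLOB);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        dag_run_id INTEGER REFERENCES dag_run(id),
        state TEXT
    );
    """
)

CONF_SIZE = 10_000  # assumed stand-in for a large DagRun config, in bytes
NUM_TIS = 500       # assumed stand-in for thousands of task instances

conn.execute("INSERT INTO dag_run (id, conf) VALUES (1, ?)", (b"x" * CONF_SIZE,))
conn.executemany(
    "INSERT INTO task_instance (dag_run_id, state) VALUES (1, ?)",
    [("success",)] * NUM_TIS,
)


def payload_bytes(rows):
    """Sum the size of every bytes value (here: the conf column) in the rows."""
    return sum(len(v) for row in rows for v in row if isinstance(v, bytes))


# Joined style: every task instance row carries the DagRun columns with it.
joined_rows = conn.execute(
    "SELECT ti.id, ti.state, dr.conf "
    "FROM task_instance ti JOIN dag_run dr ON dr.id = ti.dag_run_id"
).fetchall()

# Lazy style: read task instances alone; fetch the DagRun once, only if needed.
ti_rows = conn.execute("SELECT id, state FROM task_instance").fetchall()
dag_run_rows = conn.execute("SELECT conf FROM dag_run WHERE id = 1").fetchall()

joined_bytes = payload_bytes(joined_rows)
lazy_bytes = payload_bytes(ti_rows) + payload_bytes(dag_run_rows)
print(joined_bytes, lazy_bytes)
```

In this toy setup the joined result repeats the config once per task instance, so the config payload grows with the number of task instances, while the lazy variant transfers it a single time.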


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Jul 14, 2023
@boring-cyborg

boring-cyborg bot commented Jul 14, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@uranusjr
Member

(Housekeeping notes) As the pull request template mentions, please put the description above the line, not below, to improve readability. I have modified the description for you.

@haoch
Member Author

haoch commented Jul 17, 2023

Thanks @uranusjr

@o-nikolas
Contributor

There are static code failures (see here for how to run those locally to catch and fix them before the PR) as well as many test failures because a new unexpected arg is being passed.

@haoch
Member Author

haoch commented Jul 18, 2023

Sure, @o-nikolas, I will fix all the failures first.

@github-actions

github-actions bot commented Sep 1, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 1, 2023
@github-actions github-actions bot closed this Sep 7, 2023