Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance degradation when updating dagrun state #8435

Merged
merged 1 commit into from
Apr 18, 2020

Conversation

danfrankj
Copy link
Contributor

@danfrankj danfrankj commented Apr 18, 2020

The set comprehension {t.task_id for t in dag.leaves} in the conditional is re-evaluated every for each loop in the outer list comprehension resulting in an O(n^2) computation and significant slowdowns on large dags.

Introduced here: 8f6ca53#diff-32aa8dbb910719ef24a39cab5d0f2a97R302

Diagnosed and discovered by @cmlad


Make sure to mark the boxes below before creating PR: [x]

  • [x ] Description above provides context of the change
  • [not sure - please advise ] Unit tests coverage for changes (not needed for documentation changes)
  • [x ] Commits follow "How to write a good git commit message"
  • [not sure - please advise ] Relevant documentation is updated including usage instructions.
  • [x ] I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@kaxil kaxil requested a review from BasPH April 18, 2020 18:04
Copy link
Contributor

@BasPH BasPH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, good catch

@BasPH
Copy link
Contributor

BasPH commented Apr 18, 2020

@danfrankj out of interest: how big is your DAG?

@danfrankj
Copy link
Contributor Author

@BasPH we're at ~5000 tasks so when we upgraded recently, it ground to a halt.

Also, pretty sure the CI failure here is unrelated?

@danfrankj
Copy link
Contributor Author

Whoops, good catch

Catch credit goes to @cmlad :)

@kaxil kaxil merged commit dd9f04e into apache:master Apr 18, 2020
@kaxil kaxil added this to the Airflow 1.10.11 milestone Apr 18, 2020
kaxil pushed a commit that referenced this pull request Jun 18, 2020
Co-authored-by: Dan Frank <dan.frank@coinbase.com>
(cherry picked from commit dd9f04e)
potiuk pushed a commit that referenced this pull request Jun 29, 2020
Co-authored-by: Dan Frank <dan.frank@coinbase.com>
(cherry picked from commit dd9f04e)
@kaxil kaxil added the type:improvement Changelog: Improvements label Jul 1, 2020
kaxil pushed a commit that referenced this pull request Jul 1, 2020
Co-authored-by: Dan Frank <dan.frank@coinbase.com>
(cherry picked from commit dd9f04e)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
Co-authored-by: Dan Frank <dan.frank@coinbase.com>
(cherry picked from commit dd9f04e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants