Fix float overflow for retry_exponential_backoff#47967
Closed
alealandreev wants to merge 161 commits intoapache:mainfrom
alealandreev:alealandreev-chngs
(cherry picked from commit cb80bda)
…ty (#41382) When using older FAB providers on the new airflow, this function is called in the old provider and is no longer available in the new airflow. This PR brings this back to fix issue in main and v2-10-test branch where all DAGs fail because of lack of this function (cherry picked from commit 0576f55)
* Attempt to fix TriggerDagRunOperator for Database Isolation Tests * Finalize making tests run for triggerdagrunoperator in db isolation mode * Adjust query count assert for adjustments to serialization * Review feedback (cherry picked from commit 6b810b8)
(cherry picked from commit 60cbea5)
(cherry picked from commit b4a92f8)
(cherry picked from commit 54c165c)
(cherry picked from commit 68a6a05)
) * Pass serialized parameter for dag_maker * Serialisation of object is on __exit__ moving out the dag definition out of dag_maker context (cherry picked from commit 278f3c4)
…iases are resolved into new datasets (#41398) * fix(datasets/manager): fix DagPriorityParsingRequest unique constraint error when dataset aliases are resolved into new datasets this happens when dynamic task mapping is used * refactor(dataset/manager): reword debug log Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com> * refactor(dataset/manager): remove unnecessary logging Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com> --------- Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com> (cherry picked from commit bf64cb6)
) The PROD image building fails currently in non-main because it attempts to build source provider packages rather than use them from PyPi when PR is run against "v-test" branch. This PR fixes it: * PROD images in non-main-targetted build will pull providers from PyPI rather than build them * they use PyPI constraints to install the providers * they use UV - which should speed up building of the images (cherry picked from commit 4d5f1c4) (cherry picked from commit bf0d412)
…41610) * Enable pull requests to be run from v*test branches (#41474) (#41476) Since we switch from direct push of cherry-picking to open PRs against v*test branch, we should enable PRs to run for the target branch. (cherry picked from commit a9363e6) * Prevent provider lowest-dependency tests to run in non-main branch (#41478) (#41481) When running tests in v2-10-test branch, lowest depenency tests are run for providers - because when calculating separate tests, the "skip_provider_tests" has not been used to filter them out. This PR fixes it. (cherry picked from commit 75da507) * Make PROD image building works in non-main PRs (#41480) (#41484) The PROD image building fails currently in non-main because it attempts to build source provider packages rather than use them from PyPi when PR is run against "v-test" branch. This PR fixes it: * PROD images in non-main-targetted build will pull providers from PyPI rather than build them * they use PyPI constraints to install the providers * they use UV - which should speed up building of the images (cherry picked from commit 4d5f1c4) * Add WebEncoder for trigger page rendering to avoid render failure (#41350) (#41485) Co-authored-by: M. Olcay Tercanlı <muhammed_tercanli@epam.com> * Incorrect try number subtraction producing invalid span id for OTEL airflow (issue #41501) (#41502) (#41535) * Fix for issue #39336 * removed unnecessary import (cherry picked from commit dd3c3a7) Co-authored-by: Howard Yoo <32691630+howardyoo@users.noreply.github.com> * Fix failing pydantic v1 tests (#41534) (#41541) We need to exclude some versions of Pydantic v1 because it conflicts with aws provider. (cherry picked from commit a033c5f) * Fix Non-DB test calculation for main builds (#41499) (#41543) Pytest has a weird behaviour that it will not collect tests from parent folder when subfolder of it is specified after the parent folder. This caused some non-db tests from providers folder have been skipped during main build. 
The issue in Pytest 8.2 (used to work before) is tracked at pytest-dev/pytest#12605 (cherry picked from commit d489826) * Add changelog for airflow python client 2.10.0 (#41583) (#41584) * Add changelog for airflow python client 2.10.0 * Update client version (cherry picked from commit 317a28e) * Make all test pass in Database Isolation mode (#41567) This adds dedicated "DatabaseIsolation" test to airflow v2-10-test branch.. The DatabaseIsolation test will run all "db-tests" with enabled DB isolation mode and running `internal-api` component - groups of tests marked with "skip-if-database-isolation" will be skipped. * Upgrade build and chart dependencies (#41570) (#41588) (cherry picked from commit c88192c) Co-authored-by: Jarek Potiuk <jarek@potiuk.com> * Limit watchtower as depenendcy as 3.3.0 breaks moin. (#41612) (cherry picked from commit 1b602d5) * Enable running Pull Requests against v2-10-stable branch (#41624) (cherry picked from commit e306e7f) * Fix tests/models/test_variable.py for database isolation mode (#41414) * Fix tests/models/test_variable.py for database isolation mode * Review feedback (cherry picked from commit 736ebfe) * Make latest botocore tests green (#41626) The latest botocore tests are conflicting with a few requirements and until apache-beam upcoming version is released we need to do some manual exclusions. Those exclusions should make latest botocore test green again. 
(cherry picked from commit a13ccbb) * Simpler task retrieval for taskinstance test (#41389) The test has been updated for DB isolation but the retrieval of task was not intuitive and it could lead to flaky tests possibly (cherry picked from commit f25adf1) * Skip database isolation case for task mapping taskinstance tests (#41471) Related: #41067 (cherry picked from commit 7718bd7) * Skipping tests for db isolation because similar tests were skipped (#41450) (cherry picked from commit e94b508) --------- Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Brent Bovenzi <brent@astronomer.io> Co-authored-by: M. Olcay Tercanlı <muhammed_tercanli@epam.com> Co-authored-by: Howard Yoo <32691630+howardyoo@users.noreply.github.com> Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com> Co-authored-by: Bugra Ozturk <bugraoz93@users.noreply.github.com>
* able to change the 'Changed Row' display message after edit * added message in connection form to warn of empty fields * attempt to warn the specific fields cannot be empty * revert change because need to check fields before save is clicked * issues warning for specific fields that can't be deleted after save * removed the individual warnings * changed status to concise string * added more concise suggestion --------- Co-authored-by: Lucy Hu <90779522+lh5844@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
#43567) It's possible that the start/end date are null when processing an executor event, and there is no point in adding an OTEL event in that case. Before this, we'd try and convert `None` to nanoseconds and blow up the scheduler. Note: I don't think `queued_dttm` can be empty, but figured it didn't hurt to guard against it just in case I've overlooked a way it can be possible. (cherry picked from commit fe41e15) (cherry picked from commit c83e524)
…d from an extended operator (#42849) (#43577) * refactor: Don't raise a warning when execute is called from an extended operator, as this should always be allowed. * refactored: Fixed import of test_utils in test_dag_run --------- Co-authored-by: David Blain <david.blain@infrabel.be> (cherry picked from commit 95c46ec) Co-authored-by: David Blain <info@dabla.be> (cherry picked from commit 2f29c57)
Member
pierrejeambrun
left a comment
Thank you for the PR, the branch looks broken (wrong rebase or wrong target branch).
Do you mind cleaning the branch to keep only the relevant change so we can proceed with the review?
potiuk
requested changes
Mar 21, 2025
Member
potiuk
left a comment
Please remove all unrelated changes before doing anything here. There are plenty of commits in this PR.
Issue: #47971
Fixed a float overflow in the exponential backoff calculation. I encountered the error with retry_delay = 5 minutes, max_retry_delay = 1 hour, retry_exponential_backoff = True and retries = 1000 in the DAG configuration. In this case the scheduler breaks down at around the 1000th retry because of a float overflow (the delay is recalculated on each retry), and after 1000 retries the DAG is still trying to start. The total number of retries I observed was 1017, which is more than the configured 1000. The cause is this formula at line 2657 of taskinstance.py: min_backoff = math.ceil(delay.total_seconds() * (2 ** (self.try_number - 1))).
We should limit the exponent to a reasonable value, for instance 30. Beyond that, we need to guard against any other possible exceptions. This fix repairs the exponential backoff logic so that the float overflow can no longer happen.
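To illustrate the idea, here is a minimal standalone sketch, not the actual patch. It reproduces only the quoted formula outside of Airflow and compares it with a variant whose exponent is capped. The names `MAX_EXPONENT`, `naive_backoff_seconds` and `capped_backoff_seconds` are hypothetical, and the cap of 30 is just the example value suggested above.

```python
import math
from datetime import timedelta
from typing import Optional

# Hypothetical cap on the exponent, following the "for instance 30" suggestion.
MAX_EXPONENT = 30


def naive_backoff_seconds(delay: timedelta, try_number: int) -> int:
    """The quoted formula as-is, with no limit on the exponent."""
    return math.ceil(delay.total_seconds() * (2 ** (try_number - 1)))


def capped_backoff_seconds(
    delay: timedelta,
    try_number: int,
    max_delay: Optional[timedelta] = None,
) -> int:
    """Same formula, but the exponent is clamped to [0, MAX_EXPONENT]."""
    exponent = min(max(try_number - 1, 0), MAX_EXPONENT)
    backoff = math.ceil(delay.total_seconds() * (2 ** exponent))
    # Optionally apply an upper bound, in the spirit of max_retry_delay.
    if max_delay is not None:
        backoff = min(backoff, math.ceil(max_delay.total_seconds()))
    return backoff


if __name__ == "__main__":
    delay = timedelta(minutes=5)
    try:
        naive_backoff_seconds(delay, 1017)
    except OverflowError as exc:
        print(f"naive formula failed: {exc}")
    print(capped_backoff_seconds(delay, 1017, timedelta(hours=1)))
```

With a 300-second delay, the uncapped product exceeds the largest finite double once the exponent reaches 1016, so `math.ceil` receives infinity and raises `OverflowError` at try number 1017, which is consistent with the retry count reported above. The capped variant stays finite for any try number.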