Fix retry_exponential_backoff divide by zero error when retry delay is zero #17003

Merged

@edwardwang888 (Contributor) commented Jul 14, 2021

When retry_delay is zero, the retry_exponential_backoff algorithm has a divide-by-zero error in the modded_hash calculation, which causes the scheduler to crash.

d1dceff attempted to address the divide-by-zero error by adding a ceiling function, but it doesn't account for this particular edge case. Using a lower bound instead would be a better implementation.

Fixes #17005.
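
The failure mode described above can be sketched as follows. This is a minimal illustration, not the Airflow source: the function names are hypothetical, and the backoff expression is an assumption reconstructed from the names used in this thread (`min_backoff`, `modded_hash`, `ti_hash`). It shows why a ceiling alone cannot help when the delay is exactly zero, and how a lower bound of 1 avoids the modulo by zero.

```python
import math
from datetime import timedelta


def min_backoff_ceiling_only(retry_delay: timedelta, try_number: int) -> int:
    """Ceiling-only variant (assumed formula): ceil(0) is still 0."""
    return int(math.ceil(retry_delay.total_seconds() * (2 ** (try_number - 2))))


def min_backoff_with_lower_bound(retry_delay: timedelta, try_number: int) -> int:
    """Same calculation, clamped to at least 1 so the modulo below is safe."""
    return max(1, min_backoff_ceiling_only(retry_delay, try_number))


def modded_hash(min_backoff: int, ti_hash: int) -> int:
    # Raises ZeroDivisionError when min_backoff is 0.
    return min_backoff + ti_hash % min_backoff


zero_delay = timedelta(seconds=0)

# A ceiling rounds 0.5 up to 1, but it leaves an exact 0 untouched.
unsafe = min_backoff_ceiling_only(zero_delay, try_number=1)
safe = min_backoff_with_lower_bound(zero_delay, try_number=1)

try:
    modded_hash(unsafe, ti_hash=12345)
    crashed = False
except ZeroDivisionError:
    crashed = True

print(unsafe, safe, crashed)
```

With the lower bound in place, a zero `retry_delay` yields a minimum backoff of 1 second rather than an exception.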



boring-cyborg bot commented Jul 14, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow the ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts, contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@edwardwang888 force-pushed the fix_retry_exponential_backoff_divide_by_zero branch from 15dbe99 to 48bd9dd on September 28, 2021 23:19
@edwardwang888 changed the title from "WIP Fix retry_exponential_backoff divide by zero error when retry delay is zero" to "Fix retry_exponential_backoff divide by zero error when retry delay is zero" on Sep 28, 2021
@edwardwang888 marked this pull request as ready for review on September 29, 2021 02:19
@edwardwang888 force-pushed the fix_retry_exponential_backoff_divide_by_zero branch 3 times, most recently from fef1c54 to cec61c0, on September 29, 2021 17:48
This test was incorrect before. When `delay.total_seconds()` equals 1 and `self.try_number` equals 1, `min_backoff` equals 1. Any integer modulo 1 is 0, so `modded_hash` always equals 1, regardless of the value of `ti_hash`. The expected delay is therefore exactly 1 second, not a period of 1 to 15 seconds.
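
A quick way to check this arithmetic is the sketch below. The key format and helper names are illustrative assumptions rather than the Airflow implementation; only the `min_backoff + ti_hash % min_backoff` shape is taken from the names in this thread.

```python
import hashlib


def ti_hash(key: str) -> int:
    """Deterministic integer hash of a task-instance key (key format is illustrative)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)


def modded_hash(min_backoff: int, key: str) -> int:
    # Assumed shape of the calculation: base delay plus a hash-derived offset.
    return min_backoff + ti_hash(key) % min_backoff


keys = [f"example_dag#example_task#{n}" for n in range(200)]

# With min_backoff == 1 the offset is always 0, so every key gives 1.
delays_at_one = {modded_hash(1, k) for k in keys}

# With a larger min_backoff the result varies within [min_backoff, 2 * min_backoff - 1].
delays_at_eight = {modded_hash(8, k) for k in keys}

print(delays_at_one)  # {1}
```

Only when `min_backoff` is greater than 1 does the hash spread the delay over a range, which is why a test asserting a range for this input could not have been exercising the jitter.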

Code referenced: airflow/airflow/models/taskinstance.py, lines 1064 to 1087.
The test can be reused as the end result is the same.
@edwardwang888 force-pushed the fix_retry_exponential_backoff_divide_by_zero branch from cec61c0 to b5acb5b on September 29, 2021 18:02
@github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge) on Sep 29, 2021

@github-actions bot commented

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Successfully merging this pull request may close these issues.

retry_exponential_backoff algorithm does not account for case when retry_delay is zero