Skip to content

Adding 1 second sleep time between retries for publishing tasks#35791

Closed
sathish9094 wants to merge 2 commits intoapache:mainfrom
sathish9094:master
Closed

Adding 1 second sleep time between retries for publishing tasks#35791
sathish9094 wants to merge 2 commits intoapache:mainfrom
sathish9094:master

Conversation

@sathish9094
Copy link

This retry loop is pretty fast and it doesn't wait for number of seconds value that we pass from airflow.cfg

# The Maximum number of retries for publishing task messages to the broker when failing
# due to ``AirflowTaskTimeout`` error before giving up and marking Task as failed.
task_publish_max_retries = 3

By making it to wait for a second before each retry will give some time for broker recovery before making the task as failed.

@boring-cyborg
Copy link

boring-cyborg bot commented Nov 22, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@sathish9094
Copy link
Author

@hussein-awala Can you please help to review/approve this PR?

@potiuk
Copy link
Member

potiuk commented Dec 1, 2023

I don't think hardcoding it to 1 second is a good idea. it sounds pretty magical. Also it has potentially undesireable side effects - there might be cases where the retry is actually good if it happens quickly.

If we want to do add anything like that, I'd say we should have a configuration option telling whether to delay retries or not and how long. Also it is generally a bad idea to have a fixed hard-coded value - ideally there should be a way to get at least a bit of exponential back-off and make several retries with increasing delay period - and we should be able to configure both initlal delay, number of retries and exponential back-off factor.

We can use tenacity for that - we already use it for similar purposes.

I think if this is done this way - with configurable retries, exponential backoff (and unit tests as well) - then we could merge it I think.

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @potiuk; the best solution is moving the retried block to a new method and applying a @tenacity.retry, and similar to task_publish_max_retries, we should add a configuration for wait time and strategy (fixed or exponential).

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 31, 2024
@github-actions github-actions bot closed this Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:celery stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants