
Add exponential backoff and retry functionality to rest_client #904

Closed
wants to merge 14 commits into from

Conversation

jp-harvey
Contributor

While working with the Atlassian Cloud API and attempting to optimize the time it takes to do many operations, it's common to hit the Atlassian API request limits, resulting in annoying negative engineering.

This PR adds a convenience option to the underlying REST client, turned off by default, to retry using exponential backoff.
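
For anyone skimming the thread, here is a minimal sketch of how the option is intended to be used. The backoff parameter names (backoff_and_retry, max_backoff_retries, max_backoff_seconds) come from this PR's diff; the client class, credentials and the default values shown are only illustrative:

    from atlassian import Jira

    jira = Jira(
        url="https://your-instance.atlassian.net",
        username="user@example.com",
        password="api-token",
        cloud=True,
        backoff_and_retry=True,     # off by default; enables exponential backoff on rate-limit errors
        max_backoff_retries=10,     # illustrative value: give up after this many retries
        max_backoff_seconds=60,     # illustrative value: cap on the sleep between retries
    )

    # Requests made through this client now back off and retry on 429 responses.
    me = jira.get("rest/api/2/myself")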

@jp-harvey
Contributor Author

@gonchik I haven't tested this quite enough yet for it to lose the [WIP] tag; I'll update after I've done so.

@codecov-commenter

Codecov Report

Merging #904 (c25ea52) into master (4d90a20) will increase coverage by 0.03%.
The diff coverage is 47.82%.


@@            Coverage Diff             @@
##           master     #904      +/-   ##
==========================================
+ Coverage   36.19%   36.22%   +0.03%     
==========================================
  Files          32       32              
  Lines        6335     6357      +22     
  Branches      978      983       +5     
==========================================
+ Hits         2293     2303      +10     
- Misses       3934     3945      +11     
- Partials      108      109       +1     
Impacted Files Coverage Δ
atlassian/rest_client.py 65.68% <47.82%> (-3.03%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@gonchik
Member

gonchik commented Dec 25, 2021

@jp-harvey I checked a few options; this is a great step for the rest client module :)
Thank you for your initiative, I hope it will be done soon.

@flichtenheld
Contributor

Another change I made was to add these new options to BitbucketBase._new_session_args. At least my assumption was that if you set backoff_and_retry=True when creating the Cloud object you would expect all child objects to inherit the setting.
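
For illustration, a toy sketch of that idea (the real change is in the commit linked later in this thread; this is not that code, and the only attribute names taken from this PR are the backoff options themselves):

    class BitbucketBaseSketch:
        """Stand-in for BitbucketBase, showing only the inheritance of the backoff options."""

        def _new_session_args(self, **kwargs):
            # Pass the backoff settings down so every child object created from this one
            # (e.g. workspaces, repositories) retries the same way as the parent Cloud object.
            kwargs.update(
                backoff_and_retry=self.backoff_and_retry,
                max_backoff_retries=self.max_backoff_retries,
                max_backoff_seconds=self.max_backoff_seconds,
                retry_error_matches=self.retry_error_matches,
            )
            return kwargs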

@flichtenheld
Contributor

The actual functionality looks good to me. I have a Jenkins job that runs into the rate limiting, and this patch together with my modifications seems to solve the problem.

Co-authored-by: Frank Lichtenheld <frank@lichtenheld.com>
@jp-harvey
Contributor Author

Great suggestion @flichtenheld. I've been working with Jira and not with Confluence, so thanks for adding that in; glad it helped you out :-)

@jp-harvey
Contributor Author

> Another change I made was to add these new options to BitbucketBase._new_session_args. At least my assumption was that if you set backoff_and_retry=True when creating the Cloud object you would expect all child objects to inherit the setting.

@flichtenheld do you want to add those as a review to this PR as well?

@flichtenheld
Contributor

> Another change I made was to add these new options to BitbucketBase._new_session_args. At least my assumption was that if you set backoff_and_retry=True when creating the Cloud object you would expect all child objects to inherit the setting.

> @flichtenheld do you want to add those as a review to this PR as well?

Hmm, not sure what would be the best way to do that given that they are in a completely different file that is not touched by the PR. But you can see them here: flichtenheld@f94cef6

atlassian/rest_client.py
files=files,
proxies=self.proxies,
)
responseloop = False
Contributor

Always setting it to False here and then to True in line 295 is confusing. Remove lines 284 and 295.

Contributor Author

@Spacetown I think that would cause an infinite loop if self.backoff_and_retry is set to False, which is the default.

If it's confusing, is there another way we could implement it?

Now that you mention it, I'm also looking at line 288, and I think that could probably be removed, although I put it in there to future-proof any other logic changes relating to responseloop that might change the default (i.e. making the logic explicit), so I would probably vote to leave it in.

Contributor

Then initialize it with True and set it to False on success or on reaching the max backoff retries.
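
Roughly, the structure being suggested (a sketch only, not the PR's code; _matches_retry_error is a hypothetical stand-in for the status/reason comparison against retry_error_matches):

    responseloop = True
    while responseloop:
        response = self._session.request(...)  # unchanged request call
        if not (self.backoff_and_retry and _matches_retry_error(response)):
            responseloop = False  # success, or an error we don't retry on
        elif retries > self.max_backoff_retries:
            log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
            responseloop = False  # retry budget spent, give up
        else:
            log.warning("Backing off for {0}s".format(backoff))
            time.sleep(backoff)
            backoff = min(backoff * 2, self.max_backoff_seconds)
            retries += 1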

@mykytoleg

Hi everyone, what are the requirements for this pull request to be merged? Do you have any ETA?

@Spacetown
Contributor

@mykytoleg please check the review remarks.

@jp-harvey jp-harvey changed the title [WIP] Add exponential backoff and retry functionality to rest_client Add exponential backoff and retry functionality to rest_client Aug 1, 2022
@jp-harvey
Contributor Author

@gonchik there has been no additional feedback from reviewers for some time following the changes and responses, so I think this one is probably good to go?

Contributor

@Spacetown left a comment

Oops, forgot to submit the review.

files=files,
proxies=self.proxies,
)
responseloop = False
Contributor

Always setting it to False here and then to True in line 295 is confusing. Remove lines 284 and 295.

Member

@jp-harvey do you have time to update this PR?

atlassian/rest_client.py
files=files,
proxies=self.proxies,
)
responseloop = False
Contributor

Then initialize it with True and set it to False on success or on reaching the max backoff retries.

for em in self.retry_error_matches:
if retries > self.max_backoff_retries:
log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
responseloop = False
Contributor

This line isn't needed because you use break to exit the loop.

responseloop = False
if self.backoff_and_retry:
for em in self.retry_error_matches:
if retries > self.max_backoff_retries:
Contributor

This block should be moved in front of the loop.

@gonchik
Member

gonchik commented Aug 23, 2022

@jp-harvey @flichtenheld can we update the PR and merge it?

@jp-harvey
Contributor Author

jp-harvey commented Aug 24, 2022

> @jp-harvey @flichtenheld can we update the PR and merge it?

@gonchik I've made some of the changes, the others I am not comfortable making without doing testing and I don't have a good way to test at the moment. Functionally it works as it is and has been tested many times in real-life scenarios, it's just the code structure that is the concern now.

EDIT: looks like there's a linting issue as well which should be easy enough to sort out.

@Spacetown
Contributor

Spacetown commented Sep 2, 2022

> @jp-harvey @flichtenheld can we update the PR and merge it?

> @gonchik I've made some of the changes, the others I am not comfortable making without doing testing and I don't have a good way to test at the moment. Functionally it works as it is and has been tested many times in real-life scenarios, it's just the code structure that is the concern now.

> EDIT: looks like there's a linting issue as well which should be easy enough to sort out.

It works, but it's weird and not easy to understand. If the max backoff is reached but the response was a success, a warning is logged.

        backoff = 1
        retries = 0
        responseloop = True
        while responseloop:
            response = self._session.request(
                method=method,
                url=url,
                headers=headers,
                data=data,
                json=json,
                timeout=self.timeout,
                verify=self.verify_ssl,
                files=files,
                proxies=self.proxies,
            )
            if self.backoff_and_retry:
                for em in self.retry_error_matches:
                    if response.status_code == em[0] and response.reason == em[1]:
                        if retries > self.max_backoff_retries:
                            log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
                            responseloop = False
                        else:
                            log.warning('Backing off due to error "{0}: {1}" for {2}s'.format(em[0], em[1], backoff))
                            time.sleep(backoff)
                            backoff = backoff * 2 if backoff * 2 < self.max_backoff_seconds else self.max_backoff_seconds
                            retries += 1
                        break  # for loop
            else:
                responseloop = False

@Spacetown
Contributor

Spacetown commented Sep 2, 2022

The for loop and the if can also be combined using any() together with comparing tuples.

            if self.backoff_and_retry:
                if any([(response.status_code, response.reason) == em for em in self.retry_error_matches]):
                    if retries > self.max_backoff_retries:
                        log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
                        responseloop = False
                    else:
                        # random.random() adds jitter (requires "import random" at module level)
                        current_backoff = backoff + (random.random() * backoff / 10)
                        log.warning('Backing off due to error "{0}: {1}" for {2}s'.format(response.status_code, response.reason, current_backoff))
                        time.sleep(current_backoff)
                        backoff = min(backoff * 2, self.max_backoff_seconds)
                        retries += 1
            else:
                responseloop = False

Contributor

@Spacetown left a comment

I would use only break to finish the loop. At the moment this is mixed: in lines 310-311 the flag requests a new iteration, but a break is used, which doesn't make sense.
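
i.e. something along these lines (a break-only sketch, not the PR's code; _matches_retry_error is again a hypothetical stand-in for the status/reason comparison):

    while True:
        response = self._session.request(...)  # unchanged request call
        if not (self.backoff_and_retry and _matches_retry_error(response)):
            break  # success, or an error we don't retry on
        if retries > self.max_backoff_retries:
            log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
            break  # retry budget spent, give up
        time.sleep(backoff)
        backoff = min(backoff * 2, self.max_backoff_seconds)
        retries += 1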

@@ -60,7 +60,46 @@ def __init__(
cloud=False,
proxies=None,
token=None,
backoff_and_retry=False,
retry_error_matches=[(429, "Too Many Requests"),
(429, "Unknown Status Code")],
Contributor

Fix indentation.

for em in self.retry_error_matches:
if retries > self.max_backoff_retries:
log.warning("Hit max backoff retry limit of {0}, no more retries.".format(self.max_backoff_retries))
responseloop = False
Contributor

This line isn't needed since it's set to false directly after the response.

gonchik and others added 3 commits July 29, 2023 00:39
Co-authored-by: Michael Förderer <michael.foerderer@gmx.de>
Co-authored-by: Michael Förderer <michael.foerderer@gmx.de>
Co-authored-by: Michael Förderer <michael.foerderer@gmx.de>
@Spacetown
Contributor

Ok, now I see that there are two nested loops...

Contributor

@Spacetown left a comment

The suggested code only sets responseloop to False if the loop shall be exited.

atlassian/rest_client.py
files=files,
proxies=self.proxies,
)
responseloop = False
Member

@jp-harvey do you have time to update this PR?

@gonchik gonchik assigned gonchik and jp-harvey and unassigned gonchik Aug 18, 2023
@gonchik
Member

gonchik commented Aug 18, 2023

@jp-harvey can we finalize it?

@jp-harvey
Contributor Author

> @jp-harvey can we finalize it?

Hi @gonchik I'm at a point where it's no longer 100% clear to me what the issues are or what to change. I also don't have a good way to test at the moment, so making changes and testing is going to be high-friction.

I've also since discovered that the requests library has backoff and retry capability built into sessions, so that's probably a better option all round.

I think we have the following options:

  1. Abandon this PR
  2. Merge it (mostly?) as-is
  3. Someone other than me modifies the code, tests, and once approved it can be merged
  4. I modify the existing code, test, and once approved it gets merged
  5. I resubmit the PR using the retry capability of sessions

Unfortunately I won't have the bandwidth to do it for a while and don't have an easy way to test it at the moment, so 4 and 5 won't be able to happen soon.

Sorry about this. I know the code works as it is now despite not being perfect, and I respect that it's considered not suitable for merging. By the time the PR was reviewed we had passed the window in which I was able to keep working on it, and I don't want to make blind changes now that may break it. I may be able to in the future, but I would probably do option 5 before 4 because it looks like the more elegant option. I'm also fine with options 1-3 if that's the best thing for this project. Let me know what your preference is.

Comment on lines +64 to +65
retry_error_matches=[(429, "Too Many Requests"),
(429, "Unknown Status Code")],


503 also deserves a retry IMO, since Jira Cloud does throw it from time to time.

Posting it here in case someone wants to update this PR or create a new one using the requests.Session retry mechanism.

Contributor

I included 503 in my new PR #1339 since that is in the default list of urllib3.util.Retry
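
For reference, a minimal sketch of the session-level retry mechanism being discussed; this is the standard requests/urllib3 API rather than anything from this PR, and the particular numbers and status codes are only illustrative:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util import Retry

    retry = Retry(
        total=10,                          # overall retry budget
        backoff_factor=1,                  # exponential sleep between attempts
        status_forcelist=[429, 503],       # rate limiting plus the 503s mentioned above
        allowed_methods=None,              # None = retry on every HTTP method, including POST
        respect_retry_after_header=True,   # honour Retry-After if the server sends it
    )

    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.mount("http://", HTTPAdapter(max_retries=retry))

Mounted like this, every request made through the session is retried transparently, which is essentially the approach taken in #1339.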

@flichtenheld
Contributor

I will take a stab at resurrecting this mechanism since I'm currently working on related code anyway. I have been using this code for years at this point and it works fine in production (the original code, not the current code in this PR, which I'm pretty sure has been patched wrongly), but I will try to change the mechanism to use urllib3.util.Retry as suggested, to reduce unnecessary code.

@flichtenheld
Contributor

Since #1339 was merged, I think this PR should be closed.

@jp-harvey
Contributor Author

@flichtenheld thanks for taking this one up and your great work on #1339!

@jp-harvey jp-harvey closed this Feb 27, 2024