Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APIC-79] - reconfigure retry logic using tenacity #401

Merged

Conversation

byndcivilization
Copy link
Contributor

@byndcivilization byndcivilization commented Aug 26, 2020

This reimplements retry logic using tenacity. Requirements are:
[X] retry on is [429, 502, 503, and 504]
[X] retry on ['HEAD', 'TRACE', 'GET', 'PUT', 'OPTIONS', 'DELETE']
[X] Retry-After header is present, we use that value for the retry interval.
[X] Retry POSTs on 429s and 503s.
[X] default max total time 10 minutes
[X] default max tries = 10
[X] exponential backoff with full jitter

@byndcivilization
Copy link
Contributor Author

byndcivilization commented Aug 27, 2020

Okay @skomanduri it looks like you were right on the urllib part. Bifurcation of different force lists for different verbs seems painful. This is at 60-70ish% so i wanted to check in on approach before i started on the respect_retry_after_header part, some issues with retries on the cli and tests. theres also still some tearout work and I'm not sure how you want me to handle file buffering retries. It seems like the best course here is to leave what we have in place and add some jitter to the back off calculation. I had a working backoff algo on a different branch that I can port back in.

civis/base.py Outdated Show resolved Hide resolved
@byndcivilization
Copy link
Contributor Author

Okay @skomanduri how does this look. Tests still need work but otherwise i think this works. Theres some funniness with the interaction of retry conditions in that if we raise a connection error for status the codes will not be available in as part of the exception object. This retry after header class and only the status code check gets around this and makes the response headers available for inspection.

Copy link
Contributor

@skomanduri skomanduri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I had one question, but otherwise let me know when you are ready for final review.

civis/_utils.py Show resolved Hide resolved
@byndcivilization
Copy link
Contributor Author

Okay @skomanduri this should be ready for review

@mheilman
Copy link
Contributor

just a minor thing: could you please add a changelog entry?

Copy link
Contributor

@skomanduri skomanduri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more questions, but this seems almost ready to go!

README.rst Outdated Show resolved Hide resolved
civis/_utils.py Show resolved Hide resolved
# retry_state is an instance of tenacity.RetryCallState.
# The .outcome property contains the result/exception
# that came from the underlying function.
result_headers = retry_state.outcome._result.headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to account for outcome returning an exception here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not throwing exceptions so I don't think so unless I'm missing something? The retry logic is using status code alone to trigger a retry. The reason for this is the default requests exception handling doesnt seem to pass back any of the response information for requests to handle. At one point I had a raise_for_status call in there hoping that i could set the retry parameter to something like lambda res: res.status_code in civis.civis.RETRY_CODES & StatusException. That would never trigger though because res.status_code didnt exist in the retry object.

Also to take a step back this is sort of unrelated to the cause of the retry. All this class is doing is overriding the wait parameter with the header value if it exists. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skomanduri does that answer your question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't retry_state.outcome contain an exception if requests throws an exception? The situation I'm worried about is where that happens and then trying to call .headers on it would throw its own exception and we would lose the requests exception. I don't know if this happens, but want to either rule it out or handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skomanduri I'm unclear on what type of exception you are envisioning. I thought the only exceptions that requests would implicitly return are ConnectionError, Timeout, and TooManyRedirects. I don't think any of these would come into play with the retry logic as it currently exists. Since we're purely using the status_code returned on the response object, that wait time class will only be evaluated once the retry logic has been entered afaik.

Is there another set of exceptions that I'm missing here?

That said, currently if an exception is thrown currently the retry is not triggered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, currently if an exception is thrown currently the retry is not triggered.

Okay, that's the piece I was missing. I thought if there was a ConnectionError it would still evaluate here, but if not then this would be fine. Can you add a test for no retry on ConnectionError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay @skomanduri test is last on the test_utils.py file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I had in mind for the test. WFM!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skomanduri can i get the final approval so i can merge this?

civis/resources/_resources.py Outdated Show resolved Hide resolved
@byndcivilization
Copy link
Contributor Author

@mheilman yeah for sure. this is probably a good time to ask about release process. do we just bump the version here and release or is there a cadence that we follow.

byndcivilization and others added 3 commits September 10, 2020 17:19
…city

* master:
  [APIC-314] Refactored File Cleaning Logic (#405)
  [APIC-307] fix a workflows usage example in docs/source/client.rst (#409)
  add some missing PR numbers to the changelog (#408)
  Update the API spec at `civis/tests/civis_api_spec.json` so that new endpoints are included (e.g., `/exports/files/csv`) (#407)
  Updated list of base API resources to include `aliases`, `git_repos`, `json_values`, `services`, and `storage_hosts` so that they show up in the sphinx docs (#406)
  [APIC-284] remove python 3.5 support since it is reaching EOL (#404)
  [APIC-305] Require sql_type for every column when table_columns is provided (CHANGELOG) (#403)
  [APIC-313] retain specific sql types in civis_file_to_table (#402)
Co-authored-by: Saranga Komanduri <skomanduri@civisanalytics.com>
@mheilman
Copy link
Contributor

do we just bump the version here and release or is there a cadence that we follow.

There isn't a set cadence for releases. The list of unreleased changes is getting pretty long right now, so I was thinking of making one after this and maybe one more PR, so you can just leave the version number as is.

@byndcivilization
Copy link
Contributor Author

rad thanks

Copy link
Contributor

@skomanduri skomanduri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@byndcivilization byndcivilization merged commit 151a017 into master Sep 24, 2020
@byndcivilization byndcivilization deleted the APIC-79-reconfigure-retry-logic-using-tenacity branch September 24, 2020 21:11
mjziolko pushed a commit that referenced this pull request Nov 17, 2021
* initial implementation of tenacity

* added comment

* got docs on how to stash cert

* savepont

* retry working

* added class as trigger and propegated

* everything except client working

* cleanup

* cleanup

* cleanup

* working as wrapper on call

* added retry for client invoke

* moved max retries var and cleanup

* implemented retry_error_callback as a passthrough and some cleanup

* tearout

* tearout

* cleanup

* respect retry headers

* respect retry-after header

* respect retry-after header

* flake fixes

* add jitter to retry deco for files

* working on tests

* fixed some tests

* all tests passing

* renamed io retry tests for clarity

* fixed flak8

* fixed flak8

* tests for retry

* cleanup

* cleanup

* changlog

* rebased

* Update README.rst

Co-authored-by: Saranga Komanduri <skomanduri@civisanalytics.com>

* typo

* added connectionerror test

* cleanup

* cleanup

Co-authored-by: Saranga Komanduri <skomanduri@civisanalytics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants