Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CloudRetry/AWSRetry backoff decorator with unit tests #17039

Merged
merged 11 commits into from
Sep 13, 2016

Conversation

linuxdynasty
Copy link
Contributor

@linuxdynasty linuxdynasty commented Aug 10, 2016

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

CloudRetry Base class inside of module_utils/cloud.py
AWSRetry decorator inside of module_utils/ec2.py

ANSIBLE VERSION
ansible 2.1.1.0
SUMMARY

The CloudRetry class can be implemented by any other cloud provider, that wants to implement a basic backoff algorithm in a decorator.

AWSRetry class overwrites 2 methods.

  • status_code_from_exception (Parses the exception for the exact string)
  • found (Iterates over a list of exceptions to match against.

AWSRetry.backoff() decorator can be applied to any function that is making an aws boto/boto3 call.
It will only retry on the following Exceptions.
This list of failures is based on this API Reference.

  • RequestLimitExceeded
  • Unavailable
  • ServiceUnavailable
  • InternalFailure
  • InternalError
  • "\w+.NotFound" (Eventual Consistency)

The CloudRetry.backoff decorator takes on the following kwargs.

  • tries (number of times to try) default=10
  • delay (initial delay between retries in seconds) default=3
  • backoff (This is the multiplier, that will double on each retry) default=2

This change is Reviewable

@bcoca
Copy link
Member

bcoca commented Aug 11, 2016

so instead of re implementing from scratch, would it not be better to expand the existing retry functions to allow for the custom conditions?

https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/api.py

@linuxdynasty
Copy link
Contributor Author

linuxdynasty commented Aug 11, 2016

@bcoca I read api.py before writing my own. This one is purely focused on AWS and not just on the RateLimiting issue. This one also gets around the NotFound" (Eventual Consistency) issue as well. Could we add an argument to the decorator in API to take in Exception Strings. Sure, but I think this way the writer of the modules will not have to think of the multiple use cases beyond RequestLimitExceeded. Also if an exception is not part of the list of retries, it will just raise it.

@bcoca
Copy link
Member

bcoca commented Aug 11, 2016

@linuxdynasty i'm not saying modules should wrap the existing retries, i was suggesting this function could be re-implemented as a simple wrapper to an expanded 'generic retry'.

This would make 'cloud specific' retry functions much easier in general.

@linuxdynasty
Copy link
Contributor Author

linuxdynasty commented Aug 11, 2016

I refactored it @bcoca. Now there is a base class called CloudRetry in module_utils/cloud.py. AWSRetry class in module_utils/ec2.py inherits from CloudRetry. Now only two staticmethods need to be overridden when another cloud provider wants to use this decorator. Test also updated.

@rodrickbrown
Copy link

+1 this looks good well needed feature request thanks @linuxdynasty

@linuxdynasty
Copy link
Contributor Author

Can we add botocore, boto3 to the ansible core testing framework. As one of the failures, is that botocore does not exist.

@mattclay
Copy link
Member

@linuxdynasty You can add botocore and boto3 to test/utils/shippable/sanity-requirements.txt.

while max_tries > 1:
try:
return f(*args, **kwargs)
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't compatible with Python 2.4, which is causing it to fail the py24 tests:

2016-08-11 20:25:51 + python2.4 -m compileall -fq -x 'module_utils/(a10|rax|openstack|ec2|gce|docker_common|azure_rm_common|vca|vmware|gcp|gcdns).py' lib/ansible/module_utils
2016-08-11 20:25:51 Compiling lib/ansible/module_utils/cloud.py ...
2016-08-11 20:25:51   File "lib/ansible/module_utils/cloud.py", line 86
2016-08-11 20:25:51     except Exception as e:
2016-08-11 20:25:51                       ^
2016-08-11 20:25:51 SyntaxError: invalid syntax

You can use this instead:

except Exception:
    e = get_exception()

@mattclay
Copy link
Member

@linuxdynasty My apologies, I had you put the dependencies in the wrong requirements file. You should add them to the two requirements files in the test/utils/tox directory instead.

@mattclay
Copy link
Member

@linuxdynasty If you want to run the unit tests locally with an environment similar to Shippable, use the following command: TOXENV=py27 test/utils/shippable/sanity.sh

That will run the unit tests on Python 2.7. You can use other Python versions as well, if you have them installed.

@@ -2,3 +2,5 @@ tox
pyyaml
jinja2
setuptools
botocore
boto3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this file aren't needed since the dependencies are in the test/utils/tox requirements files.

@linuxdynasty linuxdynasty changed the title aws_retry decorator function with unit tests CloudRetry/AWSRetry decorator with unit tests Aug 12, 2016
@linuxdynasty linuxdynasty changed the title CloudRetry/AWSRetry decorator with unit tests CloudRetry/AWSRetry backoff decorator with unit tests Aug 12, 2016
@linuxdynasty
Copy link
Contributor Author

@mattclay thank you for all your help. Test are finally passing :D

@linuxdynasty
Copy link
Contributor Author

tests are failing, but not do to this PR.

@mattclay
Copy link
Member

I've restarted the build. The failure was due to a temporary timeout.

@AWSRetry.backoff(tries=2, delay=0.1)
def retry_once():
self.counter += 1
if self.counter < 2:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests would be a lot cleaner using Mock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is why the original 'retry' is in api.py, I might migrate this after merge

@linuxdynasty
Copy link
Contributor Author

linuxdynasty commented Aug 18, 2016

@bcoca any updates on if you want me to move CloudRetry from cloud.py to api.py and still leave AWSRetry in ec2.py?

pass

@classmethod
def backoff(cls, tries=10, delay=3, backoff=2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about these defaults - a delay of 3 and backoff of 2 for 10 tries would mean that, to fail, this retry decorator would wait for 3069 seconds (3 + 3*2 + 3*2*2 ...., or sum([3 * 2**i for i in range(10)])) or about 50 minutes. That seems like a really long time, especially since most modules make several calls.

A better default might be 4 tries, for a total default wait time of 45 seconds and having a max of, say, a minute between tries. That way, if someone wanted 10 tries it would only take about 7.5 minutes to fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense and I will update it shortly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of changing the default backoff of 2 seconds to 1.1 seconds. I think this is more reasonable. sum([3 * 1.1**i for i in range(10)])
About a total of 48 seconds in 10 tries. @ryansb thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable, because this isn't enough for the work I do with auto scale groups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjshoe It is configurable, these values are just the defaults that will be used if the developer calls the decorator without arguments.

@linuxdynasty That seems reasonable to me - go for it. Should be plenty to wait out most rate limits.

@linuxdynasty
Copy link
Contributor Author

@ryansb I pushed the changes. The build failed cause it could not spin a container.

This will be about a total of 48 seconds in 10 tries. This is
configurable.
@linuxdynasty
Copy link
Contributor Author

@ryansb does everything else look good to you?

@ryansb ryansb merged commit b510abc into ansible:devel Sep 13, 2016
sereinity pushed a commit to sereinity-forks/ansible that referenced this pull request Jan 25, 2017
* Added aws_retry decorator function with unit tests

* Restructured the code to be used with a base class.

This base class CloudRetry can be reused by any other cloud provider.
This decorator should be used in situations, where you need to implement
a backoff algorithm and want to retry based on the status code from the
exception.

* updated documentation

* fixed tabs

* added botocore and boto3 to requirements.txt

* removed cloud.py from py24 tests, as it depends on boto3

* fix relative imports

* updated test to be 2.6 compat

* updated method name from retry to backoff

* readded lxd

* Updated default backoff from 2 seconds to 1.1s.

This will be about a total of 48 seconds in 10 tries. This is
configurable.
@ansibot ansibot added feature This issue/PR relates to a feature request. and removed feature_pull_request labels Mar 4, 2018
@ansible ansible locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
aws feature This issue/PR relates to a feature request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants