
Conversation

@ejholmes
Contributor

Since botocore 1.6.0, it's possible to pass max_attempts to a generated client, like so:

import boto3
from botocore.config import Config

config = Config(
    retries = dict(
        max_attempts = 10
    )
)

ec2 = boto3.client('ec2', config=config)

This removes the internal retry_with_backoff function in favor of botocore's internal exponential backoff with a higher value of max_attempts.

The old behavior combined botocore's internal backoff with a linear backoff in stacker, whose algorithm was:

sleep_time = min(max_delay, min_delay * attempt)

So, in the event of throttling, the request would have been retried 4 times by botocore (as determined by https://github.com/boto/botocore/blob/1.6.1/botocore/data/_retry.json#L77) with exponential backoff, then immediately retried by stacker, which reset botocore's backoff.

With this change, the backoff continues to grow instead of getting reset.
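To make the difference concrete, here is a rough sketch of the two delay schedules. The numbers are illustrative assumptions, not exact: botocore randomizes its actual delays with jitter, and stacker's real min_delay/max_delay defaults may differ.

```python
# Illustrative comparison of the old and new retry delay schedules.
# Assumed values (not exact): botocore sleeps 2**i seconds between attempts,
# and stacker used min_delay=1, max_delay=10 for its linear backoff.

def old_delays(stacker_attempts=3, boto_attempts=4, min_delay=1, max_delay=10):
    """Old behavior: botocore's exponential backoff restarts on every stacker retry."""
    delays = []
    for attempt in range(1, stacker_attempts + 1):
        # botocore retries internally with exponential backoff...
        delays += [2 ** i for i in range(boto_attempts - 1)]
        # ...then stacker sleeps linearly and retries, resetting botocore's backoff
        delays.append(min(max_delay, min_delay * attempt))
    return delays

def new_delays(max_attempts=10):
    """New behavior: a single exponential backoff that keeps growing."""
    return [2 ** i for i in range(max_attempts - 1)]

print(old_delays())  # the 1, 2, 4 run keeps restarting: [1, 2, 4, 1, 1, 2, 4, 2, ...]
print(new_delays())  # grows monotonically: [1, 2, 4, 8, 16, 32, 64, 128, 256]
```

The point of the sketch is the shape, not the exact values: under the old scheme the per-request wait never grows past a few seconds no matter how long throttling persists, while the new scheme backs off further on each consecutive failure.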

Note: I've set max_attempts on the CloudFormation client to 10, which is pretty high (about 8 minutes worst case), but I think a large value makes sense for stacker's use case, especially with parallelization. It's rare that you would actually want stacker to abort on throttling.
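For the record, the ~8 minute figure can be sanity-checked with a quick sum. This assumes a plain 2**i-second delay between attempts, which approximates (but is not exactly) botocore's schedule, since botocore applies randomized jitter that makes real waits shorter.

```python
# Back-of-the-envelope worst case for max_attempts=10, assuming an
# un-jittered exponential backoff of 2**i seconds between attempts
# (botocore randomizes the actual delay, so real waits are shorter).
max_attempts = 10
total = sum(2 ** i for i in range(max_attempts - 1))  # 9 sleeps between 10 attempts
print("%d seconds, roughly %.1f minutes" % (total, total / 60.0))  # 511 s, ~8.5 min
```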

@ejholmes ejholmes requested a review from phobologic February 14, 2018 02:16
@ejholmes
Contributor Author

I can forcefully cause throttling using this stacker config, which creates 60 stacks in parallel:

from stacker.config import Config, Stack, dump

stacks = []
for i in range(60):
    stack = Stack({
        "name": "vpc%d" % i,
        "class_path": "stacker.tests.fixtures.mock_blueprints.Dummy",
    })
    stacks += [stack]

config = Config()
config.namespace = "ejholmes"
config.stacks = stacks

print(dump(config))

Without this change, throttling exceptions are raised pretty frequently. With it, everything executes without throwing any exceptions (though it takes a while because of the throttling).

@ejholmes ejholmes mentioned this pull request Feb 14, 2018
Member

@phobologic phobologic left a comment


This looks good. My only concern is all the other places where we don't already use retry_on_throttling, mostly in hooks/handlers/actions. Do you think those will be affected by the throttling issues?

@ejholmes
Contributor Author

  1. Hooks shouldn't be throttled, since they aren't run in parallel.
  2. For lookups, most of those that use AWS clients directly (kms, dynamodb, ssm, ami) don't do any retrying beyond botocore's defaults, so their behavior remains unchanged for now. Output lookups are pre-cached, so they don't make unnecessary API calls.
  3. For actions, the only AWS API calls that happen outside of the provider are for pushing templates to S3, and those shouldn't get throttled (I don't think AWS has limits on PutObject).

@ejholmes ejholmes merged commit b9fee47 into dag-concurrent Feb 14, 2018
@ejholmes
Contributor Author

I pulled this into #531 and I'll work on any polish there.

Member

@russellballestrini russellballestrini left a comment


Awesome!

@russellballestrini russellballestrini deleted the dag-throttling branch February 14, 2018 15:10
