ec2_elb: Failing with boto exception ( 400 - throttling ) #30229

Closed
ansibot opened this issue Sep 12, 2017 · 30 comments
Labels
affects_1.7 This issue/PR affects Ansible v1.7 aws bot_closed bug This issue/PR relates to a bug. cloud collection:community.aws collection Related to Ansible Collections work has_pr This issue has an associated PR. module This issue/PR relates to a module. needs_collection_redirect https://github.com/ansible/ansibullbot/blob/master/docs/collection_migration.md support:community This issue/PR relates to code supported by the Ansible community. traceback This issue/PR includes a traceback.

Comments

@ansibot
Contributor

ansibot commented Sep 12, 2017

ISSUE TYPE

bug report

COMPONENT NAME

ec2_elb module

ANSIBLE VERSION

Ansible 1.7.2
Boto Version 2.32.1

OS / ENVIRONMENT

Redhat Enterprise Linux 6.4 ( Ansible Tower v2.0.0)

SUMMARY

I have a "lights on" process that turns on EC2 instances and then adds those instances to their respective load balancers (defined by elb_shortname). The results are very intermittent: sometimes the playbook completes successfully, occasionally several times in a row. No matter what I do (changing the logic, adding pauses, setting and retrieving facts), I cannot get around this AWS throttling message.

STEPS TO REPRODUCE

Invoke a playbook that turns on EC2 instances, waits, and then registers each machine that has an elb_shortname variable defined with its group's load balancer.

- name: starting instance(s)
  when: instance_id is defined and lights_on|default ("false") == "true"
  local_action: ec2
  args:
    region: 'us-west-2'
    instance_ids: "{{ instance_id }}"
    state: 'running'
    wait: 'yes'
    wait_timeout: '300'
  register: ec2

- name: Pausing, trying to avoid AWS throttling
  pause: minutes=10

- name: registering instance to its respective groups ELB
  # instances that do not require ELBs do not need to run this part of the playbook
  when: elb_shortname is defined and lights_on|default ("false") == "true"
  local_action: ec2_elb
  args:
    region: 'us-west-2'
    state: 'present'
    wait: 'yes'
    wait_timeout: '300'

EXPECTED RESULTS

TASK: [roles/lights_on | registering instance to its respective groups ELB] *** 
skipping: [tstmaoradbc01 -> 127.0.0.1] 
skipping: [tstmaoradbt01 -> 127.0.0.1] 
skipping: [tstmaoradbt02 -> 127.0.0.1] 
skipping: [tstmaoramqa01 -> 127.0.0.1] 
skipping: [tstmaoramqe01 -> 127.0.0.1] 
skipping: [tstmaoramem01 -> 127.0.0.1] 
skipping: [tstmaoradbu01 -> 127.0.0.1] 
skipping: [tstmaoramem02 -> 127.0.0.1] 
skipping: [tstmaorabix01 -> 127.0.0.1] 
skipping: [tstmaorarel01 -> 127.0.0.1] 
skipping: [tstmaorarex01 -> 127.0.0.1] 
skipping: [tstmaorarex02 -> 127.0.0.1] 
skipping: [tstmaorarex03 -> 127.0.0.1] 
skipping: [tstmaorarex04 -> 127.0.0.1] 
skipping: [tstmaoraetl01 -> 127.0.0.1] 
changed: [tstmaorawsa01 -> 127.0.0.1] 
changed: [tstmaoramax01 -> 127.0.0.1] 
changed: [tstmaorauia02 -> 127.0.0.1] 
changed: [tstmaorauiw01 -> 127.0.0.1] 
changed: [tstmaoramwx01 -> 127.0.0.1] 
changed: [tstmaorawss01 -> 127.0.0.1] 
changed: [tstmaorawsl01 -> 127.0.0.1] 
changed: [tstmaorawss02 -> 127.0.0.1] 
changed: [tstmaorawsa02 -> 127.0.0.1] 
changed: [tstmaorauia01 -> 127.0.0.1] 
changed: [tstmaoramwx02 -> 127.0.0.1]

ACTUAL RESULTS

TASK: [roles/lights_on | registering instance to its respective groups ELB] ***
failed: [tstmaorawss01 -> 127.0.0.1] => {"failed": true, "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/Users/ndobbs/.ansible/tmp/ansible-tmp-1412091621.06-77576803493759/ec2_elb", line 1874, in <module>
    main()
  File "/Users/ndobbs/.ansible/tmp/ansible-tmp-1412091621.06-77576803493759/ec2_elb", line 326, in main
    elb_man.register(wait, enable_availability_zone, timeout)
  File "/Users/ndobbs/.ansible/tmp/ansible-tmp-1412091621.06-77576803493759/ec2_elb", line 159, in register
    self._await_elb_instance_state(lb, 'InService', initial_state, timeout)
  File "/Users/ndobbs/.ansible/tmp/ansible-tmp-1412091621.06-77576803493759/ec2_elb", line 196, in _await_elb_instance_state
    instance_state = self._get_instance_health(lb)
  File "/Users/ndobbs/.ansible/tmp/ansible-tmp-1412091621.06-77576803493759/ec2_elb", line 244, in _get_instance_health
    status = lb.get_instance_health([self.instance_id])[0]
  File "/Library/Python/2.7/site-packages/boto/ec2/elb/loadbalancer.py", line 324, in get_instance_health
    return self.connection.describe_instance_health(self.name, instances)
  File "/Library/Python/2.7/site-packages/boto/ec2/elb/__init__.py", line 547, in describe_instance_health
    [('member', InstanceState)])
  File "/Library/Python/2.7/site-packages/boto/connection.py", line 1166, in get_list
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 400 Bad Request
<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2012-06-01/">
  <Error>
    <Type>Sender</Type>
    <Code>Throttling</Code>
    <Message>Rate exceeded</Message>
  </Error>
  <RequestId>1a08966d-48b8-11e4-8ddc-e3515a48666b</RequestId>
</ErrorResponse>

Copied from original issue: ansible/ansible-modules-core#143

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @ansibot on 2014-10-06T14:46:34Z

Can You Help Us Out?

Thanks for filing a ticket! I am the friendly GitHub Ansibot.

It looks like you might not have filled out the issue description based on our standard issue template. You might not have known about that, and that's ok too, we'll tell you how to do it.

We have a standard template because Ansible is a really busy project and it helps to have some standard information in each ticket, and GitHub doesn't yet provide a standard facility to do this like some other bug trackers. We hope you understand, as this is really valuable to us!

Solving this is simple: please copy the contents of this template and paste it into the description of your ticket. That's it!

If You Had A Question To Ask Instead

If you happened to have a "how do I do this in Ansible" type of question, that's probably more of a user-list question than a bug report, and you should probably ask this question on the project mailing list instead.

However, if you think you have a bug, the report is the way to go! We definitely want all the bugs filed :) Just trying to help!

About Priority Tags

Since you're here, we'll also share some useful information at this time.

In general tickets will be assigned a priority between P1 (highest) and P5, and then worked in priority order. We may also have some follow up questions along the way, so keeping up with follow up comments via GitHub notifications is a good idea.

Due to large interest in Ansible, humans may not comment on your ticket immediately.

Mailing Lists

If you have concerns or questions, you're welcome to stop by the ansible-project or ansible-development mailing lists, as appropriate.

Thanks again for the interest in Ansible!

@ansibot ansibot added the affects_1.7 This issue/PR affects Ansible v1.7 label Sep 12, 2017
@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @smiller171 on 2014-10-06T14:46:34Z

This problem is caused by AWS itself: each account can make only so many API calls in a given period before being throttled. Amazon suggests retrying with an exponential backoff. It would make sense to build such a backoff into all modules that make AWS API calls, so that we don't have to build that logic into our plays.
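
As a rough illustration of the retry-with-growing-delay idea described above, here is a minimal Python sketch. It is not code from any Ansible module or from boto: `ThrottlingError`, `call_with_backoff`, and all parameter names and default values are hypothetical, and the random jitter is one common variant rather than something this thread prescribes.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for an AWS 'Throttling' / 'Rate exceeded' error."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call func(), retrying on ThrottlingError with exponential backoff.

    The upper bound on the wait doubles after each failed attempt
    (base_delay, 2*base_delay, 4*base_delay, ...) and is capped at max_delay.
    The actual wait is drawn uniformly from [0, bound] so that many hosts
    retrying at once do not all hit the API at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use it defaults to `time.sleep`.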

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @ndobbs on 2014-10-06T14:46:34Z

smiller171, I completely agree. However, judging from the boto source used by the ec2_elb_lb module, it seems this backoff is already implemented.

I do agree with you that this logic should be implemented inside the Ansible ec2* modules, but that's just my personal opinion.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @smiller171 on 2014-10-06T14:46:34Z

It's worth noting that I had the same problem with the ec2_metric_alarm module. I have avoided the issue so far by deploying in batches with serial: 10.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @ndobbs on 2014-10-06T14:46:34Z

smiller171, thank you for the suggestion. I hadn't even considered batching the machines with serial. I have implemented the batching in my plays, and hopefully we'll see at least a higher success rate with our 'light_switch' process.

Thanks again for your help.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @acaire on 2014-10-06T14:46:34Z

I ran into this today and resolved it by using the retrying library: acaire/ansible-modules-core@421f7efcc56fde85d0f54743b7ad2436735dab9e

I'm assuming it'd be a stretch to add the required pip package though, or is it worth the PR?

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @ndobbs on 2014-10-06T14:46:34Z

I was able to solve this issue by implementing 'until' logic - thanks to a recommendation by @tgerla in a Tower Support ticket I created that allowed me to get the right syntax down.

- name: starting instance(s)
  when: instance_id is defined and lights_on|default ("false") == "true"
  local_action:
    module: 'ec2'
    region: 'us-west-2'
    instance_ids: "{{ instance_id }}"
    state: 'running'
    wait: 'yes'
    wait_timeout: '120'
  register: ec2_result
  # If instance does not start, try to start it again
  until: ec2_result|success
  retries: 10
  delay: 30

- name: registering instance to its respective groups ELB
  # instances that do not require ELBs do not need to run this part of the playbook
  when: elb_shortname is defined and lights_on|default ("false") == "true"
  local_action: ec2_elb
  args:
    region: 'us-west-2'
    enable_availability_zone: 'no'
    instance_id: "{{ instance_id }}"
    ec2_elbs: "{{ env }}-{{ elb_shortname }}"
    state: 'present'
    wait: 'yes'
    wait_timeout: '120'
  register: ec2_elb_result
  until: ec2_elb_result|success
  retries: 10
  delay: 30

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @ndobbs on 2014-10-06T14:46:34Z

This issue was fixed by implementing controls such as until and by serializing machines in 'batches' in order to stay under the AWS throttling limit.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @smiller171 on 2014-10-06T14:46:34Z

@ndobbs I would call that a workaround, not a solution. That said, many changes have been made and this was likely solved by now.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @cooniur on 2014-10-06T14:46:34Z

I agree with @smiller171, the "until" solution is indeed a workaround. Ansible should provide a way of setting polling rate on cloud services (not only AWS, but also others).

I got the same throttling error while using rds module to restore databases.

Please consider reopening this ticket.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @ndobbs on 2014-10-06T14:46:34Z

I reopened this issue because it's not resolved and people are still reporting it in this thread.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

From @cooniur on 2014-10-06T14:46:34Z

Update: this issue happens in Ansible 2.x too.

@ansibot
Contributor Author

ansibot commented Sep 12, 2017

@ansibot Greetings! Thanks for taking the time to open this issue. In order for the community to handle your issue effectively, we need a bit more information.

Here are the items we could not find in your description:

  • component name

Please set the description of this issue with this template:
https://raw.githubusercontent.com/ansible/ansible/devel/.github/ISSUE_TEMPLATE.md

@ansibot ansibot added bug_report needs_info This issue requires further information. Please answer any outstanding questions. needs_template This issue/PR has an incomplete description. Please fill in the proposed template correctly. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Sep 12, 2017
@smiller171
Contributor

Is this still a problem?

@ansibot ansibot added aws cloud deprecated This issue/PR relates to a deprecated module. module This issue/PR relates to a module. support:certified This issue/PR relates to certified code. and removed needs_info This issue requires further information. Please answer any outstanding questions. needs_template This issue/PR has an incomplete description. Please fill in the proposed template correctly. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Sep 13, 2017
@sbussetti
Contributor

sbussetti commented Sep 13, 2017

@smiller171 yes:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: </ErrorResponse>
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):
  File \"/var/folders/v7/mmsm_9x941j0xjnm33hq5_th0000gp/T/ansible_cxkDNn/ansible_module_ec2_elb_lb.py\", line 1359, in <module>
    main()
  File \"/var/folders/v7/mmsm_9x941j0xjnm33hq5_th0000gp/T/ansible_cxkDNn/ansible_module_ec2_elb_lb.py\", line 1352, in main
    elb=elb_man.get_info(),
  File \"/var/folders/v7/mmsm_9x941j0xjnm33hq5_th0000gp/T/ansible_cxkDNn/ansible_module_ec2_elb_lb.py\", line 614, in get_info
    info['connection_draining_timeout'] = int(self.elb_conn.get_lb_attribute(self.name, 'ConnectionDraining').timeout)
  File \"/Users/sbussetti/.virtualenvs/devops/lib/python2.7/site-packages/boto/ec2/elb/__init__.py\", line 481, in get_lb_attribute
    attributes = self.get_all_lb_attributes(load_balancer_name)
  File \"/Users/sbussetti/.virtualenvs/devops/lib/python2.7/site-packages/boto/ec2/elb/__init__.py\", line 459, in get_all_lb_attributes
    params, LbAttributes)
  File \"/Users/sbussetti/.virtualenvs/devops/lib/python2.7/site-packages/boto/connection.py\", line 1208, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 400 Bad Request
<ErrorResponse xmlns=\"http://elasticloadbalancing.amazonaws.com/doc/2012-06-01/\">
  <Error>
    <Type>Sender</Type>
    <Code>Throttling</Code>
    <Message>Rate exceeded</Message>
  </Error>
  <RequestId>052ac731-98c8-11e7-9ec3-7d07fe631ddd</RequestId>
</ErrorResponse>

", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 0}

@smiller171
Contributor

@bcoca This has been an issue since 2014 and is still present. Are you able to comment on this bug? Has anyone on the team taken a look at this? I don't think it would be terribly difficult to implement retries in the case of throttling since Boto is explicit that it's a throttling error.
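
The tracebacks in this thread show that the throttling case can be identified from the error response itself: HTTP status 400 with `<Code>Throttling</Code>` in the XML body. Below is a minimal sketch of such a check using only the Python standard library. The function names are hypothetical, and this is not code from the ec2_elb module or from boto; the sample body is copied from the traceback above.

```python
import xml.etree.ElementTree as ET

# Error body as it appears in the traceback reported in this issue.
SAMPLE_BODY = """<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2012-06-01/">
  <Error>
    <Type>Sender</Type>
    <Code>Throttling</Code>
    <Message>Rate exceeded</Message>
  </Error>
  <RequestId>1a08966d-48b8-11e4-8ddc-e3515a48666b</RequestId>
</ErrorResponse>"""


def error_code_from_body(body):
    """Return the text of the first <Code> element, or None if unavailable."""
    if not body:
        return None
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    for elem in root.iter():
        # ElementTree prefixes tags with "{namespace}"; compare the local name.
        if elem.tag.rsplit("}", 1)[-1] == "Code":
            return elem.text
    return None


def is_throttling(status, body):
    """True when the response is a 400 whose error code is 'Throttling'."""
    return status == 400 and error_code_from_body(body) == "Throttling"
```

A retry wrapper could use a check like this to retry only throttled calls and let every other error fail immediately.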

@ansibot
Contributor Author

ansibot commented Nov 19, 2017

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 1, 2018
@ansibot ansibot added the traceback This issue/PR includes a traceback. label May 28, 2018
@ansibot ansibot added support:core This issue/PR relates to code supported by the Ansible Engineering Team. and removed support:certified This issue/PR relates to certified code. labels Sep 17, 2018
@ansibot ansibot added needs_maintainer Ansibot is unable to identify maintainers for this PR. (Check `author` in docs or BOTMETA.yml) support:certified This issue/PR relates to certified code. and removed support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Oct 3, 2018
@jamiecwilliams

@s-hertel, PR #31892 that you referenced was closed. We are still experiencing this throttling issue in Ansible 2.6.4. Is there a plan to address this?

@ansibot ansibot added support:community This issue/PR relates to code supported by the Ansible community. and removed support:certified This issue/PR relates to certified code. labels Oct 11, 2018
@biohazd
Contributor

biohazd commented Nov 8, 2018

+1 this is a very important issue to fix.

@ansibot ansibot removed the needs_maintainer Ansibot is unable to identify maintainers for this PR. (Check `author` in docs or BOTMETA.yml) label Nov 16, 2018
@jaksah

jaksah commented Jan 29, 2019

According to the boto documentation for get_all_load_balancers, there is an optional parameter to specify load balancer names. Even if the ec2_elbs argument is defined, this parameter is not used, resulting in multiple calls to AWS to traverse the pagination. I've created a PR (#51424) that passes ec2_elbs to get_all_load_balancers. Hopefully this can reduce the request rate, especially for accounts with a large number of load balancers.
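
To make the request-count argument concrete, here is a toy Python model of a paginated listing versus a lookup filtered by name. `FakeELBClient`, `describe`, and `list_balancers` are hypothetical names invented for this sketch and are not the boto API; the page size of 2 is arbitrary and chosen only to keep the numbers small.

```python
class FakeELBClient:
    """Hypothetical paginated client (NOT boto). Counts the requests it serves."""

    def __init__(self, names, page_size=2):
        self._names = list(names)
        self._page_size = page_size
        self.requests = 0

    def describe(self, names=None, marker=0):
        """Return (page, next_marker); next_marker is None on the last page."""
        self.requests += 1
        if names is not None:
            # Filtered lookup: a single request returns just the named items.
            return [n for n in self._names if n in names], None
        end = marker + self._page_size
        next_marker = end if end < len(self._names) else None
        return self._names[marker:end], next_marker


def list_balancers(client, names=None):
    """Collect results, following pagination markers until exhausted."""
    found, marker = [], 0
    while marker is not None:
        page, marker = client.describe(names=names, marker=marker)
        found.extend(page)
    return found
```

In this model, listing ten load balancers unfiltered costs five requests, while asking for one of them by name costs a single request. That difference is the saving the comment above is aiming for.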

@ansibot ansibot removed the deprecated This issue/PR relates to a deprecated module. label Feb 6, 2019
@ansibot ansibot added the has_pr This issue has an associated PR. label Jul 24, 2019
@gibsonje

gibsonje commented Dec 6, 2019

As of December 2019, I'm repeatedly seeing this issue. None of my other AWS modules give me rate limit errors. I upgraded to Ansible 2.9.0, and with the throttle keyword no other part of my playbook hits rate limit errors. This ec2_elb module, however, does so very frequently. I've had to try a lot of adjustments and retries to get it to work reliably.

I can only assume my other modules are masking the problem by doing exponential backoff retries behind the scenes and this module is the victim: rate limited by previous module executions and having no recourse but to fail hard.

It would be great if a boto3 module existed for this with the retry logic. Other modules are working great, especially in combination with throttle: 1 to avoid rate limiting.

@ansibot
Contributor Author

ansibot commented Jan 31, 2020

@ansibot ansibot added collection Related to Ansible Collections work collection:community.aws needs_collection_redirect https://github.com/ansible/ansibullbot/blob/master/docs/collection_migration.md labels Apr 29, 2020
@ansibot ansibot added the needs_triage Needs a first human triage before being processed. label May 16, 2020
@ansibot
Contributor Author

ansibot commented Aug 16, 2020

Thank you very much for your interest in Ansible. Ansible has migrated much of the content into separate repositories to allow for more rapid, independent development. We are closing this issue/PR because this content has been moved to one or more collection repositories.

For further information, please see:
https://github.com/ansible/ansibullbot/blob/master/docs/collection_migration.md

@ansibot ansibot closed this as completed Aug 16, 2020
@sivel sivel removed the needs_triage Needs a first human triage before being processed. label Aug 17, 2020
@ansible ansible locked and limited conversation to collaborators Sep 13, 2020