Boto - Attaching to Load balancer #5076

Closed
brandonhilkert opened this issue Nov 27, 2013 · 31 comments
Labels
bug This issue/PR relates to a bug.

Comments

@brandonhilkert
Contributor

boto==2.17.0 (also tried with 2.18.0)

I updated to Ansible 1.4, and ever since, I get the following error when attaching a machine to an EC2 load balancer. I realize this is a boto response, but I was hoping you might have insight into version issues or something else that might have caused this.

The DNS name is right, and the instance it says it can't find exists as well.

TASK: [attach to load balancer] ***********************************************
failed: [ec2-54-205-199-70.compute-1.amazonaws.com] => (item=ec2-54-205-199-70.compute-1.amazonaws.com) => {"failed": true, "item": "ec2-54-205-199-70.compute-1.amazonaws.com", "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/home/ubuntu/.ansible/tmp/ansible-1385561931.01-206921944768593/ec2_elb", line 1365, in <module>
    main()
  File "/home/ubuntu/.ansible/tmp/ansible-1385561931.01-206921944768593/ec2_elb", line 291, in main
    elb_man.register(wait, enable_availability_zone)
  File "/home/ubuntu/.ansible/tmp/ansible-1385561931.01-206921944768593/ec2_elb", line 154, in register
    initial_state = lb.get_instance_health([self.instance_id])[0]
  File "/usr/local/lib/python2.7/dist-packages/boto/ec2/elb/loadbalancer.py", line 267, in get_instance_health
    return self.connection.describe_instance_health(self.name, instances)
  File "/usr/local/lib/python2.7/dist-packages/boto/ec2/elb/__init__.py", line 454, in describe_instance_health
    [('member', InstanceState)])
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1118, in get_list
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 400 Bad Request
<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2012-06-01/">
  <Error>
    <Type>Sender</Type>
    <Code>InvalidInstance</Code>
    <Message>Could not find EC2 instance i-880518f3.</Message>
  </Error>
  <RequestId>d7a48851-576e-11e3-8587-0fdf568091fb</RequestId>
</ErrorResponse>



FATAL: all hosts have already failed -- aborting
@brandonhilkert
Contributor Author

Here's the play:

- name: update LB
  gather_facts: false
  hosts:
    - tag_role_{{ role }}

  tasks:
    - name: get info about servers
      action: ec2_facts

    - name: attach to load balancer
      local_action: ec2_elb
      with_items: groups['tag_role_{{ role }}']
      args:
        instance_id: "{{ hostvars[item].ansible_ec2_instance_id }}"
        ec2_elbs: "{{ lb }}"
        state: "{{ state }}"
        region: "us-east-1"

@jctanner
Contributor

Are you sure that i-880518f3 exists in us-east-1 ?

@brandonhilkert
Contributor Author

[Screenshot attached: 2013-11-27 at 9:49:55 AM]

@brandonhilkert
Contributor Author

Here's debug output showing the instance info being fetched beforehand:

PLAY [update LB] **************************************************************

TASK: [get info about servers] ************************************************
ok: [ec2-54-205-199-70.compute-1.amazonaws.com]

TASK: [debug var=hostvars[item].ansible_ec2_instance_id] **********************
ok: [ec2-54-205-199-70.compute-1.amazonaws.com] => (item=ec2-54-205-199-70.compute-1.amazonaws.com) => {
    "hostvars[item].ansible_ec2_instance_id": "i-880518f3",
    "item": "ec2-54-205-199-70.compute-1.amazonaws.com"
}

TASK: [attach to load balancer] ***********************************************
failed: [ec2-54-205-199-70.compute-1.amazonaws.com] => (item=ec2-54-205-199-70.compute-1.amazonaws.com) => {"failed": true, "item": "ec2-54-205-199-70.compute-1.amazonaws.com", "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/Users/bhilkert/.ansible/tmp/ansible-1385563531.98-76133645732484/ec2_elb", line 1365, in <module>
    main()
  File "/Users/bhilkert/.ansible/tmp/ansible-1385563531.98-76133645732484/ec2_elb", line 291, in main
    elb_man.register(wait, enable_availability_zone)
  File "/Users/bhilkert/.ansible/tmp/ansible-1385563531.98-76133645732484/ec2_elb", line 154, in register
    initial_state = lb.get_instance_health([self.instance_id])[0]
  File "/usr/local/lib/python2.7/site-packages/boto/ec2/elb/loadbalancer.py", line 239, in get_instance_health
    return self.connection.describe_instance_health(self.name, instances)
  File "/usr/local/lib/python2.7/site-packages/boto/ec2/elb/__init__.py", line 454, in describe_instance_health
    [('member', InstanceState)])
  File "/usr/local/lib/python2.7/site-packages/boto/connection.py", line 1076, in get_list
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 400 Bad Request
<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2012-06-01/">
  <Error>
    <Type>Sender</Type>
    <Code>InvalidInstance</Code>
    <Message>Could not find EC2 instance i-880518f3.</Message>
  </Error>
  <RequestId>91cb889a-5772-11e3-9b16-cd2af0e0b575</RequestId>
</ErrorResponse>

@brandonhilkert
Contributor Author

As suggested by @jctanner, tried HEAD and got no difference:

ubuntu@ip-10-147-227-145:~/pipelinedeals$ ansible --version
ansible 1.5 (devel 696ce0effe) last updated 2013/11/27 10:30:35 (GMT -400)
PLAY [update LB] **************************************************************

TASK: [get info about servers] ************************************************
ok: [ec2-54-205-199-70.compute-1.amazonaws.com]

TASK: [attach to load balancer] ***********************************************
failed: [ec2-54-205-199-70.compute-1.amazonaws.com] => (item=ec2-54-205-199-70.compute-1.amazonaws.com) => {"failed": true, "item": "ec2-54-205-199-70.compute-1.amazonaws.com", "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/home/ubuntu/.ansible/tmp/ansible-1385566255.02-64214058287739/ec2_elb", line 1365, in <module>
    main()
  File "/home/ubuntu/.ansible/tmp/ansible-1385566255.02-64214058287739/ec2_elb", line 291, in main
    elb_man.register(wait, enable_availability_zone)
  File "/home/ubuntu/.ansible/tmp/ansible-1385566255.02-64214058287739/ec2_elb", line 154, in register
    initial_state = lb.get_instance_health([self.instance_id])[0]
  File "/usr/lib/python2.7/dist-packages/boto/ec2/elb/loadbalancer.py", line 217, in get_instance_health
    return self.connection.describe_instance_health(self.name, instances)
  File "/usr/lib/python2.7/dist-packages/boto/ec2/elb/__init__.py", line 349, in describe_instance_health
    [('member', InstanceState)])
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 896, in get_list
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 400 Bad Request
<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2011-11-15/">
  <Error>
    <Type>Sender</Type>
    <Code>InvalidInstance</Code>
    <Message>Could not find EC2 instance i-880518f3.</Message>
  </Error>
  <RequestId>e8df09ec-5778-11e3-9532-15ccd612d0d3</RequestId>
</ErrorResponse>



FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/ubuntu/lb.retry

ec2-54-205-199-70.compute-1.amazonaws.com : ok=1    changed=0    unreachable=0    failed=1

@brandonhilkert
Contributor Author

This commit (3ea0b2b) caused it to stop working

@brandonhilkert
Contributor Author

Adding enable_availability_zone: no fixes my issue, but I'm not sure what the setting means. We've enabled multi-AZ load balancing, so I'm not sure why the module would fail if it's trying to do that for me and it's already been done.

@brandonhilkert
Contributor Author

This line specifically broke it: 3ea0b2b#diff-65143e5debe6a07b9e81d5d14fc8abbbR154

@mpdehaan
Contributor

We should definitely not revert the above as it has a lot of useful things in it.

Do you have some cycles to attempt to fix this in a less invasive way or are you just asking for us to do it? :)

@brandonhilkert
Contributor Author

Looking now. I might need some feedback, though, since my Python-fu is not up to snuff.

@brandonhilkert
Contributor Author

On HEAD, I get the following error if I don't specify the enable_availability_zone: no property:

TASK: [attach to load balancer] ***********************************************
failed: [ec2-54-205-199-70.compute-1.amazonaws.com] => (item=ec2-54-205-199-70.compute-1.amazonaws.com) => {"failed": true, "item": "ec2-54-205-199-70.compute-1.amazonaws.com", "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/home/ubuntu/.ansible/tmp/ansible-1385571333.74-89591837652002/ec2_elb", line 1366, in <module>
    main()
  File "/home/ubuntu/.ansible/tmp/ansible-1385571333.74-89591837652002/ec2_elb", line 292, in main
    elb_man.register(wait, enable_availability_zone)
  File "/home/ubuntu/.ansible/tmp/ansible-1385571333.74-89591837652002/ec2_elb", line 157, in register
    self._enable_availailability_zone(lb)
  File "/home/ubuntu/.ansible/tmp/ansible-1385571333.74-89591837652002/ec2_elb", line 180, in _enable_availailability_zone
    instance = self._get_instance()
  File "/home/ubuntu/.ansible/tmp/ansible-1385571333.74-89591837652002/ec2_elb", line 250, in _get_instance
    return ec2_conn.get_only_instances(instance_ids=[self.instance_id])[0]
AttributeError: 'EC2Connection' object has no attribute 'get_only_instances'


FATAL: all hosts have already failed -- aborting

@brandonhilkert
Contributor Author

Disregard the last comment, old version of boto.

So it appears that the newest version of boto (2.18) does work with HEAD.

Closing.

@brandonhilkert
Contributor Author

Opening this back up...I hit it again on devel branch with boto version 2.19. Maybe someone can point me in the right direction.

LB is there, this worked ok for me in 1.3.x.

TASK: [attach to load balancer] ***********************************************
failed: [ec2-54-234-196-20.compute-1.amazonaws.com] => (item=ec2-54-234-196-20.compute-1.amazonaws.com) => {"failed": true, "item": "ec2-54-234-196-20.compute-1.amazonaws.com", "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/home/ubuntu/.ansible/tmp/ansible-1385997896.35-22143209806618/ec2_elb", line 1365, in <module>
    main()
  File "/home/ubuntu/.ansible/tmp/ansible-1385997896.35-22143209806618/ec2_elb", line 291, in main
    elb_man.register(wait, enable_availability_zone)
  File "/home/ubuntu/.ansible/tmp/ansible-1385997896.35-22143209806618/ec2_elb", line 154, in register
    initial_state = lb.get_instance_health([self.instance_id])[0]
  File "/usr/local/lib/python2.7/dist-packages/boto/ec2/elb/loadbalancer.py", line 320, in get_instance_health
    return self.connection.describe_instance_health(self.name, instances)
  File "/usr/local/lib/python2.7/dist-packages/boto/ec2/elb/__init__.py", line 524, in describe_instance_health
    [('member', InstanceState)])
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1119, in get_list
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 400 Bad Request
<ErrorResponse xmlns="http://elasticloadbalancing.amazonaws.com/doc/2012-06-01/">
  <Error>
    <Type>Sender</Type>
    <Code>InvalidInstance</Code>
    <Message>Could not find EC2 instance i-d5ffc7b0.</Message>
  </Error>
  <RequestId>e6ee2d4d-5b65-11e3-b220-490e0a871a9a</RequestId>
</ErrorResponse>

@brandonhilkert brandonhilkert reopened this Dec 2, 2013
@brandonhilkert
Contributor Author

Confirmed that checking out 1.3.4 restores this functionality.

@nickdevereaux

Replicated with RHEL available boto (2.13.3) and ansible 1.4

Updating ansible to 1.5 reproduces the error.

Updating boto to 2.19.0 reproduces the error.

So in simpler terms the ansible ec2_elb module is checking the health of the instance as a "pre" state to confirm against the registration? How does this make sense when adding something to the pool requires its state in the pool to be checked as a member? I feel like I'm taking crazy pills!

@brandonhilkert
Contributor Author

I'd like to help fix this since it's a blocker for me, but I need some suggestions. Here is the offending line:

https://github.com/ansible/ansible/blob/devel/library/cloud/ec2_elb#L154

initial_state = lb.get_instance_health([self.instance_id])[0]

The instance is being checked in the context of the load balancer, which it is not attached to, so it fails with:

<Message>Could not find EC2 instance i-ec11ee91.</Message>

In Ruby, I might consider rescuing that exception when it doesn't find it and set some default. Any suggestions?
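
A minimal Python sketch of that rescue-and-default idea, for illustration only. `FakeServerError` and `FakeLoadBalancer` are hypothetical stand-ins so the snippet runs without AWS credentials; with boto itself the exception would be `boto.exception.BotoServerError`, which exposes an `error_code` attribute. This is an assumption about one possible approach, not necessarily how the module should be fixed.

```python
class FakeServerError(Exception):
    """Hypothetical stand-in for boto.exception.BotoServerError."""

    def __init__(self, status, error_code, message):
        super(FakeServerError, self).__init__(message)
        self.status = status
        self.error_code = error_code


class FakeLoadBalancer(object):
    """Hypothetical stand-in for a boto LoadBalancer object."""

    def __init__(self, known_instances):
        # Maps instance ID -> health state for instances the ELB knows about.
        self.known_instances = dict(known_instances)

    def get_instance_health(self, instance_ids):
        states = []
        for instance_id in instance_ids:
            if instance_id not in self.known_instances:
                raise FakeServerError(
                    400, "InvalidInstance",
                    "Could not find EC2 instance %s." % instance_id)
            states.append(self.known_instances[instance_id])
        return states


def get_initial_state(lb, instance_id, error_cls=FakeServerError):
    """Return the instance's health on this ELB, or None if never registered."""
    try:
        return lb.get_instance_health([instance_id])[0]
    except error_cls as exc:
        # Only swallow the "never registered" case; re-raise everything else.
        if exc.error_code == "InvalidInstance":
            return None
        raise


lb = FakeLoadBalancer({"i-aaaa1111": "OutOfService"})
print(get_initial_state(lb, "i-aaaa1111"))  # instance already known to the ELB
print(get_initial_state(lb, "i-880518f3"))  # instance never registered
```

Checking the error code before swallowing the exception keeps unrelated failures (bad credentials, throttling) visible instead of silently turning them into a default.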

@brandonhilkert
Contributor Author

@jsdalton I'm hoping you could add some insight to this (https://github.com/ansible/ansible/pull/4112/files#diff-65143e5debe6a07b9e81d5d14fc8abbbR194)...This line seems to have broken ELB for me and others with the errors above. I don't know python and I was hoping to get your insight and maybe some assistance with a fix.

@jsdalton
Contributor

jsdalton commented Dec 9, 2013

Thanks for alerting me to this @brandonhilkert.

Your diagnosis appears correct as I'm reviewing the traceback. My assumption here is that an instance that has never been registered with a given load balancer will throw a Boto exception if its instance health is queried, whereas once it has been registered at some point, it will return Out of Service if it's not in service. The EC2 API is filled with corner cases like these that are tricky to develop for. :)

I think you're right that catching this particular error is the right approach. The state checking is really only necessary if "wait" mode is on -- i.e. if you want this module to wait around for the actual registration or deregistration to occur, and report the new state, do idempotence checks etc. So really that check for initial state should be scoped under if wait.
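
As a rough sketch of what scoping the check under `if wait` could look like (an illustration under assumptions, not the actual patch): `RecordingLoadBalancer` is a hypothetical stand-in that only records which calls were made, while `get_instance_health` and `register_instances` mirror the method names on boto's `LoadBalancer`.

```python
class RecordingLoadBalancer(object):
    """Hypothetical stand-in for a boto LoadBalancer; records calls made."""

    def __init__(self):
        self.calls = []

    def get_instance_health(self, instance_ids):
        self.calls.append("get_instance_health")
        return ["OutOfService" for _ in instance_ids]

    def register_instances(self, instance_ids):
        self.calls.append("register_instances")


def register(lb, instance_id, wait):
    """Register an instance; query its prior state only when waiting."""
    initial_state = None
    if wait:
        # The pre-registration state is only needed to detect a change later.
        initial_state = lb.get_instance_health([instance_id])[0]
    lb.register_instances([instance_id])
    return initial_state


fire_and_forget = RecordingLoadBalancer()
register(fire_and_forget, "i-880518f3", wait=False)
print(fire_and_forget.calls)

waiting = RecordingLoadBalancer()
register(waiting, "i-880518f3", wait=True)
print(waiting.calls)
```

With `wait=False` the health query is never issued, so an instance that has never been registered cannot trigger the error on that path.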

I'll try to find some time to put together a fix today if I can. I hate that feeling of knowing exactly what some code needs to do but not being quite familiar enough with the language to be able to express it.

@brandonhilkert
Contributor Author

Thanks for the detailed feedback. ...And I hate knowing sort of what's going on, but not being able to help much. If I can, in any way, please let me know. I'm happy to do whatever necessary to get over this road block. Thanks!

jsdalton pushed a commit to jsdalton/ansible that referenced this issue Dec 10, 2013
@jsdalton
Contributor

Okay @brandonhilkert I just submitted a pull request that hopefully fixes the issue. There was actually another issue hiding behind the first one you discovered, where an instance enters a "pending" state when first registered and we weren't accounting for it.

All in all this was kind of an annoying issue to deal with. It's a bit odd to me that AWS chooses to treat this as an error condition, though I guess they have their reasons. They also don't do a good job of distinguishing between an instance that's out of service because the operation is pending vs. one that's out of service for failed health check (the first will eventually go in service, the second presumably never will).
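
To make the pending-versus-failed ambiguity concrete, here is a hedged sketch of a polling loop with a timeout. It is not the code from the pull request. `get_state` is a hypothetical caller-supplied function returning the current state string, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_for_in_service(get_state, timeout=60.0, interval=5.0,
                        sleep=time.sleep, clock=time.time):
    """Poll get_state() until it returns "InService" or the timeout expires.

    Returns True if the instance went in service, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == "InService":
            return True
        if clock() >= deadline:
            # Could be a slow registration or a failing health check; the
            # state string alone does not say which, so give up here.
            return False
        sleep(interval)


# Simulated instance that is out of service twice, then comes in service.
states = iter(["OutOfService", "OutOfService", "InService"])
result = wait_for_in_service(lambda: next(states), sleep=lambda seconds: None)
print(result)
```

Because "still registering" and "failing health checks" look the same from the state string alone, a timeout is the only thing that ends the wait for an instance that will never become healthy.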

Anyhow, I'd appreciate it if you were able to try this code out yourself and let me know if it works for you. It's definitely a pain to test corner cases... this one in particular was annoying since I needed to use a fresh instance or ELB every time I wanted to see if it worked. :)

Let me know and hopefully an admin will accept my pull request shortly.

@nickdevereaux

Thanks @jsdalton (and @brandonhilkert)! I'd been meaning to write a patch too but got sidetracked on some other work. Fingers crossed for an acceptance!

@jsdalton
Contributor

@nickdevereaux No problem at all -- sorry for the initial difficulties you hit. If you have a chance at all to try out my patch, let me know if it works or if you run into any problems...

@brandonhilkert
Contributor Author

Thanks @jsdalton for the attention. I'll check it out first thing tomorrow. I've got some machines sitting waiting to be tested on and will circle back with feedback at that time.


@nickdevereaux

Got it working! Thanks @jsdalton

@brandonhilkert
Contributor Author

Confirmed that it works for me too. Thanks @jsdalton! This is a huge life-saver...

@brandonhilkert
Contributor Author

Closing in favor of the PR.

jctanner added a commit that referenced this issue Dec 10, 2013
Account for instances that have not yet been registered. Fixes #5076
@jmhodges

This seems to still exist on master? From the docs, it's not clear what I'm supposed to do. I see mentions of availability zones here, but I don't follow how that comes from this error message.

TASK: [Register ec2 instance in elb] ****************************************** 
failed: [ec2-54-203-118-51.us-west-2.compute.amazonaws.com] => (item=howsmyssl-simple-lb) => {"failed": true, "item": "howsmyssl-simple-lb", "parsed": false}
invalid output was: Traceback (most recent call last):
  File "/home/vagrant/.ansible/tmp/ansible-1387152129.09-249710582431813/ec2_elb", line 1410, in <module>
    main()
  File "/home/vagrant/.ansible/tmp/ansible-1387152129.09-249710582431813/ec2_elb", line 333, in main
    elb_man.register(wait, enable_availability_zone)
  File "/home/vagrant/.ansible/tmp/ansible-1387152129.09-249710582431813/ec2_elb", line 163, in register
    self._enable_availailability_zone(lb)
  File "/home/vagrant/.ansible/tmp/ansible-1387152129.09-249710582431813/ec2_elb", line 188, in _enable_availailability_zone
    instance = self._get_instance()
  File "/home/vagrant/.ansible/tmp/ansible-1387152129.09-249710582431813/ec2_elb", line 291, in _get_instance
    return ec2_conn.get_only_instances(instance_ids=[self.instance_id])[0]
AttributeError: 'EC2Connection' object has no attribute 'get_only_instances'

@jmhodges

(Well, I mean, obviously from the stack trace it's about zones, but I'm surprised that an attribute error is thrown if the wrong config option is in place. This may just be my ignorance of how boto works.)

@brandonhilkert
Contributor Author

@jmhodges what version of boto?

@jmhodges

Oh, oh dear. I have 2.9.6-1. This is terribly old, isn't it? I'll upgrade. Apologies.

@brandonhilkert
Contributor Author

Yup, the method get_only_instances is probably not on those versions. I've hit errors like this a few times and it's not been obvious the issue was the boto version. I wish there was a better way to manage external dependencies like Boto.
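
One way to make this kind of failure more obvious, sketched here as an assumption rather than anything the module currently does, is to check for the method up front and fail with a readable message. `require_method`, `OldConnection`, and `NewConnection` are hypothetical names for illustration; only `get_only_instances` comes from the traceback above.

```python
def require_method(obj, method_name, hint):
    """Fail early with a readable message if obj lacks a callable method."""
    if not callable(getattr(obj, method_name, None)):
        raise RuntimeError(
            "%s has no method %r; %s"
            % (type(obj).__name__, method_name, hint))


class OldConnection(object):
    """Hypothetical stand-in for an EC2Connection from an older boto."""


class NewConnection(object):
    """Hypothetical stand-in for a connection that has the newer method."""

    def get_only_instances(self, instance_ids=None):
        return []


# Passes silently when the method exists.
require_method(NewConnection(), "get_only_instances", "try upgrading boto")

# Raises a descriptive error instead of a bare AttributeError deep in a call.
try:
    require_method(OldConnection(), "get_only_instances", "try upgrading boto")
except RuntimeError as exc:
    print(exc)
```

Checking for the capability rather than comparing version strings avoids having to know exactly which boto release introduced the method.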

jimi-c pushed a commit that referenced this issue Dec 6, 2016
)

* Restart EC2 instances with multiple network interfaces

A previous bug, #3234, caused instances with multiple ENI's to fail when being
started or stopped because sourceDestCheck is a per-interface attribute, but we
use the boto global access to it (which only works when there's a single ENI).

This patch handles a variant of that bug that only surfaced when restarting an
instance, and catches the same type of exception.

* Default termination_protection to None instead of False

AWS defaults the value of termination_protection to False, so we don't
need to explicitly send `False` when the user hasn't specified a
termination protection level. Before this patch, the below pair of tasks
would:

1. Create an instance (enabling termination_protection)
2. Restart that instance (disabling termination_protection)

Now, the default None value would prevent the restart task from
disabling termination_protection.

```
# Jinja2 expressions that start a YAML value must be quoted.
- name: make an EC2 instance
  ec2:
    vpc_subnet_id: "{{ subnet }}"
    instance_type: t2.micro
    termination_protection: yes
    exact_count: 1
    count_tag:
      Name: TestInstance
    instance_tags:
      Name: TestInstance
    group_id: "{{ group }}"
    image: ami-7172b611
    wait: yes
- name: restart a protected EC2 instance
  ec2:
    vpc_subnet_id: "{{ subnet }}"
    state: restarted
    instance_tags:
      Name: TestInstance
    group_id: "{{ group }}"
    image: ami-7172b611
    wait: yes
```
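
The None-versus-False default described in that commit message can be sketched roughly as follows. This is an illustration under assumptions, not the module's code: `build_request_params` and its dictionary keys are hypothetical names.

```python
def build_request_params(instance_type, termination_protection=None):
    """Build request parameters, omitting options the user never specified."""
    params = {"instance_type": instance_type}
    if termination_protection is not None:
        # Only send the flag when the user explicitly chose a value, so an
        # unspecified option cannot overwrite what is already set remotely.
        params["termination_protection"] = bool(termination_protection)
    return params


print(build_request_params("t2.micro"))
print(build_request_params("t2.micro", termination_protection=True))
print(build_request_params("t2.micro", termination_protection=False))
```

Using None as the default gives three distinguishable cases (unspecified, explicitly on, explicitly off), which a False default cannot express.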
@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 6, 2018
@ansible ansible locked and limited conversation to collaborators Apr 24, 2019