Autoscaling is suspended when lifecycled crashes #344

Closed
lox opened this issue Sep 13, 2017 · 0 comments

lox commented Sep 13, 2017

Just encountered a gnarly issue. We had lots of queued jobs and the scale-up alarm was firing, but our stack wasn't scaling up. A single instance was sitting there waiting for its lifecycle hooks to finish.

When I logged into the instance, it looked like a network partition had made SQS inaccessible, so lifecycled crashed and respawned repeatedly until init gave up restarting it:

Sep 12 23:03:58 ip-10-0-1-244 lifecycled: time="2017-09-12T23:03:58Z" level=info msg="Looking up instance id from metadata service"
Sep 12 23:04:14 ip-10-0-1-244 lifecycled: time="2017-09-12T23:04:14Z" level=info msg="Listening for lifecycle notifications"
Sep 13 00:29:04 ip-10-0-1-244 lifecycled: time="2017-09-13T00:29:04Z" level=info msg="Failed to query metadata service" error="Get http://169.254.169.254/latest/meta-data/spot/termination-time: dial tcp 169.254.169.254:80: socket: too many open files"
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: lifecycled: error: RequestError: send request failed
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: caused by: Post https://sqs.us-east-1.amazonaws.com/: dial tcp: lookup sqs.us-east-1.amazonaws.com on 10.0.0.2:53: no such host, try --help
Sep 13 00:29:05 ip-10-0-1-244 init: lifecycled main process (2898) terminated with status 1
Sep 13 00:29:05 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: time="2017-09-13T00:29:05Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:29:05 ip-10-0-1-244 lifecycled: time="2017-09-13T00:29:05Z" level=info msg="Listening for lifecycle notifications"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: lifecycled: error: InternalError: We encountered an internal error. Please try again.
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: #011status code: 500, request id: 2a4d207a-c108-5ebd-b980-724840885298, try --help
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (7872) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 50aa8bc3-98cc-5cfd-8bd0-d9f119545ab9"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8189) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 657ba801-09b6-567f-bd10-fa33882ecb1a"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8200) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: e8ab9033-a92f-5b5a-a006-c20128b73523"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8211) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 9a1103c2-558e-52fe-81d1-51f0d416306e"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8222) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 6ec31b6e-20cc-5e66-b234-495ca15424f9"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8233) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:08 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:08Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 14c60a9e-1f48-580c-9652-58ac9ed0c8cf"
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process (8244) terminated with status 1
Sep 13 00:44:08 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 0c3a07ac-28be-53e5-af52-559ea927b5e3"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8255) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 894cb688-d7f1-5b32-8230-e4ba5668ff61"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8266) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: 9eaaf3c3-df2d-524b-8087-a1f15eb46a57"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8277) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process ended, respawning
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=info msg="Looking up instance id from metadata service"
Sep 13 00:44:09 ip-10-0-1-244 lifecycled: time="2017-09-13T00:44:09Z" level=fatal msg="AWS.SimpleQueueService.QueueDeletedRecently: You must wait 60 seconds after deleting a queue before you can create another with the same name.\n\tstatus code: 400, request id: fdd2e998-a978-5ecf-9ccc-bbaa28e1b573"
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled main process (8288) terminated with status 1
Sep 13 00:44:09 ip-10-0-1-244 init: lifecycled respawning too fast, stopped