New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix several issues with the autodraining lambda #23
Conversation
This patch makes the following changes that address several issues and cleans up the existing Lambda function: - Previously, the old Lambda function was included as a separate 8.1 MB ZIP file that needed to be stored at S3 and managed separately from the rest of your ECS cluster. Python code in AWS Lambda no longer needs to bundle all of its dependencies . With all of the refactoring above, the new Python code is small enough that it is embedded directly in the CloudFormation template to reduce external dependencies. This will make it easy to make changes to this code on a branch and test it against a single ECS cluster. - The AWS code discovers the ECS cluster name by parsing the user data. This will cause issues if the EC2 instances need to be bootstrapped with other items through the user data script. - The AWS code can post messages to the wrong SNS topic when retrying. It looks for the first SNS topic in the account that has a lambda function subscribed to it and posts the retry message to that topic. - The AWS code does not do any kind of pagination against the ECS API when reading the list of EC2 instances. So if it couldn't find the instance ID that was about to be terminated on the first page, then the instance was not set to DRAINING and the end users would see 50X messages when the operation timed out and autoscaling killed the instance. - The retry logic did not put in any kind of delay in place when retrying. The Lambda function would be invoked about 5-10 times a second, and each Lambda function invocation would probably make close to a dozen AWS API calls. A 5 second delay between each retry was introduced. - There was a large amount of unused code and variables in the in the AWS implementation. - Converted the code from Python 2 to 3.
within the CloudFormation template.
Totally legit, great work, thanks @masneyb 👍 |
Thanks! Here is a blog post that describes how we are using ECS at my employer: https://techblog.realtor.com/a-better-ecs/. It includes a CloudFormation template with the corrected autodraining Lambda. Brian |
Calling time.sleep(5) does not stop the execution environment. You will still pay for the duration of the execution environment, from the time the function is invoked until the time the function exits. |
That is correct. The sleep time is to avoid the API throttling errors
that we were encountering in our accounts (even beyond the autodraining
lambdas).
According to the AWS pricing page, the 128MB Lambdas with 128MB of RAM
get 3,200,000 free seconds per month. Beyond that, each 5 second pause
will cost cost $0.000010400. If I recall correctly, the EC2 scale down
timeout is set to 15 minutes in the template.
Brian
…On Wed, Jul 11, 2018 at 11:28:31AM -0700, Leonid wrote:
Calling time.sleep(5) does not stop the execution environment. You will still pay for the duration of the execution environment, from the time the function is invoked until the time the function exits.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#23 (comment)
|
These changes look really great. I don't have write access right now, but will try to track down someone who can merge this. Edit: @mperi is looking at it and testing before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masneyb : Great simplication and enhancements. Line 425 in ecs.yaml needs to reference CFN clustername instead of stack name. Rest all looks good testing wise. Thank you!
@mperi : I corrected the ECS cluster name. Let me know if you need any other changes. Brian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Brian. Tested and merging
OK, thanks. Let me know if I need to do anything else to get this merged. Brian |
Merging post testing. Thank you for the changes in PR. |
This patch makes the following changes that address several issues
and cleans up the existing Lambda function:
ZIP file that needed to be stored at S3 and managed separately from
the rest of your ECS cluster. Python code in AWS Lambda no longer
needs to bundle all of its dependencies . With all of the refactoring
above, the new Python code is small enough that it is embedded
directly in the CloudFormation template to reduce external
dependencies. This will make it easy to make changes to this code on
a branch and test it against a single ECS cluster.
This will cause issues if the EC2 instances need to be bootstrapped
with other items through the user data script.
It looks for the first SNS topic in the account that has a lambda
function subscribed to it and posts the retry message to that topic.
when reading the list of EC2 instances. So if it couldn't find the
instance ID that was about to be terminated on the first page, then
the instance was not set to DRAINING and the end users would see 50X
messages when the operation timed out and autoscaling killed the
instance.
retrying. The Lambda function would be invoked about 5-10 times a
second, and each Lambda function invocation would probably
make close to a dozen AWS API calls. A 5 second delay between each
retry was introduced.
AWS implementation.