fix several issues with the autodraining lambda #23

Merged
merged 3 commits into from Aug 1, 2018
16 changes: 2 additions & 14 deletions README.md
@@ -6,15 +6,10 @@ There are times when EC2 instances need to be removed from the cluster, for exam


# Overview of steps
1. Download index.zip from this repository
1. Download the CloudFormation template

2. Upload the downloaded index.zip containing Lambda code index.py to [Your_AWS_Account_S3_Bucket]
2. Launch the CloudFormation template that creates the following AWS resources:

3. Download the CloudFormation template

4. Launch the CloudFormation template that creates the following AWS resources:

* CloudFormation will require S3 bucket name as one of the parameters you created in Step 2 above.
* The VPC and associated network elements (subnets, security groups, route table, etc)
* ECS Cluster, ECS service, a sample ECS task definition
* Auto scaling group with two EC2 instances and a termination lifecycle hook
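
With the S3 upload steps removed, launching the template comes down to a single CloudFormation call. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names, the stack name, and the parameter values are hypothetical; `EcsAmiId` is the only parameter name visible in this diff. `CAPABILITY_IAM` is included because the template creates an IAM role for the Lambda function; `CAPABILITY_NAMED_IAM` would be needed instead if the template assigns explicit names to its IAM resources.

```python
def build_create_stack_kwargs(stack_name, template_body, parameters):
    """Build keyword arguments for CloudFormation's create_stack call."""
    return {
        'StackName': stack_name,
        'TemplateBody': template_body,
        # CloudFormation expects a list of ParameterKey/ParameterValue pairs.
        'Parameters': [
            {'ParameterKey': key, 'ParameterValue': value}
            for key, value in sorted(parameters.items())
        ],
        # Acknowledge that the stack creates IAM resources (the Lambda role).
        'Capabilities': ['CAPABILITY_IAM'],
    }


def launch_stack(stack_name, template_path, parameters):
    """Create the stack from a local copy of cform/ecs.yaml (hypothetical helper)."""
    import boto3  # imported lazily so the builder above works without boto3

    with open(template_path) as handle:
        template_body = handle.read()
    cloudformation = boto3.client('cloudformation')
    return cloudformation.create_stack(
        **build_create_stack_kwargs(stack_name, template_body, parameters))
```

A call might look like `launch_stack('ecs-cid-demo', 'cform/ecs.yaml', {'EcsAmiId': 'ami-...'})`, with the remaining required parameters filled in for your account.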
@@ -26,9 +21,6 @@ For the full solution overview visit [Blog link](https://aws.amazon.com/blogs/co
## CloudFormation template
- cform/ecs.yaml

## Solution code
- code/index.py

***

Copyright 2016-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
@@ -38,7 +30,3 @@ Licensed under the Apache License, Version 2.0 (the "License"). You may not use
http://aws.amazon.com/apache2.0/

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

## PLEASE NOTE

Lambda version of boto3 DOES NOT have the latest version of boto3 that (as of February 10th). This leads to user experiencing issue similar to one reported in https://github.com/awslabs/ecs-cid-sample/issues/2 . Please use the index.zip package provided, or if you are customizing code, package up the latest boto3 in the Lambda zip file.
107 changes: 79 additions & 28 deletions cform/ecs.yaml
@@ -15,10 +15,6 @@ Parameters:
    Description: >
      REQUIRED - Specifies the name of an existing Amazon EC2 key pair
      to enable SSH access to the EC2 instances in your cluster.
  S3BucketName:
    Type: String
    Description: >
      REQUIRED - Specifies the name of your AWS account S3 bucket in which the index.zip file is stored. index.zip contains the Python Lambda code index.py; Please download this from https://github.com/awslabs/ecs-cid-sample/tree/master/code and upload to your S3 bucket.
  EcsAmiId:
    Type: String
    Description: REQUIRED - Default ECS Optimized AMI for us-west-2 region. Please change it to reflect your regions' latest ECS AMI-ID
@@ -261,10 +257,7 @@ Resources:
      NotificationConfigurations:
      - TopicARN: !Ref ASGSNSTopic
        NotificationTypes:
        - autoscaling:EC2_INSTANCE_LAUNCH
        - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
        - autoscaling:EC2_INSTANCE_TERMINATE
        - autoscaling:EC2_INSTANCE_TERMINATE_ERROR
      Tags:
        -
          Key: Name
@@ -378,26 +371,16 @@ Resources:
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
              - Effect: "Allow"
                Action:
                  - autoscaling:CompleteLifecycleAction
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - ec2:DescribeInstances
                  - ec2:DescribeInstanceAttribute
                  - ec2:DescribeInstanceStatus
                  - ec2:DescribeHosts
                  - ecs:ListContainerInstances
                  - ecs:SubmitContainerStateChange
                  - ecs:SubmitTaskStateChange
                  - ecs:DescribeContainerInstances
                  - ecs:UpdateContainerInstancesState
                  - ecs:ListTasks
                  - ecs:DescribeTasks
                  - sns:Publish
                  - sns:ListSubscriptions
                Resource: "*"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
@@ -426,17 +409,85 @@ Resources:
  LambdaFunctionForASG:
    Type: "AWS::Lambda::Function"
    Properties:
      Description: Gracefully drain ECS tasks from EC2 instances before the instances are
        terminated by autoscaling.
      Handler: index.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: python3.6
      MemorySize: 128
      Timeout: 60
      Code:
        S3Bucket: !Ref S3BucketName
        S3Key: "index.zip"
      Description: Lambda code for the autoscaling hook triggers invoked when autoscaling events of launching and terminating instance occur
      Handler: "index.lambda_handler"
      Role:
        Fn::GetAtt:
          - "LambdaExecutionRole"
          - "Arn"
      Runtime: "python2.7"
      Timeout: "300"
        ZipFile: !Sub |
          import json
          import time
          import boto3

          CLUSTER = '${EcsClusterName}'
          REGION = '${AWS::Region}'

          ECS = boto3.client('ecs', region_name=REGION)
          ASG = boto3.client('autoscaling', region_name=REGION)
          SNS = boto3.client('sns', region_name=REGION)

          def find_ecs_instance_info(instance_id):
              paginator = ECS.get_paginator('list_container_instances')
              for list_resp in paginator.paginate(cluster=CLUSTER):
                  arns = list_resp['containerInstanceArns']
                  desc_resp = ECS.describe_container_instances(cluster=CLUSTER,
                                                               containerInstances=arns)
                  for container_instance in desc_resp['containerInstances']:
                      if container_instance['ec2InstanceId'] != instance_id:
                          continue

                      print('Found instance: id=%s, arn=%s, status=%s, runningTasksCount=%s' %
                            (instance_id, container_instance['containerInstanceArn'],
                             container_instance['status'], container_instance['runningTasksCount']))

                      return (container_instance['containerInstanceArn'],
                              container_instance['status'], container_instance['runningTasksCount'])

              return None, None, 0

          def instance_has_running_tasks(instance_id):
              (instance_arn, container_status, running_tasks) = find_ecs_instance_info(instance_id)
              if instance_arn is None:
                  print('Could not find instance ID %s. Letting autoscaling kill the instance.' %
                        (instance_id))
                  return False

              if container_status != 'DRAINING':
                  print('Setting container instance %s (%s) to DRAINING' %
                        (instance_id, instance_arn))
                  ECS.update_container_instances_state(cluster=CLUSTER,
                                                       containerInstances=[instance_arn],
                                                       status='DRAINING')

              return running_tasks > 0

          def lambda_handler(event, context):
              msg = json.loads(event['Records'][0]['Sns']['Message'])

              if 'LifecycleTransition' not in msg.keys() or \
                 msg['LifecycleTransition'].find('autoscaling:EC2_INSTANCE_TERMINATING') == -1:
                  print('Exiting since the lifecycle transition is not EC2_INSTANCE_TERMINATING.')
                  return

              if instance_has_running_tasks(msg['EC2InstanceId']):
                  print('Tasks are still running on instance %s; posting msg to SNS topic %s' %
                        (msg['EC2InstanceId'], event['Records'][0]['Sns']['TopicArn']))
                  time.sleep(5)
                  sns_resp = SNS.publish(TopicArn=event['Records'][0]['Sns']['TopicArn'],
                                         Message=json.dumps(msg),
                                         Subject='Publishing SNS msg to invoke Lambda again.')
                  print('Posted msg %s to SNS topic.' % (sns_resp['MessageId']))
              else:
                  print('No tasks are running on instance %s; setting lifecycle to complete' %
                        (msg['EC2InstanceId']))

                  ASG.complete_lifecycle_action(LifecycleHookName=msg['LifecycleHookName'],
                                                AutoScalingGroupName=msg['AutoScalingGroupName'],
                                                LifecycleActionResult='CONTINUE',
                                                InstanceId=msg['EC2InstanceId'])
  LambdaInvokePermission:
    Type: "AWS::Lambda::Permission"
    Properties:
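
The inline handler acts only on SNS records whose message carries a `LifecycleTransition` of `autoscaling:EC2_INSTANCE_TERMINATING`; anything else, such as the test notification Auto Scaling sends when a hook is created, is ignored. The sketch below isolates that filter so it can be checked without AWS access. The function names and all sample values are hypothetical; only the key names are taken from the handler in this diff.

```python
import json

TERMINATING = 'autoscaling:EC2_INSTANCE_TERMINATING'


def extract_terminating_message(event):
    """Return the lifecycle message for a terminating transition, else None."""
    msg = json.loads(event['Records'][0]['Sns']['Message'])
    if TERMINATING not in msg.get('LifecycleTransition', ''):
        return None
    return msg


def make_sns_event(message, topic_arn):
    """Wrap a message in the minimal SNS-to-Lambda envelope the handler reads."""
    return {'Records': [{'Sns': {'TopicArn': topic_arn,
                                 'Message': json.dumps(message)}}]}


# Hypothetical sample values; only the key names come from the handler.
hook_message = {
    'LifecycleTransition': TERMINATING,
    'LifecycleHookName': 'example-hook',
    'AutoScalingGroupName': 'example-asg',
    'EC2InstanceId': 'i-0123456789abcdef0',
}
topic = 'arn:aws:sns:us-west-2:123456789012:example-topic'

terminating = extract_terminating_message(make_sns_event(hook_message, topic))
ignored = extract_terminating_message(
    make_sns_event({'Event': 'autoscaling:TEST_NOTIFICATION'}, topic))
print(terminating['EC2InstanceId'], ignored)
```

Because the handler republishes the same message to the same topic while tasks are still running, every retry passes through this filter again with an identical payload.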
195 changes: 0 additions & 195 deletions code/index.py

This file was deleted.
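
With `code/index.py` removed, the handler exists only inside the template and cannot be imported by a unit test. One possible way to keep the lookup logic testable is to pass the ECS client in as an argument, sketched below under stated assumptions. `find_container_instance`, `FakePaginator`, and `FakeECS` are hypothetical names, not part of this PR. The sketch also skips empty pages before calling `describe_container_instances`, on the assumption that ECS rejects an empty `containerInstances` list; that behavior is worth verifying against a cluster with no registered instances.

```python
def find_container_instance(ecs, cluster, instance_id):
    """Return (arn, status, runningTasksCount), or (None, None, 0) if not found."""
    paginator = ecs.get_paginator('list_container_instances')
    for page in paginator.paginate(cluster=cluster):
        arns = page['containerInstanceArns']
        if not arns:
            # Assumption: describing an empty list is an error, so skip the page.
            continue
        described = ecs.describe_container_instances(
            cluster=cluster, containerInstances=arns)
        for item in described['containerInstances']:
            if item['ec2InstanceId'] == instance_id:
                return (item['containerInstanceArn'], item['status'],
                        item['runningTasksCount'])
    return None, None, 0


class FakePaginator:
    """Hypothetical stand-in that yields pre-built pages."""

    def __init__(self, pages):
        self.pages = pages

    def paginate(self, **kwargs):
        return iter(self.pages)


class FakeECS:
    """Hypothetical in-memory stand-in for the boto3 ECS client."""

    def __init__(self, instances, page_size=2):
        self.instances = instances
        self.page_size = page_size

    def get_paginator(self, operation_name):
        assert operation_name == 'list_container_instances'
        arns = [i['containerInstanceArn'] for i in self.instances]
        pages = [{'containerInstanceArns': arns[n:n + self.page_size]}
                 for n in range(0, len(arns), self.page_size)]
        # An empty cluster still yields one page with an empty list.
        return FakePaginator(pages or [{'containerInstanceArns': []}])

    def describe_container_instances(self, cluster, containerInstances):
        if not containerInstances:
            raise ValueError('containerInstances must not be empty')
        return {'containerInstances': [
            i for i in self.instances
            if i['containerInstanceArn'] in containerInstances]}


fleet = [
    {'containerInstanceArn': 'arn:a', 'ec2InstanceId': 'i-a',
     'status': 'ACTIVE', 'runningTasksCount': 2},
    {'containerInstanceArn': 'arn:b', 'ec2InstanceId': 'i-b',
     'status': 'ACTIVE', 'runningTasksCount': 1},
    {'containerInstanceArn': 'arn:c', 'ec2InstanceId': 'i-c',
     'status': 'DRAINING', 'runningTasksCount': 0},
]
print(find_container_instance(FakeECS(fleet), 'demo', 'i-c'))
print(find_container_instance(FakeECS([]), 'demo', 'i-c'))
```

The fake splits the fleet into pages of two, so the lookup for `i-c` has to cross a page boundary, which mirrors the paginated path in the inline handler.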

Binary file removed code/index.zip
Binary file not shown.