I am in the process of testing an update to the latest Amazon ECS-optimized AMI (amzn-ami-2016.09.d-amazon-ecs-optimized).
Our current ECS instances are running amzn-ami-2015.09.g-amazon-ecs-optimized, which at launch time pulled the following stack:
- Docker: 1.9.1
- ECS Agent: 1.8.2
I don't think it's a good idea to simply update the launch configuration with the new AMI and hope for the best. What if things fail under load? What if we discover a bug in the new AMI/Docker/Agent combination while running our containers? These are all real possibilities, and we need to mitigate the risk by preserving our old instances while the new instances burn in under production load. Once we are confident, we can terminate the old instances.
I can't figure out how to do this. Here's what I tried:
- I updated the launch configuration for the Auto Scaling Group and doubled the number of instances in it. End result: 4 instances with the OLD AMI and 4 instances with the NEW AMI. Good!
- I then updated the ECS Service and increased its number of tasks from 4 to 8. End result: 4 new tasks were started on the NEW AMI Instances, and the 4 original tasks are still running on the OLD AMI Instances. Good!
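The two scale-out steps above map onto two API calls. A minimal Python sketch, assuming boto3-style Auto Scaling and ECS clients are passed in; the function name and the group, launch configuration, cluster, and service names are hypothetical, and the group's MaxSize is assumed to already allow the doubled capacity:

```python
def scale_out_for_burn_in(autoscaling, ecs, asg_name, new_launch_config,
                          instance_count, cluster, service, task_count):
    """Hypothetical helper: switch the ASG to the new-AMI launch configuration,
    double its capacity, then double the ECS service's desired task count."""
    # Instances launched from now on use the new launch configuration;
    # the existing old-AMI instances keep running untouched.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchConfigurationName=new_launch_config,
        DesiredCapacity=instance_count * 2,
    )
    # The service scheduler places the extra tasks wherever resources allow.
    # In practice you would wait for the new instances to register first;
    # nothing here guarantees the new tasks land on the new instances.
    ecs.update_service(
        cluster=cluster,
        service=service,
        desiredCount=task_count * 2,
    )

# Usage (hypothetical names; requires boto3 and AWS credentials):
# import boto3
# scale_out_for_burn_in(boto3.client("autoscaling"), boto3.client("ecs"),
#                       "my-asg", "my-lc-new-ami", 4, "my-cluster", "my-service", 4)
```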
All good at this point. Next, I need to stop the tasks on the 4 OLD AMI Instances and somehow keep these OLD AMI Instances in reserve while we burn in the 4 NEW AMI Instances. Here's what I tried:
- I set the status of the 4 OLD AMI Instances to "Standby" (in the ASG). I was expecting the ECS Agent on these OLD AMI Instances to terminate all running ECS tasks. No dice!
- I then reduced the ECS Service task count from 8 to 4, hoping that ECS would terminate the tasks on the OLD AMI Instances. No dice! It terminated tasks on random instances, mixing NEW and OLD in the process.
- I then decided to help ECS along and manually stopped the running tasks on the OLD AMI Instances (one at a time), hoping that ECS would NOT re-launch them on the OLD AMI Instances. No dice: it still managed to launch some tasks on the OLD AMI Instances.
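The manual step of stopping tasks on the old instances can be scripted. A minimal Python sketch, assuming a boto3-style ECS client is passed in and that the cluster query language filter on the built-in `ecs.ami-id` attribute is available; the function names, cluster name, and AMI ID are hypothetical. Exactly as observed above, this does nothing to prevent the service scheduler from placing replacement tasks back onto those same instances:

```python
def stop_tasks_on_ami(ecs, cluster, old_ami_id, reason="Moving off old AMI"):
    """Hypothetical helper: stop every task running on container instances
    launched from old_ami_id. Returns the stopped task ARNs."""
    # Filter container instances on the built-in ecs.ami-id attribute.
    expr = f"attribute:ecs.ami-id == {old_ami_id}"
    instance_arns = _paginate(ecs.list_container_instances, "containerInstanceArns",
                              cluster=cluster, filter=expr)
    stopped = []
    for instance_arn in instance_arns:
        task_arns = _paginate(ecs.list_tasks, "taskArns",
                              cluster=cluster, containerInstance=instance_arn)
        for task_arn in task_arns:
            ecs.stop_task(cluster=cluster, task=task_arn, reason=reason)
            stopped.append(task_arn)
    return stopped


def _paginate(call, key, **kwargs):
    """Follow nextToken until the listing is exhausted."""
    items, token = [], None
    while True:
        page = call(**kwargs, nextToken=token) if token else call(**kwargs)
        items.extend(page.get(key, []))
        token = page.get("nextToken")
        if not token:
            return items
```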
At this point I am lost. Is this even possible?
One option I am considering is using task placement constraints, but I am hoping someone here has dealt with this basic need and can share their ideas with me.
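A placement constraint along those lines might look like the sketch below. It assumes the built-in `ecs.ami-id` container instance attribute and a `memberOf` expression in the ECS cluster query language; the function name, `ami-OLD`, and the service names are hypothetical placeholders. Depending on the API version, the constraint may need to be supplied when the service is created, or in the task definition (which accepts only `memberOf`), rather than added to an existing service:

```python
def exclude_ami_constraint(old_ami_id):
    """Hypothetical helper: build a memberOf placement constraint that keeps
    tasks off container instances launched from old_ami_id."""
    return [{
        "type": "memberOf",
        "expression": f"attribute:ecs.ami-id != {old_ami_id}",
    }]

# Usage (hypothetical names; requires boto3 and AWS credentials):
# ecs.create_service(cluster="my-cluster", serviceName="my-service-v2",
#                    taskDefinition="my-task:1", desiredCount=4,
#                    placementConstraints=exclude_ami_constraint("ami-OLD"))
```

Removing the constraint later (or pointing it at the new AMI instead) would be the way to let tasks back onto the old instances if the burn-in fails.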
I feel we should have a way to mark ECS instances as Standby and have ECS not schedule any tasks on them for as long as that status is active. I don't think the "Deregister" functionality is sufficient here, because there is no way that I know of to bring deregistered instances back into service.
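For reference, the ECS API has since gained `UpdateContainerInstancesState`, which can set a container instance to `DRAINING` (the scheduler stops placing new tasks there and replaces service tasks elsewhere) and later back to `ACTIVE`, which is close to the reversible standby described above. A sketch assuming a boto3-style ECS client; the function name and ARNs are hypothetical, and the batching reflects the API's limit of 10 container instances per call:

```python
def set_instances_state(ecs, cluster, container_instance_arns, status):
    """Hypothetical helper: set container instances to 'DRAINING' (no new
    task placement) or back to 'ACTIVE'. Returns any per-instance failures."""
    if status not in ("ACTIVE", "DRAINING"):
        raise ValueError(f"unsupported status: {status}")
    failures = []
    # The API accepts at most 10 container instances per request.
    for i in range(0, len(container_instance_arns), 10):
        batch = container_instance_arns[i:i + 10]
        resp = ecs.update_container_instances_state(
            cluster=cluster, containerInstances=batch, status=status)
        failures.extend(resp.get("failures", []))
    return failures

# Usage (hypothetical ARNs): drain the old instances during burn-in,
# then flip them back if the new AMI misbehaves.
# set_instances_state(ecs, "my-cluster", old_arns, "DRAINING")
# set_instances_state(ecs, "my-cluster", old_arns, "ACTIVE")
```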
I also don't like that a specific version of Docker and the ECS Agent is not pinned to a specific version of the Amazon ECS-optimized AMI. If it were, this would not be an issue: I could always bring a known-good set of versions back into service. But as it is now, even if I used an older AMI, it would pull in the most recent versions of the ECS Agent and Docker on launch.
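Without pinning, it at least helps to see which agent and Docker versions each instance actually ended up with. A sketch assuming a boto3-style ECS client and the `versionInfo` field returned by `describe_container_instances`; the function name is hypothetical, and the batch size of 100 reflects that call's per-request limit:

```python
from collections import defaultdict


def versions_by_instance(ecs, cluster, container_instance_arns):
    """Hypothetical helper: group EC2 instance IDs by the
    (agentVersion, dockerVersion) pair each container instance reports."""
    groups = defaultdict(list)
    # DescribeContainerInstances accepts up to 100 instances per request.
    for i in range(0, len(container_instance_arns), 100):
        resp = ecs.describe_container_instances(
            cluster=cluster,
            containerInstances=container_instance_arns[i:i + 100])
        for ci in resp.get("containerInstances", []):
            info = ci.get("versionInfo", {})
            key = (info.get("agentVersion"), info.get("dockerVersion"))
            groups[key].append(ci.get("ec2InstanceId"))
    return dict(groups)
```

More than one key in the result means the fleet is running a mixed stack.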
Thank you!