AWS::CodeDeploy::DeploymentGroup-DeploymentType Support BLUE_GREEN also for ECS/Fargate #37

Closed
pgarbe opened this issue Jul 31, 2019 · 47 comments
Labels: dev tools (CodeStar, CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Cloud9, X-Ray), enhancement (New feature or request)

@pgarbe

pgarbe commented Jul 31, 2019

  1. Title -> AWS::CodeDeploy::DeploymentGroup-DeploymentType
  2. Scope of request -> AWS::CodeDeploy::DeploymentGroup-DeploymentType supports BLUE_GREEN for Lambda but not for ECS/Fargate.
  3. Expected behavior -> BlueGreen deployments for ECS/Fargate services.
  4. Links to existing API doc (optional) -> Announcement: https://aws.amazon.com/blogs/devops/use-aws-codedeploy-to-implement-blue-green-deployments-for-aws-fargate-and-amazon-ecs/
  5. Category tag (optional) -> Compute
  6. Any additional context (optional) -> Same request as in Containers Roadmap: [ECS] [CodeDeploy] [CloudFormation]: CloudFormation support for BLUE/GREEN deployments on ECS aws/containers-roadmap#130
@TheDanBlanco added the dev tools (CodeStar, CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Cloud9, X-Ray) label Jul 31, 2019
@jk2l

jk2l commented Jul 31, 2019

I don't think AWS Blue/Green is compatible with CloudFormation. When CodeDeploy deploys in Blue/Green mode, it creates a new Auto Scaling group and, on completion, deletes the old Auto Scaling group. It is a hard delete and unrecoverable.

This means that managing an Auto Scaling group with CloudFormation alongside CodeDeploy Blue/Green does not work well: when you try to update your Auto Scaling group (e.g. update the LaunchConfiguration), CloudFormation will say the Auto Scaling group does not exist and simply fail.

@ghost

ghost commented Aug 2, 2019

This has 62 likes on the Containers Roadmap (aws/containers-roadmap#130), and it has been a big issue for mission-critical deployments on ECS with CloudFormation. This can be done with other non-AWS toolsets.

@JoeAlamo

Super excited you are working on this 😃

@richardmaltais

Way to go @luiseduardocolon !
I'm a bit blocked on this in my PoC: a fully automated CodePipeline pipeline, defined entirely in CloudFormation, targeting ECS Fargate across multiple environments.

@miguelsanchez-eb

Looking forward to this so much. Please, let us know as soon as you release something.

@clareliguori

You can now use CloudFormation to perform Amazon ECS blue/green and canary deployments through AWS CodeDeploy! To learn more visit the announcement and the user guide.

Note: This issue and #56 still call out valid coverage gaps for using CloudFormation to create a CodeDeploy deployment group, where that deployment group will then be used for doing ECS blue-green deployments directly with CodeDeploy outside of CloudFormation (as described in https://aws.amazon.com/blogs/devops/use-aws-codedeploy-to-implement-blue-green-deployments-for-aws-fargate-and-amazon-ecs/).

The new feature released today is used for doing an ECS blue-green deployment during a CloudFormation stack update, orchestrated by CodeDeploy; it is not for doing an ECS blue-green deployment outside of CloudFormation. It doesn't require an AWS::CodeDeploy::DeploymentGroup resource in the template; instead it uses a new 'Hooks' section in the template for specifying deployment settings.

@konstantinj

@clareliguori That means CloudFormation can now do with ECS exactly what CodeDeploy was already capable of but CloudFormation is not using any CodeDeploy resources for that?

@clareliguori

It still uses CodeDeploy under the hood and follows the same steps as direct CodeDeploy blue-green deployments. It just doesn't require any explicit CodeDeploy resources to be created in the template; instead it is configured via the Hooks and Transform sections. There are some slight differences between the two 'modes': for example, instead of configuring alarms to watch for rollback in CodeDeploy, you configure them on your CloudFormation stack and CFN monitors and rolls back the template.

There is an example template here:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/blue-green.html#blue-green-template-example
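
A minimal sketch of the Transform and Hooks sections described above, modeled on the linked user-guide example. The logical IDs (ECSDemoService, BlueTaskDefinition, ALBListenerProdTraffic, and so on) are placeholders that would have to match resources in your own template; the Resources section (service, task set, target groups, listener) is omitted here.

```yaml
# Both sections are needed: the transform plus the hook.
Transform:
  - 'AWS::CodeDeployBlueGreen'

Hooks:
  CodeDeployBlueGreenHook:
    Type: 'AWS::CodeDeploy::BlueGreen'
    Properties:
      TrafficRoutingConfig:
        Type: AllAtOnce
      Applications:
        - Target:
            Type: 'AWS::ECS::Service'
            LogicalID: ECSDemoService          # placeholder logical ID
          ECSAttributes:
            TaskDefinitions:
              - BlueTaskDefinition             # placeholder logical IDs
              - GreenTaskDefinition
            TaskSets:
              - BlueTaskSet
              - GreenTaskSet
            TrafficRouting:
              ProdTrafficRoute:
                Type: 'AWS::ElasticLoadBalancingV2::Listener'
                LogicalID: ALBListenerProdTraffic
              TargetGroups:
                - ALBTargetGroupBlue
                - ALBTargetGroupGreen
```

Note that there is no AWS::CodeDeploy::DeploymentGroup resource anywhere in this sketch; the deployment settings live entirely in the hook.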

@pgarbe
Author

pgarbe commented May 20, 2020

@clareliguori Can you explain what the new 'Hooks' section is doing? How is it different to 'Transforms'?

@mattias-fjellstrom

@clareliguori There are no alarms defined in the example you linked to. Are you saying that if I define an alarm in the same stack as the blue/green resources and then perform a stack update it will work similar to when I tell CodeDeploy to watch an alarm during a deployment?

@mattias-fjellstrom

Has anyone successfully tested this using an NLB instead of an ALB?

I've adjusted the example provided in https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/blue-green.html#blue-green-template-example to use an NLB. The template I use is available here https://github.com/mattias-fjellstrom/ecs-blue-green/blob/master/templates/ecs-blue-green-nlb.yml
I am able to create the stack successfully, but when I perform an update (in this case modifying task cpu/memory resources to force a replacement of the task definition) I get an error saying 'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: Internal failure.. Not much to go on.

@clareliguori

@mattias-fjellstrom re: alarms: No, alarms defined in the stack template are not automatically monitored. CloudFormation has a 'rollback triggers' feature that lets you configure a list of pre-existing alarms for CFN to watch; if one of them fires, CFN rolls back the stack update:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-rollback-triggers.html

I have asked the team to take a look at the Internal failure you're seeing
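
As a rough illustration of the rollback triggers mentioned above, here is a sketch of an input file for the update-stack call, assuming AWS CLI v2 and its --cli-input-yaml option. The file name, stack name, template URL, and alarm ARN are all placeholders, and the alarm is assumed to exist before the update starts.

```yaml
# rollback-config.yaml (hypothetical file name)
StackName: ecs-blue-green-demo                     # placeholder stack name
TemplateURL: https://s3.amazonaws.com/EXAMPLE-BUCKET/ecs-blue-green.yml  # placeholder URL
RollbackConfiguration:
  # How long CloudFormation keeps watching the alarms after the update deploys
  MonitoringTimeInMinutes: 10
  RollbackTriggers:
    # Pre-existing CloudWatch alarm; placeholder ARN
    - Arn: arn:aws:cloudwatch:us-east-1:111122223333:alarm:GreenTargets5xx
      Type: AWS::CloudWatch::Alarm
```

It would be passed along the lines of `aws cloudformation update-stack --cli-input-yaml file://rollback-config.yaml`.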

@vladiscovery

vladiscovery commented May 20, 2020

> You can now use CloudFormation to perform Amazon ECS blue/green and canary deployments through AWS CodeDeploy! To learn more visit the announcement and the user guide.

We followed the example provided. The stack was created successfully, but when we tried to update the task definition (CPU/memory) we got an error:

'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: The TaskDefinition logical Id [BlueTaskDefinition] is the same between initial and final template, CodeDeploy can't perform BlueGreen style update properly

@KiamarzFallahi

@mattias-fjellstrom Re: NLB vs ALB.

Yes, NLB is supported, but only with the AllAtOnce traffic routing config. The template you provided is set up properly; I don't see any apparent issues.

I can look into the "Internal Failure" you are facing if you can DM me your CodeDeploy deployment ID, Account ID and CloudFormation Stack ID.
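
To make the AllAtOnce restriction above concrete, here is a sketch of the TrafficRoutingConfig fragment that sits under the hook's Properties. The canary values in the commented alternative are illustrative only and, per the comment above, would apply to an ALB rather than an NLB.

```yaml
# Fragment of Hooks -> CodeDeployBlueGreenHook -> Properties

# With an NLB: shift all traffic at once
TrafficRoutingConfig:
  Type: AllAtOnce

# With an ALB, a canary-style alternative would look like this instead
# (illustrative values):
#
# TrafficRoutingConfig:
#   Type: TimeBasedCanary
#   TimeBasedCanary:
#     StepPercentage: 15
#     BakeTimeMins: 5
```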

@vladiscovery

vladiscovery commented May 21, 2020

@KiamarzFallahi @clareliguori
Updates to TaskDefinition are failing the CloudFormation stack update. Here is another topic about it on the AWS forum.
I found that, when using the example from the official user guide, only updates to TaskSet trigger a blue/green deployment without a CloudFormation error. But that is effectively useless, as we need to trigger a blue/green deployment after the TaskDefinition has changed (by changing the image, CPU, memory, environment variables, etc.).

@luiseduardocolon moved this from "We're working on it" to "Shipped" in coverage-roadmap May 22, 2020
@luiseduardocolon
Contributor

Will close this issue, although we understand there might be specific use cases not covered here. Feel free to open new issues for those use cases specifically so we can track them individually, and we'll keep an eye out for them.

@anugarg07

anugarg07 commented May 27, 2020

> You can now use CloudFormation to perform Amazon ECS blue/green and canary deployments through AWS CodeDeploy! To learn more visit the announcement and the user guide.
>
> We followed the example provided. The stack was created successfully, but when we tried to update the task definition (CPU/memory) we got an error:
>
> 'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: The TaskDefinition logical Id [BlueTaskDefinition] is the same between initial and final template, CodeDeploy can't perform BlueGreen style update properly

Can you check whether you added the "Transform" section to your template, in addition to the "Hooks" section? As called out in the public docs, both of these sections are needed for CodeDeploy B/G deployments for ECS.

Add a reference to the AWS::CodeDeployBlueGreen transform to your template:

"Transform": [
    "AWS::CodeDeployBlueGreen"
],

Here is an example with full template - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/blue-green.html#blue-green-template-example

@anugarg07

> Updates to TaskDefinition are failing cloudformation stack update. Here is another topic about it on aws forum

It should work for TaskDefinition updates as well. Can you post the issue (if different from above) you are seeing?

@vladiscovery

I did use the same example you suggested. The Transform section is in the template. The Blue/Green deployment is triggered only by TaskDefinition image changes. Changes to task definition environment variables or to CPU/memory do not trigger the deployment, and the stack update fails with the error above.

@henrik-utter-wcar

> I can look into the "Internal Failure" you are facing if you can DM me your CodeDeploy deployment ID, Account ID and CloudFormation Stack ID.

I am experiencing the exact same thing when trying to use an NLB, i.e. error message 'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: Internal failure.
Did you get anywhere on this? @mattias-fjellstrom @KiamarzFallahi

@mattias-fjellstrom

@henrik-utter-wcar I actually sent the question to the AWS business support, the relevant part of the answer is this:

This behaviour can be resolved by providing a test listener and specifying it in the TestTrafficRoute section of the Blue/Green hook section of the CloudFormation template:

        TrafficRouting:
          ProdTrafficRoute:
            Type: 'AWS::ElasticLoadBalancingV2::Listener'
            LogicalID: NLBListenerProdTraffic
          TestTrafficRoute:
            Type: 'AWS::ElasticLoadBalancingV2::Listener'
            LogicalID: NLBListenertestTraffic

They say this is a workaround, and that the underlying issue will be resolved at some point (not able to give an estimate).

I tried the workaround, and it works. However, the behavior during the deployment is not very good: it seems you get downtime of one to several minutes after the target group is switched. Let me know if you see any different behavior; I only spent an hour investigating without getting anywhere.

@henrik-utter-wcar

Thanks @mattias-fjellstrom. I will give that a shot, but I assume I will see the same behavior then.
I have experienced similar downtime when doing Blue/Green deployments using CodeDeploy + NLB from the cli as well. ALB always works like a charm.

@mattias-fjellstrom

@henrik-utter-wcar I've also tried that with NLB and seen the exact same behavior. So it seems that no version of blue/green is compatible with an NLB. I really need to use an NLB due to having a setup like this:

API Gateway > VPC Link > NLB > ECS

I tried adding an ALB behind the NLB, i.e.

API Gateway > VPC Link > NLB > ALB > ECS

and that works fine with blue/green (since in that case the target does not change from the NLB's viewpoint; the target change happens between the ALB and ECS), but it feels like a hack that I shouldn't have to use, and it doubles the load balancer cost. I was really hoping this new blue/green feature would solve the underlying issue, but I realize the problem is with the NLB rather than something else.

@henrik-utter-wcar

@mattias-fjellstrom It seems like we are in the same boat, that is exactly the setup I'm struggling with as well. I also had high hopes for this feature, but I can also verify now that it is still the same issue with the NLB.
I have previously tried using BeforeAllowTraffic/AfterAllowTraffic hooks to control the behavior during deployment, but can't get that to work in a reliable way either. It seems we need to wait for better support in NLB for this to work properly.

@jmacgowa-ks

jmacgowa-ks commented Jun 4, 2020

I am not seeing much value in this "CF Macro" solution other than providing an unmanageable Blue/Green deployment option for a simple CloudFormation-managed ECS service that doesn't use CodePipeline.
Without the ability to resolve SSM parameters or import values in the CF template, this further limits its usefulness.

Limitations:

  • Can't update ECS service from outside of CloudFormation. I haven't tested if this impacts scaling, but it will break my lambda functions.
  • No visibility of this "CodeDeploy" variation from CodeDeploy.
  • No CodeDeploy style visibility or management of in-progress deployment.
  • Can't use CodePipeline/CodeDeploy once this is in place.
  • Can't use resolve:ssm or ImportValue within template.
  • Some ECS options, such as HealthCheckGracePeriodSeconds, are not available.

After simplifying my test CF template, I am still unable to successfully update the stack. I keep running into a useless "Failed to transform template" error. I am testing this with an ALB.
I want my day back.

@yubangxi

@henrik-utter-wcar @mattias-fjellstrom - can you provide more details about the downtime scenario when you are using NLB?

One thing we are aware of: when the "Green" ECS TaskSet is created with the "Green" TargetGroup, that TargetGroup has not yet been added to the NLB prod listener, so ECS relies on the NLB health check to determine the TaskSet's health. Creation of the "Green" ECS TaskSet is marked as complete once the TaskSet shows "Stable", and the stack update then moves forward. It's possible that when the "Green" TargetGroup is flipped to serve traffic, the "Green" ECS TaskSet is not ready and cannot pass the NLB health check. This is a known issue that ECS, ELB and CodeDeploy need to work together on to improve properly.

Because ALB supports weighted traffic shifting, we force a weight of 0 for the "Green" TargetGroup at the beginning of the deployment, so the "Green" ECS TaskSet does not show as "Stable" until the load balancer health check has passed. That's why ALB does not have the same issue.

For NLB, the workaround right now is to use a test listener or a CodeDeploy lifecycle hook to perform baking and testing, making sure the "Green" ECS TaskSet is ready to serve traffic before the traffic is actually flipped. This can minimize potential downtime.
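
A sketch of how the lifecycle hook variant of this workaround might be declared, assuming a Lambda function that probes the "Green" task set through the test listener. The function name ValidateGreenTaskSet is hypothetical, as are the listener logical IDs; the remaining hook properties are omitted.

```yaml
# Fragment of the Hooks section; other required properties omitted
Hooks:
  CodeDeployBlueGreenHook:
    Type: 'AWS::CodeDeploy::BlueGreen'
    Properties:
      TrafficRoutingConfig:
        Type: AllAtOnce
      LifecycleEventHooks:
        # Hypothetical Lambda function (name or ARN) that checks the Green
        # task set once test traffic is allowed, before prod traffic flips
        AfterAllowTestTraffic: ValidateGreenTaskSet
      Applications:
        - Target:
            Type: 'AWS::ECS::Service'
            LogicalID: ECSDemoService              # placeholder logical ID
          ECSAttributes:
            TrafficRouting:
              ProdTrafficRoute:
                Type: 'AWS::ElasticLoadBalancingV2::Listener'
                LogicalID: NLBListenerProdTraffic  # placeholder logical ID
              TestTrafficRoute:
                Type: 'AWS::ElasticLoadBalancingV2::Listener'
                LogicalID: NLBListenerTestTraffic  # placeholder logical ID
```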

@mattias-fjellstrom

@yubangxi I'll paste what I wrote to the AWS Support back in September 2019 when I noticed unusual behavior when using regular CodeDeploy blue/green deploy to ECS. The behavior is similar when using this new CloudFormation blue/green approach together with an NLB.

When testing the blue/green deployment with ECS/CodeDeploy in this setup [API Gateway > VPC Link > NLB > ECS] we noticed some strange behaviors. We have two listeners and two target groups. When we begin our deploy one of the target groups receives all traffic. CodeDeploy then starts up new ECS-tasks in our second target group and eventually all traffic is shifted to this target group (all at once). At that point we expected all traffic to reach the new tasks, but this is not the case. There is a long delay (1-2 minutes) before some of the traffic reaches the new target group. I say "some" because at that point the traffic seems to flip between the two target groups for a short while. A few seconds after that all traffic goes to the new target group. This is now 1.5-2.5 minutes after all traffic was supposed to only go to the new target group.

We also tried the rollback feature in CodeDeploy. After verifying that all traffic reaches the new targets, but before the deploy finished we initiated a rollback and expected all the traffic to go to the original target group with a maximum delay of a few seconds. Once again there is a 1-2 minute delay before the traffic stops reaching the new targets. But it gets worse, after that delay all requests are now failing with 500 Internal Server Error. This goes on for a short while (5-15 seconds) until the old target group receives all traffic again.

Everything I expect to work in the scenario described above works as intended if I use an ALB instead of an NLB.

I have not tested this new CloudFormation approach enough to give a detailed description, but in one of the tests I ran I saw the same behavior with the NLB where the traffic did not reach the new targets even though the traffic was supposed to have been shifted. When I stopped the deployment by cancelling the stack update I expected a rollback to occur, but at that point it seemed like my old targets had completely disappeared and after a while I received 500 responses, similar to what I saw when using the regular CodeDeploy blue/green approach.

To me it seems like there is something with the NLB that is not compatible with blue/green deployments like this. As I mentioned in a comment above I tried blue/green deployment in a setup where I had an ALB behind the NLB, and let the NLB just send traffic to the ALB, while the ALB was involved in the blue/green deployment. In that case it worked fine. I guess it was because from the NLB's perspective the targets never changed. It's almost like the NLB has a memory that takes 1-2 minutes to update.

@yubangxi

@mattias-fjellstrom thanks for the details. This is very good and helpful feedback. We will investigate on our side and post an update here if we find anything.

@avenging

> Has anyone successfully tested this using an NLB instead of an ALB?
>
> I've adjusted the example provided in https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/blue-green.html#blue-green-template-example to use an NLB. The template I use is available here https://github.com/mattias-fjellstrom/ecs-blue-green/blob/master/templates/ecs-blue-green-nlb.yml
> I am able to create the stack successfully, but when I perform an update (in this case modifying task cpu/memory resources to force a replacement of the task definition) I get an error saying 'CodeDeployBlueGreenHook' of type AWS::CodeDeploy::BlueGreen failed with message: Internal failure.. Not much to go on.

I am having the exact same issue with an NLB. Is NLB support pretty much alpha?

@anugarg07

anugarg07 commented Jun 19, 2020

> I did use the same example you suggested. Transform section is in the template. The Blue/Green deployment is triggering only with TaskDefinition image changes. Changes to task definition environment variables or to cpu/memory are not triggering the deployment and stack update failing with an error above.

@vladiscovery I verified that changing only CPU, only memory, or both properties on the “AWS::ECS::TaskDefinition” resource triggers B/G deployments, and they execute successfully without any issues, just like an “image” property update. I used “canary” style traffic shifting (the example linked in the docs uses “AllAtOnce”), but that should behave similarly. For your reference, here are sample templates where only the ‘Memory’ property is changed and a B/G deployment is triggered:

create-stack template : https://cfnda-datalake.s3.amazonaws.com/ecsbg/create-stack-canary-lc.yaml
update-stack-template: https://cfnda-datalake.s3.amazonaws.com/ecsbg/update-stack-canary-memonly-lc.yaml

Can you share the templates or snippets where you saw the error when you changed the CPU/memory property, so that we can debug the issue? Please share both the existing and the new templates.

@craigataws added this to the cov milestone Jul 21, 2020
@huynhphat22

@anugarg07 , I ran the example from the AWS doc, and changing the "Image" property of the container definition also triggers the error that @vladiscovery had...
[image]

@jveldboom

@anugarg07 Thanks for the update and the example templates. I was able to get those stacks up and modify the memory property on the AWS::ECS::TaskDefinition resource. My issue now is the inability to update the DesiredCount on the AWS::ECS::Service. Updating that from 1 to 3 in the CFN stack throws the following error:

Service arn:aws:ecs:us-east-1:xxxxxxxxx:service/ecs-deployments-blue-green-ECSDemoCluster-JWxadWwdKJVq/ecs-deployments-blue-green-ECSDemoService-1X0PWPCHDGZCT failed to stabilize due to task set computed desired count 1 does not match service desired count 3. 

I'm also unable to update that value manually through the console. Am I missing something here?

@flyinprogrammer

Tried to get the demo deployed and working today: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/blue-green.html

I can get the first set of resources deployed, but upon updating the image to a new version I receive the error: Template parameters modified by transform

I can't tell if this is "working as desired" but if it is, this is useless. It should also be noted that the YAML demo Hook is missing the ServiceRole key. Based on the other comments here it's clear that it will be at least 6 months before something is actually tested and repeatedly works. Good luck.

@yasaswy12

yasaswy12 commented Sep 23, 2020

@anugarg07 How do I achieve Service auto scaling with EXTERNAL deployment controller? The ECS user guide says AutoScaling is not supported with EXTERNAL controller. When I use CODE_DEPLOY as controller, the hooks section throws an error saying TaskSet is not usable with CODE_DEPLOY.

How do I solve this?

@ta-takeuchi

ta-takeuchi commented Oct 26, 2020

Failed with this error when there are multiple parameters.
Error: Template parameters modified by transform.

# example1 : success
Parameters:
  Env:
    Type: String
    AllowedValues:
      - prd
      - stg1

# example2 : failure
Parameters:
  Env:
    Type: String
    AllowedValues:
      - prd
      - stg1
  SSMActivate:
    Type: String

@TomLBarden

TomLBarden commented Oct 28, 2020

I'm getting the same error as @flyinprogrammer, Template parameters modified by transform, even though I have no parameters in my template. I was able to successfully create the stack, but I am unable to either create a change set or update the stack due to this error. Is there at least somewhere I can be looking for more troubleshooting guidance? This error is vague and not very helpful.

@TomLBarden

Unfortunately none of this has helped @ta-takeuchi

The issue itself is so vague and I can't find any documentation as to what's causing it.

@JeremySB

We've identified an issue with the Blue/Green transform involving boolean parsing that leads to those Template parameters modified by transform errors, and we are working with the CodeDeploy team to resolve it. The failures I've encountered come from attempts to use NoEcho parameters in Blue/Green templates, which isn't supported. If you have other sample templates that cause the same error, feel free to post them and I'll try to reproduce. Thanks!
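
For illustration, the kind of parameter described above would look like the following in a Blue/Green template. The parameter name is hypothetical; the point is only the NoEcho flag, which is a boolean-valued attribute.

```yaml
Transform:
  - 'AWS::CodeDeployBlueGreen'

Parameters:
  DatabasePassword:      # hypothetical parameter name
    Type: String
    NoEcho: true         # reported above as not supported with this transform
```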

@BhrikutyAggarwal

BhrikutyAggarwal commented Dec 11, 2020

Hey @JeremySB, I am facing the same Template parameters modified by transform error when updating the template below.
https://github.com/BhrikutyAggarwal/ECS-BG/blob/main/ECS-BG-template.json

I already created an empty cluster before deploying the above stack and while updating I am only changing the "ImageTag"

@CelesteComet

Can we reopen this? I am hitting an internal error problem when just changing the image property in my task definition. This basically makes the CF a one time usage without the ability to update anything.

@CelesteComet

Here is a link to the gist. I am using ALB and running into the mysterious internal error problem.
https://gist.github.com/gucolin/b9b108e5bcea513f9c704c576595f174

@ta-takeuchi

Blue/Green transform issues, including Boolean parsing, still seem to occur.
I created a template that converts Parameters to booleans in Conditions and uses "!If" in the resource properties.
The stack was created successfully, but the image update hit an internal error:
CodeDeployBlueGreenHook of type AWS::CodeDeploy::BlueGreen failed with message: Internal Failure

I shared the template with the support team. Best regards.

@adrianwilson

If it helps anyone, I was getting Template parameters modified by transform on a nested stack. Flipping the nested stack to a construct resolved the issue. Might be worth investigating if there's an issue with using AWS::CodeDeployBlueGreen on nested stacks?

@chezhuo1994

I'm facing the "Failed to transform template" error. Has anyone fixed it?

@ta-takeuchi

ta-takeuchi commented Jun 18, 2021

The issue is not solved yet.
I'm working around it like this:

  1. Update the stack directly with a template that has dummy resources added.
  2. Update with a template that has the dummy resources removed.
  3. Deploy with a template that has a changed ImageID.

The dummy resource:

  DummyWaitConditionHandle:
    Type: "AWS::CloudFormation::WaitConditionHandle"

@trevorijones

In case it helps:

I was facing this Template parameters modified by transform error on my updates too. My template expected many parameters, such as the image ECR URL. Parameters work as expected for creation, but an update with

Transform:
  - AWS::CodeDeployBlueGreen
Hooks:
  CodeDeployBlueGreenHook:
...

would only proceed once I replaced all my parameter Refs with hard-coded values, except for

  Vpc:
    Type: AWS::EC2::VPC::Id
  Subnet1:
    Type: AWS::EC2::Subnet::Id
  Subnet2:
    Type: AWS::EC2::Subnet::Id
