[ECS] [Deployment]: Allow Custom CircuitBreaker min failures #1247
Comments
I am using ALB health checks. If the health checks are set up to retry a couple of times, it can take a while for a single task to fail. Multiply that by 10 and it takes far too long. I would suggest something like:
If retrymode is manual, the desired task count is multiplied by retryfactor, and the result is always rounded up. Examples for a desired count of 10 are worked through in the sketch below.
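A minimal sketch of the proposed calculation. Note that retrymode and retryfactor are hypothetical settings from the suggestion above, not existing ECS parameters, and the example factors are illustrative only:

```python
import math

def failure_threshold(desired_count: int, retry_factor: float) -> int:
    """Proposed rule: trip the circuit breaker after
    ceil(desired_count * retry_factor) task failures, never fewer than 1."""
    return max(1, math.ceil(desired_count * retry_factor))

# Illustrative values for a desired count of 10:
for factor in (0.1, 0.25, 0.5):
    print(f"retryfactor {factor} -> {failure_threshold(10, factor)} failure(s)")
# retryfactor 0.1  -> 1 failure(s)
# retryfactor 0.25 -> 3 failure(s)
# retryfactor 0.5  -> 5 failure(s)
```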
Thanks for the input @LRuttenCN @elruwen. Additionally, are there other configurations that you'd like to see for rollbacks of rolling updates in ECS? An example might be rollbacks based on CloudWatch Alarms.
Hi @vibhav-ag. There is also #1273, which is about rollbacks. I am not sure how rollbacks currently work in detail, because bug #1206 makes testing them quite painful. I run ECS containers on EC2 instances, both managed in the same CloudFormation stack. Imagine I have two EC2 instances, A and B: A runs two tasks and is "full", while B runs one task and has room for one more. Assume I am behind an ALB.
Case 1 - I update only the task definitions
Case 2 - I update the task definitions + the EC2 AMI
Hello, we have also had this problem for quite some time. Manually initiating a 'cancel update stack' on the CloudFormation stack is not an option for us, because it still takes 30+ minutes for CloudFormation to reach the UPDATE_ROLLBACK_COMPLETE state. Our current workaround is this: https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/ Does anyone here have another workaround that they use to achieve faster feedback/rollback loops?
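For reference, the linked blog post wires this up with EventBridge and Lambda; a minimal hand-rolled sketch of the same idea using the Python boto3 ECS client might look like the following. The cluster, service, and task-definition names are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

CLUSTER = "my-cluster"        # placeholder: your cluster name
SERVICE = "my-service"        # placeholder: your service name
PREVIOUS_TASK_DEF = "app:41"  # placeholder: last known-good revision

def rollout_state() -> str:
    """Return the rolloutState of the service's PRIMARY deployment
    (COMPLETED, FAILED, or IN_PROGRESS when the circuit breaker is on)."""
    svc = ecs.describe_services(cluster=CLUSTER, services=[SERVICE])["services"][0]
    primary = next(d for d in svc["deployments"] if d["status"] == "PRIMARY")
    return primary.get("rolloutState", "IN_PROGRESS")

if rollout_state() == "FAILED":
    # Re-point the service at the previous task definition revision.
    # ECS then starts a fresh deployment, which is the closest thing
    # available to "cancelling" the failed one.
    ecs.update_service(cluster=CLUSTER, service=SERVICE,
                       taskDefinition=PREVIOUS_TASK_DEF)
```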
We hit this through the Python boto3 ECS client. It is a bigger issue when the build process is wrapped up in a commercial offering that tracks build credits.
Hello, the circuit breaker is something we tried recently, and we were a bit disappointed by it. With the current settings, a single-instance service takes around 100 minutes to detect a failed deployment because of the threshold plus throttling. It is faster to detect by hand, simply by watching the AWS Console. Any news on this subject, @vibhav-ag @tabern?
Same here. The worst part is that even when I detect a failing deployment early, I cannot stop the ongoing deployment, and the tasks keep being created again and again. If there is already a working task, I should not need the new task to be re-created a minimum of 10 times before the deployment is marked as FAILED!
We've enhanced circuit breaker to be more responsive by default. Is there still a need to configure minimum failures for circuit breaker? |
Thank you, 3 is better than 10. But yes, it would be more convenient if we could just adjust the minimum number of failures. |
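For context, a boto3 sketch of everything the circuit breaker exposes through the API today (cluster and service names are placeholders). There is no field for the failure threshold itself, which is exactly what this issue requests:

```python
import boto3

ecs = boto3.client("ecs")

# The circuit breaker currently offers only two switches; the failure
# threshold is derived from the desired count and cannot be set explicitly.
ecs.update_service(
    cluster="my-cluster",  # placeholder
    service="my-service",  # placeholder
    deploymentConfiguration={
        "deploymentCircuitBreaker": {
            "enable": True,    # trip on repeated task launch failures
            "rollback": True,  # roll back to the last completed deployment
        },
    },
)
```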
Community Note
Tell us about your request
Allow a custom minimum number of failures for the circuit breaker.
Which service(s) is this request for?
ECS (Fargate)
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
For dev environments I'd like quick deployments and fast failure for new task definition versions. Currently the circuit breaker requires a minimum of 10 failures before it marks a deployment as failed. In dev environments, however, often only one container is running anyway, and when it fails it usually means a bug in the code; one or two failures are sufficient.
Are you currently working around this issue?
My current workaround is manually initiating a 'cancel update stack' on my CloudFormation stack, as sketched below.
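The same workaround can be scripted; a minimal sketch with the Python boto3 CloudFormation client, assuming a placeholder stack name:

```python
import boto3

cfn = boto3.client("cloudformation")

# Equivalent of pressing "Cancel update stack" in the console.
# CloudFormation then rolls the stack back, which can still take
# 30+ minutes to reach UPDATE_ROLLBACK_COMPLETE.
cfn.cancel_update_stack(StackName="my-dev-stack")  # placeholder stack name
```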
Thanks!
Luuk