[ECS] [request]: Ability to disable task restart behaviour on Healthcheck Failure #1373

Open
raags opened this issue May 14, 2021 · 2 comments
Labels
Proposed Community submitted issue

Comments


raags commented May 14, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
What do you want us to build?

Right now, if the container health check fails for an essential container in an ECS task, the task is restarted. This is undesirable in most cases, because the health-check failure could be transient, caused by a traffic spike, network unavailability, a resource constraint, a misconfiguration, etc.

One would also want the failed task to remain running for debugging, to understand why the health check is failing.
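
For context, the container health check in question is configured per container in the task definition. Below is a minimal sketch of such a container definition as a Python dict, in the shape accepted by `register_task_definition`. The container name, image, and health endpoint are placeholders, and `failure_window_seconds` is a hypothetical helper giving only a rough estimate of how long a failure must persist before the container is marked unhealthy; it is not ECS's exact evaluation logic.

```python
# Sketch of one entry in a task definition's containerDefinitions list.
# "web", the image, and the /health endpoint are placeholders.
container_definition = {
    "name": "web",
    "image": "example/web:latest",
    "essential": True,  # an unhealthy essential container makes the task unhealthy
    "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,     # seconds between checks
        "timeout": 5,       # seconds before a single check counts as failed
        "retries": 3,       # consecutive failures before UNHEALTHY
        "startPeriod": 60,  # grace period after container start
    },
}


def failure_window_seconds(health_check: dict) -> int:
    """Rough estimate (an assumption): interval multiplied by retries."""
    return health_check["interval"] * health_check["retries"]
```

With these example values, a failure lasting roughly 90 seconds is enough to get the task replaced, which is why a short traffic spike or network blip can trigger it.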

This is also related to ALB health checks, which behave the same way (#1271 and #289), but this ticket is specifically about container health checks.

I think a flag to control this behaviour (for container and ALB health-check restart behaviour) would help, so that even if the health check fails, ECS does not stop the task.

Which service(s) is this request for?
Fargate, ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Explained above

Are you currently working around this issue?
By not using ECS container health checks, and instead using a third-party sidecar to do the same.
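
The sidecar approach could look roughly like the sketch below: a small process that probes the application over HTTP and raises an alert after repeated failures, without ever stopping the task. This is not the actual third-party tool used here; `probe`, `FailureTracker`, and the threshold value are all hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and HTTP error statuses all count as unhealthy.
        return False


class FailureTracker:
    """Counts consecutive failed probes and signals when to alert."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Return True only on the probe where failures first reach the threshold."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold
```

A sidecar loop would call `tracker.record(probe(url))` on a timer and send a notification when it returns `True`, leaving the decision to restart with the engineer.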

Additional context
Health-check failures already generate a CloudWatch (CW) event, which can be used for alerting. With this alert, an engineer can investigate the issue and decide whether to restart or provision additional tasks, without losing the running task, which remains available for debugging.
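
As a rough illustration of the alerting idea, the sketch below filters incoming events for unhealthy ECS tasks, for example inside a function subscribed to those events. The `source` and `detail-type` values are the ones ECS uses for task state changes; the `healthStatus` field inside `detail` is an assumption modeled on the `DescribeTasks` response and should be verified against a real event. `should_alert` is a hypothetical helper name.

```python
def should_alert(event: dict) -> bool:
    """Return True for an ECS task state change that reports an unhealthy task."""
    if event.get("source") != "aws.ecs":
        return False
    if event.get("detail-type") != "ECS Task State Change":
        return False
    detail = event.get("detail", {})
    # Assumption: the event detail carries a task-level healthStatus field.
    return detail.get("healthStatus") == "UNHEALTHY"


# Hand-written sample event for illustration only; not captured from a real account.
sample_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {"lastStatus": "RUNNING", "healthStatus": "UNHEALTHY"},
}
```

Here `should_alert(sample_event)` returns `True`, while events from other sources or with a healthy status are ignored.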

@raags raags added the Proposed Community submitted issue label May 14, 2021

whelanp commented Mar 13, 2023

Hello @raags, could you share the name of your third-party sidecar? We are having the same issue and would like to apply this workaround too!


raags commented Mar 13, 2023

Hi @whelanp - I was using a Sensu sidecar for the health checks. There are others that can do the same.
