[ECS] [request]: Timeout for RunTask #291
Hi, any updates on this? We're running into the same issue, where rogue Fargate tasks that have gone wrong somehow end up running forever. We've tried adding timeouts inside the task code itself, but for some reason these don't take effect and the tasks keep running.
Would also love to see this implemented!
Also running into a similar issue: the task doesn't stop even when the underlying process completes. It only happens occasionally (< 1% of executions), but it still matters, because otherwise you have to manually check for rogue tasks every time, or build yet another automation to stop them.
To add to the scenario: "On rare occasions, these jobs can hang or take an excessive amount of time to complete, incurring cost and potentially impacting future schedules of the task."
Facing the same problem here.
I'm running into the same problem. Step Functions can submit ECS tasks but doesn't clean them up (even if a timeout is specified). I have to set up a relatively elaborate catch-and-cleanup in Step Functions to stop jobs that hang indefinitely and would otherwise block further processing. It would be so much easier if we could just specify a stop-after-x-seconds value in ECS.
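For anyone landing here, the cleanup side of such a setup can be fairly small. A minimal sketch, assuming the tasks were launched with RunTask's startedBy set to something recoverable (like the Step Functions execution name); the event shape and all names are illustrative:

```python
import boto3

ecs = boto3.client("ecs")

def handler(event, context):
    """Cleanup Lambda, invoked from a Step Functions Catch on States.Timeout.

    Assumes the tasks were launched with RunTask's startedBy set to the
    execution name, so they can be found again here.
    """
    cluster = event["cluster"]        # illustrative input shape
    started_by = event["startedBy"]

    # Find whatever this execution left running...
    running = ecs.list_tasks(
        cluster=cluster,
        startedBy=started_by,
        desiredStatus="RUNNING",
    )["taskArns"]

    # ...and stop it explicitly, since Step Functions won't.
    for arn in running:
        ecs.stop_task(cluster=cluster, task=arn, reason="State machine timed out")

    return {"stopped": running}
```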
Me as well. I want this, plz.
I am wrapping this up in a short blog post to add more context, but I built a SF workflow that essentially kicks off and checks whether a running task carries a specific timeout tag. This is the CFN template that includes everything (SF workflow, EB rules, IAM roles, etc.). There is nothing else to do: when the stack is deployed as-is, all tasks launched in the account/region with that tag are stopped once their timeout expires.
I hear you that the ideal solution would be native support for this capability in ECS, but I am curious whether an approach like this would work. Besides having to pay extra for Step Functions (I hear you, again), what are the other reasons this approach would not work vs. a timeout flag on RunTask?
This is the blog post that gets into more context: https://it20.info/2023/03/configuring-a-timeout-for-amazon-ecs-tasks/
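The core check in a tag-driven workflow like this boils down to something along these lines. A rough boto3 sketch; the `timeout` tag key and its seconds-based semantics are assumptions, not necessarily the exact implementation from the blog:

```python
from datetime import datetime, timezone

import boto3

ecs = boto3.client("ecs")

def stop_expired_tasks(cluster):
    """Stop RUNNING tasks whose assumed 'timeout' tag (in seconds) has elapsed."""
    arns = ecs.list_tasks(cluster=cluster, desiredStatus="RUNNING")["taskArns"]
    if not arns:
        return
    tasks = ecs.describe_tasks(cluster=cluster, tasks=arns, include=["TAGS"])["tasks"]
    now = datetime.now(timezone.utc)
    for task in tasks:
        tags = {t["key"]: t["value"] for t in task.get("tags", [])}
        started = task.get("startedAt")  # absent until the task actually starts
        if "timeout" not in tags or started is None:
            continue  # untagged or not-yet-started tasks are left alone
        if (now - started).total_seconds() > int(tags["timeout"]):
            ecs.stop_task(cluster=cluster, task=task["taskArn"],
                          reason="timeout tag exceeded")
```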
This looks similar to #572
@mreferre regarding your question:
Many turn to AWS and services like ECS to handle most of their hosting complexity, so they can focus on where they deliver the most value. So the Step Functions approach 🚀 may technically work, but it introduces needless complexity compared to a timeout flag, not only in implementation but also in maintenance and support. With almost 250 👍 in total, let's hope the ECS team can deliver this (sub)feature sometime soon.
@jeroenhabets fair enough. Thanks!
I would like to suggest another method: using a sidecar container, all native inside ECS.
For a more detailed explanation, I wrote this up. It also helps with #572. Would be interested in your feedback.
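To make the sidecar idea concrete, here is a minimal sketch of a task definition built around it (the names, images, and one-hour value are illustrative): the sidecar just sleeps for the timeout and then exits, and because it is marked essential, ECS stops the whole task when it does.

```python
import boto3

ecs = boto3.client("ecs")

# Illustrative Fargate task definition. When the essential "timeout" sidecar's
# sleep finishes and the container exits, ECS stops the entire task, taking
# the main job container down with it.
ecs.register_task_definition(
    family="batch-with-timeout",                 # illustrative name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    containerDefinitions=[
        {
            "name": "job",
            "image": "my-batch-image:latest",    # assumption: your batch image
            "essential": True,
        },
        {
            "name": "timeout",
            "image": "public.ecr.aws/docker/library/busybox:latest",
            "command": ["sleep", "3600"],        # the timeout budget, in seconds
            "essential": True,                   # its exit takes the task down
        },
    ],
)
```

Since both containers are essential, the task also stops normally as soon as the job itself finishes, so the sidecar only comes into play in the hang case.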
Please implement this.
@maishsk same feedback for your workaround: "With almost 250 👍 in total, let's hope the ECS team can deliver this (sub)feature sometime soon."
A potential workaround is to enforce the timeout in your own application code. For example, I am using the …
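If it helps anyone, a minimal standard-library Python sketch of such a self-imposed deadline (the one-hour value and the do_batch_work entrypoint are placeholders):

```python
import signal
import sys

TIMEOUT_SECONDS = 3600  # illustrative budget for the whole task

def _on_timeout(signum, frame):
    # Exit non-zero so the timeout shows up as a failed task, not a clean stop.
    print("Task exceeded its deadline, exiting", file=sys.stderr)
    sys.exit(1)

signal.signal(signal.SIGALRM, _on_timeout)  # register the deadline handler
signal.alarm(TIMEOUT_SECONDS)               # SIGALRM fires once the budget is spent

do_batch_work()   # placeholder for the actual job entrypoint

signal.alarm(0)   # finished in time: cancel the pending alarm
```

As earlier comments note, an in-process deadline cannot catch every failure mode (a wedged runtime or a stuck child process may ignore it), which is exactly why a scheduler-enforced timeout keeps coming up in this thread.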
This feature could definitely be quite helpful; trying to do this now is challenging. BTW, in our use case we have to run tasks at different times, so services are not an option.
This is similar to what Kubernetes Jobs offer with activeDeadlineSeconds: https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
I was able to implement this using …
Tell us about your request
An optional timeout parameter for the RunTask API
Which service(s) is this request for?
Fargate, ECS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
As well as services, which are expected to always be in a running state, we also run scheduled tasks in ECS that are expected to complete various batch processes and then exit. On rare occasions, these jobs can hang or take an excessive amount of time to complete, incurring cost and potentially impacting future schedules of the task. An optional timeout parameter that is enforced by the ECS scheduler would help us manage these.
Are you currently working around this issue?
Only by manually calling the StopTask API when we spot long-running tasks.
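For completeness, that manual workaround, automated just enough to run on a schedule, amounts to something like this rough boto3 sketch (the cluster name and age threshold are illustrative):

```python
from datetime import datetime, timedelta, timezone

import boto3

ecs = boto3.client("ecs")
CLUSTER = "my-cluster"            # illustrative
MAX_AGE = timedelta(hours=4)      # illustrative "this has run too long" threshold

# Stop any RUNNING task older than MAX_AGE; run this on a schedule
# (e.g. from an EventBridge rule) until a native RunTask timeout exists.
arns = ecs.list_tasks(cluster=CLUSTER, desiredStatus="RUNNING")["taskArns"]
tasks = ecs.describe_tasks(cluster=CLUSTER, tasks=arns)["tasks"] if arns else []
now = datetime.now(timezone.utc)
for task in tasks:
    started = task.get("startedAt")
    if started and now - started > MAX_AGE:
        ecs.stop_task(cluster=CLUSTER, task=task["taskArn"],
                      reason="exceeded maximum runtime")
```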