What's the use case?
When firing off a large number of jobs to AWS ECS, some of them can get stuck in the starting state. With run_monitoring you can catch these runs, after which they are set to failed. Instead of only catching them and marking them as failed, it would be valuable to retry them automatically. The run_retries feature does not cover this case, because it restarts all jobs that are classified as failed, which is not desired in the vast majority of cases.
Ideas of implementation
Add retry option to run_monitoring
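To make the idea concrete, here is a minimal sketch of the decision a monitoring loop could make for a run that is stuck on starting. It is plain Python and not Dagster's actual API: `Run`, `decide_action`, `start_timeout_seconds`, and `max_start_retries` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for a run record; not a Dagster class."""
    run_id: str
    status: str          # e.g. "STARTING", "STARTED", "FAILURE"
    launched_at: float   # launch time, in seconds since the epoch
    start_attempts: int = 1


def decide_action(run, now, start_timeout_seconds=180.0, max_start_retries=2):
    """Return "ignore", "wait", "retry", or "fail" for a single run.

    Only runs that are still starting are considered, so runs that failed
    for other reasons are never retried by this check.
    """
    if run.status != "STARTING":
        return "ignore"
    if now - run.launched_at < start_timeout_seconds:
        return "wait"    # still within the allowed startup window
    if run.start_attempts <= max_start_retries:
        return "retry"   # stuck on starting: relaunch instead of failing
    return "fail"        # retry budget exhausted: fall back to failing the run
```

The point of the sketch is that the retry applies only to the start timeout, so it does not need the broader behavior of run_retries.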
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
Maybe an easy solution here would be an option to turn auto-retrying off for certain jobs via tags, so that you can opt jobs out of the retry behavior. cc @johannkm, would something like that be reasonable?
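A tag-based opt-out along these lines might look roughly like the sketch below. The tag key `example/auto_retry` and the helper `should_auto_retry` are made up for illustration and are not existing Dagster names.

```python
# Hypothetical tag key; not an existing Dagster tag.
OPT_OUT_TAG = "example/auto_retry"

# Tag values that are treated as "do not retry this job".
_FALSY_VALUES = {"false", "0", "off", "no"}


def should_auto_retry(tags, default=True):
    """Return whether a failed run with these tags should be retried.

    Runs without the tag keep the deployment-wide default, so only jobs
    that set the tag explicitly are opted out.
    """
    value = tags.get(OPT_OUT_TAG)
    if value is None:
        return default
    return value.strip().lower() not in _FALSY_VALUES
```

With this shape, retries stay enabled globally and a job opts out by setting the tag to `"false"`.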