[ECS] [Proposal]: Container Ordering #123
The ECS team is planning on implementing container startup and shutdown ordering.
ECS does not currently have an explicit mechanism to ensure that containers within a task start or stop in a particular order.
Overview of Solution
ECS will address these use cases by improving container dependency management.
These three components, a list of container dependencies, a start timeout, and a stop timeout, can be added to the container definition shape as follows:
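For illustration, the shape might look like the sketch below; the exact field names (`dependsOn`, `condition`, `startTimeout`, `stopTimeout`) are assumptions based on this thread's discussion, not a final API:

```json
{
  "containerDefinitions": [
    {
      "name": "db",
      "image": "postgres:11",
      "essential": true
    },
    {
      "name": "app",
      "image": "my-app:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "db", "condition": "START" }
      ],
      "startTimeout": 120,
      "stopTimeout": 60
    }
  ]
}
```

Here `app` would not be started until `db` has started, and the agent would give `app` up to 120 seconds to start and 60 seconds to stop.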
Within the container definition, we will make it possible to declare dependencies on other containers within the same task.
When starting up, the agent will guarantee that a container will only start once its dependencies have reached their specified conditions.
Currently, the agent does not enforce any ordering when a task is stopped, even when containers in the task depend on one another.
There is already an implicit dependency condition for containers using links or volumesFrom.
A "condition" may be one of the three enumerated strings: "START", "COMPLETE",
Currently, the container start and stop timeouts are instance-level settings configured on the ECS agent. Introducing container dependencies will introduce an additional set of failure cases, such as a container waiting indefinitely on a dependency that never reaches its required condition. Additionally, a global option is not going to give customers enough granularity, since different containers may need very different limits. In order to give customers the most flexibility, we will need to enhance the container definition with per-container start and stop timeouts.
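As a sketch of the proposed per-container override (assumed field names as above), these values would take precedence over instance-level agent settings such as the `ECS_CONTAINER_STOP_TIMEOUT` environment variable:

```json
{
  "name": "slow-worker",
  "image": "my-worker:latest",
  "essential": true,
  "startTimeout": 300,
  "stopTimeout": 120
}
```

This way a long-draining worker can be given two minutes to stop without raising the limit for every container on the instance.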
I already moved away from designing this kind of dependency after reading moby/moby#31333, in particular moby/moby#31333 (comment).
Your orchestration is now out of order. Should ECS forcefully terminate B? What if there's no time?
That being said, the START, SUCCESS, and COMPLETE scenarios described here do seem safe to use, as they represent an immutable container state, e.g. a started container will never not have been started, and a container that has already shut down will never not have been executed.
@deleugpn your points are well taken and perhaps AWS should add a section to the documentation to warn users of the potential pitfalls and complexities of the feature, but overall I think it is a net positive for most situations.
Take for example a case where you have multiple containers depending on a database or message queue to fully start. Currently, each container is responsible for implementing the exact same logic in a custom entrypoint script. With this feature the DB or message queue container can take on the responsibility for implementing the check and marking itself as healthy, and then all other containers simply reuse that information.
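For example, a sketch under the assumption that the HEALTHY condition discussed later in this thread ships, with a hypothetical pg_isready-based check:

```json
{
  "containerDefinitions": [
    {
      "name": "db",
      "image": "postgres:11",
      "essential": true,
      "healthCheck": {
        "command": ["CMD-SHELL", "pg_isready -U postgres || exit 1"],
        "interval": 5,
        "retries": 10
      }
    },
    {
      "name": "api",
      "image": "my-api:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "db", "condition": "HEALTHY" }
      ]
    }
  ]
}
```

Here `api` is held back until `db`'s health check passes, so the readiness logic lives in one place instead of in every dependent container's entrypoint.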
Regarding your example, the developer can decide what will happen by marking Container A as essential or not. If it is, the entire task is stopped (and restarted, if it is part of a service); if not, the other containers keep running.
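A minimal illustration (container names hypothetical; the `essential` flag itself is existing ECS behavior):

```json
[
  { "name": "a", "image": "sidecar:latest", "essential": false },
  {
    "name": "b",
    "image": "app:latest",
    "essential": true,
    "dependsOn": [ { "containerName": "a", "condition": "START" } ]
  }
]
```

If `a` dies here, `b` keeps running because `a` is non-essential; flipping `essential` to true would stop the whole task.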
So I agree, some people may get a false sense of security from this feature and may end up having problems troubleshooting if they don't have adequate fallbacks and proper timeouts, but overall I think it is much better to have the functionality available (and off by default). Good documentation and practical examples will help as well.
This would be extremely useful for us. At the moment, if a container instance dies there's a land rush for our ECS services to launch tasks, which causes lots of unnecessary logs to be written and alarms to go off.
For example we have a Consul ECS service which launches a daemon task on each cluster instance. Our microservices (each expressed as their own ECS service + task definition) expect this container to be running when they launch and inevitably fail until the Consul task is ready.
Our current solution for this involves using bash for loops and silent curl calls in the container entrypoint, waiting until Consul responds before starting the real process.
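Roughly, that wait loop looks like the sketch below (container and binary names hypothetical; the `/v1/status/leader` endpoint is Consul's real status API):

```json
{
  "name": "my-service",
  "image": "my-service:latest",
  "essential": true,
  "entryPoint": ["sh", "-c"],
  "command": [
    "until curl -sf http://localhost:8500/v1/status/leader > /dev/null; do sleep 2; done; exec /usr/local/bin/my-service"
  ]
}
```

A START or HEALTHY dependency on the Consul container would let us delete this boilerplate from every service image.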
Regarding 'healthy': this feature is designed for local coordination and should not be considered a solution for fault tolerance. The example that @deleugpn provided is a clear failure condition that this wouldn't catch by itself. The ordering won't report that your app is healthy or not -- rather, it is intended to replace the need to use a sleep / wait loop until resources are available.
Your applications will still need to handle the case where container A breaks, either during startup or hours after the task starts. If you are trying to implement self-healing architecture as described in the moby thread, you could use the ECS service abstraction paired with health checks on your essential containers.
Envoy proxy is the example we have been using to justify 'healthy' as a dependency condition. For Envoy, it is not enough to validate that the container has started. We also need to ensure that the container is ready to receive traffic. This means that containers that depend on Envoy can start knowing that Envoy has already finished its initialization sequences.
However, this doesn't mean an application depending on Envoy can assume that it will always be available. You would still need to implement a failure path, even if that failure path is reporting that the container is unhealthy and signaling the scheduler to restart the task.
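As a sketch of what such an Envoy readiness check could look like (assuming the admin interface is exposed on its conventional port 9901; dependents would then declare `"condition": "HEALTHY"` as in the earlier example):

```json
{
  "name": "envoy",
  "image": "envoyproxy/envoy:latest",
  "essential": true,
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -sf http://localhost:9901/ready || exit 1"],
    "interval": 5,
    "timeout": 2,
    "retries": 3
  }
}
```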
I have a need for this and agree the proposal looks well thought out. However, a colleague pointed out the documentation doesn't explicitly state that the dependencies would be run on the same instance. I think it's somewhat implied, but in reading the proposal, statements like "agent will ensure that dependencies are run" could imply that these containers are run but not necessarily on the local host.
In my case I would need dependencies run on the same host as the primary container, and it sounds like @alexbilbie has the same need. I think clarification of the proposal on this point would be good.
This could be helpful for one of my use cases. I recently had to implement custom entrypoint logic that pauses containers on startup if there are database migrations pending, plus a companion task that actually applies the pending migrations, orchestrated through CloudWatch Events and Lambda. A couple of questions, though.
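For reference, a sketch of how the SUCCESS condition might replace that orchestration (container names and commands hypothetical):

```json
{
  "containerDefinitions": [
    {
      "name": "migrations",
      "image": "my-app:latest",
      "essential": false,
      "command": ["rake", "db:migrate"]
    },
    {
      "name": "web",
      "image": "my-app:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "migrations", "condition": "SUCCESS" }
      ]
    }
  ]
}
```

The migrations container is marked non-essential so that its normal exit does not stop the task; `web` only starts if it exits with status zero.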