[ECS] [request]: de-register from Cloud Map / R53 when instance is draining #473
Comments
Does it get removed from Cloud Map/Route 53 once the task is no longer running?
Yes, it does.
ECS service discovery does expect "intelligent" client libraries or a client-side proxy to handle this well, unlike the model where this is a capability of the load balancer. Have you considered using something like AWS App Mesh or a proxy like Envoy on the client side?
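To make the "intelligent client" idea concrete, here is a minimal sketch of client-side failover, not something ECS or Cloud Map provides. The names `call_with_failover`, `resolve`, and `send` are hypothetical: `resolve` stands in for a fresh service-discovery lookup (for example a DNS query) and `send` for whatever transport the client uses.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(
    resolve: Callable[[], Sequence[str]],
    send: Callable[[str, str], str],
    request: str,
    max_rounds: int = 2,
) -> str:
    """Try every resolved endpoint in turn, re-resolving before each round.

    `resolve` and `send` are hypothetical callables supplied by the caller.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_rounds):
        # Look up endpoints again each round so stale entries can drop out.
        for endpoint in resolve():
            try:
                return send(endpoint, request)
            except ConnectionError as exc:
                # Endpoint is gone or draining; remember the error, try the next.
                last_error = exc
    raise ConnectionError("all resolved endpoints failed") from last_error
```

Retrying like this is only safe for idempotent requests, because a connection that dies mid-request leaves the client unsure whether the target processed it.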
Seems like even App Mesh is unable to handle this currently. We have reduced the TTL for the entry but are still not able to reduce the errors. We are using gRPC in production and have seen stream errors even when the container goes down gracefully.
Is this a solvable problem? We would be glad if someone could share details on how this needs to be handled. As this issue directly affects end users, it seems like a blocker to moving our application behind App Mesh.
Tell us about your request
I want ECS to better handle the DRAINING state for tasks that use service discovery. In our case, draining is initiated by draining the EC2 instances.
Which service(s) is this request for?
ECS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We are running software that uses service discovery and needs to be drained gracefully, the same way draining works behind a load balancer, i.e. the task is removed from load-balanced traffic while it drains.
Currently, ECS does not remove a task from Cloud Map when the task is put into DRAINING. Removing it at that point is the behavior we want, so that traffic can drain from the instance.
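The requested behavior could be approximated externally along these lines. This is a rough sketch under stated assumptions, not a confirmed workaround: it assumes ECS registers each task in Cloud Map with the task ID as the Cloud Map instance ID, and that every task on the container instance belongs to the one Cloud Map service passed in. `ecs` and `sd` are expected to be boto3 `ecs` and `servicediscovery` clients; the function names are hypothetical.

```python
from typing import Any, List


def task_id_from_arn(task_arn: str) -> str:
    """Return the last path segment of a task ARN (the task ID)."""
    return task_arn.rsplit("/", 1)[-1]


def deregister_tasks_on_instance(
    ecs: Any,
    sd: Any,
    cluster: str,
    container_instance_arn: str,
    cloud_map_service_id: str,
) -> List[str]:
    """Deregister every task on one container instance from Cloud Map.

    Assumption: the Cloud Map instance ID equals the ECS task ID.
    Pagination of list_tasks is ignored for brevity.
    """
    response = ecs.list_tasks(
        cluster=cluster, containerInstance=container_instance_arn
    )
    removed = []
    for task_arn in response["taskArns"]:
        instance_id = task_id_from_arn(task_arn)
        sd.deregister_instance(
            ServiceId=cloud_map_service_id, InstanceId=instance_id
        )
        removed.append(instance_id)
    return removed
```

Clients that have already cached the DNS answer would keep using the old address until the record TTL expires, so this does not remove the need for graceful connection handling.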
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
If we cannot gracefully drain the traffic from our task, the clients of that service will see an unnecessarily high rate of connections that just die, possibly mid-request. This forces an unnecessary resync and a check of whether the request was received on the target end.
Are you currently working around this issue?
We have not found any good workaround for this problem.