Description
When deploying a new version of a service behind a load balancer, it's useful to "gracefully" terminate the instances of the old version while directing all new traffic to the instances of the new version. In Kubernetes-land, this is usually done by signaling to the soon-to-be-terminated instances to stop responding successfully to health checks. When the load balancer notices they're failing health checks, it will stop sending new requests to them and direct all traffic to the new instances that are passing health checks. Then, after some reasonable grace period, such as 30 seconds, the old instances can shut down entirely.
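The drain pattern described above can be sketched as a health check with a flag that flips to failing when shutdown begins. This is illustrative only, not Misk's actual API; the class and method names here are hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a health check that starts failing at the beginning
// of shutdown so the load balancer stops routing traffic to this instance
// before the process actually exits.
public class DrainableHealthCheck {
    private final AtomicBoolean draining = new AtomicBoolean(false);

    // What the health endpoint reports: true while the instance
    // should keep receiving traffic.
    public boolean isHealthy() {
        return !draining.get();
    }

    // Called when shutdown begins: flip to unhealthy, then wait out the
    // grace period (e.g. 30s) so the load balancer can re-route traffic.
    public void beginDrain(long gracePeriodMillis) {
        draining.set(true);
        try {
            Thread.sleep(gracePeriodMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Only after `beginDrain` returns would the rest of the shutdown sequence run.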
From what I understand, Misk's current shutdown behavior is simply to race to shutdown (while preserving CoordinatedService ordering), with no provision for a grace period during which it 1) intentionally fails health checks, 2) refuses new requests, or 3) both, giving the upstream load balancer an opportunity to re-route traffic.
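In other words, the desired behavior is a drain phase inserted in front of the existing ordered shutdown. A minimal sketch of that sequencing, where `orderedStopActions` stands in for Misk's CoordinatedService shutdown order (none of these names are real Misk APIs):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a shutdown sequence that drains before stopping.
public class GracefulShutdown {
    static final AtomicBoolean healthy = new AtomicBoolean(true);

    // What a health endpoint would report to the load balancer.
    static boolean healthCheck() { return healthy.get(); }

    static void shutdown(List<Runnable> orderedStopActions, long graceMillis)
            throws InterruptedException {
        healthy.set(false);          // 1) start failing health checks
        Thread.sleep(graceMillis);   // 2) let the load balancer re-route
        for (Runnable stop : orderedStopActions) {
            stop.run();              // 3) then stop services, in order
        }
    }
}
```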
This manifests as spurious errors in services that depend on Misk services: sometimes they make calls during the brief shutdown window, and those calls fail in unexpected ways because of the rushed shutdown.