-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve http graceful termination #4957
base: master
Are you sure you want to change the base?
improve http graceful termination #4957
Conversation
@@ -156,24 +163,50 @@ trait BasicHttpService extends Directives { | |||
} | |||
|
|||
object BasicHttpService { | |||
implicit val tid = TransactionId(systemPrefix + "http_service") | |||
//start with ready true | |||
protected[http] var ready = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used for anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ready
var? It is used to alter the response to /ready
endpoint defined in BasicRasService
.
I'll try to refactor this so that we're not modifying the object var from different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks generally ok to me.
whisk { | ||
http { | ||
shutdown-unready-delay = 15 seconds # /ready will return 500 for this amount of time before connection draining starts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it would wait for 15 seconds, and start the termination phase.
But if there are still client connections connected, are they getting connection closed error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only allows an unready stage, which only affects the added /ready endpoint to return 503 (no other endpoints affected). This does not affect how akka http handles existing client connections during shutdown, so if akka allows gracefully terminating in flight requests, then no connection closed errors, but no new connections allowed. The purpose here is to allow external systems (e.g. kubernetes or reverse proxies) to probe the service and stop routing to this instance for some time before akka shutdown initiates.
if (readyState) { | ||
complete("ok") | ||
} else { | ||
logging.warn(this, "not ready...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this just be a debug? I think ready route returning 503 should be self explanatory
@tysonnorris would you still like to see this pr merged? |
Description
Deploying controller to kubernetes, we see some case where using /ping as livenessProbe and readinessProbe does not provide great resilience when deleting a controller pod while under load. A typical case for controller unexpected shutdown under load is eviction process for cluster autoscaler, or node maintenance.
This PR provides:
Related issue and scope
My changes affect the following components
Types of changes
Checklist: