cli/start: remove the 1-minute hard shutdown timeout #44074
Merged
Prior to this patch, after CockroachDB received an instruction to shut down gracefully (a signal, a `Drain` request, etc.), the code for `cockroach start` would start a 1-minute countdown. If the graceful shutdown did not complete within that time, a hard shutdown was triggered instead.

This behavior was neither necessary nor desirable.

It is not necessary because process managers already have "process shutdown timeout" logic to force-shutdown a process that does not terminate in a timely manner. It is not the db's responsibility to do the service manager's job (in fact, the redundancy in behavior can be confusing to troubleshoot).

It is not desirable either because in large clusters, a graceful shutdown may truly last longer than a minute. Graceful shutdowns are also rather important to ensure a smooth transition during e.g. a rolling upgrade, as they guarantee a transition without latency blips. Even though this `cockroach start` timeout is not the only such timeout in the code, it is one obstacle to painless graceful shutdowns and thus ought to be removed.

This patch achieves just that.

Release note (cli change): The CockroachDB node commands (`start`/`start-single-node`) no longer initiate a 1-minute hard shutdown countdown after a request to terminate gracefully. This means that graceful shutdowns are now free to take longer than one minute. It also means that deployments where a maximum shutdown time must be enforced must now use a service manager that is suitably configured to do so.
d1a0cbc to d697c92
tbg approved these changes Jan 17, 2020
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
tfyr

bors r=tbg
craig bot pushed a commit that referenced this pull request Jan 17, 2020
44074: cli/start: remove the 1-minute hard shutdown timeout r=tbg a=knz

Fixes #43902.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Build succeeded