revise node decommissioning guidance and semantics #7304
Conversation
Looks good from a high level. I'm not sure I left the comments about cockroach quit in the right places (realized only later that there were multiple copies of the docs for the different versions), so please disregard them in that case - you are aware that we don't want to recommend quit in 20.1+ so if that's what the docs do that's great.
And sorry it took so long.
@@ -1,4 +1,4 @@
By default, if a node stays offline for more than 5 minutes, the cluster will consider it dead and will rebalance its data to other nodes. Before temporarily stopping nodes for planned maintenance (e.g., upgrading system software), if you expect any nodes to be offline for longer than 5 minutes, you can prevent the cluster from unnecessarily rebalancing data off the nodes by increasing the `server.time_until_store_dead` [cluster setting](cluster-settings.html) to match the estimated maintenance window.
By default, if a node stays offline for more than 5 minutes, the cluster will consider it dead and will rebalance its data to other nodes. Before [temporarily stopping nodes](../../cockroach-quit.html) for planned maintenance (e.g., upgrading system software), if you expect any nodes to be offline for longer than 5 minutes, you can prevent the cluster from unnecessarily rebalancing data off the nodes by increasing the `server.time_until_store_dead` [cluster setting](cluster-settings.html) to match the estimated maintenance window.
For example, let's say you want to maintain a group of servers, and the nodes running on the servers may be offline for up to 15 minutes as a result. Before shutting down the nodes, you would change the `server.time_until_store_dead` cluster setting as follows:
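As a sketch of that change (using the 15-minute window from the example; `server.time_until_store_dead` is the cluster setting named above):

```sql
-- Allow nodes to stay offline for up to 15 minutes before the
-- cluster considers them dead and rebalances their data away.
SET CLUSTER SETTING server.time_until_store_dead = '15m0s';
```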
For the "resetting back to default" (further below, sorry, can't comment there) I think we'd want to advertise `RESET CLUSTER SETTING x` instead.
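For instance, resetting the setting discussed in this thread back to its default could look like (a sketch; `server.time_until_store_dead` is the setting from the surrounding comments):

```sql
-- Restore server.time_until_store_dead to its default (5 minutes).
RESET CLUSTER SETTING server.time_until_store_dead;
```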
When you decommission a node, CockroachDB lets the node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node so that it can be safely shut down. See [Remove Nodes](remove-nodes.html) for more information.
When you initiate the [decommissioning process](remove-nodes.html#how-it-works) on a node, CockroachDB lets the node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node so that it can be safely shut down.
Neither the old nor the new one are true. Decommissioning simply moves all data off the node (which includes moving all range leases). SQL sessions keep working forever, until the node is actually shut down. Shutting down the node gracefully will then let the node finish in-flight requests and drain clients etc.
v19.1/remove-nodes.md
Outdated
You might do this, for example, when downsizing a cluster or reacting to hardware failures.
{{site.data.alerts.callout_info}}
In other scenarios, such as when [upgrading your cluster's version of CockroachDB](upgrade-cockroach-version.html) or performing planned maintenance (e.g., upgrading system software), you will want to temporarily [stop the node](stop-a-node.html) and restart it later. |
Might wanna call out explicitly that that is not what decommissioning is for.
v19.1/remove-nodes.md
Outdated
1. The node has completed the decommissioning process.
2. The node has been stopped and has not [updated its liveness record](cluster-setup-troubleshooting.html#node-liveness-issues) for 5 minutes.
The decommissioning process transfers all range replicas on the node to other nodes. During and after this process, the node is considered "decommissioning" and continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. However, note that the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) considers the node "unready" and returns a `503 Service Unavailable` status response code so load balancers stop directing traffic to the node. This behavior has been fixed in v20.1.
what has been fixed in v20.1?
v19.1/remove-nodes.md
Outdated
During and after decommissioning, the node continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. However, note that the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) considers the node "unready" and returns a `503 Service Unavailable` status response code so load balancers stop directing traffic to the node. This behavior has been fixed in v20.1.
After all range replicas have been transferred, it's typical to stop the node via a process manager or orchestration tool, or by sending `SIGTERM` manually. You can also use the [`cockroach quit`](stop-a-node.html) command to stop the node. When stopped, the node is drained of in-flight SQL connections and new SQL connections are rejected. At this point, the node is decommissioned.
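A load balancer's readiness probe against the endpoint mentioned above might look like the following sketch (the host and port are illustrative, not taken from the diff):

```shell
# Check node readiness; prints the HTTP status code.
# 200 = ready (keep routing traffic), 503 = unready (stop routing
# new connections to this node, e.g., while it is decommissioning).
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:8080/health?ready=1"
```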
Let's not mention `./cockroach quit` since we wish to get rid of it.
At this point, the node is decommissioned.
You say earlier that it needs to meet two criteria above, but it will take five minutes until it meets the second.
v19.1/remove-nodes.md
Outdated
@@ -225,19 +277,19 @@ Select the **Replication** dashboard, and hover over the **Replicas per Store**
### Step 3. Decommission the nodes
SSH to any live node in the cluster and run the [`cockroach node decommission`](view-node-details.html) command with the IDs of the nodes to officially decommission:
SSH to any live node in the cluster and run the [`cockroach node decommission`](cockroach-node.html) command with the IDs of the nodes to decommission:
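For example, decommissioning nodes 4 and 5 of an insecure cluster might look like this (node IDs and flags are illustrative, following the placeholder style used elsewhere in the diff):

```shell
# Run against any live node; the cluster then transfers
# range replicas and leases off nodes 4 and 5.
cockroach node decommission 4 5 --insecure --host=<address of any live node>
```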
Ditto
v19.1/remove-nodes.md
Outdated
For each decommissioned node, SSH to the machine running the node and execute the `cockroach quit` command:
For each decommissioning node, SSH to the machine running the node and execute the `cockroach quit` command:
Ditto
At this point, the nodes are decommissioned and will no longer appear in timeseries graphs unless you are viewing a time range during which the nodes were live. However, they will never disappear from the **Decommissioned Nodes** list.
{{site.data.alerts.callout_info}}
If you want to utilize a decommissioned node again, first [recommission](#recommission-nodes) the node to have the cluster rebalance data to the node. Then [restart](start-a-node.html) the node so that it accepts new SQL connections. |
Ditto
v19.1/remove-nodes.md
Outdated
@@ -394,6 +456,11 @@ $ cockroach node status --decommission --insecure --host=<address of any live no
(5 rows)
~~~
- `is_decommissioning` will be `true` for a node that is undergoing or has completed the decommissioning process. This indicates that replicas are being or have been transferred to other nodes.
- `is_draining` will be `true` for a node that has been stopped. This indicates that SQL connections are no longer being accepted by the node.
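These columns appear in the output of the status command quoted earlier in this diff hunk, e.g. (flags follow the doc's own placeholder style):

```shell
# Inspect decommissioning progress; the output includes the
# is_decommissioning and is_draining columns discussed here.
cockroach node status --decommission --insecure --host=<address of any live node>
```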
Nodes may be stopped and `is_draining == false` as well. Just in case this may cause confusion.
@@ -4,9 +4,13 @@ summary: Learn how to temporarily stop a CockroachDB node.
toc: true
---
This page shows you how to use the `cockroach quit` [command](cockroach-commands.html) to temporarily stop a node that you plan to restart, for example, during the process of [upgrading your cluster's version of CockroachDB](upgrade-cockroach-version.html) or to perform planned maintenance (e.g., upgrading system software).
This page shows you how to use the `cockroach quit` [command](cockroach-commands.html) to temporarily stop a node that you plan to restart.
Ditto about quit
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess and @tbg)
_includes/v20.1/faq/planned-maintenance.md, line 3 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
For the "resetting back to default" (further below, sorry, can't comment there) I think we'd want to advertise `RESET CLUSTER SETTING x` instead.
Done.
v19.1/admin-ui-cluster-overview-page.md, line 58 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Neither the old nor the new one are true. Decommissioning simply moves all data off the node (which includes moving all range leases). SQL sessions keep working forever, until the node is actually shut down. Shutting down the node gracefully will then let the node finish in-flight requests and drain clients etc.
Oops, right. Done!
v19.1/remove-nodes.md, line 12 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Might wanna call out explicitly that that is not what decommissioning is for.
Done.
v19.1/remove-nodes.md, line 24 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
what has been fixed in v20.1?
Done. Clarified that the health endpoint behavior was fixed.
v19.1/remove-nodes.md, line 26 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
At this point, the node is decommissioned.
You say earlier that it needs to meet two criteria above, but it will take five minutes until it meets the second.
Thank you! Fixed.
v19.1/remove-nodes.md, line 47 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
You say the same thing just below so I'm confused
Clarified the wording here (and in a following example). It means that adding the extra node won't fix a hung decommissioning process, and that you have to recommission the hung node first.
v19.1/remove-nodes.md, line 75 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Hmm same question here. Maybe I'm just misunderstanding how this renders (the HTML previews are 404)
Done. See above comment.
v19.1/remove-nodes.md, line 115 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This is a bit weird - we usually do SSH to the node because it's easiest from there to run commands against that node. But if you have the host, and it's reachable from wherever you are, then you can run the command from there instead via the `--host` flag. You specify that below, so it's weird that redundantly you ask users to ssh to a machine. I personally would focus on the `--host` version and avoid recommending to ssh to a machine that runs CRDB since that isn't even always possible (think k8s)
Done. I had been confused by this and should have asked about it.
v19.1/remove-nodes.md, line 169 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Let's avoid cockroach quit.
`cockroach quit` is removed from this doc starting with 20.1.
v19.1/remove-nodes.md, line 192 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
In the future this will be impossible unless the data dir is wiped (i.e. it counts as a new node). May want to incorporate this in the docs already (it doesn't hurt to do this today already)
Done, added to "Recommission nodes" subheading.
v19.1/remove-nodes.md, line 214 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ditto
Done.
v19.1/remove-nodes.md, line 246 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ditto
Done.
v19.1/remove-nodes.md, line 280 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ditto
Done.
v19.1/remove-nodes.md, line 338 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ditto
Done.
v19.1/remove-nodes.md, line 361 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ditto
Done.
v19.1/remove-nodes.md, line 460 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Nodes may be stopped and `is_draining == false` as well. Just in case this may cause confusion.
@tbg I'm not quite following here. In what situations will a stopped node have `is_draining == false`?
v19.1/stop-a-node.md, line 7 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ditto about quit
I've added a hard-to-miss callout at the top of the 20.1 article that provides the latest guidance on stopping a node. When #6951 is complete I'll link to that article instead.
TFTR, @tbg! I had one question for you above.
Reviewed 22 of 30 files at r1, 10 of 13 files at r2, 3 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @taroface, and @tbg)
v19.1/remove-nodes.md, line 460 at r1 (raw file):
is_draining is probably true for a node that has been stopped, but it doesn't have to be that way. For example, if you hard-kill a node, it will never mark itself down as draining. A more correct statement is: `is_draining == true` implies that the node is no longer accepting SQL connections. It is either in the process of shutting down or has already done so.
@knz Tobias has reviewed the node decommissioning changes. Requesting your review as I've also folded your updates from cockroachdb/cockroach#47692 into this PR. In addition, I have updated the node recommissioning guidance as per this thread. FYI
Reviewed 3 of 13 files at r2, 9 of 9 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @taroface, and @tbg)
v19.1/remove-nodes.md, line 24 at r1 (raw file):
Previously, taroface (Ryan Kuo) wrote…
Done. Clarified that the health endpoint behavior was fixed.
I don't exactly know if that's true?
The `/health?ready=0` endpoint reports the node as live (but I think it did that already).
The `/health?ready=1` endpoint hopefully reports the node as non-ready (that's what it should do) - because you want traffic to be redirected to other nodes I think?
v19.1/remove-nodes.md, line 26 at r1 (raw file):
Previously, taroface (Ryan Kuo) wrote…
Thank you! Fixed.
change "5 minutes" to "the configured value of .... (the cluster setting)"
v19.1/remove-nodes.md, line 47 at r1 (raw file):
Previously, taroface (Ryan Kuo) wrote…
Clarified the wording here (and in a following example). It means that adding the extra node won't fix a hung decommissioning process, and that you have to recommission the hung node first.
I am still confused by the text, but now maybe Tobias knows something I don't.
In my book, if there are 3 nodes, and you start decommissioning 1 of them, then the moment you add a 4th node the 3rd will proceed to decommission successfully.
The callout's suggestion to "recommission" the node is, in that view, incorrect.
Or is this behavior specific to 19.1 and has been fixed in a later version?
v19.1/remove-nodes.md, line 75 at r1 (raw file):
Previously, taroface (Ryan Kuo) wrote…
Done. See above comment.
see my response above.
v19.1/remove-nodes.md, line 22 at r4 (raw file):
1. The node has completed the decommissioning process.
2. The node has been stopped and has not [updated its liveness record](architecture/replication-layer.html#epoch-based-leases-table-data) for 5 minutes.
replace "for 5 minutes" with "for the duration configured via .... which defaults to 5 minutes"
v19.1/remove-nodes.md, line 94 at r4 (raw file):
- Before decommissioning each node verify that there are no [underreplicated or unavailable ranges](admin-ui-replication-dashboard.html).
- If you have a decommissioning node that appears to be hung, you can [recommission](#recommission-nodes) or [restart](#restart-decommissioned-nodes) the node. If you notice any issues persisting, [contact our support team](support-resources.html).
see my comment above
v19.1/remove-nodes.md, line 350 at r4 (raw file):
</div> In about 5 minutes, you'll see the nodes move to the **Decommissioned Nodes** list.
change "5 minutes" to "the configured value of ..."
v19.1/remove-nodes.md, line 357 at r4 (raw file):
{{site.data.alerts.callout_info}} If you want to utilize a decommissioned node again, first [recommission](#recommission-nodes) the node to have the cluster rebalance data to the node. Then [restart](start-a-node.html) the node so that it accepts new SQL connections.
I think the best practice here is to never reuse a decommissioned node again.
In fact, we're going to remove this option completely starting in 20.2.
What's the point to keep this option documented here?
v19.1/remove-nodes.md, line 427 at r4 (raw file):
{% include {{ page.version.version }}/prod-deployment/node-shutdown.md %}
After about 5 minutes, the stopped nodes will be considered [decommissioned](#how-it-works).
"5 minutes"
v19.1/stop-a-node.md, line 18 at r4 (raw file):
- Finishes in-flight requests. Note that this is a best effort that times out after the duration specified by the `server.shutdown.query_wait` [cluster setting](cluster-settings.html).
- Transfers all **range leases** and Raft leadership to other nodes.
I may have missed something - why did you choose to remove the explanation about range leases? From a technical perspective, that part of it remains true.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @knz, @taroface, and @tbg)
v19.1/remove-nodes.md, line 24 at r1 (raw file):
@knz The current 20.1 docs state that:
During and after decommissioning, the node continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. For this reason, the /health?ready=1 monitoring endpoint continues to consider the node "ready" so load balancers can continue directing traffic to the node.
Is this incorrect? I took it to mean that since the node will accept SQL connections until it is drained, the health endpoint should still indicate "ready". (And then becomes "unready" when `cockroach node drain` is run?)
v19.1/remove-nodes.md, line 26 at r1 (raw file):
Previously, knz (kena) wrote…
change "5 minutes" to "the configured value of .... (the cluster setting)"
Done.
v19.1/remove-nodes.md, line 47 at r1 (raw file):
Previously, knz (kena) wrote…
I am still confused by the text, but now maybe Tobias knows something I don't.
In my book, if there are 3 nodes, and you start decommissioning 1 of them, then the moment you add a 4th node the 3rd will proceed to decommission successfully.
The callout's suggestion to "recommission" the node is, in that view, incorrect.
Or is this behavior specific to 19.1 and has been fixed in a later version?
The 20.2 alpha behaves as you describe, so maybe the original doc was incorrect on this. @jseldess do you know if the currently documented behavior (decommissioning 1 node in a 3-node cluster will hang, unless you recommission that node, add a 4th node, and decommission again) was true of an earlier version? I'm able to simply add the 4th node to complete the decommissioning.
v19.1/remove-nodes.md, line 460 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
is_draining is probably true for a node that has been stopped, but it doesn't have to be that way. For example, if you hard-kill a node, it will never mark itself down as draining. A more correct statement is: `is_draining == true` implies that the node is no longer accepting SQL connections. It is either in the process of shutting down or has already done so.
Done.
v19.1/remove-nodes.md, line 22 at r4 (raw file):
Previously, knz (kena) wrote…
replace "for 5 minutes" with "for the duration configured via .... which defaults to 5 minutes"
Done, assuming this is `server.time_until_store_dead`.
v19.1/remove-nodes.md, line 350 at r4 (raw file):
Previously, knz (kena) wrote…
change "5 minutes" to "the configured value of ..."
Done.
v19.1/remove-nodes.md, line 427 at r4 (raw file):
Previously, knz (kena) wrote…
"5 minutes"
Done.
v19.1/stop-a-node.md, line 18 at r4 (raw file):
Previously, knz (kena) wrote…
I may have missed something - why did you choose to remove the explanation about range leases? From a technical perspective, that part of it remains true.
I removed "leases" for the reason Andrei gave here, on Jesse's prior update to this doc: #6825 (review)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @knz, @taroface, and @tbg)
v19.1/remove-nodes.md, line 357 at r4 (raw file):
Previously, knz (kena) wrote…
I think the best practice here is to never reuse a decommissioned node again.
In fact, we're going to remove this option completely starting in 20.2. What's the point to keep this option documented here?
Noted - I have removed this option and references to restarting/reusing decommissioned nodes.
The 20.2 alpha behaves as you describe, so maybe the original doc was incorrect on this. @jseldess do you know if the currently documented behavior (decommissioning 1 node in a 3-node cluster will hang, unless you recommission that node, add a 4th node, and decommission again) was true of an earlier version? I'm able to simply add the 4th node to complete the decommissioning.
At some point in the past, decommissioning multiple nodes at a time was a terrible UX footgun because the nodes would immediately start draining (prior to even being fully decommissioned) and so you could easily nuke the cluster out from under you. I don't remember when we fixed this, but I would think it had been fixed in 19.1 already.
I'm looking at that other PR. In that PR I see two conflicting sentences:
This sentence is correct. Then the callout:
I believe this sentence is incorrect. The thing is, I'm not sure. We have changed a few things in the code around this multiple times in the past. It may be that this sentence was correct at some point, and it is not any more (or vice-versa).
Yeah, in that issue, @robert-s-lee reported:
I tested this on 19.1. I'm pretty certain this is all about the "suspect" status in the UI. It works as expected from a server perspective. If you have 3 nodes and decommission one, decommissioning will hang because there's no other place for the data to be moved, and the node will be "suspect" in the UI. When you add a fourth node, the decommission resumes and completes fine. The decommissioned node, however, remains "suspect" in the UI until the node is stopped. Only at that point (or 5 min later) does the node move to the "Decommissioned Nodes" table.
We need to completely remove the note that says:
Adding a 4th node to the cluster at this point will neither enable the decommissioning process to complete nor change the Suspect node status. You can recommission the node to return it to a healthy state.
In addition, if we do anything else right now, I think we may want to look for ways to clarify that, in the UI, a fully decommissioned node will continue to be shown as "suspect" until the node is stopped. That's counterintuitive; if we have a "Decommissioned Nodes" list, it should show nodes that have been decommissioned, whether they are live or not. @knz tells me he and @irfansharif are working on this for 20.2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @knz, @taroface, and @tbg)
On a side note, I'm not sure it makes sense to categorize a decommissioning or decommissioned node as "suspect", which indicates something potentially fishy or unintentional.
I think it's because the UI only has 3 states: "good" (green), "off" (red/absent), and "yellow" (everything else). It might make sense to add a new indicator color for "node currently being removed", which will catch both decommissioning nodes and nodes being shut down, and keep yellow for "suspect / don't know". That's squarely UI design though, cc @dhartunian @piyush-singh
In the 20.1 UI, the Node List has a separate status for Once a node is moved into "Recently Decommissioned Nodes" with a I think versions 19.1-20.1 of this doc should clarify that a
Thanks, @taroface. LGTM, but there are some broken links to fix before merging:
d4838dd1769bd861fac0b8b2bde286d124964d22/v19.1/remove-nodes.html
target does not exist --- d4838dd1769bd861fac0b8b2bde286d124964d22/v19.1/remove-nodes.html --> cockroach-quit.html
target does not exist --- d4838dd1769bd861fac0b8b2bde286d124964d22/v19.1/remove-nodes.html --> cockroach-node.html
d4838dd1769bd861fac0b8b2bde286d124964d22/v19.1/view-node-details.html
target does not exist --- d4838dd1769bd861fac0b8b2bde286d124964d22/v19.1/view-node-details.html --> cockroach-quit.html
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @knz, @taroface, and @tbg)
port #7304 node decommissioning updates to 20.2
Fixes #6935.
Fixes #7384.
Fixes #7422.
- `cockroach node decommission` over `cockroach quit --decommission` in 20.1, 19.2, 19.1.
- `cockroach node drain` and `--drain-wait` flag.
- `cockroach quit` in 20.1 node decommissioning doc. Added `cockroach quit` deprecation announcement in 20.1.