
server: declare node ready while decommissioning #43889

Merged

Conversation

andreimatei
Contributor

Before this patch, a decommissioning or draining, but otherwise healthy,
node would return an error from the /health?ready=1 endpoint, thus
declaring itself unready and signaling load balancers to send traffic
away. This patch makes it so that the decommissioning status no longer
matters for the readiness determination. Draining nodes continue to
declare themselves unready.
Note that decommissioning nodes typically go through draining at the
end of the process.

The justification is that a node can be decommissioning for an arbitrary
amount of time. During that time it can continue to hold leases, etc. So
trying to avoid traffic is not particularly desirable. In fact, some
people even want to keep a node in a decommissioning state indefinitely
(by restarting a decommissioning node without recommissioning it).
Also, the server.shutdown.drain_wait cluster setting is there to give
load balancers ample time to find out about a draining node.
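
As an illustration of the load-balancer side of this, a health check might probe the endpoint like the following HAProxy-style fragment. This is a sketch only; the backend name, address, and port are placeholders, not taken from this PR:

```
backend crdb
    # Probe the readiness endpoint; a non-2xx response marks the node down.
    option httpchk GET /health?ready=1
    server node1 10.0.0.1:8080 check
```

With server.shutdown.drain_wait set appropriately, a draining node keeps serving long enough for checks like this one to observe the unready response and shift traffic away.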

Also, tactically, the code is simplified.

Release note (general change): A node no longer declares itself unready
through the /health?ready=1 endpoint while it is in the process of
decommissioning. It continues to declare itself unready while draining.
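
The readiness rule described above can be sketched as a minimal Go function. The type and field names here are illustrative stand-ins, not the actual CockroachDB server code:

```go
package main

import (
	"errors"
	"fmt"
)

// nodeStatus is a hypothetical stand-in for the liveness flags the server
// consults when answering /health?ready=1.
type nodeStatus struct {
	Draining        bool
	Decommissioning bool
}

// readyErr sketches the post-patch rule: only draining makes the readiness
// check fail; decommissioning alone no longer does.
func readyErr(s nodeStatus) error {
	if s.Draining {
		return errors.New("node is shutting down")
	}
	// Decommissioning is deliberately ignored here: the node may hold
	// leases for an arbitrary amount of time and should keep taking traffic.
	return nil
}

func main() {
	fmt.Println(readyErr(nodeStatus{Decommissioning: true})) // <nil>: still ready
	fmt.Println(readyErr(nodeStatus{Draining: true}) != nil) // true: unready
}
```

Before this patch, the sketch above would also have returned an error when Decommissioning was set; removing that branch is the tactical simplification mentioned.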

@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

started a discussion about the change here
https://groups.google.com/forum/#!topic/cockroach-db/P3v85IJIycE

@andreimatei
Contributor Author

I got a nod from @asubiotto on the mailing list, so I think this is good to go

@andreimatei andreimatei force-pushed the liveness.ready-on-decommissioning branch from 59f874a to 139bd21 Compare January 14, 2020 00:46
Contributor

@ajwerner ajwerner left a comment


Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)

Contributor

@ajwerner ajwerner left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

Contributor Author

@andreimatei andreimatei left a comment


bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@craig
Contributor

craig bot commented Jan 14, 2020

Build failed

Contributor Author

@andreimatei andreimatei left a comment


flaked on #43957

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

craig bot pushed a commit that referenced this pull request Jan 14, 2020
43417: sql: remove restriction of foreign key on multiple outbound columns r=lucy-zhang a=otan

Resolves #38850.

This change removes the restriction of foreign keys on multiple outbound
columns. One minor issue also needed to be addressed, which is done here:
if the constraint name already exists, a number is appended to it, as
Postgres does. We can make this fancier later if we wish.

This PR comes with a test suite of corner cases that we suspected might
behave incorrectly. It exercises both the legacy and the new optimizer
settings for foreign key checks.

Release note (sql change): We previously had a restriction that a given
outbound column could only be referenced by one foreign key at a time;
e.g. in `CREATE TABLE test(a int)`, we could not have two foreign keys
on column a. This PR removes that restriction.

43889: server: declare node ready while decommissioning r=andreimatei a=andreimatei


Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 14, 2020

Build succeeded

@craig craig bot merged commit 139bd21 into cockroachdb:master Jan 14, 2020
@andreimatei andreimatei deleted the liveness.ready-on-decommissioning branch January 15, 2020 01:09