Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: warn if trying to [rd]ecommision when already [rd]ecommissioned #43915

Merged
merged 1 commit into from
Jan 15, 2020

Conversation

knz
Copy link
Contributor

@knz knz commented Jan 13, 2020

Fixes #36624.
First commit from #43908

Release note (cli change): The CLI commands cockroach node decommission and cockroach node recommission now produce a warning
on the standard error if one of the node(s) specified is
already (d/r)ecommissioned.

@knz knz requested a review from tbg January 13, 2020 15:27
@knz knz requested a review from a team as a code owner January 13, 2020 15:27
@knz knz added this to To do in DB Server & Security via automation Jan 13, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz moved this from To do to In progress in DB Server & Security Jan 13, 2020
@knz knz force-pushed the 20200113-deco-reco-warning branch from 8871990 to 0c4ef5c Compare January 13, 2020 15:30
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please wait for the first commit's PR to merge and then rebase. (I know you were planning to anyway).

Reviewed 6 of 6 files at r1, 4 of 4 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/cli/node.go, line 329 at r2 (raw file):

) error {
	s := serverpb.NewStatusClient(conn)
	resp, err := s.Nodes(ctx, &serverpb.NodesRequest{})

Seems a bit silly to broadcast through the cluster if it's large, but this is probably good enough and better than writing yet another loop issueing NodeRequests.


pkg/cli/node.go, line 359 at r2 (raw file):

				fmt.Fprintln(stderr, "warning: node", nodeID, "is not decommissioned")
			default: // dead, unavailable, etc
				fmt.Fprintln(stderr, "warning: node", nodeID, "may not be recommissionable (status: %s)", liveness)

"recommissionable" is weird here, you can recommission the node whatever its state. Maybe just "warning: node X is in unexpected state %s"?


pkg/cmd/roachtest/decommission.go, line 442 at r2 (raw file):

	}

	t.l.Printf("trying to decommision a second time\n")

ss


pkg/cmd/roachtest/decommission.go, line 456 at r2 (raw file):

	}

	t.l.Printf("trying to recommision a second time\n")

ss

@knz knz force-pushed the 20200113-deco-reco-warning branch from 0c4ef5c to 6a57da8 Compare January 13, 2020 15:37
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/cli/node.go, line 329 at r2 (raw file):

Seems a bit silly to broadcast through the cluster if it's large, but this is probably good enough and better than writing yet another loop issueing NodeRequests.

If my reading of (*statusServer).Nodes is correct, this does not broadcast through the cluster.


pkg/cli/node.go, line 359 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

"recommissionable" is weird here, you can recommission the node whatever its state. Maybe just "warning: node X is in unexpected state %s"?

Done.


pkg/cmd/roachtest/decommission.go, line 442 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ss

Done.


pkg/cmd/roachtest/decommission.go, line 456 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ss

Done.

@knz
Copy link
Contributor Author

knz commented Jan 13, 2020

LGTM but please wait for the first commit's PR to merge and then rebase.

  1. you haven't approved the first PR yet so I wasn't planning on merging anything just now
  2. now that you've put a "change requested" review in, I won't be able to merge anyway until you clear the review status

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^- right, but the moment one bases an approved PR on an unapproved one, human error makes it possible to accidentally merge both. (This isn't any criticism in that you're doing this, I totally would absentmindedly merge anything that's green and tend to assume my colleagues work the same way, sometimes erroneously).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@tbg tbg self-requested a review January 13, 2020 15:41
@knz knz force-pushed the 20200113-deco-reco-warning branch from 6a57da8 to cc4f7fe Compare January 15, 2020 11:05
Release note (cli change): The CLI commands `cockroach node
decommission` and `cockroach node recommission` now produce a warning
on the standard error if one of the node(s) specified is
already (d/r)ecommissioned.
@knz knz force-pushed the 20200113-deco-reco-warning branch from cc4f7fe to e85cdeb Compare January 15, 2020 11:42
@knz
Copy link
Contributor Author

knz commented Jan 15, 2020

rebased, PTAL

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@knz
Copy link
Contributor Author

knz commented Jan 15, 2020

thank you!

bors r=tbg

@craig
Copy link
Contributor

craig bot commented Jan 15, 2020

Build failed

@knz
Copy link
Contributor Author

knz commented Jan 15, 2020

CI ran into #43990. Retrying

bors r=tbg

craig bot pushed a commit that referenced this pull request Jan 15, 2020
43915: cli: warn if trying to [rd]ecommision when already [rd]ecommissioned r=tbg a=knz

Fixes #36624.
First commit from #43908

Release note (cli change): The CLI commands `cockroach node
decommission` and `cockroach node recommission` now produce a warning
on the standard error if one of the node(s) specified is
already (d/r)ecommissioned.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@craig
Copy link
Contributor

craig bot commented Jan 15, 2020

Build failed

@knz
Copy link
Contributor Author

knz commented Jan 15, 2020

same error, will retry rebasing later

@knz
Copy link
Contributor Author

knz commented Jan 15, 2020

bors r=tbg

craig bot pushed a commit that referenced this pull request Jan 15, 2020
42969: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts)  that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

43915: cli: warn if trying to [rd]ecommision when already [rd]ecommissioned r=tbg a=knz

Fixes #36624.
First commit from #43908

Release note (cli change): The CLI commands `cockroach node
decommission` and `cockroach node recommission` now produce a warning
on the standard error if one of the node(s) specified is
already (d/r)ecommissioned.

43989: colexec: fix an issue with builtin operators and a minor cleanup r=yuzefovich a=yuzefovich

Flat bytes relies on coldata.Batch.SetLength call to maintain its
invariant. We assume that it is always called before return the batch in
which Bytes vector might have been modified. This was not the case for
default builtin and substring operators, and the calls were added.
Additionally, to be safe, similar calls have been in added in projection
operators.

In a few places where we were setting the length of an internal batch to
0 and then returning it, those were replaced with returning
coldata.ZeroBatch.

Fixes: #43656.

Release note: None

44004: githooks: accept release note category 'security update' r=knz a=knz

Forgot this in #43869 

44008: re-enable: roachprod: Make multiple set [provider]-zones always geo-distribute nodes r=jlinder a=jlinder

This re-enables commit d24e40e which was
reverted in commit 63279f9. It was reverted
due to roachtest automatically passing in a list of zones but only wanting the
first zone to be used (#43898)
which was fixed in f68c6d5 .

Before: if multiple zones were set for a provider and --geo wasn't set, all
hosts would be started in just one zone in one region.

Why change? Because if multiple zones are set, the intention is that they be
used.

Now, --geo and --[provider]-zones work as follows for gcloud, aws and azure:

1. when geo and zones are not set, nodes are all placed in one of the
   default zones
2. when geo is set but zones aren't, nodes are spread evenly across the
   default zones
3. when zones are set, nodes are spread evenly across the specified zones

Fixes #38542.

Release note: None

44016: builtins: miscellaneous fixes for the to_hex builtin r=mjibson a=otan

Resolves #41707.

Release note (sql change, bug fix):
* Added to_hex(string) -> string functionality.
* Previously, `to_hex(-1)` would return `-1` instead of the negative
hex representation (`FFFFFFFFFFFFFFFF`). This has been rectified in
this PR.

44023: roachtest: bump minimum version of the sqlsmith roachtest r=yuzefovich a=rohany

Fixes #43995.

This PR bumps the minimum version of the SQLSmith roachtest
to be v20.1.0.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: James H. Linder <jamesl@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig
Copy link
Contributor

craig bot commented Jan 15, 2020

Build succeeded

@craig craig bot merged commit e85cdeb into cockroachdb:master Jan 15, 2020
DB Server & Security automation moved this from In progress to Done 20.1 Jan 15, 2020
@knz knz deleted the 20200113-deco-reco-warning branch January 15, 2020 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core: Warn users when they attempt to recommission a node that is not decommissioned.
3 participants