Tags

Merge #29129 #29177
29129: backport-2.1: storage: Up-replicate before removing replicas from dead stores r=a-robinson a=a-robinson

Backport 1/1 commits from #28875.

/cc @cockroachdb/release

---

Fixes #25392, by preventing the following series of events:

1. Node x dies
2. We remove node x's replica of some range
3. Node y dies before we up-replicate, leaving the range unavailable (1
   out of 2 replicas dead)
4. Node x comes back online. It can't help the situation, because its
   replica was officially removed from the range.

Instead, we now up-replicate first before removing node x's replica.
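
A minimal, self-contained sketch of this ordering (toy types only, not the
actual allocator or replicate-queue API):

```go
package main

import "fmt"

// rangeDesc is a toy model of a range: replicas keyed by store ID plus the
// desired replication factor. None of this is CockroachDB's real API.
type rangeDesc struct {
	replicas map[int]bool // store ID -> replica present
	target   int          // desired replication factor
}

// repairDeadReplica shows the new ordering: first add a replica on a live
// store so the range regains full replication, and only then remove the
// replica on the dead store. Removing first (the old behavior) left the
// range one further node death away from unavailability, and the dead
// node's replica could no longer help even if the node came back.
func repairDeadReplica(r *rangeDesc, deadStore, spareLiveStore int) {
	if len(r.replicas) <= r.target {
		r.replicas[spareLiveStore] = true // up-replicate first
	}
	delete(r.replicas, deadStore) // now it is safe to drop the dead replica
}

func main() {
	r := &rangeDesc{replicas: map[int]bool{1: true, 2: true, 3: true}, target: 3}
	repairDeadReplica(r, 3 /* dead */, 4 /* live spare */)
	fmt.Println(r.replicas) // map[1:true 2:true 4:true]
}
```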

Release note: None


29177: release-2.1: server: add a separate sampling loop for Go mem stats r=tschottdorf a=petermattis

Backport 5/5 commits from #29061.

/cc @cockroachdb/release

---

`runtime.ReadMemStats` will block waiting for any running GC cycle to
finish. On a large heap this can take upwards of 10s, which exceeds the
default interval at which we sample runtime statistics (10s).
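
A minimal sketch of the approach, assuming a dedicated sampling goroutine and
a hypothetical `startGoMemStatsSampler` helper (not the actual server code):

```go
package main

import (
	"context"
	"runtime"
	"time"
)

// startGoMemStatsSampler samples runtime.ReadMemStats in its own loop so
// that, when the call blocks waiting for an in-progress GC cycle to finish,
// it does not hold up the separate loop that samples the other runtime
// statistics every 10s.
func startGoMemStatsSampler(ctx context.Context, interval time.Duration, out chan<- runtime.MemStats) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				var ms runtime.MemStats
				runtime.ReadMemStats(&ms) // may block for seconds on a large heap
				select {
				case out <- ms:
				default: // drop the sample if nobody is ready to consume it
				}
			}
		}
	}()
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	samples := make(chan runtime.MemStats, 1)
	startGoMemStatsSampler(ctx, time.Second, samples)
	<-ctx.Done()
}
```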

Fixes #27775

Release note: none


Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Co-authored-by: Peter Mattis <petermattis@gmail.com>
Merge #28911
28911: release-2.1: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei

cc @cockroachdb/release 

Backport #28872

When a client tries to commit a txn that has performed writes at an old
epoch but has only done reads at the current epoch, one of the
TxnCoordSender interceptors turns the commit into a rollback (for
reasons described in the code).
This patch completes that interceptor's lie by updating the txn status
upon success to COMMITTED instead of ABORTED. Since a commit is what the
client asked for, it seems sane to pretend as best we can that that's
what it got. In particular, this is important for the sql module, where
the ConnExecutor looks at the txn proto's status to discriminate between
cases where a "1pc planNode" already committed an implicit txn versus
situations where it needs to commit it itself. This was causing the
executor to think the txn was not committed and to attempt to commit
again, which resulted in an error.
I don't know if we like the ConnExecutor looking at the proto status,
but I'll leave that alone.
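
A toy sketch of the idea (generic types, not the actual TxnCoordSender or
roachpb code), showing why the reported status matters to callers that decide
whether they still need to commit:

```go
package main

import "fmt"

type txnStatus int

const (
	pending txnStatus = iota
	committed
	aborted
)

// finishTxn models the interceptor's behavior: if the client asked to commit
// but a rollback was sent on its behalf (because only reads happened at the
// current epoch), report COMMITTED on success. That is the outcome the client
// asked for, and callers like the SQL ConnExecutor use the status to decide
// whether they still need to issue a commit themselves.
func finishTxn(clientWantsCommit, sentRollbackInstead, succeeded bool) txnStatus {
	if !succeeded {
		return aborted
	}
	if clientWantsCommit && sentRollbackInstead {
		return committed // complete the lie: report what the client asked for
	}
	if clientWantsCommit {
		return committed
	}
	return aborted
}

func main() {
	fmt.Println(finishTxn(true, true, true) == committed) // true: no second commit attempt
}
```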

Fixes #28554
Fixes #28796

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
kv: remove unused test arg
We used to check an "abandoned" metric for transactions, but that
doesn't exist any more. One doesn't simply abandon a transaction.

Release note: None
Merge #27868 #28225
27868: backport-2.0: storage: prevent unbounded raft log growth without quorum r=nvanbenschoten a=nvanbenschoten

Backport 2/2 commits from #27774.

/cc @cockroachdb/release

---

Fixes #27772.

This change adds safeguards to prevent cases where a raft log
would grow without bound during loss of quorum scenarios. It
also adds a new test that demonstrates that the raft log does
not grow without bound in these cases.

There are two cases that need to be handled to prevent the
unbounded raft log growth observed in #27772.
1. When the leader proposes a command and cannot establish a
   quorum. In this case, we know the leader has the entry in
   its log, so there's no need to refresh it with `reasonTicks`.
   To avoid this, we no longer use `reasonTicks` as a leader.
2. When a follower proposes a command that is forwarded to the
   leader who cannot establish a quorum. In this case, the
   follower can't be sure (currently) that the leader got the
   proposal, so it needs to refresh using `reasonTicks`. However,
   the leader now detects duplicate forwarded proposals and
   avoids appending redundant entries to its log. It does so
   by maintaining a set of in-flight forwarded proposals that
   it has received during its term as leader. This set is reset
   after every leadership change.

Both of these cases are tested against in the new
TestLogGrowthWhenRefreshingPendingCommands. Without both of
the safeguards introduced in this commit, the test fails.
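
The second safeguard can be sketched with a toy deduplication set
(illustrative only, not the actual storage package code):

```go
package main

import "fmt"

type proposalID string

// leader remembers which forwarded proposals it has already appended to its
// raft log during its current term and skips duplicates. The set is cleared
// on every leadership change.
type leader struct {
	term              int
	inFlightForwarded map[proposalID]struct{}
}

func (l *leader) onLeadershipChange(newTerm int) {
	l.term = newTerm
	l.inFlightForwarded = map[proposalID]struct{}{} // reset the set
}

// handleForwardedProposal returns true if the proposal should be appended to
// the leader's log, and false if it is a duplicate refresh of a proposal that
// is already in flight, in which case the log does not grow.
func (l *leader) handleForwardedProposal(id proposalID) bool {
	if _, ok := l.inFlightForwarded[id]; ok {
		return false // duplicate: don't append a redundant entry
	}
	l.inFlightForwarded[id] = struct{}{}
	return true
}

func main() {
	l := &leader{}
	l.onLeadershipChange(1)
	fmt.Println(l.handleForwardedProposal("cmd-1")) // true: first time seen
	fmt.Println(l.handleForwardedProposal("cmd-1")) // false: duplicate refresh
	l.onLeadershipChange(2)
	fmt.Println(l.handleForwardedProposal("cmd-1")) // true: set was reset
}
```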

Release note (bug fix): Prevent loss of quorum situations from
allowing unbounded growth of a Range's Raft log.


28225: release-2.0: importccl: Preserve '\r\n' during CSV import r=dt a=dt

Backport 1/1 commits from #28181.

/cc @cockroachdb/release

---

See #25344.


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: neeral <neeral@users.noreply.github.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
teamcity-publish-artifacts: use updated name for cockroach release binaries

Release note: None
Merge #27158
27158: backport-2.0: cli: Deprecate --wait=live in `node decommission` r=bdarnell a=tschottdorf

This is a bit of an unusual backport. The previous behavior is correctly documented, but it simply isn't that useful to operators who want to feel reasonably certain that cluster health is back to green before carrying on with their day. I'm not sure what our policy here should be this late into 2.0, but I'm tempted to say that we're not going to cherry-pick this. @bdarnell, what's your take here?

Backport 1/1 commits from #27027.

/cc @cockroachdb/release

---

Refer to issue #26880.

When trying to decommission a node that is down, today one has to use
`decommission --wait=live`, which does not verify whether the down node
is still part of any ranges, and then manually verify that the cluster
has up-replicated all ranges elsewhere. This is far from ideal, in
particular since there is no automated way to reliably check this.

`--wait=live` is deprecated and its behaviour replaced by that of
`--wait=all`. Instead of relying on metrics for replica counts, which
may be stale, we look at the authoritative source: range metadata.

Release note (cli change):
Deprecate the `--wait=live` parameter for `node decommission`; `--wait=all`
is now the default behaviour. This ensures CockroachDB checks that ranges
on the node to be decommissioned are not under-replicated before the node
is decommissioned.


Co-authored-by: neeral <neeral@users.noreply.github.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Merge #26923
26923: sql: misc pretty-printing fixes r=knz a=knz

Fixes #26920.
Fixes #26921.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Merge #26615
26615: release-2.0: sql,compactor: rate limit clear range requests r=bdarnell,petermattis a=benesch

Backports #26449.

I'm running a clearrange roachtest with this diff applied tonight. If it passes we're good to go.

```diff
diff --git a/pkg/cmd/roachtest/clearrange.go b/pkg/cmd/roachtest/clearrange.go
index ea5bcdff8..2b244af6d 100644
--- a/pkg/cmd/roachtest/clearrange.go
+++ b/pkg/cmd/roachtest/clearrange.go
@@ -30,19 +30,9 @@ func registerClearRange(r *registry) {
 		// thoroughly brick the cluster.
 		Stable: false,
 		Run: func(ctx context.Context, t *test, c *cluster) {
-			t.Status(`downloading store dumps`)
-			// Created via:
-			// roachtest --cockroach cockroach-v2.0.1 store-gen --stores=10 bank \
-			//           --payload-bytes=10240 --ranges=0 --rows=65104166
-			fixtureURL := `gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1`
-			location := storeDirURL(fixtureURL, c.nodes, "2.0")
+			t.Status(`waiting for compactions to disappear`)
+			time.Sleep(90 * time.Minute)
 
-			// Download this store dump, which measures around 2TB (across all nodes).
-			if err := downloadStoreDumps(ctx, c, location, c.nodes); err != nil {
-				t.Fatal(err)
-			}
-
-			c.Put(ctx, cockroach, "./cockroach")
 			c.Start(ctx)
 
 			// Also restore a much smaller table. We'll use it to run queries against
@@ -81,7 +71,7 @@ func registerClearRange(r *registry) {
 				// above didn't brick the cluster.
 				//
 				// Don't lower this number, or the test may pass erroneously.
-				const minutes = 60
+				const minutes = 120
 				t.WorkerStatus("repeatedly running COUNT(*) on small table")
 				for i := 0; i < minutes; i++ {
 					after := time.After(time.Minute)

```

---

Now that DBCompactRange no longer attempts to compact the entire
database (#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for #24029.
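
A toy sketch of the batching idea (hypothetical helper, not the actual
compactor or sql code): requests go out in sequential batches, and each batch
must finish before the next one is sent.

```go
package main

import (
	"context"
	"fmt"
)

const clearRangeBatchSize = 50

// clearRanges issues ClearRange work for a dropped table's ranges in
// sequential batches, waiting for each batch to complete before sending the
// next, so range-deletion tombstones accumulate slowly enough for the
// compaction queue to keep up.
func clearRanges(ctx context.Context, rangeIDs []int, send func(context.Context, []int) error) error {
	for len(rangeIDs) > 0 {
		n := clearRangeBatchSize
		if n > len(rangeIDs) {
			n = len(rangeIDs)
		}
		if err := send(ctx, rangeIDs[:n]); err != nil {
			return err
		}
		rangeIDs = rangeIDs[n:]
	}
	return nil
}

func main() {
	ids := make([]int, 175)
	for i := range ids {
		ids[i] = i
	}
	batches := 0
	_ = clearRanges(context.Background(), ids, func(_ context.Context, batch []int) error {
		batches++ // stand-in for sending the batch and waiting for it
		return nil
	})
	fmt.Println(batches) // 4: batches of 50, 50, 50, and 25
}
```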

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
storage: Tick all replicas, not just unquiesced ones
This reverts the main effect of #24956. The supporting infrastructure
is left in place to minimize merge conflicts and to aid in diagnosing
why the maps get out of sync.

Fixes #26257

Release note: None
Merge #25307
25307: release-2.0: importccl: check node health and compatibility during IMPORT planning r=mjibson a=mjibson

Backport 1/1 commits from #25162.

/cc @cockroachdb/release

---

Simplify the LoadCSV signature by taking just a PlanHookState for any
argument that can be fetched from it. Determine the node list using
this new health check function. We can remove the rand.Shuffle call
because the map iteration should produce some level of randomness.
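
For illustration, a small sketch of that reasoning (not the importccl code):
Go's randomized map iteration order already varies the node ordering between
runs, so an explicit shuffle adds little.

```go
package main

import "fmt"

// nodeOrder collects the IDs of healthy nodes by ranging over a map. Go does
// not guarantee a stable iteration order, so the resulting order differs from
// run to run without calling rand.Shuffle.
func nodeOrder(healthy map[int]bool) []int {
	var order []int
	for id, ok := range healthy {
		if ok {
			order = append(order, id)
		}
	}
	return order
}

func main() {
	healthy := map[int]bool{1: true, 2: true, 3: false, 4: true}
	// Run this a few times: the order of the healthy node IDs varies.
	fmt.Println(nodeOrder(healthy))
}
```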

Fixes #12876

Release note (bug fix): Fix problems with imports sometimes failing
after node decommissioning.


Co-authored-by: Matt Jibson <matt.jibson@gmail.com>