
release-21.2: liveness: improve disk probes during node liveness updates #81514

Merged

Conversation

erikgrinaker
Contributor

@erikgrinaker erikgrinaker commented May 19, 2022

/cc @cockroachdb/release


When NodeLiveness updates the liveness record (e.g. during
heartbeats), it first performs a no-op sync write to all disks. This
ensures that a node with a stalled disk will fail to maintain liveness
and lose its leases.

However, this sync write could block indefinitely, and would not respect
the caller's context, which could cause the caller to stall rather than
time out. This in turn could lead to stalls higher up in the stack,
in particular with lease acquisitions that do a synchronous heartbeat.

This patch performs the sync write in a separate goroutine in order to
respect the caller's context. The write operation itself will not
(cannot) respect the context, and may thus leak a goroutine. However,
concurrent sync writes will coalesce onto an in-flight write.
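
Coalescing concurrent probes onto a single in-flight write is the behavior Go's `singleflight` package provides. Below is a minimal sketch of the pattern, not the actual CockroachDB implementation; the `probeDisk` helper and the probe file name are invented for illustration:

```go
package sketch

import (
	"context"
	"os"
	"path/filepath"

	"golang.org/x/sync/singleflight"
)

// diskProbes coalesces concurrent probes of the same directory onto one
// in-flight sync write.
var diskProbes singleflight.Group

// probeDisk issues a throwaway sync write under dir. The write itself cannot
// be cancelled; if ctx expires first, the caller returns an error while the
// write goroutine is left to finish (or hang) on its own.
func probeDisk(ctx context.Context, dir string) error {
	resultC := diskProbes.DoChan(dir, func() (interface{}, error) {
		f, err := os.OpenFile(
			filepath.Join(dir, "liveness-probe"), os.O_CREATE|os.O_WRONLY, 0o644)
		if err != nil {
			return nil, err
		}
		defer f.Close()
		if _, err := f.Write([]byte("probe")); err != nil {
			return nil, err
		}
		return nil, f.Sync() // may block indefinitely on a stalled disk
	})
	select {
	case res := <-resultC:
		return res.Err
	case <-ctx.Done():
		return ctx.Err() // stop waiting; the write may still be in flight
	}
}
```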

Additionally, this runs the sync writes in parallel across all disks,
since we can now trivially do so. This may be advantageous on nodes with
many stores, to avoid spurious heartbeat failures under load.
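
Given such a per-disk probe, fanning out across all stores is a straightforward `errgroup` fan-out. Again a hedged sketch rather than the real code; `probeAllDisks` is a made-up name, and the probe parameter stands in for something like `probeDisk` above:

```go
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// probeAllDisks runs the disk probe against every store directory in
// parallel, returning the first error (if any) once all probes have finished
// or the context is cancelled.
func probeAllDisks(
	ctx context.Context, storeDirs []string, probe func(context.Context, string) error,
) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, dir := range storeDirs {
		dir := dir // capture loop variable (needed before Go 1.22)
		g.Go(func() error { return probe(ctx, dir) })
	}
	return g.Wait()
}
```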

Touches #81100.

Release note (bug fix): Disk write probes during node liveness
heartbeats will no longer get stuck on stalled disks, instead returning
an error once the operation times out. Additionally, disk probes now run
in parallel on nodes with multiple stores.


Release justification: cluster availability improvement.

@erikgrinaker erikgrinaker requested a review from tbg May 19, 2022 09:13
@erikgrinaker erikgrinaker requested review from a team as code owners May 19, 2022 09:13
@erikgrinaker erikgrinaker self-assigned this May 19, 2022
@blathers-crl

blathers-crl bot commented May 19, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria below are satisfied.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user who doesn’t know or care about this backport has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Member

This change is Reviewable

This patch runs the sync disk write during node heartbeats in a stopper
task. The write is done in a goroutine, so that we can respect the
caller's context cancellation (even though the write itself won't).
However, this could race with engine shutdown when stopping the node,
violating the Pebble contract and triggering the race detector. Running
it as a stopper task will cause the node to wait for the disk write to
complete before closing the engine.

Of course, if the disk stalls, node shutdown will now never complete.
This is unfortunate, since stopping the node is often the only
mitigation for recovering stuck ranges on stalled disks. It is
mitigated by Pebble panicking the node on stalled disks, and by
Kubernetes and other orchestration tools killing the process after
some time.

Release note: None
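
To illustrate why a stopper task closes the race with engine shutdown, here is a toy task tracker, not CockroachDB's `pkg/util/stop` API: shutdown refuses new tasks and then waits for in-flight ones, so the engine is only closed after any pending sync write has returned.

```go
package sketch

import (
	"errors"
	"sync"
)

// miniStopper is a toy stand-in for a stopper: it tracks in-flight tasks so
// that shutdown can wait for them before tearing down shared resources such
// as the storage engine.
type miniStopper struct {
	mu       sync.Mutex
	draining bool
	tasks    sync.WaitGroup
}

// RunTask starts f in a goroutine unless the stopper is already draining.
func (s *miniStopper) RunTask(f func()) error {
	s.mu.Lock()
	if s.draining {
		s.mu.Unlock()
		return errors.New("stopper is draining")
	}
	s.tasks.Add(1)
	s.mu.Unlock()
	go func() {
		defer s.tasks.Done()
		f() // e.g. the sync disk write probe
	}()
	return nil
}

// Stop blocks new tasks and waits for in-flight ones to finish. Only then is
// it safe to close the engine, which also means a stalled probe write stalls
// shutdown, as noted in the commit message above.
func (s *miniStopper) Stop() {
	s.mu.Lock()
	s.draining = true
	s.mu.Unlock()
	s.tasks.Wait()
}
```
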
@erikgrinaker erikgrinaker requested a review from a team as a code owner June 8, 2022 17:41
@erikgrinaker erikgrinaker merged commit d3473b8 into cockroachdb:release-21.2 Jun 8, 2022
@erikgrinaker erikgrinaker deleted the backport21.2-81133 branch June 9, 2022 22:17