4 changes: 2 additions & 2 deletions src/current/_includes/v24.1/essential-alerts.md
@@ -318,9 +318,9 @@ Send an alert when the number of ranges with replication below the replication f

- Refer to [Replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues).

### Requests stuck in raft
### Requests stuck in Raft

Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated.
Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated. This can also be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

**Metric**
<br>`requests.slow.raft`
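
For teams wiring this alert into Prometheus, the condition can be spot-checked with an ad-hoc query like the sketch below. It assumes a Prometheus server reachable at `prometheus:9090` that already scrapes each node's `_status/vars` endpoint; only the metric name (`requests_slow_raft`, the exported form of `requests.slow.raft`) comes from this page, and the aggregation and threshold are illustrative.

```shell
# Ad-hoc check of the alert condition against a Prometheus server.
# Assumptions (placeholders): Prometheus at prometheus:9090, already scraping
# each node's /_status/vars endpoint. Returns any instance with a nonzero gauge.
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=max by (instance) (requests_slow_raft) > 0'
```
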
4 changes: 2 additions & 2 deletions src/current/_includes/v24.3/essential-alerts.md
@@ -318,9 +318,9 @@ Send an alert when the number of ranges with replication below the replication f

- Refer to [Replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues).

### Requests stuck in raft
### Requests stuck in Raft

Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated.
Send an alert when requests are taking a very long time in replication. An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error). A nonzero value indicates range or replica unavailability, and should be investigated. This can also be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

**Metric**
<br>`requests.slow.raft`
9 changes: 9 additions & 0 deletions src/current/v24.1/architecture/replication-layer.md
@@ -146,6 +146,15 @@ A table's meta and system ranges (detailed in the [distribution layer]({% link {

However, unlike table data, system ranges cannot use epoch-based leases because that would create a circular dependency: system ranges are already being used to implement epoch-based leases for table data. Therefore, system ranges use expiration-based leases instead. Expiration-based leases expire at a particular timestamp (typically after a few seconds). However, as long as a node continues proposing Raft commands, it continues to extend the expiration of its leases. If it doesn't, the next node containing a replica of the range that tries to read from or write to the range will become the leaseholder.

#### Leader-leaseholder splits

[Epoch-based leases](#epoch-based-leases-table-data) are vulnerable to _leader-leaseholder splits_. These can occur when a leaseholder's Raft log has fallen behind other replicas in its group and it cannot acquire Raft leadership. Coupled with a [network partition]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition), this split can cause permanent unavailability of the range if (1) the stale leaseholder continues heartbeating the [liveness range](#epoch-based-leases-table-data) to hold its lease but (2) cannot reach the leader to propose writes.

Symptoms of leader-leaseholder splits include a [stalled Raft log]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#requests-stuck-in-raft) on the leaseholder and [increased disk usage]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disks-filling-up) on follower replicas buffering pending Raft entries. Remediations include:

- Restarting the affected nodes.
- Fixing the network partition (or slow networking) between nodes.
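
To narrow down whether a range is in this state, one rough approach is to compare the stuck-request gauge and disk usage across the nodes holding replicas. The host names, HTTP port, and store path below are placeholders rather than values from this page; adjust them for your deployment.

```shell
# Rough triage sketch for a suspected leader-leaseholder split.
# Assumptions (placeholders): hosts node1..node3, default HTTP port 8080,
# insecure HTTP, and a store directory at /mnt/cockroach.
for host in node1 node2 node3; do
  echo "--- ${host} ---"
  # A persistently nonzero gauge here points at stuck replication proposals.
  curl -s "http://${host}:8080/_status/vars" | grep '^requests_slow_raft'
  # Growing usage on followers can indicate buffered, unapplied Raft entries.
  ssh "${host}" df -h /mnt/cockroach
done
```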

#### How leases are transferred from a dead node

When the cluster needs to access a range on a leaseholder node that is dead, that range's lease must be transferred to a healthy node. This process is as follows:
2 changes: 2 additions & 0 deletions src/current/v24.1/cluster-setup-troubleshooting.md
@@ -387,6 +387,8 @@ Like any database system, if you run out of disk space the system will no longer
- [Why is disk usage increasing despite lack of writes?]({% link {{ page.version.version }}/operational-faqs.md %}#why-is-disk-usage-increasing-despite-lack-of-writes)
- [Can I reduce or disable the storage of timeseries data?]({% link {{ page.version.version }}/operational-faqs.md %}#can-i-reduce-or-disable-the-storage-of-time-series-data)

In rare cases, disk usage can increase on nodes with [Raft followers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) due to a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

###### Automatic ballast files

CockroachDB automatically creates an emergency ballast file at [node startup]({% link {{ page.version.version }}/cockroach-start.md %}). This feature is **on** by default. Note that the [`cockroach debug ballast`]({% link {{ page.version.version }}/cockroach-debug-ballast.md %}) command is still available but deprecated.
2 changes: 1 addition & 1 deletion src/current/v24.1/monitoring-and-alerting.md
@@ -1205,7 +1205,7 @@ Currently, not all events listed have corresponding alert rule definitions avail

#### Requests stuck in Raft

- **Rule:** Send an alert when requests are taking a very long time in replication.
- **Rule:** Send an alert when requests are taking a very long time in replication. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

- **How to detect:** Calculate this using the `requests_slow_raft` metric in the node's `_status/vars` output.
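
As a quick manual check, the raw gauge can be read straight from a node's Prometheus endpoint; the sketch below assumes an insecure local node on the default HTTP port 8080 (add TLS flags and credentials for secure clusters).

```shell
# Read the requests_slow_raft gauge directly from one node's /_status/vars.
# Assumptions (placeholders): local node, default HTTP port 8080, insecure HTTP.
curl -s http://localhost:8080/_status/vars | grep '^requests_slow_raft'
```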

2 changes: 1 addition & 1 deletion src/current/v24.1/ui-slow-requests-dashboard.md
@@ -29,7 +29,7 @@ Hovering over the graph displays values for the following metrics:

Metric | Description
--------|----
Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric.
Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

## Slow DistSender RPCs

9 changes: 9 additions & 0 deletions src/current/v24.3/architecture/replication-layer.md
@@ -148,6 +148,15 @@ A table's meta and system ranges (detailed in the [distribution layer]({% link {

However, unlike table data, system ranges cannot use epoch-based leases because that would create a circular dependency: system ranges are already being used to implement epoch-based leases for table data. Therefore, system ranges use expiration-based leases instead. Expiration-based leases expire at a particular timestamp (typically after a few seconds). However, as long as a node continues proposing Raft commands, it continues to extend the expiration of its leases. If it doesn't, the next node containing a replica of the range that tries to read from or write to the range will become the leaseholder.

#### Leader-leaseholder splits

[Epoch-based leases](#epoch-based-leases-table-data) are vulnerable to _leader-leaseholder splits_. These can occur when a leaseholder's Raft log has fallen behind other replicas in its group and it cannot acquire Raft leadership. Coupled with a [network partition]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#network-partition), this split can cause permanent unavailability of the range if (1) the stale leaseholder continues heartbeating the [liveness range](#epoch-based-leases-table-data) to hold its lease but (2) cannot reach the leader to propose writes.

Symptoms of leader-leaseholder splits include a [stalled Raft log]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#requests-stuck-in-raft) on the leaseholder and [increased disk usage]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disks-filling-up) on follower replicas buffering pending Raft entries. Remediations include:

- Restarting the affected nodes.
- Fixing the network partition (or slow networking) between nodes.

#### How leases are transferred from a dead node

When the cluster needs to access a range on a leaseholder node that is dead, that range's lease must be transferred to a healthy node. This process is as follows:
2 changes: 2 additions & 0 deletions src/current/v24.3/cluster-setup-troubleshooting.md
@@ -387,6 +387,8 @@ Like any database system, if you run out of disk space the system will no longer
- [Why is disk usage increasing despite lack of writes?]({% link {{ page.version.version }}/operational-faqs.md %}#why-is-disk-usage-increasing-despite-lack-of-writes)
- [Can I reduce or disable the storage of timeseries data?]({% link {{ page.version.version }}/operational-faqs.md %}#can-i-reduce-or-disable-the-storage-of-time-series-data)

In rare cases, disk usage can increase on nodes with [Raft followers]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) due to a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

###### Automatic ballast files

CockroachDB automatically creates an emergency ballast file at [node startup]({% link {{ page.version.version }}/cockroach-start.md %}). This feature is **on** by default. Note that the [`cockroach debug ballast`]({% link {{ page.version.version }}/cockroach-debug-ballast.md %}) command is still available but deprecated.
2 changes: 1 addition & 1 deletion src/current/v24.3/monitoring-and-alerting.md
@@ -1205,7 +1205,7 @@ Currently, not all events listed have corresponding alert rule definitions avail

#### Requests stuck in Raft

- **Rule:** Send an alert when requests are taking a very long time in replication.
- **Rule:** Send an alert when requests are taking a very long time in replication. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

- **How to detect:** Calculate this using the `requests_slow_raft` metric in the node's `_status/vars` output.

2 changes: 1 addition & 1 deletion src/current/v24.3/ui-slow-requests-dashboard.md
@@ -29,7 +29,7 @@ Hovering over the graph displays values for the following metrics:

Metric | Description
--------|----
Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric.
Slow Raft Proposals | The number of requests that have been stuck for longer than usual in [Raft]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft), as tracked by the `requests.slow.raft` metric. This can be a symptom of a [leader-leaseholder split]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leader-leaseholder-splits).

## Slow DistSender RPCs
