Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add region failover section #1056

Merged
merged 24 commits into from
Jul 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
6a2b994
docs: add region failover section
WenyXu Jul 12, 2024
42f959a
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 12, 2024
af5d394
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 12, 2024
30bf8bf
fix: fix typo
WenyXu Jul 12, 2024
6c55849
chore: apply suggestions from CR
WenyXu Jul 15, 2024
3b0d9b8
fix: fix typo
WenyXu Jul 15, 2024
adace2a
refactor: refine read amplification definition
WenyXu Jul 15, 2024
edfc913
chore: fix typo
WenyXu Jul 15, 2024
6155665
chore: apply suggestions from CR
WenyXu Jul 15, 2024
78510cb
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
f249bdd
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
76957fd
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
3c3260b
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
9ca03d8
chore: apply suggestions from CR
WenyXu Jul 15, 2024
3661768
chore: apply suggestions from CR
WenyXu Jul 15, 2024
b4c1a61
chore: apply suggestions from CR
WenyXu Jul 15, 2024
d0e37c3
docs: add zh part
WenyXu Jul 16, 2024
2be66b3
chore: apply suggestions from CR
WenyXu Jul 16, 2024
326996f
Apply suggestions from code review
nicecui Jul 16, 2024
3ad907f
typo
nicecui Jul 16, 2024
6a3d663
docs: add zh part
WenyXu Jul 16, 2024
531b7ec
chore: lint markdown
WenyXu Jul 16, 2024
ea6c64d
Merge branch 'main' into feat/failover
nicecui Jul 16, 2024
c65a978
refine the docs
nicecui Jul 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion docs/auto-imports.d.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
/* eslint-disable */
/* prettier-ignore */
// @ts-nocheck
// noinspection JSUnusedGlobalSymbols
// Generated by unplugin-auto-import
export {}
declare global {
Expand Down
1 change: 1 addition & 0 deletions docs/nightly/en/summary.yml
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@
- quick-start
- cluster-deployment
- region-migration
- region-failover
- monitoring
- tracing
# TODO
Expand Down
154 changes: 117 additions & 37 deletions docs/nightly/en/user-guide/operations/configuration.md

Large diffs are not rendered by default.

81 changes: 81 additions & 0 deletions docs/nightly/en/user-guide/operations/region-failover.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Region Failover

Region Failover provides the ability to recover regions from region failures without losing data. This is implemented via [Region Migration](/user-guide/operations/region-migration).

## Enable the Region Failover

This feature is only available on GreptimeDB running on distributed mode and
nicecui marked this conversation as resolved.
Show resolved Hide resolved

- Using Kafka WAL
- Using [shared storage](/user-guide/operations/configuration.md#storage-options) (e.g., AWS S3)
nicecui marked this conversation as resolved.
Show resolved Hide resolved

### Via configuration file
Set the `enable_region_failover=true` in [metasrv](/user-guide/operations/configuration.md#metasrv-only-configuration) configuration file.

### Via GreptimeDB Operator

Set the `meta.enableRegionFailover=true`, e.g.,
```bash
helm install greptimedb greptime/greptimedb-cluster \
--set meta.enableRegionFailover=true \
...
```

## The recovery time of Region Failover

The recovery time of Region Failover depends on:

- number of regions per Topic.
- the Kafka cluster read throughput performance.

### The read amplification

In best practices, [the number of topics/partitions supported by a Kafka cluster is limited](https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html) (exceeding this number can degrade Kafka cluster performance).
Therefore, we allow multiple regions to share a single topic as the WAL.
However, this may cause to the read amplification issue.

The data belonging to a specific region consists of data files plus data in the WAL (typically `WAL[LastCheckpoint...Latest]`). The failover of a specific region only requires reading the region's WAL data to reconstruct the memory state, which is called region replaying. However, If multiple regions share a single topic, replaying data for a specific region from the topic requires filtering out unrelated data (i.e., data from other regions). This means replaying data for a specific region from the topic requires reading more data than the actual size of the region's data in the topic, a phenomenon known as read amplification.
nicecui marked this conversation as resolved.
Show resolved Hide resolved

Although multiple regions share the same topic, allowing the Datanode to support more regions, the cost of this approach is read amplification during WAL replay.

For example, configure 128 topics for [metasrv](/user-guide/operations/configuration.md#metasrv-only-configuration), and if the whole cluster holds 1024 regions (physical regions), every 8 regions will share one topic.

![Read Amplification](/remote-wal-read-amplification.png)

<p style="text-align: center;"><b>(Figure1: recovery Region 3 need to read redundant data 7 times larger than the actual size)</b></p>


A simple model to estimate the read amplification factor (replay data size/actual data size):

- For a single topic, if we try to replay all regions that belong to the topic, then the amplification factor would be 7+6+...+1 = 28 times. (The Region WAL data distribution is shown in the Figure 1. Replaying Region 3 will read 7 times redundant data larger than the actual size; Region 6 will read 6 times, and so on)
- When recovering 100 regions (requiring about 13 topics), the amplification factor is approximately 28 \* 13 = 364 times.

Assuming we have 100 regions to recover, and the actual data size of all regions is 0.5GB, the following table shows the replay data size based on the number of regions per topic.

| Number of regions per Topic | Number of topics required for 100 Regions | Single topic read amplification factor | Total reading amplification factor | Replay data size (GB) |
nicecui marked this conversation as resolved.
Show resolved Hide resolved
| --------------------------- | ----------------------------------------- | -------------------------------------- | ---------------------------------- | ---------------- |
| 1 | 100 | 0 | 0 | 0.5 |
| 2 | 50 | 1 | 50 | 25.5 |
| 4 | 25 | 6 | 150 | 75.5 |
| 8 | 13 | 28 | 364 | 182.5 |
| 16 | 7 | 120 | 840 | 420.5 |


The following table shows the recovery time of 100 regions under different read throughput conditions of the Kafka cluster. For example, when providing a read throughput of 300MB/s, recovering 100 regions requires approximately 10 minutes (182.5GB/0.3GB = 10m).

| Number of regions per Topic | Replay data size (GB) | Kafka throughput 300MB/s- Recovery time (secs) | Kafka throughput 1000MB/s- Recovery time (secs) |
| --------------------------- | ---------------- | --------------------------------------------- | ---------------------------------------------- |
| 1 | 0.5 | 2 | 1 |
| 2 | 25.5 | 85 | 26 |
| 4 | 75.5 | 252 | 76 |
| 8 | 182.5 | 608 | 183 |
| 16 | 420.5 | 1402 | 421 |

WenyXu marked this conversation as resolved.
Show resolved Hide resolved

### Suggestions for improving recovery time

In the above example, we calculated the recovery time based on the number of Regions contained in each Topic for reference.
We have calculated the recovery time under different Number of regions per Topic configuration for reference.
nicecui marked this conversation as resolved.
Show resolved Hide resolved
nicecui marked this conversation as resolved.
Show resolved Hide resolved
In actual scenarios, the read amplification may be larger than this model.
If you are very sensitive to recovery time, we recommend that each region have its topic(i.e., Number of regions per Topic is 1).

Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
Region Migration allows users to move regions between the Datanode.

:::warning Warning
This feature is only available on GreptimeDB running on cluster mode and
This feature is only available on GreptimeDB running on distributed mode and
- Using Kafka WAL
- Using [shared storage](/user-guide/operations/configuration.md#storage-options) (e.g., AWS S3)

Expand Down
1 change: 1 addition & 0 deletions docs/nightly/zh/summary-i18n.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ Continuous-Aggregation: 持续聚合
Logs: 日志
Python-Scripts: Python 脚本
Operations: 运维操作
Remote-WAL: Remote-WAL
Deploy-on-Kubernetes: 部署到 Kubernetes
Table-Sharding: 表分片
Prometheus: Prometheus
Expand Down
Loading
Loading