8 changes: 7 additions & 1 deletion docs/reference/glossary.md
@@ -166,6 +166,9 @@ An enterprise feature in GreptimeDB that creates additional read-only instances
### Region
A fundamental unit of data distribution in GreptimeDB's architecture. Regions contain a subset of table data and can be distributed across different nodes in a cluster. Each region manages its own storage, indexing, and query processing, enabling horizontal scalability and fault tolerance.

### Repartition
The process of adjusting table partition boundaries after creation by merging existing partitions and splitting them with new rules. Repartition is used to better match current data distribution, relieve hotspots, and reduce small cold regions.

### Rust
A modern programming language known for its performance and safety features, particularly in system-level programming. GreptimeDB is built with Rust, contributing to its superior performance and reliability.

@@ -195,6 +198,9 @@ A special timestamp column in GreptimeDB tables that serves as the primary time
### Time Series Database
A specialized database designed to handle time-series data, which consists of sequences of data points indexed by timestamps. GreptimeDB is a cloud-native time-series database optimized for analyzing and querying metrics, logs, and events.

### Table Sharding
The technique of splitting a large table into multiple smaller partitions. In GreptimeDB, table sharding helps distribute load across regions and improve throughput for hot or large tables.

---

## T
@@ -237,4 +243,4 @@ A time-series database designed to work seamlessly with vehicle data and cloud-b

---

*Note: This glossary is a work in progress and will be updated as new features and concepts emerge within the GreptimeDB ecosystem.*
@@ -558,6 +558,29 @@ meta:
enableRegionFailover: true
```

### Enable GC

Repartitioning depends on shared object storage and GC. You can enable GC on both metasrv and datanode with the following example:

```yaml
meta:
  configData: |
    [gc]
    enable = true
    gc_cooldown_period = "5m"

datanode:
  configData: |
    [[region_engine]]
    [region_engine.mito]
    [region_engine.mito.gc]
    enable = true
    lingering_time = "10m"
    unknown_file_lingering_time = "1h"
```

Make sure the datanode `lingering_time` is longer than the metasrv `gc_cooldown_period` to avoid deleting files that may still be in use.

#### Enable Region Failover on Local WAL

To enable Region Failover on local WAL, set `meta.enableRegionFailover: true` and add `allow_region_failover_on_local_wal = true` to the `meta.configData` field.
@@ -0,0 +1,122 @@
---
keywords: [repartition, table sharding, GC, object storage, GreptimeDB, Helm Chart]
description: Explains GreptimeDB repartitioning, including prerequisites, hot partition detection, partition rules, and common Helm Chart configurations.
---

# Repartition

Repartition lets you adjust partition rules after a table has been created.
GreptimeDB does this through `ALTER TABLE` partition split and merge operations; see [ALTER TABLE](/reference/sql/alter.md#split-or-merge-partitions) for the syntax.

Repartition is only supported in distributed clusters.

## How it works

The core idea is to adjust partition rules and Region routing online instead of manually moving data into a new table.
GreptimeDB switches to the new partition layout by updating manifest file references for each Region, so the rules can better match the current data distribution.

This approach is useful when traffic patterns change over time and you want to keep partition rules aligned with the workload without rebuilding the table.

During repartitioning, writes may be briefly affected, so client-side retries are recommended.

## When to repartition

Consider repartitioning when:

- Some regions are consistently hotter than others.
- Your data distribution changes and the current partition boundaries no longer fit.
- Some regions become very small and cold, and you want to reduce resource usage by merging them.
- You want to further split a partition to improve write concurrency or query performance.

In general, when partition rules no longer reflect the current data distribution well, it is worth considering repartitioning.

## How to identify hot partitions

Before repartitioning, confirm which regions are hot.
Join region statistics with partition metadata to find the hottest rules:

```sql
SELECT
    t.table_name,
    r.region_id,
    r.region_number,
    p.partition_name,
    p.partition_description,
    r.region_role,
    r.written_bytes_since_open,
    r.region_rows
FROM information_schema.region_statistics r
JOIN information_schema.tables t
    ON r.table_id = t.table_id
JOIN information_schema.partitions p
    ON p.table_schema = t.table_schema
    AND p.table_name = t.table_name
    AND p.greptime_partition_id = r.region_id
WHERE t.table_schema = 'public'
    AND t.table_name = 'your_table'
ORDER BY r.written_bytes_since_open DESC
LIMIT 10;
```

If some regions have a much higher `written_bytes_since_open` value over time, that partition rule is usually a good candidate for splitting.
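To quantify the skew rather than eyeball it, the same statistics can be turned into each region's share of total writes. The query below is a sketch that assumes window-function support (`SUM(...) OVER ()`) in your GreptimeDB version; `public` and `your_table` are placeholders, as in the query above:

```sql
-- Share of total writes per region; a region far above an even share is hot.
SELECT
    r.region_id,
    r.written_bytes_since_open,
    r.written_bytes_since_open * 100.0
        / SUM(r.written_bytes_since_open) OVER () AS write_share_pct
FROM information_schema.region_statistics r
JOIN information_schema.tables t
    ON r.table_id = t.table_id
WHERE t.table_schema = 'public'
    AND t.table_name = 'your_table'
ORDER BY write_share_pct DESC;
```

With four partitions, for example, an even layout puts each region near 25%; a region holding well above its even share is the natural split candidate.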

Also check whether the region peers are healthy so node issues are not mistaken for hotspots:

```sql
SELECT
    p.region_id,
    p.peer_addr,
    p.status,
    p.down_seconds
FROM information_schema.region_peers p
WHERE p.table_schema = 'public'
    AND p.table_name = 'your_table'
ORDER BY p.region_id, p.peer_addr;
```

If the nodes are healthy and the hotspot signal persists, you can move on to designing the repartition plan.

## Prerequisites

:::warning Warning
This feature is only available in distributed clusters and requires:

- Using [shared object storage](/user-guide/deployments-administration/configuration.md#storage-options) (e.g., AWS S3)
- Enabling [GC](/user-guide/deployments-administration/manage-data/gc.md) on the metasrv and all datanodes

Otherwise, repartitioning cannot be performed.
:::

GreptimeDB supports repartitioning through repeated `SPLIT PARTITION` and `MERGE PARTITION` operations.
The most common cases are 1-to-2 splits and 2-to-1 merges.
More complex changes can also be done step by step.
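For example, a hypothetical 1-to-3 split can be carried out as two consecutive 1-to-2 splits. The table, column, and boundaries below are illustrative only:

```sql
-- Step 1: split the original partition in two.
ALTER TABLE sensor_readings SPLIT PARTITION (
    device_id < 100
) INTO (
    device_id < 50,
    device_id >= 50 AND device_id < 100
);

-- Step 2: split one of the resulting partitions again.
ALTER TABLE sensor_readings SPLIT PARTITION (
    device_id >= 50 AND device_id < 100
) INTO (
    device_id >= 50 AND device_id < 75,
    device_id >= 75 AND device_id < 100
);
```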

Object storage stores region files, while GC reclaims old files only after references are released.
This helps prevent accidental deletion of data still in use.

### Through GreptimeDB Operator

If you deploy GreptimeDB with the GreptimeDB Operator, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) for GC and object storage setup.

## Example

You can repartition a table by merging existing partitions and then splitting them with new rules.
The following example changes the partition boundary on the `area` column from `'South'` to `'North'` for devices with `device_id < 100`:

```sql
ALTER TABLE sensor_readings MERGE PARTITION (
    device_id < 100 AND area < 'South',
    device_id < 100 AND area >= 'South'
);

ALTER TABLE sensor_readings SPLIT PARTITION (
    device_id < 100
) INTO (
    device_id < 100 AND area < 'North',
    device_id < 100 AND area >= 'North'
);
```
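After the statements complete, you can verify the new layout by listing the table's partitions, reusing the `information_schema` columns from the hotspot query earlier on this page (`public` is an assumed schema):

```sql
SELECT
    partition_name,
    partition_description,
    greptime_partition_id
FROM information_schema.partitions
WHERE table_schema = 'public'
    AND table_name = 'sensor_readings'
ORDER BY partition_name;
```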

## Further reading

For a step-by-step tutorial with more background and examples, see [How to Split and Merge Partitions Online in GreptimeDB](https://greptime.com/blogs/2026-03-19-greptimedb-repartition-guide).
@@ -110,26 +110,7 @@ The following content uses the `sensor_readings` table with two partition columns

## Repartition a sharded table

You can modify partition rules by first merging existing partitions and then splitting them with new rules. The example below shows how to change the partitioning on the `area` column from `South` to `North` for devices with `device_id < 100`:

```sql
ALTER TABLE sensor_readings MERGE PARTITION (
    device_id < 100 AND area < 'South',
    device_id < 100 AND area >= 'South'
);

ALTER TABLE sensor_readings SPLIT PARTITION (
    device_id < 100
) INTO (
    device_id < 100 AND area < 'North',
    device_id < 100 AND area >= 'North'
);
```

:::caution Note
Repartitioning is only supported in distributed clusters.
You must enable shared object storage and [GC](/user-guide/deployments-administration/manage-data/gc.md), and ensure all datanodes can access the same object store before running repartitioning operations.
:::
If you need to modify partition rules for an existing table, refer to the separate [Repartition](/user-guide/deployments-administration/manage-data/repartition.md) page.

## Insert data into the table

Expand Up @@ -164,6 +164,9 @@ GreptimeDB 企业版中的功能,通过创建额外的只读数据实例来提
### Region (区域)
GreptimeDB 架构中数据分布的基本单元。Region 包含表数据的子集,可分布在集群的不同节点上。每个 Region 管理自己的存储、索引和查询处理,实现水平扩展和容错能力。

### Repartition (重分区)
通过合并已有分区并按新规则拆分分区来调整建表后的分区边界的过程。重分区用于更好地匹配当前数据分布、缓解热点,并减少冷小分区。

### Rust
以前沿内存安全特性著称的系统级编程语言。GreptimeDB 采用 Rust 语言构建,为其高性能与高可靠性提供底层保障。

Expand Down Expand Up @@ -193,6 +196,9 @@ GreptimeDB 表中的特殊时间戳列,作为时序数据的主要时间维度
### Time Series Database (时序数据库)
专为时间戳索引数据设计的数据库类型。GreptimeDB 作为云原生时序数据库,深度优化了对指标、日志及事件的分析查询性能。

### Table Sharding (表分片)
将一张大表拆分为多个更小分区的技术。在 GreptimeDB 中,表分片有助于将负载分散到多个 region 上,并提升热点表或大表的吞吐能力。

---

## T
@@ -558,6 +558,29 @@ meta:
enableRegionFailover: true
```

### Enable GC

Repartitioning depends on shared object storage and GC. You can enable GC on both the metasrv and the datanodes with the following example:

```yaml
meta:
  configData: |
    [gc]
    enable = true
    gc_cooldown_period = "5m"

datanode:
  configData: |
    [[region_engine]]
    [region_engine.mito]
    [region_engine.mito.gc]
    enable = true
    lingering_time = "10m"
    unknown_file_lingering_time = "1h"
```

Make sure the datanode `lingering_time` is longer than the metasrv `gc_cooldown_period` so that files still in use are not deleted prematurely.

#### Enable Region Failover on Local WAL

To enable Region Failover on local WAL, set `meta.enableRegionFailover: true` and add `allow_region_failover_on_local_wal = true` to the `meta.configData` field.
@@ -609,4 +632,4 @@ datanode:
[wal]
provider = "kafka"
overwrite_entry_start_id = true
```
@@ -10,6 +10,7 @@ description: Provides an overview of data management in GreptimeDB, including storage location notes
* [Update or delete data](/user-guide/manage-data/overview.md)
* [Expire data by setting a TTL](/user-guide/manage-data/overview.md#使用-ttl-策略保留数据)
* [Table Sharding](table-sharding.md): partition tables by Region
* [Repartition](repartition.md): adjust the partition boundaries of an existing table
* [Region Migration](region-migration.md): migrate Regions for load balancing
* [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md)
* [Compaction](compaction.md)
@@ -0,0 +1,126 @@
---
keywords: [repartition, table sharding, GC, object storage, GreptimeDB, Helm Chart]
description: Explains the GreptimeDB repartitioning workflow, the GC and object storage configuration to prepare beforehand, and common Helm Chart configuration examples.
---

# Repartition

Repartition lets you adjust partition rules after a table has been created. GreptimeDB performs this through the partition split and merge capabilities of `ALTER TABLE`; see [ALTER TABLE](/reference/sql/alter.md#分区拆分与合并) for the detailed syntax.

Repartition is only supported in distributed clusters.

## How it works

The core idea of repartitioning is to adjust partition rules and Region routing online rather than manually migrating data into a new table.
GreptimeDB switches to the new partition layout by updating the manifest file references of each Region, letting the partition rules realign with the current data distribution.

When traffic patterns change, this approach keeps partition rules matched to the workload without rebuilding the whole table.

During repartitioning, writes may fluctuate briefly, so enabling client-side retries is recommended.

## When to repartition

Consider repartitioning in situations like these:

- Writes or queries on some Regions are noticeably hotter, and the load stays unbalanced over time.
- The data distribution has changed, and the existing partition boundaries no longer fit.
- Some Regions have become small and cold, and you want to merge them to reduce resource usage.
- A partition needs to be subdivided further to improve write concurrency or query performance.

In general, when the partition rules no longer reflect the current data distribution well, it is worth considering repartitioning.

## How to identify hot partitions

Before repartitioning, it is recommended to confirm which Regions have become hot.
Start by joining Region-level statistics with the partition rules to find the hottest rules:

```sql
SELECT
    t.table_name,
    r.region_id,
    r.region_number,
    p.partition_name,
    p.partition_description,
    r.region_role,
    r.written_bytes_since_open,
    r.region_rows
FROM information_schema.region_statistics r
JOIN information_schema.tables t
    ON r.table_id = t.table_id
JOIN information_schema.partitions p
    ON p.table_schema = t.table_schema
    AND p.table_name = t.table_name
    AND p.greptime_partition_id = r.region_id
WHERE t.table_schema = 'public'
    AND t.table_name = 'your_table'
ORDER BY r.written_bytes_since_open DESC
LIMIT 10;
```

If some Regions show a consistently higher `written_bytes_since_open`, the corresponding partition rule is usually hot and a good candidate for splitting first.

It is also recommended to check that the nodes hosting these Regions are healthy, to avoid mistaking node jitter for a hotspot:

```sql
SELECT
    p.region_id,
    p.peer_addr,
    p.status,
    p.down_seconds
FROM information_schema.region_peers p
WHERE p.table_schema = 'public'
    AND p.table_name = 'your_table'
ORDER BY p.region_id, p.peer_addr;
```

If the nodes are healthy and the hotspot signal persists, you can move on to designing the repartition plan.

## Prerequisites

:::warning Warning
This feature is only available in distributed GreptimeDB clusters and requires:

- Using [shared object storage](/user-guide/deployments-administration/configuration.md#storage-options) (e.g., AWS S3)
- Enabling [GC](/user-guide/deployments-administration/manage-data/gc.md) on the metasrv and all datanodes

Otherwise, repartitioning cannot be performed.
:::

The current open-source version supports partition adjustments through repeated combinations of `SPLIT PARTITION` / `MERGE PARTITION`; the most common cases are 1-to-2 splits and 2-to-1 merges. More complex partition changes can also be completed through step-by-step splits and merges.

Object storage holds the region files, and GC reclaims old files only after their references are released, which avoids accidentally deleting data that is still in use during repartitioning.

For detailed configuration, refer to:

- [GC](/user-guide/deployments-administration/manage-data/gc.md)
- [Object storage configuration](/user-guide/deployments-administration/configuration.md#storage-options)

### Through GreptimeDB Operator

If you deploy with the GreptimeDB Operator, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to set up GC and object storage quickly.

## Repartition example

You can modify partition rules by first merging existing partitions and then splitting them with new rules. The following example changes the partition boundary on the `area` column from `'South'` to `'North'` for devices with `device_id < 100`:

```sql
ALTER TABLE sensor_readings MERGE PARTITION (
    device_id < 100 AND area < 'South',
    device_id < 100 AND area >= 'South'
);

ALTER TABLE sensor_readings SPLIT PARTITION (
    device_id < 100
) INTO (
    device_id < 100 AND area < 'North',
    device_id < 100 AND area >= 'North'
);
```

## Further reading

For a tutorial with more background and complete examples, see the blog post [How to Split and Merge Partitions Online in GreptimeDB](https://greptime.cn/blogs/2026-03-19-greptimedb-repartition-guide).

:::caution Note
Repartitioning is only supported in distributed clusters. Make sure GC and object storage are correctly configured before running these operations.
:::