docs: add repartition section #2380
Merged
+613
−85
8 commits
- 1eb4345 docs: add repartition guide and navigation (WenyXu)
- 44fafe9 fix: update gc documentation links (WenyXu)
- e5deaf1 docs: add GC configuration to Helm chart guide (WenyXu)
- b8a5397 docs: simplify repartition labels in zh docs (WenyXu)
- 4420213 docs: refine repartition guide wording (WenyXu)
- 5ae37b0 docs: localize repartition blog title in zh docs (WenyXu)
- 167d0a6 docs: sync repartition updates to v1.0 (WenyXu)
- 78faf65 docs: align repartition navigation across versions (WenyXu)
122 changes: 122 additions & 0 deletions
docs/user-guide/deployments-administration/manage-data/repartition.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| --- | ||
| keywords: [repartition, table sharding, GC, object storage, GreptimeDB, Helm Chart] | ||
| description: Explains GreptimeDB repartitioning, including prerequisites, hot partition detection, partition rules, and common Helm Chart configurations. | ||
| --- | ||
|
|
||
| # Repartition | ||
|
|
||
| Repartition lets you adjust partition rules after a table has been created. | ||
| GreptimeDB does this through `ALTER TABLE` partition split and merge operations; see [ALTER TABLE](/reference/sql/alter.md#split-or-merge-partitions) for the syntax. | ||
|
|
||
| Repartition is only supported in distributed clusters. | ||
|
|
||
| ## How it works | ||
|
|
||
| The core idea is to adjust partition rules and Region routing online instead of manually moving data into a new table. | ||
| GreptimeDB switches to the new partition layout by updating manifest file references for each Region, so the rules can better match the current data distribution. | ||
|
|
||
| This approach is useful when traffic patterns change over time and you want to keep partition rules aligned with the workload without rebuilding the table. | ||
|
|
||
| During repartitioning, writes may be briefly affected, so client-side retries are recommended. | ||
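The retry recommendation above could be handled on the client roughly as follows. This is a minimal sketch, not GreptimeDB client code: `write_with_retry`, `write_fn`, and the backoff parameters are hypothetical names, and which exceptions count as transient depends on the client library you use.

```python
import random
import time


def write_with_retry(write_fn, max_attempts=5, base_delay=0.1, max_delay=2.0,
                     retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call write_fn(), retrying transient failures with exponential backoff.

    write_fn is a hypothetical zero-argument callable that performs one write
    with whatever client you use.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, with jitter so that
            # many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

A short, capped backoff like this is usually enough for a brief disruption; tune `max_attempts` and `max_delay` to how long your writers can tolerate waiting.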
|
|
||
| ## When to repartition | ||
|
|
||
| Consider repartitioning when: | ||
|
|
||
| - Some regions are consistently hotter than others. | ||
| - Your data distribution changes and the current partition boundaries no longer fit. | ||
| - Some regions become very small and cold, and you want to reduce resource usage by merging them. | ||
| - You want to further split a partition to improve write concurrency or query performance. | ||
|
|
||
| In general, when partition rules no longer reflect the current data distribution well, it is worth considering repartitioning. | ||
|
|
||
| ## How to identify hot partitions | ||
|
|
||
| Before repartitioning, confirm which regions are hot. | ||
| Join region statistics with partition metadata to find the hottest rules: | ||
|
|
||
| ```sql | ||
| SELECT | ||
| t.table_name, | ||
| r.region_id, | ||
| r.region_number, | ||
| p.partition_name, | ||
| p.partition_description, | ||
| r.region_role, | ||
| r.written_bytes_since_open, | ||
| r.region_rows | ||
| FROM information_schema.region_statistics r | ||
| JOIN information_schema.tables t | ||
| ON r.table_id = t.table_id | ||
| JOIN information_schema.partitions p | ||
| ON p.table_schema = t.table_schema | ||
| AND p.table_name = t.table_name | ||
| AND p.greptime_partition_id = r.region_id | ||
| WHERE t.table_schema = 'public' | ||
| AND t.table_name = 'your_table' | ||
| ORDER BY r.written_bytes_since_open DESC | ||
| LIMIT 10; | ||
| ``` | ||
|
|
||
| If a region's `written_bytes_since_open` value stays much higher than that of the other regions over time, its partition rule is usually a good candidate for splitting. | ||
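One way to make "much higher" concrete is to compare each region against the median of all regions. The helper below is an illustrative sketch only: `find_hot_regions` and the 3x threshold are assumptions, not part of GreptimeDB, and the input is expected to be `(region_id, written_bytes_since_open)` pairs taken from the query above.

```python
from statistics import median


def find_hot_regions(rows, ratio=3.0):
    """Return region IDs whose written bytes exceed ratio x the median.

    rows: iterable of (region_id, written_bytes_since_open) pairs.
    The 3x default is an arbitrary illustrative threshold.
    """
    rows = list(rows)
    if not rows:
        return []
    baseline = median(written for _, written in rows)
    if baseline == 0:
        # The typical region is idle, so any region with writes stands out.
        return [rid for rid, written in rows if written > 0]
    return [rid for rid, written in rows if written > ratio * baseline]
```

Because the counter name suggests it accumulates from when a region was opened, it is safer to run such a check on several samples taken over time rather than on a single snapshot.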
|
|
||
| Also check whether the region peers are healthy so node issues are not mistaken for hotspots: | ||
|
|
||
| ```sql | ||
| SELECT | ||
| p.region_id, | ||
| p.peer_addr, | ||
| p.status, | ||
| p.down_seconds | ||
| FROM information_schema.region_peers p | ||
| WHERE p.table_schema = 'public' | ||
| AND p.table_name = 'your_table' | ||
| ORDER BY p.region_id, p.peer_addr; | ||
| ``` | ||
|
|
||
| If the nodes are healthy and the hotspot signal persists, you can move on to designing the repartition plan. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| :::warning Warning | ||
| This feature is only available in distributed clusters and requires: | ||
|
|
||
| - [Shared object storage](/user-guide/deployments-administration/configuration.md#storage-options) (e.g., AWS S3) | ||
| - [GC](/user-guide/deployments-administration/manage-data/gc.md) enabled on metasrv and all datanodes | ||
|
|
||
| Otherwise, you can't perform repartitioning. | ||
| ::: | ||
|
|
||
| GreptimeDB supports repartitioning through repeated `SPLIT PARTITION` and `MERGE PARTITION` operations. | ||
| The most common cases are 1-to-2 splits and 2-to-1 merges. | ||
| More complex changes can also be done step by step. | ||
|
|
||
| Object storage stores region files, while GC reclaims old files only after references are released. | ||
| This helps prevent accidental deletion of data still in use. | ||
|
||
|
|
||
| ### Through GreptimeDB Operator | ||
|
|
||
| If you deploy GreptimeDB with the GreptimeDB Operator, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) for GC and object storage setup. | ||
|
||
|
|
||
| ## Example | ||
|
|
||
| You can repartition a table by merging existing partitions and then splitting them with new rules. | ||
| The following example moves the split point on the `area` partition column from `'South'` to `'North'` for devices with `device_id < 100`: | ||
|
|
||
| ```sql | ||
| ALTER TABLE sensor_readings MERGE PARTITION ( | ||
| device_id < 100 AND area < 'South', | ||
| device_id < 100 AND area >= 'South' | ||
| ); | ||
|
|
||
| ALTER TABLE sensor_readings SPLIT PARTITION ( | ||
| device_id < 100 | ||
| ) INTO ( | ||
| device_id < 100 AND area < 'North', | ||
| device_id < 100 AND area >= 'North' | ||
| ); | ||
| ``` | ||
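To see which rows change partitions in this example, the sketch below evaluates the old and new rules for a few sample `area` values. It is only an illustration: `partition_for` is a hypothetical helper, the sample values are made up, and it assumes that Python's string comparison orders these ASCII values the same way the database does.

```python
def partition_for(device_id, area, boundary):
    """Return the matching rule for a row with device_id < 100.

    boundary is the split point on `area` ('South' before, 'North' after).
    """
    if device_id >= 100:
        return None  # covered by other rules that the example does not touch
    if area < boundary:
        return f"device_id < 100 AND area < '{boundary}'"
    return f"device_id < 100 AND area >= '{boundary}'"


# Compare the rule each sample row matches before and after the change.
for area in ("Asia", "Oslo", "West"):
    before = partition_for(42, area, "South")
    after = partition_for(42, area, "North")
    print(f"{area}: {before}  ->  {after}")
```

Only values between the two boundaries, such as `Oslo`, move from the lower partition to the upper one; values below `'North'` or at or above `'South'` stay on the same side of the split.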
|
||
|
|
||
| ## Further reading | ||
|
|
||
| For a step-by-step tutorial with more background and examples, see [How to Split and Merge Partitions Online in GreptimeDB](https://greptime.com/blogs/2026-03-19-greptimedb-repartition-guide). | ||
126 changes: 126 additions & 0 deletions
...t-docs/current/user-guide/deployments-administration/manage-data/repartition.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,126 @@ | ||
| --- | ||
| keywords: [repartition, GC, object storage, GreptimeDB, Helm Chart] | ||
| description: Describes the GreptimeDB repartitioning workflow, the GC and object storage configuration required beforehand, and common Helm Chart configuration examples. | ||
| --- | ||
|
|
||
| # Repartition | ||
|
|
||
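| Repartition lets you adjust partition rules after a table has been created. GreptimeDB does this through the partition split and merge capabilities of `ALTER TABLE`; see [ALTER TABLE](/reference/sql/alter.md#分区拆分与合并) for the detailed syntax. | ||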
|
|
||
| Repartition is only supported in distributed clusters. | ||
|
|
||
| ## How it works | ||
|
|
||
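| The core of repartitioning is adjusting partition rules and Region routing online instead of manually migrating data into a new table. | ||
| GreptimeDB switches to the new partition layout by updating the manifest file references of each Region, so the partition rules fit the current data distribution again. | ||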
|
|
||
| When traffic patterns change, this approach helps you keep partition rules matched to the workload without rebuilding the whole table. | ||
|
|
||
| During repartitioning, writes may fluctuate briefly, so enabling client-side retries is recommended. | ||
|
|
||
| ## When to repartition | ||
|
|
||
| Consider repartitioning in any of the following situations: | ||
|
|
||
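| - Some Regions are noticeably hotter for writes or queries, and the load stays unbalanced over time. | ||
| - The workload distribution has changed and the existing partition boundaries no longer fit. | ||
| - Some Regions have become very small and cold, and you want to reduce resource usage by merging them. | ||
| - You need to split a partition further to improve write concurrency or query performance. | ||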
|
|
||
| In general, when partition rules no longer reflect the current data distribution well, repartitioning is worth considering. | ||
|
|
||
| ## How to identify hot partitions | ||
|
|
||
| Before repartitioning, first confirm which Regions have become hot. | ||
| You can start by joining Region-level statistics with the partition rules to find the hottest rules: | ||
|
|
||
| ```sql | ||
| SELECT | ||
| t.table_name, | ||
| r.region_id, | ||
| r.region_number, | ||
| p.partition_name, | ||
| p.partition_description, | ||
| r.region_role, | ||
| r.written_bytes_since_open, | ||
| r.region_rows | ||
| FROM information_schema.region_statistics r | ||
| JOIN information_schema.tables t | ||
| ON r.table_id = t.table_id | ||
| JOIN information_schema.partitions p | ||
| ON p.table_schema = t.table_schema | ||
| AND p.table_name = t.table_name | ||
| AND p.greptime_partition_id = r.region_id | ||
| WHERE t.table_schema = 'public' | ||
| AND t.table_name = 'your_table' | ||
| ORDER BY r.written_bytes_since_open DESC | ||
| LIMIT 10; | ||
| ``` | ||
|
|
||
| If a Region's `written_bytes_since_open` stays noticeably higher over time, its partition rule is usually hot and a good first candidate for splitting. | ||
|
|
||
| Also check whether the nodes hosting the Regions are healthy, so that node instability is not mistaken for a hotspot: | ||
|
|
||
| ```sql | ||
| SELECT | ||
| p.region_id, | ||
| p.peer_addr, | ||
| p.status, | ||
| p.down_seconds | ||
| FROM information_schema.region_peers p | ||
| WHERE p.table_schema = 'public' | ||
| AND p.table_name = 'your_table' | ||
| ORDER BY p.region_id, p.peer_addr; | ||
| ``` | ||
|
|
||
| If the nodes are healthy and the hotspot signal persists, you can move on to designing the repartition plan. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| :::warning Warning | ||
| This feature is only available in GreptimeDB distributed clusters and requires: | ||
|
|
||
| - [Shared object storage](/user-guide/deployments-administration/configuration.md#storage-options) (e.g., AWS S3) | ||
| - [GC](/user-guide/deployments-administration/manage-data/gc.md) enabled on metasrv and all datanodes | ||
|
|
||
| Otherwise, you cannot perform repartitioning. | ||
| ::: | ||
|
|
||
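| The current open-source edition supports adjusting partitions through combinations of repeated `SPLIT PARTITION` / `MERGE PARTITION` operations. The most common cases are 1-to-2 splits and 2-to-1 merges. More complex partition changes can also be completed through step-by-step splits and merges. | ||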
|
|
||
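| Object storage holds the region files, while GC reclaims old files only after their references are released, which prevents data that is still in use from being deleted by mistake during repartitioning. | ||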
|
|
||
| For detailed configuration, see: | ||
|
|
||
| - [GC](/user-guide/deployments-administration/manage-data/gc.md) | ||
| - [Object storage configuration](/user-guide/deployments-administration/configuration.md#storage-options) | ||
|
|
||
| ### Through GreptimeDB Operator | ||
|
|
||
| If you deploy with the GreptimeDB Operator, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to quickly set up GC and object storage. | ||
|
|
||
| ## Repartition example | ||
|
|
||
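| You can change partition rules by first merging existing partitions and then splitting them with new rules. The following example moves the split point on the `area` partition column from `'South'` to `'North'` for devices with `device_id < 100`: | ||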
|
|
||
| ```sql | ||
| ALTER TABLE sensor_readings MERGE PARTITION ( | ||
| device_id < 100 AND area < 'South', | ||
| device_id < 100 AND area >= 'South' | ||
| ); | ||
|
|
||
| ALTER TABLE sensor_readings SPLIT PARTITION ( | ||
| device_id < 100 | ||
| ) INTO ( | ||
| device_id < 100 AND area < 'North', | ||
| device_id < 100 AND area >= 'North' | ||
| ); | ||
| ``` | ||
|
|
||
| ## Further reading | ||
|
|
||
| For a tutorial with more background and complete examples, see the blog post [How to Split and Merge Partitions Online in GreptimeDB](https://greptime.cn/blogs/2026-03-19-greptimedb-repartition-guide). | ||
|
|
||
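| :::caution Note | ||
| Repartitioning can only be performed in distributed clusters. Confirm that GC and object storage are configured correctly before running these operations. | ||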
| ::: |