doc: update loader doc by kenssa4eedfd · Pull Request #445 · apache/hugegraph-doc

kenssa4eedfd · 2026-01-22T09:43:58Z

https://github.com/apache/incubator-hugegraph-toolchain/pull/704

Purpose of the PR

Updated the parameter descriptions in the loader documentation.

kenssa4eedfd · 2026-01-22T09:45:17Z

https://github.com/apache/incubator-hugegraph-toolchain/pull/704

Copilot

Pull request overview

This PR updates the parameter descriptions in the HugeGraph Loader documentation for both English and Chinese versions. The changes expand the parameter table to include many additional configuration options that were previously undocumented.

Changes:

Removed the -gs short form for the --graphspace parameter
Updated the --help parameter to show -help as an alternative
Added 25 new parameters covering features like parallel processing, CDC integration, HBase configuration, and graph creation options

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
content/en/docs/quickstart/toolchain/hugegraph-loader.md	Expanded parameter documentation table with 25 new parameters and formatting adjustments
content/cn/docs/quickstart/toolchain/hugegraph-loader.md	Chinese translation of the same parameter documentation updates

Copilot · 2026-01-22T09:53:42Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--print-progress`                      | true        |      | 是否在控制台实时打印导入条数                                                    |
+| `--dry-run`                             | false       |      | 打开该模式，只解析不导入，通常用于测试                                               |
+| `--help` 或 `-help`                      | false       |      | 打印帮助信息                                                            |                                                  
+| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) |      | 并行读取数据文件最大线程数                                         |


The parameter name is duplicated. The entry shows --parallel-count 或 --parallel-count where both alternatives are identical. This should either show different parameter names (like --parallel-count or -pc) or just show a single parameter name without the 或 (or) connector.

Suggested change

| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) | | 并行读取数据文件最大线程数 |

| `--parallel-count` | max(2,CPUS) | | 并行读取数据文件最大线程数 |
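A check along the following lines could flag this kind of duplicate before review. It is a minimal sketch, not part of the PR: the function name `duplicated_flags` and the regular expression are assumptions made for illustration, and it only inspects the first cell of a single table row.

```python
import re

# A flag is anything in backticks that starts with one or two dashes.
FLAG = re.compile(r"`(-{1,2}[A-Za-z][\w-]*)`")


def duplicated_flags(row: str) -> list[str]:
    """Return flags that appear more than once in the first cell of a table row."""
    # Drop a leading diff marker ("+") and the outer pipes, then split into cells.
    cells = [c.strip() for c in row.strip().lstrip("+").strip().strip("|").split("|")]
    seen: set[str] = set()
    dupes: list[str] = []
    for flag in FLAG.findall(cells[0]):
        if flag in seen and flag not in dupes:
            dupes.append(flag)
        seen.add(flag)
    return dupes


row = "| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) | | 并行读取数据文件最大线程数 |"
print(duplicated_flags(row))
```

Because the check compares flag names only, it behaves the same whether the connector between alternatives is `或` or `or`.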

Copilot · 2026-01-22T09:53:43Z

content/en/docs/quickstart/toolchain/hugegraph-loader.md

+| `--backend`                             | hstore      |      | The backend store type when creating graph if not exists           |
+| `--serializer`                          | binary      |      | The serializer type when creating graph if not exists              |
+| `--scheduler-type`                      | distributed |      | The task scheduler type when creating graph if not exists          |
+| `--batch-failure-fallback`              | true        |      | Whether to fallback to single insert when batch insert fails       |##### 3.4.2 Breakpoint Continuation Mode


Line 861 contains text that appears to be a duplicate of the following section header. The line ends with "Whether to fallback to single insert when batch insert fails" followed immediately by "##### 3.4.2 Breakpoint Continuation Mode" on the same line. The section header should be on line 862 by itself, and line 861 should end properly with just the parameter description.

Suggested change

| `--batch-failure-fallback` | true | | Whether to fallback to single insert when batch insert fails |##### 3.4.2 Breakpoint Continuation Mode

| `--batch-failure-fallback` | true | | Whether to fallback to single insert when batch insert fails |
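The fused row and heading can also be detected mechanically. The sketch below is an illustration under assumptions: `split_fused_heading` is a hypothetical helper, and the pattern only covers the case seen here, a table row that ends with `|` and runs straight into an ATX heading.

```python
import re

# A table row that closes with "|" and then continues with a Markdown heading.
FUSED_HEADING = re.compile(r"^(?P<row>\s*\|.*\|)\s*(?P<heading>#{1,6}\s+\S.*)$")


def split_fused_heading(line: str) -> list[str]:
    """Split 'row|##### Heading' into the row, a blank line, and the heading."""
    match = FUSED_HEADING.match(line)
    if match is None:
        return [line]  # nothing fused; keep the line unchanged
    return [match.group("row").rstrip(), "", match.group("heading").strip()]


fused = (
    "| `--batch-failure-fallback` | true | | Whether to fallback to single insert "
    "when batch insert fails |##### 3.4.2 Breakpoint Continuation Mode"
)
for part in split_fused_heading(fused):
    print(part)
```

The blank line between the row and the heading matters: without it, many Markdown renderers would not treat the heading as a separate block after the table.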

Copilot · 2026-01-22T09:53:43Z

content/en/docs/quickstart/toolchain/hugegraph-loader.md

+|-----------------------------------------|---------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `-f` or `--file`                        |               | Y               | Path to configure script                                                                                                                                                                  |
+| `-g` or `--graph`                       |               | Y               | Graph name                                                                                                                                                                                |
+| `--graphspace`                          | DEFAULT       |                 | Graph space name                                                                                                                                                                          |


The parameter --graphspace is listed without the -gs short form alternative, but this is inconsistent with the PR description which states this is an update to parameter descriptions. The original version had -gs or --graphspace. If -gs is being removed intentionally, this should be clearly documented. If it's an error, the short form should be restored.

Suggested change

| `--graphspace` | DEFAULT | | Graph space name |

| `-gs` or `--graphspace` | DEFAULT | | Graph space name |

Copilot · 2026-01-22T09:53:43Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+|-----------------------------------------|-------------|------|-------------------------------------------------------------------|
+| `-f` 或 `--file`                         |             | Y    | 配置脚本的路径                                                           |
+| `-g` 或 `--graph`                        |             | Y    | 图名称                                                               |
+| `--graphspace`                          | DEFAULT     |      | 图空间                                                               |


The parameter --graphspace is listed without the -gs short form alternative, but the original version had -gs 或 --graphspace. This is inconsistent with other parameters that maintain their short forms. If -gs is being removed intentionally, this should be clearly documented. If it's an error, the short form should be restored to maintain consistency with the English version.

Suggested change

| `--graphspace` | DEFAULT | | 图空间 |

| `-gs` 或 `--graphspace` | DEFAULT | | 图空间 |

Copilot · 2026-01-22T09:53:43Z

content/en/docs/quickstart/toolchain/hugegraph-loader.md

+| `--check-vertex`                        | false         |                 | Whether to check if the vertices connected by the edge exist when inserting the edge                                                                                                      |
+| `--print-progress`                      | true          |                 | Whether to print the number of imported items in real time on the console                                                                                                                 |
+| `--dry-run`                             | false         |                 | Enable this mode to only parse data without importing; usually used for testing                                                                                                           |
+| `--help` or `-help`                     | false         |                 | Print help information                                                                                                     |


The --help parameter now shows --help or -help as alternatives. However, the standard convention for help flags is typically --help or -h. The short form -help is unusual and may not work as expected in many CLI frameworks. This should be verified against the actual implementation, and if -h is the correct short form, it should be used instead of -help.

Suggested change

| `--help` or `-help` | false | | Print help information |

| `--help` or `-h` | false | | Print help information |

Copilot · 2026-01-22T09:53:44Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--check-vertex`                        | false       |      | 插入边时是否检查边所连接的顶点是否存在                                               |
+| `--print-progress`                      | true        |      | 是否在控制台实时打印导入条数                                                    |
+| `--dry-run`                             | false       |      | 打开该模式，只解析不导入，通常用于测试                                               |
+| `--help` 或 `-help`                      | false       |      | 打印帮助信息                                                            |                                                  


The --help parameter now shows --help 或 -help as alternatives. However, the standard convention for help flags is typically --help or -h. The short form -help is unusual and may not work as expected in many CLI frameworks. This should be verified against the actual implementation, and if -h is the correct short form, it should be used instead of -help.

Suggested change

| `--help` 或 `-help` | false | | 打印帮助信息 |

| `--help` 或 `-h` | false | | 打印帮助信息 |

Copilot · 2026-01-22T09:53:44Z

content/en/docs/quickstart/toolchain/hugegraph-loader.md

+| `--print-progress`                      | true          |                 | Whether to print the number of imported items in real time on the console                                                                                                                 |
+| `--dry-run`                             | false         |                 | Enable this mode to only parse data without importing; usually used for testing                                                                                                           |
+| `--help` or `-help`                     | false         |                 | Print help information                                                                                                     |
+| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) |      | Parallel read pipelines for data files                             |


The parameter name appears to be duplicated in both Chinese and English. The entry shows --parallel-count 或 --parallel-count (or in Chinese). This should likely be a different alternative parameter name or just a single parameter name without alternatives.

Suggested change

| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) | | Parallel read pipelines for data files |

| `--parallel-count` | max(2,CPUS) | | Parallel read pipelines for data files |

imbajin · 2026-01-22T09:53:49Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) |      | 并行读取数据文件最大线程数                                         |
+| `--start-file`                          | 0           |      | 用于部分（分片）导入的起始文件索引                                 |
+| `--end-file`                            | -1          |      | 用于部分导入的截止文件索引                                         |
+| `--scatter-sources`                     | false       |      | 分散（并行）读取多个数据源以优化 I/O 性能                           |


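⚠️ Parameter description is unclear (usability issue)

The description of --scatter-sources, "分散(并行)读取多个数据源以优化 I/O 性能" ("scatter (parallel) reads across multiple data sources to optimize I/O performance"), is not intuitive for new users.

Suggested additions:

In which scenarios should this parameter be enabled?

How it differs from --parallel-count and how the two work together

Its concrete effect on performance (for example: "suited to importing several large data files at the same time")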

imbajin · 2026-01-22T09:53:53Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--use-prefilter`                       | false       |      | 是否预先过滤顶点                                                   |
+| `--short-id`                            |             |      | 将自定义 ID 映射为更短的 ID                                        |
+| `--vertex-edge-limit`                   | -1L         |      | 单个顶点的最大边数限制                                             |
+| `--sink-type`                           | true        |      | 数据接收端（Sink）存储类型开关                                     |


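⚠️ Parameter description is ambiguous (usability issue)

--sink-type defaults to true, but the description "数据接收端(Sink)存储类型开关" ("switch for the data sink storage type") leaves users unsure:

Which storage type does true stand for?

Which does false stand for?

Which storage types are available?

Suggest improving the description, for example: "Whether to use [storage type A]; uses [storage type B] when false", or listing all available storage types.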

imbajin · 2026-01-22T09:53:56Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--short-id`                            |             |      | 将自定义 ID 映射为更短的 ID                                        |
+| `--vertex-edge-limit`                   | -1L         |      | 单个顶点的最大边数限制                                             |
+| `--sink-type`                           | true        |      | 数据接收端（Sink）存储类型开关                                     |
+| `--vertex-partitions`                   | 64          |      | HBase 顶点表的预分区数量                                           |


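⚠️ Missing usage context (usability issue)

The newly added HBase-related parameters (lines 863-869):

--vertex-partitions

--edge-partitions

--vertex-table-name

--edge-table-name

--hbase-zk-quorum

--hbase-zk-port

--hbase-zk-parent

Suggest adding, before the parameter table or at the start of the "3.4.1 参数说明" (3.4.1 Parameter Description) section:

A note that these parameters apply to the HBase backend storage scenario

A note that they must be used together with the --backend parameter

Alternatively, add a separate subsection that explains the parameter configuration for each backend store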

imbajin · 2026-01-22T09:53:59Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

-| `--dry-run`               | false     |      | 打开该模式，只解析不导入，通常用于测试                                               |
-| `--help`                  | false     |      | 打印帮助信息                                                            |                                                  
-
+| 参数                                      | 默认值         | 是否必传 | 描述信息                                                              |


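🧹 Suggestion: group the parameters to improve usability

The parameter table currently contains 49 parameters, which is too much information for new users. Suggest grouping by function, for example:

Basic parameters (required / commonly used):

-f, -g, -s, -h, -p, etc.

Performance tuning parameters:

--batch-insert-threads, --max-conn, --batch-size, --parallel-count, etc.

Advanced feature parameters:

--incremental-mode, --failure-mode, --dry-run, etc.

HBase backend parameters:

--vertex-partitions, --hbase-zk-quorum, etc.

CDC-related parameters:

--cdc-flush-interval, --cdc-sink-parallelism, etc.

This would help users quickly find the parameters they need.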
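One way to picture the proposed grouping is to hold the rows per group and render a small table for each. The sketch below is illustrative only: the `Param` class, the `render` function, the bold group titles, and the English column headers are assumptions, and the rows are a small subset copied from the diff hunks quoted in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    names: str
    default: str
    required: bool
    description: str


# Group membership follows the review comment; only a few rows are shown.
GROUPS: dict[str, list[Param]] = {
    "Basic parameters": [
        Param("`-f` or `--file`", "", True, "Path to configure script"),
        Param("`-g` or `--graph`", "", True, "Graph name"),
    ],
    "Performance tuning parameters": [
        Param("`--parallel-count`", "max(2,CPUS)", False,
              "Parallel read pipelines for data files"),
    ],
    "Advanced feature parameters": [
        Param("`--dry-run`", "false", False,
              "Enable this mode to only parse data without importing; usually used for testing"),
    ],
}


def render(groups: dict[str, list[Param]]) -> str:
    """Render one small Markdown table per group instead of a single long table."""
    lines: list[str] = []
    for title, params in groups.items():
        lines += [f"**{title}**", "",
                  "| Parameter | Default | Required | Description |",
                  "|---|---|---|---|"]
        for p in params:
            required = "Y" if p.required else ""
            lines.append(f"| {p.names} | {p.default} | {required} | {p.description} |")
        lines.append("")  # blank line closes the table
    return "\n".join(lines)


print(render(GROUPS))
```

Keeping the rows as data rather than hand-edited Markdown would also let the English and Chinese tables be generated from one source, though the thread does not propose that.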

imbajin · 2026-01-22T09:54:04Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--max-read-lines`                      | -1L         |      | 最大读取行数限制；一旦达到此行数，导入任务将停止                   |
+| `--test-mode`                           | false       |      | 是否开启测试模式                                                   |
+| `--use-prefilter`                       | false       |      | 是否预先过滤顶点                                                   |
+| `--short-id`                            |             |      | 将自定义 ID 映射为更短的 ID                                        |


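⚠️ Default value and type do not match

--short-id has an empty default value, but its description is "将自定义 ID 映射为更短的 ID" ("map custom IDs to shorter IDs"), which leaves users unsure:

What is the parameter's data type? (Boolean? String?)

How is it used? (Does it take a mapping rule, or is it just a switch?)

Suggestions:

Add a note on the parameter type

Provide a usage example, such as --short-id=true or --short-id=mapping-file.json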

imbajin · 2026-01-22T09:54:07Z

content/cn/docs/quickstart/toolchain/hugegraph-loader.md

+| `--backend`                             | hstore      |      | 自动创建图（如果不存在）时的后端存储类型                           |
+| `--serializer`                          | binary      |      | 自动创建图（如果不存在）时的序列化器类型                           |
+| `--scheduler-type`                      | distributed |      | 自动创建图（如果不存在）时的任务调度器类型                         |
+| `--batch-failure-fallback`              | true        |      | 批量插入失败时是否回退至单条插入模式                               |


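⚠️ Missing usage examples (usability issue)

None of the 24 new parameters has a practical usage example. Suggest adding the following to the "3.4.4 执行命令" (3.4.4 Execute Command) or "4 完整示例" (4 Complete Example) section of the document:

High-performance import configuration example (including --parallel-count, --scatter-sources, etc.)

HBase backend import example

CDC streaming import example

Sharded import example (--start-file, --end-file)

This would significantly improve the practical value of the document.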
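As a rough idea of what a sharded-import example might look like, the sketch below assembles a command line from the flags named in this thread. Several things are assumptions rather than facts from the PR: the launcher path `bin/hugegraph-loader.sh`, the graph name, the struct file path, the numeric values, and the `loader_command` helper itself. The thread does not say whether `--end-file` is inclusive, so the index range is purely illustrative, and any connection options a real run needs are left out.

```python
import shlex


def loader_command(graph: str, struct_file: str, extra: dict[str, object]) -> list[str]:
    """Build an argument list; each extra option is passed as 'flag value'."""
    # Assumption: launcher script location; adjust to the local installation.
    cmd = ["sh", "bin/hugegraph-loader.sh", "-g", graph, "-f", struct_file]
    for flag, value in extra.items():
        cmd += [flag, str(value)]
    return cmd


# Hypothetical run: a slice of the input files read with four parallel pipelines.
cmd = loader_command(
    "hugegraph",
    "example/struct.json",
    {"--parallel-count": 4, "--start-file": 0, "--end-file": 9},
)
print(shlex.join(cmd))
```

Only value-carrying options are shown because the thread does not state whether boolean options such as `--scatter-sources` expect an explicit `true`/`false` argument.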

imbajin · 2026-01-22T09:55:03Z

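PR Review: Loader Documentation Usability

Hello! I have reviewed the Loader documentation update in this PR. Overall, it adds documentation for 24 new parameters, which is a lot of information. From a usability standpoint, I found the following issues that need improvement:

‼️ Critical

1. `--graphspace` loses its short form

Location: content/cn/docs/quickstart/toolchain/hugegraph-loader.md:54

It used to be -gs or --graphspace; now only --graphspace remains. As a result:

It no longer matches the actual command-line parameters
Users cannot use the short form -gs
It is inconsistent with the format of the other parameters

Suggestion: restore it to -gs or --graphspace

2. `--parallel-count` parameter name is duplicated

Location: content/cn/docs/quickstart/toolchain/hugegraph-loader.md:850

The table says --parallel-count or --parallel-count, repeating the parameter name. Please confirm:

Should it be --parallel-count or -pc?
Or does this parameter simply have no short form?

3. English version contains Chinese characters

Location: content/en/docs/quickstart/toolchain/hugegraph-loader.md:207

The Chinese character 或 appears in the English document: --parallel-count 或 --parallel-count

It should be changed to --parallel-count or --parallel-count, or the duplicated part should be removed.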
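A stray `或` in the English file is easy to catch automatically. The following is a minimal sketch under assumptions: `lines_with_cjk` is a hypothetical helper, and it scans only the basic CJK Unified Ideographs block, which is enough for this case but not a complete test for all CJK text.

```python
import re

# CJK Unified Ideographs (basic block); "或" is U+6216 and falls inside it.
CJK = re.compile(r"[\u4e00-\u9fff]")


def lines_with_cjk(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for every line containing a CJK ideograph."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CJK.search(line)
    ]


sample = (
    "| `--help` or `-help` | false | | Print help information |\n"
    "| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) | | Parallel read pipelines for data files |\n"
)
for number, line in lines_with_cjk(sample):
    print(number, line)
```

Run against the English Markdown file, any hit would point to text that was copied over from the Chinese version without being translated.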

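⚠️ Important

4. New parameters lack a note about short forms

None of the 24 new parameters has a short form (such as -f or -g), so the first column of the table is formatted inconsistently.

Suggestions:

If a parameter really has no short form, say so in its description
Or mark such parameters uniformly as "(no short form)"

5. `--sink-type` description is ambiguous

Location: content/cn/docs/quickstart/toolchain/hugegraph-loader.md:862

The default is true, but the description "数据接收端(Sink)存储类型开关" ("switch for the data sink storage type") leaves users unsure:

Which storage type does true stand for?
Which does false stand for?

Suggestion: state the available storage types explicitly, for example: "Whether to use HBase storage; uses the default backend when false"

6. HBase-related parameters lack usage context

Seven HBase-related parameters were added (--vertex-partitions, --hbase-zk-quorum, etc.), but nothing explains that:

These parameters apply only to the HBase backend
They must be used together with --backend=hbase

Suggestion: explain the parameter configuration for each backend at the start of section 3.4.1 or in a new subsection

7. `--short-id` parameter type is unclear

Location: content/cn/docs/quickstart/toolchain/hugegraph-loader.md:860

The default is empty, and it is unclear:

What is the parameter type? (Boolean / string / file path?)
How is it used?

Suggestion: add the parameter type and a usage example

🧹 Minor

8. Too many parameters; grouping suggested

The table currently contains 49 parameters, which is a lot of information. Suggest grouping by function:

Basic parameters: -f, -g, -s, etc.
Performance tuning: --batch-insert-threads, --parallel-count, etc.
Advanced features: --incremental-mode, --dry-run, etc.
HBase backend: --vertex-partitions, etc.
CDC-related: --cdc-flush-interval, etc.

9. Missing practical usage examples

None of the 24 new parameters has a practical usage example.

Suggest adding to section 3.4.4 or chapter 4:

High-performance import example (with --parallel-count, --scatter-sources)
HBase backend import example
CDC streaming import example
Sharded import example (--start-file, --end-file)

10. `--scatter-sources` description could be more detailed

"分散(并行)读取多个数据源以优化 I/O 性能" ("scatter (parallel) reads across multiple data sources to optimize I/O performance") is not intuitive for new users.

Suggested additions:

Applicable scenarios (for example: "suited to importing several large files at the same time")
How it differs from --parallel-count and how the two work together

Overall assessment: this update adds a large amount of parameter documentation and is very valuable! From a usability standpoint, though, I suggest:

Fixing the 3 critical issues (--graphspace, the duplicated --parallel-count, and the Chinese characters in the English version)
Improving the parameter descriptions, especially the usage context for the HBase/CDC-related parameters
Adding practical usage examples to help users get started quickly

Looking forward to the update! 👍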

kenssa4eedfd added 2 commits January 22, 2026 15:41

docs: update loader doc

c0e813e

docs: update loader doc

5aa1263

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 22, 2026

imbajin requested a review from Copilot January 22, 2026 09:50

Copilot started reviewing on behalf of imbajin January 22, 2026 09:51

Copilot AI reviewed Jan 22, 2026

imbajin reviewed Jan 22, 2026

docs: update loader doc

4c2b76b

imbajin approved these changes Jan 22, 2026

dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 22, 2026

imbajin merged commit 2832781 into apache:master Jan 22, 2026
1 check passed

github-actions bot pushed a commit that referenced this pull request Jan 22, 2026

doc: update loader options (#445) 2832781

881dc66

kenssa4eedfd commented Jan 22, 2026 (edited)

1. `--graphspace` loses its short form

2. `--parallel-count` parameter name is duplicated

5. `--sink-type` description is ambiguous

7. `--short-id` parameter type is unclear

10. `--scatter-sources` description could be more detailed