From 844a07ca02e7739d2126269ea60bfc33639989d8 Mon Sep 17 00:00:00 2001
From: ksmou <13410244574@163.com>
Date: Mon, 6 Nov 2023 15:12:55 +0800
Subject: [PATCH 1/2] [DOCS] update run_clustering docs

---
 website/docs/procedures.md | 76 ++++++++++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 11 deletions(-)

diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index ba2d1c069686d..252ac1c8de0e8 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1233,13 +1233,12 @@ call show_fsview_latest(table => 'test_hudi_table');
 ## Optimization table
 
 ### run_clustering
-
-Trigger clustering on a hoodie table. By using partition predicates, clustering table can be run
+Trigger clustering on a hoodie table. By using partition predicates, clustering table can be run
 with specified partitions, and you can also specify the order columns to sort data.
 
 :::note
-Newly clustering instant will be generated every call, and all pending clustering instants are executed.
-When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both
+A new clustering instant is generated on every call, or some pending clustering instants are executed.
+When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both
 parameters are given, ``table`` will take effect.
 
 :::
@@ -1247,16 +1246,30 @@ parameters are given, ``table`` will take effect.
 
 **Input**
 
-| Parameter Name | Type   | Required | Default Value | Description                   |
-|----------------|--------|----------|---------------|-------------------------------|
-| table          | String | N        | None          | Name of table to be clustered |
-| path           | String | N        | None          | Path of table to be clustered |
-| predicate      | String | N        | None          | Predicate to filter partition |
-| order          | String | N        | None          | Order column split by `,`     |
+| Parameter Name          | Type    | Required | Default Value | Description                                                    |
+|-------------------------|---------|----------|---------------|----------------------------------------------------------------|
+| table                   | String  | N        | None          | Name of table to be clustered                                  |
+| path                    | String  | N        | None          | Path of table to be clustered                                  |
+| predicate               | String  | N        | None          | Predicate to filter partition                                  |
+| order                   | String  | N        | None          | Order column split by `,`                                      |
+| show_involved_partition | Boolean | N        | false         | Show involved partition in the output                          |
+| op                      | String  | N        | None          | Operation type, `EXECUTE` or `SCHEDULE`                        |
+| order_strategy          | String  | N        | None          | Records layout optimization, `linear/z-order/hilbert`          |
+| options                 | String  | N        | None          | Customize hudi configs in the format `key1=value1,key2=value2` |
+| instants                | String  | N        | None          | Specified instants, separated by `,`                           |
+| selected_partitions     | String  | N        | None          | Partitions to run clustering on, separated by `,`              |
+| limit                   | Int     | N        | None          | Max number of plans to be executed                             |
 
 **Output**
 
-Empty
+The output is as follows:
+
+| Parameter Name      | Type   | Required | Default Value | Description                              |
+|---------------------|--------|----------|---------------|------------------------------------------|
+| timestamp           | String | N        | None          | Instant name                             |
+| input_group_size    | Int    | N        | None          | The input group sizes for each plan      |
+| state               | String | N        | None          | The instant final state                  |
+| involved_partitions | String | N        | *             | Show involved partitions, default is `*` |
 
 **Example**
 
@@ -1275,6 +1288,47 @@ Clustering test_hudi_table with table name, predicate and order column
 call run_clustering(table => 'test_hudi_table', predicate => 'ts <= 20220408L', order => 'ts');
 ```
 
+Clustering test_hudi_table with table name, show_involved_partition
+```
+call run_clustering(table => 'test_hudi_table', show_involved_partition => true);
+```
+
+Clustering test_hudi_table with table name, op
+```
+call run_clustering(table => 'test_hudi_table', op => 'schedule');
+```
+
+Clustering test_hudi_table with table name, order_strategy
+```
+call run_clustering(table => 'test_hudi_table', order_strategy => 'z-order');
+```
+
+Clustering test_hudi_table with table name, op, options
+```
+call run_clustering(table => 'test_hudi_table', op => 'schedule', options => '
+hoodie.clustering.plan.strategy.target.file.max.bytes=1024*1024*1024,
+hoodie.clustering.plan.strategy.max.bytes.per.group=2*1024*1024*1024');
+```
+
+Clustering test_hudi_table with table name, op, instants
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', instants => 'ts1,ts2');
+```
+
+Clustering test_hudi_table with table name, op, selected_partitions
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', selected_partitions => 'par1,par2');
+```
+
+Clustering test_hudi_table with table name, op, limit
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', limit => 10);
+```
+:::note
+The `limit` parameter is valid only when `op` is `execute`.
+
+:::
+
 ### show_clustering
 
 Show pending clusterings on a hoodie table.
From 7511ea3661550432a95cb247cdbde9cbe97e7859 Mon Sep 17 00:00:00 2001
From: ksmou <13410244574@163.com>
Date: Mon, 6 Nov 2023 15:51:40 +0800
Subject: [PATCH 2/2] fix

---
 website/docs/procedures.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index 252ac1c8de0e8..e30abf4e090e5 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1233,7 +1233,8 @@ call show_fsview_latest(table => 'test_hudi_table');
 ## Optimization table
 
 ### run_clustering
-Trigger clustering on a hoodie table. By using partition predicates, clustering table can be run
+
+Trigger clustering on a hoodie table. By using partition predicates, clustering table can be run
 with specified partitions, and you can also specify the order columns to sort data.
 
 :::note