diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index ba2d1c069686d..e30abf4e090e5 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1238,8 +1238,8 @@ Trigger clustering on a hoodie table. By using partition predicates, clustering
 with specified partitions, and you can also specify the order columns to sort data.
 
 :::note
-Newly clustering instant will be generated every call, and all pending clustering instants are executed.
-When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both
+Newly clustering instant will be generated every call, or some pending clustering instants are executed.
+When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both
 parameters are given, ``table`` will take effect.
 
 :::
@@ -1247,16 +1247,30 @@ parameters are given, ``table`` will take effect.
 
 **Input**
 
-| Parameter Name | Type   | Required | Default Value | Description                   |
-|----------------|--------|----------|---------------|-------------------------------|
-| table          | String | N        | None          | Name of table to be clustered |
-| path           | String | N        | None          | Path of table to be clustered |
-| predicate      | String | N        | None          | Predicate to filter partition |
-| order          | String | N        | None          | Order column split by `,`     |
+| Parameter Name          | Type    | Required | Default Value | Description                                                    |
+|-------------------------|---------|----------|---------------|----------------------------------------------------------------|
+| table                   | String  | N        | None          | Name of table to be clustered                                  |
+| path                    | String  | N        | None          | Path of table to be clustered                                  |
+| predicate               | String  | N        | None          | Predicate to filter partition                                  |
+| order                   | String  | N        | None          | Order column split by `,`                                      |
+| show_involved_partition | Boolean | N        | false         | Show involved partition in the output                          |
+| op                      | String  | N        | None          | Operation type, `EXECUTE` or `SCHEDULE`                        |
+| order_strategy          | String  | N        | None          | Records layout optimization, `linear/z-order/hilbert`          |
+| options                 | String  | N        | None          | Customize hudi configs in the format "key1=value1,key2=value2` |
+| instants                | String  | N        | None          | Specified instants by `,`                                      |
+| selected_partitions     | String  | N        | None          | Partitions to run clustering by `,`                            |
+| limit                   | Int     | N        | None          | Max number of plans to be executed                             |
 
 **Output**
 
-Empty
+The output as follows:
+
+| Parameter Name      | Type   | Required | Default Value | Description                              |
+|---------------------|--------|----------|---------------|------------------------------------------|
+| timestamp           | String | N        | None          | Instant name                             |
+| input_group_size    | Int    | N        | None          | The input group sizes for each plan      |
+| state               | String | N        | None          | The instant final state                  |
+| involved_partitions | String | N        | *             | Show involved partitions, default is `*` |
 
 **Example**
 
@@ -1275,6 +1289,47 @@ Clustering test_hudi_table with table name, predicate and order column
 call run_clustering(table => 'test_hudi_table', predicate => 'ts <= 20220408L', order => 'ts');
 ```
 
+Clustering test_hudi_table with table name, show_involved_partition
+```
+call run_clustering(table => 'test_hudi_table', show_involved_partition => true);
+```
+
+Clustering test_hudi_table with table name, op
+```
+call run_clustering(table => 'test_hudi_table', op => 'schedule');
+```
+
+Clustering test_hudi_table with table name, order_strategy
+```
+call run_clustering(table => 'test_hudi_table', order_strategy => 'z-order');
+```
+
+Clustering test_hudi_table with table name, op, options
+```
+call run_clustering(table => 'test_hudi_table', op => 'schedule', options => '
+hoodie.clustering.plan.strategy.target.file.max.bytes=1024*1024*1024,
+hoodie.clustering.plan.strategy.max.bytes.per.group=2*1024*1024*1024');
+```
+
+Clustering test_hudi_table with table name, op, instants
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', instants => 'ts1,ts2');
+```
+
+Clustering test_hudi_table with table name, op, selected_partitions
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', selected_partitions => 'par1,par2');
+```
+
+Clustering test_hudi_table with table name, op, limit
+```
+call run_clustering(table => 'test_hudi_table', op => 'execute', limit => 10);
+```
+:::note
+Limit parameter is valid only when op is execute.
+
+:::
+
 ### show_clustering
 
 Show pending clusterings on a hoodie table.
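
The `options` parameter documented in this diff takes comma-separated `key=value` pairs. As a rough illustration only, the Python sketch below assembles such a string and the matching `call run_clustering(...)` statement. The helper names `build_options` and `run_clustering_sql` are hypothetical and not part of Hudi. Writing the byte sizes as plain integers is also an assumption: the diff does not say whether expressions such as `1024*1024*1024` are evaluated inside the option string.

```python
def build_options(configs):
    """Join configs into the "key1=value1,key2=value2" form (hypothetical helper)."""
    parts = []
    for key, value in configs.items():
        text = str(value)
        # ',' and '=' are the separators, so they cannot appear inside a key,
        # and ',' cannot appear inside a value.
        if "," in key or "=" in key or "," in text:
            raise ValueError(f"unsupported character in option {key!r}")
        parts.append(f"{key}={text}")
    return ",".join(parts)


def run_clustering_sql(table, op=None, options=None):
    """Build a run_clustering call statement as a string (hypothetical helper)."""
    args = [f"table => '{table}'"]
    if op is not None:
        args.append(f"op => '{op}'")
    if options:
        args.append(f"options => '{build_options(options)}'")
    return f"call run_clustering({', '.join(args)});"


GIB = 1024 * 1024 * 1024  # assumption: sizes are passed as plain integer bytes

sql = run_clustering_sql(
    "test_hudi_table",
    op="schedule",
    options={
        "hoodie.clustering.plan.strategy.target.file.max.bytes": GIB,
        "hoodie.clustering.plan.strategy.max.bytes.per.group": 2 * GIB,
    },
)
print(sql)
```

The resulting string could be passed to a Spark SQL session. The sketch does not escape single quotes in table names or values, so it is only suitable for trusted input.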