73 changes: 64 additions & 9 deletions website/docs/procedures.md
@@ -1238,25 +1238,39 @@ Trigger clustering on a hoodie table. By using partition predicates, clustering
with specified partitions, and you can also specify the order columns to sort data.

:::note
Newly clustering instant will be generated every call, and all pending clustering instants are executed.
When calling this procedure, one of parameters ``table`` and ``path`` must be specified at least. If both
Each call either generates a new clustering instant or executes some of the pending clustering instants.
When calling this procedure, at least one of the parameters ``table`` and ``path`` must be specified. If both
parameters are given, ``table`` will take effect.

:::


**Input**

| Parameter Name | Type | Required | Default Value | Description |
|----------------|--------|----------|---------------|-------------------------------|
| table | String | N | None | Name of table to be clustered |
| path | String | N | None | Path of table to be clustered |
| predicate | String | N | None | Predicate to filter partition |
| order | String | N | None | Order column split by `,` |
| Parameter Name | Type | Required | Default Value | Description |
|-------------------------|---------|----------|---------------|----------------------------------------------------------------|
| table | String | N | None | Name of table to be clustered |
| path | String | N | None | Path of table to be clustered |
| predicate | String | N | None | Predicate to filter partition |
| order                   | String  | N        | None          | Order columns, separated by `,`                                |
| show_involved_partition | Boolean | N        | false         | Whether to show involved partitions in the output              |
| op                      | String  | N        | None          | Operation type, `EXECUTE` or `SCHEDULE`                        |
| order_strategy          | String  | N        | None          | Layout optimization strategy: `linear`, `z-order` or `hilbert` |
| options                 | String  | N        | None          | Custom Hudi configs in the format `key1=value1,key2=value2`    |
| instants                | String  | N        | None          | Instants to cluster, separated by `,`                          |
| selected_partitions     | String  | N        | None          | Partitions to cluster, separated by `,`                        |
| limit                   | Int     | N        | None          | Max number of plans to be executed                             |

**Output**

Empty
The output is as follows:

| Parameter Name | Type | Required | Default Value | Description |
|---------------------|--------|----------|---------------|------------------------------------------|
| timestamp | String | N | None | Instant name |
| input_group_size | Int | N | None | The input group sizes for each plan |
| state               | String | N        | None          | Final state of the instant               |
| involved_partitions | String | N        | *             | Involved partitions, `*` by default      |

**Example**

@@ -1275,6 +1289,47 @@ Clustering test_hudi_table with table name, predicate and order column
call run_clustering(table => 'test_hudi_table', predicate => 'ts <= 20220408L', order => 'ts');
```

Clustering test_hudi_table with table name, show_involved_partition
```
call run_clustering(table => 'test_hudi_table', show_involved_partition => true);
```

Clustering test_hudi_table with table name, op
```
call run_clustering(table => 'test_hudi_table', op => 'schedule');
```

Clustering test_hudi_table with table name, order_strategy
```
call run_clustering(table => 'test_hudi_table', order_strategy => 'z-order');
```
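
Layout strategies such as z-order are typically applied to a set of sort columns. A minimal sketch that passes both parameters, assuming `order` and `order_strategy` can be combined in one call (the column `ts` is reused from the earlier example and is only illustrative):

```sql
-- Assumption: order and order_strategy are accepted together in one call
call run_clustering(table => 'test_hudi_table', order => 'ts', order_strategy => 'z-order');
```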

Clustering test_hudi_table with table name, op, options
```
call run_clustering(table => 'test_hudi_table', op => 'schedule', options => '
hoodie.clustering.plan.strategy.target.file.max.bytes=1024*1024*1024,
hoodie.clustering.plan.strategy.max.bytes.per.group=2*1024*1024*1024');
```

Clustering test_hudi_table with table name, op, instants
```
call run_clustering(table => 'test_hudi_table', op => 'execute', instants => 'ts1,ts2');
```

Clustering test_hudi_table with table name, op, selected_partitions
```
call run_clustering(table => 'test_hudi_table', op => 'execute', selected_partitions => 'par1,par2');
```

Clustering test_hudi_table with table name, op, limit
```
call run_clustering(table => 'test_hudi_table', op => 'execute', limit => 10);
```
:::note
The ``limit`` parameter is valid only when ``op`` is ``execute``.

:::
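
Because `op` separates planning from execution, the two steps can be issued as separate calls. The sketch below assumes that the plan created by the first call is still pending when the second call runs, and that `show_involved_partition` can be combined with `op` and `limit`; that combination does not appear in the examples above.

```sql
-- Step 1: only create a clustering plan, do not run it
call run_clustering(table => 'test_hudi_table', op => 'schedule');

-- Step 2: execute at most one pending plan and report the partitions it touched
-- (assumption: show_involved_partition works together with op and limit)
call run_clustering(table => 'test_hudi_table', op => 'execute', limit => 1, show_involved_partition => true);
```

With `show_involved_partition` set, the `involved_partitions` output column should list the affected partitions rather than the default `*`.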

### show_clustering

Show pending clusterings on a hoodie table.