Commit 7f9f4ee

findinpath authored and raunaqmorarka committed
Detail the constraints in applying OPTIMIZE
Document the fact that partitions having only one data file without any delete files are disregarded for `OPTIMIZE` scenarios in case of the Iceberg connector.
1 parent c2eaffd commit 7f9f4ee

1 file changed, +41 -1 lines changed

docs/src/main/sphinx/connector/iceberg.md

Lines changed: 41 additions & 1 deletion
@@ -802,7 +802,47 @@ created before the partitioning change.
 The connector supports the following commands for use with {ref}`ALTER TABLE
 EXECUTE <alter-table-execute>`.
 
-```{include} optimize.fragment
+##### optimize
+
+The `optimize` command is used for rewriting the content of the specified
+table so that it is merged into fewer but larger files. If the table is
+partitioned, the data compaction acts separately on each partition selected for
+optimization. This operation improves read performance.
+
+All files with a size below the optional `file_size_threshold` parameter
+(default value for the threshold is `100MB`) are merged in case any of the
+following conditions are met per partition:
+
+- more than one data file to merge is present
+- at least one data file, with delete files attached, is present
+
+```sql
+ALTER TABLE test_table EXECUTE optimize
+```
+
+The following statement merges files in a table that are
+under 128 megabytes in size:
+
+```sql
+ALTER TABLE test_table EXECUTE optimize(file_size_threshold => '128MB')
+```
+
+You can use a `WHERE` clause with the columns used to partition the table
+to filter which partitions are optimized:
+
+```sql
+ALTER TABLE test_partitioned_table EXECUTE optimize
+WHERE partition_key = 1
+```
+
+You can use a more complex `WHERE` clause to narrow down the scope of the
+`optimize` procedure. The following example casts the timestamp values to
+dates, and uses a comparison to only optimize partitions with data from the year
+2022 or newer:
+
+```
+ALTER TABLE test_table EXECUTE optimize
+WHERE CAST(timestamp_tz AS DATE) > DATE '2021-12-31'
 ```
 
Use a `WHERE` clause with [metadata columns](iceberg-metadata-columns) to filter
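
A rough way to see how many files the rule documented in this change might pick up is to inspect the Iceberg `$files` metadata table before running `optimize`. The query below is only a sketch, not part of the commit. It assumes that `$files` in your Trino version exposes the `content` and `file_size_in_bytes` columns, it reuses the `test_table` name from the examples in the diff, and it hard-codes the default `100MB` threshold as bytes (assuming binary megabytes). It counts files across the whole table, whereas the documented conditions are evaluated per partition, so treat the result as an approximation.

```sql
-- Sketch: count files below the default 100MB threshold, grouped by content type.
-- Assumes "test_table$files" exposes content (0 = data file, 1 or 2 = delete file)
-- and file_size_in_bytes; adjust the names if your Trino version differs.
SELECT
    content,
    count(*) AS small_file_count,
    sum(file_size_in_bytes) AS small_file_bytes
FROM "test_table$files"
WHERE file_size_in_bytes < 100 * 1024 * 1024
GROUP BY content
ORDER BY content
```

If the result shows a single small data file and no delete files for an unpartitioned table, the documented conditions suggest that `optimize` would leave that file untouched.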
