Commit 1bea51a

Detail the constraints in applying OPTIMIZE
Document that, for the Iceberg connector, `OPTIMIZE` skips partitions that contain only one data file and no delete files.
1 parent 4c507fd commit 1bea51a

1 file changed: +42 -1 lines changed

docs/src/main/sphinx/connector/iceberg.md

Lines changed: 42 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -802,7 +802,48 @@ created before the partitioning change.
802 802
The connector supports the following commands for use with {ref}`ALTER TABLE
803 803
EXECUTE <alter-table-execute>`.
804 804

805-
```{include} optimize.fragment
805+
##### optimize
806+
807+
The `optimize` command rewrites the content of the specified
808+
table, merging it into fewer but larger files. If the table is
809+
partitioned, the data compaction acts separately on each partition selected for
810+
optimization. This operation improves read performance.
811+
812+
All files smaller than the optional `file_size_threshold` parameter
813+
(the default threshold is `100MB`) are merged if at least one of the
814+
following conditions is met for a partition, or for the whole table if it
815+
is unpartitioned:
816+
817+
- more than one data file to merge is present
818+
- at least one data file, with delete files attached, is present
819+
820+
```sql
821+
ALTER TABLE test_table EXECUTE optimize
822+
```
823+
824+
The following statement merges files in a table that are
825+
under 128 megabytes in size:
826+
827+
```sql
828+
ALTER TABLE test_table EXECUTE optimize(file_size_threshold => '128MB')
829+
```
830+
831+
You can use a `WHERE` clause with the columns used to partition the table
832+
to filter which partitions are optimized:
833+
834+
```sql
835+
ALTER TABLE test_partitioned_table EXECUTE optimize
836+
WHERE partition_key = 1
837+
```
838+
839+
You can use a more complex `WHERE` clause to narrow down the scope of the
840+
`optimize` command. The following example casts the timestamp values to
841+
dates, and uses a comparison to only optimize partitions with data from the year
842+
2022 or newer:
843+
844+
```sql
845+
ALTER TABLE test_table EXECUTE optimize
846+
WHERE CAST(timestamp_tz AS DATE) > DATE '2021-12-31'
806 847
```
807 848

808 849
Use a `WHERE` clause with [metadata columns](iceberg-metadata-columns) to filter
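
As a reading aid for the selection rule this commit documents, the per-partition conditions can be sketched in a few lines of Python. This is an illustrative model only, not the connector's implementation: `DataFile`, `files_to_rewrite`, and the field names are hypothetical, `100MB` is assumed to mean 100 MiB, and the delete-file condition is assumed to apply to files below the threshold. An unpartitioned table can be modelled as a single partition key.

```python
from dataclasses import dataclass

MB = 1024 * 1024
# Assumption: the documented default of `100MB` is treated as 100 MiB here.
DEFAULT_THRESHOLD_BYTES = 100 * MB


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file; not a Trino or Iceberg type."""
    partition: str
    size_bytes: int
    has_delete_files: bool = False


def files_to_rewrite(files, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return {partition: [files]} for partitions the documented rule would merge."""
    # Only files below the threshold are candidates for merging.
    candidates = {}
    for f in files:
        if f.size_bytes < threshold_bytes:
            candidates.setdefault(f.partition, []).append(f)

    # A partition qualifies if it has more than one candidate file,
    # or at least one candidate file with delete files attached.
    return {
        partition: group
        for partition, group in candidates.items()
        if len(group) > 1 or any(f.has_delete_files for f in group)
    }


files = [
    DataFile("a", 10 * MB), DataFile("a", 20 * MB),  # two small files
    DataFile("b", 10 * MB),                          # one small file, no deletes
    DataFile("c", 10 * MB, has_delete_files=True),   # one small file with deletes
    DataFile("d", 500 * MB), DataFile("d", 10 * MB), # only one file below threshold
]
selected = files_to_rewrite(files)
print(sorted(selected))  # ['a', 'c']
```

Partitions `b` and `d` are skipped in this model because each has a single file below the threshold and no delete files, which is the case the commit message describes.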
