Detail the constraints in applying OPTIMIZE in Iceberg #25844

Merged
42 changes: 41 additions & 1 deletion docs/src/main/sphinx/connector/iceberg.md
@@ -802,7 +802,47 @@ created before the partitioning change.
The connector supports the following commands for use with {ref}`ALTER TABLE
EXECUTE <alter-table-execute>`.

```{include} optimize.fragment
##### optimize

The `optimize` command rewrites the content of the specified
table so that it is merged into fewer but larger files. If the table is
partitioned, the data compaction acts separately on each partition selected for
optimization. This operation improves read performance.

All files with a size below the optional `file_size_threshold` parameter
(default value for the threshold is `100MB`) are merged if any of the
following conditions is met per partition:

- more than one data file to merge is present
- at least one data file, with delete files attached, is present

```sql
ALTER TABLE test_table EXECUTE optimize
```
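
Before running the command, it can help to estimate how many data files fall under the threshold. The following is a minimal sketch, not part of the original change: it assumes the connector's `$files` metadata table with its `content` and `file_size_in_bytes` columns, and it assumes that `100MB` corresponds to 100 * 1024 * 1024 bytes:

```sql
-- Count data files smaller than the default threshold.
-- Assumption: content = 0 marks data files; delete files use other values.
SELECT count(*) AS small_data_files
FROM "test_table$files"
WHERE content = 0
  AND file_size_in_bytes < 100 * 1024 * 1024
```

A result of zero or one does not rule out a rewrite, since a single data file with delete files attached also qualifies.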

The following statement merges files in a table that are
under 128 megabytes in size:

```sql
ALTER TABLE test_table EXECUTE optimize(file_size_threshold => '128MB')
```

You can use a `WHERE` clause with the columns used to partition the table
to filter which partitions are optimized:

```sql
ALTER TABLE test_partitioned_table EXECUTE optimize
WHERE partition_key = 1
```
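
To decide which partitions are worth filtering on, one option is to look at per-partition file counts first. This sketch assumes the connector's `$partitions` metadata table and its `partition`, `file_count`, and `total_size` columns; the table name is the one used in the example above:

```sql
-- List partitions with the most files first, as candidates for optimize.
SELECT partition, file_count, total_size
FROM "test_partitioned_table$partitions"
ORDER BY file_count DESC
```

Partitions with a high `file_count` relative to `total_size` are the likeliest to benefit from compaction.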

You can use a more complex `WHERE` clause to narrow down the scope of the
`optimize` procedure. The following example casts the timestamp values to
dates, and uses a comparison to only optimize partitions with data from the year
2022 or newer:

```sql
ALTER TABLE test_table EXECUTE optimize
WHERE CAST(timestamp_tz AS DATE) > DATE '2021-12-31'
```

Use a `WHERE` clause with [metadata columns](iceberg-metadata-columns) to filter