@suxiaogang223

…ables (apache#59979)

- Issue Number:  apache#58199

This PR implements the `expire_snapshots` procedure for Iceberg tables,
following the Apache Iceberg Spark procedure specification. This
procedure removes old snapshots from Iceberg tables to free up storage
space and improve metadata performance.

- **File:**
`fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/action/IcebergExpireSnapshotsAction.java`
  - Implemented the `executeAction()` method to expire snapshots using Iceberg's `ExpireSnapshots` API
  - Added a `getResultSchema()` method returning a 6-column output matching Spark's schema
  - Added a `parseTimestamp()` helper method to support ISO datetime and milliseconds formats
  - Updated validation to allow `snapshot_ids` as a standalone parameter
  - Fixed `retain_last` behavior: when specified alone, it automatically sets `expireOlderThan` to the current time
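
As a rough illustration of the two input formats that `parseTimestamp()` is described as accepting, here is a minimal sketch using only the JDK. It is not the code from this PR: the class name `TimestampSketch`, the explicit `ZoneId` argument, and the order in which the formats are tried are all assumptions made for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;

// Hypothetical sketch, not the actual IcebergExpireSnapshotsAction code.
public class TimestampSketch {

    /**
     * Converts an "older_than" value to epoch milliseconds.
     * Accepts an ISO local datetime (e.g. 2024-01-01T00:00:00) or a
     * plain millisecond value (e.g. 1704067200000).
     * The zone used to interpret the datetime is an assumption of this sketch.
     */
    static long parseTimestamp(String value, ZoneId zone) {
        String trimmed = value.trim();
        try {
            // First try the ISO local datetime form.
            return LocalDateTime.parse(trimmed).atZone(zone).toInstant().toEpochMilli();
        } catch (DateTimeParseException ignored) {
            // Not an ISO datetime; fall back to milliseconds below.
        }
        try {
            return Long.parseLong(trimmed);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Invalid timestamp (expected ISO datetime or milliseconds): " + value, e);
        }
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        System.out.println(parseTimestamp("2024-01-01T00:00:00", utc));
        System.out.println(parseTimestamp("1704067200000", utc));
    }
}
```

Interpreted in UTC, both calls in `main` resolve to the same instant. Which time zone the real helper applies to ISO datetimes is not stated in this description.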

| Parameter | Description |
|-----------|-------------|
| `older_than` | Timestamp before which snapshots will be removed (ISO datetime or milliseconds) |
| `retain_last` | Number of ancestor snapshots to preserve |
| `snapshot_ids` | Comma-separated list of specific snapshot IDs to expire |
| `max_concurrent_deletes` | Size of thread pool for delete operations |
| `clean_expired_metadata` | When true, cleans up unused partition specs and schemas |
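
The interaction between `older_than`, `retain_last`, and `snapshot_ids` can be sketched roughly as follows. This is an illustration of the behavior described above, not the PR's implementation: the class `ExpireArgsSketch` and both method names are hypothetical, and for brevity the sketch accepts `older_than` in milliseconds only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of argument handling, not the actual Doris code.
public class ExpireArgsSketch {

    /** Splits a comma-separated "snapshot_ids" value into a list of IDs. */
    static List<Long> parseSnapshotIds(String csv) {
        List<Long> ids = new ArrayList<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("Empty snapshot ID in: " + csv);
            }
            // Long.parseLong throws NumberFormatException for non-numeric IDs.
            ids.add(Long.parseLong(trimmed));
        }
        return ids;
    }

    /**
     * Picks the expiry cutoff in epoch milliseconds.
     * - "older_than" present: use it (milliseconds only in this sketch).
     * - only "retain_last" present: fall back to the current time, so every
     *   snapshot beyond the retained ones becomes eligible for expiry.
     * - neither present: no cutoff (null), e.g. when only "snapshot_ids" is given.
     */
    static Long resolveExpireOlderThan(Map<String, String> props, long nowMillis) {
        String olderThan = props.get("older_than");
        if (olderThan != null) {
            return Long.parseLong(olderThan.trim());
        }
        return props.containsKey("retain_last") ? Long.valueOf(nowMillis) : null;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(parseSnapshotIds("123456789,987654321"));
        System.out.println(resolveExpireOlderThan(Map.of("retain_last", "2"), now));
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the fallback easy to test.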

The procedure returns 6 columns:
- `deleted_data_files_count`
- `deleted_position_delete_files_count`
- `deleted_equality_delete_files_count`
- `deleted_manifest_files_count`
- `deleted_manifest_lists_count`
- `deleted_statistics_files_count`

- **File:**
`regression-test/suites/external_table_p0/iceberg/action/test_iceberg_execute_actions.groovy`
  - Added functional tests for `expire_snapshots` with the `retain_last` parameter
  - Added validation tests for the `snapshot_ids` parameter
  - Updated error message expectations

```sql
-- Expire snapshots, keeping only the last 2
ALTER TABLE catalog.db.table EXECUTE expire_snapshots("retain_last" = "2");

-- Expire snapshots older than a specific timestamp
ALTER TABLE catalog.db.table EXECUTE expire_snapshots("older_than" = "2024-01-01T00:00:00");

-- Expire specific snapshots by ID
ALTER TABLE catalog.db.table EXECUTE expire_snapshots("snapshot_ids" = "123456789,987654321");

-- Combine parameters
ALTER TABLE catalog.db.table EXECUTE expire_snapshots("older_than" = "2024-06-01T00:00:00", "retain_last" = "5");
```

(cherry picked from commit 6d9883e)
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223

run buildall

@github-actions bot added the `approved` label ("Indicates a PR has been approved by one committer.") on Feb 10, 2026

@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.
