HIVE-28930: Implement a metastore service that expires iceberg table snapshots periodically #5786

Merged 8 commits on Jun 8, 2025

@abstractdog abstractdog commented Apr 25, 2025

What changes were proposed in this pull request?

This patch introduces a metastore task, implemented as a MetastoreTaskThread, that periodically expires snapshots of Iceberg tables. The tables are selected by configuration: catalog name, database pattern, and table pattern. The configuration was modeled on the partition management task.
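
To illustrate the general shape of such a periodic task, here is a minimal sketch using only the JDK. It is not the code from this patch: `PeriodicSnapshotExpirySketch`, `tableLister`, `expirer`, and `schedule` are hypothetical names, and the actual service plugs into the metastore's own housekeeping thread mechanism rather than creating its own executor.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a periodic housekeeping task; not the Hive class. */
final class PeriodicSnapshotExpirySketch implements Runnable {
  // Assumption: yields the names of the tables selected by the configuration.
  private final Supplier<List<String>> tableLister;
  // Assumption: expires the snapshots of a single table.
  private final Consumer<String> expirer;

  PeriodicSnapshotExpirySketch(Supplier<List<String>> tableLister, Consumer<String> expirer) {
    this.tableLister = tableLister;
    this.expirer = expirer;
  }

  /** One housekeeping pass over all selected tables. */
  @Override
  public void run() {
    for (String table : tableLister.get()) {
      try {
        expirer.accept(table);
      } catch (RuntimeException e) {
        // A failure on one table should not abort the rest of the pass.
        System.err.println("Snapshot expiry failed for " + table + ": " + e.getMessage());
      }
    }
  }

  /** Runs the task immediately and then repeatedly at the given period. */
  static ScheduledExecutorService schedule(Runnable task, long period, TimeUnit unit) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(task, 0, period, unit);
    return executor;
  }
}
```

Catching the exception per table is a design choice of this sketch: with `scheduleAtFixedRate`, an exception escaping `run()` would suppress all later executions.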

Patch contents:

  1. IcebergHouseKeeperService + TestIcebergHouseKeeperService unit test
  2. Added the new task class (ICEBERG_TABLE_SNAPSHOT_EXPIRY_SERVICE_CLASS) to the default housekeeping threads
  3. MiniHS2 changes: withHouseKeepingThreads (for manual testing)
  4. Changed a call to use keepJdbcUri=true; otherwise, in remote metastore mode, two different Derby databases are used, which leads to hard-to-diagnose problems
  5. Generalized TableFetcher, which is basically a table filter builder that originally lived in PartitionManagementTask and is reused here in full + TestTableFetcher unit test
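
The idea of selecting tables by database pattern and table pattern can be sketched as follows. This is a standalone illustration, not the TableFetcher from the patch: the class `TableFilterSketch` and its methods are hypothetical, and the pattern syntax (`*` as a wildcard, `|` between alternatives) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical table filter; the pattern syntax here is an assumption. */
final class TableFilterSketch {
  private final Pattern dbPattern;
  private final Pattern tablePattern;

  TableFilterSketch(String dbGlob, String tableGlob) {
    this.dbPattern = compile(dbGlob);
    this.tablePattern = compile(tableGlob);
  }

  /** Translates a glob such as "sales_*|hr" into a regex, quoting all literal text. */
  static Pattern compile(String glob) {
    String regex = Arrays.stream(glob.split("\\|"))
        .map(alternative -> Arrays.stream(alternative.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*")))
        .map(alternative -> "(?:" + alternative + ")")
        .collect(Collectors.joining("|"));
    return Pattern.compile(regex);
  }

  /** True when both the database name and the table name match their patterns in full. */
  boolean matches(String db, String table) {
    return dbPattern.matcher(db).matches() && tablePattern.matcher(table).matches();
  }

  /** Keeps the "db.table" names that pass the filter; names without a dot are skipped. */
  List<String> select(List<String> qualifiedNames) {
    List<String> selected = new ArrayList<>();
    for (String name : qualifiedNames) {
      int dot = name.indexOf('.');
      if (dot > 0 && matches(name.substring(0, dot), name.substring(dot + 1))) {
        selected.add(name);
      }
    }
    return selected;
  }
}
```

For example, under these assumptions `new TableFilterSketch("sales_*|hr", "*_ice")` accepts `sales_eu.orders_ice` but rejects `finance.orders_ice` and `hr.people`.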

Why are the changes needed?

This service is a convenient helper for maintaining Iceberg tables, which otherwise requires the user to run explicit HiveQL statements.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests added.

Manual testing is also possible: the patch adds a MiniHS2 option and the fixes needed to run metastore tasks in remote mode. Example command:

mvn clean install -Dtest=StartMiniHS2Cluster -DminiHS2.clusterType=llap -DminiHS2.conf="target/testconf/llap/hive-site.xml"  -DminiHS2.run=true -DminiHS2.usePortsFromConf=true -pl itests/hive-unit -Pitests -pl itests/util -DminiHS2.clusterType=LOCALFS_ONLY -DminiHS2.isMetastoreRemote=true -DminiHS2.withHouseKeepingThreads=true

@abstractdog

@deniskuzZ: this is the reusable, general part of the Iceberg table maintenance service (it contains no query history bits). I would appreciate a review once you have time for it.

@deniskuzZ deniskuzZ left a comment

In general LGTM, minor comments

@deniskuzZ deniskuzZ left a comment

+1, pending tests

sonarqubecloud bot commented Jun 8, 2025

@abstractdog abstractdog merged commit 88dc983 into apache:master Jun 8, 2025
6 checks passed
4 participants