Skip to content

Conversation

@difin
Copy link
Contributor

@difin difin commented Dec 1, 2025

What changes were proposed in this pull request?

Enabled Iceberg post-commit cleanup for metadata.json files.

When an Iceberg table has a table property write.metadata.delete-after-commit.enabled with the value of true,
the write.metadata.previous-versions-max table property (or the default value 100) controls how many past versions of metadata.json files should be kept.

This PR sets the following default values in Hive:
write.metadata.delete-after-commit.enabled=true
write.metadata.previous-versions-max=100

Why are the changes needed?

To support automatic maintenance for the metadata.json files, to prevent their indefinite growth.

Does this PR introduce any user-facing change?

No

How was this patch tested?

new hive-iceberg unit test; existing pre-commit tests.

@difin difin force-pushed the metadata-files-cleanup branch from 3f15df6 to 7b4f9fe Compare December 3, 2025 00:39
@difin difin force-pushed the metadata-files-cleanup branch from 7b4f9fe to 9fce561 Compare December 3, 2025 17:37
@difin difin force-pushed the metadata-files-cleanup branch from 9fce561 to 3799c55 Compare December 3, 2025 17:41
@difin difin marked this pull request as draft December 3, 2025 18:44
@difin difin changed the title HIVE-28723: Iceberg: Support metadata files clean-up [WIP] HIVE-28723: Iceberg: Support metadata files clean-up Dec 3, 2025
@difin difin force-pushed the metadata-files-cleanup branch from 3799c55 to a26e314 Compare December 3, 2025 21:10
@difin difin marked this pull request as ready for review December 3, 2025 21:11
@difin difin changed the title [WIP] HIVE-28723: Iceberg: Support metadata files clean-up HIVE-28723: Iceberg: Support metadata files clean-up Dec 3, 2025
@difin difin force-pushed the metadata-files-cleanup branch from a26e314 to a46eb89 Compare December 4, 2025 03:55
@difin difin force-pushed the metadata-files-cleanup branch from a46eb89 to b7ffa55 Compare December 4, 2025 03:57
</property>

<property>
<name>iceberg.catalog-default.write.metadata.delete-after-commit.enabled</name>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's complicated and not very friendly, can we drop the catalog-default ?

Copy link
Contributor Author

@difin difin Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we drop catalog-default. it will cause ambiguity in CatalogUtils.getCatalogProperties because:

  • Default catalog prefix is iceberg.catalog-default.
  • Named catalog prefix is iceberg.catalog.<NAME>.
  public static Map<String, String> getCatalogProperties(Configuration conf, String catalogName) {
    Map<String, String> catalogProperties = Maps.newHashMap();
    String keyPrefix = CATALOG_CONFIG_PREFIX + catalogName;
    conf.forEach(config -> {
      if (config.getKey().startsWith(CatalogUtils.CATALOG_DEFAULT_CONFIG_PREFIX)) {
        catalogProperties.putIfAbsent(
            config.getKey().substring(CatalogUtils.CATALOG_DEFAULT_CONFIG_PREFIX.length()),
            config.getValue());
      } else if (config.getKey().startsWith(keyPrefix)) {
        catalogProperties.put(
            config.getKey().substring(keyPrefix.length() + 1),
            config.getValue());
      }
    });

Copy link
Member

@deniskuzZ deniskuzZ Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why

public static Map<String, String> getCatalogProperties(Configuration conf, String catalogName) {
    Map<String, String> catalogProperties = Maps.newHashMap();
    String namedPrefix = CATALOG_CONFIG_PREFIX + catalogName + ".";

    conf.forEach(config -> {
        String key = config.getKey();

        if (key.startsWith("iceberg.")) {
            if (key.startsWith(namedPrefix)) {
                // Named catalog overrides default
                catalogProperties.put(
                        key.substring(namedPrefix.length()),
                        config.getValue());
            } else {
                // Default config for all catalogs
                catalogProperties.putIfAbsent(
                        key.substring("iceberg.".length()),
                        config.getValue());
            }
        }
    });

    return catalogProperties;
}

}

if (properties.get(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED).equals("false")) {
properties.remove(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to remove anything?

Copy link
Contributor Author

@difin difin Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these properties are always present, we need to update a lot of unit tests and q-tests whose output will be different. And if it is false, the effect is identical to the properties being unset on a table. That's why I thought to remove these properties when false. Do you think it's better to keep these settings when false and update tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think it's better to update tests than add tricks in code

OAuth2Properties.CREDENTIAL,
OAuth2Properties.OAUTH2_SERVER_URI,
AuthProperties.AUTH_TYPE,
CatalogProperties.URI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 :)

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants