HIVE-28723: Iceberg: Support metadata files clean-up #6218
base: master
3f15df6 to 7b4f9fe
7b4f9fe to 9fce561
9fce561 to 3799c55
3799c55 to a26e314
a26e314 to a46eb89
a46eb89 to b7ffa55
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/BaseHiveIcebergMetaHook.java (three outdated, resolved review threads)
a2eb1c5 to 43c8d9c
f9dec5c to 589b963
</property>

<property>
  <name>iceberg.catalog-default.write.metadata.delete-after-commit.enabled</name>
I think it's complicated and not very user-friendly. Can we drop the catalog-default?
If we drop catalog-default, it will cause ambiguity in CatalogUtils.getCatalogProperties, because:
- the default catalog prefix is "iceberg.catalog-default."
- the named catalog prefix is "iceberg.catalog.<NAME>."
public static Map<String, String> getCatalogProperties(Configuration conf, String catalogName) {
  Map<String, String> catalogProperties = Maps.newHashMap();
  String keyPrefix = CATALOG_CONFIG_PREFIX + catalogName;
  conf.forEach(config -> {
    if (config.getKey().startsWith(CatalogUtils.CATALOG_DEFAULT_CONFIG_PREFIX)) {
      // iceberg.catalog-default.<key>: applies to every catalog, never overrides a named value
      catalogProperties.putIfAbsent(
          config.getKey().substring(CatalogUtils.CATALOG_DEFAULT_CONFIG_PREFIX.length()),
          config.getValue());
    } else if (config.getKey().startsWith(keyPrefix)) {
      // iceberg.catalog.<NAME>.<key>: the + 1 skips the dot after the catalog name
      catalogProperties.put(
          config.getKey().substring(keyPrefix.length() + 1),
          config.getValue());
    }
  });
  return catalogProperties;
}
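To make the precedence rules concrete, here is a minimal sketch of that two-prefix merge. It is not the Hive code: a plain Map stands in for Hadoop's Configuration, and the class name CatalogPrefixSketch and the catalog names ice01 and other are hypothetical. It also adds a trailing dot to the named prefix (an assumption of this sketch) so that a catalog named ice01 does not also pick up keys for ice01x. Values under "iceberg.catalog-default." reach every catalog through putIfAbsent, while values under "iceberg.catalog.<NAME>." always win through put.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the two-prefix merge; a plain Map stands in for Hadoop's Configuration.
class CatalogPrefixSketch {
  static final String CATALOG_CONFIG_PREFIX = "iceberg.catalog.";
  static final String CATALOG_DEFAULT_CONFIG_PREFIX = "iceberg.catalog-default.";

  static Map<String, String> getCatalogProperties(Map<String, String> conf, String catalogName) {
    Map<String, String> catalogProperties = new HashMap<>();
    // Trailing dot (assumption of this sketch): "ice01" must not also match "ice01x".
    String namedPrefix = CATALOG_CONFIG_PREFIX + catalogName + ".";
    conf.forEach((key, value) -> {
      if (key.startsWith(CATALOG_DEFAULT_CONFIG_PREFIX)) {
        // Default for every catalog: never overwrites a value already set by name.
        catalogProperties.putIfAbsent(key.substring(CATALOG_DEFAULT_CONFIG_PREFIX.length()), value);
      } else if (key.startsWith(namedPrefix)) {
        // Named-catalog value: always wins, whatever the iteration order.
        catalogProperties.put(key.substring(namedPrefix.length()), value);
      }
      // Any other key (including other catalogs' settings) is ignored.
    });
    return catalogProperties;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("iceberg.catalog-default.write.metadata.previous-versions-max", "100");
    conf.put("iceberg.catalog.ice01.write.metadata.previous-versions-max", "10");
    conf.put("iceberg.catalog-default.write.metadata.delete-after-commit.enabled", "true");
    conf.put("iceberg.catalog.other.type", "hadoop");

    Map<String, String> props = getCatalogProperties(conf, "ice01");
    System.out.println(props.get("write.metadata.previous-versions-max"));       // 10
    System.out.println(props.get("write.metadata.delete-after-commit.enabled")); // true
    System.out.println(props.containsKey("type"));                               // false
  }
}
```

Swapping the first two conf.put calls does not change the result: the named value still wins because defaults are only ever inserted with putIfAbsent.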
Why? What about something like this:
public static Map<String, String> getCatalogProperties(Configuration conf, String catalogName) {
  Map<String, String> catalogProperties = Maps.newHashMap();
  String namedPrefix = CATALOG_CONFIG_PREFIX + catalogName + ".";
  conf.forEach(config -> {
    String key = config.getKey();
    if (key.startsWith("iceberg.")) {
      if (key.startsWith(namedPrefix)) {
        // Named catalog overrides default
        catalogProperties.put(
            key.substring(namedPrefix.length()),
            config.getValue());
      } else {
        // Default config for all catalogs
        catalogProperties.putIfAbsent(
            key.substring("iceberg.".length()),
            config.getValue());
      }
    }
  });
  return catalogProperties;
}
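The ambiguity raised earlier in the thread can be checked with a similar sketch of this single-prefix proposal. Again a plain Map replaces Configuration, and SinglePrefixSketch and the catalog names foo and other are hypothetical. Because every "iceberg." key that does not start with the named prefix is treated as a default, the settings of catalog other end up in catalog foo under the key catalog.other.type.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed single "iceberg." default prefix; a plain Map stands in for Configuration.
class SinglePrefixSketch {
  static final String CATALOG_CONFIG_PREFIX = "iceberg.catalog.";

  static Map<String, String> getCatalogProperties(Map<String, String> conf, String catalogName) {
    Map<String, String> catalogProperties = new HashMap<>();
    String namedPrefix = CATALOG_CONFIG_PREFIX + catalogName + ".";
    conf.forEach((key, value) -> {
      if (key.startsWith("iceberg.")) {
        if (key.startsWith(namedPrefix)) {
          // Named catalog overrides default.
          catalogProperties.put(key.substring(namedPrefix.length()), value);
        } else {
          // Every other "iceberg.*" key becomes a default, including keys
          // that actually belong to a different named catalog.
          catalogProperties.putIfAbsent(key.substring("iceberg.".length()), value);
        }
      }
    });
    return catalogProperties;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("iceberg.write.metadata.delete-after-commit.enabled", "true");
    conf.put("iceberg.catalog.foo.type", "hive");
    conf.put("iceberg.catalog.other.type", "hadoop");

    Map<String, String> props = getCatalogProperties(conf, "foo");
    System.out.println(props.get("type"));               // hive
    // The setting of catalog "other" leaked into catalog "foo" as a "default":
    System.out.println(props.get("catalog.other.type")); // hadoop
  }
}
```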
}

if (properties.get(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED).equals("false")) {
  properties.remove(TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED);
Why do we need to remove anything?
If these properties are always present, we need to update many unit tests and q-tests whose output would change. And when the value is false, the effect is identical to the property being unset on the table. That's why I decided to remove these properties when they are false. Do you think it's better to keep these settings when false and update the tests?
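The point that an explicit false behaves like an unset property can be illustrated with a minimal sketch. It assumes the property is resolved with a default of false, as Iceberg does for write.metadata.delete-after-commit.enabled; the class DeleteAfterCommitSketch and its helper are hypothetical stand-ins, not Iceberg's own API. As a side note, writing "false".equals(...) instead of properties.get(...).equals("false") avoids a NullPointerException when the property is absent.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: resolving write.metadata.delete-after-commit.enabled with an assumed default of false.
class DeleteAfterCommitSketch {
  static final String KEY = "write.metadata.delete-after-commit.enabled";
  static final boolean DEFAULT = false; // assumption: mirrors Iceberg's default

  // Hypothetical helper: an absent value falls back to DEFAULT.
  static boolean deleteAfterCommitEnabled(Map<String, String> tableProps) {
    String value = tableProps.get(KEY);
    return value == null ? DEFAULT : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> unset = new HashMap<>();
    Map<String, String> explicitFalse = new HashMap<>();
    explicitFalse.put(KEY, "false");

    // Both resolve to false: an explicit "false" behaves like an unset property.
    System.out.println(deleteAfterCommitEnabled(unset));         // false
    System.out.println(deleteAfterCommitEnabled(explicitFalse)); // false
    // Constant-first comparison returns false instead of throwing when the key is absent.
    System.out.println("false".equals(unset.get(KEY)));          // false
  }
}
```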
Yes, I think it's better to update the tests than to add tricks to the code.
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/CatalogUtils.java (two outdated, resolved review threads)
OAuth2Properties.CREDENTIAL,
OAuth2Properties.OAUTH2_SERVER_URI,
AuthProperties.AUTH_TYPE,
CatalogProperties.URI
👍 :)
What changes were proposed in this pull request?
Enabled Iceberg post-commit cleanup for metadata.json files.
When an Iceberg table has the table property write.metadata.delete-after-commit.enabled with the value true, the write.metadata.previous-versions-max table property (or its default value, 100) controls how many past versions of metadata.json files should be kept.
This PR sets the following default values in Hive:
- write.metadata.delete-after-commit.enabled=true
- write.metadata.previous-versions-max=100
Why are the changes needed?
To support automatic maintenance of the metadata.json files and prevent their indefinite growth.
Does this PR introduce any user-facing change?
No
How was this patch tested?
A new hive-iceberg unit test, plus the existing pre-commit tests.
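The retention behavior described in the PR summary can be modeled with a simplified simulation. This is not Iceberg's implementation: the class MetadataRetentionSketch is hypothetical, file names are shortened to 00000-metadata.json and so on, and it only tracks which metadata.json files remain on storage. With delete-after-commit enabled and previous-versions-max=100, the table keeps the current file plus at most 100 previous ones; with it disabled, every version stays.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model of post-commit metadata.json clean-up (not Iceberg's code).
class MetadataRetentionSketch {
  private final boolean deleteAfterCommit;
  private final int previousVersionsMax;
  // Previous metadata files still on storage, oldest first.
  private final Deque<String> previousFiles = new ArrayDeque<>();
  private final List<String> deletedFiles = new ArrayList<>();
  private String currentFile;
  private int nextVersion = 0;

  MetadataRetentionSketch(boolean deleteAfterCommit, int previousVersionsMax) {
    this.deleteAfterCommit = deleteAfterCommit;
    this.previousVersionsMax = previousVersionsMax;
  }

  // Each commit writes a new metadata file; if enabled, the oldest previous files are deleted.
  void commit() {
    if (currentFile != null) {
      previousFiles.addLast(currentFile);
    }
    currentFile = String.format("%05d-metadata.json", nextVersion++);
    if (deleteAfterCommit) {
      while (previousFiles.size() > previousVersionsMax) {
        deletedFiles.add(previousFiles.removeFirst());
      }
    }
  }

  int filesOnStorage() {
    return previousFiles.size() + (currentFile == null ? 0 : 1);
  }

  List<String> deletedFiles() {
    return deletedFiles;
  }

  public static void main(String[] args) {
    MetadataRetentionSketch enabled = new MetadataRetentionSketch(true, 100);
    MetadataRetentionSketch disabled = new MetadataRetentionSketch(false, 100);
    for (int i = 0; i < 250; i++) {
      enabled.commit();
      disabled.commit();
    }
    System.out.println(enabled.filesOnStorage());      // 101
    System.out.println(disabled.filesOnStorage());     // 250
    System.out.println(enabled.deletedFiles().get(0)); // 00000-metadata.json
  }
}
```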