Add table location validation and overlap checking for GenericTable create#4237
Add table location validation and overlap checking for GenericTable create#4237gh-yzou wants to merge 10 commits into
Conversation
|
@gh-yzou : WDYT about rebasing this PR and opening for full review? |
|
@dimas-b sure, there is couple more things i need to fix for this PR, but i will open it for review once it is done |
bebf5c0 to
58095a1
Compare
f5d3c59 to
88958e7
Compare
| * base-location property of each. The target entity's base location may not be a prefix or a | ||
| * suffix of any sibling entity's base location. | ||
| */ | ||
| public static <T extends PolarisEntity & LocationBasedEntity> void validateNoLocationOverlap( |
There was a problem hiding this comment.
those functions are pure copy from IcebergCatalog.java
There was a problem hiding this comment.
7-arg static method, can we remove the parameter realmConfig as polarisCallContext has it already?
| public void testCreateTableWithInvalidLocationFails() { | ||
| String deltatb = getTableNameWithRandomSuffix(); | ||
| String invalidLocation = | ||
| new File(System.getProperty("java.io.tmpdir"), "invalid_" + deltatb).getAbsolutePath(); |
There was a problem hiding this comment.
System.getProperty("java.io.tmpdir") looks a bit strange... why not use the @TempDir annotation supported by JUnit?
| List<Object[]> joinResult = | ||
| sql( | ||
| "SELECT icebergtb.col1 as id, icebergtb.col2 as str_col, deltatb.col2 as int_col from icebergtb inner join deltatb on icebergtb.col1 = deltatb.col1 order by id"); | ||
| assertThat(joinResult.get(0)).isEqualTo(new Object[] {1, "a", 3}); |
There was a problem hiding this comment.
Why does it matter for a "location validation" PR? 🤔
| genericTableCatalog.createGenericTable( | ||
| TableIdentifier.of("ns", "t2"), | ||
| "format", | ||
| "s3://my-bucket/path/to/data/ns/t1/sub", |
There was a problem hiding this comment.
What about the reverse order (first create a Generic table, then an Iceberg table with and overlapping location)?
| .setParentId(lastParent.getId()) | ||
| .setBaseLocation(baseLocation) | ||
| .build(); | ||
| CatalogUtils.validateNoLocationOverlap( |
There was a problem hiding this comment.
Feedback from artificial helpers:
P1 Generic-table overlap checks are bypassed with OPTIMIZED_SIBLING_CHECK=true. The new generic path calls CatalogUtils.validateNoLocationOverlap
(runtime/service/src/main/java/org/apache/polaris/service/catalog/generic/PolarisGenericTableCatalog.java:128), and CatalogUtils returns immediately when
the optimized check reports no conflict (runtime/service/src/main/java/org/apache/polaris/service/catalog/common/CatalogUtils.java:168). But generic
table locations are stored in internal properties (polaris-core/src/main/java/org/apache/polaris/core/entity/table/GenericTableEntity.java:70), while
optimized location indexing/read paths only include Iceberg tables/views and namespaces (persistence/relational-jdbc/src/main/java/org/apache/polaris/
persistence/relational/jdbc/models/ModelEntity.java:389, polaris-core/src/main/java/org/apache/polaris/core/persistence/transactional/
TreeMapTransactionalPersistenceImpl.java:663). With optimized checking enabled, an existing generic table is invisible, so overlapping generic/generic or
Iceberg/generic locations can be accepted despite ALLOW_TABLE_LOCATION_OVERLAP=false.
There was a problem hiding this comment.
Would it be possible to run the same tests with and without the "optimized" location check flag?
| // Parent of existing location | ||
| // Generic table at existing iceberg table location | ||
| assertThat( | ||
| createGenericTable( |
There was a problem hiding this comment.
Feedback from artificial helpers:
P3 The new generic-table overlap test helper builds an invalid API request: it sets name and base-location but omits required format (runtime/service/
src/test/java/org/apache/polaris/service/catalog/iceberg/IcebergOverlappingTableTest.java:94). The OpenAPI schema requires both name and format (spec/
polaris-catalog-apis/generic-tables-api.yaml:210), so these tests are not exercising a valid generic-table create request and could fail for the wrong
reason if request validation is tightened.
| realmConfig, | ||
| getMetaStoreManager(), | ||
| getCurrentPolarisContext(), | ||
| new ResolutionManifestFactoryImpl( |
There was a problem hiding this comment.
new ResolutionManifestFactoryImpl(diagnostics, callContext.getRealmContext(), resolverFactory) is rebuilt inline at all 3 validateNoLocationOverlap call sites (here + 737 + 1134) from fields that already exist. Hold the factory as a field (like PolarisGenericTableCatalog does) or extract a small helper. Low-risk cleanup, not blocking.
| * base-location property of each. The target entity's base location may not be a prefix or a | ||
| * suffix of any sibling entity's base location. | ||
| */ | ||
| public static <T extends PolarisEntity & LocationBasedEntity> void validateNoLocationOverlap( |
There was a problem hiding this comment.
7-arg static method, can we remove the parameter realmConfig as polarisCallContext has it already?
| "Failed to fetch resolved parent for TableIdentifier '%s'", tableIdentifier)); | ||
| } | ||
|
|
||
| if (baseLocation != null && !baseLocation.isEmpty()) { |
There was a problem hiding this comment.
Flagging that this is a behavior change: a generic table created with an explicit overlapping baseLocation that used to succeed now throws ForbiddenException. Should we call it out in CHANGELOG?
|
Heads up: I'm going to delete the branch soon, see dev-mailing-list discussion |
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)