
Add table location validation and overlap checking for GenericTable create #4237

Closed
gh-yzou wants to merge 10 commits into main from yzou-generic-table-location-validation

Conversation

@gh-yzou

@gh-yzou gh-yzou commented Apr 18, 2026

Contributor

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@dimas-b

dimas-b commented May 7, 2026

Contributor

@gh-yzou: WDYT about rebasing this PR and opening it for full review?

@gh-yzou

gh-yzou commented May 7, 2026

Contributor Author

@dimas-b Sure, there are a couple more things I need to fix for this PR, but I will open it for review once that is done.

@gh-yzou gh-yzou force-pushed the yzou-generic-table-location-validation branch from bebf5c0 to 58095a1 Compare May 23, 2026 00:16
@gh-yzou gh-yzou force-pushed the yzou-generic-table-location-validation branch from f5d3c59 to 88958e7 Compare May 26, 2026 21:51
@gh-yzou gh-yzou changed the title from "Add table location validation for GenericTable" to "Add table location validation and overlap checking for GenericTable create" May 28, 2026
@gh-yzou gh-yzou marked this pull request as ready for review May 28, 2026 01:33
@gh-yzou gh-yzou requested review from dimas-b and flyrain May 28, 2026 01:33
@gh-yzou gh-yzou marked this pull request as draft May 28, 2026 01:34
@gh-yzou gh-yzou marked this pull request as ready for review May 28, 2026 02:12
* base-location property of each. The target entity's base location may not be a prefix or a
* suffix of any sibling entity's base location.
*/
public static <T extends PolarisEntity & LocationBasedEntity> void validateNoLocationOverlap(

Contributor Author

Those functions are copied verbatim from IcebergCatalog.java.

Contributor

This is a 7-arg static method. Can we remove the realmConfig parameter, since polarisCallContext already has it?
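
A minimal sketch of the suggested signature cleanup, using stand-in types. `RealmConfigSketch`, `CallContextSketch`, and both `overlapAllowed` helpers are hypothetical and are not the Polaris classes. The sketch assumes, as the comment states, that the call context already exposes the realm configuration, which makes the separate parameter redundant:

```java
// Stand-in types for illustration only; the real Polaris classes differ.
final class RealmConfigSketch {
  private final boolean allowTableLocationOverlap;

  RealmConfigSketch(boolean allowTableLocationOverlap) {
    this.allowTableLocationOverlap = allowTableLocationOverlap;
  }

  boolean allowTableLocationOverlap() {
    return allowTableLocationOverlap;
  }
}

final class CallContextSketch {
  private final RealmConfigSketch realmConfig;

  CallContextSketch(RealmConfigSketch realmConfig) {
    this.realmConfig = realmConfig;
  }

  // Assumption: the context can hand out the realm configuration it was built with.
  RealmConfigSketch getRealmConfig() {
    return realmConfig;
  }
}

final class ParamReductionSketch {
  // Before: the caller passes the config next to the context that already carries it.
  static boolean overlapAllowed(CallContextSketch context, RealmConfigSketch realmConfig) {
    return realmConfig.allowTableLocationOverlap();
  }

  // After: one parameter fewer; the config is read from the context.
  static boolean overlapAllowed(CallContextSketch context) {
    return context.getRealmConfig().allowTableLocationOverlap();
  }
}
```

Both overloads return the same answer whenever the passed config is the one held by the context, so dropping the parameter also removes the possibility of the two disagreeing.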

@dimas-b dimas-b left a comment

Contributor

Thanks for the update, @gh-yzou !

I think the concern with the "optimized" location check is pretty serious 🤔

public void testCreateTableWithInvalidLocationFails() {
  String deltatb = getTableNameWithRandomSuffix();
  String invalidLocation =
      new File(System.getProperty("java.io.tmpdir"), "invalid_" + deltatb).getAbsolutePath();

Contributor

System.getProperty("java.io.tmpdir") looks a bit strange... why not use the @TempDir annotation supported by JUnit?

List<Object[]> joinResult =
sql(
"SELECT icebergtb.col1 as id, icebergtb.col2 as str_col, deltatb.col2 as int_col from icebergtb inner join deltatb on icebergtb.col1 = deltatb.col1 order by id");
assertThat(joinResult.get(0)).isEqualTo(new Object[] {1, "a", 3});

Contributor

Why does it matter for a "location validation" PR? 🤔

genericTableCatalog.createGenericTable(
    TableIdentifier.of("ns", "t2"),
    "format",
    "s3://my-bucket/path/to/data/ns/t1/sub",

Contributor

What about the reverse order (first create a Generic table, then an Iceberg table with an overlapping location)?
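
The reverse order matters because an overlap check should be symmetric: a child location created after its parent and a parent location created after its child are both conflicts. A minimal sketch of such a check follows. `LocationOverlapSketch` and its methods are hypothetical helpers and not the Polaris implementation, and the sketch assumes plain string-prefix comparison on slash-terminated paths:

```java
import java.util.List;

// Hypothetical helper, not the Polaris implementation: illustrates a symmetric
// prefix-based overlap check between storage locations.
final class LocationOverlapSketch {

  // Append a trailing slash so ".../ns/t1" is not treated as a prefix of ".../ns/t10".
  static String withTrailingSlash(String location) {
    return location.endsWith("/") ? location : location + "/";
  }

  // Two locations overlap when either one is a path prefix of the other.
  static boolean overlaps(String a, String b) {
    String left = withTrailingSlash(a);
    String right = withTrailingSlash(b);
    return left.startsWith(right) || right.startsWith(left);
  }

  // Returns the first sibling that overlaps the candidate, or null if none does.
  static String firstConflict(String candidate, List<String> siblings) {
    for (String sibling : siblings) {
      if (overlaps(candidate, sibling)) {
        return sibling;
      }
    }
    return null;
  }
}
```

With this shape, `overlaps(parent, child)` and `overlaps(child, parent)` give the same result, so a test for each creation order exercises the same rule from both sides.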

.setParentId(lastParent.getId())
.setBaseLocation(baseLocation)
.build();
CatalogUtils.validateNoLocationOverlap(

Contributor

Feedback from artificial helpers:

P1 Generic-table overlap checks are bypassed with OPTIMIZED_SIBLING_CHECK=true. The new generic path calls CatalogUtils.validateNoLocationOverlap (runtime/service/src/main/java/org/apache/polaris/service/catalog/generic/PolarisGenericTableCatalog.java:128), and CatalogUtils returns immediately when the optimized check reports no conflict (runtime/service/src/main/java/org/apache/polaris/service/catalog/common/CatalogUtils.java:168). But generic table locations are stored in internal properties (polaris-core/src/main/java/org/apache/polaris/core/entity/table/GenericTableEntity.java:70), while optimized location indexing/read paths only include Iceberg tables/views and namespaces (persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/models/ModelEntity.java:389, polaris-core/src/main/java/org/apache/polaris/core/persistence/transactional/TreeMapTransactionalPersistenceImpl.java:663). With optimized checking enabled, an existing generic table is invisible, so overlapping generic/generic or Iceberg/generic locations can be accepted despite ALLOW_TABLE_LOCATION_OVERLAP=false.

Contributor

Would it be possible to run the same tests with and without the "optimized" location check flag?
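
The bypass described in this thread, and why running one scenario under both flag settings would expose it, can be sketched with a simplified model. Everything here is hypothetical (`OptimizedCheckSketch`, `Entity`, `Kind`, `createAccepted`); it only assumes, per the feedback, that the optimized path consults an index that does not contain generic tables:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; names and structure are hypothetical, not Polaris code.
final class OptimizedCheckSketch {
  enum Kind { ICEBERG_TABLE, GENERIC_TABLE }

  static final class Entity {
    final Kind kind;
    final String location;

    Entity(Kind kind, String location) {
      this.kind = kind;
      this.location = location;
    }
  }

  // Symmetric prefix check on slash-terminated locations.
  static boolean overlaps(String a, String b) {
    String left = a.endsWith("/") ? a : a + "/";
    String right = b.endsWith("/") ? b : b + "/";
    return left.startsWith(right) || right.startsWith(left);
  }

  // Models a location index that only covers Iceberg tables, as the feedback describes.
  static List<Entity> visibleToOptimizedCheck(List<Entity> existing) {
    List<Entity> indexed = new ArrayList<>();
    for (Entity e : existing) {
      if (e.kind == Kind.ICEBERG_TABLE) {
        indexed.add(e);
      }
    }
    return indexed;
  }

  // Returns true when the create request would be accepted.
  static boolean createAccepted(List<Entity> existing, String newLocation, boolean optimized) {
    List<Entity> candidates = optimized ? visibleToOptimizedCheck(existing) : existing;
    for (Entity e : candidates) {
      if (overlaps(e.location, newLocation)) {
        return false; // a visible sibling conflicts, reject
      }
    }
    return true; // no visible conflict, accept
  }

  public static void main(String[] args) {
    List<Entity> existing =
        List.of(new Entity(Kind.GENERIC_TABLE, "s3://my-bucket/path/to/data/ns/t1"));
    // Same scenario, both flag values: the results should match but do not.
    for (boolean optimized : new boolean[] {false, true}) {
      boolean accepted =
          createAccepted(existing, "s3://my-bucket/path/to/data/ns/t1/sub", optimized);
      System.out.println("optimized=" + optimized + " accepted=" + accepted);
    }
  }
}
```

In this model the overlapping create is rejected with the flag off and accepted with it on, which is the kind of divergence a test run under both settings would catch.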

// Parent of existing location
// Generic table at existing iceberg table location
assertThat(
createGenericTable(

Contributor

Feedback from artificial helpers:

P3 The new generic-table overlap test helper builds an invalid API request: it sets name and base-location but omits required format (runtime/service/src/test/java/org/apache/polaris/service/catalog/iceberg/IcebergOverlappingTableTest.java:94). The OpenAPI schema requires both name and format (spec/polaris-catalog-apis/generic-tables-api.yaml:210), so these tests are not exercising a valid generic-table create request and could fail for the wrong reason if request validation is tightened.

@flyrain flyrain left a comment

Contributor

LGTM overall. Thanks for adding it, @gh-yzou !

realmConfig,
getMetaStoreManager(),
getCurrentPolarisContext(),
new ResolutionManifestFactoryImpl(

Contributor

new ResolutionManifestFactoryImpl(diagnostics, callContext.getRealmContext(), resolverFactory) is rebuilt inline at all 3 validateNoLocationOverlap call sites (here + 737 + 1134) from fields that already exist. Hold the factory as a field (like PolarisGenericTableCatalog does) or extract a small helper. Low-risk cleanup, not blocking.
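
One way to picture the "hold the factory as a field" option, using stand-in types. `ManifestFactorySketch` and `CatalogSketch` are hypothetical, the constructor arguments are reduced to strings, and the static counter exists only to make the number of constructions observable:

```java
// Stand-in types; constructor arguments are reduced to strings for illustration.
final class ManifestFactorySketch {
  // Counts constructions so the effect of reusing one instance is observable.
  static int instancesCreated = 0;

  ManifestFactorySketch(String diagnostics, String realmContext, String resolverFactory) {
    instancesCreated++;
  }
}

final class CatalogSketch {
  // Built once from values the catalog already holds, then reused by every call site.
  private final ManifestFactorySketch manifestFactory;

  CatalogSketch(String diagnostics, String realmContext, String resolverFactory) {
    this.manifestFactory = new ManifestFactorySketch(diagnostics, realmContext, resolverFactory);
  }

  // Each validation call site would read this field instead of constructing a new factory.
  ManifestFactorySketch manifestFactory() {
    return manifestFactory;
  }
}
```

Three call sites reading `manifestFactory()` share one instance, whereas the inline form constructs a new factory at each site.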

"Failed to fetch resolved parent for TableIdentifier '%s'", tableIdentifier));
}

if (baseLocation != null && !baseLocation.isEmpty()) {

Contributor

Flagging that this is a behavior change: a generic table created with an explicit overlapping baseLocation that used to succeed now throws ForbiddenException. Should we call it out in CHANGELOG?

@snazy

snazy commented Jun 22, 2026

Member

Heads up: I'm going to delete the branch soon, see dev-mailing-list discussion

@snazy snazy closed this Jun 24, 2026
@snazy snazy deleted the yzou-generic-table-location-validation branch June 24, 2026 08:31
@github-project-automation github-project-automation Bot moved this from PRs In Progress to Done in Basic Kanban Board Jun 24, 2026
4 participants