Fix entity name overlap handling and tests (#4168) #4190

Open
Subham-KRLX wants to merge 3 commits into apache:main from Subham-KRLX:fix/entity-name-overlap-tests-4168

@Subham-KRLX
Contributor

Fixes #4168 by adding shared integration tests for namespace/table/view name collisions and enforcing consistent AlreadyExistsException behavior across backends. Verified with RestCatalogFileIT and NoSqlCatalogIT.

@github-project-automation (bot) moved this to PRs In Progress in Basic Kanban Board on Apr 14, 2026
@Subham-KRLX force-pushed the fix/entity-name-overlap-tests-4168 branch from 524576c to d8f5928 on April 14, 2026 02:19
@Subham-KRLX force-pushed the fix/entity-name-overlap-tests-4168 branch from d8f5928 to 985169f on April 14, 2026 02:47
Contributor

@dimas-b left a comment

Thanks for working on this, @Subham-KRLX!

Stream.of(PolarisEntitySubType.ICEBERG_TABLE, PolarisEntitySubType.ICEBERG_VIEW)
    .filter(
        subType ->
            listTableLike(subType, parentNamespace, PageToken.readEverything())

Contributor

Can we avoid "list" operations in this case?

Would it be possible to "resolve" the name to be created for any potential previous type and then do the "already exists" check?

Contributor Author

I replaced the list-based check with a direct name lookup for tables/views under the parent namespace. We still return the same AlreadyExistsException when a collision is found.
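
To illustrate the difference between the two checks discussed in this thread, here is a minimal, self-contained sketch. It is not the Polaris code: NameIndex, Kind, existsByListing, and findByName are hypothetical names, the nested map stands in for a metastore that can read an entity by parent and name, and IllegalStateException stands in for AlreadyExistsException so the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a metastore; not a Polaris API.
final class NameIndex {
  enum Kind { TABLE, VIEW }

  // parent namespace -> (entity name -> kind stored under that name)
  private final Map<String, Map<String, Kind>> children = new HashMap<>();

  // List-based check: enumerates every sibling under the parent.
  boolean existsByListing(String parentNamespace, String name) {
    return children.getOrDefault(parentNamespace, Map.of()).keySet().stream()
        .anyMatch(name::equals);
  }

  // Direct lookup: a single keyed read, independent of the number of siblings.
  Optional<Kind> findByName(String parentNamespace, String name) {
    return Optional.ofNullable(
        children.getOrDefault(parentNamespace, Map.of()).get(name));
  }

  // Rejects the create if any table-like entity already uses the name.
  void create(String parentNamespace, String name, Kind kind) {
    findByName(parentNamespace, name)
        .ifPresent(
            existing -> {
              throw new IllegalStateException(
                  existing + " already exists: " + parentNamespace + "." + name);
            });
    children.computeIfAbsent(parentNamespace, p -> new HashMap<>()).put(name, kind);
  }
}
```

Both checks give the same answer; the direct lookup simply avoids paging through every entity in the namespace to find one name.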

}

private boolean namespaceWithSameNameExists(TableIdentifier identifier) {
  return listNamespaces(identifier.namespace()).stream()

Contributor

Actually, can we avoid "list" operations in this case too?

Contributor Author

Replaced the list-based check with a direct name lookup for namespaces using readEntityByName, which avoids the "list" operation here too. Verified the fix with RestCatalogFileIT.

List<PolarisEntity> catalogPath = resolvedEntities.getRawFullPath();
EntityResult lookupResult =
    getMetaStoreManager()
        .readEntityByName(

Contributor

Ideally we should do it when we resolve resolvedEntityView.

That is, we should add a resolution path with the new entity name for all possible entity types when the PolarisResolutionManifest is constructed and do the lookup for all of those potential name matches together (via the Resolver). WDYT?

Is that feasible?

Contributor Author

That approach seems very efficient for batching lookups within the Resolver. I am planning to refactor the implementation to preload all potentially conflicting entity types (NAMESPACE and TABLE_LIKE) as optional paths in the PolarisResolutionManifest during the CatalogHandler authorization phase. Would this strategy be consistent with what you have in mind? If so, IcebergCatalog can perform collision checks by calling resolvedEntityView.getResolvedPath() directly, which eliminates redundant metadata store lookups and ensures all relevant names are resolved together by the Resolver, as you suggested.
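
A rough sketch of the batching idea, under stated assumptions: every name here (OptionalPathResolution, Store, ResolvedView, lookupAll, resolve) is hypothetical and only models the shape of the proposal, not the actual PolarisResolutionManifest or Resolver API. The new name is probed for every candidate entity type in one store round trip, and the collision check later reads only the already-resolved view.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "resolve all candidate types up front"; not Polaris code.
final class OptionalPathResolution {
  enum EntityType { NAMESPACE, TABLE_LIKE }

  // Stand-in store that answers several (type, name) probes in one round trip.
  static final class Store {
    private final Set<String> entities = new HashSet<>();
    int roundTrips = 0;

    void add(EntityType type, String name) {
      entities.add(type + ":" + name);
    }

    Map<EntityType, Boolean> lookupAll(List<EntityType> types, String name) {
      roundTrips++; // one batched call, regardless of how many types are probed
      Map<EntityType, Boolean> found = new EnumMap<>(EntityType.class);
      for (EntityType type : types) {
        found.put(type, entities.contains(type + ":" + name));
      }
      return found;
    }
  }

  // Built once during resolution, then queried without touching the store again.
  static final class ResolvedView {
    private final Map<EntityType, Boolean> resolved;

    ResolvedView(Map<EntityType, Boolean> resolved) {
      this.resolved = resolved;
    }

    boolean isResolved(EntityType type) {
      return resolved.getOrDefault(type, false);
    }

    boolean anyCollision() {
      return resolved.containsValue(true);
    }
  }

  // Registers the new name as an optional path for every candidate type.
  static ResolvedView resolve(Store store, String newName) {
    return new ResolvedView(store.lookupAll(List.of(EntityType.values()), newName));
  }
}
```

In this model the catalog-side check reduces to `view.anyCollision()`, with no additional store access after resolution.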

@flyrain
Contributor

flyrain commented Apr 15, 2026

Thanks @Subham-KRLX for working on it. This is a behavior change. It's worth sharing on the dev mailing list for more visibility. Would you mind sharing it there?

Current behavior in the main branch:

  • Table/View collision prevention: already enforced on main

What's NEW in this PR (behavior change):

  • Namespace/Table collision prevention: NEW enforcement
  • Namespace/View collision prevention: NEW enforcement
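
The behavior change summarized above can be written down as a small decision function. This is only a sketch: CollisionRules, Kind, and the enforceNamespaceOverlap parameter are hypothetical, and the flag merely models "main" (false) versus "this PR" (true); the PR as described does not introduce such a switch. It also assumes that two entities of the same kind with the same name always conflict.

```java
// Hypothetical model of the collision matrix; not Polaris code.
final class CollisionRules {
  enum Kind { NAMESPACE, TABLE, VIEW }

  static boolean collides(Kind existing, Kind incoming, boolean enforceNamespaceOverlap) {
    if (existing == incoming) {
      return true; // assumption: same kind, same name always conflicts
    }
    boolean namespaceInvolved = existing == Kind.NAMESPACE || incoming == Kind.NAMESPACE;
    if (namespaceInvolved) {
      // Namespace vs table/view: rejected only with the new enforcement.
      return enforceNamespaceOverlap;
    }
    // Table vs view: already rejected on main.
    return true;
  }
}
```

For example, `collides(Kind.NAMESPACE, Kind.TABLE, false)` is false while `collides(Kind.NAMESPACE, Kind.TABLE, true)` is true, which is the part that could affect existing deployments on upgrade.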

@jbonofre
Member

I agree with @flyrain here. I think a discussion on the dev@ mailing list is appropriate and welcome.

@Subham-KRLX
Contributor Author

> Thanks @Subham-KRLX for working on it. This is a behavior change. It's worth sharing on the dev mailing list for more visibility. Would you mind sharing it there?
>
> Current behavior in the main branch:
>
>   • Table/View collision prevention: already enforced on main
>
> What's NEW in this PR (behavior change):
>
>   • Namespace/Table collision prevention: NEW enforcement
>   • Namespace/View collision prevention: NEW enforcement

@flyrain @dimas-b I sent the mailing list proposal regarding "Introduce DataSourceResolver for multi-datasource support in JDBC" but have not received a response yet. Should we resume the review here to maintain momentum? Please let me know if there is another communication channel or community meeting where I should present this instead.

@dimas-b
Contributor

dimas-b commented Apr 20, 2026

@Subham-KRLX: I believe this change is valuable, but it is not related to your DataSourceResolver PR.

My understanding is that other reviewers prefer to have a separate dev ML discussion about this PR. I see merit in starting such a discussion because strictly enforcing name uniqueness, while being reasonable from my personal POV, can indeed affect the behaviour of previous deployments (on upgrade).

Would you mind starting this new discussion on dev?

@Subham-KRLX
Contributor Author

> @Subham-KRLX: I believe this change is valuable, but it is not related to your DataSourceResolver PR.
>
> My understanding is that other reviewers prefer to have a separate dev ML discussion about this PR. I see merit in starting such a discussion because strictly enforcing name uniqueness, while being reasonable from my personal POV, can indeed affect the behaviour of previous deployments (on upgrade).
>
> Would you mind starting this new discussion on dev?

@dimas-b would it be okay to start the discussion in the #dev channel on Slack? It might be easier for a quick back-and-forth on the technical approach and for maintaining momentum. Once we align there, I can post a summary of the outcome to the mailing list for the official record. Would that work for you?

@dimas-b
Contributor

dimas-b commented Apr 23, 2026

@Subham-KRLX: Slack is not "visible" from the ASF process perspective 🤷 Please start an email thread. If a particular topic becomes convoluted, we can certainly use Slack to resolve it faster, but ultimately all design decisions will have to be carried over to the dev email thread.

Alternatively, I can send the initial email if you prefer - please let me know.

Development

Successfully merging this pull request may close these issues.

Add entity name overlap tests

4 participants