SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899

XJDKC · 2025-06-16T06:41:56Z

Milestones

This is Part 1 of the [Splitting] Initial SigV4 Auth Support for Catalog Federation. Upcoming parts will build on this system:

SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899
SigV4 Auth Support for Catalog Federation - Part 2: Connection Config Persistence #1900
SigV4 Auth Support for Catalog Federation - Part 3: Service Identity Info Injection
SigV4 Auth Support for Catalog Federation - Part 4: Connection Credential Manager

Introduction

This PR introduces a general-purpose Entity Transformation System to support future features like injecting service identity metadata into catalog entities for federated access (e.g. AWS Glue using SigV4). It provides a clean and extensible hook mechanism for enriching entities during specific lifecycle phases.

This system is intended to serve as the foundation for service identity injection and other dynamic transformations in subsequent parts of the SigV4 support work.

Key Components

EntityTransformer: A functional interface representing a single transformation to be applied to a PolarisBaseEntity.
EntityTransformationEngine: A runtime engine that applies a sequence of EntityTransformers to an entity based on the configured TransformationPoint.
TransformationPoint: Enum representing defined hook points in the entity lifecycle (e.g. CATALOG_PRE_PERSIST).
@AppliesTo annotation: A repeatable qualifier that marks a transformer as applicable to a given TransformationPoint.
EntityTransformationConfiguration: Optional config-driven registration of transformers by identifier (preserving ordering).

Example Configuration:

polaris.entity-mutation.mutators=no-op,catalog-connection-config

Flowchart

snazy · 2025-06-16T13:38:44Z

@XJDKC I'm a bit concerned about updating the representation of entities, because those represent the persisted state. Maybe an opportunity to abstract such concerns from persistence?

XJDKC · 2025-06-16T16:53:10Z

@XJDKC I'm a bit concerned about updating the representation of entities, because those represent the persisted state. Maybe an opportunity to abstract such concerns from persistence?

Hey @snazy, it’s definitely possible with this framework, we could define a new TransformationPoint like CATALOG_PRE_LOAD and dynamically apply a transformer to inject the service identity info when loading the Catalog Entity from the metastore.

I think different vendors may have different views on this. For example, it's quite reasonable to persist the service identity info, since it's very unlikely to change. We already persist service identity in the metastore for storage config, so it might make sense to follow a similar pattern here. Even if we choose not to persist the service identity directly on the Catalog Entity, we still need to persist the mapping between the entity and its assigned identity somewhere (e.g. In a ServiceIdentityRegistry I'm planning to add)

Changing the service identity for a Catalog Entity is risky, e.g., users might configure their IAM role’s trust policy to allowlist the specific service identity initially assigned to their federated Polaris catalog. If Polaris later switches to a different identity when accessing the Glue Catalog, that integration could break unexpectedly.

So perhaps the right approach is to allow the service vendor to decide whether the service identity should be persisted or injected dynamically. WDYT?

eric-maynard · 2025-06-16T18:10:54Z

For better or worse, we currently use entities both for the persisted state of an object and for an in-memory representation of the same. I'm still a fan of separating this out (something like the DAO proposal from months back) but that a big refactor that I'm not sure I'd want to block SigV4.

@snazy perhaps we can view this as a step in the right direction as it encourages us to treat entities as immutable and to use discrete "transformations" to generate new entities instead of trying to modify them in place or to construct lots of new entities in an adhoc way throughout the code base. Eventually, entities should be truly immutable and this may become the canonical way to transform an entity into a new one.

I am, however, a little concerned about how heavyweight this process could become... if I just want to update an entity's version today, I can call a method like setGrantRecordsVersion. After this change, it looks like the expectation is that I have a transformation engine, and then I construct and execute a transformation to accomplish this change?

XJDKC · 2025-06-16T20:37:49Z

For better or worse, we currently use entities both for the persisted state of an object and for an in-memory representation of the same. I'm still a fan of separating this out (something like the DAO proposal from months back) but that a big refactor that I'm not sure I'd want to block SigV4.

@snazy perhaps we can view this as a step in the right direction as it encourages us to treat entities as immutable and to use discrete "transformations" to generate new entities instead of trying to modify them in place or to construct lots of new entities in an adhoc way throughout the code base. Eventually, entities should be truly immutable and this may become the canonical way to transform an entity into a new one.

I am, however, a little concerned about how heavyweight this process could become... if I just want to update an entity's version today, I can call a method like setGrantRecordsVersion. After this change, it looks like the expectation is that I have a transformation engine, and then I construct and execute a transformation to accomplish this change?

Yes, the Entity Transformation System I'm adding won't update the entity in place, it creates a copy and applies the modifications there.

What I want to point out is that, this system isn't meant to replace the current set methods used in the persistence layer. Its purpose is to provide hooks for applying transformations at specific points in the entity lifecycle.

It's admittedly a bit heavyweight, since each transformer creates a new copy of the entity. But without a builder-style abstraction (like what we have for generated model classes), we can't apply the transformation efficiently.

Given that we may eventually refactor this area to introduce a proper DAO layer, and that we're currently only using the transformation system for a single SigV4-related transformer, it might make sense to proceed with the current approach and revisit the refactor later. WDYT?

XJDKC

Link some comments from @dennishuo in this PR as well:

.../src/main/java/org/apache/polaris/core/entity/transformation/EntityTransformationEngine.java

XJDKC · 2025-06-16T22:39:45Z

.../src/main/java/org/apache/polaris/core/entity/transformation/EntityTransformationEngine.java

+   *
+   * @param transformationPoint The point in the entity lifecycle where transformers should be
+   *     applied.
+   * @param entity The original Polaris entity to mutate.


It's too bad we don't have a way to express C++-style const correctness as we move towards immutable PolarisBaseEntity, but maybe we can at least mention here that the intention is that implementations should not mutate the input entity, but instead only create the new transformed entity.

XJDKC · 2025-06-16T22:40:16Z

...ava/org/apache/polaris/core/persistence/transactional/TransactionalMetaStoreManagerImpl.java

@@ -477,6 +479,14 @@ private void revokeGrantRecord(

    ms.persistStorageIntegrationIfNeededInCurrentTxn(callCtx, catalog, integration);

+    // CATALOG_PRE_PERSIST transformation point
+    EntityTransformationEngine entityTransformationEngine = callCtx.getEntityTransformationEngine();


For now, unfortunately we didn't quite manage to refactor all of the shared logic nryerrn AtomicOperationMetaStoreManager and TransactionalMetaStoreManagerImpl into a base class yet; the best we have is the prepareToPersistNewEntity hook defined in BaseMetaStoreManager.

This means either this logic needs to be added in both the AtomicOperationMetaStoreManager and TransactionalMetaStoreManagerImpl, or needs to be abstracted into BaseMetaStoreManager somehow while making sure both of those impls invoke the shared logic. persistNewEntity isn't specific to catalog-creation so you'd need an enum on entityType there if you just put this logic in there.

XJDKC · 2025-06-16T22:40:47Z

quarkus/defaults/src/main/resources/application.properties

@@ -188,6 +188,9 @@ polaris.oidc.principal-roles-mapper.type=default
 # polaris.storage.gcp.token=token
 # polaris.storage.gcp.lifespan=PT1H

+# Polaris Entity Transformation Config
+polaris.entity-transformation.transformers=no-op


On the one hand, it's useful to be able to have this kind of generalized transformer/mutator hook, but if the hook is also strongly coupled to particular features, we don't want to require separate configuration of the info injection here vs the core feature (e.g. sigv4) somewhere else.

Especially if we end up having feature configurations for the core feature, then it's important that the feature itself auto-configures itself to be functional. In a way then the classes that define the sigv4 functionality should be what injects the mutator/transformer.

Rulin: That's a good point, perhaps we can consider making some transformers tied to the feature parameters rather than the config itself if they are part of the core functionality of the feature.

snazy · 2025-06-18T10:49:08Z

Given that we may eventually refactor this area to introduce a proper DAO layer, and that we're currently only using the transformation system for a single SigV4-related transformer, it might make sense to proceed with the current approach and revisit the refactor later. WDYT?

Honestly, I think the approach is too heavyweight and too broad to target a very specific problem. It will be very hard for people to realize (later, by accident) that some entity instance does not reflect the "real" state. That will cause confusion and likely hard to debug issues.

I propose to separate the relevant parts into APIs/implementations that deal with the problem/use-case at hand, and definitely outside of the metastore - storage credential handling doesn't belong there IMO.

XJDKC · 2025-06-25T01:29:29Z

Given that we may eventually refactor this area to introduce a proper DAO layer, and that we're currently only using the transformation system for a single SigV4-related transformer, it might make sense to proceed with the current approach and revisit the refactor later. WDYT?

Honestly, I think the approach is too heavyweight and too broad to target a very specific problem. It will be very hard for people to realize (later, by accident) that some entity instance does not reflect the "real" state. That will cause confusion and likely hard to debug issues.

I propose to separate the relevant parts into APIs/implementations that deal with the problem/use-case at hand, and definitely outside of the metastore - storage credential handling doesn't belong there IMO.

Agree with the concern about embedding this logic in the persistence layer, moving it outside does make sense in general as the final entity is different from the entity we pass to the metastore. That said, I want to point out a few things specific to the storage config use case.

For storage config, the service identity isn't assigned by Polaris itself, it's provided by users via the CreateCatalogRequest. That's why I have this proposal to fix this issue. Also, the storage integration is currently created within the persistence layer:
TransactionalMetaStoreManagerImpl.java#L972-L978.

That's why I was leaning toward assigning the service identity in the persistence layer. For certain vendors, we may need to store and retrieve the mapping between realm and service identity info from the database. Doing this lookup in the persistence layer allows us to keep everything within the same transaction when persisting the catalog entity. Moving this from the persistence layer may end up causing inconsistency.

Adding some hook points is also very useful in general, e.g. it provides the ability to do some patches if there are some bugs and we end up persisting some wrong data in database.

We can also put the service identity injection logic here:
PolarisAdminService.java#L730-L755

WDYT? @eric-maynard @snazy @dennishuo @dimas-b

XJDKC requested review from adutra, ashvina, dennishuo, dimas-b, eric-maynard and jackye1995 as code owners June 16, 2025 06:41

github-project-automation bot added this to Basic Kanban Board Jun 16, 2025

XJDKC requested review from jbonofre, vvcephei, collado-mike, snazy, RussellSpitzer, takidau, MonkeyCanCode, flyrain, ebyhr, ajantha-bhat, HonahX, singhpk234, pingtimeout and gh-yzou as code owners June 16, 2025 06:41

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Jun 16, 2025

Add EntityTransformationEngine

6a896c1

XJDKC force-pushed the rxing-catalog-federation-sigv4-part-1 branch from f5218b3 to 6a896c1 Compare June 16, 2025 09:22

Ignore split-packages

eace122

XJDKC mentioned this pull request Jun 16, 2025

SigV4 Auth Support for Catalog Federation - Part 2: Connection Config Persistence #1900

Open

This comment was marked as duplicate.

Sign in to view

XJDKC commented Jun 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899

SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899

Uh oh!

XJDKC commented Jun 16, 2025 •

edited

Loading

Uh oh!

snazy commented Jun 16, 2025

Uh oh!

XJDKC commented Jun 16, 2025 •

edited

Loading

Uh oh!

eric-maynard commented Jun 16, 2025 •

edited

Loading

Uh oh!

XJDKC commented Jun 16, 2025

Uh oh!

This comment was marked as duplicate.

Uh oh!

XJDKC left a comment

Uh oh!

Uh oh!

XJDKC Jun 16, 2025

Uh oh!

XJDKC Jun 16, 2025

Uh oh!

XJDKC Jun 16, 2025

Uh oh!

XJDKC Jun 16, 2025

Uh oh!

snazy commented Jun 18, 2025

Uh oh!

XJDKC commented Jun 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899

Are you sure you want to change the base?

SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899

Uh oh!

Conversation

XJDKC commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Milestones

Introduction

Key Components

Example Configuration:

Flowchart

Uh oh!

snazy commented Jun 16, 2025

Uh oh!

XJDKC commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eric-maynard commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

XJDKC commented Jun 16, 2025

Uh oh!

This comment was marked as duplicate.

Uh oh!

XJDKC left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

XJDKC Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

XJDKC Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

XJDKC Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

XJDKC Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

snazy commented Jun 18, 2025

Uh oh!

XJDKC commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

XJDKC commented Jun 16, 2025 •

edited

Loading

XJDKC commented Jun 16, 2025 •

edited

Loading

eric-maynard commented Jun 16, 2025 •

edited

Loading

XJDKC commented Jun 25, 2025 •

edited

Loading