SigV4 Auth Support for Catalog Federation - Part 1: Entity Transformation System #1899
Conversation
@XJDKC I'm a bit concerned about updating the representation of entities, because those represent the persisted state. Maybe an opportunity to abstract such concerns from persistence?
Hey @snazy, it's definitely possible with this framework, we could define a new …

I think different vendors may have different views on this. For example, it's quite reasonable to persist the service identity info, since it's very unlikely to change. We already persist service identity in the metastore for storage config, so it might make sense to follow a similar pattern here.

Even if we choose not to persist the service identity directly on the Catalog Entity, we still need to persist the mapping between the entity and its assigned identity somewhere (e.g. in a …).

Changing the service identity for a Catalog Entity is risky, e.g., users might configure their IAM role's trust policy to allowlist the specific service identity initially assigned to their federated Polaris catalog. If Polaris later switches to a different identity when accessing the Glue Catalog, that integration could break unexpectedly.

So perhaps the right approach is to allow the service vendor to decide whether the service identity should be persisted or injected dynamically. WDYT?
For better or worse, we currently use entities both for the persisted state of an object and for an in-memory representation of the same. I'm still a fan of separating this out (something like the DAO proposal from months back), but that's a big refactor that I'm not sure I'd want to block SigV4 on.

@snazy perhaps we can view this as a step in the right direction, as it encourages us to treat entities as immutable and to use discrete "transformations" to generate new entities instead of trying to modify them in place or constructing lots of new entities in an ad-hoc way throughout the code base. Eventually, entities should be truly immutable and this may become the canonical way to transform an entity into a new one.

I am, however, a little concerned about how heavyweight this process could become... if I just want to update an entity's version today, I can call a method like …
Yes, the Entity Transformation System I'm adding won't update the entity in place; it creates a copy and applies the modifications there. What I want to point out is that this system isn't meant to replace the current set methods used in the persistence layer. Its purpose is to provide hooks for applying transformations at specific points in the entity lifecycle.

It's admittedly a bit heavyweight, since each transformer creates a new copy of the entity. But without a builder-style abstraction (like what we have for generated model classes), we can't apply the transformation efficiently.

Given that we may eventually refactor this area to introduce a proper DAO layer, and that we're currently only using the transformation system for a single SigV4-related transformer, it might make sense to proceed with the current approach and revisit the refactor later. WDYT?
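For readers following along, here is a minimal sketch of the copy-based flow described above. The class name, constructor, and `transform` signature are illustrative assumptions, not necessarily the PR's actual API; only the type names `EntityTransformer`, `TransformationPoint`, and `PolarisBaseEntity` come from the PR itself.

```java
import java.util.List;

// Illustrative sketch only: shows the copy-on-transform idea, not the PR's actual code.
public final class SketchTransformationEngine {

  private final List<EntityTransformer> transformers;

  public SketchTransformationEngine(List<EntityTransformer> transformers) {
    this.transformers = List.copyOf(transformers);
  }

  /**
   * Applies each registered transformer in order for the given lifecycle point.
   * The input entity is never modified; each transformer returns a new copy.
   */
  public PolarisBaseEntity transform(TransformationPoint point, PolarisBaseEntity original) {
    PolarisBaseEntity current = original;
    for (EntityTransformer transformer : transformers) {
      current = transformer.transform(point, current); // assumed signature
    }
    return current;
  }
}
```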
Linking some comments from @dennishuo in this PR as well:
.../src/main/java/org/apache/polaris/core/entity/transformation/EntityTransformationEngine.java
*
* @param transformationPoint The point in the entity lifecycle where transformers should be
*     applied.
* @param entity The original Polaris entity to mutate.
It's too bad we don't have a way to express C++-style const correctness as we move towards immutable PolarisBaseEntity, but maybe we can at least mention here that the intention is that implementations should not mutate the input entity, but instead only create the new transformed entity.
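To illustrate the contract being suggested here, below is a hedged sketch of a transformer that leaves its input untouched. The copy-builder and property accessor shown (`PolarisBaseEntity.Builder`, `getPropertiesAsMap`) and the `transform` signature are assumptions for the purpose of the example, not the PR's confirmed API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the exact EntityTransformer signature and the PolarisBaseEntity
// builder/accessor methods used here are assumptions made for illustration.
public class NonMutatingExampleTransformer implements EntityTransformer {

  @Override
  public PolarisBaseEntity transform(TransformationPoint point, PolarisBaseEntity entity) {
    // Never modify 'entity'; copy its properties and derive a new instance instead.
    Map<String, String> properties = new HashMap<>(entity.getPropertiesAsMap()); // assumed accessor
    properties.put("example-transformed-at", point.name());
    return new PolarisBaseEntity.Builder(entity) // assumed copy-builder
        .properties(properties)
        .build();
  }
}
```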
@@ -477,6 +479,14 @@ private void revokeGrantRecord(

    ms.persistStorageIntegrationIfNeededInCurrentTxn(callCtx, catalog, integration);

    // CATALOG_PRE_PERSIST transformation point
    EntityTransformationEngine entityTransformationEngine = callCtx.getEntityTransformationEngine();
For now, unfortunately, we didn't quite manage to refactor all of the shared logic between AtomicOperationMetaStoreManager and TransactionalMetaStoreManagerImpl into a base class yet; the best we have is the prepareToPersistNewEntity hook defined in BaseMetaStoreManager.
This means this logic either needs to be added in both AtomicOperationMetaStoreManager and TransactionalMetaStoreManagerImpl, or needs to be abstracted into BaseMetaStoreManager somehow while making sure both of those impls invoke the shared logic. persistNewEntity isn't specific to catalog creation, so you'd need a check on the entityType enum there if you just put this logic in there.
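As one way to picture this suggestion, a shared hook in BaseMetaStoreManager could look roughly like the sketch below. The method name `applyPrePersistTransformations` and the engine's `transform` signature are hypothetical; only `callCtx.getEntityTransformationEngine()` and `TransformationPoint.CATALOG_PRE_PERSIST` appear in the PR's diff.

```java
// Hypothetical sketch of sharing the transformation hook between both implementations.
public abstract class BaseMetaStoreManager {

  /**
   * Intended to be called by both AtomicOperationMetaStoreManager and
   * TransactionalMetaStoreManagerImpl before persisting a new entity. Since
   * persistNewEntity is not catalog-specific, the entity type is checked here.
   */
  protected PolarisBaseEntity applyPrePersistTransformations(
      PolarisCallContext callCtx, PolarisBaseEntity entity) {
    if (entity.getType() != PolarisEntityType.CATALOG) {
      return entity; // only catalog entities have a pre-persist point so far
    }
    return callCtx
        .getEntityTransformationEngine()
        .transform(TransformationPoint.CATALOG_PRE_PERSIST, entity); // assumed signature
  }
}
```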
@@ -188,6 +188,9 @@ polaris.oidc.principal-roles-mapper.type=default
# polaris.storage.gcp.token=token
# polaris.storage.gcp.lifespan=PT1H

# Polaris Entity Transformation Config
polaris.entity-transformation.transformers=no-op
On the one hand, it's useful to be able to have this kind of generalized transformer/mutator hook, but if the hook is also strongly coupled to particular features, we don't want to require separate configuration of the info injection here vs the core feature (e.g. sigv4) somewhere else.
Especially if we end up having feature configurations for the core feature, then it's important that the feature itself auto-configures itself to be functional. In a way then the classes that define the sigv4 functionality should be what injects the mutator/transformer.
Rulin: That's a good point; perhaps we can consider tying some transformers to the feature parameters rather than to the config itself, if they are part of the core functionality of the feature.
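To make this idea concrete, one possible shape (purely hypothetical, including the class name and the bean-discovery mechanism) is for the sigv4 feature to ship its own transformer as a CDI bean, so that enabling the feature is enough to register it rather than listing it in a separate transformer config property:

```java
import jakarta.enterprise.context.ApplicationScoped;

// Hypothetical: the transformer is contributed by the sigv4 feature itself and discovered
// as a bean, instead of being wired up through a separate polaris.entity-transformation.* property.
@ApplicationScoped
@AppliesTo(TransformationPoint.CATALOG_PRE_PERSIST)
public class SigV4ServiceIdentityTransformer implements EntityTransformer {

  @Override
  public PolarisBaseEntity transform(TransformationPoint point, PolarisBaseEntity entity) {
    // Sketch: look up the service identity for the current realm and attach it to the
    // catalog's connection config; details elided, the entity is returned unchanged here.
    return entity;
  }
}
```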
Honestly, I think the approach is too heavyweight and too broad to target a very specific problem. It will be very hard for people to realize (later, by accident) that some entity instance does not reflect the "real" state. That will cause confusion and likely hard-to-debug issues. I propose separating the relevant parts into APIs/implementations that deal with the problem/use-case at hand, and definitely keeping them outside of the metastore - storage credential handling doesn't belong there IMO.
Agree with the concern about embedding this logic in the persistence layer; moving it outside does make sense in general, as the final entity is different from the entity we pass to the metastore. That said, I want to point out a few things specific to the storage config use case.

For storage config, the service identity isn't assigned by Polaris itself, it's provided by users via the … That's why I was leaning toward assigning the service identity in the persistence layer.

For certain vendors, we may need to store and retrieve the mapping between realm and service identity info from the database. Doing this lookup in the persistence layer allows us to keep everything within the same transaction when persisting the catalog entity. Moving this out of the persistence layer may end up causing inconsistency.

Adding some hook points is also very useful in general, e.g. it provides the ability to apply patches if there are bugs and we end up persisting some wrong data in the database. We can also put the service identity injection logic here: …
Milestones
This is Part 1 of the [Splitting] Initial SigV4 Auth Support for Catalog Federation. Upcoming parts will build on this system:
Introduction
This PR introduces a general-purpose Entity Transformation System to support future features like injecting service identity metadata into catalog entities for federated access (e.g. AWS Glue using SigV4). It provides a clean and extensible hook mechanism for enriching entities during specific lifecycle phases. This system is intended to serve as the foundation for service identity injection and other dynamic transformations in subsequent parts of the SigV4 support work.

Key Components (sketched below)

- EntityTransformer: A functional interface representing a single transformation to be applied to a PolarisBaseEntity.
- EntityTransformationEngine: A runtime engine that applies a sequence of EntityTransformers to an entity based on the configured TransformationPoint.
- TransformationPoint: Enum representing defined hook points in the entity lifecycle (e.g. CATALOG_PRE_PERSIST).
- @AppliesTo annotation: A repeatable qualifier that marks a transformer as applicable to a given TransformationPoint.
- EntityTransformationConfiguration: Optional config-driven registration of transformers by identifier (preserving ordering).

Example Configuration:

polaris.entity-mutation.mutators=no-op,catalog-connection-config
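As a rough illustration of the components listed above, the core types might be shaped as follows; these declarations are a non-authoritative sketch, and the exact signatures in the PR may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Illustrative sketch of the abstractions described above; not copied from the PR.

/** Defined hook points in the entity lifecycle. */
enum TransformationPoint {
  CATALOG_PRE_PERSIST
}

/** Repeatable qualifier marking a transformer as applicable to a transformation point. */
@Repeatable(AppliesTo.List.class)
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AppliesTo {
  TransformationPoint value();

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface List {
    AppliesTo[] value();
  }
}

/** A single transformation; implementations return a new entity rather than mutating the input. */
@FunctionalInterface
interface EntityTransformer {
  PolarisBaseEntity transform(TransformationPoint point, PolarisBaseEntity entity);
}
```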
Flowchart