@MonkeyCanCode
Contributor

This PR addresses one of the issues reported in #3440, which occurs when an end user runs a non-AWS, S3-compatible backend and has a region set in the catalog properties. The current code decides whether an S3 backend is AWS by checking whether the account ID and the region are both set, which is a bit problematic IMO. The account ID is derived from the IAM role, and the region can be set implicitly. As a result, when a user tries to use assume-role to authenticate and interact with S3-compatible storage, it can cause a good amount of confusion: our current code base adds a wildcard KMS policy whenever the backend is classified as "AWS" (in this case, whenever a region and an account ID are both present; a region is itself valid for MinIO AFAIK, and it defaults to us-east-1).

So today, if a user tries to use assume-role for auth via STS and has a region set, Polaris implicitly adds a wildcard KMS policy, which is not compatible with MinIO and raises the error reported above. I am proposing that we instead treat the backend as non-AWS when an endpoint is explicitly set and does not contain "amazonaws.com".
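To make the difference concrete, here is a minimal sketch that compares the two heuristics on a MinIO-style setup. `BackendHeuristics`, the sample account ID, and the sample endpoint are hypothetical and only mirror the logic described above; this is not the actual Polaris code.

```java
/** Hypothetical helper that mirrors the two heuristics discussed in this PR. */
final class BackendHeuristics {
    private BackendHeuristics() {}

    // Previous heuristic as described above: "AWS" when account ID and region are both present.
    static boolean isAwsByAccountAndRegion(String accountId, String region) {
        return accountId != null && region != null;
    }

    // Proposed heuristic: AWS when no endpoint is set or the endpoint mentions amazonaws.com.
    static boolean isAwsByEndpoint(String endpoint) {
        return endpoint == null || endpoint.contains(".amazonaws.com");
    }

    public static void main(String[] args) {
        // Assumed sample values: an account ID derived from a role ARN, MinIO's default region,
        // and a made-up MinIO endpoint.
        String accountId = "123456789012";
        String region = "us-east-1";
        String endpoint = "http://minio.example.internal:9000";

        System.out.println(isAwsByAccountAndRegion(accountId, region)); // prints "true"
        System.out.println(isAwsByEndpoint(endpoint));                  // prints "false"
    }
}
```

With the previous check the MinIO catalog is classified as AWS and gets the wildcard KMS policy; with the endpoint check it does not.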

If we think this is a good change, please wait for #3493 and #3494 to be merged first; I will then refactor this one accordingly.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

return endpoint == null || endpoint.contains(".amazonaws.com");
}

@JsonIgnore

Probably not for this change, but it is weird that the class is called AwsStorageConfigurationInfo and implements a method with this name. I think we eventually want to refactor this class and maybe call it S3StorageConfigurationInfo instead.

It is also an open question whether we want a single class for all S3-compatible storage, or a model with many subclasses based on the actual storage backend we are dealing with.

Contributor Author

Yeah, refactoring this may be wise. From what I recall, S3-compatible support was added later, on top of the AWS one, hence the current state.

public boolean isAwsS3() {
  String endpoint = getEndpoint();
  // AWS S3 if no endpoint is specified or if it uses an amazonaws.com endpoint
  return endpoint == null || endpoint.contains(".amazonaws.com");

Can the endpoint be an IP address rather than an FQDN?

Contributor Author

An AWS endpoint usually resolves to a pretty wide set of IPs, and those IPs can change too, as far as I know. I can't think of a reason why we would ever want to pin a specific IP for an AWS endpoint, as they all have geo-routing already. But that is a fair point: if a user somehow really wants to pin to a specific AWS IP address, this change won't add the wildcard KMS policy (as the storage will then be classified as non-AWS S3). If the user did specify a KMS key in the catalog properties, though, this will work normally again with more detailed KMS policies.

Although this is generally fair and will work for most cases, I was wondering whether it would make sense to enable the KMS addition through configuration, rather than tightly coupling it to whether or not the underlying storage is AWS. Not a blocker.

Member

Checking for .amazonaws.com works for the current public regions and zones. Technically, it would be better to check for the string only in the host part of the endpoint URI, though. I do not know the endpoints of the relatively new AWS local regions/zones, or whether those are underneath the amazonaws.com domain. Amazon promised an "EU sovereign cloud", and I think that to make it completely sovereign, the endpoints would be in a different DNS domain.

OTOH, this check wouldn't work for LocalStack, which supports KMS, or for any other KMS implementation.

I think, it would be better to gate this check on a different condition/setting rather than looking for .amazonaws.com in the whole endpoint URI.
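For reference, a host-only variant of the check could look roughly like the sketch below. It uses `java.net.URI` to extract the host and matches the domain as a suffix. `EndpointHostCheck` and `isAmazonAwsHost` are hypothetical names, and treating an unparseable endpoint as non-AWS is an assumption, not something this PR decides.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper: matches amazonaws.com against the URI host only. */
final class EndpointHostCheck {
    private EndpointHostCheck() {}

    static boolean isAmazonAwsHost(String endpoint) {
        if (endpoint == null) {
            return true; // no endpoint configured: treated as AWS, same as in this PR
        }
        String host;
        try {
            host = URI.create(endpoint).getHost();
        } catch (IllegalArgumentException e) {
            return false; // assumption: an unparseable endpoint is treated as non-AWS
        }
        if (host == null) {
            return false; // e.g. an endpoint given without a scheme
        }
        String normalized = host.toLowerCase(Locale.ROOT);
        return normalized.equals("amazonaws.com") || normalized.endsWith(".amazonaws.com");
    }
}
```

An endpoint such as `http://minio.local:9000/bucket.amazonaws.com/` passes the substring check but fails this one, because its host is `minio.local`. Note that a strict suffix match also excludes AWS domains outside `amazonaws.com` (for example `amazonaws.com.cn`), so it still does not settle the sovereign-cloud or LocalStack concerns above.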

@dimas-b
Contributor

dimas-b commented Jan 21, 2026

our current code base will add wildcard KMS policy to it if the backend is "AWS" [...]

Why not make this an explicit flag in storage config? The admin user will set it (or not) based on the specific deployment situation.
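A rough sketch of what such a flag could look like, assuming a nullable setting so that existing catalogs keep the endpoint-based default. `KmsPolicyGate`, `StorageConfig`, and the `wildcardKmsPolicy` field are hypothetical names for illustration, not existing Polaris configuration; the fallback behavior is likewise an assumption.

```java
/** Hypothetical gate: an explicit admin flag wins, otherwise fall back to the endpoint check. */
final class KmsPolicyGate {
    private KmsPolicyGate() {}

    /** Hypothetical storage-config holder; not an actual Polaris class. */
    static final class StorageConfig {
        final String endpoint;           // null means the default AWS endpoint
        final Boolean wildcardKmsPolicy; // hypothetical flag; null means "not set by the admin"

        StorageConfig(String endpoint, Boolean wildcardKmsPolicy) {
            this.endpoint = endpoint;
            this.wildcardKmsPolicy = wildcardKmsPolicy;
        }
    }

    static boolean shouldAddWildcardKmsPolicy(StorageConfig config) {
        if (config.wildcardKmsPolicy != null) {
            return config.wildcardKmsPolicy; // explicit admin decision
        }
        // Assumed fallback: the endpoint heuristic proposed in this PR.
        return config.endpoint == null || config.endpoint.contains(".amazonaws.com");
    }
}
```

With this shape, a LocalStack deployment could opt in with `new StorageConfig("http://localhost:4566", true)`, and an AWS deployment could opt out with `new StorageConfig(null, false)`.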

4 participants