-
Notifications
You must be signed in to change notification settings - Fork 359
DO-NOT-MERGE (fix) Refined AWS vs non-AWS detection for S3-compatible storage #3496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return endpoint == null || endpoint.contains(".amazonaws.com"); | ||
| } | ||
|
|
||
| @JsonIgnore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not this change but is weird that the class is called AwsStorageConfigurationInfo and it implements a method with this name, I think eventually we want to refactor this class and may be call it S3StorageConfigurationInfo instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also a question if we want a single class for all S3 compatible storage or if we want to have a model where there are many subclasses based on the actual storage backend we are dealing with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, refactor this maybe wise. Base on what I recalled, S3-compatible was added later on and it was added on top of the AWS one. Thus current state.
| public boolean isAwsS3() { | ||
| String endpoint = getEndpoint(); | ||
| // AWS S3 if no endpoint is specified or if it uses an amazonaws.com endpoint | ||
| return endpoint == null || endpoint.contains(".amazonaws.com"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the endpoint be an Ip address rather than than a FQDN ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually AWS endpoint will be a pretty wide set of IPs and those IP can changed too as far as I know. I can't think about a reason on why we would ever want to pin a specific IP for using AWS endpoint as they all have geo routing already. But that is fair if somehow a user really wants to pined to a specific AWS IP address, this won't add wildcard KMS policy (as it will then get classified as non-AWS S3). But if user did specified KMS key on the catalog property, this will then work normally again with more detailed KMS policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although this is generally fair, and will work for most of the cases, I was thinking if it would make sense to enable KMS addition as a configuration rather than something that is tightly coupled with whether or not it the underlying storage is AWS. Not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for .amazonaws.com works for the current public regions and zones. Technically, it would be better to check for the string only in the host part of the endpoint URI though. I do not know the endpoints of the relatively new AWS local regions/zones. Whether those are underneath the amazonaws.com domain. Amazon promised a "EU sovereign cloud", and I think to make it completely sovereign, the endpoints would be in a different DNS domain.
OTOH this check wouldn't work for localstack, which supports KMS, or any other KMS implementation.
I think, it would be better to gate this check on a different condition/setting rather than looking for .amazonaws.com in the whole endpoint URI.
Why not make this an explicit flag in storage config? The admin user will set it (or not) based on the specific deployment situation. |
This PR addressed one of the issues reported in #3440 where when end-user is non-AWS S3-compatible backend and have region set on the catalog property. With current code, we are determining if a S3 backend if AWS or not based on checking if account id and region are both set which is a bit problematic IMO. The account id here is derived from IAM role and the region is be set implicitly. That being said, when an user is trying to use assume role to auth and interact with S3-compatible storage, it can cause good amount of confusion as our current code base will add wildcard KMS policy to it if the backend is "AWS" (in this case, if a region and account id are both present...and region itself is valid for MinIO AFAIK and it default to us-east-1).
Now with this, if an user is trying to use assume role for auth via STS and have region set, polaris will implicitly set wildcard KMS policy which is not compatible with MinIO thus raised error reported above. I am purposing we should be checking if an endpoint is being implicitly set and not contains "amazonaws.com" to correctly check if the backend is AWS or not.
If we think this is good to change, please wait for #3493 and #3494 to be merged first then I will refactor this one accordingly.
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)