Description
Is your feature request related to a problem or challenge?
Currently we only support s3:// style URLs for S3/AWS, and abfs(s)/wasb(s) style URLs for Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage.
However, both services expose HTTP(S) endpoints.
For S3, there are two addressing styles (path-style and virtual-hosted-style), plus VPC endpoint URLs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
For Azure, http(s) URLs are always of the form:
- http(s)://account.blob.core.windows.net/container/ for accounts without hierarchical namespace enabled (corresponds to wasb)
- http(s)://account.dfs.core.windows.net/container/ for accounts with hierarchical namespace enabled (ADLS Gen 2)
For Azure, we might decide to simply parse these URLs and translate them to the currently supported wasb(s)/abfs(s) styles. However, the client drivers behind those styles may have some limitations compared to what the HTTP(S) endpoints support.
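To make the Azure translation idea concrete, here is a rough sketch using only the standard library. The function name `translate_azure_url` is hypothetical and not part of iceberg-rust; it assumes the public-cloud host suffixes listed above and the usual `scheme://container@host/path` layout of abfs(s)/wasb(s) URLs, and it ignores query strings, SAS tokens, and sovereign-cloud hosts.

```rust
/// Hypothetical helper (not an existing iceberg-rust API): rewrite an Azure
/// HTTP(S) URL into the abfs(s)/wasb(s) form.
/// Returns None for anything that does not match the two host suffixes below.
fn translate_azure_url(url: &str) -> Option<String> {
    // Split off the scheme; only http and https are handled.
    let (scheme, rest) = url.split_once("://")?;
    let secure = match scheme {
        "https" => true,
        "http" => false,
        _ => return None,
    };

    // Separate the host from the path, e.g. "account.dfs.core.windows.net" and "container/...".
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));

    // Pick the target scheme from the endpoint type (dfs -> abfs, blob -> wasb).
    let (account, target) = if let Some(a) = host.strip_suffix(".dfs.core.windows.net") {
        (a, if secure { "abfss" } else { "abfs" })
    } else if let Some(a) = host.strip_suffix(".blob.core.windows.net") {
        (a, if secure { "wasbs" } else { "wasb" })
    } else {
        return None;
    };

    // The first path segment is the container; the remainder is the object path.
    let (container, key) = path.split_once('/').unwrap_or((path, ""));
    if account.is_empty() || container.is_empty() {
        return None;
    }

    Some(format!("{target}://{container}@{host}/{key}"))
}
```

Under these assumptions, `translate_azure_url("https://account.dfs.core.windows.net/container/warehouse/t/metadata.json")` would yield an `abfss://container@account.dfs.core.windows.net/...` URL, while any URL it does not recognise comes back as `None` so the caller can decide what to do with it.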
For S3, translation might be more limiting: it's impossible for VPC URLs, and I'm not sure how translating vhost/path-style URLs to s3:// paths would behave, so that's something to explore. I also seem to remember that some configurations prevent access with path-style URLs.
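As a starting point for that exploration, the sketch below only classifies an S3 HTTP(S) URL as path-style or virtual-hosted-style and extracts the bucket and key. The names `S3Address` and `classify_s3_url` are hypothetical. It assumes hosts ending in `.amazonaws.com`, treats hosts ending in `.vpce.amazonaws.com` as VPC endpoints and leaves them unclassified, and does not handle custom endpoints, dual-stack hosts, or bucket names that themselves contain `.s3.`.

```rust
/// Hypothetical classification of an S3 HTTP(S) URL (not an existing iceberg-rust type).
#[derive(Debug, PartialEq)]
enum S3Address {
    /// https://s3.<region>.amazonaws.com/<bucket>/<key>
    PathStyle { bucket: String, key: String },
    /// https://<bucket>.s3.<region>.amazonaws.com/<key>
    VirtualHosted { bucket: String, key: String },
}

/// Returns None for non-HTTP(S) URLs, non-AWS hosts, and VPC endpoint hosts.
fn classify_s3_url(url: &str) -> Option<S3Address> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));

    // Assumption: VPC endpoint hosts end in ".vpce.amazonaws.com"; they are not translated.
    if !host.ends_with(".amazonaws.com") || host.ends_with(".vpce.amazonaws.com") {
        return None;
    }

    // Path-style: the host starts with "s3." and the bucket is the first path segment.
    if host.starts_with("s3.") {
        let (bucket, key) = path.split_once('/').unwrap_or((path, ""));
        if bucket.is_empty() {
            return None;
        }
        return Some(S3Address::PathStyle {
            bucket: bucket.to_string(),
            key: key.to_string(),
        });
    }

    // Virtual-hosted-style: everything before the last ".s3." in the host is the bucket.
    let idx = host.rfind(".s3.")?;
    let bucket = &host[..idx];
    if bucket.is_empty() {
        return None;
    }
    Some(S3Address::VirtualHosted {
        bucket: bucket.to_string(),
        key: path.to_string(),
    })
}
```

Both variants carry the same bucket and key, so either could in principle be rewritten as `s3://<bucket>/<key>`; whether that rewrite is safe for every bucket configuration is exactly the open question above.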
So perhaps the safe bet would be not to translate at all, but to always use the URL style chosen in the metadata path and in the metadata, manifest, and manifest list files themselves, and pass it unchanged to the underlying storage library. Whether that style is supported would then be up to that library.
Note: this might also apply to GCS, but I don't know much about its storage URLs.
cc @Xuanwo
Describe the solution you'd like
No response
Willingness to contribute
I would be willing to contribute to this feature with guidance from the Iceberg Rust community