
Support http(s) storage URLs #1912

@vustef

Description

Is your feature request related to a problem or challenge?

Currently we only support s3:// style URLs for S3/AWS, and abfs(s)/wasb(s) style URLs for Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage.
However, both services expose HTTP(S) endpoints.

For S3, there are two addressing styles (path-style and virtual-hosted-style), plus VPC endpoint URLs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html

For Azure, http(s) URLs always take one of these forms:

  • http(s)://account.blob.core.windows.net/container/ for accounts without hierarchical namespace enabled (corresponds to wasb)
  • http(s)://account.dfs.core.windows.net/container/ for accounts with hierarchical namespace enabled (ADLS Gen2, corresponds to abfs)

For Azure, we could simply parse these URLs and translate them to the currently supported wasb(s)/abfs(s) styles. However, the client drivers for those schemes may not support everything that the HTTP(S) endpoints do.
For S3, translation could be more limiting: it is impossible for VPC endpoint URLs, and the behaviour when translating virtual-hosted-style or path-style URLs to s3:// paths still needs exploring. I also seem to remember that some configurations prevent access via path-style URLs.
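A minimal sketch of what the Azure translation could look like, assuming only the public-cloud `*.blob.core.windows.net` and `*.dfs.core.windows.net` hosts (sovereign clouds and custom domains are not handled). `azure_http_to_native` is a hypothetical helper, not an existing iceberg-rust API. The first path segment becomes the container, and `https`/`http` map to the TLS/non-TLS scheme variants:

```rust
/// Hypothetical helper: rewrite an Azure HTTP(S) URL into the equivalent
/// wasb(s)/abfs(s) URL. Returns None for hosts it does not recognize.
fn azure_http_to_native(url: &str) -> Option<String> {
    // https maps to the TLS variants (wasbs/abfss), http to the plain ones.
    let (rest, tls) = match url.strip_prefix("https://") {
        Some(r) => (r, true),
        None => (url.strip_prefix("http://")?, false),
    };
    // "account.dfs.core.windows.net/container/path" -> host + path.
    let (host, path) = rest.split_once('/')?;
    // The first path segment is the container; the rest is the object path.
    let (container, object) = path.split_once('/').unwrap_or((path, ""));
    if container.is_empty() {
        return None;
    }
    let scheme = if host.ends_with(".dfs.core.windows.net") {
        if tls { "abfss" } else { "abfs" } // hierarchical namespace (ADLS Gen2)
    } else if host.ends_with(".blob.core.windows.net") {
        if tls { "wasbs" } else { "wasb" } // flat namespace (Blob Storage)
    } else {
        return None;
    };
    // Native form: scheme://container@account.<endpoint>/path
    Some(format!("{scheme}://{container}@{host}/{object}"))
}
```

For example, `https://myaccount.dfs.core.windows.net/warehouse/db/v1.metadata.json` would become `abfss://warehouse@myaccount.dfs.core.windows.net/db/v1.metadata.json`.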
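A simplified sketch of the S3 side, to illustrate why translation only covers some URLs. It assumes only the standard AWS host forms `bucket.s3[.region].amazonaws.com` (virtual-hosted-style) and `s3[.region].amazonaws.com/bucket` (path-style); `s3_http_to_s3_uri` is a hypothetical helper. Anything else, including VPC endpoint hosts, dual-stack/FIPS endpoints, legacy `s3-region` hosts and custom endpoints, returns None and would have to be passed through untouched:

```rust
/// Hypothetical helper: translate a standard S3 HTTP(S) URL into s3://bucket/key.
///   virtual-hosted-style: https://bucket.s3[.region].amazonaws.com/key
///   path-style:           https://s3[.region].amazonaws.com/bucket/key
/// Returns None for any other host (VPC endpoints, custom endpoints, ...).
fn s3_http_to_s3_uri(url: &str) -> Option<String> {
    // Only http and https are accepted.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // Split "host/path" at the first slash; the path may be empty.
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));
    // Everything before ".amazonaws.com", e.g. "bucket.s3.us-east-1".
    let prefix = host.strip_suffix(".amazonaws.com")?;

    // A region is a single DNS label such as "us-east-1" (no dots).
    let is_region = |s: &str| !s.is_empty() && !s.contains('.');

    // Path-style: host is "s3" or "s3.<region>", bucket is the first path segment.
    if prefix == "s3" || prefix.strip_prefix("s3.").map_or(false, is_region) {
        let (bucket, key) = path.split_once('/').unwrap_or((path, ""));
        if bucket.is_empty() {
            return None;
        }
        return Some(format!("s3://{bucket}/{key}"));
    }

    // Virtual-hosted-style: host is "<bucket>.s3" or "<bucket>.s3.<region>".
    let bucket = if let Some(b) = prefix.strip_suffix(".s3") {
        b
    } else {
        let idx = prefix.rfind(".s3.")?;
        if !is_region(&prefix[idx + 4..]) {
            // e.g. VPC endpoint hosts ending in ".s3.<region>.vpce".
            return None;
        }
        &prefix[..idx]
    };
    if bucket.is_empty() {
        return None;
    }
    Some(format!("s3://{bucket}/{path}"))
}
```

Even this small sketch has to give up on VPC endpoint URLs, because the bucket cannot be recovered from the host alone.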

So perhaps the safer approach is not to translate at all: always use the URL style chosen in the metadata path and in the metadata, manifest, and manifest list files themselves, pass it to the underlying library, and leave it to that storage library to decide whether it supports that style.
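A rough sketch of the pass-through idea, assuming backend selection only needs the scheme, plus the host for http(s) URLs. `Backend` and `select_backend` are hypothetical names, and the host suffix checks are simplifications. The path itself is never rewritten, so whether a given URL style works is left to the chosen backend:

```rust
/// Hypothetical set of storage backends a path could be routed to.
#[derive(Debug, PartialEq)]
enum Backend {
    S3,
    Azure,
    Unknown,
}

/// Pick a backend for `path` without rewriting it; the caller hands the
/// original string to that backend unchanged.
fn select_backend(path: &str) -> Backend {
    let (scheme, rest) = match path.split_once("://") {
        Some(parts) => parts,
        None => return Backend::Unknown,
    };
    match scheme {
        "s3" => Backend::S3,
        "abfs" | "abfss" | "wasb" | "wasbs" => Backend::Azure,
        "http" | "https" => {
            // Only the host is inspected; the path stays exactly as written.
            let host = rest.split('/').next().unwrap_or("");
            if host.ends_with(".amazonaws.com") {
                Backend::S3
            } else if host.ends_with(".core.windows.net") {
                Backend::Azure
            } else {
                Backend::Unknown
            }
        }
        _ => Backend::Unknown,
    }
}
```

Routing on the host keeps VPC endpoint URLs usable (they still end in `.amazonaws.com`), which is exactly the case translation cannot handle.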

Note: the same may apply to GCS, but I don't know much about its storage URLs.

cc @Xuanwo

Describe the solution you'd like

No response

Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community

Metadata

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
