Ability to get actually full path/URL, e.g. from ObjectMeta #4162

Open
ZetaTwo opened this issue May 1, 2023 · 3 comments
Labels
enhancement: Any new improvement worthy of an entry in the changelog

Comments


ZetaTwo commented May 1, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

It would be nice to have a function or attribute that returns the full URL for an object. Currently, ObjectMeta.location provides only the path within the object store, but I would like to get the full URL, such as "gs://bucket/dir/file.txt".

Describe the solution you'd like

Maybe add a "uri" attribute to the ObjectMeta struct, or provide a new .get_uri method on the ObjectStore trait.

Additional context

The reason I am looking for this is that I need to pass the URL to a different program, and it would be nice to keep all the logic centralised in the object_store library instead of having a separate function along the lines of: if gcp: "gs://" + path, elif aws: "s3://" + path, etc.
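For illustration, the caller-side helper described above might look like the following minimal sketch. The `Backend` enum and `full_url` function are hypothetical names, not part of object_store, and only the "gs" and "s3" schemes mentioned in this issue are covered.

```rust
/// Hypothetical tag for the backend in use; not a type from object_store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Gcs,
    S3,
}

impl Backend {
    /// URL scheme used for this backend in the examples from this issue.
    pub fn scheme(self) -> &'static str {
        match self {
            Backend::Gcs => "gs",
            Backend::S3 => "s3",
        }
    }
}

/// Joins scheme, bucket, and in-store path into a full URL string.
/// `path` is the location inside the store, e.g. "dir/file.txt";
/// a leading '/' is stripped so the result has a single separator.
pub fn full_url(backend: Backend, bucket: &str, path: &str) -> String {
    format!(
        "{}://{}/{}",
        backend.scheme(),
        bucket,
        path.trim_start_matches('/')
    )
}
```

Calling `full_url(Backend::Gcs, "bucket", "dir/file.txt")` yields "gs://bucket/dir/file.txt". This is exactly the per-backend branching the request hopes to avoid duplicating in every application.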

ZetaTwo added the "enhancement" label on May 1, 2023
Contributor

tustvold commented May 2, 2023

I think we should definitely flesh out the URL handling story in object_store better; #4047 is in a similar vein.

There are a couple of things though that make this challenging:

  • URLs with custom schemes are not especially well standardised; for example, abfs:// has two completely different conventions
  • Store adapters like PrefixStore can't meaningfully reason about how to add a prefix to a URL returned by the inner store
  • The stores themselves are not URL-aware - see object_store: Why does builder take bucket? #3784

I wonder if you've given thought to simply storing the base URL alongside the ObjectStore, to allow constructing the URL in the form expected by the different program? Just spit-balling here...
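A minimal sketch of that suggestion, assuming the application knows the base URL when it builds the store: `StoreWithBase` and `url_for` are hypothetical names, and the generic parameter `S` stands in for whatever store handle the application holds. Nothing here is object_store API.

```rust
/// Hypothetical wrapper pairing a store handle with the base URL it was
/// created from, so the application can rebuild full URLs itself.
pub struct StoreWithBase<S> {
    pub store: S,
    base_url: String,
}

impl<S> StoreWithBase<S> {
    /// `base_url` is the form the downstream program expects,
    /// e.g. "gs://bucket". A trailing '/' is dropped.
    pub fn new(store: S, base_url: &str) -> Self {
        Self {
            store,
            base_url: base_url.trim_end_matches('/').to_string(),
        }
    }

    /// Builds the full URL for an in-store location such as "dir/file.txt".
    /// The location is appended as-is; no percent-encoding is applied.
    pub fn url_for(&self, location: &str) -> String {
        format!("{}/{}", self.base_url, location.trim_start_matches('/'))
    }
}
```

With this shape, the string form of `ObjectMeta.location` could be passed to `url_for`, and the choice of URL convention stays with the application rather than the store.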

Author

ZetaTwo commented May 2, 2023

Yeah I understand that there might be some complications in implementing this for all possible backends. I can totally store the URL in a different place but I just thought it would have been cool to get the info directly from the backend.

Contributor

tustvold commented Nov 1, 2023

A further wrinkle here is that some tools, such as the AWS CLI, do not actually follow the URL specification and instead interpret URL paths verbatim.

See apache/datafusion#8009 and #5017.
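To make the wrinkle concrete, the sketch below builds the same object path in two ways: verbatim, and with percent-encoding applied to bytes outside a conservative unreserved set. The function names are hypothetical, the encoder is a simplified std-only stand-in rather than a complete URL implementation, and which form a given tool expects is something to verify against that tool.

```rust
/// Percent-encodes every byte except ASCII letters, digits,
/// '-', '.', '_', '~' and the '/' path separator.
pub fn percent_encode_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for &b in path.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9'
            | b'-' | b'.' | b'_' | b'~' | b'/' => out.push(b as char),
            // Everything else becomes %XX, one escape per UTF-8 byte.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Path copied into the URL unchanged, as a verbatim-reading tool expects.
pub fn verbatim_url(bucket: &str, path: &str) -> String {
    format!("s3://{}/{}", bucket, path)
}

/// Path percent-encoded, as a spec-following URL parser expects.
pub fn encoded_url(bucket: &str, path: &str) -> String {
    format!("s3://{}/{}", bucket, percent_encode_path(path))
}
```

For a key such as "dir/my file.txt", the two functions produce different strings, so a single "full URL" accessor would have to pick one convention and could be wrong for the other kind of consumer.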
