Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Handle URI paths for filesystems? #34386

Closed
lidavidm opened this issue Feb 28, 2023 · 2 comments · Fixed by #34420
Closed

[C++] Handle URI paths for filesystems? #34386

lidavidm opened this issue Feb 28, 2023 · 2 comments · Fixed by #34420

Comments

@lidavidm
Copy link
Member

Describe the enhancement requested

Given a URI like s3://bucket/foo/bar.parquet, right now, you can turn that URI into a FileSystem instance + a non-URI path. That is useful, but then the FileSystem APIs don't work with URIs. For instance, it's explicitly rejected here:

if (internal::IsLikelyUri(s)) {
return Status::Invalid(
"Expected an S3 object path of the form 'bucket/key...', got a URI: '", s, "'");
}

So if you have a set of URIs and want to work with them with filesystems, you have to parse the URIs yourself. Given we already do the URI munging internally, it might be nice for FileSystem instances to expose a method to parse URIs into a path as well (raising an error if the URI is incompatible).

Component(s)

C++

@westonpace
Copy link
Member

Ah, so arrow::fs::FileSystemFromUri is sufficient for the single file case but not the multi-file case?

Maybe it would be simpler to add arrow::fs:FileSystemFromUris?

Otherwise I worry that there may be many entry points into filesystem that accept a path (and there are many filesystem implementations). For example, this change would touch the S3 filesystem and the GCS filesystem and so on. I think there is some advantage in keeping the filesystem implementations as simple as possible.

@lidavidm
Copy link
Member Author

lidavidm commented Mar 1, 2023

That would also work. In my use case, I would like to get all the paths back to pass on to another system (which does also use Arrow filesystems), but there is one step that would at least like to parse and validate that all the URIs are consistent for a filesystem.

westonpace added a commit that referenced this issue May 4, 2023
### Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri.  However, this isn't very useful if the user has multiple URIs or if they already have a filesystem.  This method allows that same URI handling to be used even if the user already has a filesystem.

### What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

### Are these changes tested?

Yes, via unit tests

### Are there any user-facing changes?

There is a new API method but no changes to any existing APIs
* Closes: #34386

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace added this to the 13.0.0 milestone May 4, 2023
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
### Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri.  However, this isn't very useful if the user has multiple URIs or if they already have a filesystem.  This method allows that same URI handling to be used even if the user already has a filesystem.

### What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

### Are these changes tested?

Yes, via unit tests

### Are there any user-facing changes?

There is a new API method but no changes to any existing APIs
* Closes: apache#34386

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
### Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri.  However, this isn't very useful if the user has multiple URIs or if they already have a filesystem.  This method allows that same URI handling to be used even if the user already has a filesystem.

### What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

### Are these changes tested?

Yes, via unit tests

### Are there any user-facing changes?

There is a new API method but no changes to any existing APIs
* Closes: apache#34386

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
### Rationale for this change

We have some URI parsing indirectly exposed through FilesystemFromUri.  However, this isn't very useful if the user has multiple URIs or if they already have a filesystem.  This method allows that same URI handling to be used even if the user already has a filesystem.

### What changes are included in this PR?

Adds a new arrow::fs::PathFromUriOrPath method

### Are these changes tested?

Yes, via unit tests

### Are there any user-facing changes?

There is a new API method but no changes to any existing APIs
* Closes: apache#34386

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants