[C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles #31051

Closed
asfimport opened this issue Feb 5, 2022 · 2 comments

@asfimport

The Substrait read operator defines files with LocalFiles::FileOrFiles. These elements can take one of several forms:

uri_path (can be a file or a folder)
uri_path_glob (a glob expression)
uri_file (file only)
uri_folder (folder only)

The C++ Substrait consumer currently supports only uri_file. We should add support for the other three options.
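
To make the four forms concrete, here is a minimal, self-contained sketch of how a consumer might check a resolved entry against the declared path type. The names `PathKind`, `EntryKind`, and `Accepts` are hypothetical and are not part of Arrow or Substrait. The rule for `uri_path_glob` (accepting both files and folders) is an assumption, since the description above only says it is a glob expression.

```cpp
#include <cassert>

// Hypothetical mirror of the four FileOrFiles path types listed above.
enum class PathKind { kUriPath, kUriPathGlob, kUriFile, kUriFolder };

// What a path turned out to be once it was looked up on the filesystem.
enum class EntryKind { kFile, kFolder };

// Returns true if an entry of kind `entry` is acceptable for `kind`.
inline bool Accepts(PathKind kind, EntryKind entry) {
  switch (kind) {
    case PathKind::kUriPath:
      return true;  // can be a file or a folder
    case PathKind::kUriFile:
      return entry == EntryKind::kFile;  // file only
    case PathKind::kUriFolder:
      return entry == EntryKind::kFolder;  // folder only
    case PathKind::kUriPathGlob:
      return true;  // assumption: a glob may match files or folders
  }
  return false;  // unreachable for valid enum values
}
```

With this sketch, a consumer that only supports uri_file corresponds to handling only the `PathKind::kUriFile` case and rejecting the rest.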

Reporter: Weston Pace / @westonpace
Assignee: Ariana Villegas / @ArianaVillegas

Note: This issue was originally created as ARROW-15587. Please see the migration documentation for further details.

@asfimport

Weston Pace / @westonpace:
I think one approach could be to change the FromProto method so that it builds the dataset through a dataset factory instead of constructing a dataset directly.

  • The first thing we will need to do is scan the URIs and determine the filesystem. If there are multiple filesystems we could just return an error for now (in the future we could maybe create multiple datasets and union them together).
  • Once we have a filesystem we can extract the path part from the URIs. I'm pretty sure paths can be folders or files.
  • If we want to add glob support we should add that in the FileSystemDatasetFactory. That can be done in a follow-up PR if we want to keep things simpler.
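
The first two steps above can be sketched in plain C++ without any Arrow dependency. This is only an illustration under stated assumptions: `ResolvedFiles`, `SplitUri`, and `ResolveFiles` are hypothetical names, the filesystem is identified simply by the URI's scheme and authority, and the URI parsing is deliberately simplistic (no percent-decoding, query strings, or Windows paths). A real implementation would resolve the filesystem through Arrow's filesystem layer and hand the resulting paths to a dataset factory.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result: one filesystem identifier plus the paths to read from it.
struct ResolvedFiles {
  std::string filesystem;          // scheme + authority, e.g. "file://"
  std::vector<std::string> paths;  // path part of each URI (file or folder)
};

// Split "scheme://authority/path" into {"scheme://authority", "/path"}.
inline std::pair<std::string, std::string> SplitUri(const std::string& uri) {
  const std::string sep = "://";
  const std::string::size_type scheme_end = uri.find(sep);
  if (scheme_end == std::string::npos || scheme_end == 0) {
    throw std::invalid_argument("not a URI with a scheme: " + uri);
  }
  // The path starts at the first '/' after the authority.
  const std::string::size_type path_start =
      uri.find('/', scheme_end + sep.size());
  if (path_start == std::string::npos) {
    throw std::invalid_argument("URI has no path: " + uri);
  }
  return {uri.substr(0, path_start), uri.substr(path_start)};
}

// Step 1: scan the URIs and determine the (single) filesystem.
// Step 2: extract the path part of every URI.
// Throws if the URIs refer to more than one filesystem.
inline ResolvedFiles ResolveFiles(const std::vector<std::string>& uris) {
  if (uris.empty()) {
    throw std::invalid_argument("no URIs given");
  }
  ResolvedFiles out;
  for (const std::string& uri : uris) {
    std::pair<std::string, std::string> parts = SplitUri(uri);
    if (out.paths.empty()) {
      out.filesystem = parts.first;  // first URI decides the filesystem
    } else if (parts.first != out.filesystem) {
      throw std::invalid_argument("multiple filesystems are not supported: " +
                                  out.filesystem + " vs " + parts.first);
    }
    out.paths.push_back(std::move(parts.second));
  }
  assert(out.paths.size() == uris.size());
  return out;
}
```

For example, `ResolveFiles({"file:///tmp/a.parquet", "file:///tmp/dir"})` yields the filesystem `file://` with the paths `/tmp/a.parquet` and `/tmp/dir`, while mixing a `file://` URI with an `s3://` URI throws. Returning an error in the mixed case matches the "return an error for now" suggestion; unioning multiple datasets would replace that branch later.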

@asfimport

Antoine Pitrou / @pitrou:
Issue resolved by pull request #12625
