[C++] Support for filename-based partitioning #30158

Closed
asfimport opened this issue Nov 5, 2021 · 4 comments

asfimport commented Nov 5, 2021

Directory-based partitioning is a feature of Arrow, but could we support filename-based partitioning?

For example, I have a series of CSV files named along the lines of foo_month_year.csv. It would be nice to read them in and have the month and year parts of each filename appear as fields that I can filter on.
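To make the request concrete, here is a minimal sketch in plain Python (standard library only, not Arrow's API) of what "filename parts as fields" could mean. The regular expression, the `fields_from_filename` helper, and the sample file names are all hypothetical; the exact layout of `foo_month_year.csv` is assumed to be a numeric month followed by a four-digit year.

```python
import re
from typing import Optional

# Assumed layout: <prefix>_<month>_<year>.csv, e.g. "foo_01_2021.csv".
# This pattern is illustrative only; the issue does not fix the exact format.
FILENAME_RE = re.compile(r"^(?P<prefix>[^_]+)_(?P<month>\d{1,2})_(?P<year>\d{4})\.csv$")


def fields_from_filename(name: str) -> Optional[dict]:
    """Return the month/year fields encoded in a file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return {"month": int(match.group("month")), "year": int(match.group("year"))}


# Hypothetical file listing.
files = ["foo_01_2021.csv", "foo_02_2021.csv", "foo_12_2020.csv", "notes.txt"]
parsed = {name: fields_from_filename(name) for name in files}

# Filter on a field derived from the name, without opening any file.
files_2021 = [name for name, fields in parsed.items()
              if fields is not None and fields["year"] == 2021]
```

The point of the sketch is that the filter can be decided from the name alone, which is what directory-based partitioning already allows for directory names.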

 

Reporter: Nicola Crane / @thisisnic
Assignee: Sanjiban Sengupta / @sanjibansg

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-14612. Please see the migration documentation for further details.

David Li / @lidavidm:
The current APIs trim the filenames before they're handed to partitioning. Assuming we can change that, I think we could add or update the partitioning schemes to allow for this without too much trouble. (If the filenames weren't trimmed, it could already be done, at least in C++, via a FunctionPartitioning.)
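The effect of trimming can be illustrated with a small standard-library sketch. It is not Arrow code: `function_partitioning` is a hypothetical callback standing in for a function-based partitioning, and the use of `posixpath.dirname` to model the trimming is an assumption about what "trim the filenames" amounts to.

```python
import posixpath


def function_partitioning(path: str) -> dict:
    """Hypothetical callback: derive month/year from the last path component."""
    stem, _ext = posixpath.splitext(posixpath.basename(path))
    parts = stem.split("_")
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        return {}  # nothing recognisable in this path
    return {"month": int(parts[1]), "year": int(parts[2])}


full_path = "data/foo_03_2021.csv"

# Assumed model of the trimming: only the directory part reaches the callback.
trimmed_path = posixpath.dirname(full_path)

from_trimmed = function_partitioning(trimmed_path)  # filename information is gone
from_full = function_partitioning(full_path)        # filename information is available
```

With the trimmed path the callback has nothing to parse, while the untrimmed path yields both fields, which is why the trimming has to change before any filename-based scheme can work.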

Will Jones / @wjones127:
Support for adding the filename as a column may be a decent substitute for this.
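As a rough illustration of that alternative, the following standard-library sketch attaches the source file name to every row as an extra column and then filters on it. The in-memory `sources` mapping is a hypothetical stand-in for CSV files on disk, and nothing here reflects how Arrow itself would expose such a column.

```python
import csv
import io

# Hypothetical in-memory stand-ins for CSV files on disk.
sources = {
    "foo_01_2021.csv": "id,value\n1,10\n2,20\n",
    "foo_02_2021.csv": "id,value\n3,30\n",
}

rows = []
for filename, text in sources.items():
    for row in csv.DictReader(io.StringIO(text)):
        row["filename"] = filename  # carry the source name along as a column
        rows.append(row)

# The month is recovered from the filename column at query time
# (assumes the <prefix>_<month>_<year>.csv layout).
february = [r for r in rows if r["filename"].split("_")[1] == "02"]
```

Compared with parsing the name into typed fields up front, this defers the parsing to the user and requires the data to be read before it can be filtered.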

Antoine Pitrou / @pitrou:
Issue resolved by pull request 12530
#12530
