[C++][Dataset] Expose more informational properties #23861

Closed
asfimport opened this issue Jan 17, 2020 · 4 comments

@asfimport

Thinking about what a useful print method for a Dataset in R should include, a few things come to mind that, from skimming dataset.h, don't appear to be available:

  • How many Sources it has
  • For a Source, what kind (local filesystem, other filesystem, etc.), base path (at least where we didn't provide a list of files), how many files, what file format

Reporter: Neal Richardson / @nealrichardson
Assignee: Francois Saint-Jacques / @fsaintjacques

Note: This issue was originally created as ARROW-7608. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
It would indeed be nice to have access to such information.
Additional information and inspection capabilities can be very useful for debugging and for understanding what is going on (e.g. being able to verify which file paths a dataset has discovered).

@asfimport

Francois Saint-Jacques / @fsaintjacques:

  • A Dataset's sources are available via Dataset::sources(), which returns a vector

  • A Source's kind: Source::type_name() returns a string

  • A Format's kind: Format::type_name() returns a string

I'll add a method to FileSystemSource to return a list of file paths.
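
To make the accessors above concrete, here is a rough, self-contained sketch of how a print-style summary could be assembled from them. Everything in it is a hypothetical stand-in: the classes only mirror the names mentioned in this thread (Dataset::sources(), Source::type_name(), Format::type_name()) and are not the real declarations from dataset.h. The `files()` and `format()` accessors on FileSystemSource, the `Describe` helper, and the returned strings are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins that only mirror names from this thread.
// They are NOT the real Arrow classes.
struct Format {
  virtual ~Format() = default;
  virtual std::string type_name() const = 0;
};

struct ParquetFormat : Format {
  std::string type_name() const override { return "parquet"; }
};

struct Source {
  virtual ~Source() = default;
  virtual std::string type_name() const = 0;
};

// Assumed shape of a source that can list its files; the accessor names
// files() and format() are illustrative guesses, not a confirmed API.
class FileSystemSource : public Source {
 public:
  FileSystemSource(std::shared_ptr<Format> format, std::vector<std::string> files)
      : format_(std::move(format)), files_(std::move(files)) {}

  std::string type_name() const override { return "filesystem"; }
  const std::shared_ptr<Format>& format() const { return format_; }
  const std::vector<std::string>& files() const { return files_; }

 private:
  std::shared_ptr<Format> format_;
  std::vector<std::string> files_;
};

class Dataset {
 public:
  explicit Dataset(std::vector<std::shared_ptr<Source>> sources)
      : sources_(std::move(sources)) {}

  const std::vector<std::shared_ptr<Source>>& sources() const { return sources_; }

 private:
  std::vector<std::shared_ptr<Source>> sources_;
};

// Builds a short summary: the number of sources, then one line per source
// with its kind and, for file-based sources, the file count and format.
std::string Describe(const Dataset& dataset) {
  std::ostringstream out;
  out << "Dataset with " << dataset.sources().size() << " source(s)\n";
  for (const auto& source : dataset.sources()) {
    assert(source != nullptr);
    out << "- " << source->type_name();
    if (const auto* fs = dynamic_cast<const FileSystemSource*>(source.get())) {
      assert(fs->format() != nullptr);
      out << ": " << fs->files().size() << " file(s), format "
          << fs->format()->type_name();
    }
    out << "\n";
  }
  return out.str();
}
```

With a single FileSystemSource holding two Parquet files, `Describe` would report one source, its kind, the file count, and the format name, which covers most of the items in the original request.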

@asfimport

Neal Richardson / @nealrichardson:
Source's type_name doesn't appear to be useful: what is "tree"? https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset.h#L150

How do I get Format from Source?

@asfimport

Neal Richardson / @nealrichardson:
Issue resolved by pull request 6374
#6374

@asfimport asfimport added this to the 0.17.0 milestone Jan 11, 2023