-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-9749: [C++][GLib][Python][R][Ruby][Dataset] Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions #9686
Conversation
lidavidm
commented
Mar 12, 2021
•
edited
Loading
edited
- ScanContext/ScanOptions have been merged, since they were essentially always passed together.
- For scan options that are specific to a scan (e.g. CSV conversion options), a new FragmentScanOptions has been added to ScanOptions. Currently only CSV has this and it only wraps csv::ConvertOptions; follow up issues can tackle the rest.
- GLib, Python, R, and Ruby bindings have been updated.
As it stands implementing fragment scan options for a union dataset is a bit tricky, because the fragment doesn't know which dataset it originally belonged to (and it would be weird for every scan task implementation to have to know about a hypothetical UnionFragmentScanOptions). We could add a FragmentScanOptions field to Fragment, allowing UnionDataset::GetFragmentsImpl to do the work of matching up options to fragments there. (That'd also let an advanced caller manually specify per-fragment options if they really wanted to.) However given we might just delete UnionDataset I'm going to leave this out of this PR for now. |
@mrkn would you be willing to look at the GLib/Datasets bindings changes here again? This reduces the API surface that has to be bound by a bit, fortunately. |
@lidavidm I will look at it tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this cleanup, just a couple comments:
Could you add a follow up JIRAs for extracting ParquetFragmentScanOptions and IpcFragmentScanOptions? |
Done, ARROW-11972. |
+1 for GLib part. |
I created a new issue to support CsvFragmentScanOption in GLib binding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, merging
…canOptions, consolidate ScanContext/ScanOptions - ScanContext/ScanOptions have been merged, since they were essentially always passed together. - For scan options that are specific to a scan (e.g. CSV conversion options), a new FragmentScanOptions has been added to ScanOptions. Currently only CSV has this and it only wraps csv::ConvertOptions; follow up issues can tackle the rest. - GLib, Python, R, and Ruby bindings have been updated. Closes apache#9686 from lidavidm/arrow-9749 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…canOptions, consolidate ScanContext/ScanOptions - ScanContext/ScanOptions have been merged, since they were essentially always passed together. - For scan options that are specific to a scan (e.g. CSV conversion options), a new FragmentScanOptions has been added to ScanOptions. Currently only CSV has this and it only wraps csv::ConvertOptions; follow up issues can tackle the rest. - GLib, Python, R, and Ruby bindings have been updated. Closes apache#9686 from lidavidm/arrow-9749 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>