GH-33209: [C++] Support for reading JSON Datasets #33732
I attempted to implement the newer I suppose there's a possibility that the tests in
For reference, here is where that test would fail (in
This is exciting. Sorry you took this on at a time when we have two mirror paths in the file format. Hopefully this will all be getting simplified soon.
A few minor nits from a tidy/iwyu pass but mostly clean, thank you.
TEST_P(TestJsonFormatScanNode, ScanProjectedMissingColumns) {
  TestScanProjectedMissingCols();
}
Just scanning through the CSV tests I wonder if there are a few JSON-specific tests that might be nice to have:
- Confirm that alternate read/parse options (e.g. alternate block size or newlines allowed) get passed in correctly
- Does the JSON reader handle the presence of a Unicode BOM well? Maybe not as big an issue here, whereas we have to explicitly handle it for the CSV reader.
The reader doesn't explicitly handle BOM, but I suspect that rapidjson does, as it's at least referenced in their docs. Either way, I added a test for it (which currently passes).
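For anyone wondering what explicit BOM handling would involve, here is a minimal, library-independent sketch. It assumes the input is UTF-8 and only strips the three-byte mark `EF BB BF` from the front of a buffer. `SkipUtf8Bom` is a hypothetical helper written for illustration; it is not taken from Arrow or RapidJSON and says nothing about how either one actually deals with the BOM.

```cpp
#include <string_view>

// Hypothetical helper (not part of Arrow or RapidJSON): returns a view of
// `data` with a leading UTF-8 byte order mark (EF BB BF) removed, if present.
// The returned view aliases the caller's buffer; nothing is copied.
std::string_view SkipUtf8Bom(std::string_view data) {
  constexpr std::string_view kBom = "\xEF\xBB\xBF";
  // substr clamps to the available length, so inputs shorter than the BOM
  // are compared safely and returned unchanged.
  if (data.substr(0, kBom.size()) == kBom) {
    data.remove_prefix(kBom.size());
  }
  return data;
}
```

A reader could call something like this once on the first block of a file before handing the bytes to the parser. Inputs without a BOM, or with only a partial one, pass through untouched.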
Co-authored-by: Weston Pace <weston.pace@gmail.com>
@felipecrv Feel free to take a look if you find the time.
Thanks for the addition. A very cool feature.
@benibus sorry, I only noticed this today. Feel free to mention my handle as soon as the PR is out of draft status.
Benchmark runs are scheduled for baseline = 9033573 and contender = 45918a9. 45918a9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
@felipecrv No worries! I'll let you in earlier on the next one.
This adds initial support for the JSON file format to the Dataset library. Since there's currently no public API for writing JSON files, this only deals with the reader-side facilities.
* Closes: apache#33209
Lead-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
### Rationale for this change
Enable Support for reading JSON Datasets #33732 on Java side
### What changes are included in this PR?
Support for reading JSON Datasets
### Are these changes tested?
Unit test added
### Are there any user-facing changes?
No
* Closes: #36421
Lead-authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>