[C++] JSON table writer #21529

Open
asfimport opened this issue Mar 27, 2019 · 6 comments

Comments

asfimport commented Mar 27, 2019

Users who need to emit JSON in line-delimited format currently cannot do so using Arrow. It should be straightforward to implement this efficiently, and it will be very helpful for testing and benchmarking.
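
To make the requested output format concrete, here is a minimal sketch of line-delimited JSON emission from column-oriented data. It uses only the Python standard library and a plain dict of equal-length lists as a stand-in for an Arrow table; `write_ndjson` is a hypothetical helper for illustration, not an Arrow API.

```python
import io
import json


def write_ndjson(columns, sink):
    """Write column-oriented data to `sink` as one JSON object per line.

    `columns` is a dict mapping column name -> list of values. It is only a
    stand-in for an Arrow table; this is not an Arrow API.
    """
    names = list(columns)
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # Walk the columns in lockstep so each iteration yields one row.
    for row in zip(*(columns[name] for name in names)):
        # Compact separators keep each record on a single short line.
        sink.write(json.dumps(dict(zip(names, row)), separators=(",", ":")))
        sink.write("\n")


table = {"id": [1, 2, 3], "name": ["a", "b", None]}
buffer = io.StringIO()
write_ndjson(table, buffer)
ndjson_text = buffer.getvalue()
```

Each row becomes one self-contained JSON object terminated by a newline, and missing values are written as `null`. A real writer would work on record batches and avoid building a Python dict per row.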

Reporter: Ben Kietzman / @bkietz

Related issues:

Note: This issue was originally created as ARROW-5033. Please see the migration documentation for further details.

Wes McKinney / @wesm:
As something to keep in mind, we will need to implement a "Sink" node type to be the flip side of "Scan" in a query engine context. The user may wish to output the results of a query directly to CSV, JSON, Parquet, or some other dataset format, so we need to develop a common API that this can hook into.

Nicola Crane / @thisisnic:
User request on StackOverflow for this feature to be implemented: https://stackoverflow.com/questions/71047976/fast-ldjson-writing-with-arrow

Weston Pace / @westonpace:
I came across a helpful GitHub issue today that explains that there are actually several standards for line-delimited JSON and goes over the differences between them. This might be a helpful reference when this gets implemented: ndjson/ndjson.github.io#1

Todd Farmer / @toddfarmer:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

Steve M. Kim:
As part of this feature request, do we contemplate generating a JSON Schema from an Arrow table schema? Given an Arrow schema and record batches, it would be useful to get a JSON Schema and a sequence of JSON objects that conform to that schema. This would also facilitate testing the correctness of the Arrow JSON writer.
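
To illustrate the idea, here is a rough sketch of deriving a JSON Schema from a list of field descriptions. The `(name, type, nullable)` triples, the `ARROW_TO_JSON_TYPE` table, and `json_schema_for` are all hypothetical stand-ins rather than Arrow APIs, and only a handful of primitive types are covered. It also assumes the writer emits every field on every row, using `null` for missing values, which is why all fields are listed as `required`.

```python
import json

# Hypothetical mapping from a few Arrow primitive type names to JSON Schema
# types; nested, temporal, and decimal types are out of scope for this sketch.
ARROW_TO_JSON_TYPE = {
    "bool": "boolean",
    "int64": "integer",
    "float64": "number",
    "string": "string",
}


def json_schema_for(fields):
    """Build a JSON Schema object from (name, arrow_type, nullable) triples."""
    properties = {}
    for name, arrow_type, nullable in fields:
        json_type = ARROW_TO_JSON_TYPE[arrow_type]
        # Nullable fields accept either the mapped type or JSON null.
        properties[name] = {"type": [json_type, "null"] if nullable else json_type}
    return {
        "type": "object",
        "properties": properties,
        # Assumption: the writer emits every key, with null for missing values.
        "required": [name for name, _, _ in fields],
    }


schema = json_schema_for([("id", "int64", False), ("name", "string", True)])
schema_text = json.dumps(schema, indent=2)
```

Under these assumptions, each line produced by a line-delimited writer could be checked against `schema` with any JSON Schema validator, which is the testing use case described above.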

David Li / @lidavidm:
That's a new can of worms :) There's been some discussion about a way to represent Arrow schemas in JSON. See #13803 and #7110 and ARROW-8952.
