Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-6762: [C++] Support reading JSON files with no newline at end #5564

Closed

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Oct 2, 2019

Also fix some lifetime issues in parallel mode, and add tests.

@github-actions
Copy link

github-actions bot commented Oct 2, 2019

@bkietz bkietz self-requested a review October 2, 2019 13:20
@pitrou pitrou force-pushed the ARROW-6762-json-parser-trailing-newline branch from a3498d7 to 71c05d8 Compare October 2, 2019 14:01
@pitrou
Copy link
Member Author

pitrou commented Oct 2, 2019

@ursabot build

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine, just a few nits.

I don't think there is a lifetime issue if ChunkedArrayBuilder::Finish() is called before the builder is destroyed. It calls TaskGroup::Finish(), so all tasks in any other threads with a reference to the builder will terminate. This is the case in TableReader::Read, for example. Is there some use case for running a ChunkedArrayBuilder in a task group but never calling finish on it?

cpp/src/arrow/json/chunker_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/json/chunker.h Show resolved Hide resolved
cpp/src/arrow/json/chunker.cc Show resolved Hide resolved
cpp/src/arrow/json/chunker.cc Show resolved Hide resolved
@pitrou
Copy link
Member Author

pitrou commented Oct 2, 2019

Is there some use case for running a ChunkedArrayBuilder in a task group but never calling finish on it?

I had some actual crashes when adding the tests. The main situation is when the reader exits early - for example because chunking fails, but I think it may also happen if readahead fails (think I/O or decompression error).

Also fix some lifetime issues in parallel mode, and add tests.
@pitrou pitrou force-pushed the ARROW-6762-json-parser-trailing-newline branch from 71c05d8 to 562783d Compare October 3, 2019 10:03
@pitrou pitrou closed this in 31a3259 Oct 3, 2019
@pitrou pitrou deleted the ARROW-6762-json-parser-trailing-newline branch October 3, 2019 14:58
kszucs pushed a commit that referenced this pull request Oct 5, 2019
Also fix some lifetime issues in parallel mode, and add tests.

Closes #5564 from pitrou/ARROW-6762-json-parser-trailing-newline and squashes the following commits:

562783d <Antoine Pitrou> ARROW-6762:  Support reading JSON files with no newline at end

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
kszucs pushed a commit to kszucs/arrow that referenced this pull request Oct 21, 2019
Also fix some lifetime issues in parallel mode, and add tests.

Closes apache#5564 from pitrou/ARROW-6762-json-parser-trailing-newline and squashes the following commits:

562783d <Antoine Pitrou> ARROW-6762:  Support reading JSON files with no newline at end

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
kszucs pushed a commit to kszucs/arrow that referenced this pull request Oct 21, 2019
Also fix some lifetime issues in parallel mode, and add tests.

Closes apache#5564 from pitrou/ARROW-6762-json-parser-trailing-newline and squashes the following commits:

562783d <Antoine Pitrou> ARROW-6762:  Support reading JSON files with no newline at end

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
kszucs pushed a commit to kszucs/arrow that referenced this pull request Oct 21, 2019
Also fix some lifetime issues in parallel mode, and add tests.

Closes apache#5564 from pitrou/ARROW-6762-json-parser-trailing-newline and squashes the following commits:

562783d <Antoine Pitrou> ARROW-6762:  Support reading JSON files with no newline at end

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants