ARROW-6063: [FlightRPC] implement half-closed semantics for DoPut#5196
Conversation
pitrou
left a comment
There was a problem hiding this comment.
Sounds good to me on the principle.
python/pyarrow/tests/test_flight.py
Outdated
There was a problem hiding this comment.
Hmm... why segfault? Isn't this something that we can detect and prevent programmatically? (raise an error if reading from a stream whose writer side was closed)
There was a problem hiding this comment.
Segfault is the wrong word, I'll clarify this - gRPC aborts if it sees an internal state it didn't expect. (Personally I think this is rather dumb, since it's fairly easy to cause, even just by accidentally calling certain methods twice or concurrently.)
|
Rebased/addressed comments. Personal Travis: https://travis-ci.com/lihalite/arrow/builds/124741965 |
python/pyarrow/tests/test_flight.py
Outdated
There was a problem hiding this comment.
Oh, and should this count of number of metadata items received? I assume it should be equal to the number of batches written, so we could check that after joining the reader.
There was a problem hiding this comment.
Good point - I'll improve the test.
Codecov Report
@@ Coverage Diff @@
## master #5196 +/- ##
==========================================
+ Coverage 87.64% 89.21% +1.56%
==========================================
Files 1033 750 -283
Lines 148463 107623 -40840
Branches 1437 0 -1437
==========================================
- Hits 130118 96014 -34104
+ Misses 17983 11609 -6374
+ Partials 362 0 -362
Continue to review full report at Codecov.
|
pitrou
left a comment
There was a problem hiding this comment.
LGTM on the principle. A couple nits.
cpp/src/arrow/flight/client.cc
Outdated
There was a problem hiding this comment.
The std::move here is pointless. C++ will move temporaries automatically.
There was a problem hiding this comment.
I think you can write auto values = batch.data->column_data(0)->GetValues<int64_t>(1).
|
Thanks for the review! Fixed those concerns & rebased. |
Implements ARROW-6063 for C++/Python. In C++, gRPC reads are not thread-safe with respect to each other, but in DoPut, closing the writer (which was the only way for the client to indicate that it was done writing) would also execute reads (to prevent gRPC from hanging due to unread messages). This meant there was no way to asynchronously read application metadata during a DoPut.
By explicitly introducing a "done writing" operation, the server can be notified that the client is done writing and shut down its side of the call, which then unblocks the read side of the client. This lets us run a background thread on the client to read from the channel, and block closing the reader/writer until the background thread is unblocked (by the server closing its side of the call).
Longer term, the "right" solution is to introduce nonblocking/asynchronous operations in Flight.
Personal Travis: https://travis-ci.com/lihalite/arrow/builds/124584278