ARROW-5397: [FlightRPC] Add TLS certificates for testing Flight #4510
Codecov Report
@@ Coverage Diff @@
## master #4510 +/- ##
===========================================
- Coverage 88.55% 75.34% -13.22%
===========================================
Files 796 56 -740
Lines 103239 3192 -100047
Branches 1253 0 -1253
===========================================
- Hits 91425 2405 -89020
+ Misses 11569 787 -10782
+ Partials 245 0 -245
The submodule is now up-to-date, let's make sure the tests run in CI...
C++ with conda-forge appears to run the tests, but not Python with conda-forge (even though it sources
Thanks! This looks good in principle, just a couple of comments.
python/pyarrow/tests/test_flight.py
Outdated
) as server_location:
    options = flight.FlightCallOptions(timeout=0.5)
    client = flight.FlightClient.connect(server_location)
    with pytest.raises(pa.ArrowIOError, match="Deadline Exceeded"):
Is there a point in testing per-call timeout here?
I added an explanatory comment, but the purpose of the timeout is to make sure gRPC doesn't block there for a long time trying to reconnect over and over. (Not sure why it doesn't fail-fast in these cases.)
Hmm, looks like we set gRPC's wait_for_ready option, which is why it doesn't fail-fast. But unsetting that causes C++ tests to fail due to a race condition between starting the server subprocess and creating the client. I'll convert the tests to not use the subprocess (should make them run faster too), then drop the option and the timeout here.
Not using a subprocess is nice, but we'll still need to allow retries, otherwise the race condition will still be there AFAIK.
I'm going to ping the server (with a short timeout) until a request succeeds, rather than rely on experimental gRPC options that we don't have much control over. (Again, in my opinion, blocking for some unbounded amount of time is a rather pathological way to handle connection failure; it also leads to more cases where Ctrl-C doesn't work in Python.)
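For illustration, the "ping with a short timeout until a request succeeds" idea could look roughly like the sketch below. It uses a plain TCP connect from the standard library rather than a real Flight request, and `wait_until_ready` and its parameters are hypothetical names, not anything in this PR.

```python
import socket
import time


def wait_until_ready(host, port, deadline=5.0, per_try_timeout=0.1):
    """Return True once a TCP connect succeeds, False if the deadline passes.

    Hypothetical helper: a stand-in for "ping the server until it answers".
    """
    stop_at = time.monotonic() + deadline
    while time.monotonic() < stop_at:
        try:
            # Each attempt is bounded, so we never block indefinitely.
            with socket.create_connection((host, port), timeout=per_try_timeout):
                return True
        except OSError:
            # Server not reachable yet (refused or timed out); retry shortly.
            time.sleep(0.05)
    return False
```

Because both the per-attempt timeout and the overall deadline are bounded, the caller gets a definite answer instead of blocking for an unbounded time, which also keeps Ctrl-C responsive.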
Actually as implemented, creating a server immediately starts it, so there's no need for even that...
It starts the thread, but is the listening socket already created before that?
Yes - FlightServerBase::Init calls BuildAndStart, so all Serve actually does is set up signals and block until the server shuts down. (Maybe we shouldn't rely on that..., but as-is, it's a little annoying to tell what the underlying error in a call is except by comparing strings.)
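As a rough analogy for why a client can connect before the blocking serve call runs, here is a plain-socket sketch. It is not Flight or gRPC code; it only shows the underlying TCP behavior that once a socket is listening, the kernel accepts connections into its backlog even if the application has not started its accept loop.

```python
import socket

# Bind and listen, but deliberately never call accept().
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

# The kernel completes the TCP handshake and queues the connection,
# so the client connects even though no accept loop is running yet.
client = socket.create_connection(("127.0.0.1", port), timeout=1.0)
connected = client.getpeername()[1] == port

client.close()
server.close()
```

Whether a test should rely on this ordering is a separate question, as noted above.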
Looks like one of the C++ builds flaked when downloading deps, but otherwise tests run in CI now.
I restarted the failed build.
Thanks, looks like tests pass!
Gonna do a final review.
LGTM, just two nits.
cpp/src/arrow/flight/flight-test.cc
Outdated
@@ -620,5 +659,25 @@ TEST_F(TestAuthHandler, CheckPeerIdentity) {
  ASSERT_EQ(result->body->ToString(), "user");
}

TEST_F(TestTls, DoAction) {
  if (!client_) {
Can this actually occur?
No, this is left over from when I made the TLS tests optional; I've removed it.
Did the Thrift download fail (I just opened https://issues.apache.org/jira/browse/ARROW-5576)?
@wesm Yeah, that was it.
OK, I looked into this and we're using the Apache dist system inappropriately. I commented at https://issues.apache.org/jira/browse/ARROW-5576?focusedCommentId=16863112&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16863112
Thanks @lihalite!
This needs apache/arrow-testing#2.