THRIFT-5186: Rewrite address resolution in T{Nonblocking,}ServerSocket [REVIEW WANTED] #2151
Nice work! I've tried to test it right away but it conflicts with another PR about Unix domain sockets that I submitted a few weeks back. I'll try to give it a shot sometime soon, but it's not as straightforward as I'd hoped. |
I've made sure this applies to the latest. How Unix domain sockets are done here baffled me as well. Glad that someone tackles this. If you, Mario, merge that before me, I will rebase, no problem. |
Linux CI got green. Appveyor though...
That's what I got here, with the relevant code:
virtual std::string message(int code) const override {
  return THRIFT_GAI_STRERROR(code);
}
# ifdef _WIN32_WCE
# define THRIFT_GAI_STRERROR(...) thrift_wstr2str(gai_strerrorW(__VA_ARGS__))
# else
# define THRIFT_GAI_STRERROR gai_strerrorA
# endif
Jeez, what a mess. |
What... This Travis failure looks like a fluke. I'll just retry. Weird that my |
The whole build system has way more false positives than it should. If you can spot a problem and fix it along the way, more than welcome. In any case, re-try is often a good idea :-) |
Force-pushed from 6925094 to 3d4ae28.
Agreed, but well, racing in build systems and test runners is very very common, in my experience. And quite often, catching those races is orders of magnitude harder than just retrying :) |
Yay! All CI's green ✔️ Please review & merge. |
Awesome! |
Now we have to find someone that can review this monster :-) :-) :-) |
Recommend to ask on the mailing list. That's really a big one. |
Force-pushed from 3d4ae28 to 6b8e935.
And most of this diff wouldn't exist, had the classes not been chock-full of copypaste. The main logical change is pretty simple, and I describe it thoroughly in the commit message: the retry loop of failed |
@Jens-G you mean the Look at the last month: the archive is just a dump of robotic notifications from Jira and GitHub. Among 3 pages of mails, just 2 threads are humans talking. That's way too low SNR, certainly lower than on GitHub. So what's the point? We're already in a pretty good place to discuss the patch: here, in this PR on GitHub. |
I only made a proposal. You can do whatever you find helpful. |
Force-pushed from 6b8e935 to 6572215.
Ok, @emmenlau has a point. It's weird, but I've removed the copyright line. Also in the last rebase:
|
Force-pushed from 6572215 to f97e8f9.
Rebased once again. The unrelated commit dropped (Boost unit-test warnings, see #2164). |
Travis error was transient, during docker build in
This caused two other jobs to cancel; I'm just going to kick it again with no-change restart. |
Force-pushed from f97e8f9 to eb14595.
Same thing again.
Travis is doing way more uncached |
Client: cpp
Patch: Max Ulidtko

My previous patch (9b9567b) has exposed an issue shared by TServerSocket and TNonblockingServerSocket: the results of getaddrinfo() call aren't used "in full" -- just a single address is picked, and then bind() is retried in a loop on that single address.

This leads to poor results if we start varying the network conditions. Like I show in the Jira issue, this is normal:

    [root@04dd07b70038 /]# ping -6 localhost
    ping: connect: Cannot assign requested address

-- same with `ping -6 $(hostname)` -- a hostname may resolve to IPv6 address which we might not be able to connect to; and vice-versa, connecting to IPv4 addresses may fail on IPv6-only systems.

The solution is to iterate over what getaddrinfo() returns while retrying failed bind(). This is what this patch implements. Server-side analogue of this behavior of curl:

    > curl -v localhost:8001/index.do
    * Trying ::1:8001...
    * connect to ::1 port 8001 failed: Connection refused
    * Trying 127.0.0.1:8001...
    * Connected to localhost (127.0.0.1) port 8001 (#0)
    > GET /index.do HTTP/1.1
    [...]

To achieve that, I throw away the TGetAddrInfoWrapper, and roll a proper one.

The bind-retrying loop had to be restructured: since sa_family will vary from retry to retry, the socket() call has to be within the loop.

All the setsockopt() business factored away into side-methods: ~hundred repetitive #ifdef-rich lines in the middle of important loop do get in the way.

This patch has quite a lot of code movement; git config diff.colormoved zebra is strongly advised to the reader.

Tested manually on Linux, by running CMake's `make test` in 3 Docker environments:

 1) loopback-only (--net=none)
 2) classic IPv4-only bridge
 3) dual IPv6-IPv4 bridge

The only failing test was 88 - PythonTestSSLSocket, which currently also fails on master for me (due to NULL cipher being unexpectedly accepted).
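For readers who want the shape of the restructured loop without opening the diff, here is a minimal POSIX-only sketch of the idea described in the commit message: walk every address returned by getaddrinfo(), create the socket inside the loop (because the address family can differ per entry), and retry the whole list a few times. The function name `bind_first_usable`, the `max_rounds` parameter, and the one-second pause are assumptions made for illustration; this is not the code from the patch and not Thrift's API.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdio>
#include <cstring>
#include <memory>

// Hypothetical helper, for illustration only (not part of Thrift).
// Resolves host:port, then tries socket()+bind() on each resolved address,
// repeating the whole list up to max_rounds times.
// Returns a bound socket descriptor, or -1 if nothing could be bound.
int bind_first_usable(const char* host, const char* port, int max_rounds = 3) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      // accept both IPv4 and IPv6 results
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE;      // only matters when host is nullptr

  addrinfo* raw = nullptr;
  int rc = getaddrinfo(host, port, &hints, &raw);
  if (rc != 0) {
    std::fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
    return -1;
  }
  // RAII guard so the result list is freed on every return path.
  std::unique_ptr<addrinfo, decltype(&freeaddrinfo)> results(raw, &freeaddrinfo);

  for (int round = 0; round < max_rounds; ++round) {
    for (addrinfo* ai = results.get(); ai != nullptr; ai = ai->ai_next) {
      // The socket is created per candidate: ai_family varies between entries.
      int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
      if (fd < 0) continue;  // e.g. address family not supported on this host

      int one = 1;
      setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

      if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
        return fd;  // first address that binds wins
      }
      close(fd);    // this address failed; move on to the next one
    }
    if (round + 1 < max_rounds) sleep(1);  // pause before retrying the list
  }
  return -1;
}
```

A caller would follow a successful return with listen() and accept() on the descriptor and close() it when done. The point of the sketch is only the nesting order: the retry loop wraps the address loop, and socket() sits inside both.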
Force-pushed from eb14595 to edc8533.
Sorry that you are bit by this. It's a known problem, I've brought it up on the mailing list recently. Sadly, there is no solution yet. For me, sometimes 100% of builds failed, so please don't get too hung up on this. And sorry again! |
With a diff that big I could have easily missed something but sometimes code needs an overhaul. Code changes looked clean to me. If CI is passing: +1 |
Huge thanks everyone! 🎆 |
Awesome! |
This is a follow-up to #2124. Passing the AI_ADDRCONFIG hint to getaddrinfo() breaks Thrift servers when there's no network (127.0.0.1 loopback only). Not passing AI_ADDRCONFIG exposes deficient logic in processing getaddrinfo() results. This PR fixes the latter.
Please see the commit messages for the details of why and how.
The quick drive-by fix of a compile warning related to Boost.Test deprecations, originally included as a separate commit, has been split off to a separate PR.
Since I only tested on Linux, I am very keen to see what the AppVeyor and Travis CIs will have to say about this change (hopefully, some errors which point out my blunders).
Meanwhile, anyone interested is welcome to review. @emmenlau I think you'd want to take a look; this patch certainly needs another pair of eyes.
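To see the effect of the AI_ADDRCONFIG hint discussed above on a given machine, a small probe along these lines may help. It is a sketch under assumptions: `count_passive_addresses` is a hypothetical helper written for this illustration, not something in Thrift, and the port string passed to it is arbitrary.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cstring>

// Hypothetical helper, for illustration only (not part of Thrift).
// Performs a passive (server-side) lookup for `port` with the given extra
// ai_flags and returns how many addresses getaddrinfo() produced,
// or -1 if the lookup itself failed.
int count_passive_addresses(const char* port, int extra_flags) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE | extra_flags;

  addrinfo* results = nullptr;
  if (getaddrinfo(nullptr, port, &hints, &results) != 0) {
    return -1;
  }
  int count = 0;
  for (addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
    ++count;
  }
  freeaddrinfo(results);
  return count;
}
```

Calling it once with `0` and once with `AI_ADDRCONFIG` as `extra_flags`, for example inside a `--net=none` container, shows how much the hint filters out on that host; the numbers depend on the platform's resolver and on which interfaces are configured.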