socketpair: fix potential hangs on Windows #7144
Fixes potential hang in accept by using select + non-blocking accept. Fixes potential hang in peer check by replacing the send/recv check with a getsockname/getpeername check. Adds length check for returned sockaddr data.
/* use non-blocking accept to make sure we don't block forever */
if(ioctlsocket(listener, FIONBIO, &nonblock) == -1)
  goto error;
You should probably use curlx_nonblock() instead for greater portability.
I didn't see a point in using curlx_nonblock in Windows-only code.
Oops, sorry, just realized my mistake, see #7144 (comment)
lib/socketpair.c
Outdated
FD_SET(listener, &fds);
/* this is Windows-only code -> using select is fine regardless
   of the _value_ of the socket descriptor, also nfds is ignored */
select(0, &fds, NULL, NULL, &timeout);
Then shouldn't we #ifdef WIN32 this section?
Oops. I just realized my mistake: I assumed the #ifdef WIN32 in line 27 would span the entire file. But it doesn't.
The only part that keeps the function from being compiled on (our) other platforms is the #if !defined(HAVE_SOCKETPAIR) && !defined(CURL_DISABLE_SOCKETPAIR). But there might be other platforms - that we don't use but are supported by libcurl - which don't HAVE_SOCKETPAIR.
Hm. That makes things more difficult. Is there an easy, portable way to wait for "ready to accept" without using select? Do you know if poll is available on all supported non-Windows systems?
ps: I tried skipping the select call entirely on Windows, and this causes sporadic failures. Sub 1% IIRC but still easily reproducible. So just not calling select is probably a bad idea. What I could do though is keep using blocking accept on other systems. It would be good enough for us. If you think that would be acceptable, that would of course be the easiest solution.
Sorry for asking stupid questions without looking into the libcurl code myself. I fell into panic mode after I realized that I made that stupid mistake with the
I am wondering if this PR is still necessary with the latest curl release which avoids using socketpairs in the multi interface (on Win32).
@mback2k I don't know :)
ps: What about the threaded DNS resolver, does that still use socketpairs?
Yes it does, also in the Windows build, right @mback2k ?
@bagder I have updated the PR to use
Yes, that seems to be the case. Also NTLM makes use of
Thanks!
I decided to not use non-blocking IO here but replace the check with a getsockname + getpeername check instead -- because it's both easier and more secure.
Excuse me but how does the new code verify the identity of the connecting party?
getpeername of an accepted socket is always the getsockname of the listen socket, so now there is no verification of any kind.
ping @pgroke-dt
Reopened for now so that we don't lose track of this.
To simplify, I'm reverting this PR
The old code is insecure indeed in a way that address of the
The anti-hang changes are good, but the way the patch verifies the other end needs adjustments.
Actually, no. In the code, Line 99 in 38d2708
socks[1] is the accepted socket, see Line 109 in 38d2708
Then the code compares
So, as I cannot see anything wrong with the way the connection is verified, and it's a lot simpler and also more secure than sending random numbers, I'd suggest merging the PR as is, unless there are other issues of course.
You're absolutely correct! The verification code is working correctly, as you say. Sorry for the trouble. I agree that it can be merged.
ok, thanks a lot for the extra round of verification and sorry for the trouble!
We have, very rarely, seen Curl_socketpair hang in accept for more than 20 minutes (at which time our watchdog would write a minidump and restart the application). The callstacks were not always 100% identical but they all had in common that Curl_socketpair hangs in accept. Unfortunately we were not able to reproduce those hangs locally. After inspection of the code I found two weaknesses:

AFAIU accept could block forever, for example if the OS drops the connection from the backlog for whatever reason. Also from my tests it looks like loopback connections really go through at least large parts of the TCP stack on Windows, so I think that other scenarios could also be possible. Maybe Windows drops loopback packets in certain situations? Or some kind of DoS guard in the OS kicks in even for loopback connections? IMO the easiest (and only) fix for this is to use a timeout with accept, so that's what I decided to do (select followed by non-blocking accept). Re. the use of select: Using select for a single socket in Windows-only code should not be an issue. The dreaded select issue with high FDs on POSIX systems doesn't affect Windows: the only limitation there is the number of sockets in an fd_set, the value of the socket handles doesn't matter.

The send + recv based peer validation could also lead to hangs. If a "foreign" socket connects first, trying to recv from the accepted connection could block forever. I decided to not use non-blocking IO here but replace the check with a getsockname + getpeername check instead -- because it's both easier and more secure. We have never seen hangs in recv but I thought if I'm already working on a patch for Curl_socketpair I should probably address that too.

I also thought it could make sense to add a check for the returned addrlen so I did that as well. (Might be a little paranoid, but I think it can't hurt.)
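The getsockname + getpeername idea described here can be sketched roughly as follows. This is an illustrative sketch only, assuming IPv4 loopback sockets and POSIX headers; the helper name is_our_peer is hypothetical and the code is not copied from the patch. It compares the local address of the socket that called connect with the remote address of the socket returned by accept, and rejects address structures with an unexpected length.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of libcurl): returns 1 if `accepted`
   (returned by accept()) is the other end of `connecting` (the socket we
   connect()ed ourselves), 0 otherwise. IPv4 only. */
static int is_our_peer(int connecting, int accepted)
{
  struct sockaddr_in local;
  struct sockaddr_in peer;
  socklen_t local_len = sizeof(local);
  socklen_t peer_len = sizeof(peer);

  memset(&local, 0, sizeof(local));
  memset(&peer, 0, sizeof(peer));

  /* Local address and port of our own connecting socket. */
  if(getsockname(connecting, (struct sockaddr *)&local, &local_len) == -1)
    return 0;

  /* Remote address and port as seen from the accepted socket. */
  if(getpeername(accepted, (struct sockaddr *)&peer, &peer_len) == -1)
    return 0;

  /* Length check: anything other than a full sockaddr_in is rejected. */
  if(local_len != sizeof(local) || peer_len != sizeof(peer))
    return 0;

  if(local.sin_family != AF_INET || peer.sin_family != AF_INET)
    return 0;

  /* A foreign client would connect from a different port (or address). */
  return local.sin_port == peer.sin_port &&
         local.sin_addr.s_addr == peer.sin_addr.s_addr;
}
```

Because neither call waits for data from the other side, this kind of check cannot block the way a recv on a foreign connection could.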