-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
socket: segfault when requesting google.com twice #3986
Comments
|
This doesn't reproduce for me after having run it dozens of times. Also tried without proxy. I suppose this is a new crash? Can you git bisect which commit that causes it? Or perhaps first: build with debug symbols, run with gdb and do 'bt full' on the crash and show us that dump? |
Did some dancing with curl configure flags and it seems this is only reproducible when I add |
|
bisect has my name on it...
|
It is certainly related to that fix. The assert you see happens when it traverses the list of users of a socket and a pointer is NULL where it isn't supposed to be. The commit above (8581e19) was meant to make sure there couldn't be any such NULL entries in that list at that point in time. The problem most likely happens when a connection is closed with more than one stream using it. I could easily repro such a problem up until I landed 8581e19 but now I cannot and I've tried tuning your recipe in several ways with no luck... |
Well, it's indeed a bit hard to fix a bug that can't be reproduced... Maybe this log with debug enabled makes it easier to figure out how to reproduce it (idk...)
It does only seem to happen with c-ares enabled, and only when the proxy server needs to be resolved via DNS. Any chance you're attempting it via a 'localhost' proxy? That may not hit dns. It's hitting the assertion on every run, it does not seem to be a race condition style bug. |
I take it back, it does seem to be a race condition with DNS resolution. Changing the hostname from 'my-proxy-server' to '1.2.3.4.xip.io' (resolves to same IP address, but slower because external DNS) makes the behavior slightly less predictable (but still crashes on every run). If you were using a localhost proxy, try |
Thanks for that extra detail, I just made the assert trigger here... |
I believe this is the situation (with some certainty). The problemWe have a hash table where we add all sockets in use. The "sockhash". In each "sockhash" entry there's linked list. That list contains all "transfers" that are using the same socket, so that when there's activity on the socket we can iterate over all the relevant transfers since one or more of them might be affected. Turns out that I was a bit silly previously since each transfer ( In the current problematic case in here, we can see that due to the race the transfer is sometimes added to a new socket and after that it gets removed from the previous. But since there's just a single storage area, the "removal from the previous" clears the current single association and badness ensues. The fix(Current thinking) We need to make sure that each sockhash can have a linked list with transfers in so that each transfer can exist in multiple lists at the same time. We need to give up using a single storage area for the |
Since more than one socket can be used by each handle at a given time. TODO: change to use a separate hash table in each Curl_sh_entry object instead so that removing a handle from it doesn't have to iterate. A single socket can be used by hundreds or even thousands of different transfers at the same time. Reported-by: Tom van der Woerdt Fixes #3986
Since more than one socket can be used by each transfer at a given time, each sockhash entry how has its own hash table with transfers using that socket. In addition, the sockhash entry can now be marked 'blocked = TRUE'" which then makes the delete function just set 'removed = TRUE' instead of removing it "for real", as a way to not rip out the carpet under the feet of a parent function that iterates over the transfers of that same sockhash entry. Reported-by: Tom van der Woerdt Fixes #3961 Fixes #3986 Fixes #3995 Closes #3997
Thought I'd give curl master a try before 7.65.1 gets released to see if it passes my own unit tests... it did not 😢
I did this
https://gist.github.com/TvdW/d310abd835550dbc3c45649b04373186
Basically just two requests to google.com via the socket api
I expected the following
No segfault 😄
curl/libcurl version
(reports 7.65.0 because I packaged it via
maketgz
first)The text was updated successfully, but these errors were encountered: