-
-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tsan: data race in multi.c when shared connection cache #4915
Comments
FYI, I found multiple (valid, IMO) data races this way. Should I report a separate issue for every one? |
If they're related to the shared connection cache I think you can keep them here in the same issue. If they're related to other shared things or similar, then maybe separate? |
Thanks, then I'll update this issue with the other ones I found. I'll also create new issues per other component, if any. Edit: Done. These are all the data races I could find. They are all about the shared connection cache (unsurprisingly). |
Hi @bagder, I've tried to fix these issues myself, but several of the data races are about interactions with specific modules in particular ways. I do not know enough about the Curl internals and relations between (optional) components to know how to fix it. Would there be any chance you, or anyone else could help out? |
Thanks. Yes, this is really low-level libcurl stuff that isn't that too easy to get into. I plan to look deeper at this asap. |
I've taken a first deep look on this issue and I learned plenty. The issues are realtsan is awesome and this recipe is a very simple and effective way to prove some of the problems. WIP patchI have a work-in-progress patch that starts to clean up things and improve the situation somewhat. I might submit it as a WIP PR just to get it tested and commented, but it probably isn't too interesting to land as long as it only partially addresses the issues. FixTo fix this proper, I need to change how we handle connections and the connection cache. Right now, the connection cache still holds the in-use connections (to enable multiplexing etc) but doing that if one of the connections is currently in use in another thread is a recipe for disaster. This is going to be come a somewhat larger internal change and it will take me a little time to figure out a new model and then to implement that. |
Thanks for verifying it! Yes, all the sanitizers are awesome (TSAN, ASAN, etc) :) Maybe it could be part of curl's CI? Looking forward to the fix; I'm willing to help where/if I can. |
We already run several sanitizers in the CI. I think the tsan one is left behind because we don't do threading natively, but once we fix this PR we should create a test case out of the repro and to a tsan build. I haven't had much time to work/think on this issue so it certainly will not be fixed before the pending next release. |
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915
Now #5009 contains the first steps towards fixing this. It's not done yet. |
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915 Closes #5009
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915 Closes #5009
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915 Closes #5009
It seems safer to stop using the CURL_SHARE features until curl/curl#4915 is fixed. We have outstanding bug reports of crashes in multi-threaded applications, and our own tests trigger errors when runing under ThreadSanitizer. I considered making this an optional feature, disabled by default, but the only case where this could be useful is a single-threaded application that starts multiple downloads at the same time, and the benefits in that case are (probably) very modest: the connection setup would be slightly faster, but everything else would be the same.
It seems safer to stop using the CURL_SHARE features until curl/curl#4915 is fixed. We have outstanding bug reports of crashes in multi-threaded applications, and our own tests trigger errors when runing under ThreadSanitizer. I considered making this an optional feature, disabled by default, but the only case where this could be useful is a single-threaded application that starts multiple downloads at the same time, and the benefits in that case are (probably) very modest: the connection setup would be slightly faster, but everything else would be the same.
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915 Closes #5009
More connection cache accesses are protected by locks. CONNCACHE_* is a beter prefix for the connection cache lock macros. Curl_attach_connnection: now called as soon as there's a connection struct available and before the connection is added to the connection cache. Curl_disconnect: now assumes that the connection is already removed from the connection cache. Ref: #4915 Closes #5009
Hi @bagder, any luck in approaching a fix? I'm willing to help somehow if possible... |
(I documented something about the problems in our wiki) I haven't done much apart from landing that minor cleanup in #5009. Fixing this issue correctly is a rather big undertaking and because of that, I've mostly postponed it. I also haven't quite figured out how we should architect this so that we can accomplish the desired functionality with the smallest amount of refactoring and code churn. Excuses, I know, but... |
I don't see myself able to fix this in near term. I'll shift focus to document the current situation instead so that users know what to expect. |
Known bug 11.11 is the shared object's connection cache is not thread safe, so we should not have an example for it. Ref: curl#4915 Ref: https://curl.se/docs/knownbugs.html#A_shared_connection_cache_is_not Closes #xxxx
Known bug 11.11 is the shared object's connection cache is not thread safe, so we should not have an example for it. Ref: #4915 Ref: https://curl.se/docs/knownbugs.html#A_shared_connection_cache_is_not Closes #6795
I did this
Configure & build curl with TSAN (
--without-zlib/ssl
is just to remove some dependencies)Run the attached example program that does
curl_easy_perform
in separate threads, with shared connection cache: main.zipNote: some data races are time-dependent, too. They may not always show up.
I got the following
Several data races:
multi.c
, lines 512, 513 and 515:(and identical traces for lines multi.c:513 and multi.c:515).
I believe this is caused by multiple threads writing to the (shared)
closure_handle
in the connection cache without a lock. These writes should probably be made while the connection cache lock is held.A data race at
url.c:1112
:This one is about
bits.close
of a connection in the shared connection cache. It's checked by url.c:1112 and the connection cache (under the connection lock), but it's set at url.c:3922 without the lock.Note that url.c:3922 does not write to the
close
bitfield, but to another bitfield. The problem here is false sharing in that read and writes to adjacent bitfields are not guaranteed to be thread safe since be may be stored in the same word that is read and written.A data race at
http.c:3720
:Here the problem is concurrent writes to the shared bundle's
multiuse
field without a lock. There are several cases ofmultiuse
being written directly. I wonder if that shouldn't go throughCurl_multiuse_state
somehow?A data race at
url.c:1022
:The problem is concurrent writes to
last_cleanup
in the shared connection cache inprune_dead_connections
without a lock. PerhapsCurl_conncache_foreach
should not lock and unlock the connection cache, but it should be the responsibility of the caller to do so, so that this type of additional work can also be done under lock (e.g.prune_dead_connections
should just grab the lock as a whole).Data race at
llist.c:86
, viamulti.c:857
:This one is about
Curl_attach_connnection
modifying a shared connection'seasyq
without a lock while the queue is checked byextract_if_dead
(which does have the connection lock).Data race at
multi.c:944
:This one happens because
multi_getsock
sets a shared connection's ownership inconn->data
without a lock whileextract_if_dead
checks that field (under lock).As an aside, I wonder what the purpose of the code in
multi_getsock
is. Shouldn't the connection already be guaranteed to have an owner at this point?I expected the following
No data races reported by TSAN.
curl/libcurl version
libcurl:
master (8957e6e4ed4d0841638511e0decf31abf006c6d2)
gcc:
(Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
operating system
Ubuntu 18.04 LTS:
Linux 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered: