Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsan: data race in multi.c when shared connection cache #4915

Closed
MikeLankamp opened this issue Feb 12, 2020 · 12 comments
Closed

tsan: data race in multi.c when shared connection cache #4915

MikeLankamp opened this issue Feb 12, 2020 · 12 comments

Comments

@MikeLankamp
Copy link

MikeLankamp commented Feb 12, 2020

I did this

Configure & build curl with TSAN (--without-zlib/ssl is just to remove some dependencies)

./buildconf
./configure CFLAGS='-fsanitize=thread' --enable-debug --without-ssl --without-zlib`
make

Run the attached example program that does curl_easy_perform in separate threads, with shared connection cache: main.zip

Note: some data races are time-dependent, too. They may not always show up.

I got the following

Several data races:

multi.c, lines 512, 513 and 515:

WARNING: ThreadSanitizer: data race (pid=28712)
  Write of size 8 at 0x7b8800000790 by thread T1:
    #0 curl_multi_add_handle curl/lib/multi.c:512 (main+0xf983)
    #1 easy_perform curl/lib/easy.c:683 (main+0x7777)
    #2 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #3 do_transfer main.c:30 (main+0x6424)
    #4 thread_func main.c:35 (main+0x6450)
    #5 <null> <null> (libtsan.so.0+0x296ad)

  Previous write of size 8 at 0x7b8800000790 by main thread:
    #0 curl_multi_add_handle curl/lib/multi.c:512 (main+0xf983)
    #1 easy_perform curl/lib/easy.c:683 (main+0x7777)
    #2 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #3 do_transfer main.c:30 (main+0x6424)
    #4 main main.c:50 (main+0x650f)

  Location is heap block of size 6000 at 0x7b8800000000 allocated by main thread:
    #0 calloc <null> (libtsan.so.0+0x2afc3)
    #1 curl_dbg_calloc lib/memdebug.c:205 (main+0x97e2)
    #2 Curl_open lib/url.c:584 (main+0x641f7)
    #3 curl_easy_init lib/easy.c:304 (main+0x6a50)
    #4 Curl_conncache_init lib/conncache.c:120 (main+0x249c9)
    #5 curl_share_setopt lib/share.c:106 (main+0x19e66)
    #6 main main.c:43 (main+0x64aa)

  Thread T1 (tid=28714, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 main main.c:49 (main+0x6503)

SUMMARY: ThreadSanitizer: data race lib/multi.c:512 in curl_multi_add_handle
==================

(and identical traces for lines multi.c:513 and multi.c:515).

I believe this is caused by multiple threads writing to the (shared) closure_handle in the connection cache without a lock. These writes should probably be made while the connection cache lock is held.

A data race at url.c:1112:

==================
WARNING: ThreadSanitizer: data race (pid=3148)
  Read of size 1 at 0x7b6800000390 by thread T1 (mutexes: write M2):
    #0 ConnectionExists curl/lib/url.c:1112 (main+0x65f3d)
    #1 create_conn curl/lib/url.c:3627 (main+0x6eb3e)
    #2 Curl_connect curl/lib/url.c:3878 (main+0x6f5ae)
    #3 multi_runsingle curl/lib/multi.c:1633 (main+0x136d3)
    #4 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 thread_func main.c:35 (main+0x6450)
    #10 <null> <null> (libtsan.so.0+0x296ad)

  Previous write of size 1 at 0x7b6800000390 by main thread:
    #0 Curl_init_do curl/lib/url.c:3922 (main+0x6f736)
    #1 create_conn curl/lib/url.c:3753 (main+0x6efa9)
    #2 Curl_connect curl/lib/url.c:3878 (main+0x6f5ae)
    #3 multi_runsingle curl/lib/multi.c:1633 (main+0x136d3)
    #4 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 main main.c:50 (main+0x650f)

  Location is heap block of size 1504 at 0x7b6800000000 allocated by main thread:
    #0 calloc <null> (libtsan.so.0+0x2afc3)
    #1 curl_dbg_calloc curl/lib/memdebug.c:205 (main+0x97e2)
    #2 allocate_conn curl/lib/url.c:1554 (main+0x67323)
    #3 create_conn curl/lib/url.c:3363 (main+0x6d9b2)
    #4 Curl_connect curl/lib/url.c:3878 (main+0x6f5ae)
    #5 multi_runsingle curl/lib/multi.c:1633 (main+0x136d3)
    #6 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #7 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #8 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #9 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #10 do_transfer main.c:30 (main+0x6424)
    #11 main main.c:50 (main+0x650f)

  Mutex M2 (0x5607d6940208) created at:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 lock_func main.c:17 (main+0x6360)
    #2 Curl_share_lock curl/lib/share.c:236 (main+0x1a658)
    #3 Curl_conncache_foreach curl/lib/conncache.c:335 (main+0x258b7)
    #4 prune_dead_connections curl/lib/url.c:1017 (main+0x65c0b)
    #5 create_conn curl/lib/url.c:3608 (main+0x6ea2b)
    #6 Curl_connect curl/lib/url.c:3878 (main+0x6f5ae)
    #7 multi_runsingle curl/lib/multi.c:1633 (main+0x136d3)
    #8 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #9 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #10 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #11 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #12 do_transfer main.c:30 (main+0x6424)
    #13 main main.c:50 (main+0x650f)

  Thread T1 (tid=3153, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 main main.c:49 (main+0x6503)

SUMMARY: ThreadSanitizer: data race curl/lib/url.c:1112 in ConnectionExists
==================

This one is about bits.close of a connection in the shared connection cache. It's checked by url.c:1112 and the connection cache (under the connection lock), but it's set at url.c:3922 without the lock.

Note that url.c:3922 does not write to the close bitfield, but to another bitfield. The problem here is false sharing in that read and writes to adjacent bitfields are not guaranteed to be thread safe since be may be stored in the same word that is read and written.

A data race at http.c:3720:

==================
WARNING: ThreadSanitizer: data race (pid=3148)
  Write of size 4 at 0x7b1000000008 by main thread:
    #0 Curl_http_readwrite_headers curl/lib/http.c:3720 (main+0x5ee41)
    #1 readwrite_data curl/lib/transfer.c:649 (main+0x7f23f)
    #2 Curl_readwrite curl/lib/transfer.c:1240 (main+0x81401)
    #3 multi_runsingle curl/lib/multi.c:2085 (main+0x1500b)
    #4 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 main main.c:50 (main+0x650f)

  Previous write of size 4 at 0x7b1000000008 by thread T1:
    #0 Curl_http_readwrite_headers curl/lib/http.c:3720 (main+0x5ee41)
    #1 readwrite_data curl/lib/transfer.c:649 (main+0x7f23f)
    #2 Curl_readwrite curl/lib/transfer.c:1240 (main+0x81401)
    #3 multi_runsingle curl/lib/multi.c:2085 (main+0x1500b)
    #4 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 thread_func main.c:35 (main+0x6450)
    #10 <null> <null> (libtsan.so.0+0x296ad)

  Location is heap block of size 64 at 0x7b1000000000 allocated by main thread:
    #0 malloc <null> (libtsan.so.0+0x2ae03)
    #1 curl_dbg_malloc curl/lib/memdebug.c:174 (main+0x9689)
    #2 bundle_create curl/lib/conncache.c:57 (main+0x2467a)
    #3 Curl_conncache_add_conn curl/lib/conncache.c:248 (main+0x2529e)
    #4 create_conn curl/lib/url.c:3727 (main+0x6ef89)
    #5 Curl_connect curl/lib/url.c:3878 (main+0x6f5ae)
    #6 multi_runsingle curl/lib/multi.c:1633 (main+0x136d3)
    #7 curl_multi_perform curl/lib/multi.c:2348 (main+0x15cb0)
    #8 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #9 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #10 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #11 do_transfer main.c:30 (main+0x6424)
    #12 main main.c:50 (main+0x650f)

  Thread T1 (tid=3153, finished) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 main main.c:49 (main+0x6503)

SUMMARY: ThreadSanitizer: data race curl/lib/http.c:3720 in Curl_http_readwrite_headers
==================

Here the problem is concurrent writes to the shared bundle's multiuse field without a lock. There are several cases of multiuse being written directly. I wonder if that shouldn't go through Curl_multiuse_state somehow?

A data race at url.c:1022:

==================
WARNING: ThreadSanitizer: data race (pid=13573)
  Write of size 8 at 0x7b3800000068 by thread T1:
    #0 prune_dead_connections curl/lib/url.c:1022 (main+0x65d91)
    #1 create_conn curl/lib/url.c:3608 (main+0x6eb84)
    #2 Curl_connect curl/lib/url.c:3878 (main+0x6f707)
    #3 multi_runsingle curl/lib/multi.c:1647 (main+0x1382c)
    #4 curl_multi_perform curl/lib/multi.c:2362 (main+0x15e09)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 thread_func main.c:35 (main+0x6450)
    #10 <null> <null> (libtsan.so.0+0x296ad)

  Previous write of size 8 at 0x7b3800000068 by main thread:
    #0 prune_dead_connections curl/lib/url.c:1022 (main+0x65d91)
    #1 create_conn curl/lib/url.c:3608 (main+0x6eb84)
    #2 Curl_connect curl/lib/url.c:3878 (main+0x6f707)
    #3 multi_runsingle curl/lib/multi.c:1647 (main+0x1382c)
    #4 curl_multi_perform curl/lib/multi.c:2362 (main+0x15e09)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 main main.c:50 (main+0x650f)

  Location is heap block of size 216 at 0x7b3800000000 allocated by main thread:
    #0 calloc <null> (libtsan.so.0+0x2afc3)
    #1 curl_dbg_calloc curl/lib/memdebug.c:205 (main+0x97e2)
    #2 curl_share_init curl/lib/share.c:38 (main+0x19c30)
    #3 main main.c:42 (main+0x648b)

  Thread T1 (tid=13577, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 main main.c:49 (main+0x6503)

SUMMARY: ThreadSanitizer: data race curl/lib/url.c:1022 in prune_dead_connections
==================

The problem is concurrent writes to last_cleanup in the shared connection cache in prune_dead_connections without a lock. Perhaps Curl_conncache_foreach should not lock and unlock the connection cache, but it should be the responsibility of the caller to do so, so that this type of additional work can also be done under lock (e.g. prune_dead_connections should just grab the lock as a whole).

Data race at llist.c:86, via multi.c:857:

==================
WARNING: ThreadSanitizer: data race (pid=16788)
  Write of size 8 at 0x7b6800000458 by main thread:
    #0 Curl_llist_insert_next curl/lib/llist.c:86 (main+0x854b3)
    #1 Curl_attach_connnection curl/lib/multi.c:857 (main+0x11199)
    #2 Curl_connect curl/lib/url.c:3904 (main+0x6faa1)
    #3 multi_runsingle curl/lib/multi.c:1647 (main+0x13aa6)
    #4 curl_multi_perform curl/lib/multi.c:2362 (main+0x16083)
    #5 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #6 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #7 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #8 do_transfer main.c:30 (main+0x6424)
    #9 main main.c:50 (main+0x650f)

  Previous read of size 8 at 0x7b6800000458 by thread T1 (mutexes: write M2):
    #0 extract_if_dead curl/lib/url.c:949 (main+0x65b5d)
    #1 call_extract_if_dead curl/lib/url.c:994 (main+0x65d64)
    #2 Curl_conncache_foreach curl/lib/conncache.c:347 (main+0x25d0e)
    #3 prune_dead_connections curl/lib/url.c:1018 (main+0x65f4c)
    #4 create_conn curl/lib/url.c:3610 (main+0x6ee08)
    #5 Curl_connect curl/lib/url.c:3880 (main+0x6f98b)
    #6 multi_runsingle curl/lib/multi.c:1647 (main+0x13aa6)
    #7 curl_multi_perform curl/lib/multi.c:2362 (main+0x16083)
    #8 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #9 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #10 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #11 do_transfer main.c:30 (main+0x6424)
    #12 thread_func main.c:35 (main+0x6450)
    #13 <null> <null> (libtsan.so.0+0x296ad)

  Location is heap block of size 1504 at 0x7b6800000000 allocated by main thread:
    #0 calloc <null> (libtsan.so.0+0x2afc3)
    #1 curl_dbg_calloc curl/lib/memdebug.c:205 (main+0x991f)
    #2 allocate_conn curl/lib/url.c:1556 (main+0x67700)
    #3 create_conn curl/lib/url.c:3365 (main+0x6dd8f)
    #4 Curl_connect curl/lib/url.c:3880 (main+0x6f98b)
    #5 multi_runsingle curl/lib/multi.c:1647 (main+0x13aa6)
    #6 curl_multi_perform curl/lib/multi.c:2362 (main+0x16083)
    #7 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #8 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #9 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #10 do_transfer main.c:30 (main+0x6424)
    #11 main main.c:50 (main+0x650f)

  Mutex M2 (0x55f757ddc208) created at:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 lock_func main.c:17 (main+0x6360)
    #2 Curl_share_lock curl/lib/share.c:236 (main+0x1aa2b)
    #3 curl_multi_add_handle curl/lib/multi.c:521 (main+0xfbee)
    #4 easy_perform curl/lib/easy.c:683 (main+0x7777)
    #5 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #6 do_transfer main.c:30 (main+0x6424)
    #7 main main.c:50 (main+0x650f)

  Thread T1 (tid=16792, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 main main.c:49 (main+0x6503)

SUMMARY: ThreadSanitizer: data race curl/lib/llist.c:86 in Curl_llist_insert_next
==================

This one is about Curl_attach_connnection modifying a shared connection's easyq without a lock while the queue is checked by extract_if_dead (which does have the connection lock).

Data race at multi.c:944:

==================
WARNING: ThreadSanitizer: data race (pid=16788)
  Write of size 8 at 0x7b6800000008 by main thread:
    #0 multi_getsock curl/lib/multi.c:944 (main+0x115f5)
    #1 Curl_multi_wait curl/lib/multi.c:1076 (main+0x11cc7)
    #2 curl_multi_poll curl/lib/multi.c:1269 (main+0x1288c)
    #3 easy_transfer curl/lib/easy.c:602 (main+0x74ef)
    #4 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #5 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #6 do_transfer main.c:30 (main+0x6424)
    #7 main main.c:50 (main+0x650f)

  Previous read of size 8 at 0x7b6800000008 by thread T1 (mutexes: write M2):
    #0 extract_if_dead curl/lib/url.c:949 (main+0x65b7d)
    #1 call_extract_if_dead curl/lib/url.c:994 (main+0x65d64)
    #2 Curl_conncache_foreach curl/lib/conncache.c:347 (main+0x25d0e)
    #3 prune_dead_connections curl/lib/url.c:1018 (main+0x65f4c)
    #4 create_conn curl/lib/url.c:3610 (main+0x6ee08)
    #5 Curl_connect curl/lib/url.c:3880 (main+0x6f98b)
    #6 multi_runsingle curl/lib/multi.c:1647 (main+0x13aa6)
    #7 curl_multi_perform curl/lib/multi.c:2362 (main+0x16083)
    #8 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #9 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #10 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #11 do_transfer main.c:30 (main+0x6424)
    #12 thread_func main.c:35 (main+0x6450)
    #13 <null> <null> (libtsan.so.0+0x296ad)

  Location is heap block of size 1504 at 0x7b6800000000 allocated by main thread:
    #0 calloc <null> (libtsan.so.0+0x2afc3)
    #1 curl_dbg_calloc curl/lib/memdebug.c:205 (main+0x991f)
    #2 allocate_conn curl/lib/url.c:1556 (main+0x67700)
    #3 create_conn curl/lib/url.c:3365 (main+0x6dd8f)
    #4 Curl_connect curl/lib/url.c:3880 (main+0x6f98b)
    #5 multi_runsingle curl/lib/multi.c:1647 (main+0x13aa6)
    #6 curl_multi_perform curl/lib/multi.c:2362 (main+0x16083)
    #7 easy_transfer curl/lib/easy.c:605 (main+0x750b)
    #8 easy_perform curl/lib/easy.c:698 (main+0x77e1)
    #9 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #10 do_transfer main.c:30 (main+0x6424)
    #11 main main.c:50 (main+0x650f)

  Mutex M2 (0x55f757ddc208) created at:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 lock_func main.c:17 (main+0x6360)
    #2 Curl_share_lock curl/lib/share.c:236 (main+0x1aa2b)
    #3 curl_multi_add_handle curl/lib/multi.c:521 (main+0xfbee)
    #4 easy_perform curl/lib/easy.c:683 (main+0x7777)
    #5 curl_easy_perform curl/lib/easy.c:717 (main+0x7832)
    #6 do_transfer main.c:30 (main+0x6424)
    #7 main main.c:50 (main+0x650f)

  Thread T1 (tid=16792, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bcee)
    #1 main main.c:49 (main+0x6503)

SUMMARY: ThreadSanitizer: data race curl/lib/multi.c:944 in multi_getsock
==================

This one happens because multi_getsock sets a shared connection's ownership in conn->data without a lock while extract_if_dead checks that field (under lock).
As an aside, I wonder what the purpose of the code in multi_getsock is. Shouldn't the connection already be guaranteed to have an owner at this point?

I expected the following

No data races reported by TSAN.

curl/libcurl version

libcurl: master (8957e6e4ed4d0841638511e0decf31abf006c6d2)
gcc: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

operating system

Ubuntu 18.04 LTS:
Linux 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@MikeLankamp
Copy link
Author

FYI, I found multiple (valid, IMO) data races this way. Should I report a separate issue for every one?

@bagder
Copy link
Member

bagder commented Feb 12, 2020

If they're related to the shared connection cache I think you can keep them here in the same issue. If they're related to other shared things or similar, then maybe separate?

@bagder bagder changed the title tsan: data race in multi.c tsan: data race in multi.c when shared connection cache Feb 12, 2020
@bagder bagder added the crash label Feb 12, 2020
@MikeLankamp
Copy link
Author

MikeLankamp commented Feb 12, 2020

Thanks, then I'll update this issue with the other ones I found. I'll also create new issues per other component, if any.

Edit: Done. These are all the data races I could find. They are all about the shared connection cache (unsurprisingly).

@MikeLankamp
Copy link
Author

Hi @bagder, I've tried to fix these issues myself, but several of the data races are about interactions with specific modules in particular ways. I do not know enough about the Curl internals and relations between (optional) components to know how to fix it. Would there be any chance you, or anyone else could help out?

@bagder
Copy link
Member

bagder commented Feb 17, 2020

Thanks. Yes, this is really low-level libcurl stuff that isn't that too easy to get into. I plan to look deeper at this asap.

@bagder bagder self-assigned this Feb 17, 2020
@bagder
Copy link
Member

bagder commented Feb 20, 2020

I've taken a first deep look on this issue and I learned plenty.

The issues are real

tsan is awesome and this recipe is a very simple and effective way to prove some of the problems.

WIP patch

I have a work-in-progress patch that starts to clean up things and improve the situation somewhat. I might submit it as a WIP PR just to get it tested and commented, but it probably isn't too interesting to land as long as it only partially addresses the issues.

Fix

To fix this proper, I need to change how we handle connections and the connection cache.

Right now, the connection cache still holds the in-use connections (to enable multiplexing etc) but doing that if one of the connections is currently in use in another thread is a recipe for disaster. This is going to be come a somewhat larger internal change and it will take me a little time to figure out a new model and then to implement that.

@MikeLankamp
Copy link
Author

Thanks for verifying it! Yes, all the sanitizers are awesome (TSAN, ASAN, etc) :) Maybe it could be part of curl's CI?

Looking forward to the fix; I'm willing to help where/if I can.

@bagder
Copy link
Member

bagder commented Feb 28, 2020

We already run several sanitizers in the CI. I think the tsan one is left behind because we don't do threading natively, but once we fix this PR we should create a test case out of the repro and to a tsan build.

I haven't had much time to work/think on this issue so it certainly will not be fixed before the pending next release.

bagder added a commit that referenced this issue Mar 1, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
@bagder
Copy link
Member

bagder commented Mar 1, 2020

Now #5009 contains the first steps towards fixing this. It's not done yet.

bagder added a commit that referenced this issue Mar 1, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
bagder added a commit that referenced this issue Mar 23, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
bagder added a commit that referenced this issue Mar 27, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
Closes #5009
bagder added a commit that referenced this issue Mar 31, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
Closes #5009
bagder added a commit that referenced this issue Apr 14, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
Closes #5009
coryan added a commit to coryan/google-cloud-cpp that referenced this issue Apr 20, 2020
It seems safer to stop using the CURL_SHARE features until
curl/curl#4915 is fixed. We have outstanding bug reports of crashes in
multi-threaded applications, and our own tests trigger errors when
runing under ThreadSanitizer. I considered making this an optional
feature, disabled by default, but the only case where this could be
useful is a single-threaded application that starts multiple downloads
at the same time, and the benefits in that case are (probably) very
modest: the connection setup would be slightly faster, but everything
else would be the same.
coryan added a commit to googleapis/google-cloud-cpp that referenced this issue Apr 20, 2020
It seems safer to stop using the CURL_SHARE features until
curl/curl#4915 is fixed. We have outstanding bug reports of crashes in
multi-threaded applications, and our own tests trigger errors when
runing under ThreadSanitizer. I considered making this an optional
feature, disabled by default, but the only case where this could be
useful is a single-threaded application that starts multiple downloads
at the same time, and the benefits in that case are (probably) very
modest: the connection setup would be slightly faster, but everything
else would be the same.
bagder added a commit that referenced this issue Apr 26, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
Closes #5009
bagder added a commit that referenced this issue Apr 30, 2020
More connection cache accesses are protected by locks.

CONNCACHE_* is a beter prefix for the connection cache lock macros.

Curl_attach_connnection: now called as soon as there's a connection
struct available and before the connection is added to the connection
cache.

Curl_disconnect: now assumes that the connection is already removed from
the connection cache.

Ref: #4915
Closes #5009
@MikeLankamp
Copy link
Author

Hi @bagder, any luck in approaching a fix? I'm willing to help somehow if possible...

@bagder
Copy link
Member

bagder commented Jun 10, 2020

(I documented something about the problems in our wiki)

I haven't done much apart from landing that minor cleanup in #5009. Fixing this issue correctly is a rather big undertaking and because of that, I've mostly postponed it. I also haven't quite figured out how we should architect this so that we can accomplish the desired functionality with the smallest amount of refactoring and code churn. Excuses, I know, but...

@bagder
Copy link
Member

bagder commented Aug 11, 2020

I don't see myself able to fix this in near term. I'll shift focus to document the current situation instead so that users know what to expect.

@bagder bagder closed this as completed in cb8cf9d Aug 12, 2020
jay added a commit to jay/curl that referenced this issue Mar 26, 2021
Known bug 11.11 is the shared object's connection cache is not thread
safe, so we should not have an example for it.

Ref: curl#4915
Ref: https://curl.se/docs/knownbugs.html#A_shared_connection_cache_is_not

Closes #xxxx
jay added a commit that referenced this issue Mar 26, 2021
Known bug 11.11 is the shared object's connection cache is not thread
safe, so we should not have an example for it.

Ref: #4915
Ref: https://curl.se/docs/knownbugs.html#A_shared_connection_cache_is_not

Closes #6795
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants