-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
invalid conn->data
pointer in ConnectionExists when multiplexing
#4845
Comments
Additionally I can provide following details:
|
This looks like a problem we've already fixed. Can you reproduce this with 7.68.0? |
@bagder, thanks for quick feedback. I'm gonna try this now. I've looked throughout changelog before posting the issue but haven't found any |
@bagder , tried with 7.68.0 - I still can reproduce the problem |
I've checked that this condition inside curl_multi_remove_handle() is not met So detach_connection() is really not called. This is why CONN_INUSE returns true - actual value of c->easyq.size is 1. So, it seems that in HTTP/2 there is a code path when data->conn is NULL, while data actually is backed up by some connection. Sadly I'm not into connection management of curl :( Upd: seems that this is wrong path of investigation. detach_connection() is happened for this connection before. |
It might be worth trying to create a stand-alone version of the code that can reproduce this problem so that you can share that with us and we can help out with debugging from our ends... |
@bagder , this seems too complicated, because it depends on a pretty big piece of code that combines curl with boost::asio. Anyway, seems that I've found the problem. This code is at the end of ConnectionExists function in url.c:
Here the easy handle I only do not understand why valgrind finds such case strictly once per program run though it seems that this situation have to happen again and again. |
Are you perchance using a shared connection cache? |
No. Never knew about it until you asked. Is it meanful when using multi
interface?
чт, 23 янв. 2020 г., 23:51 Daniel Stenberg <notifications@github.com>:
… Are you perchance using a shared connection cache?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4845?email_source=notifications&email_token=AGLSFXVHDEB3AYPRAN37JTTQ7H7M5A5CNFSM4KKZQJAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJYZZMY#issuecomment-577871027>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGLSFXVKPHOID3LIDXSCIV3Q7H7M5ANCNFSM4KKZQJAA>
.
|
It's not meaningful if you only use a single multi handle in a single thread, no, since a multi handle already shares the connection cache with all the added easy handles. I asked because a shared connection cache makes a different set of race conditions (when two threads share the same cache) that I thought could be playing a part here. But then I was wrong. |
To exclude additional factors I have to mention that it happens in a single
thread. So any race conditions should be out of scope.
пт, 24 янв. 2020 г., 00:08 Daniel Stenberg <notifications@github.com>:
… It's not meaningful if you only use a single multi handle in a single
thread, no, since a multi handle already shares the connection cache with
all the added easy handles.
I asked because a shared connection cache makes a different set of race
conditions (when two threads share the same cache) that I thought could be
playing a part here. But then I was wrong.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4845?email_source=notifications&email_token=AGLSFXSUCARUQYGXG335L2TQ7IBLPA5CNFSM4KKZQJAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJY3L4Q#issuecomment-577877490>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGLSFXW7QJDN7SR55KJ4WMLQ7IBLPANCNFSM4KKZQJAA>
.
|
You need to give me a lot more details if you want my help as this is really hard for me to debug without a recipe that reproduces in my end. The problem happens when you add a new transfer. I presume the previous transfer(s) still going at that point are intended? Are they many? Did any previous transfer fail (early) ? Can you add a debug output that counts users/transfers on each connection as an aid to see if we can detect when that gets wrong at first. |
Ok, Daniel. I've taken your example https://curl.haxx.se/libcurl/c/asiohiper.html and adapted it to make use of HTTP/2 and multiple outstanding request. Valgrind reports invalid read in Hope it will help. Here is the code:
|
That feels really unreliable. Can we fix the example first? |
I think I fixed it. Code below
|
Thanks! I don't know anything about asio. This code looks wrong - it uses the socket open and close callbacks to judge if the socket should be monitored. Creating and deleting sockets don't imply monitoring hints! I realize this is based on an example we provide but this is not a good use of our API. I'd even call it downright wrong. (subject for a separate issue if we can't work it out in here) Can you make the example work without using CURLOPT_OPENSOCKETFUNCTION and CURLOPT_CLOSESOCKETFUNCTION ? It could also simplify the code. |
I think you were misleaded by the comment
No, I can't. It would be kind of plane without wings. As far as I understand curl and boost that is impossible. |
You're supposed to add/remove the socket to monitor based on the info in the socket callback. Why isn't that enough? The special-handling of c-ares sockets in |
We can't play with curl's socket handles if we want to make use of boost async functionality. For this we need to create
I agree that code looks ugly. But either you do know some other perfect way to use multi interface in async mode, or it is the way which multi interface actually works. |
Why can't you call those functions and set that up when the socket callback tells you to monitor a socket and then also remove that socket again when the socket callback tells you to stop monitoring that socket? That's how the socket callback works, that's how it has always been intended to be used and that's how you use it with any other event library as well. What makes asio so special that you need to get pre-alerted that there's a socket coming?
Every event library needs to act on the new action in the new callback. That's why the callback is called for new actions.
I said "broken" and I stand by it. It willfully ignores to monitor a socket (or more) that libcurl asks the application to monitor and therefore it will do badly then. That's worse than just ugly, it's wrong. |
I built the example and I've run it multiple times but it does not reproduce on my Debian Linux, libcurl from git master, libasio 1.12.2 (from a Debian Package). Tried with and without valgrind. Does it repro every time for you? Should I keep trying? |
Just wanted to update here that I'm facing similar issue with set of tests on 7.68.0 + 9607532 This was tracked in #4779
I'm trying to reproduce this and once I have some substantial information, I shall update -- may be as a different issue. Just for reference here. |
That certainly looks like a very similar stack trace so it might very well be the same problem. It would be helpful if (both of you) could:
It looks like it is the So detailed observations about current and past set of transfers at the time of the crash is very important. How come this connection gets a broken transfer-pointer. How did the specific transfer that used that connection end etc? |
In a debug build of curl, maybe the following can help to catch the problem sooner: diff --git a/lib/url.c b/lib/url.c
index 689668e04..837e62a09 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -1211,10 +1211,12 @@ ConnectionExists(struct Curl_easy *data,
continue;
}
}
}
+ DEBUGASSERT(!check->data || GOOD_EASY_HANDLE(check->data));
+
if(!canmultiplex && check->data)
/* this request can't be multiplexed but the checked connection is
already in use so we skip it */
continue;
|
Sure, shall try with git master and above |
yes it hits at assertion
https://gist.github.com/kunalekawde/59a6e3c7a737b36bc9502c25b16a8373 |
I cannot say with confirmation but git latest + assertion, its hitting very soon, earlier related crash was seen a little later.
I could also get valgrind for same:
Probably check->data was freed earlier -- this would happen on response:
which was allocated -- while sending request
|
There's no need to start spewing stack traces here for when the assert or valgrind problems trigger. We already know what's wrong then. We need to figure out how the problem is introduced. Why is the |
For me it happens every time when I run example. Just a couple of seconds.
Have you built it with libnghttp2? Key point that it reproduces only when HTTP/2 is being used.
There is really no code which checks during curl_multi_remove_handle() if easy handle is bundled with any connection and checks it conn->data. Get back to this message #4845 (comment) - here is the place where we end up with some conn->data assigned, but there is no other place in code (i haven't found) where conn->data is cleared or assigned to another easy handle. |
Yes I did. Possibly that c-ares flaw haunted my attempts. I'll try a build without c-ares.
In the Lines 772 to 773 in 872ea75
That removes the connection from the easy handle to the connection. The other direction is cleared in Line 600 in 872ea75
Where the connection's 'data' pointer is cleared after the transfer is "done". |
The example crash reproduces for me with a clang sanitizer build with the threaded resolver. |
... since the current transfer is being killed. Setting to NULL is wrong, leaving it pointing to 'data' is wrong since that handle might be about to get freed. Fixes #4845 Reported-by: dmitrmax on github
#4858 made the issue not appear for me anymore. |
conn->data
pointer in ConnectionExists when multiplexing
Yeah this seems to be working for my test run as well. |
Thanks, guys! |
I did this
Actually I'm trying to reopen issue #4062 as I have new details.
valgrind run app with libcurl.so with nghttp2 1.38.0
I expected the following
no detected problems with memory access
curl/libcurl version
7.66.0
==307== Invalid read of size 8
==307== at 0x603CBC1: ConnectionExists (url.c:1190)
==307== by 0x6041FD6: create_conn (url.c:3704)
==307== by 0x60427C3: Curl_connect (url.c:3972)
==307== by 0x6008A42: multi_runsingle (multi.c:1374)
==307== by 0x600AFD1: multi_socket (multi.c:2553)
==307== by 0x600B6A2: curl_multi_socket_action (multi.c:2666)
1189 if(CONN_INUSE(check) && check->data &&
1190 (check->data->multi != needle->data->multi))
As far as I tracked this problem it happens only on HTTP/2.
Invalid read is about this access: check->data->multi
It happens right after easy handle with value in check->data is removed from multi with curl_multi_remove_handle() call.
Somehow easy handle is left in the connection cache. Actually I don't see the code, which is responsible for connection cache cleaning during curl_multi_remove_handle() and CONN_INUSE(check) is 1 for this connection.
Probably there is a case when detach_connection() is not called for this easy handle in the multi.c file.
operating system
Linux (CentOS 7.7)
The text was updated successfully, but these errors were encountered: