-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIGSEGV in on_stream_close() from multi in combination with nghttp2 #10936
Comments
Maybe build with debug symbols present so that the stack trace offers more details? If not, we really can't do much unless you detail how to reproduce this. problem. |
Fully understand @bagder, thank for the tip. I will hunt a way to reproduce the problem, could of course be me doing something wrong as well. |
No news. Let's reopen if we can reproduce or otherwise act on it. |
Hello, I have a dozen of reproductions of that stacktrace. Below stacktrace based on this commit. Looking onto code, it corresponds. Where crash happens on second line:
Stacktrace:
Hope it would help with resolving the issue, and it would be nice to reopen :) |
Which version exactly are you building with? On tag |
I spot no less than 37 individual commits to lib/http2.c since 8.0.1 in git. It would make a lot of sense to first test this problem with curl 8.2.0 or current git since it is likely that we already fixed or at least changed this issue. |
Thank you for fast responses!
Head was at f7a3fae for the stacktrace. I the left link to make it easy to looks at the tree snapshot at the moment and check sources.
It definitelly might be a case. But what was curious for me, is that in the code below, if consider everything executed in order, assigning something to 'closed' does not cause violation. I was thinking, that may be a clue for someone deep into the code.
|
Checked other crashes, and found that there are some for very old libcurl.so version 7.70.0.
Downloaded zipped sources, and lookc like it's quite the same line:
So, it might exist for years already. |
Sure, and we might already have fixed it. Why don't you start with checking if this is already fixed? |
I do not think you are debugging what you think you are debugging. That code originally mentioned does not exist in 7.70.0. |
I plan to test latest nghttp2 and curl release, but currently far away from a proper computer so it will be a couple weeks before I can get a proper look at things. |
Sorry, It's not something I may reproduce easily, rather collect bugrepors from devices in a field. After some time I'd see if it's fixed. |
Ah, ok. Yeah that makes it more complicated- |
Since we believe this is fixed, I will close this now. If you think otherwise after having had a chance to try this out (for a while), just let us know and we come back and continue. |
FYI I am running cURL release 8.2.1 along with nghttp2 release 1.55.1 now, will update bug in some days about current status. |
Still see this with cURL 8.2.1 and nghttp2 1.55.1, reverting cURL to 7.86.0 works around the issue. Backtrace is slightly different, but I would state still similar. Sadly I have not been able to trigger this behavior through testing, but see it from embedded devices running in the field.
|
Does your code pause or resume connections? Does it call libcurl functions from more than one thread? Can you please give us some idea of how you use libcurl even if you don't have a self contained repro? Are you saving any debug logs from these devices? Here is a simple program that sets environment variable #include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>
int debug_callback(CURL *handle, curl_infotype type, char *data, size_t size,
void *clientp)
{
if(type == CURLINFO_TEXT && size) {
fprintf((FILE *)clientp, "== Info [%p]: %.*s", handle, (int)size, data);
}
return 0;
}
static char linebuf[4096 + 1];
int main()
{
CURL *curl = NULL;
CURLcode res = CURLE_FAILED_INIT;
FILE *fp = NULL;
const char *logname = "debug.log";
if(putenv("CURL_DEBUG=http/2")) {
fprintf(stderr, "Error: putenv failed to set CURL_DEBUG\n");
goto error;
}
fp = fopen(logname, "a");
if(!fp) {
fprintf(stderr, "Error: fopen failed to open %s\n", logname);
goto error;
}
#ifdef WIN32
/* Set file buffer mode to unbuffered.
Windows doesn't support line buffering. */
setvbuf(fp, NULL, _IONBF, 0);
#else
/* Set file buffer mode to line buffering. */
setvbuf(fp, linebuf, _IOLBF, sizeof(linebuf));
#endif
if(curl_global_init(CURL_GLOBAL_ALL)) {
fprintf(stderr, "Error: curl_global_init failed\n");
goto error;
}
curl = curl_easy_init();
if(!curl) {
fprintf(stderr, "Error: curl_easy_init failed\n");
goto error;
}
fprintf(fp, "\n!!!!!!!!!!!!!!!!!\n!!!!! START !!!!!\n!!!!!!!!!!!!!!!!!\n");
curl_easy_setopt(curl, CURLOPT_DEBUGFUNCTION, debug_callback);
curl_easy_setopt(curl, CURLOPT_DEBUGDATA, fp);
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
curl_easy_setopt(curl, CURLOPT_CAINFO, "curl-ca-bundle.crt");
curl_easy_setopt(curl, CURLOPT_URL, "https://curl.se");
res = curl_easy_perform(curl);
if(res != CURLE_OK) {
fprintf(fp, "libcurl: (%d) %s\n", res, curl_easy_strerror(res));
if(stderr != fp)
fprintf(stderr, "libcurl: (%d) %s\n", res, curl_easy_strerror(res));
}
goto cleanup;
error:
res = CURLE_FAILED_INIT;
cleanup:
if(curl)
curl_easy_cleanup(curl);
curl_global_cleanup();
if(fp)
fclose(fp);
return (res == CURLE_OK ? 0 : 1);
} |
Would it be possible to get the line number of the crash? I fail to see where this is happening in 8.1.2. |
We're seeing this in the wild with a beta that includes 8.0.1. Unfortunately we weren't able to update to 8.2.1 so the code referenced will be a bit out of date. Hopefully there's enough that might give you all some hints on what's happening. An engineer looking at the crash dumps see that the address indicated for They went further into the crash dumps and saw that when it happened the value of We've yet to create a reduced case that triggers the issue. If we get that we'll follow up back here. If there's any more information needed I'll help however I can. Thanks in advance! |
Unfortunately, I was not able to make any progress on this. If we do not find a way to reproduce with current curl, there is nothing I can think of. |
I don't see how we can act on this without a reproducible, without it happening on a recent version and without anyone digging in to analyze and figure out what happens. Closing. If something of this changes substantially, we can always reopen again. |
@yunanchangCorp you are talking about a failed request or also about the crash in |
yes I'm not mixing things up, a failed request get executed later and trying to close the stream results in a crash (since that request already closed), that's what I have observed. |
I'm suspecting nghttp2_submit_rst_stream is not cleaning up ob_syn in nghttp2, adding more log and wait for crash. |
- refs curl#10936 - there seems to be a code path that cleans up easy handles without triggering DONE or DETACH events to the connection filters. This would explain wh nghttp2 still holds stream user data - add GOOD check to easy handle used in on_close_callback to prevent crashes, ASSERTs in debug builds.
Made #12562 with some small changes in clearing stream user data and checks on the seen user data in on_stream_close. Maybe that helps you to track down the fault. I am still unable to reproduce locally a crash. |
I may have got the problem, please hear my story and see if it makes sense... I have almost zero HTTP2 knowledge before solving this bug, so may make some mistake in detail, please don't hesitate to correct me. So the story: For a request that is timed out. It is closed through CF_CTRL_DATA_DONE. Which will go through cf_h2_cntrl and then http2_data_done. Inside http2_data_done, we call nghttp2_submit_rst_stream to say we are not doing this frame anymore. Inside nghttp2, nghttp2_submit_rst_stream in turns called nghttp2_session_add_rst_stream, Then it returned, curl returned, too, and mark the handle as complete, the user closes the easy handle. --------------------------------------------- 15 minutes later -------------------------------------------------- In nghttp2, the same session goes on, and sometime in the future it will get the previously cancelled outbound item. But it's not skipping the item, instead it goes into session_prep_frame and starts to open a stream, and until after the stream is opened, it start checking if the item is cancelled, returning a error. But the stream is opened now, it must close the stream. And to close the stream, it must call the callback which is curl's on_stream_close. And finally inside on_stream_close, we tried to access the easy handle that is closed 15 minutes ago, so it crashed. I currently feel it's nghttp2's problem for trying to open and close stream for a cancelled item. Also I will try to create a minimal example based on this story, though I don't know how to control HTTP2 frames precisely so might or might not succeed... |
I has a app use libcurl, recent get error from windows. I can not get traceback. (with some error like this, after I upgrade llvm toolchain) attribute, but range extraction failed (missing or invalid range list table), please file a bug and attach the file at the start of this error message runManually indexing DWARF for test.exe...
error: unknown errorindexing DWARF for the last logs is from on_curl_data callback. test with #12562 * frame #0: 0x00007ff7bad3768c aplugin.exe`Curl_llist_insert_next
frame #1: 0x00007ff7bab1b2aa aplugin.exe`multi_runsingle
frame #2: 0x00007ff7bab1d5b2 aplugin.exe`multi_socket
frame #3: 0x00007ff7bab1db0d aplugin.exe`curl_multi_socket_action
frame #4: 0x00007ff7ba955f3e aplugin.exe`curl_perform
frame #5: 0x00007ff7baae9387 aplugin.exe`uv__process_poll_req
frame #6: 0x00007ff7baabd4d3 aplugin.exe`uv__process_reqs
frame #7: 0x00007ff7baabc78b aplugin.exe`uv_run |
I think you are onto something here. I opened nghttp2/nghttp2#2018 at nghttp2 to get their opinion on my theory on what is happening. Let's see how they respond. In the meantime, you could check in your local setup if calling |
ok, adding right before nghttp2_submit_rst_stream and waiting for crash (hope not). |
if(MSTATE_COMPLETED == data->mstate) {
if(data->set.fmultidone) {
/* signal via callback instead */
data->set.fmultidone(data, result);
}
else {
/* now fill in the Curl_message with this info */
msg = &data->msg;
msg->extmsg.msg = CURLMSG_DONE;
msg->extmsg.easy_handle = data;
msg->extmsg.data.result = result;
multi_addmsg(multi, msg);
DEBUGASSERT(!data->conn);
}
multistate(data, MSTATE_MSGSENT);
/* add this handle to the list of msgsent handles */
printf("[%s:%d] data=%p\n", __FILE__, __LINE__, data);
Curl_llist_insert_next(&multi->msgsent, multi->msgsent.tail, data,
&data->connect_queue);
/* unlink from the main list */
unlink_easy(multi, data);
return CURLM_OK;
}
} while((rc == CURLM_CALL_MULTI_PERFORM) || multi_ischanged(multi, FALSE)); after upgrade to 8.50 version (with 12562 patch ), I get crash at this call from lldb: thread #1, stop reason = Exception 0xc0000005 encountered at address 0x7ff6ab987b4c: Access violation reading location 0xffffffffffffffff
* frame #0: 0x00007ff7bad3768c aplugin.exe`Curl_llist_insert_next
frame #1: 0x00007ff7bab1b2aa aplugin.exe`multi_runsingle
frame #2: 0x00007ff7bab1d5b2 aplugin.exe`multi_socket
frame #3: 0x00007ff7bab1db0d aplugin.exe`curl_multi_socket_action
frame #4: 0x00007ff7ba955f3e aplugin.exe`curl_perform
frame #5: 0x00007ff7baae9387 aplugin.exe`uv__process_poll_req
frame #6: 0x00007ff7baabd4d3 aplugin.exe`uv__process_reqs
frame #7: 0x00007ff7baabc78b aplugin.exe`uv_run this is come from windows. (other platfrom not upgraded) any suggestion how can I trace this problem? |
@calvin2021y please do not highjack other issues. This crash is happening elsewhere. Open a new issue instead. Thank you. |
will do. before I trace to this line (multi.c 2660 call Curl_llist_insert_next), I was think this is the same issue. |
make sure you are following the thread safety guidelines |
@icing after 16 hours there's no single crash (usually it will have 6~7). I think this solves the problem. So the real story is the nghttp2 requires the user to take care of the user data before cancelling, rather than nghttp2 skip the cancelled item, right? And curl made some mistake so the user data is not correctly cleared. |
Pasing on some details from my testing as well. I use curl 8.5.0 release + change in #12562, and I have not seen any crashes yet. It has been running for 2 days, a bit limited amount of testing due to slow startup after the holidays. |
No, this is not merged and released with version 8.5.0, you need to cherry pick commit 3538027 on top of the 8.5.0 release. IIRC it was a trivial cherry pick. I would assume this will be part of 8.6.0 release, but it is completely up to @bagder to steer the release train (-: |
Contains a fix for HTTP/2 connections curl/curl#10936.
Contains a fix for HTTP/2 connections curl/curl#10936.
Trying to upgrade to latest curl releases causes very sporadic crashes for me. I have yet to be able to find a consistent method to reproduce this. It happens on random systems "in the field", hence currently hard to analyze from my point of view.
I am creating this issue in case this corresponds to other observations, or if people with better knowledge of the sources can see where things might go wrong.
I am happy to apply debug patches or test potentional changes to see if things behave better.
Currently I am running curl 7.86.0 release, since all later releases appear to trigger this behavior for me.
I did this
curl_multi_perform(), and it ended with SIGSEGV in on_stream_close()
I got the following backtrace from the crash
Looking into the on_stream_close() function at offset 0x54, I suspect this corresponds to the code:
curl/libcurl version
operating system
Linux 4.9.337 based embedded system, aarch64 GNU/Linux. Toolchain based upon GCC 11.3.0.
The text was updated successfully, but these errors were encountered: