Crashes in DoH requests (consistent, unknown cause) #6604
Comments
AFAICT no. If you are threading please read thread safety. Can you make a minimal self-contained program that can be used to reproduce? |
Yes, we're threading and I have read that page.
One thread can make requests (configuring the easy handle and passing it off to another thread). That said, with threading issues I'd have expected the crash to be more random, but it's always here (
Unfortunately not at the moment as I haven't seen the crash in person, we only receive reports about them. We're also not yet in a position to tell if it's something computer-specific or is simply rare (might get that data in a week or two) but it does seem to happen far more frequently with corporate network setups. Are there any settings other than proxy which might be controlled by the environment/network instead of the code that uses libcurl? |
We've fixed several issues in DoH (and elsewhere) since 40259ca. It seems a worthwhile investment to first bump up the version to the latest libcurl before continuing the search.
No. The easy handles used for DoH requests are never reused and those transfers are never restarted.
Why not? When there are no more pending DoH requests they're not needed anymore. If you think you have a better way to do it, then by all means, please provide a PR! We can always improve the code. |
I'd like that too but we're not in control of the release/update schedules. We'll try to update on our end sometime soon but it may take months to see the results.
Sure, it just seems easier to reason about the code if all the resources were freed together, seeing as they were allocated and otherwise used together. Currently it looks like the headers are freed from one state of the last request out of two to finish and the handles are freed from another state of another request, and to someone like me in this situation it's not immediately obvious whether this works in all cases.
Thanks, I might look into that once there's time to update. |
I would urge you to work on creating a way to force the problem to trigger, probably with a stand-alone separate app that maybe mimics what your real-life application does. Then you can make that with the latest libcurl and once you manage to show the crash, you can share the code with us. A reported crash that nobody can reproduce with a libcurl version with several DoH fixes missing is hard to act on. |
Sorry, yeah, I know it's hard without the data and I don't expect much here. Just wanted to see if you had any ideas on what could potentially be the issue (or if it was already fixed), and to ask about the headers.
I've been doing that and will continue, but do you have any suggestions on e.g. fuzzers and their configurations that could be used for this purpose? |
I think using valgrind and memory/address sanitizers are the most useful tools for this. Fuzzing this area of the code more would be awesome too and was done to trigger #4592 if I remember correctly, but I don't have anything prepared or setup to reproduce that other than what is already mentioned in that issue. |
In general we've occasionally seen weird arbitrary crashes in libcurl when users don't do thread synchronization properly. Though your crash has some consistency I would still double check that. Some kind of synchronization primitive is needed (eg critical section). volatile doesn't cut it and is too often misused. |
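As a minimal sketch of the kind of synchronization meant here, assuming POSIX threads: a one-slot mailbox that hands a pointer (standing in for a fully configured easy handle) from one thread to another under a mutex and condition variable. The names `handoff`, `handoff_init`, `handoff_put` and `handoff_take` are hypothetical and not part of libcurl.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox; not part of libcurl. */
struct handoff {
  pthread_mutex_t lock;
  pthread_cond_t ready;
  void *item;            /* e.g. a fully configured easy handle */
};

static void handoff_init(struct handoff *h)
{
  pthread_mutex_init(&h->lock, NULL);
  pthread_cond_init(&h->ready, NULL);
  h->item = NULL;
}

/* Producer side: publish the item.
   Returns 0 on success, -1 if the slot is still occupied. */
static int handoff_put(struct handoff *h, void *item)
{
  int rc = -1;
  pthread_mutex_lock(&h->lock);
  if(!h->item) {
    h->item = item;
    rc = 0;
    pthread_cond_signal(&h->ready);
  }
  pthread_mutex_unlock(&h->lock);
  return rc;
}

/* Consumer side: block until an item is available, then take ownership.
   After this returns, only the consumer may touch the item. */
static void *handoff_take(struct handoff *h)
{
  void *item;
  pthread_mutex_lock(&h->lock);
  while(!h->item)
    pthread_cond_wait(&h->ready, &h->lock);
  item = h->item;
  h->item = NULL;
  pthread_mutex_unlock(&h->lock);
  return item;
}
```

The point of the design is that the mutex both serializes access to the slot and orders the producer's earlier writes to the handle before the consumer's reads, which a `volatile` flag does not guarantee.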
Thanks, will try fuzzing once I have time to boot into Linux.
Yep, doing that regularly.
It's a good thing that we don't use I've started looking into a new theory (non-safe free > allocating headers in the same spot > dangling pointer freed by other code > crash), dumping all the frees that later turn into headers. P.S. Things I've found so far:
|
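The theory above (a pointer freed without being cleared, the allocator reusing the same address, then a second free through the stale pointer) is the usual argument for clearing pointers when freeing them. A minimal sketch of that idiom, assuming a hypothetical `struct req` with `headers` and `url` members; `SAFE_FREE` and `req_cleanup` are illustrative names, not libcurl code:

```c
#include <stdlib.h>

/* Free and clear in one step so a stale copy of the pointer
   cannot be freed a second time through this variable. */
#define SAFE_FREE(p) do { free(p); (p) = NULL; } while(0)

/* Hypothetical request state that owns two heap allocations. */
struct req {
  char *headers;
  char *url;
};

/* Release everything the request owns in one place.
   Safe to call more than once: free(NULL) is a no-op. */
static void req_cleanup(struct req *r)
{
  SAFE_FREE(r->headers);
  SAFE_FREE(r->url);
}
```

This only protects against reuse through the same variable; a second copy of the pointer held elsewhere would still dangle, which is why freeing related resources from a single place is easier to reason about.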
Prior to this change if the user specified a default protocol and a separately allocated non-absolute URL was used then it was freed prematurely, before it was then used to make the replacement URL. Bug: curl#6604 (comment) Reported-by: arvids-kokins-bidstack@users.noreply.github.com Closes #xxxx
#6613 thanks |
Prior to this change if the user specified a default protocol and a separately allocated non-absolute URL was used then it was freed prematurely, before it was then used to make the replacement URL. Bug: #6604 (comment) Reported-by: arvids-kokins-bidstack@users.noreply.github.com Closes #6613
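The ordering problem described in this commit message can be sketched as follows. This is not the actual libcurl patch; `set_default_scheme` and its parameters are hypothetical names, used only to show that the replacement URL has to be built before the old allocation is released.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libcurl code. *url must be heap-allocated.
   Returns 0 on success, -1 on allocation failure (leaving *url untouched). */
static int set_default_scheme(char **url, const char *scheme)
{
  size_t len;
  char *fresh;

  if(strstr(*url, "://"))
    return 0;                     /* already absolute, nothing to do */

  len = strlen(scheme) + 3 + strlen(*url) + 1;
  fresh = malloc(len);
  if(!fresh)
    return -1;

  /* Read the old URL while it is still valid ... */
  snprintf(fresh, len, "%s://%s", scheme, *url);

  /* ... and only then free it and install the replacement. */
  free(*url);
  *url = fresh;
  return 0;
}
```

Freeing `*url` before the `snprintf` call would reproduce the kind of premature free the commit message describes: the formatted string would be built from memory that has already been returned to the allocator.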
Haven't had a lot of time to look into this lately, but there is good news and bad news. There was an increase in crashes from the increased user count, but a relatively small one, so very few people are having this issue, which also makes it extremely difficult to debug. The percentage of users affected by it is about as much as one would expect from random network failures. In addition, the same issue seems to be present on Macs too, though our crash dumps are currently a bit too imprecise to confirm this with 100% accuracy (same thread/error/frequency). Another curious observation is that I'm not getting any reports from development builds anymore (I asked them to try disabling DoH but have no idea yet if they actually did). |
@arvids-kokins-bidstack do you think there's actual value in keeping this issue open further? I don't see how we can do anything more from our end... |
@bagder I was about to write, actually; some new information has shown up. Basically, I'd like to pick your brain on another matter, and then feel free to close the issue afterwards. Lately the crashes have increased, and the only events we've correlated them to were:
so the question currently on my mind is - do you know of any firewall software that could in theory:
and also:
|
No "firewall software" can do that.
I would advise you not to stick to this old version when you have issues. We've fixed DoH issues since, and other things. At least make sure that the latest version also has this problem, as otherwise you've been hunting this in vain.
I think you should scrap that idea. This is just a bug,
No, but all the tests are available so you can make them. I would of course suggest that code-reviewing to find this is a rather long shot. I would focus my efforts on making a recipe for reproducing it. |
Which part of the process seems most unlikely to you? https://steamcommunity.com/app/455820/discussions/1/1696048879951565073 Also we've had experience with firewalls not letting unusual ports through in a strange way - the first few packets do go through but then they stop coming and the connection isn't closed either. Some can even be configured to inspect HTTPS traffic. That said, I've already tried reproducing by toggling Windows firewall rules and using clumsy and those were not sufficient.
Will upgrade eventually but if we don't know what causes the issue, there's really no guarantee that it would be solved by an upgrade. Not a fan of blindly doing things and praying that they work.
That would mean at least a month or two of doing nothing and waiting for upgrades to trickle down, hoping that they don't create new issues, all the while invalidating any existing data we may have on this issue, which adds another month of subsequent data collection.
It's a bug that normally does not appear. Most people (including me) are completely unaffected but there are a few that have been getting it repeatedly with relatively few attempts. Reproducing the bug has been the focus already. Various differentiators have already been ruled out but until we find the right one, I don't see how there's any chance of reproducing it. |
I'll ask again: is there a purpose to keep this open? Without more details or a way to reproduce we have nothing to act on here... |
I did say that you can feel free to close it, but I guess I can do it too. P.S. We're updating to 7.76.0 and turning off DoH for the upcoming release, so there probably won't be much I can add anyway. Thanks for bearing with me, and if we do have something more concrete to share (like a bug reproduction example), I'll let you know. |
OS: Windows (probably doesn't matter though)
libcurl version: 7.65.2-DEV (based on 40259ca with minor changes)
call stack:
while(*first && *second && max) {
first is an invalid value
data->set.headers / head seems to point to wrong memory (dangling pointer?), but often the memory is still usable enough to crash a bit deeper
conn->ip_addr_str = "1.1.1.1"
conn->connection_id = 18 (from one of the dumps, some are higher), so it's not a guaranteed failure
path = "/dns-query"
host = "1.1.1.1"
Sadly the minidump does not include the entire connectdata or Curl_easy data structures (they seem to be too big), but feel free to ask for any data early in the structure; I may be able to provide some.
I've also noticed one possibly related issue that is marked as a known bug: #4592
2 questions:
doh_done?
doh_done separately, before freeing the request handles? This seems to be true even in the latest version: https://github.com/curl/curl/blob/master/lib/doh.c#L200
This seems to be a relatively rare issue, but I'd still like to at least understand what's causing it since it's difficult to reproduce.