Application segfault in 8.10.0 #14860
Comments
Any chance we can get the stack trace from a libcurl built with debug that keeps the symbols? |
That is doable, but a little tricky. Please feel free to ping me at the end of the day tomorrow if I haven't reported back here. (It's a busy time for Homebrew because we're preparing for the release of macOS 15, so I've got a lot on my plate at the moment.) |
Looks like a NULL deref to me:
curl_multi_cleanup() will eventually call the socket_callback. In function socket_callback(...):
As I understand it |
An added check as in #14862 will help libcurl detect this user mistake and avoid the crash, but the actual problem appears to be in the application. |
Thanks for the quick turnaround here @bagder |
Thanks for resolving this! |
Upstream issue: curl/curl#14860 Upstream patch: curl/curl@48f61e7
For what it's worth, I've been trying to see where exactly |
FYI fallout in ostreedev/ostree#3299 - we're still evaluating whether there's an outstanding bug in ostree here or whether this is actually a libcurl regression. It seems more likely to be the former, but still. |
I mentioned at JuliaLang/Downloads.jl#260 (comment) that I just managed to hit the null handle with libcurl 8.10, but the offending code path isn't hit at all in previous versions of libcurl (no change of code on the julia side). Is there any relevant change in libcurl 8.10 that libraries wrapping it should be aware of? |
I believe it's a reasonable assumption that the callback can get called while the callback is in place and the object the callback is associated with is still in existence. This means that any resource referenced by the callback must remain valid, too. Prematurely cleaning up resources (such as pointers) that might get used by the callback seems like a bug to me. I think the above applies regardless of libcurl version. |
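The lifetime rule above can be sketched in a few lines of C. This is only a model under stated assumptions: `fake_multi`, `wrapper`, `POLL_REMOVE`, and `fake_multi_cleanup` are hypothetical stand-ins and not the libcurl API. The sketch assumes a cleanup routine that reports each remaining socket through the callback, and a wrapper that stores the handle in a field the callback reads.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are libcurl API. */
typedef int (*socket_cb)(int sock, int what, void *userp);

enum { POLL_REMOVE = 4 };

struct fake_multi {
    socket_cb cb;
    void *userp;
    int open_sockets;     /* sockets the "library" still tracks */
};

/* Application-side object whose callback reads the stored handle. */
struct wrapper {
    struct fake_multi *handle;
    int null_handle_seen; /* callbacks that found handle == NULL */
    int removals;         /* removals processed with a valid handle */
};

static int on_socket(int sock, int what, void *userp)
{
    struct wrapper *w = userp;
    (void)sock;
    if (w->handle == NULL) {
        /* The resource was released too early; bail out instead of crashing. */
        w->null_handle_seen++;
        return -1;
    }
    if (what == POLL_REMOVE)
        w->removals++;
    return 0;
}

/* Models a cleanup that invokes the callback for every remaining socket. */
static void fake_multi_cleanup(struct fake_multi *m)
{
    while (m->open_sockets > 0) {
        m->open_sockets--;
        m->cb(m->open_sockets, POLL_REMOVE, m->userp);
    }
}

/* Problematic order: the handle is cleared before cleanup runs. */
static void teardown_clear_first(struct wrapper *w)
{
    struct fake_multi *m = w->handle;
    w->handle = NULL;
    fake_multi_cleanup(m);
}

/* Safer order: the handle stays valid until cleanup has returned. */
static void teardown_cleanup_first(struct wrapper *w)
{
    fake_multi_cleanup(w->handle);
    w->handle = NULL;
}
```

With two tracked sockets, `teardown_clear_first` makes every callback see a NULL handle, while `teardown_cleanup_first` lets both removals be processed before the handle is dropped.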
Is that expected? That didn't happen with libcurl 8.9.0. Going through this with lldb I can see the callback is called at Lines 2758 to 2760 in 5a26371
If I delete the line multi.handle = C_NULL at https://github.com/JuliaLang/Downloads.jl/blob/89d3c7dded535a77551e763a437a6d31e4d9bf84/src/Curl/Multi.jl#L28 then the assertion at Line 2719 in 5a26371
|
Yes, it is to be expected. The callback is called a little differently now due to internal changes, but the API is the same and the application should be prepared for a callback there as well. It should also be prepared to be called for an "internal" easy handle that the application did not add itself. |
That's not an assertion unless you run a debug build, that's just a run-time check to prevent the function from working on a handle that is not good. The assertion in a debug build is to help users catch this situation and address it. |
That's what I did to be able to go through this with lldb, yes. I'm not familiar with libcurl nor with
diff --git a/src/Curl/Multi.jl b/src/Curl/Multi.jl
index d2be032..888246b 100644
--- a/src/Curl/Multi.jl
+++ b/src/Curl/Multi.jl
@@ -162,6 +162,7 @@ function socket_callback(
return -1
end
multi = unsafe_pointer_to_objref(multi_p)::Multi
+ multi.handle == C_NULL && return -1
if watcher_p != C_NULL
old_watcher = unsafe_pointer_to_objref(watcher_p)::FDWatcher
@check curl_multi_assign(multi.handle, sock, C_NULL)
at https://github.com/JuliaLang/Downloads.jl/blob/89d3c7dded535a77551e763a437a6d31e4d9bf84/src/Curl/Multi.jl#L164 to return immediately out of the callback if the handle is null. Does this make sense? Is the |
That means the callback signals an error back. I don't think it matters much if you return ok or error since this happens when shutting down a multi handle. |
Thanks. That was my idea, yes.
I see. But the rest of the change (don't proceed if the handle is null) is what you'd expect from an application to handle this situation? |
Not really. The socket update is still valid. If it tells you it removes a socket, that is still done independently of which easy handle it is told for. If your application cares about sockets and what activities to wait for, then it should act on this update as well. |
Unless of course it removes all knowledge of them anyway since it is closing down the multi handle, and then you can of course ignore it. |
Because curl_multi_cleanup may invoke callbacks, we effectively have some circular references going on here. See discussion in curl/curl#14860 Basically what we do is the socket callback libcurl may invoke into a no-op when we detect we're finalizing. The data structures are owned by this object and not by the callbacks, and will be destroyed below. Note that e.g. g_hash_table_unref() may itself invoke callbacks, which is where some data is cleaned up. Signed-off-by: Colin Walters <walters@verbum.org>
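The approach described in this commit message (turn the socket callback into a no-op once finalization has started, then destroy the owned data afterwards) might look roughly like the following. This is a sketch, not the ostree patch: `fetcher`, `fetcher_socket_cb`, `fake_cleanup`, and the numeric "remove" code are hypothetical names, and `fake_cleanup` merely stands in for a library cleanup call that may invoke the callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*socket_cb)(int sock, int what, void *userp);

/* Hypothetical owner of the per-socket state. */
struct fetcher {
    bool finalizing;   /* set once teardown has begun */
    int *watched;      /* per-socket state owned by the fetcher, indexed by sock */
    size_t nwatched;
    int ignored;       /* callbacks skipped during finalization */
};

static int fetcher_socket_cb(int sock, int what, void *userp)
{
    struct fetcher *f = userp;
    if (f->finalizing) {
        /* The owner is tearing everything down itself; do nothing here. */
        f->ignored++;
        return 0;
    }
    if (sock >= 0 && (size_t)sock < f->nwatched)
        f->watched[sock] = what; /* record the requested activity */
    return 0;
}

/* Stand-in for a library cleanup that reports each socket as removed. */
static void fake_cleanup(socket_cb cb, void *userp, int nsockets)
{
    for (int s = 0; s < nsockets; s++)
        cb(s, 4 /* hypothetical "remove" code */, userp);
}

static void fetcher_finalize(struct fetcher *f, int nsockets)
{
    f->finalizing = true;                         /* 1. callbacks become no-ops */
    fake_cleanup(fetcher_socket_cb, f, nsockets); /* 2. cleanup may call back */
    free(f->watched);                             /* 3. owner destroys its data */
    f->watched = NULL;
    f->nwatched = 0;
}
```

The point of the ordering is that the data structures are released by their owner in step 3, so nothing the callback could touch is freed while callbacks may still arrive.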
(Since sometimes it needs to be said: @bagder thanks for all your years of maintaining curl, it is appreciated!)
I haven't dug deep into this yet, but it seems correct to say that there was a behavior change in the semantics of callbacks invoked during I could imagine that some libcurl users depended on freeing data based on those callbacks (especially I did ostreedev/ostree#3307 which fixes this problem, and I know this might appear as an instance of Hyrum's Law but...I'd appreciate it if some consideration was given to reverting #14862 until fixes have had a chance to propagate for ostree (which has quite a few users) - even just 3 months would be very helpful. |
Hmm something I honestly don't understand at all is why libcurl is invoking the socket callback at shutdown time with I guess I'd need to dig through other consumers of this API to see if we're doing something wrong but...at a quick glance e.g. systemd seems to do something very similar.
|
That change was merged to prevent application crashes - and it has already proven to do so. How is this prevention bad for your application? Presumably you don't run debug builds of libcurl in production? |
[ commit afc1a416e6889ed8198a9411ba09b0cb60e977cb ] Upstream issue: curl/curl#14860 Upstream patch: curl/curl@48f61e7
This isn't about debug builds of curl - the change caused I did verify myself (as others did in ostreedev/ostree#3299) that it's that specific curl patch that causes ostree to start hitting an internal assertion error - reverting it on top of 8.10.1 fixes things. To elaborate more, the assertion we're hitting isn't one from curl, it's an internal assertion we have around the association of our sockets. With the updated curl, I still haven't yet done a full analysis of what other projects are doing here - the fix I did I'm pretty confident in, but there may be a more "curl idiomatic" fix. In any case though:
Yes, now we're in an unfortunate situation where a change fixes one app and breaks another. I don't know that anyone is in a position to weigh julia versus ostree objectively 😄 so I won't try...but am I correct in understanding that the julia crash was itself provoked by a refactoring of the ¹ Yes, I also changed that but it's a "shouldn't fail" situation so like many other ones if we had checked it before it certainly would have been an assertion, which would just change the crash scenario for us. |
Because curl_multi_cleanup may invoke callbacks, we effectively have some circular references going on here. See discussion in curl/curl#14860 Basically what we do is the socket callback libcurl may invoke into a no-op when we detect we're finalizing. The data structures are owned by this object and not by the callbacks, and will be destroyed below. Note that e.g. g_hash_table_unref() may itself invoke callbacks, which is where some data is cleaned up. Signed-off-by: Colin Walters <walters@verbum.org> Origin: upstream, 2024.8, commit:4d755a85225ea0a02d4580d088bb8a97138cb040 Bug: ostreedev/ostree#3299 Bug-Debian: https://bugs.debian.org/1082121 Gbp-Pq: Name curl-Make-socket-callback-during-cleanup-into-no-op.patch
I did this
We're upgrading Homebrew's version of curl to 8.10.0. When testing dependents, we found this error from julia:
Backtrace
Complete logs are available here. The referenced error starts here on macOS 14 arm64.
I am unsure if the problem is in curl or in julia, but it seems like the segfault is happening inside Curl_hash_pick.
I expected the following
No segfault. The segfault doesn't occur with curl 8.9.1.
curl/libcurl version
operating system
But note that this occurs on multiple other versions of macOS (macOS 12, 13, and 14 on both Intel and ARM64). We haven't yet been able to verify that this happens on Linux, but I will check this later.