New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid segfault when ptls->safepoint is NULL #27020
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely because you are calling Julia from the wrong thread. It should be fixed properly instead. This change is actually not a step in the right direction.
I don't understand. I'm running Julia from the command line repl. How am I using the wrong thread? |
From the backtrace the thread is started by Python. |
The 3 lines of code I pasted above are the only lines I've executed. Are you saying that after julia calls python, python is calling julia again? There is no julia code in the python library. |
Yes. From the backtrace it's done through gmp. In this case, the simplest fix would be to use this check to skip the gc collection and the throw. |
Hmm, so is my code fix correct or do I need to add additional checks or move it elsewhere, eg, should I move the |
FYI the PR should target master rather than release-0.6. |
I can do that, will it also be backported to 0.6? |
Yes, and also the throw part. |
Depends on whether it has the potential to break existing code. |
And also other related functions (realloc, calloc) |
Ok I'm confused again. What's the throw part? |
@ararslan i'm currently using it with 0.6. I don't know if it works with 0.7. |
Oh, actually, no you shouldn't move this into |
Ok thanks |
@yuyichao updated the PR, let me know if I'm on the right track and what else is needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks correct, though you still didn't put the condition on the throw.
Thanks, I think I got it this time. |
Would it be possible to reduce the underlying issue into a test that we can run on CI to ensure this issue doesn't reoccur in the future? |
can that be a julia file? |
Yes, all of our tests are at the Julia level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I also realized that our calloc
doesn't actually handle sz == 0
.
If by julia file you mean your repro then no. PyCall
is too much a dependency. You could use @threadcall
to write something though it'll still be not very relevant (can easily miss breakage).
I'll try to create a julia repro that doesn't use |
@yuyichao do you want me to handle the sz==0 case as well? I would think that that should return |
You need to trigger a GC on another thread while calling one of these functions. That wouldn't be easy and will be highly system dependent (you can easily have a verion of the code that runs without triggering anything on one machine and OOM another). There's also basically no way to test that throw branch without messing with libc.
Yes that's what I'm talking about. You don't have to add that but it'll be nice. Do note that if you want to fix this one you also need to fix |
I have been trying this: julia> x = BigInt()
julia> @threadcall((:__gmpz_init,:libgmp), Void, (Ptr{BigInt},), &x)
ERROR: unsupported or misplaced expression & The same call works with |
I don't understand. How do you have a merge commit before doing a merge? Or are you saying that the test code merges the branch locally and then runs tests? |
It tests on a commit that would have occurred if the branch was merged. |
Can anyone help me understand why the tests are failing? |
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
On all platforms (oddly with the exception of FreeBSD), the test case you've written triggers a SIGABRT:
I'm not sure it's correct to declare the input type in |
I'm going to try pushing this change rebased onto the current master and see if it helps. |
a6ebb13
to
e4e8e59
Compare
@ararslan I wrote the test that way because it's what I narrowed down the failing code to (ie, the PyCall call to snowflake) |
I've pushed a rebased version of this, so we'll see how CI does. I'm a bit confused about the state of affairs, however: @bluesmoon, it seems that you have put an assert in your code which fails on all platforms and you're asking why CI fails... it fails because of the failing assert. It seems like you're the person in the best position to understand how to fix that. The possible solutions seem to be one of: remove the assertion because it's incorrect or fix the code so the assertion passes. Are you asking for help with the latter? And again, does this patch pass on Julia 0.6? |
Ok, now I'm confused because I just rechecked the diff and there are no assertions in my code. I'll rerun the tests against the latest 0.6 branch and report here when it's done. |
I've now completed running all tests against the release-0.6 branch with my patch applied and this is the result. In short, all the threads tests pass (which is what I've touched) while there are failures in other areas that I'm not familiar with:
|
Do those failures happen without this patch? Presumably not since it’s a release version and has been thoroughly CI’d. But who knows—we certainly want to know of it does. |
Yup, though the failure count is different:
|
Can you make a new PR against the |
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
Added #28105 |
LGTM, just need to get CI passing |
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
This is still an issue (we just hit this with a library using openmp and gmp) and the patch still applies to the current master (except for some minor changes in the testcase). Can someone have a look at this please? I have also tested the patch with release 1.1.1 and all tests seem to pass (except for some definitely unrelated things). |
Please open a new PR against master (OP is also welcome to rebase this one). |
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
There are cases (eg: when __gmpz_init is called from python via PyCall in a secondary thread) where ptls->safepoint is NULL when maybe_collect is called, and *ptls->safepoint causes a NULL pointer dereference and a segfault. This change fixes that case. Included test in test/threads.jl that will reproduce this issue. See JuliaLang#27020 for backtrace and more details
When using the snowflake connector through
PyCall
, we end up with a segfault due toptls->safepoint
beingNULL
when dereferenced viamaybe_collect
. This patch checks for nullness before dereferencing the pointer.To reproduce:
sudo pip3 install --upgrade snowflake-connector-python
The values for
account
,user
andpassword
are not important, but all three must be present. At this point the snowflake connector attempts to connect to the snowflake server, and we get a segfault with the following stack trace:The problem appears to be in
maybe_collect
where it callsjl_gc_safepoint_(ptls)
, which is a macro that tries to dereferenceptls->safepoint
, which at this point isNULL
.I don't know why this value is
NULL
, but wrapping it in a nullcheck avoids the segfault and my code executes correctly after that.