Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults with frequent python calls from thread 1 on multithreaded Julia #201

Open
oschulz opened this issue Jul 17, 2022 · 18 comments
Open

Comments

@oschulz
Copy link

oschulz commented Jul 17, 2022

In a complex application with high-frequency python calls I always run into

signal (11): Segmentation fault
in expression starting at REPL[41]:2
_PyTrash_begin at /opt/conda/conda-bld/python-split_1649141344976/work/Objects/object.c:2125

segfaults when Julia has multiple threads (e.g. JULIA_NUM_THREADS=4). I do make sure to only use PythonCall from thread 1 (and I use asserts to verify that) but somehow that doesn't seem to be enough. When running with JULIA_NUM_THREADS=1, all is fine. Is PythonCall incompatible with multi-threaded Julia in general?

@cjdoris
Copy link
Collaborator

cjdoris commented Jul 18, 2022

I have never really tried it out, but I'm not too surprised.

AFAIK it should work fine provided all the calls into the CPython API occur from the same thread (or are guarded with PyGILState_Ensure) which includes not just user code, but also the __init__ function (which initialises the Python interpreter) and the finalizer for Py (which frees Python objects). It sounds like the init/finalizer functions will need to be modified to allow them to run on other threads even if user code is all on thread 1.

Do you have a MWE I can try?

@oschulz
Copy link
Author

oschulz commented Jul 18, 2022

Do you have a MWE I can try?

At the moment, it's definitely not minimal, but I'll see what I can cook up.

(or are guarded with PyGILState_Ensure)

Oh, how do I do that?

the finalizer for Py (which frees Python objects).

I had a feeling it might be related to that - can I control where the finalizer runs?

@cjdoris
Copy link
Collaborator

cjdoris commented Jul 19, 2022

See #202 (comment)

If you try out that branch and put PythonCall.C.gc_disable() and PythonCall.C.gc_enable() around the threaded part of your code, maybe it will work?

@oschulz
Copy link
Author

oschulz commented Jul 19, 2022

around the threaded part of your code, maybe it will work?

The problem is, Python is calling an MCMC sampler written in Julia, which then calls back to evaluate the actual target function in Python. So I'd have to disable Python GC for a very long time, memory will overflow. The Julia code itself does container multithreaded operations, though I've made sure that the callbacks to Python are always originating from thread 1. Other parts of the Julia code (not calling back into Python) will use threads, but from Python, only a single algorithm is called, I I can't "inject" gc-disable/enables into the Julia code in lots of places.

@cjdoris
Copy link
Collaborator

cjdoris commented Aug 20, 2022

I've just released a new version including these GC functions, they are now called PythonCall.GC.[dis/en]able.

In your case, you should be able to do the whole optimization with GC disabled, but temporarily re-enable it within the callback function.

@oschulz
Copy link
Author

oschulz commented Aug 20, 2022

Thanks a lot @cjdoris , will try it out!

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2023

This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues about to be auto-closed label Sep 8, 2023
@github-actions
Copy link
Contributor

This issue has been closed because it has been stale for 7 days. If it is still
relevant, please re-open it.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 15, 2023
@cjdoris cjdoris reopened this Sep 22, 2023
@cjdoris cjdoris removed the stale Issues about to be auto-closed label Sep 22, 2023
@untoreh
Copy link

untoreh commented Apr 6, 2024

Why not just check in the enable function if the current thread is the main thread and skip GC if that's the case?

@hhaensel
Copy link

I had the same problem and for me, disabling/enabling was not sufficient. I finally ended up with the following code:

using PythonCall
using ThreadPools

macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            $(esc(expr))
        end)
    end
end

function babel_convert(input, informat = "cdxml", outformat = "mol")
    @pythread begin
        ob = pyimport("openbabel.openbabel")
        conv = ob.OBConversion()
        conv.SetInAndOutFormats(informat, outformat)
        
        mol = ob.OBMol()
        conv.ReadString(mol, input)
        
        if ! Bool(mol.Has2D()) && ! Bool(mol.Has3D())
            pgen = ob.OBOp.FindType("gen2D")
            pgen !== nothing && pgen.Do(mol)
        end

        pyconvert(String, conv.WriteString(mol))
    end
end

I also placed this on discourse

@hhaensel
Copy link

@cjdoris Maybe the macro @pythread could be included in PythonCall? - Naming could be different, of course.

Maybe, some of the open segfault issues are related and could be solved by using the macro?

@oschulz
Copy link
Author

oschulz commented Jun 13, 2024

@hhaensel would this workaround also work if the Python code calls back into Julia (the called Julia code would use Julia multithreading).

@hhaensel
Copy link

Not sure, I never used juliacall. But I guess it's safe.

Meanwhile I change @pythread in my own project to look like this, because PythonCall.CG.disable() needs to be called from thread 1 as well. As I can't control, when the processes are spawned, I needed to integrate the GC control in the function call.

macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            PythonCall.GC.disable()
            res = $(esc(expr))
            PythonCall.GC.enable()
            res
        end)
    end
end

Not sure, how expensive that is performancewise. Maybe @cjdoris cann comment.

@oschulz
Copy link
Author

oschulz commented Jun 13, 2024

Any idea why this doesn't seem to affect PyCall.jl? Or does PyCall force everything to run on thread 1 internally?

@hhaensel
Copy link

It does effect PyCall, the following lines crash julia:

using PyCall

for _ in 1:1000
    Threads.@spawn pyimport("os")
end

Also in this case @pythread can help

macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            $(esc(expr))
        end)
    end
end

for _ in 1:1000
    Threads.@spawn @pythread pyimport("os")
end

What I cannot answer is, how GC is organised in PyCall and how you can make sure your variables are not garbage collected.

@oschulz
Copy link
Author

oschulz commented Jun 13, 2024

Thanks!

@cjdoris
Copy link
Collaborator

cjdoris commented Jun 14, 2024

Not sure, I never used juliacall. But I guess it's safe.

Meanwhile I change @pythread in my own project to look like this, because PythonCall.CG.disable() needs to be called from thread 1 as well. As I can't control, when the processes are spawned, I needed to integrate the GC control in the function call.

macro pythread(expr)
    quote
        fetch(@tspawnat 1 begin
            PythonCall.GC.disable()
            res = $(esc(expr))
            PythonCall.GC.enable()
            res
        end)
    end
end

Not sure, how expensive that is performancewise. Maybe @cjdoris cann comment.

No idea about performance but if you do two concurrent @pythread calls then the GC.enable() of one might happen before the other is finished, which will be problematic. You could use a lock to ensure only one such task is running.

@ericphanson
Copy link
Contributor

Why not just check in the enable function if the current thread is the main thread and skip GC if that's the case?

I'm curious about this solution. For me I don't mind if my python code is always behind a lock, but it seems like GC happens at various points that I can't control, and then it segfaults, so I must basically avoid all threading in the whole (mostly julia) program.

Could

if ENABLED[]
check if finalization is legal (e.g. thread 1 if that's required, and with a lock if concurrency must be avoided), and otherwise enqueue? And if it is legal, maybe it could empty the queue too?

It seems like something like that could make things work "automatically", though if the finalizer is never called from thread one, that would be an issue (but there could be function to do so, some gc() = @tspawnat 1 empty_gc_queue()).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants