I'm trying to use PyCall in a Julia solution for various IO tasks (such as handling Parquet files and interacting with various Azure resources). I would like these PyCall tasks to run in the background on one thread, doing IO work through a Python SDK, while I do multithreaded number-crunching in Julia. I'm running into a weird issue, though. With the script below, I get a fatal error if a Threads.@spawn task runs while the @async Python tasks are running, but everything is fine if I use Threads.@threads instead. Is there any reason why @threads works while @spawn doesn't? Is there a way to make this safe with Threads.@spawn? Short of reading its source, we can't know whether an external library uses Threads.@spawn under the hood.
I've been going through the PyCall docs and the existing multithreading issues (a few discuss this, such as #882 and #883), but we need a better understanding of what we can and can't do in Julia while PyCall is doing something. My script puts a lock around every Python call and only runs Python inside @async tasks started from the main thread, so it should always execute on the main thread, as suggested in #882 (I think?). Even so, I get fatal errors on Windows (and segfaults on Linux) when a Threads.@spawn task runs while PyCall is running, but it's fine if I use Threads.@threads.
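To make the thread-placement assumption above concrete, here is a minimal sketch that does not use PyCall at all. It only illustrates that an @async task runs on the thread that scheduled it, while a Threads.@spawn task may run on any available thread. The variable names are made up for illustration, and the sketch assumes Julia was started with more than one thread if you want to see the two ids differ.

```julia
# Record the thread that starts the tasks (the main thread when run as a script).
main_id = Threads.threadid()

# @async creates a sticky task: it runs on the thread that scheduled it.
async_task = @async Threads.threadid()

# Threads.@spawn creates a non-sticky task that any available thread may run.
spawn_task = Threads.@spawn Threads.threadid()

async_id = fetch(async_task)
spawn_id = fetch(spawn_task)

println("started on thread ", main_id)
println("@async ran on thread ", async_id)
println("Threads.@spawn ran on thread ", spawn_id, " of ", Threads.nthreads())
```

This only shows where the tasks run. It says nothing about what other threads (for example, one that triggers a garbage collection) may do while the @async task is inside a Python call.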
Anyway, here's the script:
using PyCall

const PY_LOCK = ReentrantLock()
const PY_JSON = pyimport("json")
const PYLOCK = Ref{ReentrantLock}()
PYLOCK[] = ReentrantLock()

# acquire the lock before any code calls Python
pylock(f::Function) = Base.lock(f, PYLOCK[])

function write_json_file(fileName::String, outputData::Dict)
    pylock() do
        open(fileName, "w") do outputFile
            PY_JSON.dump(deepcopy(outputData), outputFile)
        end
    end
    return nothing
end

function file_operation(fileName::String)
    outputData = Dict("a"=>randn(), "b"=>randn())
    write_json_file(fileName, outputData)
    return outputData
end

function multithread_calc(x::Vector)
    y = zeros(Float64, length(x))
    Threads.@threads for ii in eachindex(y)
        y[ii] = log(exp(x[ii]))
    end
    return y
end

function calc_task(x::Vector)
    t = Threads.@spawn log.(exp.(x))
    return fetch(t)
end

calcInput = randn(10000000)
for ii in 1:100
    display(ii)
    file_operation("testfile.json")
    fileTasks = [@async file_operation("testfile$(ii).json") for ii in 1:30]
    #calcResults = multithread_calc(calcInput)
    calcResults = calc_task(calcInput)
    wait.(fileTasks)
end
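The script serializes Python calls with a lock. Another way to keep every call on one thread, sketched here purely as an illustration, is to funnel the work through a queue that a single @async task drains. `JOBS`, `WORKER`, `run_on_worker`, and `fake_python_call` are hypothetical names, and the stand-in job does not call PyCall. This pattern only controls which thread executes the call; I would not assume that it, on its own, prevents the crash reported here.

```julia
# Hypothetical job queue: each entry is a closure plus a channel for its result.
const JOBS = Channel{Tuple{Function,Channel{Any}}}(32)

# A single sticky task drains the queue, so every job runs on the thread
# that started this task (the main thread when run as a script).
const WORKER = @async for (job, reply) in JOBS
    outcome = try
        (true, job())
    catch err
        (false, err)
    end
    put!(reply, outcome)
end

# Submit a job from any task or thread and block until its result is ready.
function run_on_worker(job::Function)
    reply = Channel{Any}(1)
    put!(JOBS, (job, reply))
    ok, value = take!(reply)
    ok || throw(value)   # re-raise errors from the job in the caller
    return value
end

# Stand-in for a Python call; a real version would call into PyCall here.
fake_python_call() = Threads.threadid()

# Even when submitted from a spawned task, the job runs on the worker's thread.
caller = Threads.@spawn run_on_worker(fake_python_call)
ran_on = fetch(caller)
println("job ran on thread ", ran_on)
```

Errors raised inside a job are captured on the worker and rethrown in the caller, so a failing job does not kill the worker task.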
Now, if I modify this script so that it runs the multithreaded calculation with Threads.@threads in multithread_calc(), I get no issue:
calcInput = randn(10000000)
for ii in 1:100
    display(ii)
    file_operation("testfile.json")
    fileTasks = [@async file_operation("testfile$(ii).json") for ii in 1:30]
    calcResults = multithread_calc(calcInput)
    #calcResults = calc_task(calcInput)
    wait.(fileTasks)
end
Otherwise, if I use the calc_task() function, which uses Threads.@spawn, while the Python tasks are running, I get the following error:
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa4e385adb -- PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
in expression starting at g:\My Drive\tests\julia_pylock_json.jl:42
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyLong_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_DecodeUTF8Stateful at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_FromId at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_FinalizeEx at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_FinalizeEx at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_Finalize at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyinit.jl:125
unknown function (ip: 00000000618a7a23)
_atexit at .\initdefs.jl:372
unknown function (ip: 00000000618a7013)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
ijl_atexit_hook at /cygdrive/c/buildbot/worker/package_win64/build/src\init.c:219
ijl_exit at /cygdrive/c/buildbot/worker/package_win64/build/src\jl_uv.c:640
jl_exception_handler at /cygdrive/c/buildbot/worker/package_win64/build/src\signals-win.c:322
__julia_personality at /cygdrive/c/buildbot/worker/package_win64/build/src\win32_ucontext.c:28
_chkstk at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
KiUserExceptionDispatcher at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyLong_New at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicodeWriter_WriteASCIIString at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyUnicode_FromFormatV at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyErr_Format at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_GetBuffer at C:\Users\user\Miniconda3\python39.dll (unknown line)
isbuftype! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pybuffer.jl:134 [inlined]
isbuftype at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pybuffer.jl:148 [inlined]
pysequence_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:759
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:773
#36 at .\none:0 [inlined]
iterate at .\generator.jl:47
unknown function (ip: 000000006189dc50)
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:703
typetuple at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:745
unknown function (ip: 000000006189cd9a)
pysequence_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:754
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:773
pytype_query at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:806 [inlined]
convert at C:\Users\user\.julia\packages\PyCall\ygXW2\src\conversions.jl:831
julia_args at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:18 [inlined]
_pyjlwrap_call at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:24
unknown function (ip: 000000006189cc0a)
pyjlwrap_call at C:\Users\user\.julia\packages\PyCall\ygXW2\src\callback.jl:44
unknown function (ip: 000000006187c398)
PyObject_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
Py_NewReference at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyEval_EvalFrameDefault at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyFunction_Vectorcall at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyVectorcall_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
PyObject_Call at C:\Users\user\Miniconda3\python39.dll (unknown line)
macro expansion at C:\Users\user\.julia\packages\PyCall\ygXW2\src\exception.jl:95 [inlined]
#107 at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
disable_sigint at .\c.jl:473 [inlined]
__pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
_pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
_pycall! at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
unknown function (ip: 000000006188c5f5)
#_#114 at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:730
PyObject at C:\Users\user\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
#2 at g:\My Drive\tests\julia_pylock_json.jl:16484
#open#378 at .\io.jl:384
open at .\io.jl:381 [inlined]
#1 at g:\My Drive\tests\julia_pylock_json.jl:15 [inlined]
lock at .\lock.jl:185
pylock at g:\My Drive\tests\julia_pylock_json.jl:10 [inlined]
write_json_file at g:\My Drive\tests\julia_pylock_json.jl:14 [inlined]
file_operation at g:\My Drive\tests\julia_pylock_json.jl:24
#11 at .\task.jl:484
unknown function (ip: 00000000618a5de3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:931
Allocations: 10783290 (Pool: 10776630; Big: 6660); GC: 14
the-noble-argon changed the title from "Need better docs on understanding PyCall and multithreading" to "Threads.@spawn fatally crashes PyCall @async task, but Threads.@threads works fine" on Sep 17, 2022
the-noble-argon changed the title from "Threads.@spawn fatally crashes PyCall @async task, but Threads.@threads works fine" to "Threads.@spawn can fatally crash PyCall @async task, but Threads.@threads works fine" on Sep 17, 2022
I managed to find a temporary workaround for this issue. I had to modify the pylock() function so that it not only locks and unlocks, but also disables garbage collection while the Python code is running (which is probably not ideal).
pylock(f::Function) = Base.lock(PYLOCK[]) do
    prev_gc = GC.enable(false)
    try
        return f()
    finally
        GC.enable(prev_gc) # recover previous state
    end
end
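As a sanity check on the save-and-restore logic, here is a small sketch without PyCall. It assumes the garbage collector is enabled when the script starts and shows that the previous GC state comes back even when the wrapped call throws. `DEMO_LOCK`, `with_gc_paused`, and `gc_is_enabled` are hypothetical names used only for this illustration.

```julia
const DEMO_LOCK = ReentrantLock()  # hypothetical stand-in for PYLOCK[]

# Same shape as the workaround: hold the lock and pause GC while `f` runs.
function with_gc_paused(f::Function)
    lock(DEMO_LOCK) do
        prev_gc = GC.enable(false)   # returns the previous GC state
        try
            return f()
        finally
            GC.enable(prev_gc)       # restore whatever state was active before
        end
    end
end

# Read the current GC state: GC.enable returns the previous state,
# so set it and then immediately put the old value back.
function gc_is_enabled()
    state = GC.enable(true)
    GC.enable(state)
    return state
end

# State observed while the lock is held.
inside = with_gc_paused(gc_is_enabled)

# Simulate a failing call and confirm the error still reaches the caller.
threw = try
    with_gc_paused(() -> error("simulated Python failure"))
    false
catch
    true
end

# State after the failing call has unwound.
after = gc_is_enabled()
println((inside = inside, threw = threw, after = after))
```

Because the wrapper restores the state it found rather than unconditionally re-enabling GC, nested calls leave the collector paused until the outermost call returns.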
This highlights the importance of #883 in making sure that PyCall tasks can't be corrupted by garbage collections triggered by other threads, because I don't think this drastic kludge of disabling and re-enabling the garbage collector is the kind of solution Julia devs would encourage.
the-noble-argon changed the title from "Threads.@spawn can fatally crash PyCall @async task, but Threads.@threads works fine" to "Threads.@spawn can fatally crash PyCall @async task" on Sep 19, 2022