Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] 0.29.29 regression: Python memory allocator called without holding the GIL #4796

Closed
pitrou opened this issue May 17, 2022 · 3 comments
Closed

Comments

@pitrou
Copy link
Contributor

pitrou commented May 17, 2022

In PyArrow we have two helper functions looking like this:

cdef int check_status(const CStatus& status) nogil except -1

cdef int plasma_check_status(const CStatus& status) nogil except -1:
    if status.ok():
        return 0

    with gil:
        message = frombytes(status.message())
        if IsPlasmaObjectExists(status):
            raise PlasmaObjectExists(message)
        elif IsPlasmaObjectNotFound(status):
            raise PlasmaObjectNotFound(message)
        elif IsPlasmaStoreFull(status):
            raise PlasmaStoreFull(message)

    return check_status(status)

Starting with 0.29.29, Cython generates an incorrect epilog for plasma_check_status:

  /* "pyarrow/_plasma.pyx":275
 * 
 * 
 * cdef int plasma_check_status(const CStatus& status) nogil except -1:             # <<<<<<<<<<<<<<
 *     if status.ok():
 *         return 0
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __Pyx_AddTraceback("pyarrow._plasma.plasma_check_status", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_message);
  return __pyx_r;

Notice that __Pyx_XDECREF is called after releasing the GIL.

Here's the diff between 0.29.28 and 0.29.29 generated code:

   /* function exit code */
-  __pyx_r = 0;
-  goto __pyx_L0;
   __pyx_L1_error:;
   __Pyx_XDECREF(__pyx_t_2);
   __Pyx_XDECREF(__pyx_t_3);
   __Pyx_XDECREF(__pyx_t_4);
   __Pyx_XDECREF(__pyx_t_5);
+  #ifdef WITH_THREAD
+  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
+  #endif
   __Pyx_AddTraceback("pyarrow._plasma.plasma_check_status", __pyx_clineno, __pyx_lineno, __pyx_filename);
   __pyx_r = -1;
-  __pyx_L0:;
-  __Pyx_XDECREF(__pyx_v_message);
   #ifdef WITH_THREAD
   __Pyx_PyGILState_Release(__pyx_gilstate_save);
   #endif
+  __pyx_L0:;
+  __Pyx_XDECREF(__pyx_v_message);
   return __pyx_r;
@pitrou
Copy link
Contributor Author

pitrou commented May 17, 2022

I forgot to mention the symptom is a fatal error:

Fatal Python error: Python memory allocator called without holding the GIL

and the GDB backtrace looks like this:

#0  0x00007f32630e703b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f32630c6859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00000000004a2f15 in ?? ()
#3  0x00000000004a2f31 in Py_FatalError ()
#4  0x00000000004d9cd3 in ?? ()
#5  0x000000000064987f in _PyFaulthandler_Fini ()
#6  0x00000000004a2efe in ?? ()
#7  0x00000000004a2f31 in Py_FatalError ()
#8  0x00000000004d9cd3 in ?? ()
#9  0x00007f321184075c in _Py_DECREF (filename=0x7f321186c000 "/usr/include/python3.8/object.h", lineno=541, op=0x7f3211313e40)
    at /usr/include/python3.8/object.h:478
#10 0x00007f32118407ac in _Py_XDECREF (op=0x7f3211313e40) at /usr/include/python3.8/object.h:541
#11 0x00007f3211844b4d in __pyx_f_7pyarrow_7_plasma_plasma_check_status (__pyx_v_status=...) at _plasma.cpp:6523
#12 0x00007f32118545b8 in __pyx_pf_7pyarrow_7_plasma_4connect (__pyx_self=0x0, __pyx_v_store_socket_name=0x7f32118a28e0, __pyx_v_num_retries=1) at _plasma.cpp:12132
#13 0x00007f32118541a6 in __pyx_pw_7pyarrow_7_plasma_5connect (__pyx_self=0x0, __pyx_args=0x7f32112a1a00, __pyx_kwds=0x7f321153c650) at _plasma.cpp:12051
#14 0x00000000005f3989 in PyCFunction_Call ()

@scoder
Copy link
Contributor

scoder commented May 17, 2022

Thanks for the quick report. I've reverted #4749 in 7fbf5ee in the 0.29.x branch. Could you please verify that this fixes the issue?

@pitrou
Copy link
Contributor Author

pitrou commented May 17, 2022

Thanks for the quick report. I've reverted #4749 in 7fbf5ee in the 0.29.x branch. Could you please verify that this fixes the issue?

It does, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants