[BUG] Should cython object buffers check for NULL? #4858

seberg · 2022-06-22T23:17:57Z

pandas seems to semi-regularly run into issues with NumPy's current rule that object arrays may be filled with NULL at initialization, which means NULL has to be accepted everywhere as having the same meaning as None.
(See also pandas-dev/pandas#47097)

This is a bit weird, since NumPy also fills the array with None almost always, so in the few places where it doesn't it is unexpected!

I have opened numpy/numpy#21817 to solve this in NumPy. The intention would be that NumPy for now should accept NULL, but defines it as incorrect and will never produce it on its own (with some "internal" exceptions).

Now, I am not sure what the best solution is here and if you think that Cython should fix this (or we should do both), I can look into it.
In some sense, a fix in Cython might be best... then pandas can just upgrade its Cython dependency and stop worrying about these oddities.
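To make the "NULL means None" convention concrete, here is a minimal sketch that uses only the standard library. It builds a zero-initialized array of object pointers with `ctypes` as a stand-in for an object buffer whose slots were never filled. The helper `read_slot` is hypothetical and is not part of NumPy, Cython, or pandas; it only illustrates the kind of guard a reader of such a buffer needs.

```python
import ctypes


def read_slot(slots, index):
    """Return slots[index], treating a NULL object pointer as None.

    Hypothetical helper for illustration only.
    """
    try:
        return slots[index]
    except ValueError:
        # ctypes raises ValueError("PyObject is NULL") for a NULL slot.
        return None


# A ctypes array of py_object is zero-initialized, so every slot is NULL.
slots = (ctypes.py_object * 3)()
slots[1] = "hello"  # fill only the middle slot

values = [read_slot(slots, i) for i in range(3)]
print(values)
```

Without the guard, reading slot 0 or 2 raises instead of yielding None. In C there is no such exception, and an unguarded read hands a NULL pointer to code that expects a valid object.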

seberg · 2022-06-23T00:14:37Z

I think these would be the right changes (probably, I did not test them thoroughly):

diff --git a/Cython/Compiler/ExprNodes.py b/Cython/Compiler/ExprNodes.py
index c20a76bd4..19aa787e0 100644
--- a/Cython/Compiler/ExprNodes.py
+++ b/Cython/Compiler/ExprNodes.py
@@ -4578,17 +4578,17 @@ class BufferIndexNode(_IndexingBaseNode):
         buffer_entry, ptrexpr = self.buffer_lookup_code(code)
 
         if self.buffer_type.dtype.is_pyobject:
-            # Must manage refcounts. Decref what is already there
-            # and incref what we put in.
+            # Must manage refcounts. XDecref what is already there
+            # and incref what we put in (NumPy allows there to be NULL)
             ptr = code.funcstate.allocate_temp(buffer_entry.buf_ptr_type,
                                                manage_ref=False)
             rhs_code = rhs.result()
             code.putln("%s = %s;" % (ptr, ptrexpr))
-            code.put_gotref("*%s" % ptr, self.buffer_type.dtype)
-            code.putln("__Pyx_INCREF(%s); __Pyx_DECREF(*%s);" % (
+            code.put_xgotref("*%s" % ptr, self.buffer_type.dtype)
+            code.putln("__Pyx_INCREF(%s); __Pyx_XDECREF(*%s);" % (
                 rhs_code, ptr))
             code.putln("*%s %s= %s;" % (ptr, op, rhs_code))
-            code.put_giveref("*%s" % ptr, self.buffer_type.dtype)
+            code.put_xgiveref("*%s" % ptr, self.buffer_type.dtype)
             code.funcstate.release_temp(ptr)
         else:
             # Simple case
@@ -4609,8 +4609,11 @@ class BufferIndexNode(_IndexingBaseNode):
             # is_temp is True, so must pull out value and incref it.
             # NOTE: object temporary results for nodes are declared
             #       as PyObject *, so we need a cast
-            code.putln("%s = (PyObject *) *%s;" % (self.result(), self.buffer_ptr_code))
-            code.putln("__Pyx_INCREF((PyObject*)%s);" % self.result())
+            res = self.result()
+            code.putln("%s = (PyObject *) *%s;" % (res, self.buffer_ptr_code))
+            # NumPy does (occasionally) allow NULL to denote None.
+            code.putln("if (%s == NULL) %s = Py_None;" % (res, res))
+            code.putln("__Pyx_INCREF((PyObject*)%s);" % res)
 
     def free_subexpr_temps(self, code):
         for temp in self.index_temps:

I had a brief look at the test, but would have to look more to figure out how to get the NULLs in there best.

* BUG: Fortify object buffers against included NULLs

  While NumPy tends not to actively create object buffers initialized only with NULL (rather than filled with None), at least older versions of NumPy did do that, and NumPy itself guards against it. This change guards against embedded NULLs in object buffers by interpreting a NULL as None (and by anticipating a NULL value when setting a buffer item, for reference counting purposes). Closes gh-4858
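The two guards described in this commit message can be modelled in plain Python. This is a toy sketch, not the C code Cython generates: `NULL` is a sentinel object, `refs` is a made-up reference count table, and `incref`, `decref`, `xdecref`, `set_item` and `get_item` are hypothetical names that loosely mirror `__Pyx_INCREF`, `__Pyx_DECREF`, `__Pyx_XDECREF` and the buffer item store and load from the diff.

```python
from collections import Counter


class _Null:
    """Sentinel standing in for a NULL PyObject* (illustration only)."""

    def __repr__(self):
        return "NULL"


NULL = _Null()
refs = Counter()  # toy refcount table keyed by id(obj)


def incref(obj):
    assert obj is not NULL, "INCREF on NULL would crash in C"
    refs[id(obj)] += 1


def decref(obj):
    assert obj is not NULL, "DECREF on NULL would crash in C"
    refs[id(obj)] -= 1


def xdecref(obj):
    # Like Py_XDECREF: silently skip NULL.
    if obj is not NULL:
        decref(obj)


def set_item(slots, index, value):
    # Write path: incref the new value, xdecref whatever was there before.
    incref(value)
    xdecref(slots[index])
    slots[index] = value


def get_item(slots, index):
    # Read path: substitute None for NULL before taking a reference.
    value = slots[index]
    if value is NULL:
        value = None
    incref(value)
    return value


slots = [NULL, NULL]        # buffer that was never filled with None
first, second = object(), object()
set_item(slots, 0, first)   # old value is NULL, so nothing is decref'ed
set_item(slots, 0, second)  # old value is `first`, which is released
```

With the plain `decref` in `set_item` instead of `xdecref`, the first store would trip the assertion, which is the toy equivalent of the crash the patch avoids.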

da-woods · 2022-07-03T07:34:35Z

I guess the follow-up question is: is there an equivalent issue with object memoryviews?

seberg · 2022-07-03T12:53:27Z

Oh, this only fixes np.ndarray and not object[:, :]? Because in that case... yes, and I should look into it.

da-woods · 2022-07-03T12:55:46Z

I believe so (although I haven't actually tested what typed memoryviews do). I'll re-open this issue so it covers them too.

da-woods · 2022-07-03T13:04:51Z

So it looks like this change did affect memoryviews too (which I'm slightly surprised at, but I guess is good). So false alarm, I think.

da-woods · 2022-07-03T13:07:42Z

I'll just create a copy of the tests for memoryviews, and then it can be closed properly.

seberg · 2022-07-06T19:54:07Z

I think we can close this now, thanks! Will try to remember to ping the pandas folks when I see the release, it sounds like they can remove some awful hacks then ;).

seberg mentioned this issue Jun 23, 2022

BUG: Fortify object buffers against included NULLs #4859

Merged

da-woods closed this as completed in #4859 Jul 3, 2022

da-woods added Buffers numpy labels Jul 3, 2022

da-woods added this to the 0.29.31 milestone Jul 3, 2022

da-woods reopened this Jul 3, 2022

da-woods mentioned this issue Jul 3, 2022

Tests for NULL objects in memoryviews #4871

Merged

seberg closed this as completed Jul 6, 2022
