Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Should cython object buffers check for NULL? #4858

Closed
seberg opened this issue Jun 22, 2022 · 7 comments · Fixed by #4859
Closed

[BUG] Should cython object buffers check for NULL? #4858

seberg opened this issue Jun 22, 2022 · 7 comments · Fixed by #4859

Comments

@seberg
Copy link
Contributor

seberg commented Jun 22, 2022

Pandas seems to semi-regularly run into issues with NumPy's logic which currently says that object arrays may be filled with NULL at initialization so that NULL is accepted everywhere to have the same meaning as None.
(See also pandas-dev/pandas#47097)

This is a bit weird, since NumPy also fills the array with None almost always, so in the few places where it doesn't it is unexpected!

I have opened numpy/numpy#21817 to solve this in NumPy. The intention would be that NumPy for now should accept NULL, but defines it as incorrect and will never produce it on its own (with some "internal" exceptions).

Now, I am not sure what the best solution is here and if you think that Cython should fix this (or we should do both), I can look into it.
In some sense, a fix in cython might be best... then pandas can just upgrade its Cython dependency and stop worrying about these oddities.

@seberg
Copy link
Contributor Author

seberg commented Jun 23, 2022

I think this would be the right changes (probably, I did not test it thoroughly):

diff --git a/Cython/Compiler/ExprNodes.py b/Cython/Compiler/ExprNodes.py
index c20a76bd4..19aa787e0 100644
--- a/Cython/Compiler/ExprNodes.py
+++ b/Cython/Compiler/ExprNodes.py
@@ -4578,17 +4578,17 @@ class BufferIndexNode(_IndexingBaseNode):
         buffer_entry, ptrexpr = self.buffer_lookup_code(code)
 
         if self.buffer_type.dtype.is_pyobject:
-            # Must manage refcounts. Decref what is already there
-            # and incref what we put in.
+            # Must manage refcounts. XDecref what is already there
+            # and incref what we put in (NumPy allows there to be NULL)
             ptr = code.funcstate.allocate_temp(buffer_entry.buf_ptr_type,
                                                manage_ref=False)
             rhs_code = rhs.result()
             code.putln("%s = %s;" % (ptr, ptrexpr))
-            code.put_gotref("*%s" % ptr, self.buffer_type.dtype)
-            code.putln("__Pyx_INCREF(%s); __Pyx_DECREF(*%s);" % (
+            code.put_xgotref("*%s" % ptr, self.buffer_type.dtype)
+            code.putln("__Pyx_INCREF(%s); __Pyx_XDECREF(*%s);" % (
                 rhs_code, ptr))
             code.putln("*%s %s= %s;" % (ptr, op, rhs_code))
-            code.put_giveref("*%s" % ptr, self.buffer_type.dtype)
+            code.put_xgiveref("*%s" % ptr, self.buffer_type.dtype)
             code.funcstate.release_temp(ptr)
         else:
             # Simple case
@@ -4609,8 +4609,11 @@ class BufferIndexNode(_IndexingBaseNode):
             # is_temp is True, so must pull out value and incref it.
             # NOTE: object temporary results for nodes are declared
             #       as PyObject *, so we need a cast
-            code.putln("%s = (PyObject *) *%s;" % (self.result(), self.buffer_ptr_code))
-            code.putln("__Pyx_INCREF((PyObject*)%s);" % self.result())
+            res = self.result()
+            code.putln("%s = (PyObject *) *%s;" % (res, self.buffer_ptr_code))
+            # NumPy does (occasionally) allow NULL to denote None.
+            code.putln("if (%s == NULL) %s = Py_None;" % (res, res))
+            code.putln("__Pyx_INCREF((PyObject*)%s);" % res)
 
     def free_subexpr_temps(self, code):
         for temp in self.index_temps:

I had a brief look at the test, but would have to look more to figure out how to get the NULLs in there best.

seberg added a commit to seberg/cython that referenced this issue Jun 23, 2022
While NumPy tends to not actively create object buffers initialized
only with NULL (rather than filled with None), at least older versions
of NumPy did do that.  And NumPy guards against this.

This guards against embedded NULLs in object buffers interpreting
a NULL as None (and anticipating a NULL value also when setting
the buffer for reference count purposes).

Closes cythongh-4858
da-woods pushed a commit that referenced this issue Jul 3, 2022
* BUG: Fortify object buffers against included NULLs

While NumPy tends to not actively create object buffers initialized
only with NULL (rather than filled with None), at least older versions
of NumPy did do that.  And NumPy guards against this.

This guards against embedded NULLs in object buffers interpreting
a NULL as None (and anticipating a NULL value also when setting
the buffer for reference count purposes).

Closes gh-4858
da-woods pushed a commit that referenced this issue Jul 3, 2022
* BUG: Fortify object buffers against included NULLs

While NumPy tends to not actively create object buffers initialized
only with NULL (rather than filled with None), at least older versions
of NumPy did do that.  And NumPy guards against this.

This guards against embedded NULLs in object buffers interpreting
a NULL as None (and anticipating a NULL value also when setting
the buffer for reference count purposes).

Closes gh-4858
@da-woods
Copy link
Contributor

da-woods commented Jul 3, 2022

I guess the follow-up question is: is there an equivalent issue with object memoryviews?

@da-woods da-woods added this to the 0.29.31 milestone Jul 3, 2022
@seberg
Copy link
Contributor Author

seberg commented Jul 3, 2022

Oh, this only fixes np.ndaray and not object[:, :]? Because in that case... yes, and I should look into it.

@da-woods
Copy link
Contributor

da-woods commented Jul 3, 2022

I believe so (although I haven't actually tested what typed memoryviews do). I'll re-open this issue and so it covers them too

@da-woods da-woods reopened this Jul 3, 2022
@da-woods
Copy link
Contributor

da-woods commented Jul 3, 2022

So it looks like this did affect memoryviews too (which I'm slightly surprised at but I guess is good). So false alarm I think

@da-woods
Copy link
Contributor

da-woods commented Jul 3, 2022

I'll just create a copy of the tests for memoryviews and then it can be closed properly

@seberg
Copy link
Contributor Author

seberg commented Jul 6, 2022

I think we can close this now, thanks! Will try to remember to ping the pandas folks when I see the release, it sounds like they can remove some awful hacks then ;).

@seberg seberg closed this as completed Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants