Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result when comparing strings in memoryview of numpy recarray #3214

Open
synapticarbors opened this issue Oct 28, 2019 · 2 comments

Comments

@synapticarbors
Copy link

@synapticarbors synapticarbors commented Oct 28, 2019

Testing the latest release of Cython (0.29.13) using both Python2 and Python3 (with language_level "2" and "3", respectively), I believe I found a bug when comparing fields of a memoryview of a numpy recarray that are byte strings. The comparison always results in False. As a minimal example from the python3 code:

import numpy as np
cimport numpy as np

cdef packed struct a_type:
    np.float64_t a1
    char[5] a2

cpdef double test_func(a_type[::1] x):
    cdef:
        int i
        double res

    res = 0.0

    for i in range(1, x.shape[0]):
        print(x[0].a2, x[i].a2, x[0].a2 == x[i].a2, x[i].a2 == b'abcd')
        if x[0].a2 == x[i].a2:
            res += 1.0

    return 0.0

and then running the code:

import numpy as np

from mod1.ftest import test_func

if __name__ == '__main__':
    a = np.recarray(4, dtype=[('a1', np.float64), ('a2', 'S5')])
    a[0] = (9.0, b'abcd')
    a[1] = (8.0, b'abcd')
    a[2] = (8.0, b'xyz')
    a[3] = (8.0, b'abcd')

    print(test_func(a))

I get the following output:

b'abcd' b'abcd' False True
b'abcd' b'xyz' False False
b'abcd' b'abcd' False True
0.0

Comparing the field to a literal byte string works correctly as demonstrated in the print statement. Calling .decode() on the field also corrects the issue.

I've also pulled up an old environment I had laying around with Cython 0.25.2, and I see the issue there as well.

@achernomorov

This comment has been minimized.

Copy link

@achernomorov achernomorov commented Oct 29, 2019

x[0].a2 == x[i].a2
transform to

__pyx_t_11 = __Pyx_PyBool_FromLong(((*((struct __pyx_t_6gh3214_a_type *) ( /* dim=0 */ ((char *) (((struct __pyx_t_6gh3214_a_type *) __pyx_v_x.data) + __pyx_t_9)) ))).a2 == (*((struct __pyx_t_6gh3214_a_type *) ( /* dim=0 */ ((char *) (((struct __p
yx_t_6gh3214_a_type *) __pyx_v_x.data) + __pyx_t_10)) ))).a2));

so we are comparing char[] memory addresses NOT string literals.

I could suggest such a workaround

for i in range(1, x.shape[0]):
        print(x[0].a2, x[i].a2, str(x[0].a2) == str(x[i].a2), x[i].a2 == b'abcd')
        if x[0].a2 == x[i].a2:
            res += 1.0

As i can see it adding type conversion call
__Pyx_PyObject_CallOneArg(((PyObject *)(&PyString_Type)), __pyx_t_10)
but after it's using
PyObject_RichCompare instead of ==.

@synapticarbors

This comment has been minimized.

Copy link
Author

@synapticarbors synapticarbors commented Oct 29, 2019

#1141 seems related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.