Error when passing single numpy record to cython function containing string #2978

Open
synapticarbors opened this issue May 31, 2019 · 0 comments


I ran across a bug today (tested with Cython 0.29.8). If you define a packed struct to match the dtype of a structured array that has a string field, and then pass a single "row" (record) from that array into a Cython cpdef function expecting the matching struct, Cython raises an IndexError whenever the record's string is shorter than the char array. The error does not occur if you operate entirely on the Cython side and never cross the Python/Cython boundary.

The gist below contains a minimal example that demonstrates the bug:
https://gist.github.com/synapticarbors/0f9014084994f384c0713dc27be96006

I'm copying the example below in full:

testlib.pyx

import numpy as np
cimport numpy as np


cdef packed struct foo_type:
    np.float64_t x
    char[4] y
    np.int64_t z


cpdef double get_foo_x(foo_type f):
    return f.x


cpdef double foo_test_struct():
    cdef:
        foo_type f

    f.x = 1.0
    for i, c in enumerate(b'ab'):
        f.y[i] = c
    f.z = 2

    return get_foo_x(f)


cpdef double foo_test_nparray():
    cdef:
        foo_type[:] x

    N = 5
    dtype = [('x', np.float64), ('y', 'S4'), ('z', np.int64)]
    xarr = np.zeros(N, dtype=dtype)

    xarr['x'] = np.arange(N) + 1.0
    xarr['y'] = b'ab'
    xarr['z'] = np.arange(N, 0, -1)

    x = xarr

    return get_foo_x(x[0])

test.py

import numpy as np

import testlib

print('####################')
print('Internally created struct: {} (expected: 1.0)'.format(testlib.foo_test_struct()))
print('####################')
print('Internally created np array: {} (expected: 1.0)'.format(testlib.foo_test_nparray()))
print('####################')
print('####################')
print('Testing np record from python -> cython')
N = 5
dtype = [('x', np.float64), ('y', 'S4'), ('z', np.int64)]
x = np.zeros(N, dtype=dtype)

x['x'] = np.arange(N).astype(np.float64) + 1
x['y'] = b'ab'
x['z'] = np.arange(N, 0, -1)

# Set first record to have a string that takes up the entire 4 elements
x['y'][0] = 'abcd'

# This works
print('Externally created np array w/full string: {} (expected: 1.0)'.format(testlib.get_foo_x(x[0])))

# This crashes
print('Externally created np array w/short string: {} (expected: 2.0)'.format(testlib.get_foo_x(x[1])))

Running test.py gives the following output:

####################
Internally created struct: 1.0 (expected: 1.0)
####################
Internally created np array: 1.0 (expected: 1.0)
####################
####################
Testing np record from python -> cython
Externally created np array w/full string: 1.0 (expected: 1.0)
Traceback (most recent call last):
  File "test.py", line 27, in <module>
    print('Externally created np array w/full string: {} (expected: 2.0)'.format(testlib.get_foo_x(x[1])))
  File "testlib.pyx", line 11, in testlib.get_foo_x
    cpdef double get_foo_x(foo_type f):
  File "stringsource", line 25, in FromPyStructUtility.__pyx_convert__from_py_struct____pyx_t_7testlib_foo_type
  File "stringsource", line 93, in carray.from_py.__Pyx_carray_from_py_char
IndexError: not enough values found during array assignment, expected 4, got 2
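
To make the "expected 4, got 2" in the message concrete, here is a small stdlib-only sketch of the record layout. It assumes that the format string `<d4sq` (little-endian, no padding) matches the packed `foo_type` struct, and it uses `rstrip` to imitate NumPy dropping the trailing null bytes of an `S4` field when the value is read as a Python `bytes` object. The names `LAYOUT`, `raw` and `trimmed` are only for illustration.

```python
import struct

# Assumed packed layout of foo_type: float64 x, char[4] y, int64 z.
LAYOUT = "<d4sq"

# Second record from test.py: x=2.0, y=b'ab', z=4.
raw = struct.pack(LAYOUT, 2.0, b"ab", 4)
x, y, z = struct.unpack(LAYOUT, raw)

print(len(raw))  # 8 + 4 + 8 bytes
print(y)         # the stored field is padded with nulls to 4 bytes

# Imitate reading the field through NumPy: trailing nulls are dropped.
trimmed = y.rstrip(b"\x00")
print(len(trimmed))
```

The stored field always occupies 4 bytes, but the Python-level value handed to the conversion code is only 2 bytes long, which is consistent with the numbers in the IndexError.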

It looks like the source of the problem is the carray.from_py utility code named in the traceback:

@cname("{{cname}}")
cdef int {{cname}}(object o, {{base_type}} *v, Py_ssize_t length) except -1:
    cdef Py_ssize_t i = length
    try:
        i = len(o)
    except (TypeError, OverflowError):
        pass
    if i == length:
        for i, item in enumerate(o):
            if i >= length:
                break
            v[i] = item
        else:
            i += 1  # convert index to length
            if i == length:
                return 0
    PyErr_Format(
        IndexError,
        ("too many values found during array assignment, expected %zd"
         if i >= length else
         "not enough values found during array assignment, expected %zd, got %zd"),
        length, i)

I think that when it iterates over the value for the char array, it only sees the bytes up to the end of the null-terminated string and not the padding that follows, so it concludes there isn't enough data.
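
The length check can be reproduced without Cython. The following is a rough pure-Python mirror of the template above, not the actual utility code: `carray_from_py` is a hypothetical name, a Python list stands in for the C array, and exceptions replace `PyErr_Format`.

```python
def carray_from_py(o, length):
    """Hypothetical pure-Python mirror of the length check in the template."""
    v = [0] * length  # stands in for the C array
    i = length
    try:
        i = len(o)
    except (TypeError, OverflowError):
        pass
    if i == length:
        for i, item in enumerate(o):
            if i >= length:
                break
            v[i] = item
        else:
            i += 1  # convert index to length
            if i == length:
                return v
    if i >= length:
        raise IndexError(
            "too many values found during array assignment, expected %d" % length)
    raise IndexError(
        "not enough values found during array assignment, expected %d, got %d"
        % (length, i))


print(carray_from_py(b"abcd", 4))  # full-length string is accepted

try:
    carray_from_py(b"ab", 4)       # short string fails the len(o) == length check
except IndexError as exc:
    print(exc)

# Padding the value back out to the array length passes the check.
print(carray_from_py(b"ab".ljust(4, b"\x00"), 4))
```

In this sketch any value whose `len()` differs from the array length is rejected before a single element is copied, so a short string can only pass if it is padded to the full length first. Whether padding on the Python side is a practical workaround for records coming from NumPy is an assumption I have not tested.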
