You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When testing round trips of empty arrays between Python and Go using the C Data Interface, I found an issue with the binary and string data type arrays.
The data types: pa.binary(), pa.large_binary(), pa.string(), pa.large_string() all throw an error when calling validate(full=True) after the _import_from_c that contained a null value data buffer:
Traceback (most recent call last):
File "/home/zeroshade/Projects/GitHub/arrow/go/arrow/cdata/test/test_export_to_cgo.py", line 218, in test
b.validate(full=True)
File "pyarrow/array.pxi", line 1501, in pyarrow.lib.Array.validate
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Value data buffer is null
Following up from #14805 clarifying that buffers can be null in a 0-length array. My guess here is rather than the offsets buffer, the issue is the second data buffer which would contain the actual binary/utf-8 data if the array had a length >0. But that's just a theory, I haven't confirmed it.
Component(s)
C++, Python
The text was updated successfully, but these errors were encountered:
Hmm, I'm not sure it's a good idea to relax the validation check after all.
There's no guarantee that a null data pointer on binary arrays would work correctly with all our internal routines.
Rather than that, it sounds safer to turn the null pointer into a non-null 0-size buffer at the C Data Interface boundary.
Would this still be something that needs to be fixed / changed in the PyArrow / C++ libs? Or should the Go lib be changed to not provide a null pointer in this situation?
…14814)
The C data interface may expose null data pointers for zero-sized buffers.
Make sure that all buffer pointers remain non-null internally.
Followup to GH-14805
* Closes: #14875
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…14814)
The C data interface may expose null data pointers for zero-sized buffers.
Make sure that all buffer pointers remain non-null internally.
Followup to GH-14805
* Closes: #14875
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the bug, including details regarding any error messages, version, and platform.
Spinning off from #14814:
When testing round trips of empty arrays between Python and Go using the C Data Interface, I found an issue with the binary and string data type arrays.
The data types:
pa.binary()
,pa.large_binary()
,pa.string()
,pa.large_string()
all throw an error when callingvalidate(full=True)
after the_import_from_c
that contained a null value data buffer:Following up from #14805 clarifying that buffers can be null in a 0-length array. My guess here is rather than the offsets buffer, the issue is the second data buffer which would contain the actual binary/utf-8 data if the array had a length >0. But that's just a theory, I haven't confirmed it.
Component(s)
C++, Python
The text was updated successfully, but these errors were encountered: