ARROW-2141: [Python] Support variable length binary conversion from Pandas #1689
This line looks misindented. Does your editor insert tabs instead of spaces?
oops, let me fix that.
See also the linting failure at https://travis-ci.org/apache/arrow/jobs/348494207#L750
That sounds reasonable to me. Actually, any object supporting the buffer protocol should do.
Thanks @pitrou! Let me fix the formatting issues, and I'll look into supporting bytearrays.
We're probably doing this kind of dance (taking a bytes object and extracting a pointer and size) in other places already. I think it would be nice to factor that out somewhere (see e.g. src/arrow/python/helpers.h). That would also allow supporting bytearray in other places.
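The refactoring suggested above would live in C++ (src/arrow/python/helpers.h), but the idea can be sketched at the Python level. This is a minimal sketch, assuming a hypothetical helper name `as_binary_view`: any object that exposes the buffer protocol (bytes, bytearray, memoryview, array.array) is accepted through `memoryview`, and the caller gets back a contiguous byte view plus its length. Anything else raises instead of being silently skipped.

```python
def as_binary_view(obj):
    """Return (view, nbytes) for any object supporting the buffer protocol.

    Hypothetical helper for illustration only; it mirrors the C-level
    "extract a pointer and a size" step using memoryview.
    """
    try:
        view = memoryview(obj)
    except TypeError:
        # str, int, etc. do not expose the buffer protocol.
        raise TypeError(
            f"expected a bytes-like object, got {type(obj).__name__}"
        ) from None
    if not view.c_contiguous:
        # A binary value must be one contiguous run of bytes.
        raise ValueError("buffer is not C-contiguous")
    # Reinterpret as unsigned bytes so that len(view) == view.nbytes.
    view = view.cast("B")
    return view, view.nbytes
```

With one such helper, bytes and bytearray values go through the same code path, so supporting bytearray elsewhere only requires calling it.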
I have trouble parsing this... If we don't know how to convert this, we should fail, not simply break from the loop, no?
After reading the code a bit more carefully, I understand... though there is still a problem: what if length is larger than kBinaryMemoryLimit?
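The concern can be illustrated with a simplified model of chunked conversion. In this sketch, assuming a byte `limit` parameter that stands in for kBinaryMemoryLimit (its actual value is not reproduced here) and a hypothetical function name `chunk_by_byte_limit`, values accumulate in the current chunk until adding the next one would exceed the limit, at which point a new chunk starts. A single value larger than the limit can never fit in any chunk, so the sketch raises instead of breaking out of the loop.

```python
def chunk_by_byte_limit(values, limit):
    """Split bytes-like values into chunks whose total size is <= limit.

    Hypothetical, simplified model of chunked binary conversion; `limit`
    plays the role of a per-chunk memory cap such as kBinaryMemoryLimit.
    """
    chunks = []
    current, current_size = [], 0
    for value in values:
        length = len(value)
        if length > limit:
            # No chunk can ever hold this value: fail instead of looping.
            raise ValueError(
                f"value of {length} bytes exceeds the per-chunk limit of {limit}"
            )
        if current_size + length > limit:
            # The current chunk is full: close it and start a new one.
            chunks.append(current)
            current, current_size = [], 0
        current.append(value)
        current_size += length
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_by_byte_limit([b"ab", b"cd", b"e"], 4)` returns `[[b"ab", b"cd"], [b"e"]]`, while any single value longer than 4 bytes raises ValueError.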
I moved this JIRA to 0.10.0 so we can give this situation a proper working-over.
No problem holding off on this for a while; it seems there are some issues @pitrou pointed out that need a deeper look.
@wesm and @pitrou, since the issues brought up here did not originate from this PR, do you think this could be merged, with a follow-up JIRA to look at those issues? This is blocking https://issues.apache.org/jira/browse/SPARK-23555, so if possible I'd like to get it resolved.
Can you open the relevant JIRA issue? Also, before this PR is merged, the Travis-CI failures should be fixed.
Branch updated from 130899d to f2c0956.
I opened https://issues.apache.org/jira/browse/ARROW-2380 to cover the issues brought up here.
The remaining Travis failures are in the GLib builds and seem unrelated.
Currently, when performing a `from_pandas` conversion with binary data where the user specifies the type as variable length binary (`pa.binary()`), the type is inferred and a cast from binary to binary is attempted. The cast then fails because the cast kernel does not support binary types. This PR checks whether the user specified a variable length binary type for the conversion and, if so, copies the data directly into a BinaryArray instead of inferring the type and then casting.
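The direct-copy path described above can be sketched in plain Python using Arrow's variable-size binary layout: a validity bitmap, an offsets buffer holding n + 1 int32 values, and one contiguous data buffer. This is a simplified model, not the actual C++ converter. The function name `build_binary_array` is hypothetical, None stands in for a missing pandas value, and validity is kept as a list of booleans rather than a packed bitmap.

```python
import array


def build_binary_array(values):
    """Copy bytes-like values into (validity, offsets, data) buffers.

    Hypothetical, simplified model of converting a pandas object column
    straight to variable length binary, with no type inference or cast.
    """
    validity = []
    # "i" is a C int: 32 bits on common platforms, like Arrow's offsets.
    offsets = array.array("i", [0])
    data = bytearray()
    for value in values:
        if value is None:
            # A null occupies no bytes: the offset simply repeats.
            validity.append(False)
        else:
            # Accept any buffer-protocol object (bytes, bytearray, ...).
            data += memoryview(value).cast("B")
            validity.append(True)
        offsets.append(len(data))
    return validity, offsets, bytes(data)
```

Value i is then `data[offsets[i]:offsets[i + 1]]` whenever `validity[i]` is True. For the input `[b"a", None, bytearray(b"bc")]`, the sketch yields offsets `[0, 1, 1, 3]` and data `b"abc"`.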