-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Create string/binary arrays from iterables #430
Conversation
@@ -23,7 +23,7 @@ | |||
import warnings | |||
|
|||
|
|||
# Generate the nanoarrow_c.pxd file used by the Cython extension | |||
# Generate the nanoarrow_c.pxd file used by the Cython extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was basically just to force the wheels to build 😬
code = ArrowBufferReserve(self._buffer._ptr, bytes_per_element) | ||
if code != NANOARROW_OK: | ||
Error.raise_error("ArrowBufferReserve()", code) | ||
|
||
pack_into(self, self._buffer._ptr.size_bytes, *item) | ||
self._buffer._ptr.size_bytes += bytes_per_element | ||
write(pack(*item)) | ||
n_values += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you rolled this back?
(AFAIK this part was not the reason for the failures?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't a reason for the original failure but was exposed once that was fixed. Exactly what type of object is considered a buffer is different for (maybe just some versions of) PyPy, so Struct.packinto()
threw an error. If we have time we can roll that back in (or figure out if there's some minimum version of PyPy we have to drop to make it work), but at least this gets us back to wheels that build!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, sounds good!
This PR adds support for building string and binary arrays via iterable.
It also cleans up a few parts of #426 that resulted in the wheel builds failing for (at least) PyPy 3.8 and 3.9. We can circle back to the performance of building from iterables (and whether or not
pack_into()
is essential) when all the wheels are building reliably.The "build from iterable" code is now sufficiently complicated that it should be separated out. I did an initial attempt at that for this PR; however, it scrambles things up a bit and is complicated by the interdependence between the functions that sanitize arguments (e.g.,
c_schema()
,c_array()
) and the functions that build from iterable.Currently faster for strings and slightly slower for bytes than pyarrow.