Non-copy getting Arrow array? #872

Closed
Moelf opened this issue Dec 27, 2020 · 6 comments

Comments

@Moelf

Moelf commented Dec 27, 2020

The awkward package provides support for ragged arrays:

arr = [[1,2,3], [], [4,5]]

Obviously NumPy doesn't support this, but Arrow does. Is there a way to interface with such an object so that we don't have to copy?

If there is no general solution, is there a way for me to implement this specifically for this Python package?

@stevengj
Member

Yes, just do pycall(ak.to_arrayset, PyObject, ...), and so on. By passing PyObject to the pycall function, you tell it not to convert/copy to a Julia object.

@Moelf
Author

Moelf commented Dec 27, 2020

Sorry, I wasn't clear about my question. This works, but I can't use the resulting Python Arrow array like a Julia array:

julia> @time ak.to_arrow(arr)
  0.001698 seconds (6 allocations: 288 bytes)
PyObject <pyarrow.lib.StructArray object at 0x7ff0b726ca68>
-- is_valid: all not null
-- child 0 type: list<item: double>
  [
    [
      0.304889,
      0.353568,
      1.23984,
      2.58227,
.....

julia> PyArray(ak.to_arrow(arr))
ERROR: PyError ($(Expr(:escape, :(ccall(#= /home/net3/jiling/.julia/packages/PyCall/BcTLp/src/pybuffer.jl:124 =# @pysym(:PyObject_GetBuffer), Cint, (PyPtr, Ref{PyBuffer}, Cint), o, b, flags))))) <class 'TypeError'>
TypeError("a bytes-like object is required, not 'pyarrow.lib.StructArray'",)

@Moelf
Author

Moelf commented Dec 27, 2020

@stevengj I believe that technically I should be able to do

julia> pycall(ak.to_arrow, Arrow.List, arr)
ERROR: MethodError: Cannot `convert` an object of type
  PyObject to an object of type
  Arrow.List
Closest candidates are:
  convert(::Type{T}, ::T) 

since an Arrow List should have a consistent memory layout across the board. But I'm not sure whether I can make this happen trivially. Maybe it's also a question for @quinnj (if it's possible at all).
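
For illustration, the layout being referred to can be sketched in plain Python. This is only a rough model, assuming the usual Arrow list representation of one flat values buffer plus an offsets buffer; it does not use pyarrow, awkward, or Arrow.jl, and the names `values`, `offsets`, `row`, and `num_rows` are hypothetical.

```python
from array import array

# Ragged data [[1, 2, 3], [], [4, 5]] stored Arrow-style:
# one flat buffer of values plus an offsets buffer with (number of rows + 1) entries.
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
offsets = array("i", [0, 3, 3, 5])

# A memoryview exposes the buffer without copying it.
values_view = memoryview(values)


def num_rows():
    """Number of rows encoded by the offsets buffer."""
    return len(offsets) - 1


def row(i):
    """Return row i as a slice of the flat buffer (a view, not a copy)."""
    return values_view[offsets[i]:offsets[i + 1]]


rows = [row(i).tolist() for i in range(num_rows())]
print(rows)

# Writing to the underlying buffer is visible through an existing row view,
# which shows that the view shares memory with `values`.
last = row(2)
values[3] = 40.0
print(last.tolist())
```

Because every row is just a pair of offsets into a shared buffer, a consumer in another language could in principle wrap the same two buffers without copying, provided it can obtain their addresses and lengths.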

@stevengj
Member

PyObject is all you need here. Arrow.List objects are PyObjects.

@Moelf
Author

Moelf commented Dec 27, 2020

Sorry, the Arrow.List is from Arrow.jl.

What I want is a ragged vector that can be iterated over or mapped by Julia functions without a significant slowdown from calling into libpython.

@stevengj
Member

Then Arrow.jl should define a convert method.
