Reading column_data from RecordBatch in Cython #11523

Closed
darklord1807 opened this issue Oct 22, 2021 · 3 comments

Comments

@darklord1807

Hello,

I am trying to read column data from a `CRecordBatch` object, but compilation fails with: Object of type 'CRecordBatch' has no attribute 'column_data'.

example.pyx

```cython
from pyarrow.lib cimport *
from pyarrow cimport import_pyarrow
import_pyarrow()

def get_array_length(obj):
    cdef shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)
    if record.get() == NULL:
        raise TypeError("not a batch")
    return record.get().column_data(1)
```

test.py

```python
import pyarrow as pa
import example  # the compiled example.pyx extension
batch = pa.RecordBatch.from_pandas(df)
print(example.get_array_length(batch))
```

Full Error:

```
  tree = Parsing.p_module(s, pxd, full_module_name)

Error compiling Cython file:
------------------------------------------------------------
...
    cdef shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)

    if record.get() == NULL:
        raise TypeError("not a batch")

    return record.get().column_data(1)
                      ^
------------------------------------------------------------

example.pyx:15:23: Object of type 'CRecordBatch' has no attribute 'column_data'
Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    ext_modules = cythonize("example.pyx")
  File "/usr/local/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
    cythonize_one(*args)
  File "/usr/local/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: example.pyx
```
@edponce
Contributor

edponce commented Oct 22, 2021

Hi @darklord1807, thanks for reporting that `column_data` is missing from the Cython declaration of `CRecordBatch`. I created ARROW-14449 to track this issue.

@edponce
Contributor

edponce commented Oct 22, 2021

In the meantime, you can still access the column's data (`ArrayData`) by going through `column()` and then `data()`:

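```cython
from pyarrow.lib cimport *

def get_array_length(obj):
    cdef:
        shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)
        shared_ptr[CArray] array
        shared_ptr[CArrayData] data
    if record.get() == NULL:
        raise TypeError("not a batch")
    # column(i) is declared on CRecordBatch and returns the i-th column as an Array
    array = record.get().column(1)
    # Array.data() exposes the underlying ArrayData, whose length field is readable
    data = array.get().data()
    return data.get().length
```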

@emkornfield
Contributor

Closing, since there is now a tracking issue and a posted workaround.
