ARROW-14449: [Python] RecordBatch in Cython is missing column_data method #11527
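Note: Before this change, the following Cython code, which goes through `column()`, compiles:

```cython
# distutils: language = c++
from pyarrow.lib cimport *

# cdef rather than def: a def function cannot return a C++ pointer to Python
cdef shared_ptr[CArrayData] get_array_data(object obj):
    cdef:
        shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)
        shared_ptr[CArray] array
        shared_ptr[CArrayData] data
    # Get column 1 as an Array, then take its underlying ArrayData
    array = record.get().column(1)
    data = array.get().data()
    return data
```

but the following, which asks the batch for the column's `ArrayData` directly, does not, because `column_data()` is not declared on the Cython `CRecordBatch`:

```cython
cdef shared_ptr[CArrayData] get_array_data(object obj):
    cdef:
        shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)
        shared_ptr[CArrayData] data
    # Fetch column 1's ArrayData directly from the batch
    data = record.get().column_data(1)
    return data
```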
cc @jorisvandenbossche Please review this PR when you get a chance. Do you think it is valid to expose …
Quick thought: the code looks fine, but I am wondering how we could "test" something like this. How do we ensure we don't accidentally remove it later on (for example, if someone sees that it is not used internally and cleans it up)? Unless we make it a principle that all public C++ APIs can/should also be exposed in Cython, even if we don't use them ourselves (but that's the broader discussion on the mailing list, I think).
@jorisvandenbossche We can definitely add a test case invoking the …
Why not, but we don't promise to reflect all C++ APIs in the Cython includes. Also, we should certainly not start adding tests for this, IMHO.
FYI, this PR was submitted as a solution to this GH issue.
If we don't want to promise this, and don't want to test this, I am not sure we should actually add it. Without any test, we have no way to prevent someone from removing it again as clean-up in a later PR (since it's not used internally), which would break downstream code. (I know the same holds for existing Cython APIs if we were to stop using one internally, though.)
I think we should close this PR and not add … Also, we should first establish the rules for which C++ APIs should be made available in Cython.
Adds the `column_data()` methods to Cython's `CRecordBatch`, but not to Python's `RecordBatch`, because `ArrayData` is not exposed in Python, only in Cython.
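For reference, a Cython declaration of these methods might look roughly like the excerpt below. It is a sketch, not the exact lines merged in this PR: it assumes the C++ overloads `std::shared_ptr<ArrayData> column_data(int i) const` and `const ArrayDataVector& column_data() const` on `arrow::RecordBatch`, and that it sits in pyarrow's `includes/libarrow.pxd`, where `shared_ptr`, `vector` and `CArrayData` are already in scope.

```cython
# Excerpt (assumed location: pyarrow/includes/libarrow.pxd).
# Other CRecordBatch members and the CArrayData declaration are omitted;
# they are assumed to be declared elsewhere in the same file.
cdef extern from "arrow/api.h" namespace "arrow" nogil:
    cdef cppclass CRecordBatch" arrow::RecordBatch":
        # ArrayData of the i-th column
        shared_ptr[CArrayData] column_data(int i)
        # ArrayData of all columns, as held by the batch
        const vector[shared_ptr[CArrayData]]& column_data()
```

Cython accepts overloaded methods in a `cppclass` declaration, so both C++ overloads can be declared under the same name.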