Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14449: [Python] RecordBatch in Cython is missing column_data method #11527

Conversation

edponce
Copy link
Contributor

@edponce edponce commented Oct 22, 2021

Adds API for column_data() methods to Cython's CRecordBatch but not to Python's RecordBatch because ArrayData is not exposed in Python, only in Cython.

@github-actions
Copy link

@edponce
Copy link
Contributor Author

edponce commented Oct 22, 2021

Note: The CArrayData of a CRecordBatch can still be accessed without column_data() via the data() method of CArray.

def get_array_data(obj):
    cdef:
        shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)
        shared_ptr[CArray] array
        shared_ptr[CArrayData] data
    array = record.get().column(1)
    data = array.get().data()
    return data.get()

but column_data() allows a more direct access

def get_array_data(obj):
    cdef:
        shared_ptr[CRecordBatch] record = pyarrow_unwrap_batch(obj)
        shared_ptr[CArrayData] data
    data = record.get().column_data(1)
    return data.get()

@edponce
Copy link
Contributor Author

edponce commented Oct 22, 2021

cc @jorisvandenbossche Please review this PR when you get a chance. Do you think it is valid to expose column_data() based on previous Note?

@jorisvandenbossche
Copy link
Member

Quick thought: the code looks fine, but I am wondering how could "test" something like this? How do we ensure we don't accidentally remove it later one? (for example if someone sees it and that it is not used internally and cleans it up) Unless we make it a principle that all the public C++ APIs can/should also be exposed in cython, even if we don't use them ourselves (but so that's the broader discussion on the mailing list, I think)

@edponce
Copy link
Contributor Author

edponce commented Oct 28, 2021

@jorisvandenbossche We can definitely add a test case invoking the column_data() methods. I understand that the discussion in ML is important and is still unresolved, but my reasoning for allowing the column_data() method is because the Array type already has a data() method that returns the CArrayData so this PR would be giving the same access to RecordBatch in a more concise manner.

@pitrou
Copy link
Member

pitrou commented Nov 3, 2021

Why not, but we don't promise to reflect all C++ APIs in the Cython includes.

Also, we should certainly not start adding tests for this, IMHO.

@edponce
Copy link
Contributor Author

edponce commented Nov 3, 2021

FYI, this PR was submitted as a solution to this GH issue.

@jorisvandenbossche
Copy link
Member

Why not, but we don't promise to reflect all C++ APIs in the Cython includes.

Also, we should certainly not start adding tests for this, IMHO.

If we don't want to promise this, and don't want to test this, I am not sure if we should actually add it. Because without any test, we have no way to prevent that in a next PR someone might remove it again as clean-up (since it's not used internally), breaking downstream code.

(I know this is the same for existing cython APIs as well, if we would stop using one internally, though)

@edponce
Copy link
Contributor Author

edponce commented Nov 15, 2021

I think we should close this PR and not add column_data() method to CRecordBatch unless more users or other code require it. As shown in examples above, there exists a way to access the CArrayData without the need of column_data().

Also, we should first establish the rules for which C++ API should be made available in Cython.

@edponce edponce closed this Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants