[MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch #41803

Closed
sgilmore10 opened this issue May 23, 2024 · 1 comment

Comments

@sgilmore10

Describe the enhancement requested

Now that #41656 has been closed, we should add MATLAB APIs for importing/exporting arrow.tabular.RecordBatches using the C Data Interface format.

The C Data Interface format import/export workflows would look like this:

Import into MATLAB

cArray = arrow.c.Array
cSchema = arrow.c.Schema
.
.
. 
% Pass cArray and cSchema to the export APIs of another Arrow language binding to fill in C struct details
.
.
.
% Import Arrow RecordBatch from pre-populated C Data interface format C structs
rb = arrow.tabular.RecordBatch.importFromC(cArray, cSchema);

Export from MATLAB

.
.
. 
% Create C Data Interface format ArrowArray and ArrowSchema C structs using APIs of another Arrow language binding ...
.
.
.
rb = arrow.recordBatch(table((1:10)'));
% Export Arrow RecordBatch from MATLAB to C Data Interface format and fill in C struct details
rb.exportToC(cArrayAddress, cSchemaAddress)
.
.
.
% Import Arrow RecordBatch from pre-populated C Data Interface format C structs using APIs of another Arrow language binding 

We can implement this functionality using the C Data Interface format C++ APIs defined in https://github.com/apache/arrow/blob/main/cpp/src/arrow/c/bridge.h.
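
For readers unfamiliar with the "C structs" mentioned above, here is a minimal sketch in plain C of how a producer fills in a caller-allocated `ArrowSchema` and how the consumer later releases it. The struct layout and the format strings (`+s` for a struct, `g` for float64) follow the Arrow C Data Interface specification. The functions `export_schema`, `release_schema`, and `demo_roundtrip` are hypothetical names used only for illustration; they are not part of Arrow, `bridge.h`, or the MATLAB interface, and the sketch covers the schema half only, not `ArrowArray`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layout as given in the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Hypothetical release callback for schemas built by export_schema below.
   It releases and frees the children, then marks the struct as released. */
void release_schema(struct ArrowSchema* schema) {
  assert(schema->release != NULL);
  for (int64_t i = 0; i < schema->n_children; ++i) {
    struct ArrowSchema* child = schema->children[i];
    if (child->release != NULL) child->release(child);
    free(child);
  }
  free(schema->children);
  schema->release = NULL; /* a NULL release pointer means "already released" */
}

/* Hypothetical producer: describes a record batch with a single float64
   column named "Var1" by filling in a caller-allocated struct.
   Returns 0 on success, -1 on allocation failure. */
int export_schema(struct ArrowSchema* out) {
  struct ArrowSchema* child = calloc(1, sizeof(*child));
  struct ArrowSchema** children = malloc(sizeof(*children));
  if (child == NULL || children == NULL) {
    free(child);
    free(children);
    return -1;
  }
  child->format = "g"; /* float64 */
  child->name = "Var1";
  child->flags = 2; /* ARROW_FLAG_NULLABLE */
  child->release = release_schema;
  children[0] = child;

  memset(out, 0, sizeof(*out));
  out->format = "+s"; /* a record batch is exported as a struct type */
  out->name = "";
  out->n_children = 1;
  out->children = children;
  out->release = release_schema;
  return 0;
}

/* Hypothetical consumer: allocates the struct, lets the producer fill it in,
   inspects it, then releases it. Returns 0 if everything looked as expected. */
int demo_roundtrip(void) {
  struct ArrowSchema schema;
  if (export_schema(&schema) != 0) return -1;
  int ok = strcmp(schema.format, "+s") == 0 &&
           schema.n_children == 1 &&
           strcmp(schema.children[0]->format, "g") == 0;
  schema.release(&schema);
  return (ok && schema.release == NULL) ? 0 : -1;
}
```

The consumer owns the memory of the top-level struct, while the producer owns everything the struct points to and frees it in the release callback. This split is why the workflows above pass struct addresses across language bindings rather than copying data.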

Component(s)

MATLAB

@sgilmore10 sgilmore10 self-assigned this May 23, 2024
sgilmore10 added a commit that referenced this issue May 28, 2024
…ality for `arrow.tabular.RecordBatch` (#41817)

### Rationale for this change

This pull request adds two new APIs for importing and exporting `arrow.tabular.RecordBatch` instances using the C Data Interface format.

**Example:**
```matlab
>> T = table((1:3)', ["A"; "B"; "C"]);
>> expected = arrow.recordBatch(T)

expected = 

  Arrow RecordBatch with 3 rows and 2 columns:

    Schema:

        Var1: Float64 | Var2: String

    First Row:

        1 | "A"

>> cArray = arrow.c.Array();
>> cSchema = arrow.c.Schema();

% Export the RecordBatch to C Data Interface Format
>> expected.export(cArray.Address, cSchema.Address);

% Import the RecordBatch from C Data Interface Format
>> actual = arrow.tabular.RecordBatch.import(cArray, cSchema)

actual = 

  Arrow RecordBatch with 3 rows and 2 columns:

    Schema:

        Var1: Float64 | Var2: String

    First Row:

        1 | "A"

% The RecordBatch is the same after round-tripping to the C Data Interface format
>> isequal(actual, expected)

ans =

  logical

   1

```
### What changes are included in this PR?

1. Added a new method `arrow.tabular.RecordBatch.export` for exporting `RecordBatch` objects to the C Data Interface format.
2. Added a new static method `arrow.tabular.RecordBatch.import` for importing `RecordBatch` objects from the C Data Interface format.
3. Added a new internal class `arrow.c.internal.RecordBatchImporter` for importing `RecordBatch` objects from the C Data Interface format.

### Are these changes tested?

Yes.

1. Added a new test file `matlab/test/arrow/c/tRoundtripRecordBatch.m` which has basic round-trip tests for importing and exporting `RecordBatch` objects.

### Are there any user-facing changes?

Yes.

1. Two new user-facing methods were added to `arrow.tabular.RecordBatch`. The first is `arrow.tabular.RecordBatch.export(cArrowArrayAddress, cArrowSchemaAddress)`. The second is `arrow.tabular.RecordBatch.import(cArray, cSchema)`. These APIs can be used to export/import `RecordBatch` objects using the C Data Interface format.

### Future Directions

1. Add integration tests for sharing data between MATLAB/mlarrow and Python/pyarrow running in the same process using the [MATLAB interface to Python](https://www.mathworks.com/help/matlab/call-python-libraries.html).
2. Add support for the Arrow [C stream interface format](https://arrow.apache.org/docs/format/CStreamInterface.html).

### Notes

1. Thanks to @kevingurney for the help with this feature!
* GitHub Issue: #41803

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sarah Gilmore <sgilmore@mathworks.com>
@sgilmore10 sgilmore10 added this to the 17.0.0 milestone May 28, 2024
@sgilmore10

Issue resolved by pull request 41817
#41817
