GH-33854: [MATLAB] Add basic libmexclass integration code to MATLAB interface #34563

Merged: 57 commits merged into apache:main on Apr 25, 2023


@kevingurney (Member) commented on Mar 14, 2023

Rationale for this change

This pull request is a follow-up to this mailing list discussion about integrating mathworks/libmexclass with the MATLAB Interface to Arrow code base.

We've spent the last few months working on building libmexclass from scratch in order to ease development of the MATLAB Interface to Arrow. libmexclass essentially provides a way to connect MATLAB classes with corresponding C++ classes using an approach inspired by the Proxy Design Pattern.

Our hope is that using libmexclass will enable us to more easily build out an object-oriented MATLAB Interface to Arrow memory by wrapping corresponding Arrow C++ classes and "proxying" method calls on these MATLAB objects to the underlying Arrow C++ objects.

What changes are included in this PR?

  1. The CMake build system for the MATLAB interface was modified to use libmexclass under the hood. This includes a new build option, MATLAB_ARROW_INTERFACE (set with -D MATLAB_ARROW_INTERFACE=ON or OFF), which toggles building the new code that uses libmexclass.
  2. To illustrate the basic usage of libmexclass, we have added one new MATLAB class arrow.array.Float64Array. This class allows users to construct an Arrow array with logical type Float64 from a MATLAB double array with zero data copies. Under the hood, a Proxy wraps and bounds the lifetime of the underlying Arrow C++ Float64Array object. In addition, this Proxy is responsible for delegating method calls on an arrow.array.Float64Array to the corresponding Arrow C++ Float64Array.
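The Proxy arrangement described above can be illustrated with a small C++ sketch using only the standard library. Everything in it is hypothetical and is not the actual API of libmexclass or Arrow C++: the names `Float64Values` and `Float64ValuesProxy`, and the string-keyed method dispatch. For simplicity, the wrapped object here owns a copy of its values, whereas the PR states that arrow.array.Float64Array is constructed with zero data copies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical wrapped C++ object (stands in for an Arrow C++ array).
class Float64Values {
 public:
  explicit Float64Values(std::vector<double> values) : values_(std::move(values)) {}
  std::size_t Length() const { return values_.size(); }
  double Sum() const { return std::accumulate(values_.begin(), values_.end(), 0.0); }

 private:
  std::vector<double> values_;
};

// Hypothetical proxy: it owns the wrapped object, so that object lives exactly
// as long as the proxy, and it forwards calls that arrive by method name.
class Float64ValuesProxy {
 public:
  explicit Float64ValuesProxy(std::vector<double> values)
      : wrapped_(std::make_unique<Float64Values>(std::move(values))) {
    methods_["Length"] = [this] { return static_cast<double>(wrapped_->Length()); };
    methods_["Sum"] = [this] { return wrapped_->Sum(); };
  }

  // The registered lambdas capture `this`, so copying or moving is disabled.
  Float64ValuesProxy(const Float64ValuesProxy&) = delete;
  Float64ValuesProxy& operator=(const Float64ValuesProxy&) = delete;

  // Looks up the named method and delegates to the wrapped object.
  double Invoke(const std::string& method) const {
    const auto it = methods_.find(method);
    if (it == methods_.end()) {
      throw std::invalid_argument("unknown method: " + method);
    }
    return it->second();
  }

 private:
  std::unique_ptr<Float64Values> wrapped_;
  std::unordered_map<std::string, std::function<double()>> methods_;
};
```

For example, `Float64ValuesProxy proxy(std::vector<double>{1.0, 2.0, 3.0});` followed by `proxy.Invoke("Sum")` delegates to `Float64Values::Sum()`. In this sketch, an unregistered method name raises `std::invalid_argument`.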

Are these changes tested?

Yes, these changes have been tested on Linux, macOS, and Windows.

  1. We've modified the MATLAB CI GitHub Actions workflow (.github/workflows/matlab.yml) to build the new arrow.array.Float64Array code using libmexclass. This includes passing -D MATLAB_ARROW_INTERFACE=ON to the cmake command in ci/scripts/matlab_build.sh.
  2. We've added a new basic MATLAB test test/arrow/array/tFloat64Array.m which tests for successful construction of an arrow.array.Float64Array. This test is passing successfully in the MATLAB CI workflow.
  3. We've confirmed that the Dev CI workflow linting checks are all passing and appropriate Apache license headers have been added.
  4. We've manually tested creation, deletion, and assignment of multiple arrow.array.Float64Array instances on Linux, macOS, and Windows with a variety of different MATLAB double arrays.
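The manual checks above exercise object lifetime: creation, deletion, and assignment of multiple instances. One common way to manage that lifetime is an ID-keyed registry, sketched below in C++. The names `ProxyRegistry` and `StoredArray` are hypothetical, and this is not a description of how libmexclass actually stores proxies. The sketch also assumes that assigning a MATLAB object copies only its ID, so both variables reach the same C++ object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical C++ object kept alive behind an integer ID.
struct StoredArray {
  std::vector<double> values;
};

// Hypothetical registry: a MATLAB-side object would hold only the ID, and the
// C++ object stays alive until that ID is destroyed (and no C++ caller still
// holds a shared_ptr to it).
class ProxyRegistry {
 public:
  std::uint64_t Create(std::vector<double> values) {
    const std::uint64_t id = next_id_++;
    proxies_.emplace(id, std::make_shared<StoredArray>(StoredArray{std::move(values)}));
    return id;
  }

  std::shared_ptr<StoredArray> Get(std::uint64_t id) const {
    const auto it = proxies_.find(id);
    if (it == proxies_.end()) throw std::out_of_range("unknown proxy ID");
    return it->second;
  }

  void Destroy(std::uint64_t id) {
    const std::size_t erased = proxies_.erase(id);
    assert(erased == 1 && "Destroy called with an unknown proxy ID");
    (void)erased;  // avoids an unused-variable warning when NDEBUG is defined
  }

  std::size_t Size() const { return proxies_.size(); }

 private:
  std::uint64_t next_id_ = 1;
  std::unordered_map<std::uint64_t, std::shared_ptr<StoredArray>> proxies_;
};
```

Using monotonically increasing IDs means a destroyed ID is never reused in this sketch, so a stale handle fails with `std::out_of_range` instead of silently reaching a different object.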

Are there any user-facing changes?

Yes, there is now a public class named arrow.array.Float64Array that is added to the MATLAB path.

Included below is a simple example of creating two different arrow.array.Float64Array objects in MATLAB:

>> A = arrow.array.Float64Array([1, 2, 3])            

A = 

[
  1,
  2,
  3
]

>> random = arrow.array.Float64Array(rand(1, 10, 100))

random = 

[
  0.6311887342690112,
  0.355073651878849,
  0.9970032716066477,
  0.22417149898312716,
  0.6524510729686149,
  0.6049906419082594,
  0.38724543148313495,
  0.14218715929050407,
  0.025134985710203117,
  0.4211122537652413,
  ...
  0.6228027906591304,
  0.7966246853083961,
  0.74587490154065,
  0.12553623135481973,
  0.8223940067590204,
  0.02515050142850217,
  0.41442888092403163,
  0.7314074679729372,
  0.7813740002759628,
  0.367285915131369
]
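The display above shows the first ten and last ten elements, with `...` in between. Below is a hypothetical C++ helper, `FormatArray`, that produces the same bracketed, one-element-per-line layout. It illustrates the format only and is not Arrow's pretty-printer. It also prints doubles with the default stream precision (six significant digits), not the full-precision values shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Renders values as "[", one indented element per line, then "]".
// If there are more than 2 * window elements, only the first and last
// `window` elements are shown, with a "..." line between them.
template <typename T>
std::string FormatArray(const std::vector<T>& values, std::size_t window = 10) {
  assert(window > 0 && "window must be positive");
  std::ostringstream out;
  const std::size_t n = values.size();
  // Every element except the last one is followed by a comma.
  auto emit = [&](std::size_t i) {
    out << "  " << values[i] << (i + 1 < n ? ",\n" : "\n");
  };
  out << "[\n";
  if (n > 2 * window) {
    for (std::size_t i = 0; i < window; ++i) emit(i);
    out << "  ...\n";
    for (std::size_t i = n - window; i < n; ++i) emit(i);
  } else {
    for (std::size_t i = 0; i < n; ++i) emit(i);
  }
  out << "]";
  return out.str();
}
```

With the default window of 10, `FormatArray` applied to `{1.0, 2.0, 3.0}` yields the same text as the first example above.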

Note: This is an early-stage PR, so the naming scheme arrow.array.<Type>Array might change in the future.

Future Directions

  1. Currently, the "old" featherread/featherwrite code is still being built by CMake and installed to the specified CMAKE_INSTALL_PREFIX. This slows down the build process and complicates the build system logic. In addition, these Feather functions only support reading and writing a subset of Feather V1 files. We should consider disabling building of this legacy code by default or removing it entirely. In the long term, when we have more Arrow types in MATLAB (e.g. arrow.Table, arrow.Schema, arrow.RecordBatch, etc.), we should consider re-implementing this functionality in terms of the new APIs.
  2. We would like to start adding more numeric array classes, such as arrow.array.UInt8Array and arrow.array.Int64Array.
  3. We only added one very basic test for arrow.array.Float64Array in this pull request. We should add a lot more tests as the APIs develop to test things like indexing, copying, slicing, etc.
  4. We don't have any documentation for arrow.array.Float64Array right now. In general, we should start adding detailed documentation for the new APIs as we start to implement them.
  5. Lots more! This is just the beginning of building out the MATLAB Interface to Arrow APIs. We plan on creating GitHub issues for tracking work as we go.

Notes

  1. Creating libmexclass and integrating it with the Arrow code base was a team effort! Thank you to @sreeharihegden, @lafiona, @sgilmore10, @jhughes-mw, and others at @mathworks for their help with this pull request!
  2. Closes: [MATLAB] Add basic libmexclass integration code to MATLAB interface #33854

kevingurney and others added 30 commits March 8, 2023 16:58
CMakeLists.txt.

Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
…libmexclass CMakeLists.txt in the libmexclass/cpp directory.

Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
Co-authored-by: Fiona la <fionala7@gmail.com>
Also, update the register proxy macro used to reflect the libmexclass changes for mathworks/libmexclass#20.
Co-authored-by: Fiona la <fionala7@gmail.com>
…_matlab.

Also, updates to reflect latest libmexclass.
github-actions bot added the "awaiting changes" label on Apr 20, 2023
@kevingurney (Member, Author) commented:
I believe we've addressed all the feedback on this pull request at this point. Thank you for all the helpful suggestions!

The refactored libmexclass code has been integrated and all CI checks are passing.

Review thread on ci/scripts/matlab_build.sh (outdated, resolved)
Review thread on matlab/CMakeLists.txt (resolved)
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Apr 21, 2023
…Y_LIBRARY_INCLUDE_DIRS.

Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Apr 24, 2023
@kevingurney (Member, Author) commented:
It looks like the CI failure for Python / AMD64 macOS 12 Python 3 (pull_request) is unrelated and is failing in main too.

@kou (Member) left a comment:


+1

Thanks!

github-actions bot added the "awaiting merge" label and removed the "awaiting change review" label on Apr 25, 2023
@kou merged commit 9009dd7 into apache:main on Apr 25, 2023
@ursabot commented on Apr 25, 2023:

Benchmark runs are scheduled for baseline = 966a804 and contender = 9009dd7. 9009dd7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️0.77% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.48% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 9009dd76 ec2-t3-xlarge-us-east-2
[Failed] 9009dd76 test-mac-arm
[Finished] 9009dd76 ursa-i9-9960x
[Finished] 9009dd76 ursa-thinkcentre-m75q
[Finished] 966a8040 ec2-t3-xlarge-us-east-2
[Failed] 966a8040 test-mac-arm
[Finished] 966a8040 ursa-i9-9960x
[Finished] 966a8040 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

kou added a commit that referenced this pull request May 9, 2023
…ays (#35479)

### Rationale for this change

This pull request is a follow-up to #34563. To facilitate the implementation of future array types, we would like to first create a C++ template class for numeric arrays. We also want to start adding basic tests for the array functionality in the MATLAB interface.

### What changes are included in this PR?

1. Added a C++ template class called `NumericArray` templated on `CType`.
2. Re-implemented the `Float64Array` C++ proxy class in terms of the new template class, i.e. `NumericArray<double>`.
3. Added a `double()` method to the MATLAB `arrow.array.Float64Array` class that converts the Arrow array to a MATLAB `double` array.
4. Added basic tests for round-tripping float64 arrays.
5. Created a base C++ proxy `Array` class that all proxy array classes will inherit from.
6. Renamed `Print()` to `ToString()` and made it return a string instead of printing to the screen.
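The C++ structure described in items 1, 2, 5, and 6 can be sketched in standard C++ as below. The class names follow the list above (`Array`, `NumericArray<CType>`, `ToString()`), but the members and method bodies are hypothetical. The real proxy classes wrap Arrow C++ arrays and are registered with libmexclass, and neither of those parts is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class shared by all array proxies in this sketch.
class Array {
 public:
  virtual ~Array() = default;
  virtual std::size_t Length() const = 0;
  // Returns the rendering as a string rather than printing it to the screen.
  virtual std::string ToString() const = 0;
};

// One implementation reused for every numeric element type CType.
template <typename CType>
class NumericArray : public Array {
 public:
  explicit NumericArray(std::vector<CType> values) : values_(std::move(values)) {}

  std::size_t Length() const override { return values_.size(); }

  std::string ToString() const override {
    std::ostringstream out;
    out << '[';
    for (std::size_t i = 0; i < values_.size(); ++i) {
      if (i > 0) out << ", ";
      // Unary + promotes 8-bit integers so they print as numbers, not characters.
      out << +values_[i];
    }
    out << ']';
    return out.str();
  }

 private:
  std::vector<CType> values_;
};

// Concrete array types become thin aliases of the template (hypothetical).
using Float64Array = NumericArray<double>;
using Int64Array = NumericArray<std::int64_t>;
using UInt8Array = NumericArray<std::uint8_t>;
```

With this design, supporting another numeric type in the sketch is one `using` line, and code that only needs `Length()` or `ToString()` can hold any array through `std::unique_ptr<Array>`.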

### Are these changes tested?

Yes, we added automated test cases to the test class `tFloat64Array.m`. In addition, we manually qualified these changes on macOS.

### Are there any user-facing changes?
Yes, the `Print()` method is no longer public and there is now a method called `double()` on `arrow.array.Float64Array`. 

Included below is a simple example of using the `double()` method:

```matlab
>> arrowArray = arrow.array.Float64Array([1, 2, 3])            

arrowArray = 

[
  1,
  2,
  3
]

>> matlabArray = double(arrowArray)

matlabArray =

     1
     2
     3

>> class(arrowArray)

ans =

    'arrow.array.Float64Array'

>> class(matlabArray)

ans =

    'double'
```
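The round trip above involves two data movements. Construction wraps the MATLAB data, which #34563 describes as zero-copy, and `double()` produces a new MATLAB `double` array. The C++ sketch below models this under the assumption that the conversion copies. `DoubleArrayView` and `ToVector()` are hypothetical names, not part of the MATLAB interface or Arrow C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical zero-copy view: it stores a pointer to the caller's buffer,
// so the caller must keep that buffer alive for as long as the view is used.
class DoubleArrayView {
 public:
  DoubleArrayView(const double* data, std::size_t length) : data_(data), length_(length) {
    assert(data != nullptr || length == 0);
  }

  const double* Data() const { return data_; }
  std::size_t Length() const { return length_; }

  // Analogue of the double() conversion in this sketch: copies the values
  // into a new, independently owned container.
  std::vector<double> ToVector() const {
    return std::vector<double>(data_, data_ + length_);
  }

 private:
  const double* data_;
  std::size_t length_;
};
```

A round-trip test in this model checks three things: the view points at the original buffer, the values come back unchanged, and the result lives in separate storage.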

### Future Directions

1. Support the rest of the numeric types.
2. Add an abstract MATLAB base class called `arrow.array.Array`.
3. Continue building out the methods (e.g. `length()`).
4. Support `null` values (validity bitmap).
5. Handle converting non-ASCII characters from `UTF-8` to `UTF-16`.
6. Handle errors in the C++ layer. 
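For item 4, Arrow represents nulls with a validity bitmap. Element i is described by bit i % 8 of byte i / 8 (least-significant bit first), and a set bit means the value is valid. The sketch below builds and queries such a bitmap from a mask of validity flags. `MakeValidityBitmap` and `IsValid` are hypothetical helpers, and the sketch omits the buffer alignment and padding that the Arrow format recommends. How a MATLAB-side null mask would be passed in is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs one validity flag per element into bytes: element i maps to
// bit (i % 8) of byte (i / 8), least-significant bit first; 1 means valid.
std::vector<std::uint8_t> MakeValidityBitmap(const std::vector<bool>& valid) {
  std::vector<std::uint8_t> bitmap((valid.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < valid.size(); ++i) {
    if (valid[i]) {
      bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}

// Returns true if element i is marked valid (non-null) in the bitmap.
bool IsValid(const std::vector<std::uint8_t>& bitmap, std::size_t i) {
  assert(i / 8 < bitmap.size());
  return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}
```

For the mask `{true, false, true, true, false, false, false, false, true}`, the first byte is `0x0D` (bits 0, 2, and 3 set) and the second byte is `0x01`.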

* Closes: #35411

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…TLAB interface (apache#34563)

### Rationale for this change

This pull request is a follow-up to [this mailing list discussion](https://lists.apache.org/thread/2kgxbs54dw4wvcwrthzvb1ljqcvnrv7h) about integrating [`mathworks/libmexclass`](https://github.com/mathworks/libmexclass/) with the MATLAB Interface to Arrow code base.

We've spent the last few months working on building `libmexclass` from scratch in order to ease development of the MATLAB Interface to Arrow. `libmexclass` essentially provides a way to connect MATLAB classes with corresponding C++ classes using an approach inspired by the [Proxy Design Pattern](https://en.wikipedia.org/wiki/Proxy_pattern).

Our hope is that using `libmexclass` will enable us to more easily build out an object-oriented MATLAB Interface to Arrow memory by wrapping corresponding Arrow C++ classes and "proxying" method calls on these MATLAB objects to the underlying Arrow C++ objects.

### What changes are included in this PR?

1. The CMake build system for the MATLAB interface was modified to use `libmexclass` under the hood. This includes a new build option, `MATLAB_ARROW_INTERFACE` (set with `-D MATLAB_ARROW_INTERFACE=ON` or `OFF`), which toggles building the new code that uses `libmexclass`.
2. To illustrate the basic usage of `libmexclass`, we have added one new MATLAB class `arrow.array.Float64Array`. This class allows users to construct an Arrow array with logical type `Float64` from a MATLAB `double` array with zero data copies. Under the hood, a `Proxy` wraps and bounds the lifetime of the underlying Arrow C++ `Float64Array` object. In addition, this `Proxy` is responsible for delegating method calls on an `arrow.array.Float64Array` to the corresponding Arrow C++ `Float64Array`.

### Are these changes tested?

Yes, these changes have been tested on Linux, macOS, and Windows.

1. We've modified the MATLAB CI GitHub Actions workflow (`.github/workflows/matlab.yml`) to build the new `arrow.array.Float64Array` code using `libmexclass`. This includes passing `-D MATLAB_ARROW_INTERFACE=ON` to the `cmake` command in `ci/scripts/matlab_build.sh`.
2. We've added a new basic MATLAB test `test/arrow/array/tFloat64Array.m` which tests for successful construction of an `arrow.array.Float64Array`. This [test is passing successfully in the MATLAB CI workflow](https://github.com/mathworks/arrow/actions/runs/4419365852/jobs/7747694543#step:6:50).
3. We've confirmed that the [`Dev` CI workflow linting checks are all passing](https://github.com/mathworks/arrow/actions/runs/4419365845) and appropriate Apache license headers have been added.
4. We've manually tested creation, deletion, and assignment of multiple `arrow.array.Float64Array` instances on Linux, macOS, and Windows with a variety of different MATLAB `double` arrays.

### Are there any user-facing changes?

Yes, there is now a public class named `arrow.array.Float64Array` which is added to the MATLAB Path.

Included below is a simple example of creating two different `arrow.array.Float64Array` objects in MATLAB:

```matlab
>> A = arrow.array.Float64Array([1, 2, 3])            

A = 

[
  1,
  2,
  3
]

>> random = arrow.array.Float64Array(rand(1, 10, 100))

random = 

[
  0.6311887342690112,
  0.355073651878849,
  0.9970032716066477,
  0.22417149898312716,
  0.6524510729686149,
  0.6049906419082594,
  0.38724543148313495,
  0.14218715929050407,
  0.025134985710203117,
  0.4211122537652413,
  ...
  0.6228027906591304,
  0.7966246853083961,
  0.74587490154065,
  0.12553623135481973,
  0.8223940067590204,
  0.02515050142850217,
  0.41442888092403163,
  0.7314074679729372,
  0.7813740002759628,
  0.367285915131369
]

```

**Note**: This is an early stage PR, so the naming scheme `arrow.array.<Type>Array` might change in the future.

### Future Directions

1. Currently, the "old" `featherread`/`featherwrite` code is still being built by CMake and installed to the specified `CMAKE_INSTALL_PREFIX`. This slows down the build process and complicates the build system logic. In addition, these Feather functions only support reading and writing a subset of Feather V1 files. We should consider disabling building of this legacy code by default or removing it entirely. In the long term, when we have more Arrow types in MATLAB (e.g. `arrow.Table`, `arrow.Schema`, `arrow.RecordBatch`, etc.), we should consider re-implementing this functionality in terms of the new APIs.
2. We would like to start adding more numeric array classes like (`arrow.array.UInt8Array`, `arrow.array.Int64Array`, etc.).
3. We only added one very basic test for `arrow.array.Float64Array` in this pull request. We should add a lot more tests as the APIs develop to test things like indexing, copying, slicing, etc.
4. We don't have any documentation for `arrow.array.Float64Array` right now. In general, we should start adding detailed documentation for the new APIs as we start to implement them.
5. Lots more! This is just the beginning of building out the MATLAB Interface to Arrow APIs. We plan on creating GitHub issues for tracking work as we go.

### Notes

1. Creating `libmexclass` and integrating it with the Arrow code base was a team effort! Thank you to @ sreeharihegden, @ lafiona, @ sgilmore10, @ jhughes-mw, and others at @ MathWorks for their help with this pull request!
2. Closes: apache#33854

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Fiona La <fionala7@gmail.com>
Co-authored-by: shegden <shegden@mathworks.com>
Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
Co-authored-by: Fiona la <fionala7@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…ic Arrays (apache#35479)

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…TLAB interface (apache#34563)

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…ic Arrays (apache#35479)

rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023
…TLAB interface (apache#34563)

rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023
…ic Arrays (apache#35479)

@kevingurney deleted the GH-33854 branch on August 21, 2023 17:58
Linked issue: [MATLAB] Add basic libmexclass integration code to MATLAB interface

Participants: 6