[MATLAB] Add basic libmexclass integration code to MATLAB interface #33854

Closed
sreeharihegden opened this issue Jan 24, 2023 · 2 comments · Fixed by #34563

Comments


sreeharihegden commented Jan 24, 2023

Describe the enhancement requested

libmexclass is a MATLAB framework that enables users to implement the functionality of MATLAB classes in terms of equivalent C++ classes using MEX.

This issue is an enhancement request (@kevingurney, @lafiona, @sreeharihegden) to integrate libmexclass with the MATLAB interface for Apache Arrow.

Component(s)

MATLAB

@sreeharihegden changed the title from "Add basic [libmexclass](https://github.com/mathworks/libmexclass/) integration code to MATLAB interface" to "[MATLAB] Add basic libmexclass integration code to MATLAB interface" on Jan 24, 2023
@sreeharihegden

take

@kevingurney

take

kou pushed a commit that referenced this issue Apr 25, 2023
…nterface (#34563)

### Rationale for this change

This pull request is a follow up to [this mailing list discussion](https://lists.apache.org/thread/2kgxbs54dw4wvcwrthzvb1ljqcvnrv7h) about integrating [`mathworks/libmexclass`](https://github.com/mathworks/libmexclass/) with the MATLAB Interface to Arrow code base.

We've spent the last few months working on building `libmexclass` from scratch in order to ease development of the MATLAB Interface to Arrow. `libmexclass` essentially provides a way to connect MATLAB classes with corresponding C++ classes using an approach inspired by the [Proxy Design Pattern](https://en.wikipedia.org/wiki/Proxy_pattern).

Our hope is that using `libmexclass` will enable us to more easily build out an object-oriented MATLAB Interface to Arrow memory by wrapping corresponding Arrow C++ classes and "proxying" method calls on these MATLAB objects to the underlying Arrow C++ objects.
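
The "proxying" idea can be sketched in plain C++. This is only an illustration of the Proxy Design Pattern under stated assumptions, not the `libmexclass` API: `NativeArray`, `ArrayProxy`, and `call` are hypothetical names, and `NativeArray` merely stands in for an Arrow C++ array class.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a wrapped C++ class (not a real Arrow type).
class NativeArray {
public:
    explicit NativeArray(std::vector<double> values) : values_(std::move(values)) {}

    double length() const { return static_cast<double>(values_.size()); }

    double sum() const {
        double total = 0.0;
        for (double v : values_) total += v;
        return total;
    }

private:
    std::vector<double> values_;
};

// Hypothetical proxy: owns the native object and forwards method calls,
// looked up by name, to it. A client-side object (for example a MATLAB
// class) would only ever talk to the proxy.
class ArrayProxy {
public:
    explicit ArrayProxy(std::vector<double> values)
        : native_(std::make_shared<NativeArray>(std::move(values))) {
        // Each entry captures the shared_ptr by value, so copies of the
        // proxy keep referring to the same live native object.
        methods_["length"] = [native = native_] { return native->length(); };
        methods_["sum"] = [native = native_] { return native->sum(); };
    }

    // Delegate a named method call to the wrapped object.
    double call(const std::string& name) const {
        auto it = methods_.find(name);
        if (it == methods_.end()) {
            throw std::invalid_argument("unknown method: " + name);
        }
        return it->second();
    }

private:
    std::shared_ptr<NativeArray> native_;
    std::unordered_map<std::string, std::function<double()>> methods_;
};
```

With this sketch, `ArrayProxy(std::vector<double>{1.0, 2.0, 3.0}).call("sum")` is forwarded to `NativeArray::sum()`, and an unregistered name raises `std::invalid_argument`.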

### What changes are included in this PR?

1. The CMake build system for the MATLAB interface was modified to use `libmexclass`. This includes a new build flag, `-D MATLAB_ARROW_INTERFACE=<ON|OFF>`, which toggles building of the new `libmexclass`-based code.
2. To illustrate the basic usage of `libmexclass`, we have added one new MATLAB class `arrow.array.Float64Array`. This class allows users to construct an Arrow array with logical type `Float64` from a MATLAB `double` array with zero data copies. Under the hood, a `Proxy` wraps and bounds the lifetime of the underlying Arrow C++ `Float64Array` object. In addition, this `Proxy` is responsible for delegating method calls on an `arrow.array.Float64Array` to the corresponding Arrow C++ `Float64Array`.
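
The "zero data copies" and lifetime-bounding behavior described in item 2 can be illustrated with a small C++ sketch. It is an assumption-laden illustration, not the actual implementation: `DoubleView` and `ViewProxy` are hypothetical names, and a `std::shared_ptr` to a `std::vector<double>` stands in for the MATLAB `double` array that backs the Arrow data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical non-owning view: a pointer and a length, no element copy.
struct DoubleView {
    const double* data;
    std::size_t length;
};

// Hypothetical proxy: holds a reference to the source buffer so that the
// view stays valid for exactly as long as the proxy exists.
class ViewProxy {
public:
    explicit ViewProxy(std::shared_ptr<const std::vector<double>> source)
        : source_(std::move(source)), view_{nullptr, 0} {
        assert(source_ != nullptr);  // precondition: a buffer must be supplied
        // Point at the existing storage instead of copying the elements.
        view_ = DoubleView{source_->data(), source_->size()};
    }

    const DoubleView& view() const { return view_; }

private:
    std::shared_ptr<const std::vector<double>> source_;  // keeps the buffer alive
    DoubleView view_;                                    // aliases source_'s storage
};
```

In this sketch the view's `data` pointer is the same address as the source buffer's storage, and the buffer is released only once the last owner, including the proxy, is destroyed.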

### Are these changes tested?

Yes, these changes have been tested on Linux, macOS, and Windows.

1. We've modified the MATLAB CI GitHub Actions workflow (`.github/workflows/matlab.yml`) to build the new `arrow.array.Float64Array` code using `libmexclass`. This includes passing `-D MATLAB_ARROW_INTERFACE=ON` to the `cmake` command call in `ci/scripts/matlab_build.sh`.
2. We've added a new basic MATLAB test `test/arrow/array/tFloat64Array.m` which tests for successful construction of an `arrow.array.Float64Array`. This [test is passing successfully in the MATLAB CI workflow](https://github.com/mathworks/arrow/actions/runs/4419365852/jobs/7747694543#step:6:50).
3. We've confirmed that the [`Dev` CI workflow linting checks are all passing](https://github.com/mathworks/arrow/actions/runs/4419365845) and appropriate Apache license headers have been added.
4. We've manually tested creation, deletion, and assignment of multiple `arrow.array.Float64Array` instances on Linux, macOS, and Windows with a variety of different MATLAB `double` arrays.

### Are there any user-facing changes?

Yes, there is now a public class named `arrow.array.Float64Array` which is added to the MATLAB Path.

Included below is a simple example of creating two different `arrow.array.Float64Array` objects in MATLAB:

```matlab
>> A = arrow.array.Float64Array([1, 2, 3])            

A = 

[
  1,
  2,
  3
]

>> random = arrow.array.Float64Array(rand(1, 10, 100))

random = 

[
  0.6311887342690112,
  0.355073651878849,
  0.9970032716066477,
  0.22417149898312716,
  0.6524510729686149,
  0.6049906419082594,
  0.38724543148313495,
  0.14218715929050407,
  0.025134985710203117,
  0.4211122537652413,
  ...
  0.6228027906591304,
  0.7966246853083961,
  0.74587490154065,
  0.12553623135481973,
  0.8223940067590204,
  0.02515050142850217,
  0.41442888092403163,
  0.7314074679729372,
  0.7813740002759628,
  0.367285915131369
]

```

**Note**: This is an early stage PR, so the naming scheme `arrow.array.<Type>Array` might change in the future.

### Future Directions

1. Currently, the "old" `featherread`/`featherwrite` code is still being built by CMake and installed to the specified `CMAKE_INSTALL_PREFIX`. This slows down the build process and complicates the build system logic. In addition, these Feather functions only support reading and writing a subset of Feather V1 files. We should consider disabling building of this legacy code by default or removing it entirely. In the long term, when we have more Arrow types in MATLAB (e.g. `arrow.Table`, `arrow.Schema`, `arrow.RecordBatch`, etc.), we should consider re-implementing this functionality in terms of the new APIs.
2. We would like to start adding more numeric array classes, such as `arrow.array.UInt8Array`, `arrow.array.Int64Array`, etc.
3. We only added one very basic test for `arrow.array.Float64Array` in this pull request. We should add a lot more tests as the APIs develop to test things like indexing, copying, slicing, etc.
4. We don't have any documentation for `arrow.array.Float64Array` right now. In general, we should start adding detailed documentation for the new APIs as we start to implement them.
5. Lots more! This is just the beginning of building out the MATLAB Interface to Arrow APIs. We plan on creating GitHub issues for tracking work as we go.

### Notes

1. Creating `libmexclass` and integrating it with the Arrow code base was a team effort! Thank you to @ sreeharihegden, @ lafiona, @ sgilmore10, @ jhughes-mw, and others at @ MathWorks for their help with this pull request!
2. Closes: #33854

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Fiona La <fionala7@gmail.com>
Co-authored-by: shegden <shegden@mathworks.com>
Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
Co-authored-by: Fiona la <fionala7@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou added this to the 13.0.0 milestone on Apr 25, 2023
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…TLAB interface (apache#34563)

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…TLAB interface (apache#34563)

rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…TLAB interface (apache#34563)
