GH-38659: [CI][MATLAB][Packaging] Add MATLAB packaging task to crossbow tasks.yml #38660

Merged
merged 80 commits into from
Mar 29, 2024

kevingurney
Member

@kevingurney kevingurney commented Nov 9, 2023

Rationale for this change

Per the following mailing list discussion:

https://lists.apache.org/thread/0xyow40h7b1bptsppb0rxd4g9r1xpmh6

to integrate the MATLAB interface code with the existing Arrow release tooling, we first need to add a task to crossbow's packaging group. This packaging task will automatically create an MLTBX file (the MATLAB equivalent of a Python binary wheel or Ruby gem) that can be installed via a "one-click" workflow in MATLAB. This will enable MATLAB users to install the interface without needing to build from source.

Licensing

For more information about licensing of the MLTBX file contents, please refer to the mailing list discussion and ASF Legal ticket linked below:

  1. https://lists.apache.org/thread/zlpnncgvo6l4cvkxfxn7zt4q7qhptotw
  2. https://issues.apache.org/jira/browse/LEGAL-665

What changes are included in this PR?

  1. Added a matlab task to the packaging group in dev/tasks/tasks.yml.
  2. Added a new GitHub Actions workflow called dev/tasks/matlab/github.yml that builds the MATLAB interface code on all platforms (Windows, macOS, and Ubuntu 20.04) and packages the generated build artifacts into a single MLTBX file using matlab.addons.toolbox.packageToolbox.
  3. Changed the GitHub-hosted runner for the MATLAB CI check (i.e. .github/workflows/matlab.yml) from ubuntu-latest to ubuntu-20.04. We primarily develop and qualify against Debian 11 locally, but the CI check has been building against ubuntu-latest (i.e. ubuntu-22.04), which causes two issues. First, binaries built against the newer GLIBC shipped with ubuntu-22.04 can require GLIBC symbol versions that the older GLIBC shipped with Debian 11 does not provide. This results in a runtime linker error when qualifying the packaged MATLAB interface code locally on Debian 11. Second, code built with the system toolchain on ubuntu-22.04 can require GLIBCXX symbol versions that the libstdc++ bundled with MATLAB R2023a does not provide (this is a relatively common issue; see, e.g., https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found). Previously, we worked around this issue in GitHub Actions by setting LD_PRELOAD before starting MATLAB to run the unit tests. By contrast, the version of GLIBCXX shipped with ubuntu-20.04 is binary compatible with the version bundled with MATLAB R2023a. Therefore, we believe it is better to use ubuntu-20.04 for the MATLAB CI checks until we can qualify the MATLAB interface against ubuntu-22.04.
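One way to see the GLIBC/GLIBCXX problem described in item 3 is to inspect which versioned symbols a built library actually requires. The Python sketch below is illustrative only and is not part of the workflow in this PR. It assumes you have the text output of objdump -T (GNU binutils) for a library such as libarrowproxy.so. From that output it extracts the highest GLIBC_x.y or GLIBCXX_x.y.z version referenced and compares it with the version a target system provides. The helper names and the sample symbol lines are hypothetical.

```python
from __future__ import annotations

import re


def max_required_version(objdump_text: str, prefix: str) -> tuple[int, ...] | None:
    """Return the highest "<prefix>_X.Y[.Z]" symbol version referenced in
    `objdump -T` output, or None if no such versioned symbol appears."""
    # The prefix must be followed directly by "_" and a dotted number, so
    # "GLIBC" does not match "GLIBCXX_3.4.29" and "GLIBC_PRIVATE" is skipped.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"_(\d+(?:\.\d+)+)\b")
    versions = [
        tuple(int(part) for part in match.group(1).split("."))
        for match in pattern.finditer(objdump_text)
    ]
    return max(versions) if versions else None


def is_loadable_with(objdump_text: str, prefix: str, available: tuple[int, ...]) -> bool:
    """True if every <prefix> symbol version the binary needs is <= `available`."""
    required = max_required_version(objdump_text, prefix)
    return required is None or required <= available


# Hypothetical excerpt of `objdump -T` output for a library built on a newer toolchain.
SAMPLE = """\
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) free
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.34) __libc_start_main
0000000000000000      DF *UND*  0000000000000000 (GLIBCXX_3.4.29) example_cxx_symbol
"""

if __name__ == "__main__":
    print(max_required_version(SAMPLE, "GLIBC"))       # (2, 34)
    print(is_loadable_with(SAMPLE, "GLIBC", (2, 31)))  # False: 2.34 > 2.31
```

For a real check, you could pass the stdout of subprocess.run(["objdump", "-T", "libarrowproxy.so"], capture_output=True, text=True) together with (2, 31), the glibc version Debian 11 ships. The result indicates whether the library can be loaded there.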

Are these changes tested?

Yes.

  1. Successfully submitted a crossbow packaging job for the MATLAB interface by commenting @github-actions crossbow submit matlab. Example of a successful packaging job: https://github.com/ursacomputing/crossbow/actions/runs/6893506432/job/18753227453.
  2. Manually installed the resulting MLTBX file on macOS, Windows, Debian 11, and Ubuntu 20.04. Ran all tests under matlab/test using runtests . IncludeSubFolders 1.
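The manual qualification step in item 2 could be scripted roughly as follows. This is a hypothetical Python sketch, not the procedure used in this PR. It assumes a MATLAB release that supports the -batch startup option (R2019a or later), installs the MLTBX with matlab.addons.install, and runs the test suite with runtests and the IncludeSubfolders option. The function names and paths are placeholders.

```python
from __future__ import annotations


def matlab_char(text: str) -> str:
    """Quote `text` as a MATLAB char literal (single quotes are doubled)."""
    return "'" + text.replace("'", "''") + "'"


def build_qualification_command(
    mltbx_path: str, test_dir: str, matlab_exe: str = "matlab"
) -> list[str]:
    """Return an argv list that installs the toolbox and runs every test
    under `test_dir` (including subfolders) in a non-interactive MATLAB session."""
    script = (
        f"matlab.addons.install({matlab_char(mltbx_path)}); "
        f"results = runtests({matlab_char(test_dir)}, 'IncludeSubfolders', true); "
        # A failed assert raises an error, so `matlab -batch` exits non-zero.
        "assert(all([results.Passed]), 'Some MATLAB tests failed.');"
    )
    return [matlab_exe, "-batch", script]


if __name__ == "__main__":
    # Placeholder paths; the command could be run with subprocess.run(cmd, check=True).
    print(build_qualification_command("arrow-matlab.mltbx", "matlab/test"))
```

Returning an argv list instead of a shell string avoids a second layer of shell quoting. Only the MATLAB-level quoting in matlab_char is needed.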

Are there any user-facing changes?

No.

Notes

  1. While qualifying, we discovered that MATLAB's programmatic packaging interface does not properly include symbolic links in the packaged MLTBX file. We've reported this bug to the relevant MathWorks development team. As a temporary workaround, we added a step that uses patchelf (Linux) / install_name_tool (macOS) to change the Arrow C++ library name that libarrowproxy.so/libarrowproxy.dylib depends on. The dependency now points to the real file names libarrow.so.1500.0.0/libarrow.1500.0.0.dylib instead of the symlink names libarrow.so.1500/libarrow.1500.dylib, respectively. Once this bug is resolved, we will remove this step from the workflow.
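The relinking workaround described in the note can be sketched in Python as shown here. This is an illustration built on assumptions, not the workflow step itself. The version suffixes come from the file names above. patchelf --replace-needed and install_name_tool -change are the standard options for rewriting a recorded dependency. The "@rpath/" prefix for the macOS install name is an assumption; you can check the actual name recorded in libarrowproxy.dylib with otool -L. The function name is hypothetical.

```python
from __future__ import annotations

# Version suffixes taken from the library file names in this PR.
SONAME_VERSION = "1500"    # symlink name, e.g. libarrow.so.1500
FULL_VERSION = "1500.0.0"  # real file name, e.g. libarrow.so.1500.0.0


def relink_command(
    platform: str, proxy_lib: str, macos_prefix: str = "@rpath/"
) -> list[str]:
    """Return the command that points `proxy_lib` at the real Arrow C++ library
    file instead of its symlink name. `platform` uses sys.platform values."""
    if platform.startswith("linux"):
        # Rewrite the DT_NEEDED entry recorded in the ELF shared object.
        old = f"libarrow.so.{SONAME_VERSION}"
        new = f"libarrow.so.{FULL_VERSION}"
        return ["patchelf", "--replace-needed", old, new, proxy_lib]
    if platform == "darwin":
        # Rewrite the dependent install name recorded in the Mach-O dylib.
        # The "@rpath/" prefix is an assumption; verify it with `otool -L`.
        old = f"{macos_prefix}libarrow.{SONAME_VERSION}.dylib"
        new = f"{macos_prefix}libarrow.{FULL_VERSION}.dylib"
        return ["install_name_tool", "-change", old, new, proxy_lib]
    raise ValueError(f"no relinking step needed or supported for {platform!r}")
```

On the build host, this might be invoked as subprocess.run(relink_command(sys.platform, path), check=True) after the Arrow C++ libraries have been copied next to libarrowproxy. Windows is rejected because the symlink problem does not arise for arrow.dll.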

Future Directions

  1. Add tooling to upload release candidate (RC) MLTBX files to apache/arrow's GitHub Releases area and mark them as "Prerelease". In other words, modify https://github.com/apache/arrow/blob/main/dev/release/05-binary-upload.sh.
  2. Add a post-release script to upload release MLTBX files to apache/arrow's GitHub Releases area (similar to how https://github.com/apache/arrow/blob/main/dev/release/post-09-python.sh works).
  3. Enable nightly builds for the MATLAB interface.
  4. Document how to qualify a MATLAB Arrow interface release.
  5. Enable building and testing the MATLAB Arrow interface on multiple Ubuntu distributions simultaneously (e.g. 20.04 and 22.04).

@github-actions github-actions bot added the awaiting committer review label Nov 9, 2023
@kevingurney
Member Author

@github-actions crossbow submit matlab

github-actions bot commented Nov 9, 2023

Revision: 54f7715

Submitted crossbow builds: ursacomputing/crossbow @ actions-e85d018cce

Task Status
matlab GitHub Actions

@kevingurney
Member Author

@github-actions crossbow submit matlab

@github-actions github-actions bot added awaiting changes and removed awaiting merge labels Mar 29, 2024
@kevingurney
Member Author

@royfielding - thank you for taking the time to respond to this thread.

First off, I want to emphasize that we in no way intended to waste the time of ASF Legal or of any other ASF community members. Our intent was only to ensure we are doing the right thing for the ASF. The Apache Arrow community and ASF Legal have been extremely patient and supportive throughout this process, and we sincerely appreciate everyone's support.

Perhaps the follow-up question about the "general case" that we asked ASF Legal was not clear enough. I would like to explicitly clarify that our intent was in no way to ask ASF Legal for clarification on the terms of the Apache V2 license itself, nor were we in any way trying to question whether we can rely on the terms of the license.

We were specifically seeking clarification on whether the ASF was OK with the fact that the proprietary MathWorks bits that would be distributed via the ASF are not normally licensed under the Apache V2 license in the "general case". We asked this follow-up question specifically because we wanted to address Kou's earlier comment to the best of our ability. We thought Kou's question was a very reasonable one, and we didn't want to merge in these changes prematurely.

That being said, we appreciate you providing assurance that we can move ahead with merging in these changes.

Thanks again for sharing your expertise.

Best,

Kevin

@kevingurney
Member Author

+1

@kevingurney
Member Author

kevingurney commented Mar 29, 2024

Before merging this in, @sgilmore10 and I reviewed the code changes one last time to make sure there aren't any lingering technical issues, given how many comments this pull request has accumulated since it was opened in November.

We noticed that libarrow.1500.0.0.dylib, libarrow.so.1500.0.0, and arrow.dll are for some reason being copied to fsroot/+arrow as well as fsroot/+libmexclass/+proxy/. They should only be copied once to fsroot/+libmexclass/+proxy/. This is likely a result of a bug in the CMakeLists.txt for the MATLAB bindings.

We are going to take a quick look at this to see if we can push a fix before merging this PR.

Our apologies for the slight delay.

@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Mar 29, 2024
@kevingurney
Member Author

@github-actions crossbow submit matlab

Revision: d3026b6

Submitted crossbow builds: ursacomputing/crossbow @ actions-65852cf029

Task Status
matlab GitHub Actions

@kevingurney
Member Author

@github-actions crossbow submit matlab

Revision: 4c4511b

Submitted crossbow builds: ursacomputing/crossbow @ actions-7a8caf7ce2

Task Status
matlab GitHub Actions

@royfielding

> We were specifically seeking clarification on whether the ASF was OK with the fact that the proprietary MathWorks bits that would be distributed via the ASF are not normally licensed under the Apache V2 license in the "general case". We asked this follow-up question specifically because we wanted to address Kou's earlier comment to the best of our ability. We thought Kou's question was a very reasonable one, and we didn't want to merge in these changes prematurely.

I know, but copyright law is not about bits. It is about copyrightable expressions and their control. A license doesn't change the bits -- it gives permission from the controller, to us and our downstream recipients, to do things that they have the right to control. The same bits can be covered by a hundred different licenses and it doesn't matter. We only need one permission.

@kevingurney
Member Author

@royfielding - thank you again for clarifying.

Based on this discussion, I am going to go ahead and merge in these changes.

@kevingurney
Member Author

OK, the duplicated Arrow C++ shared libraries issue is now fixed.

Here is the list of all files included in the MLTBX file:

./fsroot/featherread.m
./fsroot/LICENSE.txt
./fsroot/+arrow/int32.m
./fsroot/+arrow/field.m
./fsroot/+arrow/schema.m
./fsroot/+arrow/+buffer/Buffer.m
./fsroot/+arrow/+util/createMetadataStruct.m
./fsroot/+arrow/+util/table2mlarrow.m
./fsroot/+arrow/+util/createVariableStruct.m
./fsroot/+arrow/+util/makeValidMATLABTableVariableNames.m
./fsroot/+arrow/int8.m
./fsroot/+arrow/string.m
./fsroot/+arrow/uint64.m
./fsroot/+arrow/uint32.m
./fsroot/+arrow/+type/Time64Type.m
./fsroot/+arrow/+type/StructType.m
./fsroot/+arrow/+type/Time32Type.m
./fsroot/+arrow/+type/BooleanType.m
./fsroot/+arrow/+type/Float64Type.m
./fsroot/+arrow/+type/UInt8Type.m
./fsroot/+arrow/+type/+traits/Int8Traits.m
./fsroot/+arrow/+type/+traits/BooleanTraits.m
./fsroot/+arrow/+type/+traits/ListTraits.m
./fsroot/+arrow/+type/+traits/Int32Traits.m
./fsroot/+arrow/+type/+traits/StringTraits.m
./fsroot/+arrow/+type/+traits/Float32Traits.m
./fsroot/+arrow/+type/+traits/UInt32Traits.m
./fsroot/+arrow/+type/+traits/Date64Traits.m
./fsroot/+arrow/+type/+traits/Float64Traits.m
./fsroot/+arrow/+type/+traits/UInt16Traits.m
./fsroot/+arrow/+type/+traits/StructTraits.m
./fsroot/+arrow/+type/+traits/Int64Traits.m
./fsroot/+arrow/+type/+traits/TypeTraits.m
./fsroot/+arrow/+type/+traits/UInt8Traits.m
./fsroot/+arrow/+type/+traits/Int16Traits.m
./fsroot/+arrow/+type/+traits/traits.m
./fsroot/+arrow/+type/+traits/TimestampTraits.m
./fsroot/+arrow/+type/+traits/Date32Traits.m
./fsroot/+arrow/+type/+traits/UInt64Traits.m
./fsroot/+arrow/+type/+traits/Time32Traits.m
./fsroot/+arrow/+type/+traits/Time64Traits.m
./fsroot/+arrow/+type/TemporalType.m
./fsroot/+arrow/+type/TimeType.m
./fsroot/+arrow/+type/NumericType.m
./fsroot/+arrow/+type/FixedWidthType.m
./fsroot/+arrow/+type/Int64Type.m
./fsroot/+arrow/+type/Date32Type.m
./fsroot/+arrow/+type/ID.m
./fsroot/+arrow/+type/Field.m
./fsroot/+arrow/+type/UInt32Type.m
./fsroot/+arrow/+type/Int16Type.m
./fsroot/+arrow/+type/Float32Type.m
./fsroot/+arrow/+type/StringType.m
./fsroot/+arrow/+type/TimeUnit.m
./fsroot/+arrow/+type/Int32Type.m
./fsroot/+arrow/+type/TimestampType.m
./fsroot/+arrow/+type/Int8Type.m
./fsroot/+arrow/+type/ListType.m
./fsroot/+arrow/+type/DateUnit.m
./fsroot/+arrow/+type/DateType.m
./fsroot/+arrow/+type/Type.m
./fsroot/+arrow/+type/Date64Type.m
./fsroot/+arrow/+type/UInt16Type.m
./fsroot/+arrow/+type/UInt64Type.m
./fsroot/+arrow/uint16.m
./fsroot/+arrow/uint8.m
./fsroot/+arrow/recordBatch.m
./fsroot/+arrow/date64.m
./fsroot/+arrow/table.m
./fsroot/+arrow/time64.m
./fsroot/+arrow/float64.m
./fsroot/+arrow/+internal/+test/+display/makeLinkString.m
./fsroot/+arrow/+internal/+test/+display/verify.m
./fsroot/+arrow/+internal/+test/+display/makeDimensionString.m
./fsroot/+arrow/+internal/+test/+io/+feather/roundtrip.m
./fsroot/+arrow/+internal/+test/+tabular/createTableWithSupportedTypes.m
./fsroot/+arrow/+internal/+test/+tabular/createAllSupportedArrayTypes.m
./fsroot/+arrow/+internal/+validate/+index/numericOrString.m
./fsroot/+arrow/+internal/+validate/+index/string.m
./fsroot/+arrow/+internal/+validate/+index/numeric.m
./fsroot/+arrow/+internal/+validate/shape.m
./fsroot/+arrow/+internal/+validate/realnumeric.m
./fsroot/+arrow/+internal/+validate/parseValid.m
./fsroot/+arrow/+internal/+validate/nonsparse.m
./fsroot/+arrow/+internal/+validate/parseValidElements.m
./fsroot/+arrow/+internal/+validate/+temporal/timeUnit.m
./fsroot/+arrow/+internal/+validate/type.m
./fsroot/+arrow/+internal/+validate/numeric.m
./fsroot/+arrow/+internal/+display/pluralizeStringIfNeeded.m
./fsroot/+arrow/+internal/+display/boldFontIfPossible.m
./fsroot/+arrow/+internal/+io/+feather/Reader.m
./fsroot/+arrow/+internal/+io/+feather/Writer.m
./fsroot/+arrow/+internal/+proxy/validate.m
./fsroot/+arrow/+internal/+proxy/create.m
./fsroot/+arrow/timestamp.m
./fsroot/+arrow/list.m
./fsroot/+arrow/boolean.m
./fsroot/+arrow/int16.m
./fsroot/+arrow/float32.m
./fsroot/+arrow/int64.m
./fsroot/+arrow/date32.m
./fsroot/+arrow/+io/+csv/TableReader.m
./fsroot/+arrow/+io/+csv/TableWriter.m
./fsroot/+arrow/time32.m
./fsroot/+arrow/struct.m
./fsroot/+arrow/+array/Int64Array.m
./fsroot/+arrow/+array/Date64Array.m
./fsroot/+arrow/+array/ValidationMode.m
./fsroot/+arrow/+array/StringArray.m
./fsroot/+arrow/+array/ChunkedArray.m
./fsroot/+arrow/+array/Date32Array.m
./fsroot/+arrow/+array/BooleanArray.m
./fsroot/+arrow/+array/Int32Array.m
./fsroot/+arrow/+array/UInt16Array.m
./fsroot/+arrow/+array/Array.m
./fsroot/+arrow/+array/UInt64Array.m
./fsroot/+arrow/+array/+internal/+list/createValidator.m
./fsroot/+arrow/+array/+internal/+list/TableValidator.m
./fsroot/+arrow/+array/+internal/+list/DatetimeValidator.m
./fsroot/+arrow/+array/+internal/+list/ClassTypeValidator.m
./fsroot/+arrow/+array/+internal/+list/findFirstNonMissingElement.m
./fsroot/+arrow/+array/+internal/+list/Validator.m
./fsroot/+arrow/+array/+internal/+display/getHeader.m
./fsroot/+arrow/+array/+internal/getArrayProxyIDs.m
./fsroot/+arrow/+array/+internal/+temporal/convertDatetimeToEpochTime.m
./fsroot/+arrow/+array/UInt8Array.m
./fsroot/+arrow/+array/UInt32Array.m
./fsroot/+arrow/+array/Int8Array.m
./fsroot/+arrow/+array/NumericArray.m
./fsroot/+arrow/+array/Time64Array.m
./fsroot/+arrow/+array/ListArray.m
./fsroot/+arrow/+array/Float32Array.m
./fsroot/+arrow/+array/Float64Array.m
./fsroot/+arrow/+array/TimestampArray.m
./fsroot/+arrow/+array/Time32Array.m
./fsroot/+arrow/+array/StructArray.m
./fsroot/+arrow/+array/Int16Array.m
./fsroot/+arrow/array.m
./fsroot/+arrow/+tabular/RecordBatch.m
./fsroot/+arrow/+tabular/+internal/decompose.m
./fsroot/+arrow/+tabular/+internal/makeValidVariableNames.m
./fsroot/+arrow/+tabular/+internal/validateColumnNames.m
./fsroot/+arrow/+tabular/+internal/isequal.m
./fsroot/+arrow/+tabular/+internal/makeValidDimensionNames.m
./fsroot/+arrow/+tabular/+internal/validateArrayLengths.m
./fsroot/+arrow/+tabular/+internal/+display/getSchemaString.m
./fsroot/+arrow/+tabular/+internal/+display/getTabularDisplay.m
./fsroot/+arrow/+tabular/+internal/+display/getTabularHeader.m
./fsroot/+arrow/+tabular/Schema.m
./fsroot/+arrow/+tabular/Table.m
./fsroot/featherwrite.m
./fsroot/NOTICE.txt
./fsroot/+libmexclass/+proxy/gateway.mexmaci64
./fsroot/+libmexclass/+proxy/arrowproxy.lib
./fsroot/+libmexclass/+proxy/gateway.lib
./fsroot/+libmexclass/+proxy/libmexclass.dylib
./fsroot/+libmexclass/+proxy/Proxy.m
./fsroot/+libmexclass/+proxy/libarrow.so.1500.0.0
./fsroot/+libmexclass/+proxy/arrow.dll
./fsroot/+libmexclass/+proxy/libarrowproxy.so
./fsroot/+libmexclass/+proxy/libarrowproxy.dylib
./fsroot/+libmexclass/+proxy/libmexclass.so
./fsroot/+libmexclass/+proxy/Identifier.m
./fsroot/+libmexclass/+proxy/arrowproxy.dll
./fsroot/+libmexclass/+proxy/mexclass.lib
./fsroot/+libmexclass/+proxy/gateway.mexa64
./fsroot/+libmexclass/+proxy/libarrow.1500.0.0.dylib
./fsroot/+libmexclass/+proxy/mexclass.dll
./fsroot/+libmexclass/+proxy/gateway.mexw64
./[Content_Types].xml
./_xmlsignatures/sig1.xml
./_xmlsignatures/origin.sigs
./_xmlsignatures/_rels/origin.sigs.rels
./metadata/configuration.xml
./metadata/systemRequirements.xml
./metadata/mwcoreProperties.xml
./metadata/filesystemManifest.xml
./metadata/mwcorePropertiesExtension.xml
./metadata/mwcorePropertiesReleaseInfo.xml
./metadata/addonProperties.xml
./metadata/coreProperties.xml
./_rels/.rels

Note that the Arrow shared libraries are now only included in fsroot/+libmexclass/+proxy/.

We should probably also prevent the .lib files under ./fsroot/+libmexclass/+proxy/ from being included in the MLTBX file. These Windows import libraries are only needed at link time and are not runtime dependencies. We suspect they are included because of a bug in mathworks/libmexclass, somewhere in the implementation of the install_imported_target function defined here: https://github.com/mathworks/libmexclass/blob/main/libmexclass/cpp/CMakeLists.txt#L18.

We can follow up with another PR once we have identified the source of this issue.
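A small post-packaging check could catch both packaging problems discussed in this thread: duplicated Arrow shared libraries and stray .lib files. The Python sketch below is hypothetical and not part of this PR. It assumes that an MLTBX file is a ZIP-based Open Packaging Conventions archive, which the [Content_Types].xml and _rels/.rels entries in the listing above suggest. It also assumes the Arrow shared libraries should live only in fsroot/+libmexclass/+proxy/.

```python
from __future__ import annotations

import zipfile
from pathlib import PurePosixPath

PROXY_DIR = PurePosixPath("fsroot/+libmexclass/+proxy")
# Arrow C++ shared library file names taken from the listing above.
ARROW_SHARED_LIBS = {"libarrow.so.1500.0.0", "libarrow.1500.0.0.dylib", "arrow.dll"}


def audit_entries(names: list[str]) -> list[str]:
    """Return a description of each suspicious archive entry."""
    problems = []
    for name in names:
        # PurePosixPath collapses a leading "./", so "./fsroot/..." == "fsroot/...".
        path = PurePosixPath(name)
        if path.name in ARROW_SHARED_LIBS and path.parent != PROXY_DIR:
            problems.append(f"Arrow shared library outside {PROXY_DIR}: {name}")
        if path.suffix == ".lib":
            # Windows import libraries are only needed at link time.
            problems.append(f"link-time import library packaged: {name}")
    return problems


def audit_mltbx(mltbx_path: str) -> list[str]:
    """Open an MLTBX file (assumed to be a ZIP archive) and audit its entries."""
    with zipfile.ZipFile(mltbx_path) as archive:
        return audit_entries(archive.namelist())
```

Applied to the listing above, audit_entries would flag the three .lib files (arrowproxy.lib, gateway.lib, and mexclass.lib) and no misplaced Arrow shared libraries.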

@kevingurney
Member Author

I've created #40903 to capture the issue with .lib files being included in the MLTBX file.

@kevingurney
Member Author

+1

@kevingurney kevingurney merged commit ce11e56 into apache:main Mar 29, 2024
10 checks passed
@kevingurney kevingurney deleted the GH-38659 branch March 29, 2024 20:57
@kevingurney kevingurney removed the awaiting change review label Mar 29, 2024
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit ce11e56.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
…o crossbow `tasks.yml` (apache#38660)

### Rationale for this change

Per the following mailing list discussion:

https://lists.apache.org/thread/0xyow40h7b1bptsppb0rxd4g9r1xpmh6

to integrate the MATLAB interface code with the existing Arrow release tooling, we first need to add a task to crossbow's [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55). This packaging task will automatically create an [MLTBX file](https://www.mathworks.com/help/matlab/creating-help.html?s_tid=CRUX_lftnav) (the MATLAB equivalent of a Python binary wheel or Ruby gem) that can be installed via a "one-click" workflow in MATLAB. This will enable MATLAB users to install the interface without needing to build from source.

### Licensing

For more information about licensing of the MLTBX file contents, please refer to the mailing list discussion and ASF Legal ticket linked below:

1. https://lists.apache.org/thread/zlpnncgvo6l4cvkxfxn7zt4q7qhptotw
2. https://issues.apache.org/jira/browse/LEGAL-665

### What changes are included in this PR?

1. Added a `matlab` task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in `dev/tasks/tasks.yml`.
2. Added a new GitHub Actions workflow called `dev/tasks/matlab/github.yml` that builds the MATLAB interface code on all platforms (Windows, macOS, and Ubuntu 20.04) and packages the generated build artifacts into a single MLTBX file using [`matlab.addons.toolbox.packageToolbox`](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html).
3. Changed the GitHub-hosted runner for the MATLAB CI check (i.e. `.github/workflows/matlab.yml`) from `ubuntu-latest` to `ubuntu-20.04`. We primarily develop and qualify against Debian 11 locally, but the CI check has been building against `ubuntu-latest` (i.e. `ubuntu-22.04`), which causes two issues. First, binaries built against the newer `GLIBC` shipped with `ubuntu-22.04` can require `GLIBC` symbol versions that the older `GLIBC` shipped with `Debian 11` does not provide. This results in a runtime linker error when qualifying the packaged MATLAB interface code locally on Debian 11. Second, code built with the system toolchain on `ubuntu-22.04` can require `GLIBCXX` symbol versions that the `libstdc++` bundled with MATLAB R2023a does not provide (this is a relatively common issue; see, e.g., https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found). Previously, we worked around this issue in GitHub Actions by setting `LD_PRELOAD` before starting MATLAB to run the unit tests. By contrast, the version of `GLIBCXX` shipped with `ubuntu-20.04` **is** binary compatible with the version bundled with MATLAB R2023a. Therefore, we believe it is better to use `ubuntu-20.04` for the MATLAB CI checks until we can qualify the MATLAB interface against `ubuntu-22.04`.

### Are these changes tested?

Yes.

1. Successfully submitted a crossbow `packaging` job for the MATLAB interface by commenting `@ github-actions crossbow submit matlab`. Example of a successful packaging job: https://github.com/ursacomputing/crossbow/actions/runs/6893506432/job/18753227453.
2. Manually installed the resulting MLTBX file on macOS, Windows, Debian 11, and Ubuntu 20.04. Ran all tests under `matlab/test` using `runtests . IncludeSubFolders 1`.

### Are there any user-facing changes?

No.

### Notes
 
1. While qualifying, we discovered that [MATLAB's programmatic packaging interface](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html) does not properly include symbolic links in the packaged MLTBX file. We've reported this bug to the relevant MathWorks development team. As a temporary workaround, we added a step that uses `patchelf` (Linux) / `install_name_tool` (macOS) to change the Arrow C++ library name that `libarrowproxy.so`/`libarrowproxy.dylib` depends on. The dependency now points to the real file names `libarrow.so.1500.0.0`/`libarrow.1500.0.0.dylib` instead of the symlink names `libarrow.so.1500`/`libarrow.1500.dylib`, respectively. Once this bug is resolved, we will remove this step from the workflow.

### Future Directions
 
1. Add tooling to upload release candidate (RC) MLTBX files to apache/arrow's GitHub Releases area and mark them as "Prerelease". In other words, modify https://github.com/apache/arrow/blob/main/dev/release/05-binary-upload.sh.
2. Add a post-release script to upload release MLTBX files to apache/arrow's GitHub Releases area (similar to how https://github.com/apache/arrow/blob/main/dev/release/post-09-python.sh works).
3. Enable nightly builds for the MATLAB interface.
4. Document how to qualify a MATLAB Arrow interface release.
5. Enable building and testing the MATLAB Arrow interface on multiple Ubuntu distributions simultaneously (e.g. 20.04 *and* 22.04).

* Closes: apache#38659 
* GitHub Issue: apache#38659

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
tolleybot pushed a commit to tmct/arrow that referenced this pull request May 4, 2024
…o crossbow `tasks.yml` (apache#38660)

rok pushed a commit to tmct/arrow that referenced this pull request May 8, 2024
…o crossbow `tasks.yml` (apache#38660)

### Rationale for this change

Per the following mailing list discussion:

https://lists.apache.org/thread/0xyow40h7b1bptsppb0rxd4g9r1xpmh6

to integrate the MATLAB interface code with the existing Arrow release tooling, we first need to add a task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) to crossbow. This packaging task will automatically create a [MLTBX file](https://www.mathworks.com/help/matlab/creating-help.html?s_tid=CRUX_lftnav) (the MATLAB equivalent to a Python binary wheel or Ruby gem) that can be installed via a "one-click" workflow in MATLAB. This will enable MATLAB users to install the interface without needing to build from source.

### Licensing

For more information about licensing of the MLTBX file contents, please refer to the mailing list discussion and ASF Legal ticket linked below:

1. https://lists.apache.org/thread/zlpnncgvo6l4cvkxfxn7zt4q7qhptotw
2. https://issues.apache.org/jira/browse/LEGAL-665

### What changes are included in this PR?

1. Added a `matlab` task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in `dev/tasks/tasks.yml`.
4. Added a new GitHub Actions workflow called  `dev/tasks/matlab/github.yml` which builds the MATLAB interface code on all platforms (Windows, macOS, and Ubuntu 20.04) and packages the generated build artifacts into a single MLTBX file using [`matlab.addons.toolbox.packageToolbox`](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html).
5. Changed the GitHub-hosted runner to `ubuntu-20.04` from `ubuntu-latest` for the MATLAB CI check (i.e. `.github/workflows/matlab.yml`). The rationale for this change is that we primarily develop and qualify against Debian 11 locally, but the CI check has been building against `ubuntu-latest` (i.e. `ubuntu-22.04`). There are two issues with using `ubuntu-22.04`. The first is that the version of `GLIBC` shipped with `ubuntu-22.04` is not fully compatible with the version of `GLIBC` shipped with `Debian 11`. This results in a runtime linker error when qualifying the packaged MATLAB interface code locally on Debian 11. The second issue with using `ubuntu-22.04` is that the system version of `GLIBCXX` is not fully compatible with the version of `GLIBCXX` bundled with MATLAB R2023a (this is a relatively common issue - e.g. see: https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found). Previously, we worked around this issue in GitHub Actions by using `LD_PRELOAD` before starting up MATLAB to run the unit tests. On the other hand, the version of `GLIBCXX` shipped with `ubuntu-20.04` **is** binary compatible with the version bundled with MATLAB R2023a. Therefore, we believe it would be better to use `ubuntu-20.04` in the MATLAB CI checks for the time being until we can qualify the MATLAB interface against `ubuntu-22.04`.

### Are these changes tested?

Yes.

1. Successfully submitted a crossbow `packaging` job for the MATLAB interface by commenting `@ github-actions crossbow submit matlab`. Example of a successful packaging job: https://github.com/ursacomputing/crossbow/actions/runs/6893506432/job/18753227453.
2. Manually installed the resulting MLTBX file on macOS, Windows, Debian 11, and Ubuntu 20.04. Ran all tests under `matlab/test` using `runtests . IncludeSubFolders 1`.
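The manual qualification step above could also be scripted. The following Python sketch is an assumption, not something this PR adds: it builds a `matlab -batch` command line that installs an MLTBX file with `matlab.addons.install`, runs the tests under a given folder with `runtests`, and calls `assertSuccess` so that a failing test should make MATLAB exit with a nonzero status. It uses MATLAB function syntax rather than the command syntax quoted above, and the file paths in the example are hypothetical.

```python
import shlex


def matlab_string(text: str) -> str:
    """Quote text as a MATLAB char vector, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def qualification_command(mltbx_path: str, test_dir: str, matlab: str = "matlab") -> list[str]:
    """Build a `matlab -batch` command that installs an MLTBX file and runs its tests."""
    statements = [
        # Install the packaged toolbox.
        f"matlab.addons.install({matlab_string(mltbx_path)});",
        # Run every test under test_dir, including subfolders.
        f"results = runtests({matlab_string(test_dir)}, 'IncludeSubfolders', true);",
        # Raise an error if any test failed; -batch then exits with a nonzero status.
        "assertSuccess(results);",
    ]
    return [matlab, "-batch", " ".join(statements)]


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the MLTBX file and tests live.
    cmd = qualification_command("/tmp/arrow.mltbx", "/tmp/arrow/matlab/test")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning an argument list (rather than a shell string) lets `subprocess.run` pass the MATLAB statements through unchanged on every platform.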

### Are there any user-facing changes?

No.

### Notes
 
1. While qualifying, we discovered that [MATLAB's programmatic packaging interface](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html) does not properly include symbolic link files in the packaged MLTBX file. We've reported this bug to the relevant MathWorks development team. As a temporary workaround, we added a step that uses `patchelf`/`install_name_tool` to change the names of the Arrow C++ libraries that `libarrowproxy.so`/`libarrowproxy.dylib` depends on from `libarrow.so.1500`/`libarrow.1500.dylib` (symbolic links, which are dropped from the MLTBX file) to `libarrow.so.1500.0.0`/`libarrow.1500.0.0.dylib`, respectively. Once this bug is resolved, we will remove this step from the workflow.
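The relinking workaround in this note can be sketched as follows. With the usual shared-library layout, `libarrow.so.1500` (Linux) and `libarrow.1500.dylib` (macOS) are symbolic links to the fully-versioned file, so once the links are missing from the MLTBX file, `libarrowproxy` has to reference the fully-versioned name directly. This Python sketch only builds the command lines and is an illustration, not the actual step in `dev/tasks/matlab/github.yml`. It assumes the macOS dependency is recorded with an `@rpath/` install name (check the real value with `otool -L`), and the `library`, `soversion`, and `full_version` parameters are hypothetical conveniences.

```python
import shlex
import sys


def relink_command(
    proxy_path: str,
    platform: str = sys.platform,
    library: str = "arrow",
    soversion: str = "1500",
    full_version: str = "1500.0.0",
    install_name_prefix: str = "@rpath/",
) -> list[str]:
    """Build a command that points libarrowproxy at the fully-versioned library file."""
    if platform.startswith("linux"):
        # Rewrite the DT_NEEDED entry, e.g. libarrow.so.1500 -> libarrow.so.1500.0.0.
        old = f"lib{library}.so.{soversion}"
        new = f"lib{library}.so.{full_version}"
        return ["patchelf", "--replace-needed", old, new, proxy_path]
    if platform == "darwin":
        # Rewrite the recorded install name; the "@rpath/" prefix is an assumption.
        old = f"{install_name_prefix}lib{library}.{soversion}.dylib"
        new = f"{install_name_prefix}lib{library}.{full_version}.dylib"
        return ["install_name_tool", "-change", old, new, proxy_path]
    # The note only describes the workaround for Linux and macOS.
    raise ValueError(f"no relinking step sketched for platform {platform!r}")


if __name__ == "__main__":
    for plat, proxy in [("linux", "libarrowproxy.so"), ("darwin", "libarrowproxy.dylib")]:
        print(shlex.join(relink_command(proxy, platform=plat)))
```

Each returned list can be passed to `subprocess.run(..., check=True)`; dropping the step later only requires removing that call.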

### Future Directions
 
1. Add tooling to upload release candidate (RC) MLTBX files to apache/arrow's GitHub Releases area and mark them as "Prerelease". In other words, modify https://github.com/apache/arrow/blob/main/dev/release/05-binary-upload.sh.
2. Add a post-release script to upload release MLTBX files to apache/arrow's GitHub Releases area (similar to how https://github.com/apache/arrow/blob/main/dev/release/post-09-python.sh works).
3. Enable nightly builds for the MATLAB interface.
4. Document how to qualify a MATLAB Arrow interface release.
5. Enable building and testing the MATLAB Arrow interface on multiple Ubuntu distributions simultaneously (e.g., 20.04 *and* 22.04).

* Closes: apache#38659 
* GitHub Issue: apache#38659

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>