GH-37448: [MATLAB] Add `arrow.array.ChunkedArray` class #37525
arrow.array.internal package
package structure
…tedArrayTypes() tries to construct
+1
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 47ce129. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
…LAB class (#37617)

### Rationale for this change

Following on to #37474, #37446, and #37525, we should implement `isequal` for the `arrow.type.Field` MATLAB class.

### What changes are included in this PR?

1. Implemented the `isequal` method for `arrow.type.Field`.

### Are these changes tested?

Yes. Added new unit tests to `tField.m`.

### Are there any user-facing changes?

Yes. Users can now call `isequal` on `arrow.type.Field`s to determine if two fields are equal.

**Example**

```matlab
>> f1 = arrow.field("A", arrow.time32(TimeUnit="Second"));
>> f2 = arrow.field("B", arrow.time32(TimeUnit="Second"));
>> f3 = arrow.field("A", arrow.time32(TimeUnit="Millisecond"));

>> isequal(f1, f1)

ans =

  logical

   1

% Name properties differ
>> isequal(f1, f2)

ans =

  logical

   0

% Type properties differ
>> isequal(f1, f3)

ans =

  logical

   0
```

### Future Directions

1. #37568
2. #37570

* Closes: #37569

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
… MATLAB class (#37619)

### Rationale for this change

Following on to #37474, #37446, and #37525, we should implement `isequal` for the `arrow.tabular.Schema` MATLAB class.

### What changes are included in this PR?

1. Updated `arrow.tabular.Schema` class to inherit from `matlab.mixin.Scalar`.
2. Added `isequal` method to `arrow.tabular.Schema`.

### Are these changes tested?

Yes. Added `isequal` unit tests to `tSchema.m`

### Are there any user-facing changes?

Yes. Users can now compare two `arrow.tabular.Schema` objects via `isequal`.

**Example**

```matlab
>> schema1 = arrow.schema([arrow.field("A", arrow.uint8), arrow.field("B", arrow.uint16)]);
>> schema2 = arrow.schema([arrow.field("A", arrow.uint8), arrow.field("B", arrow.uint16)]);
>> schema3 = arrow.schema([arrow.field("A", arrow.uint8)]);

>> isequal(schema1, schema2)

ans =

  logical

   1

>> isequal(schema1, schema3)

ans =

  logical

   0
```

### Future Directions

1. #37570

* Closes: #37568

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
### Rationale for this change

Following on from #37525, which adds `arrow.array.ChunkedArray` to the MATLAB interface, this pull request adds support for a new `arrow.tabular.Table` MATLAB class.

This pull request is intended to be an initial implementation of `Table` support and does not include all methods or properties that may be useful on `arrow.tabular.Table`.

### What changes are included in this PR?

1. Added new `arrow.tabular.Table` MATLAB class.

**Properties**

* `NumRows`
* `NumColumns`
* `ColumnNames`
* `Schema`

**Methods**

* `fromArrays(<array-1>, ..., <array-N>)`
* `column(<index>)`
* `table()`
* `toMATLAB()`

**Example of `arrow.tabular.Table.fromArrays(<array-1>, ..., <array-N>)` static construction method**

```matlab
>> arrowTable = arrow.tabular.Table.fromArrays(arrow.array([1, 2, 3]), arrow.array(["A", "B", "C"]), arrow.array([true, false, true]))

arrowTable =

Column1: double
Column2: string
Column3: bool
----
Column1:
  [
    [
      1,
      2,
      3
    ]
  ]
Column2:
  [
    [
      "A",
      "B",
      "C"
    ]
  ]
Column3:
  [
    [
      true,
      false,
      true
    ]
  ]

>> matlabTable = table(arrowTable)

matlabTable =

  3×3 table

    Column1    Column2    Column3
    _______    _______    _______

       1         "A"       true
       2         "B"       false
       3         "C"       true
```

2. Added a new `arrow.table(<matlab-table>)` construction function which creates an `arrow.tabular.Table` from a MATLAB `table`.

**Example of `arrow.table(<matlab-table>)` construction function**

```matlab
>> matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true])

matlabTable =

  3×3 table

    Var1    Var2    Var3
    ____    ____    _____

     1      "A"     true
     2      "B"     false
     3      "C"     true

>> arrowTable = arrow.table(matlabTable)

arrowTable =

Var1: double
Var2: string
Var3: bool
----
Var1:
  [
    [
      1,
      2,
      3
    ]
  ]
Var2:
  [
    [
      "A",
      "B",
      "C"
    ]
  ]
Var3:
  [
    [
      true,
      false,
      true
    ]
  ]

>> arrowTable.NumRows

ans =

  int64

   3

>> arrowTable.NumColumns

ans =

  int32

   3

>> arrowTable.ColumnNames

ans =

  1×3 string array

    "Var1"    "Var2"    "Var3"

>> arrowTable.Schema

ans =

Var1: double
Var2: string
Var3: bool

>> table(arrowTable)

ans =

  3×3 table

    Var1    Var2    Var3
    ____    ____    _____

     1      "A"     true
     2      "B"     false
     3      "C"     true

>> isequal(ans, matlabTable)

ans =

  logical

   1
```

### Are these changes tested?

Yes.

1. Added a new `tTable` test class for `arrow.tabular.Table` and `arrow.table(<matlab-table>)` tests.

### Are there any user-facing changes?

Yes.

1. Users can now create `arrow.tabular.Table` objects using the `fromArrays` static construction method or the `arrow.table(<matlab-table>)` construction function.

### Future Directions

1. Create shared test infrastructure for common `RecordBatch` and `Table` MATLAB tests.
2. Implement equality check (i.e. `isequal`) for `arrow.tabular.Table` instances.
3. Add more static construction methods to `arrow.tabular.Table`. For example: `fromChunkedArrays(<chunkedArray-1>, ..., <chunkedArray-N>)` and `fromRecordBatches(<recordBatch-1>, ..., <recordBatch-N>)`.

### Notes

1. A lot of the code for `arrow.tabular.Table` is very similar to the code for `arrow.tabular.RecordBatch`. It may make sense for us to try to share more of the code using C++ templates or another approach.
2. Thank you @sgilmore10 for your help with this pull request!

* Closes: #37571

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
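The `column(<index>)` method listed above is the link back to #37525. What follows is a minimal sketch, not code from the pull request. It assumes the Arrow MATLAB interface is on the path, that `column` takes a 1-based index, and that it returns an `arrow.array.ChunkedArray`; the last point is suggested by the rationale of #37525 but is not shown in this commit message.

```matlab
% Sketch only. Assumption: arrow.tabular.Table.column returns an
% arrow.array.ChunkedArray (not demonstrated in the commit message).
matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true]);
arrowTable = arrow.table(matlabTable);

% Extract the first column, assuming a 1-based index.
firstColumn = arrowTable.column(1);

% The column should be a ChunkedArray that spans every row of the table.
assert(isa(firstColumn, "arrow.array.ChunkedArray"));
assert(firstColumn.Length == arrowTable.NumRows);
```

Both assertions use only properties named in these commit messages (`Length` on `ChunkedArray`, `NumRows` on `Table`).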
…atch` MATLAB class (#37627)

### Rationale for this change

Following on to #37474, #37446, and #37525, we should implement `isequal` for the `arrow.tabular.RecordBatch` MATLAB class.

### What changes are included in this PR?

1. Implemented `isequal` method for `arrow.tabular.RecordBatch`

### Are these changes tested?

Yes. Added `isequal` unit tests to `tRecordBatch.m`.

### Are there any user-facing changes?

Yes, users can now use `isequal` to compare `arrow.tabular.RecordBatch`es.

**Example**

```matlab
>> t1 = table(1, "A", false, VariableNames=["Number", "String", "Logical"]);
>> t2 = table([1; 2], ["A"; "B"], [false; false], VariableNames=["Number", "String", "Logical"]);
>> rb1 = arrow.recordBatch(t1);
>> rb2 = arrow.recordBatch(t2);
>> rb3 = arrow.recordBatch(t1);

>> isequal(rb1, rb2)

ans =

  logical

   0

>> isequal(rb1, rb3)

ans =

  logical

   1
```

### Future Directions

1. #37628

* Closes: #37570

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
…MATLAB class (#37629)

### Rationale for this change

Following on to #37474, #37446, #37525, and #37627, we should implement `isequal` for the `arrow.tabular.Table` MATLAB class.

### What changes are included in this PR?

1. Added a new function, `arrow.internal.tabular.isequal`, that both `arrow.tabular.RecordBatch` and `arrow.tabular.Table` can use to implement their `isequal` methods.
2. Modified `arrow.tabular.RecordBatch` to use the new `isequal` package function to implement its `isequal` method.
3. Implemented the `isequal` method for `arrow.tabular.Table` using the new `isequal` package function.

### Are these changes tested?

Yes, added `isequal` unit tests to `tTable.m`.

### Are there any user-facing changes?

Yes. Users can now compare `arrow.tabular.Table`s using `isequal`:

```matlab
>> t1 = table(1, "A", false, VariableNames=["Number", "String", "Logical"]);
>> t2 = table([1; 2], ["A"; "B"], [false; false], VariableNames=["Number", "String", "Logical"]);
>> tbl1 = arrow.table(t1);
>> tbl2 = arrow.table(t2);
>> tbl3 = arrow.table(t1);

>> isequal(tbl1, tbl2)

ans =

  logical

   0

>> isequal(tbl1, tbl3)

ans =

  logical

   1
```

* Closes: #37628

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
…#37525)

### Rationale for this change

In order to add an `arrow.tabular.Table` class to the MATLAB Interface, we first need to add a MATLAB class representing `arrow::ChunkedArray`s. This is required because an `arrow::Table` is backed by a vector of `arrow::ChunkedArray`s, and the output of its `column(int index)` method is an `arrow::ChunkedArray`.

### What changes are included in this PR?

1. Introduced a new class called `arrow.array.ChunkedArray`.
2. `arrow.array.ChunkedArray` has the following properties:
   1. `Type` - Datatype of the `arrow.array.Array`s
   2. `Length` - Sum of the `arrow.array.Array` lengths
   3. `NumChunks` - Number of `arrow.array.Array`s
3. `arrow.array.ChunkedArray` has the following methods:
   1. `chunk(index)` - Returns the `arrow.array.Array` stored at the specified index
   2. `fromArrays(array1, array2, ..., arrayN, Type=type)` - Creates a `ChunkedArray` from the arrays provided. If `Type` is provided, all arrays are expected to have the specified `Type`.

**Example Usage**

```matlab
>> a1 = arrow.array(1:100);
>> a2 = arrow.array(101:250);
>> a3 = arrow.array(251:300);

% Create a ChunkedArray from 3 Float64Arrays
>> c = arrow.array.ChunkedArray.fromArrays(a1, a2, a3)

c =

  ChunkedArray with properties:

         Type: [1×1 arrow.type.Float64Type]
    NumChunks: 3
       Length: 300

% Extract the first chunk and compare it to a1
>> c1 = c.chunk(1);
>> tf = isequal(c1, a1)

tf =

  logical

   1

% Create an empty ChunkedArray by providing the Type nv-pair
>> c = arrow.array.ChunkedArray.fromArrays(Type=arrow.timestamp())

c =

  ChunkedArray with properties:

         Type: [1×1 arrow.type.TimestampType]
    NumChunks: 0
       Length: 0
```

### Are these changes tested?

Yes. I added a new test class called `tChunkedArray.m` that contains unit tests for the new class.

### Are there any user-facing changes?

Yes. Users can now create a `ChunkedArray` in the MATLAB Interface.

### Future Directions

1. In this PR, we deliberately didn't include a convenience constructor function because we're not sure if we want users to create `ChunkedArray`s themselves. We think users will mostly use `ChunkedArray` when extracting columns from `Table`s.
2. We will implement more methods on `ChunkedArray`, such as `flatten()` and `combineChunks()`, etc.

* Closes: apache#37448

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

In order to add an `arrow.tabular.Table` class to the MATLAB Interface, we first need to add a MATLAB class representing `arrow::ChunkedArray`s. This is required because an `arrow::Table` is backed by a vector of `arrow::ChunkedArray`s, and the output of its `column(int index)` method is an `arrow::ChunkedArray`.

### What changes are included in this PR?

1. Introduced a new class called `arrow.array.ChunkedArray`.
2. `arrow.array.ChunkedArray` has the following properties:
   1. `Type` - Datatype of the `arrow.array.Array`s
   2. `Length` - Sum of the `arrow.array.Array` lengths
   3. `NumChunks` - Number of `arrow.array.Array`s
3. `arrow.array.ChunkedArray` has the following methods:
   1. `chunk(index)` - Returns the `arrow.array.Array` stored at the specified index
   2. `fromArrays(array1, array2, ..., arrayN, Type=type)` - Creates a `ChunkedArray` from the arrays provided. If `Type` is provided, all arrays are expected to have the specified `Type`.

**Example Usage**
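A minimal sketch of the properties and methods listed above, assuming the Arrow MATLAB interface from this PR is on the path. It uses only names that appear in this pull request (`arrow.array`, `fromArrays`, `chunk`, `NumChunks`, `Length`); the `chunks` cell array and the loop are illustrative, not part of the PR.

```matlab
% Sketch only: build three arrays and combine them into one ChunkedArray.
a1 = arrow.array(1:100);
a2 = arrow.array(101:250);
a3 = arrow.array(251:300);
chunks = {a1, a2, a3};   % illustrative helper, not from the PR

% Expand the cell array into the variadic fromArrays call.
c = arrow.array.ChunkedArray.fromArrays(chunks{:});

% Each chunk should compare equal to the array it was built from.
for ii = 1:numel(chunks)
    assert(isequal(c.chunk(ii), chunks{ii}));
end

% NumChunks counts the input arrays; Length is the sum of their lengths.
assert(c.NumChunks == 3);
assert(c.Length == 300);
```

The assertions mirror the three properties described in the list: one chunk per input array, and a total length of 100 + 150 + 50 elements.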
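### Are these changes tested?

Yes. I added a new test class called `tChunkedArray.m` that contains unit tests for the new class.

### Are there any user-facing changes?

Yes. Users can now create a `ChunkedArray` in the MATLAB Interface.

### Future Directions

1. In this PR, we deliberately didn't include a convenience constructor function because we're not sure if we want users to create `ChunkedArray`s themselves. We think users will mostly use `ChunkedArray` when extracting columns from `Table`s.
2. We will implement more methods on `ChunkedArray`, such as `flatten()` and `combineChunks()`, etc.

* Closes: [MATLAB] Add `arrow.array.ChunkedArray` class #37448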