GH-37448: [MATLAB] Add arrow.array.ChunkedArray class #37525

Merged
merged 26 commits into from
Sep 3, 2023

sgilmore10 (Member) commented Sep 1, 2023

Rationale for this change

In order to add an arrow.tabular.Table class to the MATLAB Interface, we first need to add a MATLAB class representing arrow::ChunkedArrays. This is required because an arrow::Table is backed by a vector of arrow::ChunkedArrays, and the output of its column(int index) method is an arrow::ChunkedArray.

What changes are included in this PR?

  1. Introduced a new class called arrow.array.ChunkedArray.
  2. arrow.array.ChunkedArray has the following properties:
    1. Type - datatype of the arrow.array.Arrays
    2. Length - Sum of the arrow.array.Array lengths
    3. NumChunks - Number of arrow.array.Arrays
  3. arrow.array.ChunkedArray has the following methods:
    1. chunk(index) - Returns the arrow.array.Array stored at the specified index
    2. fromArrays(array1, array2, ..., arrayN, Type=type) - Creates a ChunkedArray from the arrays provided. If Type is provided, all arrays are expected to have the specified Type.
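
The effect of the Type name-value pair can be sketched as follows. This is a minimal sketch rather than code from the PR: it assumes that `arrow.float64()` is available as a type constructor, that `arrow.array(int8(...))` produces an Int8Array, and that a type mismatch is reported as an error. The error identifier and message are not stated in this description, so the sketch only displays whatever message is raised.

```matlab
% Two arrays with different Arrow types.
a1 = arrow.array([1 2 3]);        % double input -> Float64Array
a2 = arrow.array(int8([4 5 6]));  % int8 input -> Int8Array (assumed)

% Matching case: every array has the requested Type.
c = arrow.array.ChunkedArray.fromArrays(a1, a1, Type=arrow.float64());
disp(c.NumChunks)

% Mismatched case: a2 is not a Float64Array, so an error is assumed here.
try
    arrow.array.ChunkedArray.fromArrays(a1, a2, Type=arrow.float64());
catch err
    % The exact message is not documented in this PR description.
    disp(err.message)
end
```

Passing Type explicitly is mainly useful when no arrays are supplied, since the type cannot then be inferred from the inputs.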

Example Usage

>> a1 = arrow.array(1:100);
>> a2 = arrow.array(101:250);
>> a3 = arrow.array(251:300);

% Create a ChunkedArray from 3 Float64Arrays
>> c = arrow.array.ChunkedArray.fromArrays(a1, a2, a3)

c = 

  ChunkedArray with properties:

         Type: [1×1 arrow.type.Float64Type]
    NumChunks: 3
       Length: 300

% Extract the first chunk and compare it to a1
>> c1 = c.chunk(1);
>> tf = isequal(c1, a1)

tf =

  logical

   1

% Create an empty ChunkedArray by providing the Type nv-pair
>> c = arrow.array.ChunkedArray.fromArrays(Type=arrow.timestamp())

c = 

  ChunkedArray with properties:

         Type: [1×1 arrow.type.TimestampType]
    NumChunks: 0
       Length: 0
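
The relationship between NumChunks, chunk(index), and Length can be checked with a short loop. This is a sketch under stated assumptions: it reuses the arrays from the example above, assumes chunk indices are 1-based as in c.chunk(1), and assumes each chunk supports toMATLAB() so that its element count can be read with numel.

```matlab
% Build a ChunkedArray from three Float64Arrays, as in the example above.
a1 = arrow.array(1:100);
a2 = arrow.array(101:250);
a3 = arrow.array(251:300);
c = arrow.array.ChunkedArray.fromArrays(a1, a2, a3);

% Visit every chunk and add up the number of elements in each one.
totalLength = 0;
for ii = 1:c.NumChunks
    chunkArray = c.chunk(ii);
    % toMATLAB() on an individual array is assumed to return a MATLAB vector.
    totalLength = totalLength + numel(toMATLAB(chunkArray));
end

% Length is described as the sum of the chunk lengths.
assert(totalLength == double(c.Length));
```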

Are these changes tested?

Yes. I added a new test class called tChunkedArray.m that contains unit tests for the new class.

Are there any user-facing changes?

Yes. Users can now create a ChunkedArray in the MATLAB Interface.

Future Directions

  1. In this PR, we deliberately didn't include a convenience constructor function because we're not sure if we want users to create ChunkedArrays themselves. We think users will mostly use ChunkedArray when extracting columns from Tables.
  2. We will implement more methods on ChunkedArray, such as flatten() and combineChunks().

kou (Member) left a comment

+1

github-actions bot added the awaiting merge label and removed the awaiting review label Sep 1, 2023
kou merged commit 47ce129 into apache:main Sep 3, 2023
8 checks passed
kou removed the awaiting merge label Sep 3, 2023
conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 47ce129.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

kevingurney pushed a commit that referenced this pull request Sep 7, 2023
…LAB class (#37617)

### Rationale for this change

Following on to #37474, #37446, and #37525, we should implement `isequal` for the `arrow.type.Field` MATLAB class.

### What changes are included in this PR?

1. Implemented the `isequal` method for `arrow.type.Field`

### Are these changes tested?

Yes. Added new unit tests to `tField.m`.

### Are there any user-facing changes?

Yes. Users can now call `isequal` on `arrow.type.Field`s to determine if two fields are equal.

**Example**
```matlab
>> f1 = arrow.field("A", arrow.time32(TimeUnit="Second"));
>> f2 = arrow.field("B", arrow.time32(TimeUnit="Second"));
>> f3 = arrow.field("A", arrow.time32(TimeUnit="Millisecond"));

>> isequal(f1, f1)

ans =

  logical

   1

% Name properties differ
>> isequal(f1, f2)

ans =

  logical

   0

% Type properties differ
>> isequal(f1, f3)

ans =

  logical

   0
```

### Future Directions

1. #37568
2. #37570

* Closes: #37569

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney pushed a commit that referenced this pull request Sep 7, 2023
… MATLAB class (#37619)

### Rationale for this change

Following on to #37474, #37446, and #37525, we should implement `isequal` for the `arrow.tabular.Schema` MATLAB class.

### What changes are included in this PR?

1. Updated `arrow.tabular.Schema` class to inherit from `matlab.mixin.Scalar`.
2. Added `isequal` method to `arrow.tabular.Schema`.

### Are these changes tested?

Yes. Added `isequal` unit tests to `tSchema.m`

### Are there any user-facing changes?

Yes. Users can now compare two `arrow.tabular.Schema` objects via `isequal`.

**Example**
```matlab
>> schema1 = arrow.schema([arrow.field("A", arrow.uint8), arrow.field("B", arrow.uint16)]);
>> schema2 = arrow.schema([arrow.field("A", arrow.uint8), arrow.field("B", arrow.uint16)]);
>> schema3 = arrow.schema([arrow.field("A", arrow.uint8)]);

>> isequal(schema1, schema2)

ans =

  logical

   1

>> isequal(schema1, schema3)

ans =

  logical

   0
```

### Future Directions
1. #37570 

* Closes: #37568

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney added a commit that referenced this pull request Sep 7, 2023
### Rationale for this change

Following on from #37525, which adds `arrow.array.ChunkedArray` to the MATLAB interface, this pull request adds support for a new `arrow.tabular.Table` MATLAB class.

This pull request is intended to be an initial implementation of `Table` support and does not include all methods or properties that may be useful on `arrow.tabular.Table`.

### What changes are included in this PR?

1. Added new `arrow.tabular.Table` MATLAB class.

**Properties**

* `NumRows`
* `NumColumns`
* `ColumnNames`
* `Schema`

**Methods**

* `fromArrays(<array-1>, ..., <array-N>)`
* `column(<index>)`
* `table()`
* `toMATLAB()`

**Example of `arrow.tabular.Table.fromArrays(<array-1>, ..., <array-N>)` static construction method**
```matlab
>> arrowTable = arrow.tabular.Table.fromArrays(arrow.array([1, 2, 3]), arrow.array(["A", "B", "C"]), arrow.array([true, false, true]))

arrowTable = 

Column1: double
Column2: string
Column3: bool
----
Column1:
  [
    [
      1,
      2,
      3
    ]
  ]
Column2:
  [
    [
      "A",
      "B",
      "C"
    ]
  ]
Column3:
  [
    [
      true,
      false,
      true
    ]
  ]

>> matlabTable = table(arrowTable)

matlabTable =

  3×3 table

    Column1    Column2    Column3
    _______    _______    _______

       1         "A"       true  
       2         "B"       false 
       3         "C"       true  
```

2. Added a new `arrow.table(<matlab-table>)` construction function which creates an `arrow.tabular.Table` from a MATLAB `table`. 

**Example of `arrow.table(<matlab-table>)` construction function**
```matlab
>> matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true])

matlabTable =

  3×3 table

    Var1    Var2    Var3 
    ____    ____    _____

     1      "A"     true 
     2      "B"     false
     3      "C"     true 

>> arrowTable = arrow.table(matlabTable)

arrowTable = 

Var1: double
Var2: string
Var3: bool
----
Var1:
  [
    [
      1,
      2,
      3
    ]
  ]
Var2:
  [
    [
      "A",
      "B",
      "C"
    ]
  ]
Var3:
  [
    [
      true,
      false,
      true
    ]
  ]

>> arrowTable.NumRows

ans =

  int64

   3

>> arrowTable.NumColumns

ans =

  int32

   3

>> arrowTable.ColumnNames

ans = 

  1×3 string array

    "Var1"    "Var2"    "Var3"

>> arrowTable.Schema

ans = 

Var1: double
Var2: string
Var3: bool

>> table(arrowTable)

ans =

  3×3 table

    Var1    Var2    Var3 
    ____    ____    _____

     1      "A"     true 
     2      "B"     false
     3      "C"     true 

>> isequal(ans, matlabTable)

ans =

  logical

   1
```
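
The `column(<index>)` method listed above is not exercised in these examples. A minimal sketch of how it might be used follows, assuming the index is 1-based and that the returned column is an `arrow.array.ChunkedArray`, as the rationale for #37525 suggests; no output is shown because it was not captured in this PR.

```matlab
% Build an arrow.tabular.Table from a MATLAB table, as in the example above.
matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true]);
arrowTable = arrow.table(matlabTable);

% Extract the first column (1-based indexing is an assumption).
firstColumn = arrowTable.column(1);

% The column is assumed to be an arrow.array.ChunkedArray, so the
% ChunkedArray properties added in #37525 should be available on it.
disp(class(firstColumn))
disp(firstColumn.NumChunks)
disp(firstColumn.Length)
```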

### Are these changes tested?

Yes.

1. Added a new `tTable` test class for `arrow.tabular.Table` and `arrow.table(<matlab-table>)` tests.

### Are there any user-facing changes?

Yes.

1. Users can now create `arrow.tabular.Table` objects using the `fromArrays` static construction method or the `arrow.table(<matlab-table>)` construction function.

### Future Directions

1. Create shared test infrastructure for common `RecordBatch` and `Table` MATLAB tests.
2. Implement equality check (i.e. `isequal`) for `arrow.tabular.Table` instances.
3. Add more static construction methods to `arrow.tabular.Table`. For example: `fromChunkedArrays(<chunkedArray-1>, ..., <chunkedArray-N>)` and `fromRecordBatches(<recordBatch-1>, ..., <recordBatch-N>)`.

### Notes

1. A lot of the code for `arrow.tabular.Table` is very similar to the code for `arrow.tabular.RecordBatch`. It may make sense for us to try to share more of the code using C++ templates or another approach.
2. Thank you @ sgilmore10 for your help with this pull request!
* Closes: #37571

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney pushed a commit that referenced this pull request Sep 8, 2023
…atch` MATLAB class (#37627)

### Rationale for this change

Following on to #37474, #37446, and #37525, we should implement `isequal` for the `arrow.tabular.RecordBatch` MATLAB class.

### What changes are included in this PR?

1. Implemented `isequal` method for `arrow.tabular.RecordBatch`

### Are these changes tested?

Yes. Added `isequal` unit tests to `tRecordBatch.m`.

### Are there any user-facing changes?

Yes, users can now use `isequal` to compare `arrow.tabular.RecordBatch`es. 

**Example**

```matlab
>> t1 = table(1, "A", false, VariableNames=["Number",  "String", "Logical"]);
>> t2 = table([1; 2], ["A"; "B"], [false; false], VariableNames=["Number",  "String", "Logical"]); 
>> rb1 = arrow.recordBatch(t1);
>> rb2 = arrow.recordBatch(t2);
>> rb3 = arrow.recordBatch(t1);

>> isequal(rb1, rb2)

ans =

  logical

   0

>> isequal(rb1, rb3)

ans =

  logical

   1
```

### Future Directions
1. #37628

* Closes: #37570

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney pushed a commit that referenced this pull request Sep 8, 2023
…MATLAB class (#37629)

### Rationale for this change

Following on to #37474, #37446, #37525, and #37627, we should implement `isequal` for the `arrow.tabular.Table` MATLAB class.

### What changes are included in this PR?

1. Added a new function, `arrow.internal.tabular.isequal`, that both `arrow.tabular.RecordBatch` and `arrow.tabular.Table` can use to implement their `isequal` methods.
2. Modified `arrow.tabular.RecordBatch` to use the new `isequal` package function to implement its `isequal` method.
3. Implemented the `isequal` method for `arrow.tabular.Table` using the new `isequal` package function.

### Are these changes tested?

Yes, added `isequal` unit tests to `tTable.m`

### Are there any user-facing changes?

Yes. Users can now compare `arrow.tabular.Table`s using `isequal`:

```matlab
>> t1 = table(1, "A", false, VariableNames=["Number",  "String", "Logical"]);
>> t2 = table([1; 2], ["A"; "B"], [false; false], VariableNames=["Number",  "String", "Logical"]); 
>> tbl1 = arrow.table(t1);
>> tbl2 = arrow.table(t2);
>> tbl3 = arrow.table(t1);

>> isequal(tbl1, tbl2)

ans =

  logical

   0

>> isequal(tbl1, tbl3)

ans =

  logical

   1
```

* Closes: #37628

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…#37525)

* Closes: apache#37448

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…d` MATLAB class (apache#37617)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…chema` MATLAB class (apache#37619)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…he#37620)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ecordBatch` MATLAB class (apache#37627)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…able` MATLAB class (apache#37629)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…#37525)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…d` MATLAB class (apache#37617)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…chema` MATLAB class (apache#37619)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…he#37620)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ecordBatch` MATLAB class (apache#37627)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…able` MATLAB class (apache#37629)

Development

Successfully merging this pull request may close these issues.

[MATLAB] Add arrow.array.ChunkedArray class
2 participants