GH-37571: [MATLAB] Add arrow.tabular.Table MATLAB class #37620

Merged

kevingurney merged 8 commits into apache:main on Sep 7, 2023


kevingurney
Member

@kevingurney kevingurney commented Sep 7, 2023

Rationale for this change

Following on from #37525, which added arrow.array.ChunkedArray to the MATLAB interface, this pull request adds a new arrow.tabular.Table MATLAB class.

This pull request is intended to be an initial implementation of Table support and does not include all methods or properties that may be useful on arrow.tabular.Table.

What changes are included in this PR?

  1. Added a new arrow.tabular.Table MATLAB class.

Properties

  • NumRows
  • NumColumns
  • ColumnNames
  • Schema

Methods

  • fromArrays(<array-1>, ..., <array-N>)
  • column(<index>)
  • table()
  • toMATLAB()

Example of arrow.tabular.Table.fromArrays(<array-1>, ..., <array-N>) static construction method

```matlab
>> arrowTable = arrow.tabular.Table.fromArrays(arrow.array([1, 2, 3]), arrow.array(["A", "B", "C"]), arrow.array([true, false, true]))

arrowTable = 

Column1: double
Column2: string
Column3: bool
----
Column1:
  [
    [
      1,
      2,
      3
    ]
  ]
Column2:
  [
    [
      "A",
      "B",
      "C"
    ]
  ]
Column3:
  [
    [
      true,
      false,
      true
    ]
  ]

>> matlabTable = table(arrowTable)

matlabTable =

  3×3 table

    Column1    Column2    Column3
    _______    _______    _______

       1         "A"       true  
       2         "B"       false 
       3         "C"       true  
```

  2. Added a new arrow.table(<matlab-table>) construction function, which creates an arrow.tabular.Table from a MATLAB table.

Example of arrow.table(<matlab-table>) construction function

```matlab
>> matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true])

matlabTable =

  3×3 table

    Var1    Var2    Var3 
    ____    ____    _____

     1      "A"     true 
     2      "B"     false
     3      "C"     true 

>> arrowTable = arrow.table(matlabTable)

arrowTable = 

Var1: double
Var2: string
Var3: bool
----
Var1:
  [
    [
      1,
      2,
      3
    ]
  ]
Var2:
  [
    [
      "A",
      "B",
      "C"
    ]
  ]
Var3:
  [
    [
      true,
      false,
      true
    ]
  ]

>> arrowTable.NumRows

ans =

  int64

   3

>> arrowTable.NumColumns

ans =

  int32

   3

>> arrowTable.ColumnNames

ans = 

  1×3 string array

    "Var1"    "Var2"    "Var3"

>> arrowTable.Schema

ans = 

Var1: double
Var2: string
Var3: bool

>> table(arrowTable)

ans =

  3×3 table

    Var1    Var2    Var3 
    ____    ____    _____

     1      "A"     true 
     2      "B"     false
     3      "C"     true 

>> isequal(ans, matlabTable)

ans =

  logical

   1
```
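
The examples above do not exercise the column(<index>) or toMATLAB() methods. The following is a minimal sketch of how they might be used. It assumes that column takes a 1-based numeric index and, for a Table, returns an arrow.array.ChunkedArray (the class added in #37525), that the returned chunked array provides its own toMATLAB method, and that toMATLAB() on a Table returns the same MATLAB table as table(). The variable names are illustrative only.

```matlab
% Start from a small MATLAB table (values chosen for illustration).
matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true]);
arrowTable = arrow.table(matlabTable);

% Extract the second column using a 1-based numeric index.
% Assumption: for a Table this returns an arrow.array.ChunkedArray.
col = arrowTable.column(2);
disp(class(col))

% Assumption: the returned chunked array can be converted back to
% MATLAB data with toMATLAB.
values = toMATLAB(col);

% Assumption: toMATLAB() on the Table returns the same MATLAB table as table().
roundTrip = toMATLAB(arrowTable);
isequal(roundTrip, table(arrowTable))
```

Because a Table column may be split across several chunks, a chunked array (rather than a single array) is the natural return type for column on a Table.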

Are these changes tested?

Yes.

  1. Added a new tTable test class containing tests for arrow.tabular.Table and the arrow.table(<matlab-table>) construction function.
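
To show the kind of round-trip check such a test class can perform, here is a hypothetical sketch. It is not taken from tTable. It uses matlab.unittest.TestCase.forInteractiveUse so that the same verify* qualifications can run as a plain script. The expected property classes (int64 for NumRows, int32 for NumColumns, a 1×3 string array for ColumnNames) come from the example output above.

```matlab
% Hypothetical round-trip check in the style of a matlab.unittest test,
% runnable as a script via forInteractiveUse (not code from tTable).
testCase = matlab.unittest.TestCase.forInteractiveUse;

matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true]);
arrowTable = arrow.table(matlabTable);

% Dimensions and column names should match the source table.
% verifyEqual also checks the class, so the integer types must match.
testCase.verifyEqual(arrowTable.NumRows, int64(3));
testCase.verifyEqual(arrowTable.NumColumns, int32(3));
testCase.verifyEqual(arrowTable.ColumnNames, ["Var1", "Var2", "Var3"]);

% Converting back should reproduce the original MATLAB table.
testCase.verifyEqual(table(arrowTable), matlabTable);
```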

Are there any user-facing changes?

Yes.

  1. Users can now create arrow.tabular.Table objects using the fromArrays static construction method or the arrow.table(<matlab-table>) construction function.

Future Directions

  1. Create shared test infrastructure for common RecordBatch and Table MATLAB tests.
  2. Implement equality check (i.e. isequal) for arrow.tabular.Table instances.
  3. Add more static construction methods to arrow.tabular.Table. For example: fromChunkedArrays(<chunkedArray-1>, ..., <chunkedArray-N>) and fromRecordBatches(<recordBatch-1>, ..., <recordBatch-N>).
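
Until equality checking is implemented, one possible stopgap is to compare two Arrow tables by converting them to MATLAB tables. The sketch below uses a hypothetical helper, arrowTablesEqualAfterConversion, which is not part of the MATLAB interface. Because it compares values only after conversion, it cannot distinguish Arrow types that map to the same MATLAB type.

```matlab
% Hypothetical workaround until isequal is implemented for arrow.tabular.Table.
matlabTable = table([1; 2; 3], ["A"; "B"; "C"], [true; false; true]);
t1 = arrow.table(matlabTable);
t2 = arrow.table(matlabTable);
tf = arrowTablesEqualAfterConversion(t1, t2)

% Local functions in a script must appear at the end of the file (R2016b+).
function tf = arrowTablesEqualAfterConversion(a, b)
    % Compare column names directly, then compare values after converting
    % both Arrow tables to MATLAB tables. Arrow type differences that do
    % not survive the conversion to MATLAB are not detected.
    tf = isequal(a.ColumnNames, b.ColumnNames) && ...
         isequal(table(a), table(b));
end
```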

Notes

  1. Much of the code for arrow.tabular.Table closely mirrors the code for arrow.tabular.RecordBatch. It may make sense to share more of it, using C++ templates or another approach.
  2. Thank you @sgilmore10 for your help with this pull request!

Member

@sgilmore10 sgilmore10 left a comment


Thank you for working on this!

@github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Sep 7, 2023
empty `arrow.tabular.Table`.

Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Sep 7, 2023
@kevingurney
Member Author

+1

@kevingurney kevingurney merged commit da602af into apache:main Sep 7, 2023
8 checks passed
@kevingurney kevingurney deleted the GH-37571 branch September 7, 2023 21:10
@kevingurney removed the "awaiting change review" label on Sep 7, 2023
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit da602af.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
* Closes: apache#37571

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Development

Successfully merging this pull request may close these issues.

[MATLAB] Add arrow.tabular.Table MATLAB class