[MATLAB] Support creating arrow.tabular.RecordBatch instances from a list of arrow.array.Array values #37175

Closed
sgilmore10 opened this issue Aug 15, 2023 · 1 comment · Fixed by #37176


@sgilmore10
Member

sgilmore10 commented Aug 15, 2023

Describe the enhancement requested

Right now, the only way to construct an arrow.tabular.RecordBatch is from a MATLAB table:

```matlab
>> t = table([1; 2; 3], ["A"; "B"; "C"], VariableNames=["Numbers", "Letters"])

t =

  3×2 table

    Numbers    Letters
    _______    _______

       1         "A"
       2         "B"
       3         "C"

>> rb = arrow.recordbatch(t)

rb = 

Numbers:   [
    1,
    2,
    3
  ]
Letters:   [
    "A",
    "B",
    "C"
  ]
```

The interface should also support creating `arrow.tabular.RecordBatch` instances from lists of `arrow.array.Array` values. To support this, we should add a static method called `fromArrays` to `arrow.tabular.RecordBatch`:

```matlab
>> a1 = arrow.array([1; 2; 3]);
>> a2 = arrow.array(["A"; "B"; "C"]);
>> rb = arrow.tabular.RecordBatch.fromArrays(a1, a2, ColumnNames=["Numbers", "Letters"])

rb = 

Numbers:   [
    1,
    2,
    3
  ]
Letters:   [
    "A",
    "B",
    "C"
  ]
```

Component(s)

MATLAB

@sgilmore10
Member Author

take

kevingurney pushed a commit that referenced this issue Aug 15, 2023
…nces from a list of `arrow.array.Array` values (#37176)

### Rationale for this change

Right now, the only way to construct an `arrow.tabular.RecordBatch` is from a MATLAB `table`:

```matlab
>> t = table([1; 2; 3], ["A"; "B"; "C"], VariableNames=["Numbers", "Letters"])

t =

  3×2 table

    Numbers    Letters
    _______    _______

       1         "A"  
       2         "B"  
       3         "C"  

>> rb = arrow.recordbatch(t)

rb = 

Numbers:   [
    1,
    2,
    3
  ]
Letters:   [
    "A",
    "B",
    "C"
  ]
```

The interface should also support creating `arrow.tabular.RecordBatch` instances from lists of `arrow.array.Array` values.

### What changes are included in this PR?

Added a new static method to `arrow.tabular.RecordBatch` called `fromArrays`. This method accepts a comma-separated list of `arrow.array.Array` values, which it uses to construct an `arrow.tabular.RecordBatch`. It also accepts an optional name-value pair called `ColumnNames`, which can be used to specify the column names in the record batch. If this name-value pair is not supplied, the column names default to `"Column1"`, `"Column2"`, etc.

**Example Usage:**
```matlab
>> a1 = arrow.array([1, 2, 3]);
>> a2 = arrow.array(["A", "B", "C"]);

>> rb1 = arrow.tabular.RecordBatch.fromArrays(a1, a2)

rb1 = 

Column1:   [
    1,
    2,
    3
  ]
Column2:   [
    "A",
    "B",
    "C"
  ]

>> rb2 = arrow.tabular.RecordBatch.fromArrays(a1, a2, ColumnNames=["Numbers", "Letters"])

rb2 = 

Numbers:   [
    1,
    2,
    3
  ]
Letters:   [
    "A",
    "B",
    "C"
  ]
```
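To make the column-naming rules concrete, here is a minimal plain-MATLAB sketch of how a `fromArrays`-style constructor might resolve its column names. The function `resolveColumnNames` and its error identifier are hypothetical and are not part of the Arrow MATLAB API. The requirement that `ColumnNames` supply exactly one name per array is an assumption suggested by the `tValidateColumnNames` test class, not something the description states outright.

```matlab
function names = resolveColumnNames(numColumns, columnNames)
    % Hypothetical helper (not part of the Arrow MATLAB API).
    % Returns the column names a fromArrays-style constructor would use.
    arguments
        numColumns (1,1) double {mustBeInteger, mustBeNonnegative}
        % When no names are supplied, default to "Column1", "Column2", ...
        % ("Column" + numbers concatenates element-wise into a string array).
        columnNames (1,:) string = "Column" + (1:numColumns)
    end

    % Assumption: exactly one name is required per array.
    if numel(columnNames) ~= numColumns
        error("resolveColumnNames:NumelMismatch", ...
            "Expected %d column names, but got %d.", ...
            numColumns, numel(columnNames));
    end
    names = columnNames;
end
```

For example, `resolveColumnNames(2)` yields `["Column1" "Column2"]`, `resolveColumnNames(2, ["Numbers", "Letters"])` passes the supplied names through, and `resolveColumnNames(2, "Numbers")` raises an error because only one name is given for two arrays.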

### Are these changes tested?

Yes.

1. Added new test class `arrow/test/tabular/tValidateArrayLengths.m`
2. Added new test class `arrow/test/tabular/tValidateColumnNames.m`
3. Added new test cases to `arrow/test/tabular/tRecordBatch.m`
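The `tValidateArrayLengths` test class suggests that `fromArrays` rejects arrays of different lengths, since every column in a record batch must have the same number of rows. The sketch below shows one way that check could be expressed in plain MATLAB. `checkEqualLengths` and its error identifier are hypothetical names chosen for illustration, and the sketch assumes the array lengths have already been collected into a numeric row vector.

```matlab
function checkEqualLengths(arrayLengths)
    % Hypothetical sketch (not the Arrow MATLAB implementation).
    % Errors if the input lengths are not all equal, because every column
    % of a record batch must contain the same number of rows.
    arguments
        arrayLengths (1,:) double {mustBeInteger, mustBeNonnegative}
    end

    % A 1-by-0 input (no arrays) trivially satisfies the check.
    if isempty(arrayLengths)
        return
    end

    % Locate the first array whose length differs from the first array's.
    idx = find(arrayLengths ~= arrayLengths(1), 1);
    if ~isempty(idx)
        error("checkEqualLengths:UnequalLengths", ...
            "Array %d has length %d, but array 1 has length %d.", ...
            idx, arrayLengths(idx), arrayLengths(1));
    end
end
```

For example, `checkEqualLengths([3 3 3])` returns silently, while `checkEqualLengths([3 3 2])` errors because the third array has length 2.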

### Are there any user-facing changes?

Yes, users can now create `arrow.tabular.RecordBatch` instances using the static method `arrow.tabular.RecordBatch.fromArrays`.

* Closes: #37175

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney added this to the 14.0.0 milestone Aug 15, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
… instances from a list of `arrow.array.Array` values (apache#37176)