apacheGH-37592: [MATLAB] Add NumRows property to `arrow.tabular.RecordBatch` (apache#38215)

### Rationale for this change

Currently, there is a `NumColumns` property on `arrow.tabular.RecordBatch`, but no `NumRows` property. It would be useful to be able to query the number of rows in a `RecordBatch`.

This pull request adds a `NumRows` property to `arrow.tabular.RecordBatch` to mirror the design of `arrow.tabular.Table`.

### What changes are included in this PR?

1. Added a new `NumRows` property to `arrow.tabular.RecordBatch`.

**Example**
```matlab
>> matlabTable = array2table(rand(10, 5))           

matlabTable =

  10x5 table

      Var1        Var2       Var3       Var4        Var5  
    ________    ________    _______    _______    ________

     0.76062     0.12009    0.98898    0.29974     0.42165
     0.64994     0.85116    0.71768    0.58693     0.31061
     0.33593     0.87823    0.87766    0.38206     0.45742
    0.031364      0.8336    0.71528    0.14987      0.3618
      0.5986     0.81193    0.25784    0.21073     0.76715
     0.46493     0.40281    0.39729    0.16737     0.94521
     0.18738     0.16351    0.46437    0.45545     0.40774
     0.67682      0.3577    0.94882     0.1295    0.022501
     0.29368     0.47122    0.99682    0.46011     0.34275
      0.6849    0.064717    0.89719    0.38302      0.4523

>> arrowRecordBatch = arrow.recordBatch(matlabTable);

>> arrowRecordBatch.NumRows

ans =

  int64

   10
```
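
On the C++ side, the property is backed by a proxy method that is registered under the name `getNumRows` and writes a 64-bit integer scalar to its first output. The following is a minimal, standard-library-only sketch of that name-based dispatch idea. `MockContext`, `MockProxy`, `MockRecordBatch`, and `queryNumRows` are hypothetical stand-ins invented for illustration; they are not the libmexclass or Arrow APIs, and the real `REGISTER_METHOD` macro may work differently.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a method context: holds the outputs of one call.
struct MockContext {
    std::vector<std::int64_t> outputs;
};

// Hypothetical stand-in for a proxy base class that dispatches methods by name.
class MockProxy {
public:
    virtual ~MockProxy() = default;

    // Look up a registered method by name and invoke it.
    void invoke(const std::string& name, MockContext& context) {
        const auto it = methods_.find(name);
        if (it == methods_.end()) {
            throw std::invalid_argument("unknown method: " + name);
        }
        it->second(context);
    }

protected:
    // Associate a method name with a callable (assumed analogue of REGISTER_METHOD).
    void registerMethod(const std::string& name,
                        std::function<void(MockContext&)> fn) {
        methods_[name] = std::move(fn);
    }

private:
    std::unordered_map<std::string, std::function<void(MockContext&)>> methods_;
};

// Hypothetical record batch proxy that only knows its row count.
class MockRecordBatch : public MockProxy {
public:
    explicit MockRecordBatch(std::int64_t num_rows) : num_rows_{num_rows} {
        registerMethod("getNumRows",
                       [this](MockContext& context) { getNumRows(context); });
    }

    // The registered lambda captures `this`, so copying is disabled.
    MockRecordBatch(const MockRecordBatch&) = delete;
    MockRecordBatch& operator=(const MockRecordBatch&) = delete;

protected:
    // Write the row count as a single 64-bit integer output.
    void getNumRows(MockContext& context) {
        context.outputs = {num_rows_};
    }

private:
    std::int64_t num_rows_;
};

// Convenience helper for the sketch: call the method by name and read output 0.
std::int64_t queryNumRows(MockRecordBatch& batch) {
    MockContext context;
    batch.invoke("getNumRows", context);
    return context.outputs.at(0);
}
```

In this sketch, `queryNumRows` plays the role of the MATLAB `get.NumRows` accessor: it calls the method by name and returns the first output.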

### Are these changes tested?

Yes.

1. Added `NumRows` test to `tRecordBatch` test class.
2. Updated `EmptyTable` test (renamed to `EmptyRecordBatch`) in `tRecordBatch` test class.
3. Added `FromArraysNoInputs` test to mirror the `FromArraysNoInputs` test in `tTable` test class.
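
The empty-table cases in `EmptyRecordBatch` expect zero rows whenever there are no columns, including for a 1-by-0 MATLAB table, and zero rows with one column for a 0-by-1 table. One way to read this is that the row count is taken from the columns, so it falls back to zero when there are none. The toy model below sketches that reading. It is an assumption made for illustration, not Arrow's implementation, and `ToyColumn` and `ToyRecordBatch` are hypothetical types.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical column: a name plus its values.
struct ToyColumn {
    std::string name;
    std::vector<double> values;
};

// Hypothetical record batch that derives its shape from its columns.
struct ToyRecordBatch {
    std::vector<ToyColumn> columns;

    // Assumed rule: with no columns there is nothing to measure, so report 0 rows;
    // otherwise use the length of the first column.
    std::int64_t numRows() const {
        return columns.empty()
                   ? 0
                   : static_cast<std::int64_t>(columns.front().values.size());
    }

    // The column count is reported as a 32-bit integer, matching the int32
    // values the tests compare NumColumns against.
    std::int32_t numColumns() const {
        return static_cast<std::int32_t>(columns.size());
    }
};
```

Under this model, a batch with no columns reports 0 rows and 0 columns, and a batch with one empty column reports 0 rows and 1 column, which matches the values the tests verify.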

### Are there any user-facing changes?

Yes.

This pull request adds a new public `NumRows` property to the `arrow.tabular.RecordBatch` class. Users can query the number of rows in an `arrow.tabular.RecordBatch` by accessing the `NumRows` property.

### Future Directions

1. apache#38214
2. apache#38213

* Closes: apache#37592

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney authored and Jeremy Aguilon committed Oct 23, 2023
1 parent 57eb420 commit 7567717
Showing 4 changed files with 58 additions and 5 deletions.
9 changes: 9 additions & 0 deletions matlab/src/cpp/arrow/matlab/tabular/proxy/record_batch.cc
@@ -52,6 +52,7 @@ namespace arrow::matlab::tabular::proxy {

     RecordBatch::RecordBatch(std::shared_ptr<arrow::RecordBatch> record_batch) : record_batch{record_batch} {
         REGISTER_METHOD(RecordBatch, toString);
+        REGISTER_METHOD(RecordBatch, getNumRows);
         REGISTER_METHOD(RecordBatch, getNumColumns);
         REGISTER_METHOD(RecordBatch, getColumnNames);
         REGISTER_METHOD(RecordBatch, getColumnByIndex);
@@ -104,6 +105,14 @@ namespace arrow::matlab::tabular::proxy {
         return record_batch_proxy;
     }

+    void RecordBatch::getNumRows(libmexclass::proxy::method::Context& context) {
+        namespace mda = ::matlab::data;
+        mda::ArrayFactory factory;
+        const auto num_rows = record_batch->num_rows();
+        auto num_rows_mda = factory.createScalar(num_rows);
+        context.outputs[0] = num_rows_mda;
+    }
+
     void RecordBatch::getNumColumns(libmexclass::proxy::method::Context& context) {
         namespace mda = ::matlab::data;
         mda::ArrayFactory factory;
1 change: 1 addition & 0 deletions matlab/src/cpp/arrow/matlab/tabular/proxy/record_batch.h
@@ -35,6 +35,7 @@ namespace arrow::matlab::tabular::proxy {

     protected:
         void toString(libmexclass::proxy::method::Context& context);
+        void getNumRows(libmexclass::proxy::method::Context& context);
         void getNumColumns(libmexclass::proxy::method::Context& context);
         void getColumnNames(libmexclass::proxy::method::Context& context);
         void getColumnByIndex(libmexclass::proxy::method::Context& context);
5 changes: 5 additions & 0 deletions matlab/src/matlab/+arrow/+tabular/RecordBatch.m
@@ -21,6 +21,7 @@
         matlab.mixin.Scalar

     properties (Dependent, SetAccess=private, GetAccess=public)
+        NumRows
         NumColumns
         ColumnNames
         Schema
@@ -39,6 +40,10 @@
             obj.Proxy = proxy;
         end

+        function numRows = get.NumRows(obj)
+            numRows = obj.Proxy.getNumRows();
+        end
+
         function numColumns = get.NumColumns(obj)
             numColumns = obj.Proxy.getNumColumns();
         end
48 changes: 43 additions & 5 deletions matlab/test/arrow/tabular/tRecordBatch.m
@@ -62,6 +62,19 @@ function ColumnNames(tc)
             tc.verifyEqual(arrowRecordBatch.ColumnNames, columnNames);
         end

+        function NumRows(testCase)
+            % Verify that the NumRows property of arrow.tabular.RecordBatch
+            % returns the expected number of rows.
+            numRows = int64([1, 5, 100]);
+
+            for expectedNumRows = numRows
+                matlabTable = array2table(ones(expectedNumRows, 1));
+                arrowRecordBatch = arrow.recordBatch(matlabTable);
+                testCase.verifyEqual(arrowRecordBatch.NumRows, expectedNumRows);
+            end
+
+        end
+
         function NumColumns(tc)
             numColumns = int32([1, 5, 100]);

@@ -84,11 +97,27 @@ function UnicodeColumnNames(tc)
             tc.verifyEqual(TOriginal, TConverted);
         end

-        function EmptyTable(tc)
-            TOriginal = table();
-            arrowRecordBatch = arrow.recordBatch(TOriginal);
-            TConverted = arrowRecordBatch.toMATLAB();
-            tc.verifyEqual(TOriginal, TConverted);
+        function EmptyRecordBatch(testCase)
+            % Verify that an arrow.tabular.RecordBatch can be created from
+            % an empty MATLAB table.
+            matlabTable = table.empty(0, 0);
+            arrowRecordBatch = arrow.recordBatch(matlabTable);
+            testCase.verifyEqual(arrowRecordBatch.NumRows, int64(0));
+            testCase.verifyEqual(arrowRecordBatch.NumColumns, int32(0));
+            testCase.verifyEqual(arrowRecordBatch.ColumnNames, string.empty(1, 0));
+            testCase.verifyEqual(toMATLAB(arrowRecordBatch), matlabTable);
+
+            matlabTable = table.empty(1, 0);
+            arrowRecordBatch = arrow.recordBatch(matlabTable);
+            testCase.verifyEqual(arrowRecordBatch.NumRows, int64(0));
+            testCase.verifyEqual(arrowRecordBatch.NumColumns, int32(0));
+            testCase.verifyEqual(arrowRecordBatch.ColumnNames, string.empty(1, 0));
+
+            matlabTable = table.empty(0, 1);
+            arrowRecordBatch = arrow.recordBatch(matlabTable);
+            testCase.verifyEqual(arrowRecordBatch.NumRows, int64(0));
+            testCase.verifyEqual(arrowRecordBatch.NumColumns, int32(1));
+            testCase.verifyEqual(arrowRecordBatch.ColumnNames, "Var1");
         end

         function EmptyRecordBatchColumnIndexError(tc)
@@ -196,6 +225,15 @@ function FromArraysColumnNamesHasMissingString(tc)
             tc.verifyError(fcn, "MATLAB:validators:mustBeNonmissing");
         end

+        function FromArraysNoInputs(testCase)
+            % Verify that an empty RecordBatch is returned when calling
+            % fromArrays with no input arguments.
+            arrowRecordBatch = arrow.tabular.RecordBatch.fromArrays();
+            testCase.verifyEqual(arrowRecordBatch.NumRows, int64(0));
+            testCase.verifyEqual(arrowRecordBatch.NumColumns, int32(0));
+            testCase.verifyEqual(arrowRecordBatch.ColumnNames, string.empty(1, 0));
+        end
+
         function Schema(tc)
             % Verify that the public Schema property returns an approprate
             % instance of arrow.tabular.Schema.