Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MATLAB] Use makeValidVariableNames and makeValidDimensionNames in implementation of table method for RecordBatch #37151

Closed
kevingurney opened this issue Aug 14, 2023 · 0 comments · Fixed by #37152

Comments

@kevingurney
Copy link
Member

Describe the enhancement requested

#37098 added utilities for making valid MATLAB table VariableNames (makeValidVariableNames) and DimensionNames (makeValidDimensionNames) from an arbitrary list of strings.

We can now use these utilities to ensure that RecordBatch field names are converted to valid MATLAB table VariableNames/DimensionNames when calling the table/toMATLAB methods.

Component(s)

MATLAB

@kevingurney kevingurney self-assigned this Aug 14, 2023
kevingurney added a commit that referenced this issue Aug 14, 2023
…onNames` in implementation of `table` method for `RecordBatch` (#37152)

### Rationale for this change

#37098 added utilities for making valid MATLAB `table` `VariableNames` (`makeValidVariableNames`) and `DimensionNames` (`makeValidDimensionNames`) from an arbitrary list of strings.

This pull request uses these utilities to ensure that `RecordBatch` field names are converted to valid MATLAB `table` `VariableNames`/`DimensionNames` when calling the `table`/`toMATLAB` methods.

### What changes are included in this PR?

1. Used `makeValidVariableNames` and `makeValidDimensionNames` utilities in the implementation of the `table` method for the `RecordBatch` class to ensure that column names are converted to valid MATLAB `table` `VariableNames`/`DimensionNames`.

### Are these changes tested?

Yes.

1. The existing tests in `arrow/matlab/test/arrow/tabular/tMakeValidVariableNames.m` and `arrow/matlab/test/arrow/tabular/tMakeValidDimensionNames.m` already test that an arbitrary list of column names are converted into valid MATLAB `table` `VariableNames`/`DimensionNames`.
2. There is currently no straightforward way to create a `RecordBatch` with field names that are invalid MATLAB `VariableNames`/`DimensionNames` using the MATLAB interface. When we have a way to do this in the MATLAB interface, we can add more "integration" tests which verify that a `RecordBatch` with field names that are invalid MATLAB `table` `VariableNames`/`DimensionNames` are converted into valid `VariableNames`/`DimensionNames` when the `table`/`toMATLAB` methods are called.

### Are there any user-facing changes?

Yes.

A `RecordBatch` with field names that are invalid MATLAB `table` `VariableNames`/`DimensionNames` will now be converted into a MATLAB `table` with valid `VariableNames`/`DimensionNames` when calling the `table`/`toMATLAB` methods.

### Future Directions

1. #37046 (this change to the `table` method of the `RecordBatch` class should help preserve the behavior of storing original `RecordBatch` field names in the `VariableDescriptions` property of the output MATLAB `table` returned by `featherread`).
* Closes: #37151

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@kevingurney kevingurney added this to the 14.0.0 milestone Aug 14, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…imensionNames` in implementation of `table` method for `RecordBatch` (apache#37152)

### Rationale for this change

apache#37098 added utilities for making valid MATLAB `table` `VariableNames` (`makeValidVariableNames`) and `DimensionNames` (`makeValidDimensionNames`) from an arbitrary list of strings.

This pull request uses these utilities to ensure that `RecordBatch` field names are converted to valid MATLAB `table` `VariableNames`/`DimensionNames` when calling the `table`/`toMATLAB` methods.

### What changes are included in this PR?

1. Used `makeValidVariableNames` and `makeValidDimensionNames` utilities in the implementation of the `table` method for the `RecordBatch` class to ensure that column names are converted to valid MATLAB `table` `VariableNames`/`DimensionNames`.

### Are these changes tested?

Yes.

1. The existing tests in `arrow/matlab/test/arrow/tabular/tMakeValidVariableNames.m` and `arrow/matlab/test/arrow/tabular/tMakeValidDimensionNames.m` already test that an arbitrary list of column names are converted into valid MATLAB `table` `VariableNames`/`DimensionNames`.
2. There is currently no straightforward way to create a `RecordBatch` with field names that are invalid MATLAB `VariableNames`/`DimensionNames` using the MATLAB interface. When we have a way to do this in the MATLAB interface, we can add more "integration" tests which verify that a `RecordBatch` with field names that are invalid MATLAB `table` `VariableNames`/`DimensionNames` are converted into valid `VariableNames`/`DimensionNames` when the `table`/`toMATLAB` methods are called.

### Are there any user-facing changes?

Yes.

A `RecordBatch` with field names that are invalid MATLAB `table` `VariableNames`/`DimensionNames` will now be converted into a MATLAB `table` with valid `VariableNames`/`DimensionNames` when calling the `table`/`toMATLAB` methods.

### Future Directions

1. apache#37046 (this change to the `table` method of the `RecordBatch` class should help preserve the behavior of storing original `RecordBatch` field names in the `VariableDescriptions` property of the output MATLAB `table` returned by `featherread`).
* Closes: apache#37151

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment