[MATLAB] Add utility which makes valid MATLAB table variable names from an arbitrary list of strings #37096

Closed
kevingurney opened this issue Aug 9, 2023 · 1 comment · Fixed by #37098

Comments

@kevingurney
Member

Describe the enhancement requested

To make it possible to safely convert Arrow Schema field names to corresponding MATLAB table variable names, it would be helpful to add a utility which can take an arbitrary list of strings and return a set of valid MATLAB table variable names, which are (1) unique, (2) non-empty, and (3) do not conflict with the "reserved" variable names "Properties", "VariableNames", "RowNames", and ":".

Component(s)

MATLAB

@sgilmore10
Member

take

kevingurney added a commit that referenced this issue Aug 10, 2023
…e names from an arbitrary list of strings (#37098)

### Rationale for this change

To make it possible to safely convert Arrow Schema field names to corresponding MATLAB `table` variable names, it would be helpful to add a utility which can take an arbitrary list of strings and return a set of valid MATLAB `table` variable names, which are (1) unique, (2) non-empty, and (3) do not conflict with the "reserved" variable names "Properties", "VariableNames", "RowNames", and ":". An additional restriction is that variable names must have 63 or fewer characters.

### What changes are included in this PR?

1. Added a new function called `arrow.tabular.internal.makeValidVariableNames` that accepts an arbitrary list of strings and returns valid MATLAB `table` variable names.

```matlab
>> originalVarNames = ["", "Properties", ":", "ValidVar", "ValidVar"];
>> validVarNames = arrow.tabular.internal.makeValidVariableNames(originalVarNames)

validVarNames = 

  1×5 string array

    "Var1"    "Properties_1"    ":_1"    "ValidVar"    "ValidVar_1"
```
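
For readers who want to see how the rules above could combine, here is a minimal sketch of one possible approach. It is not the code added in this PR: the function name `sketchMakeValidVariableNames`, the `"Var" + position` default for empty names, and the `_N` suffix scheme are assumptions chosen only to be consistent with the example output shown above.

```matlab
function names = sketchMakeValidVariableNames(names)
% Hypothetical sketch, NOT the arrow.tabular.internal implementation.
% Makes names non-empty, at most 63 characters, unique, and distinct
% from the reserved table names.
    arguments
        names (1, :) string
    end
    reserved = ["Properties", "VariableNames", "RowNames", ":"];
    maxLen = 63;

    for k = 1:numel(names)
        % Assumed default: empty names become "Var" + position.
        if strlength(names(k)) == 0
            names(k) = "Var" + k;
        end
        % Truncate names that are too long.
        if strlength(names(k)) > maxLen
            names(k) = extractBefore(names(k), maxLen + 1);
        end
    end

    % Names already claimed: the reserved names plus every name accepted so far.
    taken = reserved;
    for k = 1:numel(names)
        candidate = names(k);
        suffix = 0;
        while ismember(candidate, taken)
            % Append "_1", "_2", ... and shorten the base so the
            % result still fits within maxLen characters.
            suffix = suffix + 1;
            tail = "_" + suffix;
            base = names(k);
            room = maxLen - strlength(tail);
            if strlength(base) > room
                base = extractBefore(base, room + 1);
            end
            candidate = base + tail;
        end
        names(k) = candidate;
        taken(end + 1) = candidate;
    end
end
```

Treating the reserved names as "already taken" lets a single loop handle both duplicates and reserved-name conflicts, which is why `"Properties"` and the second `"ValidVar"` receive the same kind of suffix.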

2. Added a new function called `arrow.tabular.internal.makeValidDimensionNames` that returns valid `table` dimension names with respect to a list of valid variable names. In MATLAB, the default `table` dimension names are `"Row"` and `"Variables"`, but they must not conflict with any variable names. In other words, they must be unique with respect to the variable names.

```matlab
>> validVarNames = ["Row" "Test" "Variables"];
>> validDimNames = arrow.tabular.internal.makeValidDimensionNames(validVarNames)

validDimNames = 

  1×2 string array

    "Row_1"    "Variables_1"
```
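
The dimension-name rule can be illustrated the same way. The following is a rough sketch under stated assumptions, not the function added in this PR: `sketchMakeValidDimensionNames` is a hypothetical name, and the `_N` suffix scheme is inferred only from the example output above.

```matlab
function dimNames = sketchMakeValidDimensionNames(varNames)
% Hypothetical sketch, NOT the arrow.tabular.internal implementation.
% Starts from the default dimension names and renames any that
% collide with a variable name.
    arguments
        varNames (1, :) string
    end
    dimNames = ["Row", "Variables"];

    % Dimension names must differ from every variable name and from each other.
    taken = varNames;
    for k = 1:numel(dimNames)
        candidate = dimNames(k);
        suffix = 0;
        while ismember(candidate, taken)
            suffix = suffix + 1;
            candidate = dimNames(k) + "_" + suffix;
        end
        dimNames(k) = candidate;
        taken(end + 1) = candidate;
    end
end
```

When no variable name collides, the defaults are returned unchanged. The result could then be assigned to a table's `Properties.DimensionNames`.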

To summarize, MATLAB `table`s cannot have arbitrary variable names. For example, `"Properties"`, `"RowNames"`, `"VariableNames"`, and `":"` are all disallowed. Variable names must also be unique with respect to each other and must be between 1 and 63 characters in length.
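
The rules in this summary can also be written as a small predicate, which may be handy when checking a candidate list of names. This is an illustrative sketch only; `sketchIsValidVariableNames` is a hypothetical helper and is not part of this PR.

```matlab
function tf = sketchIsValidVariableNames(names)
% Hypothetical helper, not part of this PR.
% Returns true only if every name is 1 to 63 characters long,
% none is a reserved table name, and all names are distinct.
    arguments
        names (1, :) string
    end
    reserved = ["Properties", "VariableNames", "RowNames", ":"];
    lengths = strlength(names);
    tf = all(lengths >= 1 & lengths <= 63) ...
        && ~any(ismember(names, reserved)) ...
        && numel(unique(names)) == numel(names);
end
```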

### Are these changes tested?

Yes. Added the following new test classes:

1. `tMakeValidVariableNames.m`
2. `tMakeValidDimensionNames.m`

### Are there any user-facing changes?

No.

### Future Directions

1. In a follow-up PR, we will integrate `makeValidVariableNames` and `makeValidDimensionNames` into the `table()` and `toMATLAB()` methods of `arrow.tabular.RecordBatch`.

### Notes

Thanks to @kevingurney for help writing the test cases!

* Closes: #37096

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@kevingurney kevingurney added this to the 14.0.0 milestone Aug 10, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…ariable names from an arbitrary list of strings (apache#37098)