[MATLAB] Add utility which makes valid MATLAB table
variable names from an arbitrary list of strings
#37096
Comments
take
kevingurney added a commit that referenced this issue on Aug 10, 2023
…e names from an arbitrary list of strings (#37098)

### Rationale for this change

To make it possible to safely convert Arrow Schema field names to corresponding MATLAB `table` variable names, it would be helpful to add a utility which can take an arbitrary list of strings and return a set of valid MATLAB `table` variable names, which are (1) unique, (2) non-empty, and (3) do not conflict with the "reserved" variable names "Properties", "VariableNames", "RowNames", and ":". An additional restriction is that variable names must have 63 or fewer characters.

### What changes are included in this PR?

1. Added a new function called `arrow.tabular.internal.makeValidVariableNames` that accepts an arbitrary list of strings and returns valid MATLAB `table` variable names.

```matlab
>> originalVarNames = ["", "Properties", ":", "ValidVar", "ValidVar"];
>> validVarNames = arrow.tabular.internal.makeValidVariableNames(originalVarNames)

validVarNames =

  1×5 string array

    "Var1"    "Properties_1"    ":_1"    "ValidVar"    "ValidVar_1"
```

2. Added a new function called `arrow.tabular.internal.makeValidDimensionNames` that returns valid table dimension names with respect to a list of valid variable names. In MATLAB, the default `table` dimension names are `"Row"` and `"Variables"`, but they must not conflict with any variable names. In other words, they must be unique with respect to the variable names.

```matlab
>> validVarNames = ["Row" "Test" "Variables"];
>> validDimNames = arrow.tabular.internal.makeValidDimensionNames(validVarNames)

validDimNames =

  1×2 string array

    "Row_1"    "Variables_1"
```

To summarize, MATLAB `table`s cannot have arbitrary variable names. For example, `"Properties"`, `"RowNames"`, `"VariableNames"`, and `":"` are all disallowed. Variable names must also be unique with respect to each other and must be between 1 and 63 characters in length.

### Are these changes tested?

Yes. Added the following new test classes:

1. `tMakeValidVariableNames.m`
2. `tMakeValidDimensionNames.m`

### Are there any user-facing changes?

No.

### Future Directions

1. In a follow-up PR, we will integrate `makeValidVariableNames` and `makeValidDimensionNames` into the `table()` and `toMATLAB()` methods of `arrow.tabular.RecordBatch`.

### Notes

Thanks to @kevingurney for help writing the test cases!

* Closes: #37096

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
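For readers who want to see the dimension-name rule in isolation, here is a minimal sketch of the idea using only the built-in `matlab.lang.makeUniqueStrings`. It is not the code from the commit; whether `makeValidDimensionNames` is implemented this way is an assumption, and the variable names `varNames`, `defaultDimNames`, and `dimNames` are made up for illustration.

```matlab
% Hypothetical sketch; NOT the Arrow implementation of makeValidDimensionNames.

% Variable names that are assumed to be valid already.
varNames = ["Row", "Test", "Variables"];

% MATLAB's default table dimension names, as stated in the commit message.
defaultDimNames = ["Row", "Variables"];

% Treat the variable names as "excluded" strings so that any default
% dimension name that collides with one of them gets renamed.
dimNames = matlab.lang.makeUniqueStrings(defaultDimNames, varNames);

disp(dimNames)
```

Passing the variable names as the second argument leaves them untouched and only changes the dimension names, which is the direction of the constraint described above.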
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue on Nov 13, 2023
…ariable names from an arbitrary list of strings (apache#37098)
Describe the enhancement requested

To make it possible to safely convert Arrow `Schema` field names to corresponding MATLAB `table` variable names, it would be helpful to add a utility which can take an arbitrary list of strings and return a set of valid MATLAB `table` variable names, which are (1) unique, (2) non-empty, and (3) do not conflict with the "reserved" variable names "Properties", "VariableNames", "RowNames", and ":".

Component(s)
MATLAB
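
As a rough illustration of the three requirements in the description above, the following sketch fills in empty names and then de-duplicates against the reserved names. It is only one possible approach under stated assumptions: the `"Var<N>"` placeholder scheme and the variable names `names`, `reserved`, and `emptyIdx` are hypothetical, and the sketch does not enforce any maximum name length.

```matlab
% Hypothetical sketch; NOT the utility requested in this issue.

% Arbitrary input strings, e.g. Arrow Schema field names.
names = ["", "Properties", ":", "ValidVar", "ValidVar"];

% Names that a MATLAB table does not allow as variable names.
reserved = ["Properties", "VariableNames", "RowNames", ":"];

% Requirement (2), non-empty: replace each empty name with "Var<N>",
% where N is the position of the name in the list (an assumed scheme).
emptyIdx = find(names == "");
names(emptyIdx) = "Var" + emptyIdx;

% Requirements (1) and (3): make the names unique among themselves and
% distinct from the reserved names.
names = matlab.lang.makeUniqueStrings(names, reserved);

disp(names)
```

Filling in the empty names before de-duplicating means a generated placeholder that happens to match an existing name is still made unique in the final step.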