GH-38419: [MATLAB] Implement a `ClassTypeValidator` class that validates a MATLAB `cell` array contains only values of the same class type. #38530
Looks good! Thanks for the pull request!
+1
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit aaf01e8. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)

### Rationale for this change

Adding this `ClassTypeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method for creating `ListArray`s whose `ValueType` is either a numeric, boolean, string, time32, or time64 array from a MATLAB `cell` array.

### What changes are included in this PR?

Added an abstract class `arrow.array.internal.list.ListTypeValidator` that defines three abstract methods:

1. `validateElement(obj, element)`
2. `length = getElementLength(obj, element)`
3. `C = reshapeCellElements(obj, C)`

These abstract methods will be used in `ListArray.fromMATLAB` to create `ListArray`s from MATLAB `cell` arrays. Below is a "pared-down" version of how the `fromMATLAB` algorithm will work:

```matlab
function listArray = fromMATLAB(C)
    % Create the appropriate ListTypeValidator from the
    % first element in the cell array C
    validator = createListTypeValidator(C{1});

    % Pre-allocate an int32 vector for the offsets. A list with
    % numRows rows needs numRows + 1 offsets; the first offset is 0.
    numRows = numel(C);
    offsets = zeros([numRows + 1, 1], "int32");

    for ii = 1:numRows
        cellElement = C{ii};

        % Validate that cellElement can be used to create
        % one row in the ListArray. For example,
        % if the first element in C was a double, verify
        % cellElement is also a double.
        validator.validateElement(cellElement);

        % Determine how much to increment the
        % last offset value by to set the offset at index ii + 1.
        length = validator.getElementLength(cellElement);
        offsets(ii + 1) = length + offsets(ii);
    end

    % Reshape the elements in cell array C so that they
    % can be vertically concatenated.
    C = validator.reshapeCellElements(C);

    % Extract the cell array elements and vertically concatenate
    % them into one array. Then pass this array to arrow.array().
    values = vertcat(C{:});
    valueArray = arrow.array(values);

    % Create an Int32Array from offsets
    offsetArray = arrow.array(offsets);

    listArray = arrow.array.ListArray(Values=valueArray, Offsets=offsetArray);
end
```

The concrete type of the `validator` object is created based on the first element in the `cell` array `C`. We use the first element to determine what kind of `ListArray` to construct from the input `cell` array.

--

Added a concrete class called `arrow.array.internal.list.ClassTypeValidator`, which inherits from `arrow.array.internal.list.ListTypeValidator`:

1. `validateElement(obj, element)` - Throws an error if the element's class type does not match the expected value.
2. `length = getElementLength(obj, element)` - Returns the number of elements in the input array.
3. `C = reshapeCellElements(obj, C)` - Reshapes all elements in the `cell` array `C` to be column vectors.

`ClassTypeValidator` will be used when creating `ListArray`s from MATLAB `cell` arrays containing "primitive types", such as numerics, strings, and durations.

### Are these changes tested?

Yes. I added a new class called `tClassTypeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. apache#38420
2. apache#38417
3. apache#38354

* Closes: apache#38419

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
### Rationale for this change

Adding this `ClassTypeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method for creating `ListArray`s whose `ValueType` is either a numeric, boolean, string, time32, or time64 array from a MATLAB `cell` array.

### What changes are included in this PR?

Added an abstract class `arrow.array.internal.list.ListTypeValidator` that defines three abstract methods:

1. `validateElement(obj, element)`
2. `length = getElementLength(obj, element)`
3. `C = reshapeCellElements(obj, C)`
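For readers less familiar with MATLAB abstract classes, the interface listed above might be declared roughly as follows. This is only a sketch, not the file added in this PR: the class name `ListTypeValidatorSketch` is hypothetical, the comments are my reading of the method names, and the real class lives in the `arrow.array.internal.list` package and may differ in its attributes or superclass. A `classdef` has to be saved in its own file, here `ListTypeValidatorSketch.m`.

```matlab
% ListTypeValidatorSketch.m
% Hypothetical stand-in for arrow.array.internal.list.ListTypeValidator.
classdef (Abstract) ListTypeValidatorSketch
    methods (Abstract)
        % Error if element cannot be used as one row of the ListArray.
        validateElement(obj, element)

        % Return the number of values that element contributes to the list.
        length = getElementLength(obj, element)

        % Reshape the elements of C so that vertcat(C{:}) succeeds.
        C = reshapeCellElements(obj, C)
    end
end
```

Because every method is abstract, the class cannot be instantiated directly; a concrete subclass has to implement all three methods.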
These abstract methods will be used in `ListArray.fromMATLAB` to create `ListArray`s from MATLAB `cell` arrays. A "pared-down" version of how the `fromMATLAB` algorithm will work appears in the commit message above.

The concrete type of the `validator` object is created based on the first element in the `cell` array `C`. We use the first element to determine what kind of `ListArray` to construct from the input `cell` array.
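To make the offset bookkeeping concrete, here is a small standalone script that mirrors the loop in `fromMATLAB` without using any Arrow classes. The input `C` is a made-up example, and `numel` and `x(:)` stand in for `getElementLength` and `reshapeCellElements` under the assumption that every element is a numeric vector.

```matlab
% Hypothetical input: three list rows holding 3, 2, and 0 values.
C = {[1 2 3], [4; 5], zeros(1, 0)};

numRows = numel(C);

% A list with numRows rows needs numRows + 1 offsets; the first offset is 0.
offsets = zeros([numRows + 1, 1], "int32");
for ii = 1:numRows
    % Each offset is the previous offset plus the length of the current row.
    offsets(ii + 1) = offsets(ii) + numel(C{ii});
end

% Turn every element into a column vector so they can be stacked.
C = cellfun(@(x) x(:), C, UniformOutput=false);
values = vertcat(C{:});

% offsets is int32([0; 3; 5; 5]) and values is [1; 2; 3; 4; 5].
% Row ii of the list spans values(offsets(ii) + 1 : offsets(ii + 1)).
```

The repeated final offset (5, 5) is how the empty third row is represented: its start and end offsets coincide.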
Added a concrete class called `arrow.array.internal.list.ClassTypeValidator`, which inherits from `arrow.array.internal.list.ListTypeValidator`:

1. `validateElement(obj, element)` - Throws an error if the element's class type does not match the expected value.
2. `length = getElementLength(obj, element)` - Returns the number of elements in the input array.
3. `C = reshapeCellElements(obj, C)` - Reshapes all elements in the `cell` array `C` to be column vectors.

`ClassTypeValidator` will be used when creating `ListArray`s from MATLAB `cell` arrays containing "primitive types", such as numerics, strings, and durations.

### Are these changes tested?

Yes. I added a new class called `tClassTypeValidator.m`.

### Are there any user-facing changes?

No.
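As a rough illustration of the three `ClassTypeValidator` behaviours listed under "What changes are included in this PR?", the script below imitates them with anonymous functions so that it runs without the Arrow MATLAB interface. Everything here is an assumption made for the example: the input `C`, the error identifier `sketch:ClassTypeMismatch`, and the choice of an exact class-name comparison (the description does not say whether the real class compares names exactly or uses `isa`).

```matlab
% Hypothetical input; the expected class is taken from the first element.
C = {[1 2 3], [4; 5], 6};
expectedClass = class(C{1});

% validateElement: error if the element's class differs from the expected one.
% The identifier "sketch:ClassTypeMismatch" is made up for this example.
validateElement = @(element) assert(strcmp(class(element), expectedClass), ...
    "sketch:ClassTypeMismatch", ...
    "Expected an element of class ""%s"" but got ""%s"".", ...
    expectedClass, class(element));

% getElementLength: number of values the element contributes to the list.
getElementLength = @(element) numel(element);

% reshapeCellElements: turn every element into a column vector.
reshapeCellElements = @(C) cellfun(@(x) x(:), C, UniformOutput=false);

cellfun(validateElement, C);              % passes: every element is a double
lengths = cellfun(getElementLength, C);   % [3 2 1]
columns = reshapeCellElements(C);
values = vertcat(columns{:});             % [1; 2; 3; 4; 5; 6]

% An element of a different class is rejected.
try
    validateElement("not a double");
    mismatchId = "";
catch err
    mismatchId = string(err.identifier);  % "sketch:ClassTypeMismatch"
end
```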
### Future Directions

1. `DatetimeValidator` class that validates a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s #38420
2. `TableTypeValidator` class that validates a MATLAB `cell` array contains only `table`s that share the same schema #38417
3. `fromMATLAB` method for `arrow.array.ListArray`. #38354

* Closes: `ClassTypeValidator` class that validates a MATLAB `cell` array contains only values of the same class type. #38419