GH-38419: [MATLAB] Implement a ClassTypeValidator class that validates a MATLAB cell array contains only values of the same class type. #38530

Merged
merged 8 commits into from
Oct 31, 2023

@sgilmore10 sgilmore10 commented Oct 31, 2023

Rationale for this change

Adding this ClassTypeValidator class is a step towards implementing the arrow.array.ListArray.fromMATLAB() method, which will create ListArrays from MATLAB cell arrays when the ValueType is a numeric, boolean, string, time32, or time64 array.

What changes are included in this PR?

Added an abstract class arrow.array.internal.list.ListTypeValidator that defines three abstract methods:

  1. validateElement(obj, element)
  2. length = getElementLength(obj, element)
  3. C = reshapeCellElements(obj, C)

These abstract methods will be used in ListArray.fromMATLAB to create ListArrays from MATLAB cell arrays. Below is a "pared-down" version of how the fromMATLAB algorithm will work:

function listArray = fromMATLAB(C)

    % Create the appropriate ListTypeValidator from the
    % first element in the cell array C.
    validator = createListTypeValidator(C{1});

    % Pre-allocate an int32 vector for the offsets. A ListArray
    % with numRows rows needs numRows + 1 offsets; the first is 0.
    numRows = numel(C);
    offsets = zeros([numRows + 1, 1], "int32");

    for ii = 1:numRows
        cellElement = C{ii};

        % Validate that cellElement can be used to create
        % one row in the ListArray. For example, if the
        % first element in C was a double, verify that
        % cellElement is also a double.
        validator.validateElement(cellElement);

        % Determine how much to increment the last offset
        % value by to set the offset at index ii + 1.
        elementLength = validator.getElementLength(cellElement);
        offsets(ii + 1) = elementLength + offsets(ii);
    end

    % Reshape the elements in cell array C so that they
    % can be vertically concatenated.
    C = validator.reshapeCellElements(C);

    % Extract the cell array elements and vertically concatenate
    % them into one array. Then pass this array to arrow.array().
    values = vertcat(C{:});
    valueArray = arrow.array(values);

    % Create an Int32Array from offsets.
    offsetArray = arrow.array(offsets);

    listArray = arrow.array.ListArray(Values=valueArray, Offsets=offsetArray);
end

The concrete type of the validator object is determined by the first element in the cell array C. That first element decides what kind of ListArray to construct from the input cell array, and every other element is then validated against it.
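To make the offsets bookkeeping concrete, here is a small worked sketch in plain MATLAB that does not call the Arrow interface. The input cell array and the variable names are illustrative assumptions, not code from this PR, and `cumsum` stands in for the loop in the outline above.

```matlab
% Hypothetical input: three list rows with different lengths and shapes.
C = {[1 2 3], [4; 5], zeros(0, 1)};

% Count the elements in each row, then build zero-based cumulative
% offsets. N rows always produce N + 1 offsets, starting at 0.
lengths = cellfun(@numel, C);
offsets = int32([0, cumsum(lengths)])';

% Reshape every element into a column vector so vertcat succeeds.
columns = cellfun(@(x) x(:), C, UniformOutput=false);
values = vertcat(columns{:});

% Row ii of the list spans values(offsets(ii) + 1 : offsets(ii + 1)),
% because the offsets are zero-based and MATLAB indexing is one-based.
secondRow = values(offsets(2) + 1 : offsets(3));
```

The empty third row contributes no values but still gets its own offset entry, which is why the offsets vector is one element longer than the number of rows.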

--

Added a concrete class called arrow.array.internal.list.ClassTypeValidator, which inherits from arrow.array.internal.list.ListTypeValidator:

  1. validateElement(obj, element) - Throws an error if the element's class type does not match the expected value.
  2. length = getElementLength(obj, element) - Returns the number of elements in the input array.
  3. C = reshapeCellElements(obj, C) - Reshapes all elements in the cell array C to be column vectors.

ClassTypeValidator will be used when creating ListArrays from MATLAB cell arrays containing "primitive types", such as numerics, strings, and durations.
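The three behaviors listed above can be approximated with anonymous functions. This is only a rough sketch of the idea, not the ClassTypeValidator implementation: the real code is a class, and the error identifier `example:ClassTypeMismatch`, the message text, and the variable names below are made up for illustration.

```matlab
% Assumed expected class, taken from the first element of the cell array.
C = {[1 2 3], [4; 5]};
expectedClass = class(C{1});

% Hypothetical stand-ins for the three validator methods.
validateElement = @(element) assert(strcmp(class(element), expectedClass), ...
    'example:ClassTypeMismatch', 'Expected class %s but got %s.', ...
    expectedClass, class(element));
getElementLength = @(element) numel(element);
reshapeCellElements = @(cellArray) cellfun(@(x) x(:), cellArray, UniformOutput=false);

% Every element shares the class of the first one, so nothing is thrown.
for ii = 1:numel(C)
    validateElement(C{ii});
end
lengths = cellfun(getElementLength, C);
reshaped = reshapeCellElements(C);

% An element of a different class (a string here) is rejected.
try
    validateElement("text");
    mismatchDetected = false;
catch err
    mismatchDetected = strcmp(err.identifier, 'example:ClassTypeMismatch');
end
```

Comparing `class(element)` by name is a deliberate choice in this sketch: it rejects subclasses as well as unrelated types, which may or may not match what the merged class does.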

Are these changes tested?

Yes. I added a new test class in tClassTypeValidator.m.

Are there any user-facing changes?

No.

Future Directions

  1. [MATLAB] Implement a DatetimeValidator class that validates a MATLAB cell array contains only values of zoned or unzoned datetimes #38420
  2. [MATLAB] Implement a TableTypeValidator class that validates a MATLAB cell array contains only tables that share the same schema #38417
  3. [MATLAB] Implement fromMATLAB method for arrow.array.ListArray. #38354

@sgilmore10 sgilmore10 changed the title from "GhH-38419: [MATLAB] Implement a ClassTypeValidator class that validates a MATLAB cell array contains only values of the same class type." to "GH-38419: [MATLAB] Implement a ClassTypeValidator class that validates a MATLAB cell array contains only values of the same class type." on Oct 31, 2023

@kevingurney kevingurney left a comment


Looks good! Thanks for the pull request!

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Oct 31, 2023
@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Oct 31, 2023
@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting change review" label Oct 31, 2023
@kevingurney

+1

@kevingurney kevingurney merged commit aaf01e8 into apache:main Oct 31, 2023
9 checks passed
@kevingurney kevingurney deleted the GH-38419 branch October 31, 2023 18:41
@kevingurney kevingurney removed the "awaiting merge" label Oct 31, 2023

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit aaf01e8.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)

* Closes: apache#38419

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)
