[MATLAB] Implement a ClassTypeValidator class that validates a MATLAB cell array contains only values of the same class type. #38419

Closed
sgilmore10 opened this issue Oct 23, 2023 · 1 comment · Fixed by #38530

Comments

@sgilmore10
Member

sgilmore10 commented Oct 23, 2023

Describe the enhancement requested

Adding this ClassTypeValidator class is a step towards implementing the arrow.array.ListArray.fromMATLAB() method, which will create ListArrays whose ValueType is a numeric, boolean, string, time32, or time64 array from a MATLAB cell array.

Component(s)

MATLAB

@sgilmore10 sgilmore10 changed the title [MATLAB] Implement a ClassTypeValidator that validates a MATLAB cell array contains only values of the same class type. [MATLAB] Implement a ClassTypeValidator class that validates a MATLAB cell array contains only values of the same class type. Oct 23, 2023
@kevingurney kevingurney assigned sgilmore10 and unassigned sgilmore10 Oct 23, 2023
@sgilmore10
Member Author

take

kevingurney pushed a commit that referenced this issue Oct 31, 2023
…tes a MATLAB `cell` array contains only values of the same class type. (#38530)

### Rationale for this change

Adding this `ClassTypeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method, which will create `ListArray`s whose `ValueType` is a numeric, boolean, string, time32, or time64 array from a MATLAB `cell` array.

### What changes are included in this PR?

Added an abstract class `arrow.array.internal.list.ListTypeValidator` that defines three abstract methods: 
1. `validateElement(obj, element)`
2. `length = getElementLength(obj, element)` 
3. `C = reshapeCellElements(obj, C)`

These abstract methods will be used in `ListArray.fromMATLAB` to create `ListArray`s from MATLAB `cell` arrays. Below is a pared-down version of how the `fromMATLAB` algorithm will work:

```matlab
function listArray = fromMATLAB(C)

    % Create the appropriate ListTypeValidator from the
    % first element in the cell array C
    validator = createListTypeValidator(C{1});

    % Pre-allocate an int32 vector for the offsets. A list with
    % numRows rows needs numRows + 1 offsets, starting at 0.
    numRows = numel(C);
    offsets = zeros([numRows + 1, 1], "int32");

    for ii = 1:numRows
        cellElement = C{ii};

        % Validate that cellElement can be used to create
        % one row in the ListArray. For example, if the
        % first element in C was a double, verify that
        % cellElement is also a double.
        validator.validateElement(cellElement);

        % Determine how much to increment the
        % last offset value by to set the offset at index ii + 1.
        elementLength = validator.getElementLength(cellElement);
        offsets(ii + 1) = elementLength + offsets(ii);
    end

    % Reshape the elements in cell array C so that they
    % can be vertically concatenated.
    C = validator.reshapeCellElements(C);

    % Extract the cell array elements and vertically concatenate
    % them into one array. Then pass this array to arrow.array().
    values = vertcat(C{:});
    valueArray = arrow.array(values);

    % Create an Int32Array from offsets
    offsetArray = arrow.array(offsets);

    listArray = arrow.array.ListArray(Values=valueArray, Offsets=offsetArray);
end
```
The concrete type of the `validator` object is chosen based on the first element in the `cell` array `C`. That first element determines what kind of `ListArray` to construct from the input `cell` array.
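
To make the offsets bookkeeping concrete, here is a small worked example that runs without the Arrow MATLAB interface. It is only a sketch: it replaces the validator calls with plain `numel` and `(:)`, which assumes every element is a numeric vector, and it computes the offsets with `cumsum` instead of a loop.

```matlab
% Worked example: offsets and values for a three-row list.
C = {[1 2 3], [4 5], []};

% Number of elements in each cell, as a column vector.
elementLengths = cellfun(@numel, C(:));

% Offsets start at 0 and accumulate the element lengths,
% so there is one more offset than there are rows.
offsets = int32([0; cumsum(elementLengths)]);

% Reshape every element into a column, then concatenate.
columns = cellfun(@(e) e(:), C, UniformOutput=false);
values = vertcat(columns{:});

% offsets is [0; 3; 5; 5] and values is [1; 2; 3; 4; 5].
```

Row `ii` of the list then spans `values(offsets(ii) + 1 : offsets(ii + 1))`, so the empty third row contributes a repeated offset and no values.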

--

Added a concrete class called `arrow.array.internal.list.ClassTypeValidator`, which inherits from `arrow.array.internal.list.ListTypeValidator`:

1. `validateElement(obj, element)` - Throws an error if the element's class type does not match the expected value.
2. `length = getElementLength(obj, element)` - Returns the number of elements in the input array.
3. `C = reshapeCellElements(obj, C)` - Reshapes all elements in the `cell` array `C` to be column vectors.

`ClassTypeValidator` will be used when creating `ListArray`s from MATLAB `cell` arrays containing "primitive types", such as numerics, strings, and durations.
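
As a rough illustration of the three behaviors listed above, the snippet below mimics them with anonymous functions instead of a class. The function handle names, the `example:InvalidClassType` identifier, the error message, and the exact-class comparison via `strcmp(class(...))` are assumptions made for this sketch, not the code added in this PR.

```matlab
% Hypothetical stand-ins for the three ClassTypeValidator methods.
expectedClass = 'double';   % assumed: class of the first cell element

% Throw if the element's class differs from the expected class.
validateElement = @(element) assert(strcmp(class(element), expectedClass), ...
    'example:InvalidClassType', ...
    'Expected a %s but found a %s.', expectedClass, class(element));

% Number of elements in one cell.
getElementLength = @(element) numel(element);

% Turn every cell element into a column vector.
reshapeCellElements = @(C) cellfun(@(e) e(:), C, UniformOutput=false);

C = {[1 2 3], [4; 5], []};
cellfun(validateElement, C);              % no error: all doubles
lengths = cellfun(getElementLength, C);   % [3 2 0]
columns = reshapeCellElements(C);         % every cell is now N-by-1

% A string element fails validation against the 'double' class.
caughtIdentifier = '';
try
    validateElement("not a double");
catch err
    caughtIdentifier = err.identifier;
end
```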

### Are these changes tested?

Yes. I added a new test class called `tClassTypeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. #38420 
2. #38417
3. #38354 
* Closes: #38419

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@kevingurney kevingurney added this to the 15.0.0 milestone Oct 31, 2023
kevingurney pushed a commit that referenced this issue Oct 31, 2023
…es a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s (#38533)

### Rationale for this change

This is a followup to #38419.

Adding this `DatetimeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method, which will create `ListArray`s whose `ValueType` is a timestamp array from a MATLAB `cell` array.

This validator will ensure the cell array contains only zoned `datetime`s or only unzoned `datetime`s. This is a requirement when creating a `List` of `Timestamp`s because two MATLAB `datetime`s can only be concatenated together if they are either both zoned or both unzoned:

```matlab
>> d1 = datetime(2023, 10, 31, TimeZone="America/New_York");
>> d2 = datetime(2023, 11, 1);
>> [d1; d2]
Error using datetime/vertcat
Unable to concatenate a datetime array that has a time zone with one that does not have a time
zone.
```

### What changes are included in this PR?

Added a new MATLAB class called `arrow.array.internal.list.DatetimeValidator`, which inherits from `arrow.array.internal.list.ClassTypeValidator`.

This new class defines one property called `HasTimeZone`, which is a scalar `logical` indicating whether the validator expects all `datetime`s to be zoned.

Additionally, `DatetimeValidator` overrides the `validateElement` method. It first calls `ClassTypeValidator`'s implementation of `validateElement` to verify the input element is a `datetime`. If so, it then confirms that the input `datetime`'s `TimeZone` property is empty or nonempty, based on the validator's `HasTimeZone` property value.
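
The zoned/unzoned check described above can be sketched as a predicate. This is an illustration under assumptions: `hasTimeZone` and `isValidElement` are hypothetical names, and the method described above presumably raises an error on a mismatch rather than returning `false`.

```matlab
% Assumed: the first cell element was zoned, so every other
% element must be zoned as well.
hasTimeZone = true;

% An element is valid if it is a datetime and its TimeZone is
% nonempty exactly when hasTimeZone is true. The short-circuit &&
% avoids reading TimeZone on non-datetime inputs.
isValidElement = @(element) isa(element, "datetime") && ...
    (isempty(element.TimeZone) ~= hasTimeZone);

zoned = datetime(2023, 10, 31, TimeZone="America/New_York");
unzoned = datetime(2023, 11, 1);

zonedIsValid = isValidElement(zoned);       % true
unzonedIsValid = isValidElement(unzoned);   % false: TimeZone is empty
numericIsValid = isValidElement(1);         % false: not a datetime
```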

### Are these changes tested?

Yes, I added a new test class called `tDatetimeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. #38417 
2. #38354 
* Closes: #38420

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…alidates a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s (apache#38533)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…alidates a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s (apache#38533)
