[MATLAB] Add a public Valid property to the MATLAB arrow.array.<Array> classes to query Null values (i.e. validity bitmap support) #35598

Closed
kevingurney opened this issue May 15, 2023 · 2 comments

Comments

@kevingurney
Member

Describe the enhancement requested

Currently, the arrow.array.<Array> classes do not support querying the Null values (i.e. validity bitmap) on an Arrow array. Support for encoding Null values is an important part of the Arrow memory format, so the MATLAB Interface to Arrow should support it.

The MATLAB interface will likely need several different APIs to support Null values robustly. However, to focus on incremental delivery, we can start by adding a public Valid property to the arrow.array.<Array> classes, which would return a logical array indicating which elements of the given array are valid (i.e. not null).

Component(s)

MATLAB

@kevingurney
Member Author

take

kou added a commit that referenced this issue May 28, 2023
…row.array.<Array>` classes to query Null values (i.e. validity bitmap support) (#35655)

### Rationale for this change

Currently, the `arrow.array.<Array>` classes do not support querying the Null values (i.e. validity bitmap) on an Arrow array. Support for encoding Null values is an important part of the Arrow memory format, so the MATLAB Interface to Arrow should support it.

The MATLAB interface will likely need several different APIs to support Null values robustly. However, to focus on incremental delivery, we can start by adding a public `Valid` property to the `arrow.array.<Array>` classes, which would return a `logical` array indicating which elements of the given array are valid (i.e. not null).

### What changes are included in this PR?

1. Added a new public property `Valid` to the `arrow.array.Array` superclass.
2. Implemented basic null value handling for `arrow.array.Float64Array` (i.e. treat `NaN` values in the input MATLAB array as null values in the corresponding `arrow.array.Float64Array`).
3. Implemented null value substitution (i.e. substitute null values with `NaN`) for `Float64Array` in the `toMATLAB` and `double` conversion methods.

Example of creating an `arrow.array.Float64Array` from a MATLAB `double` array containing `NaN` values:

```matlab
>> matlabArray = [1, 2, NaN, 4, NaN]'

matlabArray =

     1
     2
   NaN
     4
   NaN

>> arrowArray = arrow.array.Float64Array(matlabArray)

arrowArray = 

[
  1,
  2,
  null,
  4,
  null
]

>> arrowArray.Valid

ans =

  5×1 logical array

   1
   1
   0
   1
   0

>> all(~isnan(matlabArray) == arrowArray.Valid)

ans =

  logical

   1
```
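
The `NaN`-to-null convention from items 2 and 3 above can also be modeled in plain MATLAB, without the Arrow interface. The following is only a sketch of the idea, not the code added in this pull request; the variable names `valid`, `values`, and `roundTrip` are made up for illustration, and using `0` as the placeholder behind null slots is an assumption.

```matlab
% Plain-MATLAB model of the NaN <-> null convention (illustrative only).
matlabArray = [1, 2, NaN, 4, NaN]';

% Step 1: derive validity from the input (true = valid, false = null).
valid = ~isnan(matlabArray);

% Step 2: model the stored values. The value behind a null slot is not
% meaningful; 0 is used here purely as a stand-in (an assumption).
values = matlabArray;
values(~valid) = 0;

% Step 3: convert back, substituting NaN wherever the element is null.
roundTrip = values;
roundTrip(~valid) = NaN;

% isequaln treats NaN as equal to NaN, unlike isequal.
assert(isequaln(roundTrip, matlabArray));
```

Because the validity information is kept separately from the values, the round trip does not depend on what is stored behind the null slots.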

### Are these changes tested?

Yes, we have added the following test points for the `Valid` property of `arrow.array.Float64Array`:

1. `ValidBasic`
2. `ValidNoNulls`
3. `ValidAllNulls`
4. `ValidEmpty`

### Are there any user-facing changes?

Yes.

There is now a public property `Valid` on the `arrow.array.Float64Array` class which is a MATLAB `logical` array encoding the null values in the underlying Arrow array, where `true` indicates an element is valid (i.e. not null) and `false` indicates that an element is invalid (i.e. null).

### Future Directions

1. Implement more null value related methods like `isvalid`, `isnull`, `packagedValidityBitmap`, etc.
2. Add null value (i.e. `Valid` property) support to the rest of the `arrow.array.Array` subclasses.

### Notes

1. Thank you to @ sgilmore10 for your help with this pull request!

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com>
Co-authored-by: Kevin Gurney <kevin.p.gurney@gmail.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kou added this to the 13.0.0 milestone May 28, 2023
@kou
Member

kou commented May 28, 2023

Issue resolved by pull request 35655
#35655

kou closed this as completed May 28, 2023
kou added a commit that referenced this issue Jun 13, 2023
### Rationale for this change

Now that the MATLAB interface supports validity bitmaps and bit packing/unpacking (#35598), we can add support for a `BooleanArray` class. This is a follow-up to the work on the `NumericArray` classes.

`BooleanArray` maps to the MATLAB [`logical`](https://www.mathworks.com/help/matlab/logical-operations.html) type when calling `toMATLAB`.
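
For readers unfamiliar with the bit packing mentioned above: the Arrow format stores validity (and boolean values) as a bitmap, one bit per element, with least-significant-bit numbering inside each byte and a set bit meaning "valid". Below is a minimal plain-MATLAB sketch of packing and unpacking a `logical` vector in that layout. It is not the implementation used by the MATLAB interface, and the variable names are hypothetical.

```matlab
% Illustrative bit packing of a logical vector, least-significant bit first.
valid = logical([1 1 0 1 0])';

% Pad to a whole number of bytes; padding bits are left as 0 here.
numBytes = ceil(numel(valid) / 8);
padded = false(numBytes * 8, 1);
padded(1:numel(valid)) = valid;

% Pack: column k of "bits" holds the 8 bits of byte k, lowest bit first.
bits = reshape(padded, 8, numBytes);
packed = uint8((2 .^ (0:7)) * double(bits));

% Unpack: bitget uses 1-based positions, where 1 is the least-significant bit.
unpacked = false(numel(valid), 1);
for i = 1:numel(valid)
    byteIndex = floor((i - 1) / 8) + 1;
    bitIndex = mod(i - 1, 8) + 1;
    unpacked(i) = logical(bitget(packed(byteIndex), bitIndex));
end

assert(isequal(unpacked, valid));
```

For the five-element input above, `packed` is a single byte with value 11 (binary `00001011`), since elements 1, 2, and 4 are set.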

### What changes are included in this PR?

1. Added a new `arrow.array.BooleanArray` class that can be converted to/from a MATLAB `logical` array.

**Example**:

```matlab
>> matlabArray = logical([true, false, true])'

matlabArray =

  3x1 logical array

   1
   0
   1

>> arrowArray = arrow.array.BooleanArray(matlabArray)

arrowArray = 

[
  true,
  false,
  true
]

>> convertedArrowArray = toMATLAB(arrowArray)

convertedArrowArray =

  3x1 logical array

   1
   0
   1

```

### Are these changes tested?

Yes.

1. Added a new `tBooleanArray.m` test class which follows the existing pattern for the `NumericArray` test classes.

### Are there any user-facing changes?

Yes.

1. Added a new user-facing `arrow.array.BooleanArray` class.

### Notes

1. Thank you @ sgilmore10 for your help with this pull request!
* Closes: #36040

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Kevin Gurney <kevin.p.gurney@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>