[MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array` #35676
After further consideration, it may make sense to simplify the proposed name-value pairs to only include `InferNulls`. I've updated the issue title and description accordingly.

take
…g null value inference during construction of `arrow.array.Array` (#35827)

### Rationale for this change

This change lets users toggle the automatic null-value detection behavior. By default, values MATLAB considers to be missing (e.g. `NaN` for `double`, `<missing>` for `string`, and `NaT` for `datetime`) will be treated as `null` values. Users can toggle this behavior on and off using the `InferNulls` name-value pair.

**Example**

```matlab
>> matlabArray = [1 NaN 3]'

matlabArray =

     1
   NaN
     3

% Treat NaN as a null value
>> arrowArray1 = arrow.array.Float64Array(matlabArray, InferNulls=true)

arrowArray1 =

[
  1,
  null,
  3
]

% Don't treat NaN as a null value
>> arrowArray2 = arrow.array.Float64Array(matlabArray, InferNulls=false)

arrowArray2 =

[
  1,
  nan,
  3
]
```

We've only added this name-value pair to `arrow.array.Float64Array` for now. We'll add it to the other types in a follow-up changelist.

### What changes are included in this PR?

1. Added the `InferNulls` name-value pair to `arrow.array.Float64Array`.
2. Added a common validation function, `arrow.args.validateTypeAndShape`, to remove duplicate validation code among the numeric classes.
3. Added a function called `arrow.args.parseValidElements` that the `arrow.array.<Type>Array` classes will be able to share for generating the logical mask of valid elements.

### Are these changes tested?

Yes, we added a test point called `InferNulls` to the test class `tFloat64Array.m`.

### Are there any user-facing changes?

Yes, users can now control how `NaN` values are treated when creating an `arrow.array.Float64Array`.

### Future Directions

1. Add a name-value pair to allow users to specify the valid elements themselves.
2. Extend null support to other numeric types.
3. We've been working on adding error-handling support to `mathworks/libmexclass`. We have a prototype to do this using status-like and result-like objects already pushed to a [branch](https://github.com/mathworks/libmexclass/tree/33). Once this branch is merged with the `main` branch of `mathworks/libmexclass`, we'll port it over.

### Notes

Thank you @kevingurney for all the help with this PR!

* Closes: #35676

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Describe the enhancement requested
This is a follow up to the initial null value handling support that was added in #35598.
In order to give clients more flexibility in how null values in MATLAB arrays are detected when constructing an `arrow.array.Array`, it would be helpful to expose more name-value pairs on the `arrow.array.Array` class (and concrete subclasses).

One possible name-value pair for handling null value inference would be `InferNulls`, which is described below.

`InferNulls`

Supported values: `true` (default) | `false`

- `true` - "automatically" detect null values in the input MATLAB array based on the presence of MATLAB type-specific missing values (e.g. `NaN` for `double`, `<missing>` for `string`, `NaT` for `datetime`, etc.).
- `false` - Do not "automatically" detect null values.

Example:

Note: For some MATLAB types (e.g. `int64`) there is no concept of a `missing` value. In this case the value of `InferNulls` won't impact the resulting `arrow.array.Array`.

Component(s)
MATLAB
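
To make the proposed `InferNulls` semantics concrete, here is a minimal sketch in plain MATLAB of how a validity mask could be derived from an input array. It assumes MATLAB's built-in `ismissing` is an acceptable stand-in for "type-specific missing value" detection. The helper name `inferValidElements` is hypothetical and is not part of the Arrow MATLAB interface, and this is not a description of how `arrow.args.parseValidElements` is actually implemented.

```matlab
% Sketch only: inferValidElements is a hypothetical helper, not Arrow API.

% double: NaN is a missing value, so it is flagged invalid when inferring nulls
valid1 = inferValidElements([1 NaN 3]', true);        % [true; false; true]

% Same data with inference turned off: every element is treated as valid
valid2 = inferValidElements([1 NaN 3]', false);       % [true; true; true]

% int64 has no missing value, so InferNulls makes no difference
valid3 = inferValidElements(int64([1 2 3])', true);   % [true; true; true]

function valid = inferValidElements(data, inferNulls)
    % Return a logical mask that is true wherever an element is valid (non-null).
    if inferNulls
        % ismissing flags NaN, <missing>, NaT, etc. For types without a
        % missing value (e.g. int64) it returns false for every element.
        valid = ~ismissing(data);
    else
        % No inference: treat every element as valid.
        valid = true(size(data));
    end
end
```

The third call illustrates the note above: for a type such as `int64`, the mask is all `true` regardless of the value passed for `InferNulls`.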