Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35676: [MATLAB] Add an InferNulls name-value pair for controlling null value inference during construction of arrow.array.Array #35827

Merged
merged 5 commits into from
Jun 3, 2023

Conversation

sgilmore10
Copy link
Member

@sgilmore10 sgilmore10 commented May 30, 2023

Rationale for this change

This change lets users control toggle the automatic null-value detection behavior. By default, values MATLAB considers to be missing (e.g. NaN for double, <missing> for string, and NaT for datetime) will be treated as null values. Users can toggle this behavior on and off using the InferNulls name-value pair.

Example

>> matlabArray = [1 NaN 3]'

matlabArray =

     1
     NaN
     3
      
% Treat NaN as a null value
 >> arrowArray1 = arrow.array.Float64Array(maltabArray, InferNulls=true)

arrowArray1 = 

[
  1,
  null,
  3
]

% Don't treat NaN as a null value 
 >> arrowArray2 = arrow.array.Float64Array(maltabArray, InferNulls=false)
   
arrowArray2 = 

[
  1,
  nan,
  3
]

We've only added this nv-pair to arrow.array.Float64Array for now. We'll add this nv-pair to the other types in a followup changelist.

What changes are included in this PR?

  1. Added InferNulls name-value pair to arrow.array.Float64Array.
  2. Added common validation function arrow.args.validateTypeAndShape to remove duplicate validation code among the numeric classes.
  3. Added a function called arrow.args.parseValidElements that the arrow.array.<Type>Array classes will be able to share for generating the logical mask of valid elements.

Are these changes tested?

Yes, we added a test pointed called InferNulls to the test classtFloat64Array.m.

Are there any user-facing changes?

Yes, users can now control how NaN values are treated when creating an arrow.array.Float64Array.

Future Directions

  1. Add a name-value pair to allow users to specify the valid elements themselves.
  2. Extend null support to other numeric types.
  3. We've been working on adding error-handling support to mathworks/libmexclass. We have a prototype to do this using status-like and result-like objects already pushed to a branch. Once this branch is merged with the main branch of mathworks/libmexclass, we'll port it over.

Notes

Thank you @kevingurney for all the help with this PR!

@github-actions
Copy link

⚠️ GitHub issue #35676 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 30, 2023
2. Move license header to the top of the file
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 30, 2023
@sgilmore10
Copy link
Member Author

I think I've addressed all the current feedback. Let me know if we feel anything else needs to be changed.

Thanks!
Sarah

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit f44f768 into apache:main Jun 3, 2023
8 checks passed
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jun 3, 2023
@ursabot
Copy link

ursabot commented Jun 4, 2023

Benchmark runs are scheduled for baseline = 8878fc8 and contender = f44f768. f44f768 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️1.8% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.98% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.6% ⬆️0.48%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] f44f7685 ec2-t3-xlarge-us-east-2
[Finished] f44f7685 test-mac-arm
[Finished] f44f7685 ursa-i9-9960x
[Finished] f44f7685 ursa-thinkcentre-m75q
[Finished] 8878fc81 ec2-t3-xlarge-us-east-2
[Finished] 8878fc81 test-mac-arm
[Finished] 8878fc81 ursa-i9-9960x
[Finished] 8878fc81 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@sgilmore10 sgilmore10 deleted the GH-35676 branch June 27, 2023 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants