-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional checks to InvalidTargetDataCheck to handle invalid target data types #929
Conversation
Codecov Report
@@ Coverage Diff @@
## main #929 +/- ##
=======================================
Coverage 99.85% 99.86%
=======================================
Files 170 170
Lines 8565 8593 +28
=======================================
+ Hits 8553 8581 +28
Misses 12 12
Continue to review full report at Codecov.
|
return [DataCheckError("{} row(s) ({}%) of target values are null".format(null_rows.sum(), null_rows.mean() * 100), self.name)] | ||
if null_rows.any(): | ||
messages.append(DataCheckError("{} row(s) ({}%) of target values are null".format(null_rows.sum(), null_rows.mean() * 100), self.name)) | ||
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'bool'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're also defining numerics
in PR #917. Might be best to define once in a common location and then import it wherever its needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! And same with having supported_data_types
somewhere common.
value_counts = y.value_counts() | ||
if len(value_counts) == 2 and y.dtype in numerics: | ||
unique_values = value_counts.index.tolist() | ||
if set(unique_values) != set([0, 1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if unique_values
has float dtype? Does this still work properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! But added test_invalid_target_data_check_numeric_binary_classification_valid_float
so we don't have to wonder 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 🎆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin Good to merge but please remember to consolidate the definition of numerics
in #932 !
Closes #916