New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add problematic target data check #814
Conversation
Codecov Report
@@ Coverage Diff @@
## master #814 +/- ##
=======================================
Coverage 99.67% 99.67%
=======================================
Files 186 188 +2
Lines 7295 7338 +43
=======================================
+ Hits 7271 7314 +43
Misses 24 24
Continue to review full report at Codecov.
|
y = pd.Series(y) | ||
null_rows = y.isnull() | ||
error_msg = "Row '{}' contains a null value" | ||
return [DataCheckError(error_msg.format(row_index), self.name) for row_index, row_value in null_rows.items() if row_value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need an error for each row. Let's just return one error saying the target contains a missing value, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we put the count and % of the null values in the error message? 1 row vs 50% of rows is a very different error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what i was thinking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff. I think we should return a single error though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there, left another comment about what info is included in the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! 🚢
Closes #710 by adding
InvalidTargetsDataCheck
, appending it to DefaultDataChecks run by AutoMLInvalidTargetsDataCheck
currently only checks if there are any NaN/None values in the target labels.