-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update data check message to include a "code" and "details" fields #1451
Conversation
…o 1430_data_check_jsons
Codecov Report
@@ Coverage Diff @@
## main #1451 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 222 223 +1
Lines 14891 14930 +39
=========================================
+ Hits 14884 14923 +39
Misses 7 7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 ! 😁
No blocking comments
if set(unique_values) != set([0, 1]): | ||
messages["errors"].append(DataCheckError("Numerical binary classification target classes must be [0, 1], got [{}] instead".format(", ".join([str(val) for val in unique_values])), self.name).to_dict()) | ||
messages["warnings"].append(DataCheckWarning(message="Numerical binary classification target classes must be [0, 1], got [{}] instead".format(", ".join([str(val) for val in unique_values])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about it until now but this check could potentially blow up if a large regression dataset is used. The message would get super long because we include all unique values. Would be a good thing to file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsherry Right, but that's why we have problem_type
passed in for the initialization of InvalidTargetDataCheck
, right? Or are you concerned if the user accidentally passes in the wrong problem type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's what I was thinking, if someone accidentally ran binary classification with a regression target. One quick way around this would be to show the number of uniques and the counts of the 100 most frequent uniques instead of the counts of all the uniques!
No need to add this to this PR heh. But this would be great to file as a performance enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it! Filed #1460 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @angela97lin ! I think this is great. My only comment is that enums are not actually json serializable so we should convert to it to string in validate
!
Oh shoot, that's a super great point @freddyaboulton , that python enums aren't JSON serializable. Yes, let's include the stringified version of the enum. (@angela97lin ) |
@freddyaboulton @dsherry Ah, really good point--Will update this to include the stringified versions instead. Thanks for catching this! |
Closes #1430
Also addresses #1422 by converting the check for binary classification targets != {0, 1} to return a warning instead of an error.