-
Notifications
You must be signed in to change notification settings - Fork 89
Support string / categorical targets for binary and multiclass problems #932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! The impl looks ready to go. I left a comment about a line in ClassificationPipeline.predict_proba
which I'm not sure I follow yet, and a comment about docstrings.
Its great that we have coverage for binary classification estimators’ class order, thanks for adding.
Here's a few test cases and potential changes I'd like us to address before merging:
- What happens when the target passed to
fit
andscore
disagree, and there are new unique values passed toscore
, or we passedstr
tofit
and nowint
toscore
? We should raise an error explaining the problem. I think what would currently happen is a stack trace out ofLabelEncoder
, right? - Try calling binary and multiclass pipeline
fit
,predict
,predict_proba
andscore
with various types of target (int, float, bool, category, string), and test that either the encode/decode still functions properly, or we get a clear error message. - If we’re going to modify
predict_proba
col names in this PR, as it appears we are, we should have coverage for that change too. My suggestion would be to push it to Include target column name and class labels in predict/predict_proba output #645 , unless I'm misunderstanding.
Happy to talk more about this if you'd like!
evalml/tests/pipeline_tests/classification_pipeline_tests/test_binary_classification.py
Show resolved
Hide resolved
Yup, we get a pretty clear error out of
I already had what's now
On second thought, I'm going to update the tests and address this in #645. I know you added a potential code snippet, but I think rather than introduce even more code to fix/patch something, might as well punt it after all! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin the tests you added look great! I left one suggested for an additional check. Well done on this, there were lots of details to chase down.
It occurs to me, would be nice to have a target type test for regression too, mirroring your classification test. That one would just expect that categorical and str types fail. Not required for merge.
evalml/tests/pipeline_tests/classification_pipeline_tests/test_classification.py
Show resolved
Hide resolved
@angela97lin sounds good! |
Continuation of #917, closes #215