Handle Categorical Boolean values#3960
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3960 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 347 347
Lines 36776 36790 +14
=======================================
+ Hits 36656 36670 +14
Misses 120 120
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
| if y is None: | ||
| raise ValueError("y cannot be None!") | ||
| y_ww = infer_feature_types(y) | ||
| self.original_typing = str(y_ww.ww.logical_type) |
There was a problem hiding this comment.
I don't know if we want to just put a note into some places to remove this once we fully deprecate typelib.
There was a problem hiding this comment.
@Cmancuso I think we need to keep this functionality as long as woodwork transforms 1/0, yes/no etc. to True/False unless that change was made for typelib
There was a problem hiding this comment.
I thought we weren't seeing this issue in schemaUpdate?
There was a problem hiding this comment.
on the EvalML side we'd still want to output the original form of the target instead of outputting True/False if the target is boolean or boolean inferable.
There was a problem hiding this comment.
Yeah, this will only be an issue for OS users when they use it in this specific scenario. If users pass in a yes/no dataset, they should still receive yes/no predictions, so I think this is needed.
| X = pd.DataFrame({}) | ||
| # binary | ||
| y = pd.Series(["yes", "yes", "no", "yes"]) | ||
| y = ww.init_series(y, logical_type="Categorical") |
There was a problem hiding this comment.
does it matter if we init this series as Boolean or Categorical?
There was a problem hiding this comment.
The behavior is the same, but if the type is passed as categorical, we would expect it to remain categorical rather than bool
There was a problem hiding this comment.
I agree that we should probably parametrize this over both Boolean and `Categorical.
| assert_index_equal(y_with_custom_indices.index, y_transformed.index) | ||
|
|
||
|
|
||
| def test_label_encoder_categorical_boolean_values(): |
There was a problem hiding this comment.
I would repeat the same advice for this test as I would for Becca's - what motivated the adding of this test case? I am not sure the test name as-is reflects the "why" of why this test exists. I think it's worthwhile to add a little context via a docstring to the test function...perhaps link it to a woodwork version upgrade.
| X = pd.DataFrame({}) | ||
| # binary | ||
| y = pd.Series(["yes", "yes", "no", "yes"]) | ||
| y = ww.init_series(y, logical_type="Categorical") |
There was a problem hiding this comment.
I agree that we should probably parametrize this over both Boolean and `Categorical.
Previously, if we passed in a categorical column (ie 'yes', 'no') that gets inferred as boolean through WW's new boolean typing, the
infer_feature_typesline inLabelEncoder.inverseTransformwill transform the mapped column back into boolean.This PR puts up a fix for that.