Codecov Report
@@           Coverage Diff           @@
##            main   #3959   +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        347     347
  Lines      36768   36776     +8
=======================================
+ Hits       36647   36656     +9
+ Misses       121     120     -1
self._boolean_cols = X.ww.schema._filter_cols(
    include=["Boolean", "BooleanNullable"],
)
I might be being pedantic, but is the preferred way to call a private function on the schema? I thought there was a select function on the ww accessor?
Certainly. I stole this line directly from set_boolean_columns_to_integer, but I can switch both places over to use select instead.
new_schema = original_schema.get_subset_schema(X_t.columns)
# TODO: Fix this after WW adds inference of object type booleans to BooleanNullable
if logical_type in [NaturalLanguage, Categorical]:
    impute_strategy_to_use = "most_frequent"
if logical_type in [NaturalLanguage, Categorical, Boolean, BooleanNullable]:
    impute_strategy = "most_frequent"
Not a huge fan of how this was originally done - with impute_strategy iterating over a subset of the total impute_strategy available and changing it in the test. But that's not your problem...we might want to think about rewriting this.
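One way to avoid reassigning the parametrized strategy inside the test body is to derive the expected strategy in a small helper. The following is only a sketch: `effective_strategy` and `MOST_FREQUENT_ONLY_TYPES` are hypothetical names, not part of EvalML, and logical types are represented by their names as plain strings rather than Woodwork classes.

```python
# Hypothetical helper; logical types are plain strings here, not Woodwork classes.
MOST_FREQUENT_ONLY_TYPES = {
    "NaturalLanguage",
    "Categorical",
    "Boolean",
    "BooleanNullable",
}


def effective_strategy(logical_type: str, requested: str) -> str:
    """Return the impute strategy a test should expect for a column type."""
    # Assumption mirroring the diff under review: these types are always
    # imputed with "most_frequent", whatever strategy was requested.
    if logical_type in MOST_FREQUENT_ONLY_TYPES:
        return "most_frequent"
    return requested
```

A test could then compute its expectation once, for example `expected = effective_strategy("BooleanNullable", "mean")`, and leave the parametrized value untouched.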
X_train = pd.DataFrame({"a": [pd.NA] * 20 + [1.0] + [pd.NA] * 20})
y = pd.Series(range(len(X_train)))
X_test = pd.DataFrame({"a": [pd.NA] * 10})
Times like these, I think it's helpful to docstring the test to get at what exactly you're testing here. The test name doesn't seem to match what's going on. The test case here is that your training set is sparse and your test set happens to not be fully representative of all the classes available in X, right?
Basically, yeah. It's really just testing an all-null test set when the training set had non-null values. I'll update the test name and add a docstring.
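For illustration, the scenario can be sketched without pandas. This is a pure-Python toy, not EvalML's simple imputer: `MostFrequentImputer` and the test name are hypothetical, and `None` stands in for `pd.NA`. The point it shows is that the fill value is learned during fit, so a column that is entirely null at transform time can still be filled.

```python
from collections import Counter
from typing import List, Optional


class MostFrequentImputer:
    """Toy single-column imputer; None marks a missing value."""

    def __init__(self) -> None:
        self.fill_value: Optional[float] = None

    def fit(self, column: List[Optional[float]]) -> "MostFrequentImputer":
        # Only observed (non-null) values take part in the frequency count.
        observed = [v for v in column if v is not None]
        if not observed:
            raise ValueError("cannot fit on an all-null column")
        self.fill_value = Counter(observed).most_common(1)[0][0]
        return self

    def transform(self, column: List[Optional[float]]) -> List[Optional[float]]:
        # Replace every missing entry with the value learned during fit.
        return [self.fill_value if v is None else v for v in column]


def test_all_null_test_column_after_sparse_fit():
    """Transform fills an all-null column when fit saw one non-null value."""
    x_train = [None] * 20 + [1.0] + [None] * 20
    x_test = [None] * 10
    imputer = MostFrequentImputer().fit(x_train)
    assert imputer.transform(x_test) == [1.0] * 10
```

The docstring states the case under test in one sentence, which is the kind of clarification requested in the review.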
chukarsten left a comment
Pending select change, looks great!
jeremyliweishih left a comment
Agree with @chukarsten's comments but otherwise LGTM
Co-authored-by: Jeremy Shih <jeremyliweishih@gmail.com>
…booleannullable-fix
Fixes a bug where an all-null BooleanNullable column breaks the simple imputer during transform when the imputer was fit on nullable data containing a non-null value.