Remove impute_all parameter from PerColumnImputer #3267
Codecov Report
@@ Coverage Diff @@
## main #3267 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 326 326
Lines 31496 31497 +1
=======================================
+ Hits 31405 31406 +1
Misses 91 91
Blocking on #3241 (comment)! Let's discuss. I think I'm confused and would like to clear up a few things before moving forward, if that's okay with you 😅
freddyaboulton left a comment
Looks good to me! Thanks @eccabay
     "column_with_nan_included": {"impute_strategy": "most_frequent"},
 }
-transformer = PerColumnImputer(impute_strategies=strategies, impute_all=False)
+transformer = PerColumnImputer(impute_strategies=strategies)
Question here: why does this test return three columns? More generally, what does this component do with columns that are not specified in impute_strategies?
The new behavior for columns not specified in impute_strategies is to leave them completely alone: any NaN values remain, and unspecified columns are not dropped even if they are entirely NaN.
In this test, of the two all-NaN columns, the one included in impute_strategies was dropped, but the one that wasn't remains. The same goes for the partially NaN columns: the one that was included was imputed, but the one that wasn't still has NaNs after transformation.
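To make the described behavior concrete, here is a minimal stdlib-only sketch of it. This is not evalml's implementation: `per_column_impute` is a hypothetical function, a table is assumed to be a plain dict mapping column names to lists, and the `mean`, `median`, and `constant` strategies are assumptions added alongside the `most_frequent` strategy used in the test. The toy column names are made up and only loosely modeled on the test above.

```python
import math
from collections import Counter
from statistics import mean, median


def _is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def per_column_impute(data, impute_strategies):
    """Hypothetical sketch of per-column imputation without impute_all.

    data: dict mapping column name -> list of values (assumed layout).
    impute_strategies: dict mapping column name -> {"impute_strategy": ..., "fill_value": ...}.
    Only columns named in impute_strategies are touched.
    """
    result = {}
    for column, values in data.items():
        if column not in impute_strategies:
            # Unspecified columns pass through unchanged, NaNs included.
            result[column] = list(values)
            continue
        present = [v for v in values if not _is_missing(v)]
        if not present:
            # In this sketch, a specified column that is entirely NaN is dropped.
            continue
        params = impute_strategies[column]
        strategy = params["impute_strategy"]
        if strategy == "mean":
            fill = mean(present)
        elif strategy == "median":
            fill = median(present)
        elif strategy == "most_frequent":
            fill = Counter(present).most_common(1)[0][0]
        elif strategy == "constant":
            fill = params.get("fill_value", 0)  # assumed default of 0
        else:
            raise ValueError(f"Unknown impute_strategy: {strategy!r}")
        # Replace only the missing entries in specified columns.
        result[column] = [fill if _is_missing(v) else v for v in values]
    return result


# Toy input: two all-NaN columns and two partially NaN columns,
# one of each listed in the strategies.
nan = float("nan")
data = {
    "all_nan_included": [nan, nan, nan],
    "all_nan_excluded": [nan, nan, nan],
    "some_nan_included": ["a", nan, "a"],
    "some_nan_excluded": [1.0, nan, 3.0],
}
strategies = {
    "all_nan_included": {"impute_strategy": "most_frequent"},
    "some_nan_included": {"impute_strategy": "most_frequent"},
}
out = per_column_impute(data, strategies)
print(list(out))  # ['all_nan_excluded', 'some_nan_included', 'some_nan_excluded']
```

Four columns go in and three come out: only the specified all-NaN column is dropped, only the specified partially NaN column is filled, and both unspecified columns keep their NaNs.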
angela97lin left a comment
LGTM, thanks @eccabay! Agreed that we should add a comment noting that this component's behavior has changed, but otherwise 🚢
…nto 3241_percolumn-datetime
Closes #3241