-
Notifications
You must be signed in to change notification settings - Fork 92
Add ability to only impute specific columns #3123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3123 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 317 317
Lines 30685 30711 +26
=======================================
+ Hits 30581 30607 +26
Misses 104 104
Continue to review full report at Codecov.
|
jeremyliweishih
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!
bchen1116
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
eccabay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it, such a simple yet elegant change! Just left a few small questions.
| self.imputers = dict() | ||
| for column in X.columns: | ||
|
|
||
| columns_to_impute = X.columns if self.impute_all else self.impute_strategies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work correctly even though X.columns is a list and self.impute_strategies is a dict? Why not use self.impute_strategies.keys()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I think here it works because we just use it to iterate, and iterating over a dictionary will use its keys! But happy to update this to be more clear / in case we update this logic :)
| If False, only columns specified as keys in the `impute_strategies` dictionary are imputed. If False and `impute_strategies` is None, | ||
| no columns will be imputed. Defaults to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth it to warn the user in the case that the imputer does not impute anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a warning during fit time for this case :)
| Valid values include "mean", "median", "most_frequent", "constant" for numerical data, | ||
| and "most_frequent", "constant" for object data types. Defaults to "most_frequent". | ||
| impute_all (bool): Whether or not to impute all columns or just the columns that are specified in `impute_strategies`. If True, | ||
| all columns will be imputed either using the strategy specified in `impute_strategies` or using the `default_impute_strategy`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean by "If True, all columns will be imputed either using the strategy specified in impute_strategies". Aren't the impute_strategies defined on a per-column basis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten Yup, impute_strategies is a dictionary that looks like { "col_name": {"impute_strategy": "mean", "fill_value": 0}, "col_name2": {"impute_strategy": "mean", "fill_value": 0} ... }, but if they're not specified in the dictionary then we use the default_impute_strategy for that column. Does that make sense / do you think this could be clarified more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the wording a bit!
Closes #3039