Fixes bug where SimpleImputer cannot handle dropped columns #846
Codecov Report
@@           Coverage Diff           @@
##           master    #846    +/-   ##
=======================================
  Coverage   99.69%   99.69%
=======================================
  Files         195      195
  Lines        7745     7774    +29
=======================================
+ Hits         7721     7750    +29
  Misses         24       24
@angela97lin do we need a similar change for the new per-column imputer @jeremyliweishih added?
@dsherry Ah probably! I'll take a look :)
        super().__init__(parameters=parameters,
                         component_obj=imputer,
                         random_state=random_state)

    def fit(self, X, y=None):
This will pick up the docstring from the superclass, right? Our pattern has been to docstring everything, even if it's inherited. I suggest we continue that until we decide to change it globally.
Yup, picks it up from Transformer! Seems like we haven't really established a proper pattern (some places have docstrings, other places don't), but it doesn't hurt to be explicit!
@@ -35,23 +43,16 @@ def transform(self, X, y=None):
        Returns:
            pd.DataFrame: Transformed X
        """
        if self._all_null_cols is None:
            raise RuntimeError("Must fit transformer before calling transform!")
        X_t = self._component_obj.transform(X)
        if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
Is there ever a case when this condition isn't true?
Ya, I guess just when X wasn't a DataFrame to begin with, so this could be simplified to just check isinstance(X, pd.DataFrame), but there would be no need to convert if X_t was a DataFrame already, hence both checks.
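To make the cases in that condition concrete, here is a minimal sketch. The helper name restore_frame is hypothetical and not part of this PR, and it assumes no columns were dropped, so the input's column labels still fit the output.

```python
import numpy as np
import pandas as pd


def restore_frame(X, X_t):
    """Hypothetical helper: re-wrap X_t only when X was a DataFrame and X_t is not."""
    if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
        # Assumption: X_t has the same number of columns as X (nothing dropped).
        X_t = pd.DataFrame(X_t, columns=X.columns, index=X.index)
    return X_t


frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
array = frame.to_numpy()

wrapped = restore_frame(frame, array)    # DataFrame in, ndarray out: re-wrapped
untouched = restore_frame(array, array)  # ndarray in: left as an ndarray
same = restore_frame(frame, frame)       # already a DataFrame: returned as-is
```

The second isinstance check only matters when the caller passed something other than a DataFrame; the first avoids a needless copy when the wrapped component already returned one.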
Great test coverage!! Left comments, approved pending resolution
@angela97lin cool thanks. I suggest you merge this, keep #514 open and handle the per-column imputer in a separate PR
Addresses #514 by storing which columns are all NaN, and will therefore be dropped, and manually excluding them; work remains to support the per-column imputer.
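A rough sketch of the approach described above, not the code in this PR. The class name DropAwareImputer is hypothetical, and the sketch assumes scikit-learn's SimpleImputer with a non-constant strategy and default settings, which omits columns that had no observed values at fit time from its output.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


class DropAwareImputer:
    """Hypothetical wrapper that tracks columns the imputer will drop."""

    def __init__(self, strategy="mean"):
        self._imputer = SimpleImputer(strategy=strategy)
        self._all_null_cols = None

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        # Store the columns that are entirely NaN; these get dropped.
        self._all_null_cols = set(X.columns[X.isnull().all()])
        self._imputer.fit(X, y)
        return self

    def transform(self, X, y=None):
        if self._all_null_cols is None:
            raise RuntimeError("Must fit transformer before calling transform!")
        X = pd.DataFrame(X)
        X_t = self._imputer.transform(X)
        # Manually exclude the dropped columns so the labels match the output.
        kept = [c for c in X.columns if c not in self._all_null_cols]
        return pd.DataFrame(X_t, columns=kept, index=X.index)


X = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],  # all NaN, so the imputer drops it
    "c": [4.0, 5.0, np.nan],
})
out = DropAwareImputer().fit(X).transform(X)
```

Without the stored column set, relabeling the output with X.columns would fail, because the imputer returns two columns here while X has three.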