Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes bug where SimpleImputer cannot handle dropped columns #846

Merged
merged 6 commits into from
Jun 13, 2020

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Jun 11, 2020

Addresses #514 by storing which columns have all nan and thus will be dropped and manually excluding them; work still needs to be done to support the per-column imputer.

@angela97lin angela97lin self-assigned this Jun 11, 2020
@codecov
Copy link

codecov bot commented Jun 12, 2020

Codecov Report

Merging #846 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #846   +/-   ##
=======================================
  Coverage   99.69%   99.69%           
=======================================
  Files         195      195           
  Lines        7745     7774   +29     
=======================================
+ Hits         7721     7750   +29     
  Misses         24       24           
Impacted Files Coverage Δ
...components/transformers/imputers/simple_imputer.py 100.00% <100.00%> (ø)
...valml/tests/component_tests/test_simple_imputer.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2ea9a2...7145abb. Read the comment docs.

@angela97lin angela97lin requested a review from dsherry June 12, 2020 17:43
@angela97lin angela97lin marked this pull request as ready for review June 12, 2020 17:43
@dsherry
Copy link
Contributor

dsherry commented Jun 12, 2020

@angela97lin do we need a similar change for the new per-column imputer @jeremyliweishih added?

@angela97lin
Copy link
Contributor Author

@dsherry Ah probably! I'll take a look :)

super().__init__(parameters=parameters,
component_obj=imputer,
random_state=random_state)

def fit(self, X, y=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will pick up the docstring from the subclass, right? Our pattern has been to docstring everything, even if its inherited. I suggest we continue that until we decide to change it globally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, picks it up from Transformer! Seems like we haven't really established a proper pattern--some places have docstrings, other places don't--but doesn't hurt to be explicit!

@@ -35,23 +43,16 @@ def transform(self, X, y=None):
Returns:
pd.DataFrame: Transformed X
"""
if self._all_null_cols is None:
raise RuntimeError("Must fit transformer before calling transform!")
X_t = self._component_obj.transform(X)
if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a case when this condition isn't true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I guess just when X wasn't a DataFrame to begin with, so this could be simplified to just check isinstance(X, pd.DataFrame), but there would be no need to do that if X_t was a DataFrame already, hence both.

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test coverage!! Left comments, approved pending resolution

@dsherry
Copy link
Contributor

dsherry commented Jun 12, 2020

@dsherry Ah probably! I'll take a look :)

@angela97lin cool thanks. I suggest you merge this, keep #514 open and handle the per-column imputer in a separate PR

@angela97lin angela97lin merged commit cc87a99 into master Jun 13, 2020
@angela97lin angela97lin mentioned this pull request Jun 30, 2020
@dsherry dsherry deleted the 514_simpleimputer branch October 29, 2020 23:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants