Skip to content

Remove impute_all parameter from PerColumnImputer#3267

Merged
eccabay merged 6 commits intomainfrom
3241_percolumn-datetime
Jan 21, 2022
Merged

Remove impute_all parameter from PerColumnImputer#3267
eccabay merged 6 commits intomainfrom
3241_percolumn-datetime

Conversation

@eccabay
Copy link
Copy Markdown
Contributor

@eccabay eccabay commented Jan 20, 2022

Closes #3241

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 20, 2022

Codecov Report

Merging #3267 (b2dac9b) into main (b7e7552) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3267     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        326     326             
  Lines      31496   31497      +1     
=======================================
+ Hits       31405   31406      +1     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/tests/component_tests/test_components.py 99.3% <ø> (ø)
...onents/transformers/imputers/per_column_imputer.py 100.0% <100.0%> (ø)
...l/tests/component_tests/test_per_column_imputer.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7e7552...b2dac9b. Read the comment docs.

@angela97lin angela97lin self-requested a review January 21, 2022 03:14
Copy link
Copy Markdown
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking on #3241 (comment)! Let's discuss, I think I'm confused and would like to clear up some things before moving forward if that's okay with you 😅

Copy link
Copy Markdown
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks @eccabay

Comment thread docs/source/release_notes.rst
Comment thread evalml/tests/component_tests/test_per_column_imputer.py Outdated
"column_with_nan_included": {"impute_strategy": "most_frequent"},
}
transformer = PerColumnImputer(impute_strategies=strategies, impute_all=False)
transformer = PerColumnImputer(impute_strategies=strategies)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question here: why does this test return three columns? I guess the overall question I have is, what is the behavior of this component for columns not specified in impute_strategies?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new behavior for columns not specified in impute_strategies is to completely leave them alone. Any NaN values will remain, and unspecified columns will not be dropped even if they are fully nan.

In this test, you can see that of the two NaN columns, the one included in the impute strategies was dropped but the one that wasn't, remains. Same with the some-nan columns, the one that was included was imputed but the one that wasn't still has nans after transformation.

Copy link
Copy Markdown
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @eccabay! Agreed that we should add a comment that the behavior of this component has changed but otherwise 🚢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PerColumnImputer fails if the input dataframe has datetime columns

4 participants