Fix BooleanNullable SimpleImputer bug #3959

eccabay · 2023-01-26T15:12:08Z

Fixes a bug where an all-null BooleanNullable column breaks the simple imputer during transform when the imputer was fit on nullable data that contains a non-null value.

codecov · 2023-01-26T15:21:10Z

Codecov Report

Merging #3959 (82d29ef) into main (7e035b0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3959     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        347     347             
  Lines      36768   36776      +8     
=======================================
+ Hits       36647   36656      +9     
+ Misses       121     120      -1

Impacted Files	Coverage Δ
...components/transformers/imputers/simple_imputer.py	`100.0% <100.0%> (+1.7%)`	⬆️
evalml/pipelines/components/utils.py	`96.2% <100.0%> (-<0.1%)`	⬇️
...valml/tests/component_tests/test_simple_imputer.py	`100.0% <100.0%> (ø)`
evalml/tests/component_tests/test_utils.py	`99.1% <100.0%> (ø)`


chukarsten · 2023-01-26T17:18:18Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

+        self._boolean_cols = X.ww.schema._filter_cols(
+            include=["Boolean", "BooleanNullable"],
+        )


I might be being pedantic, but is calling a private function on the schema the preferred way to do this? I thought there was a `select` function on the `ww` accessor.

Certainly. I stole this line directly from `set_boolean_columns_to_integer`, but I can switch both places over to use `select` instead.

chukarsten · 2023-01-26T17:44:20Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

@@ -124,11 +134,9 @@ def transform(self, X, y=None):

        new_schema = original_schema.get_subset_schema(X_t.columns)

-        # TODO: Fix this after WW adds inference of object type booleans to BooleanNullable


chukarsten · 2023-01-26T17:50:38Z

evalml/tests/component_tests/test_simple_imputer.py

-    if logical_type in [NaturalLanguage, Categorical]:
-        impute_strategy_to_use = "most_frequent"
+    if logical_type in [NaturalLanguage, Categorical, Boolean, BooleanNullable]:
+        impute_strategy = "most_frequent"


Not a huge fan of how this was originally done: the test iterates impute_strategy over only a subset of the available strategies and then overrides it inside the test body. But that's not your problem... we might want to think about rewriting this.
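One possible shape for such a rewrite, offered only as a sketch: decide up front which strategies apply to each logical type, so the test never reassigns its own loop variable. The names `MOST_FREQUENT_ONLY`, `ALL_STRATEGIES`, `valid_strategies` and `build_cases` are hypothetical, and the groupings are assumptions based on the diff above rather than evalml's actual definitions.

```python
# Logical types are plain strings here so the sketch needs no Woodwork import.
# Both groupings below are assumptions for illustration, not evalml's definitions.
MOST_FREQUENT_ONLY = {"NaturalLanguage", "Categorical", "Boolean", "BooleanNullable"}
ALL_STRATEGIES = ("mean", "median", "most_frequent")


def valid_strategies(logical_type):
    """Return the impute strategies worth testing for one logical type."""
    if logical_type in MOST_FREQUENT_ONLY:
        return ["most_frequent"]
    return list(ALL_STRATEGIES)


def build_cases(logical_types):
    """Expand logical types into explicit (logical_type, strategy) pairs."""
    return [(lt, s) for lt in logical_types for s in valid_strategies(lt)]


cases = build_cases(["Double", "BooleanNullable"])
```

A list like `cases` could be passed to `pytest.mark.parametrize`, which would make each type and strategy combination show up as its own test ID.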

chukarsten · 2023-01-26T17:52:29Z

evalml/tests/component_tests/test_simple_imputer.py

+    X_train = pd.DataFrame({"a": [pd.NA] * 20 + [1.0] + [pd.NA] * 20})
+    y = pd.Series(range(len(X_train)))
+    X_test = pd.DataFrame({"a": [pd.NA] * 10})


Times like these, I think it's helpful to docstring the test to get at what exactly you're testing here. The test name doesn't seem to match what's going on. The test case here is that your training set is sparse and your test set happens to not be fully representative of all the classes available in X, right?

Basically, yeah. It's really just testing having an all-null test set when the training had non-null values. I'll update the test name and add a docstring.
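For illustration, a renamed and docstringed test along those lines might look like the sketch below. `MostFrequentImputer` is a hypothetical pandas-only stand-in so the example runs on its own; it is not evalml's `SimpleImputer`, and the test name is an assumption rather than the one that was merged.

```python
import pandas as pd


class MostFrequentImputer:
    """Hypothetical stand-in for an imputer; not evalml's SimpleImputer."""

    def fit(self, X, y=None):
        # Remember the most frequent non-null value of each column.
        self._fill = {}
        for col in X.columns:
            counts = X[col].dropna().value_counts()
            if not counts.empty:
                self._fill[col] = counts.index[0]
        return self

    def transform(self, X, y=None):
        # Columns with no non-null training values are left untouched.
        if not self._fill:
            return X.copy()
        return X.fillna(value=self._fill)


def test_all_null_test_set_after_fit_on_sparse_data():
    """An all-null test column is imputed when the training column
    had at least one non-null value."""
    X_train = pd.DataFrame({"a": [pd.NA] * 20 + [1.0] + [pd.NA] * 20})
    X_test = pd.DataFrame({"a": [pd.NA] * 10})

    X_t = MostFrequentImputer().fit(X_train).transform(X_test)

    assert X_t["a"].isna().sum() == 0
    assert (X_t["a"] == 1.0).all()


test_all_null_test_set_after_fit_on_sparse_data()
```

The docstring states the scenario in one sentence, so a reader does not have to infer it from the shape of the fixture data.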

chukarsten

Pending select change, looks great!

jeremyliweishih

Agree with @chukarsten's comments but otherwise LGTM

eccabay added 4 commits January 26, 2023 09:27

Add test for boolean nullable in simple imputer

f0bd24a

Update simple imputer to convert bool to int instead of categorical

55a2f21

Update tests to reflect change

e68d4ad

Update release notes

c51995f

Avoid catboost issue

034fa68

chukarsten reviewed Jan 26, 2023

Merge branch 'main' into booleannullable-fix

9a314eb

eccabay marked this pull request as ready for review January 26, 2023 17:33

auto-assign bot assigned eccabay Jan 26, 2023

eccabay requested review from chukarsten, bchen1116, christopherbunn, Cmancuso, jeremyliweishih and tamargrey January 26, 2023 17:34

chukarsten reviewed Jan 26, 2023

chukarsten approved these changes Jan 26, 2023

jeremyliweishih approved these changes Jan 26, 2023

chukarsten and others added 4 commits January 26, 2023 13:20

Update evalml/tests/component_tests/test_simple_imputer.py

f5ff980

Co-authored-by: Jeremy Shih <jeremyliweishih@gmail.com>

PR comments

0f02f0e

Merge branch 'booleannullable-fix' of github.com:alteryx/evalml into …

d8ccabc

…booleannullable-fix

Remove pdb

82d29ef

eccabay enabled auto-merge (squash) January 26, 2023 18:28

eccabay merged commit 10a4980 into main Jan 26, 2023

eccabay deleted the booleannullable-fix branch January 26, 2023 19:01

chukarsten mentioned this pull request Jan 26, 2023

Release v0.66.1 #3961

Merged

tamargrey mentioned this pull request Feb 15, 2023

Remove Nullable type logic from Imputer Components and Refactor #3999

Closed

tamargrey mentioned this pull request Mar 1, 2023

Refactor imputer components to remove unnecessary logic #4038

Merged
