Refactor imputer components to remove unnecessary logic #4038

tamargrey · 2023-02-28T19:01:17Z

Related to #3999 (Note - will not yet close this issue, as this PR does not remove the nullable type logic; it only refactors the unnecessary logic)

Refactors the following imputers

imputer.py
simple_imputer.py
time_series_imputer.py
target_imputer.py
knn_imputer.py

codecov · 2023-02-28T19:08:46Z

Codecov Report

Merging #4038 (68ef36d) into component-specific-nullable-types-handling (80be9be) will decrease coverage by 0.0%.
The diff coverage is 100.0%.

@@                             Coverage Diff                              @@
##           component-specific-nullable-types-handling   #4038     +/-   ##
============================================================================
- Coverage                                        99.7%   99.7%   -0.0%     
============================================================================
  Files                                             349     349             
  Lines                                           37523   37505     -18     
============================================================================
- Hits                                            37405   37387     -18     
  Misses                                            118     118

Impacted Files	Coverage Δ
evalml/pipelines/components/utils.py	`96.3% <ø> (-0.2%)`	⬇️
.../tests/component_tests/test_time_series_imputer.py	`100.0% <ø> (ø)`
evalml/tests/component_tests/test_utils.py	`99.2% <ø> (-0.1%)`	⬇️
evalml/tests/conftest.py	`98.3% <ø> (ø)`
evalml/tests/utils_tests/test_gen_utils.py	`99.5% <ø> (-<0.1%)`	⬇️
...alml/tests/utils_tests/test_nullable_type_utils.py	`100.0% <ø> (ø)`
evalml/utils/gen_utils.py	`99.3% <ø> (-<0.1%)`	⬇️
...elines/components/transformers/imputers/imputer.py	`100.0% <100.0%> (ø)`
...es/components/transformers/imputers/knn_imputer.py	`100.0% <100.0%> (ø)`
...components/transformers/imputers/simple_imputer.py	`100.0% <100.0%> (ø)`
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

tamargrey · 2023-02-28T22:02:44Z

evalml/pipelines/components/transformers/preprocessing/transform_primitive_components.py

-        # "and arrays.StringArray are about the same."
-        features = features.astype(object, copy=False)
-        features.index = X_ww.index
-        features.ww.init(logical_types={col_: "categorical" for col_ in features})


This popped up as a problem in this CI run - by removing is_categorical_actually_boolean usage in the simple imputer, we were no longer recognizing features made from IsFreeEmailDomain as boolean columns and imputing them as such in test_imputer_can_impute_features_generated_from_null_email_url_features.

I think we're good to remove this block of code for two reasons

Freddy added much of that in this PR. We should be good to remove it now that we’re upgraded past sklearn 1.1

The fact that we were converting all columns to categorical logical type at all was originally done because of a change in woodwork inference, but we were also not using woodwork in featuretools at that time (which you can see from the usage of entity in the featuretools calls from that PR), so we no longer have to rely on woodwork inference to get the types of features, which will have woodwork types specified by the primitives used to make them.

While sklearn 1.1 did fix the problem Freddy described, I found that removing this block of code entirely caused an issue in Catboost that I was seeing during #3966 with string categories. Converting to object fixed that problem, so I'm leaving that as is and will open a ticket to remove this once that bug is fixed.

Ticket to remove this workaround: #4051

tamargrey · 2023-02-28T23:21:42Z

evalml/pipelines/components/transformers/imputers/imputer.py

        if len(X_categorical.columns) > 0:
            self._categorical_imputer.fit(X_categorical, y)
            self._categorical_cols = X_categorical.columns

-        X_boolean = X[[col for col in bool_cols if col not in self._all_null_cols]]
+        X_boolean = X.ww[[col for col in bool_cols if col not in self._all_null_cols]]


When using the imputer in pipelines, we weren't fitting the child imputers on woodwork-initialized data.

I'm surprised this hasn't been noticed before, but I stumbled upon it because of these changes. Now that we're back to returning early if all the columns are the bool dtype in the simple imputer, this was causing an issue in a BooleanNullable column that would get re-inferred as Boolean at fit, so we'd return early and then raise an error that the _component_obj wasn't fit at transform when the correct BooleanNullable type was passed in.

tamargrey · 2023-02-28T23:36:36Z

evalml/tests/pipeline_tests/test_component_graph.py

@@ -2737,7 +2744,7 @@ def test_get_component_input_logical_types():
        "cat": Categorical(),
        "numeric": Integer(),
        "EMAIL_ADDRESS_TO_DOMAIN(email)": Categorical(),
-        "IS_FREE_EMAIL_DOMAIN(email)": Categorical(),
+        "IS_FREE_EMAIL_DOMAIN(email)": BooleanNullable(),


The change to not automatically convert all features to Categorical will result in BooleanNullable columns being introduced into X via the EmailFeaturizer after the ReplaceNullableTypes transformer has been used.

Because of that, I feel like this change should wait to be introduced with the rest of the nullable type handling. 😢

tamargrey · 2023-03-01T21:51:12Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

-            [
-                col
-                for col in X.ww.select("Categorical")
-                if is_categorical_actually_boolean(X, col)


Explanation as to why we can remove is_categorical_actually_boolean:

We had "categorical booleans" because of the initial ReplaceNulalbleTypes transformer implementation that turned all BooleanNullable columns int Categorical.

This util originally added in Sklearn 1.1.1 Upgrade #3525 when we first started imputing boolean columns separately from categorical columns so we needed a way to separate them out.

We no longer create these categorical booleans (I caught and removed the last case where we were doing it in the EmailFeaturizer), and I don't think we need to handle the case where a user has created them, so uses of this should be removed.

tamargrey · 2023-03-01T21:54:43Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

-                if is_categorical_actually_boolean(X, col)
-            ],
-        )
-        X = set_boolean_columns_to_integer(X)


Explanation as to why we can stop converting boolean columns to integers:

This originally began as converting booleans to categorical at fit inside the simple imputer in Pandas 1.4.0 Support #3324. Not 100% clear why this was needed, but I imagine it had to do with a pandas incompatibility with booleans.

Then, we started converting all boolean columns to categorical at transform as well in Make EvalML compatible with the new Woodwork Boolean inference #3892 (which I think was done because we couldn't reinfer the columns as booleans when they were imputed as booleans because sklearn turns them into 1s and 0s)

Eventually, turning bools to categories became converting to IntegerNullable in Fix BooleanNullable SimpleImputer bug #3959 to handle a pandas bug with imputing categorical booleans.

Given that the sklearn simple imputer's behavior is to turn boolean columns to 1s and 0s when imputed, we've now essentially recreated that behavior outside of it, so we should be able to just pass the original boolean and boolean nullable columns into the sklearn imputer and handle turning them back to Booleans at the end like we are now.

These changes also mean that we don't need to keep track of boolean columns and can stop reconverting columns to bool after imputing

I'm slightly worried that part of the original reason for converting boolean columns was because sklearn will not fit or transform its imputer on data that is only the bool dtype. Converting to category or int64 doesnt solves that problem because sklearn will happily take in data of only those types.

At one point we had the logic that I now added back in to return early and not fit the imputer on sklearn's imputer object if only bool columns are present, and I havent been able to confirm it, but it's possible that part of the reason we removed that logic is that it's problematic if nans get introduced in the test data - we would have not fit the component, causing errors.

Because of that, I added logic to store the information if the component was fit on fully bool data, and if the data at transform isn't fully bool, call fit on the component_obj, which I know isn't ideal to be imputing the target based on only test data.

To me this felt like enough of an edge case that this handling was okay given that it's the simplest solution, but I am open to other handlings (converting datasets of only bool columns to boolean nullable? just not handling this case at all?) if others feel strongly.

Looks like we have to handle this edge case if I understood correctly:

"Given that the sklearn simple imputer's behavior is to turn boolean columns to 1s and 0s when imputed, we've now essentially recreated that behavior outside of it, so we should be able to just pass the original boolean and boolean nullable columns into the sklearn imputer and handle turning them back to Booleans at the end like we are now."

This is only true when the impute strategy is mean. Above we set it to most_frequent and we error out. This is the case we need since we only use most_frequent with bool columns in Imputer.

Could we convert the column to a string categorical or keep the current behavior of converting to a integer just for the most_frequent strategy or your suggestion above? I'd still prefer the conversion over refitting and transforming on the component object.

Agreed with Jeremy here - calling fit during transform opens us up to future bugs. We only call fit once, and only ever call it on the training data. With a call to _component_obj.fit during transform, we now allow the component to fit on validation/testing data and overwrite earlier component training.
A quick test shows what would happen in the current setup:

We will be better served converting the booleans to integers rather than categoricals for most_frequent to match the behavior for mean, and agreed that conversion is preferable to this refitting.

Thank you both for keeping me honest here! I definitely missed the nuance that sklearn allows bool only data for "mean" and "median" strategies. I think there are two main questions to answer:

How should we handle "mean" and "median" strategies for the SimpleImputer? We only allow knn or most_frequent as strategies for boolean columns in the Imputer component, but we don't restrict that in our SimpleImputer component despite the fact that we actually get a woodwork TypeConversionError at init if you attempt to use median or mean with a boolean nullable column that contains nans and the result contains not whole numbers (the cases becca's examples show - this is on main and my branch). For now, I think we should ignore this behavior since fixing it should be given proper consideration and it's not reachable from automl. I'll make an issue.

How should we handle the case when only bool dtypes are passed to the SimpleImputer (independent of impute strategy which only really works for bool cols when its most_frequent)? We need to find a way to fit the _component_obj from our fit method in this case instead of returning early, and I think we should keep the changes as minimal as possible, and only convert those columns to BooleanNullable (which sklearn is fine with!) when there are only bool columns in the data. There's no need to convert other bool columns, and I think converting to string or numeric values just opens us up to unexpected behavior now that sklearn has so much support for the boolean dtype. I'll double check with all of the different types that there isn't any unexpected behavior, or course.

Really, I just wish sklearn didn't have this weird limitation. Seems to completely miss the exact use case we need it for which is different training and test data.

Created issue for the TypeConversionError: #4050

sounds good for 1. and agree with the BooleanNullable transform as long we set it back to bool after imputation!

Good call out on making sure it goes back to Boolean after. I thiiink that's what's happening but I'll confirm.

ah, we're only converting to boolean dtype in fit; we just then return early in transform if all the columns are bool , so the types will stay the same!

tamargrey · 2023-03-02T14:43:29Z

I believe the codecov failures are just because enough code was removed that coverage has lowered, but if you go into the codecov report, none of the changes show uncovered lines.

tamargrey · 2023-03-02T15:53:26Z

evalml/pipelines/components/transformers/imputers/target_imputer.py

-        )
+        new_logical_type = y_ww.ww.logical_type
+        if isinstance(y_ww.ww.logical_type, IntegerNullable):
+            new_logical_type = Double


Converting imputed IntegerNullable data to Integer truncates the imputed values for mean and median, so I changed this to Double, which matches other imputers' behavior.

tamargrey · 2023-03-02T15:53:47Z

evalml/pipelines/components/transformers/imputers/target_imputer.py

+        if isinstance(y_ww.ww.logical_type, IntegerNullable):
+            new_logical_type = Double
+        elif isinstance(y_ww.ww.logical_type, BooleanNullable):
+            new_logical_type = Boolean


Added this for consistency's sake

tamargrey · 2023-03-02T17:01:25Z

evalml/pipelines/components/transformers/imputers/knn_imputer.py

+        )
+
+        # Only impute data that is not natural language columns or fully null
+        self._cols_to_impute = [


This change to use a _cols_to_impute variable to manage the fully null and natural language column logic here and at the simple imputer was a change that, to me, makes the imputer logic easier to understand. It makes the handling of different types of data more explicit rather than a sequence of changes to X to keep track of. It should also have slightly more efficient woodwork usage.

One implication of this is that we never even pass fully null columns to the imputer _component_obj, which could be problematic for test data that may not be fully null, but in that case, we'd have no information from the training data to use to impute data, and in the previous call to pd.DataFrame(X_t, columns=not_all_null_or_natural_language_cols), it assumes the fully null columns wouldn't be present anyway, so I'm assuming that it's okay to not pass the fully null columns in at all.

It should be ok as long as we're not passing the fully null or natural language column as input for both fit and transform. It is also ok to pass the full null columns in - I believe the sklearn imputer's drop them automatically.

yep! It simplified things a bit to not pass them in at either fit or transform, but agreed it's something we need to be consistent on.

jeremyliweishih

LGTM other than the all bool logic for the simple imputer. Lmk what you think!

jeremyliweishih · 2023-03-02T19:04:48Z

evalml/pipelines/components/transformers/imputers/knn_imputer.py

+        )
+
+        # Only impute data that is not natural language columns or fully null
+        self._cols_to_impute = [


It should be ok as long as we're not passing the fully null or natural language column as input for both fit and transform. It is also ok to pass the full null columns in - I believe the sklearn imputer's drop them automatically.

jeremyliweishih · 2023-03-02T19:05:54Z

evalml/pipelines/components/transformers/imputers/knn_imputer.py

+        )
+
+        # Only impute data that is not natural language columns or fully null
+        self._cols_to_impute = [


jeremyliweishih · 2023-03-02T19:13:46Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

-                if is_categorical_actually_boolean(X, col)
-            ],
-        )
-        X = set_boolean_columns_to_integer(X)


Looks like we have to handle this edge case if I understood correctly:

"Given that the sklearn simple imputer's behavior is to turn boolean columns to 1s and 0s when imputed, we've now essentially recreated that behavior outside of it, so we should be able to just pass the original boolean and boolean nullable columns into the sklearn imputer and handle turning them back to Booleans at the end like we are now."

This is only true when the impute strategy is mean. Above we set it to most_frequent and we error out. This is the case we need since we only use most_frequent with bool columns in Imputer.

Could we convert the column to a string categorical or keep the current behavior of converting to a integer just for the most_frequent strategy or your suggestion above? I'd still prefer the conversion over refitting and transforming on the component object.

eccabay

Similar to Jeremy's feedback, I think we need to find a better way to handle booleans in the simple imputer.

Other than that though, it's mostly just nitpicks. I really appreciate this work!

docs/source/release_notes.rst

evalml/pipelines/components/transformers/imputers/knn_imputer.py

eccabay · 2023-03-03T14:22:20Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

-                if is_categorical_actually_boolean(X, col)
-            ],
-        )
-        X = set_boolean_columns_to_integer(X)


Agreed with Jeremy here - calling fit during transform opens us up to future bugs. We only call fit once, and only ever call it on the training data. With a call to _component_obj.fit during transform, we now allow the component to fit on validation/testing data and overwrite earlier component training.
A quick test shows what would happen in the current setup:

We will be better served converting the booleans to integers rather than categoricals for most_frequent to match the behavior for mean, and agreed that conversion is preferable to this refitting.

eccabay · 2023-03-03T14:30:02Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

+            # reorder columns to match original
+            X_t = X_t.ww[[col for col in original_schema.columns if col in X_t.columns]]


🙏 this will prevent future mystery bugs, thank you!

evalml/pipelines/components/transformers/imputers/target_imputer.py

eccabay · 2023-03-03T14:45:36Z

evalml/tests/component_tests/test_simple_imputer.py

+    imp = SimpleImputer()
+    imp.fit(X_train)
+
+    # X_test will be BooleanNullable which will be a problem when _component_obj isn't fit


I'm confused what this comment is saying here, could you clarify?

You also explicitly cast the woodwork types to be BooleanNullable, but they're already automatically inferred to be. This isn't a bad thing, it protects us against future changes in woodwork inference, but I'm a bit confused with the context of the comment.

This is just the follow-up to the first comment, explaining the test case where we start with Boolean types in X_train and then have BooleanNullable types with nans in X_test. Once I switch to convert boolean types instead of calling fit inside of transform, this test will stay the same, though maybe I'll add a check that the actual imputed values are the same as the values that would have been imputed if the types were BooleanNullable all along

evalml/tests/component_tests/test_target_imputer.py

eccabay · 2023-03-03T14:55:59Z

evalml/tests/utils_tests/test_nullable_type_utils.py

@@ -199,7 +199,6 @@ def test_downcast_nullable_y_replaces_nullable_types(
    nullable_ltype,
    has_nans,
 ):
-    # --> remove this comment


Dismissed so I don't block your progress when I'm OOO next week!

jeremyliweishih · 2023-03-08T14:51:37Z

evalml/pipelines/components/transformers/imputers/simple_imputer.py

-                if is_categorical_actually_boolean(X, col)
-            ],
-        )
-        X = set_boolean_columns_to_integer(X)


sounds good for 1. and agree with the BooleanNullable transform as long we set it back to bool after imputation!

jeremyliweishih · 2023-03-08T14:53:41Z

LGTM! great work 😄

chukarsten · 2023-03-08T18:07:13Z

evalml/tests/component_tests/test_simple_imputer.py

+    X_train = pd.DataFrame(
+        {
+            "bools1": pd.Series([True, False, True, True] * 20),
+            "bools2": pd.Series([True, False, True, False] * 20),
+        },
+    )
+    X_train.ww.init(
+        logical_types={
+            "bools1": "Boolean",
+            "bools2": "Boolean",
+        },
+    )


Is it worth adding the all bools columns to the imputer_test_data

Probably!

I can add in a second all boolean no nulls column into the fixture that way I can get my "all bools" dataframe with a select("Boolean") call and then add nans in after fit to create the test data for this test. I can't think of anything that would be inherently problematic about adding a second Boolean column other than maybe some tests that only expect there to be one might fail.

chukarsten

This all looks solid to me. I just had the one question about roping the small ad-hoc dataframes into the larger fixture, but that's about it.

* Stop using woodwork describe to get nan info in time series imputer * remove logic that's no longer needed and return if all bool dtype * remove unnecessary logic from target imputer * remove unused utils * remove logic to convert dfs features to categorical logical type * fix email featureizer test * Revert changes to transfomr prim components for testing * Revert "Revert changes to transfomr prim components for testing" This reverts commit 57dda43. * Fix bugs with ww not imputed and string categories * Add release note * Handle case where nans are present at transform in previously all bool dtype data * Stop truncating ints in target imputer * clean up * Fix tests * Keep ltype integer for most frequent impute tpye in target imputer * refactor knn imputer to use new logic * Fix list bug * remove comment * Update release note to mention nullable types * remove outdated comment * Conver all bool dfs to boolean nullable instead of refitting * lint fix * PR comments * add second bool col to imputer fixture

* initial commit to make the PR * Refactor imputer components to remove unnecessary logic (#4038) * Stop using woodwork describe to get nan info in time series imputer * remove logic that's no longer needed and return if all bool dtype * remove unnecessary logic from target imputer * remove unused utils * remove logic to convert dfs features to categorical logical type * fix email featureizer test * Revert changes to transfomr prim components for testing * Revert "Revert changes to transfomr prim components for testing" This reverts commit 57dda43. * Fix bugs with ww not imputed and string categories * Add release note * Handle case where nans are present at transform in previously all bool dtype data * Stop truncating ints in target imputer * clean up * Fix tests * Keep ltype integer for most frequent impute tpye in target imputer * refactor knn imputer to use new logic * Fix list bug * remove comment * Update release note to mention nullable types * remove outdated comment * Conver all bool dfs to boolean nullable instead of refitting * lint fix * PR comments * add second bool col to imputer fixture * Use nullable type handling in components' fit, transform, and predict methods (#4046) * Remove existing nullable type handling from oversampler and use _handle_nullable_types instead * Add handle call for lgbm regressor and remove existing handling * Add handle call for lgbm classifier * temp broken exp smoothing tests * lint fix * add release note * Fix broken tests by initting woodwork on y in lgbm classifier * Update tests * Call handle in arima * call handle from ts imputer y ltype is downcasted value * remove unnecessary comments * Fix time series guide * lint fix * Only call handle_nullable_types when necessary in methods * Remove remaining unnecessary handle calls * resolve remaining comments * Add y ww init to ts imputer back in to fix tests * Copy X in testing nullable types to stop hiding potential incompatibilities in methods * use X_d in lgbm predict proba * remove nullable type handling after sklearn upgrade fixed incompatibilities * use common util to determine type for time series imputed integers * Add comments around why we copy X * remove _prepare_data from samplers * PR comments * remove tests to check if handle method is called * remove nullable types from imputed data because of regularizer * fix typo * fix docstrings * fix codecov issues * PR comments * Revert "Fix time series guide" This reverts commit 964622a. * return unchanged ltype in nullabl;e type utils * add back ts imputer incompatibility test * use dict get return value * call handle nullable types in oversampler and check schema equality * Remove other nullable handling across AutoMLSearch (#4085) * Update tests * remove unnecessary comments * resolve remaining comments * Copy X in testing nullable types to stop hiding potential incompatibilities in methods * Handle new oversampler nullable type incompatibility and add tests * Remove existing nullable type handling from oversampler and use _handle_nullable_types instead * Add comments to self * Add handle call for lgbm classifier * lint fix * Only call handle_nullable_types when necessary in methods * Remove remaining unnecessary handle calls * resolve remaining comments * Copy X in testing nullable types to stop hiding potential incompatibilities in methods * remove nullable type handling after sklearn upgrade fixed incompatibilities * Remove downcast call from imputer.py and fix tests * remove downcast call from catboost regressor * stop adding ohe if bool null present * remove nullable type handling from knn imputer * remove nullable type handling from targer imputer * use util for ltype deciding in simple imputer * Fix broken target imputer test * Remove replace nullable types transformer from automl search and fix tests * Handle nullable types in lgbm classifier predict_proba * fix after rebase * move util to imputer utils and other fixes * remove duplicate util * Change how imputer reinits woodwork types * Expand tests * use automl test env * fix broken test * Clean up * Add release note * Further clean up and note incompatibility * Continue clean up * fix failing docstring * remove remaining comments * fix invali target data check docstring * let knn and simple imputers use same flow of logic for readability * Move _get_new_logical_types_for_imputed_data to nullable type utils file * PR comment * Handle decomposer incompatibility (#4105) * Handle decomposer incompatibility * fix comment * Stop scaling the values up * Add release note * PR Comments * Create incompatibility for testing more similarly to how it shows up in the decomposer * Add release note * lint fix * pr notes * fix after rebase --------- Co-authored-by: chukarsten <64713315+chukarsten@users.noreply.github.com>

tamargrey force-pushed the imputer-refactor branch from 5d18f8f to ce4f423 Compare February 28, 2023 20:22

tamargrey commented Feb 28, 2023

View reviewed changes

tamargrey changed the base branch from main to component-specific-nullable-types-handling March 1, 2023 21:42

tamargrey force-pushed the imputer-refactor branch 2 times, most recently from a4d1335 to ee47192 Compare March 1, 2023 21:47

tamargrey commented Mar 1, 2023

View reviewed changes

tamargrey force-pushed the imputer-refactor branch from ee47192 to e5a16da Compare March 2, 2023 15:16

tamargrey commented Mar 2, 2023

View reviewed changes

tamargrey force-pushed the component-specific-nullable-types-handling branch from 216fac9 to 494c4a5 Compare March 2, 2023 16:01

tamargrey force-pushed the imputer-refactor branch from bdbae0b to 9076320 Compare March 2, 2023 16:01

tamargrey commented Mar 2, 2023

View reviewed changes

tamargrey marked this pull request as ready for review March 2, 2023 17:58

auto-assign bot assigned tamargrey Mar 2, 2023

tamargrey requested review from chukarsten, eccabay, christopherbunn, jeremyliweishih and ParthivNaresh March 2, 2023 18:16

jeremyliweishih requested changes Mar 2, 2023

View reviewed changes

eccabay previously requested changes Mar 3, 2023

View reviewed changes

tamargrey force-pushed the imputer-refactor branch from 81cb271 to 32eb18a Compare March 6, 2023 15:33

tamargrey force-pushed the component-specific-nullable-types-handling branch from 494c4a5 to 80be9be Compare March 6, 2023 15:44

Stop using woodwork describe to get nan info in time series imputer

862c9d9

Tamar Grey added 5 commits March 6, 2023 10:45

Fix list bug

eb210f2

remove comment

93e81b3

Update release note to mention nullable types

0984571

remove outdated comment

600aea4

Conver all bool dfs to boolean nullable instead of refitting

bcbe9e2

tamargrey force-pushed the imputer-refactor branch from 32eb18a to bcbe9e2 Compare March 6, 2023 15:45

Tamar Grey added 2 commits March 6, 2023 11:05

lint fix

f044af3

PR comments

98c0561

tamargrey linked an issue Mar 6, 2023 that may be closed by this pull request

Remove Nullable type logic from Imputer Components and Refactor #3999

Closed

tamargrey requested a review from jeremyliweishih March 7, 2023 17:17

jeremyliweishih approved these changes Mar 8, 2023

View reviewed changes

chukarsten reviewed Mar 8, 2023

View reviewed changes

chukarsten approved these changes Mar 8, 2023

View reviewed changes

add second bool col to imputer fixture

68ef36d

tamargrey merged commit 7d25090 into component-specific-nullable-types-handling Mar 8, 2023

tamargrey deleted the imputer-refactor branch March 8, 2023 20:19

chukarsten mentioned this pull request Apr 3, 2023

Release v0.73.0. #4122

Merged

		# reorder columns to match original
		X_t = X_t.ww[[col for col in original_schema.columns if col in X_t.columns]]

Refactor imputer components to remove unnecessary logic #4038

Refactor imputer components to remove unnecessary logic #4038

Conversation

tamargrey commented Feb 28, 2023 • edited Loading

codecov bot commented Feb 28, 2023 • edited Loading

Codecov Report

tamargrey Feb 28, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tamargrey Feb 28, 2023 • edited Loading

Choose a reason for hiding this comment

tamargrey Feb 28, 2023 • edited Loading

Choose a reason for hiding this comment

tamargrey Mar 1, 2023 • edited Loading

Choose a reason for hiding this comment

tamargrey Mar 1, 2023 • edited Loading

Choose a reason for hiding this comment

tamargrey Mar 2, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tamargrey Mar 6, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tamargrey commented Mar 2, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tamargrey Mar 2, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeremyliweishih left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eccabay left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeremyliweishih commented Mar 8, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chukarsten left a comment

Choose a reason for hiding this comment

tamargrey commented Feb 28, 2023 •

edited

Loading

codecov bot commented Feb 28, 2023 •

edited

Loading

tamargrey Feb 28, 2023 •

edited

Loading

tamargrey Feb 28, 2023 •

edited

Loading

tamargrey Feb 28, 2023 •

edited

Loading

tamargrey Mar 1, 2023 •

edited

Loading

tamargrey Mar 1, 2023 •

edited

Loading

tamargrey Mar 2, 2023 •

edited

Loading

tamargrey Mar 6, 2023 •

edited

Loading

tamargrey Mar 2, 2023 •

edited

Loading