-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to Woodwork 0.8.1 #2783
Conversation
e1ea71a
to
5be5dca
Compare
Codecov Report
@@ Coverage Diff @@
## main #2783 +/- ##
=====================================
Coverage 99.8% 99.8%
=====================================
Files 297 297
Lines 27718 27718
=====================================
Hits 27650 27650
Misses 68 68
Continue to review full report at Codecov.
|
7a7dbc0
to
0bec5c2
Compare
evalml/pipelines/utils.py
Outdated
@@ -130,7 +131,9 @@ def _get_preprocessing_components( | |||
|
|||
# The URL and EmailAddress Featurizers will create categorical columns | |||
categorical_cols = list( | |||
X.ww.select(["category", "URL", "EmailAddress"], return_schema=True).columns | |||
X.ww.select( | |||
["category", "URL", "EmailAddress", "IPAddress"], return_schema=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten How come we need to make this change? AutoMLSearch and our pipelines can't handle the IPAddress
logical type yet.
9cf7ea4
to
8edb3e6
Compare
@@ -94,7 +99,7 @@ def transform(self, X, y=None): | |||
{ | |||
col: "Categorical" | |||
for col, ltype in X.ww.logical_types.items() | |||
if isinstance(ltype, NaturalLanguage) | |||
if isinstance(ltype, (NaturalLanguage, EmailAddress, IPAddress, URL)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the difference between WW 0.7.x and 0.8.0 is now that IPAddresses and URLs are inferred automatically. WW 0.5.1 introduced Email inference. Since these are ultimately strings, I wanted to add some support in the SimpleImputer for these types. I think they should probably be treated as categorical columns and thus able to be filled with a placeholder value.
evalml/pipelines/utils.py
Outdated
@@ -130,7 +131,9 @@ def _get_preprocessing_components( | |||
|
|||
# The URL and EmailAddress Featurizers will create categorical columns | |||
categorical_cols = list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just updating this to accommodate the IPAddress.
@@ -350,11 +350,24 @@ def test_simple_imputer_with_none(): | |||
assert_frame_equal(expected, transformed, check_dtype=False) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying this test to support IPAddress, EmailAddress and URLs.
@@ -220,7 +229,7 @@ def test_make_pipeline( | |||
) | |||
drop_col = ( | |||
[DropColumns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because URLs are now properly being inferred so pandas dataframes with them are inferred properly.
@@ -209,9 +220,7 @@ def test_make_pipeline( | |||
else [] | |||
) | |||
email_featurizer = [EmailFeaturizer] if "email" in column_names else [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This didn't seem to be necessary.
@@ -177,10 +189,9 @@ def test_make_pipeline( | |||
[OneHotEncoder] | |||
if estimator_class.model_family != ModelFamily.CATBOOST | |||
and ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't necessary but combined it for simplicity.
@@ -83,6 +83,16 @@ def _get_test_data_from_configuration( | |||
"https://github.com/alteryx/featuretools", | |||
] | |||
* 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just going to leave these as they don't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chukarsten Looks good! We can fix build_conda_pkg
by upping the max ww version after you merge! Thank you!!
5336256
to
bc98e08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!! 😁
Addresses: #2772