Upgrade to Woodwork 0.8.1 #2783

Merged
merged 3 commits into from
Sep 17, 2021

Conversation

chukarsten
Contributor

Addresses: #2772

chukarsten force-pushed the ww_080_upgrade branch 2 times, most recently from e1ea71a to 5be5dca on September 14, 2021 22:12

codecov bot commented Sep 14, 2021

Codecov Report

Merging #2783 (7615af5) into main (916f797) will not change coverage.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main   #2783   +/-   ##
=====================================
  Coverage   99.8%   99.8%           
=====================================
  Files        297     297           
  Lines      27718   27718           
=====================================
  Hits       27650   27650           
  Misses        68      68           
| Impacted Files | Coverage Δ |
|---|---|
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 100.0% <100.0%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 916f797...7615af5.

chukarsten changed the title from "Minor fix to handle IP and URLs." to "Upgrade to Woodwork 0.8.0" on Sep 15, 2021
chukarsten self-assigned this on Sep 16, 2021
chukarsten marked this pull request as ready for review on September 16, 2021 15:52
chukarsten force-pushed the ww_080_upgrade branch 2 times, most recently from 7a7dbc0 to 0bec5c2 on September 16, 2021 18:53
@@ -130,7 +131,9 @@ def _get_preprocessing_components(

# The URL and EmailAddress Featurizers will create categorical columns
categorical_cols = list(
- X.ww.select(["category", "URL", "EmailAddress"], return_schema=True).columns
+ X.ww.select(
+ ["category", "URL", "EmailAddress", "IPAddress"], return_schema=True
Contributor

@chukarsten How come we need to make this change? AutoMLSearch and our pipelines can't handle the IPAddress logical type yet.

@@ -94,7 +99,7 @@ def transform(self, X, y=None):
{
col: "Categorical"
for col, ltype in X.ww.logical_types.items()
- if isinstance(ltype, NaturalLanguage)
+ if isinstance(ltype, (NaturalLanguage, EmailAddress, IPAddress, URL))
Contributor Author

The difference between WW 0.7.x and 0.8.0 is that IPAddresses and URLs are now inferred automatically; WW 0.5.1 introduced Email inference. Since these are ultimately strings, I wanted to add some support for these types in the SimpleImputer. I think they should be treated as categorical columns, so that missing values can be filled with a placeholder value.
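
For readers following along, here is a minimal sketch of the idea in that comment. It is not the evalml or Woodwork implementation: the classes below are hypothetical stand-ins for Woodwork's logical types, and `columns_to_retype` and `fill_placeholder` are illustrative helper names, not functions from either library. It only shows how a tuple passed to `isinstance` widens the set of columns that get treated as categorical, and what filling with a placeholder could look like.

```python
# Hypothetical stand-ins for Woodwork logical types (NOT the real classes).
class NaturalLanguage: ...
class EmailAddress: ...
class IPAddress: ...
class URL: ...
class Double: ...

# Types that are ultimately strings and are treated as categorical here.
STRING_LIKE = (NaturalLanguage, EmailAddress, IPAddress, URL)


def columns_to_retype(logical_types):
    """Map every string-like column to "Categorical", as in the diff above."""
    return {
        col: "Categorical"
        for col, ltype in logical_types.items()
        if isinstance(ltype, STRING_LIKE)
    }


def fill_placeholder(values, placeholder="missing"):
    """Replace None entries with a constant placeholder (illustrative only)."""
    return [placeholder if v is None else v for v in values]


logical_types = {
    "ip": IPAddress(),
    "url": URL(),
    "email": EmailAddress(),
    "amount": Double(),
}
retyped = columns_to_retype(logical_types)
filled = fill_placeholder(["192.168.0.1", None, "10.0.0.7"])
```

With these stand-ins, `amount` is left out of the mapping because `Double` is not in the tuple, while the three string-like columns are all marked as categorical.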

@@ -130,7 +131,9 @@ def _get_preprocessing_components(

# The URL and EmailAddress Featurizers will create categorical columns
categorical_cols = list(
Contributor Author

Just updating this to accommodate the IPAddress.
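
As a rough illustration of what adding "IPAddress" to the selection list changes, the sketch below uses a plain dict in place of a Woodwork schema. `select_columns` is a hypothetical helper, the column names are made up, and the sketch deliberately ignores how Woodwork itself resolves the names passed to `select`; it only shows that the extra entry pulls one more column into the categorical list.

```python
def select_columns(schema, wanted):
    """Return column names whose type label is in `wanted`, in schema order.

    `schema` is a plain {column: label} dict standing in for a typed schema.
    """
    wanted = set(wanted)
    return [col for col, label in schema.items() if label in wanted]


# Made-up columns for illustration only.
schema = {
    "color": "category",
    "site": "URL",
    "contact": "EmailAddress",
    "client_ip": "IPAddress",
    "price": "Double",
}

before = select_columns(schema, ["category", "URL", "EmailAddress"])
after = select_columns(schema, ["category", "URL", "EmailAddress", "IPAddress"])
```

Here `before` omits `client_ip`, and `after` includes it alongside the other three columns.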

@@ -350,11 +350,24 @@ def test_simple_imputer_with_none():
assert_frame_equal(expected, transformed, check_dtype=False)


Contributor Author

Modifying this test to support IPAddress, EmailAddress and URLs.

@@ -220,7 +229,7 @@ def test_make_pipeline(
)
drop_col = (
[DropColumns]
Contributor Author

This is because URLs are now inferred properly, so pandas DataFrames that contain them are typed correctly.

@@ -209,9 +220,7 @@ def test_make_pipeline(
else []
)
email_featurizer = [EmailFeaturizer] if "email" in column_names else []
Contributor Author

This didn't seem to be necessary.

@@ -177,10 +189,9 @@ def test_make_pipeline(
[OneHotEncoder]
if estimator_class.model_family != ModelFamily.CATBOOST
and (
Contributor Author

This wasn't necessary, but I combined it for simplicity.

@@ -83,6 +83,16 @@ def _get_test_data_from_configuration(
"https://github.com/alteryx/featuretools",
]
* 2,
Contributor Author

I'm just going to leave these as they don't hurt.

Contributor

freddyaboulton left a comment

@chukarsten Looks good! We can fix build_conda_pkg by upping the max ww version after you merge! Thank you!!

chukarsten changed the title from "Upgrade to Woodwork 0.8.0" to "Upgrade to Woodwork 0.8.1" on Sep 17, 2021
Contributor

angela97lin left a comment

LGTM, thank you!! 😁

chukarsten merged commit 2bc8654 into main on Sep 17, 2021
chukarsten mentioned this pull request on Oct 1, 2021
freddyaboulton deleted the ww_080_upgrade branch on May 13, 2022 15:20