Add support for PostalCode, SubRegionCode, CountryCode logical types #2946

eccabay · 2021-10-21T21:17:38Z

codecov · 2021-10-21T21:22:47Z

Codecov Report

Merging #2946 (1ed6e42) into main (c11809c) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2946     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        307     307             
  Lines      29049   29197    +148     
=======================================
+ Hits       28958   29106    +148     
  Misses        91      91

| Impacted Files | Coverage Δ | |
|---|---|---|
| evalml/model_understanding/graphs.py | `100.0% <100.0%> (ø)` | |
| ...es/components/transformers/samplers/oversampler.py | `100.0% <100.0%> (ø)` | |
| evalml/tests/component_tests/test_oversampler.py | `100.0% <100.0%> (ø)` | |
| ...s/prediction_explanations_tests/test_explainers.py | `100.0% <100.0%> (ø)` | |
| ...del_understanding_tests/test_partial_dependence.py | `99.4% <100.0%> (+0.1%)` | ⬆️ |
| ...understanding_tests/test_permutation_importance.py | `100.0% <100.0%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c11809c...1ed6e42.

freddyaboulton

@eccabay Thank you for this! I think this is ready to merge, except we should make sure the one-way partial dependence plot for postal code etc. is a bar plot instead of a line plot.
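
To make the request concrete, here is a rough sketch of the kind of check that could pick the plot style. The set contents and the helper name `one_way_plot_kind` are hypothetical and are not taken from `graphs.py`; the sketch assumes the three new logical types should be plotted the same way as `Categorical`.

```python
# Assumption: these logical type names are treated as category-like for
# plotting. The set is illustrative, not evalml's actual list.
BAR_PLOT_LOGICAL_TYPES = {
    "Categorical",
    "PostalCode",
    "SubRegionCode",
    "CountryCode",
}


def one_way_plot_kind(logical_type_name: str) -> str:
    """Return "bar" for category-like logical types, otherwise "line"."""
    if logical_type_name in BAR_PLOT_LOGICAL_TYPES:
        return "bar"
    return "line"


print(one_way_plot_kind("PostalCode"))
print(one_way_plot_kind("Double"))
```

Keeping the names in one set means a new category-like logical type only needs to be added in one place.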

freddyaboulton · 2021-10-22T15:04:43Z

evalml/pipelines/components/transformers/samplers/oversampler.py

@@ -88,8 +88,8 @@ def _get_categorical(self, X):
        X = infer_feature_types(X)
        self.categorical_features = [
            i
-            for i, val in enumerate(X.ww.types["Logical Type"].items())
-            if str(val[1]) in {"Boolean", "Categorical"}
+            for i, val in enumerate(X.ww.semantic_tags.items())


Don't we also need to check for boolean here? Why not do X.ww.select(['category', 'boolean'])?

Wait @bchen1116 I think there might be a bug here on main? If there are either categorical or boolean features in the input along with numerics, the sampler should be SMOTENC not SMOTE right? I thought that's why we had changed the one hot encoder to encode the created features as boolean?

Repro on main

from evalml.automl import AutoMLSearch
from evalml.demos import load_fraud
import imblearn.over_sampling as imb

X, y = load_fraud(100)
X = X.ww[["provider", "country", "amount", "region"]]
automl = AutoMLSearch(X, y, "binary", verbose=True)
automl.search()
pipeline_3 = automl.get_pipeline(3)
pipeline_3.fit(X, y)
assert pipeline_3.get_component("Oversampler").sampler == imb.SMOTE

Not blocking this pr since it preserves this behavior but if this is a bug we should file another issue.

@freddyaboulton Yeah, you're right! That's a good catch: this should be SMOTENC when there are both numeric and categorical features. It's likely caused by this line. I think we should be grabbing both categoricals and booleans, not just categoricals.

I can file an issue here!

Thank you @bchen1116 !
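
For reference, a minimal sketch of the selection logic discussed in this thread. It uses plain pandas dtypes instead of Woodwork so it runs standalone, and it assumes the columns are already typed as `category` or `bool`. The helper names `categorical_indices` and `choose_sampler_name` are hypothetical, and this is not evalml's implementation. It only illustrates counting both categorical and boolean columns as categorical, then preferring SMOTENC when those are mixed with numeric columns.

```python
import pandas as pd


def categorical_indices(X: pd.DataFrame) -> list:
    """Return positional indices of columns treated as categorical.

    Assumption: the pandas category and bool dtypes stand in for the
    Woodwork "category" semantic tag and the Boolean logical type.
    """
    return [
        i
        for i, dtype in enumerate(X.dtypes)
        if isinstance(dtype, pd.CategoricalDtype)
        or pd.api.types.is_bool_dtype(dtype)
    ]


def choose_sampler_name(X: pd.DataFrame) -> str:
    """Name the imblearn oversampler that matches the column mix."""
    cat = categorical_indices(X)
    if not cat:
        return "SMOTE"  # numeric features only
    if len(cat) == X.shape[1]:
        return "SMOTEN"  # categorical/boolean features only
    return "SMOTENC"  # mixed numeric and categorical/boolean


X = pd.DataFrame(
    {
        "provider": pd.Series(["a", "b", "a"], dtype="category"),
        "amount": [1.0, 2.5, 3.0],
        "flag": [True, False, True],
    }
)
print(categorical_indices(X), choose_sampler_name(X))
```

With this check, the repro above would end up on the SMOTENC branch rather than SMOTE, since its input mixes categorical and numeric columns.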

evalml/model_understanding/graphs.py

bchen1116

LGTM! Just left one nitpick for clarity because I was confused for a while.

evalml/model_understanding/graphs.py

eccabay added 12 commits October 15, 2021 16:54

Fix oversampler categorical bug

765e456

Merge branch 'main' into 2856_postalcode

b233de0

Merge branch 'main' into 2856_postalcode

e583b44

Add _is_feature_of_semantic_type to enable graphing 2way pdep

8ea8852

Add partial dependence tests

1d3b4e0

Add graph partial dependence tests

a723a1f

Add permutation importance test

360b616

Merge branch 'main' into 2856_postalcode

9ba9a79

Add explain predictions tests

bbf3526

Lint fixes

7d7e736

Merge branch 'main' into 2856_postalcode

564b06e

Update release notes

8b08049

Fix mysterious two way pdep swap

184c7f9

eccabay marked this pull request as ready for review October 22, 2021 13:32

auto-assign bot assigned eccabay Oct 22, 2021

eccabay requested review from freddyaboulton, bchen1116, angela97lin, chukarsten and dsherry and removed request for freddyaboulton and bchen1116 October 22, 2021 13:32

freddyaboulton approved these changes Oct 22, 2021

eccabay added 3 commits October 22, 2021 11:54

Add catch for boolean in oversampler get_categorical

8cb5bf5

Fix partial dependence graphs to be bar

662124b

Merge branch 'main' into 2856_postalcode

bc28a4f

bchen1116 mentioned this pull request Oct 22, 2021

SMOTENC Oversampler fix #2947

Closed

Comments and remove redundant code

c640931

bchen1116 approved these changes Oct 22, 2021

eccabay added 3 commits October 22, 2021 16:38

Nitpicks

6d541be

Undo nitpicks for lint

53bc1b8

Merge branch 'main' into 2856_postalcode

1ed6e42

eccabay merged commit 910fbd0 into main Oct 25, 2021

eccabay deleted the 2856_postalcode branch October 25, 2021 12:42

chukarsten mentioned this pull request Oct 27, 2021

Release v0.36.0 #2974

Merged
