Nullable Types: infer_feature_types modifications #3156

Merged
chukarsten merged 11 commits into main from nt_remove_nt_check on Jan 6, 2022

Conversation

chukarsten
Collaborator

@chukarsten chukarsten commented Dec 15, 2021

Addresses: #3060 and #3061

Removes the ignore_nullable_types argument and the nullable types check from infer_feature_types, along with the tests that expected infer_feature_types to raise an exception when nullable types are detected.
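
To make the change concrete, here is a minimal sketch of the kind of input this PR affects. It builds a small DataFrame whose columns use pandas nullable extension dtypes, which Woodwork is expected to map to its IntegerNullable and BooleanNullable logical types. The column names and values are made up for illustration. Only the infer_feature_types(X) call shape comes from the diff in this PR, and it is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Illustrative data only: two columns backed by pandas nullable extension
# dtypes, where pd.NA marks the missing entries.
X = pd.DataFrame(
    {
        "int_nullable": pd.array([1, None, 3], dtype="Int64"),
        "bool_nullable": pd.array([True, None, False], dtype="boolean"),
    }
)

# Collect the dtype name and the number of missing values per column.
dtype_names = {col: str(dtype) for col, dtype in X.dtypes.items()}
null_counts = X.isna().sum().to_dict()

# With evalml installed, the call shape from the diff in this PR would be:
#   from evalml.utils import infer_feature_types
#   X = infer_feature_types(X)  # no ignore_nullable_types argument anymore
print(dtype_names)
print(null_counts)
```

Per the description above, call sites that previously passed ignore_nullable_types=True simply drop the argument.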

@codecov

codecov bot commented Dec 15, 2021

Codecov Report

Merging #3156 (a78dba4) into main (3ad0d3c) will decrease coverage by 0.8%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3156     +/-   ##
=======================================
- Coverage   99.7%   99.0%   -0.7%     
=======================================
  Files        326     326             
  Lines      31432   31444     +12     
=======================================
- Hits       31328   31107    -221     
- Misses       104     337    +233     
Impacted Files Coverage Δ
evalml/data_checks/invalid_target_data_check.py 100.0% <ø> (ø)
evalml/tests/utils_tests/test_woodwork_utils.py 100.0% <ø> (ø)
...valml/automl/automl_algorithm/default_algorithm.py 100.0% <100.0%> (ø)
...lml/automl/automl_algorithm/iterative_algorithm.py 100.0% <100.0%> (ø)
evalml/automl/automl_search.py 99.7% <100.0%> (-0.1%) ⬇️
evalml/data_checks/highly_null_data_check.py 100.0% <100.0%> (ø)
evalml/data_checks/no_variance_data_check.py 100.0% <100.0%> (ø)
evalml/data_checks/outliers_data_check.py 100.0% <100.0%> (ø)
...ansformers/preprocessing/replace_nullable_types.py 100.0% <100.0%> (ø)
evalml/pipelines/utils.py 99.7% <100.0%> (ø)
... and 15 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3ad0d3c...a78dba4.

@@ -105,7 +105,7 @@ def validate(self, X, y=None):
"""
results = {"warnings": [], "errors": [], "actions": []}

- X = infer_feature_types(X, ignore_nullable_types=True)
+ X = infer_feature_types(X)
Contributor

Does this mean that #3061 is also done?

Contributor

@bchen1116 bchen1116 left a comment

Agreed with Freddy! I think this should be done in conjunction with #3061 so that we can ensure this change doesn't break anything for data checks

@chukarsten
Collaborator Author

@bchen1116 @freddyaboulton I've added additional testing for the DefaultDataChecks suite as well as some of the data checks that explicitly deal with nulls to check that they jibe with the new nullable types. Let me know what you think!

Contributor

@freddyaboulton freddyaboulton left a comment

@chukarsten This looks good to me! Thanks for adding the coverage for the data checks + nullable types.

@@ -10,6 +10,8 @@
ww.logical_types.Integer.type_string,
ww.logical_types.Double.type_string,
ww.logical_types.Boolean.type_string,
+ ww.logical_types.IntegerNullable.type_string,
Contributor

We should probably add Age/AgeNullable here? I wonder if we can get rid of this list though.
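
As a rough illustration of the suggestion above, the sketch below checks which proposed type strings are missing from such a list. The list name, the helper, and the literal strings are hypothetical stand-ins: the strings are assumed to match the type_string values of the corresponding Woodwork logical types, and the real list in evalml is built from those attributes rather than from literals.

```python
# Hypothetical stand-in for the list in the hunk above. The strings are assumed
# to match the `type_string` values of the corresponding woodwork logical types.
supported_type_strings = [
    "integer",
    "double",
    "boolean",
    "integer_nullable",
]


def missing_type_strings(supported, candidates):
    """Return the candidates not yet present in the supported list, in order."""
    return [name for name in candidates if name not in supported]


# Age/AgeNullable are the types the reviewer suggests adding; "double" is
# included to show that already-supported entries are skipped.
proposed = ["age", "age_nullable", "double"]
to_add = missing_type_strings(supported_type_strings, proposed)
print(to_add)
```

A hand-maintained list like this has to be updated whenever a new logical type should be treated as numeric, which is presumably why the reviewer wonders whether it can be removed.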

@@ -216,7 +218,13 @@ def test_default_data_checks_classification(input_type):

y = pd.Series([0, 1, np.nan, 1, 0])
y_multiclass = pd.Series([0, 1, np.nan, 2, 0])
- X.ww.init(logical_types={"natural_language_nan": "NaturalLanguage"})
+ X.ww.init(
+ logical_types={
Contributor

Nice, I think covering the default data checks here gives us coverage for a lot of the individual data checks.

@chukarsten chukarsten merged commit 37f2734 into main Jan 6, 2022
@chukarsten chukarsten deleted the nt_remove_nt_check branch January 6, 2022 06:11
@chukarsten chukarsten mentioned this pull request Jan 6, 2022