
Statically set woodwork typing in tests #3697

Merged
merged 25 commits into from
Sep 16, 2022

Conversation


@eccabay eccabay commented Sep 7, 2022

Closes #3651

This is not a woodwork 0.17.2 upgrade. The goal was simply to add explicit woodwork typing throughout the tests to make them more resilient to woodwork inference changes. As a side effect, fewer tests fail under 0.17.2, since I used the upgrade as a check for general stability.
This does include setting some woodwork types in components as well, related to my previous work on being explicit about woodwork typing to reduce how often we re-infer within components (plus it makes tests pass 😁).

Changes in this PR:

  • Consolidated the ts_data and get_ts_X_y pytest fixtures into a single fixture, named ts_data but using the data from get_ts_X_y.
  • Updated the fixtures ts_data, imputer_test_data, X_y_binary, X_y_multi, X_y_regression, X_y_categorical_classification, and X_y_categorical_regression to explicitly set woodwork types.
  • Updated the time series featurizer and data check test files to explicitly set woodwork types.
  • downcast_nullable_types now accepts either a DataFrame or a Series as input (instead of just DataFrames), and there's now a test covering this.
  • Removed a call to ww.init() in infer_feature_types when the input data already has a valid schema. This has performance implications; performance test results for this branch are available.
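The downcast_nullable_types change can be pictured with a pandas-only sketch. This is a hypothetical reimplementation for illustration only, not evalml's actual code (which operates on woodwork-initialized data); the helper names here are made up:

```python
import pandas as pd

def _downcast_series(s):
    """Downcast one Series from a pandas nullable extension dtype to a
    plain numpy dtype where that is lossless (illustrative only)."""
    if isinstance(s.dtype, pd.BooleanDtype) and not s.isna().any():
        # Nullable boolean with no missing values -> plain bool.
        return s.astype("bool")
    if isinstance(s.dtype, pd.Int64Dtype):
        # Complete integer columns go to int64; nulls force float64 (NaN).
        return s.astype("int64") if not s.isna().any() else s.astype("float64")
    return s

def downcast_nullable_types(data):
    # Accept either a Series or a DataFrame, as the PR's version now does.
    if isinstance(data, pd.Series):
        return _downcast_series(data)
    return data.apply(_downcast_series)
```

The point of the Series branch is that callers no longer have to wrap a target column in a one-column DataFrame just to downcast it.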

Before vs after woodwork upgrade comparison:
Main/0.16.4 -> 0 failing tests
Main/0.17.2 -> 287 failing tests
Main/0.18.0 -> 291 failing tests

Branch/0.16.4 -> 0 failing tests
Branch/0.17.2 -> 23 failing tests
Branch/0.18.0 -> 22 failing tests

I apologize in advance for the length of this PR 😬

@codecov

codecov bot commented Sep 7, 2022

Codecov Report

Merging #3697 (56eb074) into main (3128e44) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3697     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        339     339             
  Lines      34465   34386     -79     
=======================================
- Hits       34338   34254     -84     
- Misses       127     132      +5     
Impacted Files Coverage Δ
...omponent_tests/test_stacked_ensemble_classifier.py 100.0% <ø> (ø)
...ansformers/preprocessing/time_series_featurizer.py 100.0% <100.0%> (ø)
evalml/pipelines/time_series_pipeline_base.py 100.0% <100.0%> (ø)
...ts/automl_tests/parallel_tests/test_automl_dask.py 96.1% <100.0%> (-0.2%) ⬇️
evalml/tests/automl_tests/test_automl.py 99.5% <100.0%> (-<0.1%) ⬇️
...ts/automl_tests/test_automl_iterative_algorithm.py 100.0% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py 96.4% <100.0%> (-<0.1%) ⬇️
...ests/automl_tests/test_automl_search_regression.py 95.0% <100.0%> (-<0.1%) ⬇️
...valml/tests/automl_tests/test_default_algorithm.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py 100.0% <100.0%> (ø)
... and 44 more


@eccabay eccabay marked this pull request as ready for review September 7, 2022 17:19
@chukarsten chukarsten marked this pull request as draft September 13, 2022 17:30
@chukarsten
Contributor

Moving back to draft

Comment on lines -261 to +263
logical_types={col: "Double" for col in cols_derived_from_categoricals},
logical_types={col: "Double" for col in lagged_features.columns},
eccabay (Contributor, Author):
I'm not confident in the correctness of this change: specifically, how we handle all types, and whether everything does in fact become a Double here. If anyone knows better, please let me know.
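For context on that cast: once lags of a categorical column have been label-encoded, typing every lagged column as Double succeeds; the open question above is whether every dtype reaching this point is numeric. A pandas-only illustration (the lagged_features frame and column names here are made up, and the real code sets woodwork logical_types rather than raw numpy dtypes):

```python
import pandas as pd

# Hypothetical lagged-feature output: one numeric lag, one categorical lag.
lagged_features = pd.DataFrame({
    "price_delay_1": [10, 12, 11],
    "store_delay_1": pd.Categorical(["a", "b", "a"]),
})

# Lags of categoricals are ordinally encoded before the cast, so every
# column can then be typed as "Double" (float64 in raw pandas terms).
encoded = lagged_features.copy()
encoded["store_delay_1"] = encoded["store_delay_1"].cat.codes
as_doubles = encoded.astype({col: "float64" for col in encoded.columns})
```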

@eccabay eccabay marked this pull request as ready for review September 13, 2022 19:38
@chukarsten chukarsten (Contributor) left a comment
Nice job Becca. Really huge amount of work and very impressive that you lowered the sensitivity of the codebase to inference by so much!!

Comment on lines -4549 to -4565
if problem_type == ProblemTypes.TIME_SERIES_MULTICLASS:
X, y = ts_data_multi
elif problem_type == ProblemTypes.TIME_SERIES_BINARY:
X, y = ts_data_binary
else:
X, y = ts_data
Nice refactoring 👍

@@ -568,7 +566,7 @@ def test_simple_imputer_ignores_natural_language(
ans = X_df.mode().iloc[0, :]
ans["natural language col"] = pd.NA
X_df.iloc[-1, :] = ans
assert_frame_equal(result, X_df)
assert_frame_equal(result, X_df, check_dtype=False)
Classic
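The check_dtype=False relaxation in the diff above can be demonstrated in isolation: frames with equal values but different dtypes fail the strict comparison and pass the relaxed one.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": [1, 2, 3]})         # int64
right = pd.DataFrame({"a": [1.0, 2.0, 3.0]})  # float64

try:
    assert_frame_equal(left, right)  # strict: dtypes must match
    strict_passed = True
except AssertionError:
    strict_passed = False

# Relaxed: values compare equal even though int64 != float64.
assert_frame_equal(left, right, check_dtype=False)
```

This is the standard escape hatch when an imputer or encoder legitimately changes a column's dtype but not its values.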

@jeremyliweishih jeremyliweishih (Collaborator) left a comment
Great work on making the tests less reliant on inference. Just had a comment about NaN behavior and downcasting for series. Let me know what you think!

@@ -297,7 +297,22 @@ def test_schema_is_equal_fraud(fraud_100):
assert _schema_is_equal(X.ww.schema, X2.ww.schema)


def test_test_downcast_nullable_types_can_handle_no_schema():
def test_downcast_nullable_types_series():
do you want to add the test case with NaNs as well?


X_bool_nullable_cols = X.ww.select("BooleanNullable")
X_int_nullable_cols = X.ww.select(["IntegerNullable", "AgeNullable"])
if isinstance(data, pd.Series):
If the series contains null values, should we still downcast into Boolean or Double? I think we should follow what we have and use ignore_null_cols to differentiate.
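The concern is easy to reproduce in plain pandas: a nullable column that actually contains nulls has no lossless plain bool or int target, so such columns need different handling (e.g. being skipped, as the ignore_null_cols suggestion implies). A small illustration:

```python
import pandas as pd

with_nulls = pd.Series([True, None, False], dtype="boolean")

# Plain numpy bool cannot represent missing values, so this cast raises.
try:
    with_nulls.astype("bool")
    bool_cast_ok = True
except (ValueError, TypeError):
    bool_cast_ok = False

# Nullable integers with nulls can only fall back to float64 (null -> NaN).
as_float = pd.Series([1, None, 3], dtype="Int64").astype("float64")
```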

@eccabay eccabay merged commit fc982d7 into main Sep 16, 2022
@eccabay eccabay deleted the 3651_ww_hardening branch September 16, 2022 18:53
@chukarsten chukarsten mentioned this pull request Sep 20, 2022
Development

Successfully merging this pull request may close these issues.

Audit the Unit Test Suite for Woodwork Inference Sensitivity
4 participants