Statically set woodwork typing in tests #3697
…licitly set types
Codecov Report
@@           Coverage Diff           @@
##            main   #3697    +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%
=======================================
  Files        339     339
  Lines      34465   34386     -79
=======================================
- Hits       34338   34254     -84
- Misses       127     132      +5
Moving back to draft
- logical_types={col: "Double" for col in cols_derived_from_categoricals},
+ logical_types={col: "Double" for col in lagged_features.columns},
I'm not confident this change is correct: I'm unsure how we handle all types here and whether everything does in fact become a Double. If anyone knows better, please let me know.
chukarsten left a comment
Nice job Becca. Really huge amount of work and very impressive that you lowered the sensitivity of the codebase to inference by so much!!
if problem_type == ProblemTypes.TIME_SERIES_MULTICLASS:
    X, y = ts_data_multi
elif problem_type == ProblemTypes.TIME_SERIES_BINARY:
    X, y = ts_data_binary
else:
    X, y = ts_data
  ans["natural language col"] = pd.NA
  X_df.iloc[-1, :] = ans
- assert_frame_equal(result, X_df)
+ assert_frame_equal(result, X_df, check_dtype=False)
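For readers skimming the hunk above: `check_dtype=False` makes `assert_frame_equal` compare values while ignoring column dtypes. A minimal sketch with made-up frames (the column name and values are illustrative, not the data from this test):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Same values, different dtypes: int64 vs float64.
expected = pd.DataFrame({"a": [1, 2, 3]})
result = pd.DataFrame({"a": [1.0, 2.0, 3.0]})

# The strict comparison rejects the dtype mismatch.
try:
    assert_frame_equal(result, expected)
    strict_passes = True
except AssertionError:
    strict_passes = False

# The relaxed comparison only looks at the values, so this does not raise.
assert_frame_equal(result, expected, check_dtype=False)
```

The trade-off is that a relaxed assertion no longer catches an unintended dtype change, so it is probably best kept to cases where the dtype is expected to vary.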
jeremyliweishih left a comment
Great work on making the tests less reliant on inference. Just had a comment about NaN behavior and downcasting for Series. Lmk what you think!
- def test_test_downcast_nullable_types_can_handle_no_schema():
+ def test_downcast_nullable_types_series():
do you want to add the test case with NaNs as well?
X_bool_nullable_cols = X.ww.select("BooleanNullable")
X_int_nullable_cols = X.ww.select(["IntegerNullable", "AgeNullable"])
if isinstance(data, pd.Series):
If the series contains null values, should we still downcast into Boolean or Double? I think we should follow what we have and use ignore_null_cols to differentiate.
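To make the question concrete, here is a rough pandas-only sketch of the behavior being suggested. It is not the implementation in this PR: `downcast_nullable_sketch` and `_downcast_column` are hypothetical names, and it works on pandas nullable dtypes (`boolean`, `Int64`) rather than woodwork logical types.

```python
import pandas as pd


def _downcast_column(col: pd.Series, ignore_null_cols: bool) -> pd.Series:
    """Downcast one nullable column, optionally leaving null-containing ones alone."""
    has_nulls = bool(col.isna().any())
    if has_nulls and ignore_null_cols:
        return col  # leave columns with nulls untouched
    if isinstance(col.dtype, pd.BooleanDtype) and not has_nulls:
        return col.astype("bool")
    if isinstance(col.dtype, pd.Int64Dtype):
        return col.astype("float64")  # pd.NA becomes NaN
    # Anything else (including nullable booleans with nulls) is returned as-is.
    return col


def downcast_nullable_sketch(data, ignore_null_cols=True):
    """Accept either a Series or a DataFrame (assumes unique column names)."""
    if isinstance(data, pd.Series):
        return _downcast_column(data, ignore_null_cols)
    return pd.DataFrame(
        {name: _downcast_column(data[name], ignore_null_cols) for name in data.columns},
        index=data.index,
    )


series_with_null = pd.Series([1, pd.NA], dtype="Int64")
kept = downcast_nullable_sketch(series_with_null)
forced = downcast_nullable_sketch(series_with_null, ignore_null_cols=False)
```

With the default `ignore_null_cols=True`, `kept` stays `Int64`; with `ignore_null_cols=False`, `forced` becomes `float64` with a NaN, which is the distinction raised in the comment above.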
Closes #3651
This is not a woodwork 0.17.2 upgrade. The goal was simply to add explicit woodwork typing throughout the tests to make them more resilient to woodwork inference changes. As a side effect, fewer tests fail under 0.17.2, since I used the upgrade as a check for general stability.
This does include some setting of woodwork types in components though, related to my previous work about being explicit about woodwork typing to reduce the number of times we re-infer within components (plus it makes tests pass 😁).
Changes in this PR:
- Combines the `ts_data` and `get_ts_X_y` pytest fixtures into one fixture, named `ts_data` but with data from `get_ts_X_y`.
- Updates `ts_data`, `imputer_test_data`, `X_y_binary`, `X_y_multi`, `X_y_regression`, `X_y_categorical_classification`, and `X_y_categorical_regression` to explicitly set woodwork types.
- `downcast_nullable_types` now works with either a DataFrame or a Series as input data (instead of just DataFrames), and there's now a test for it.
- … `ww.init()` in `infer_feature_types` in the case where we already have a valid schema for the input data. This has performance implications, and there are performance test results for this branch.

Before vs after woodwork upgrade comparison:
Main/0.16.4 -> 0 failing tests
Main/0.17.2 -> 287 failing tests
Main/0.18.0 -> 291 failing tests
Branch/0.16.4 -> 0 failing tests
Branch/0.17.2 -> 23 failing tests
Branch/0.18.0 -> 22 failing tests
I apologize in advance for the length of this PR 😬