Conversation
Codecov Report

```
@@           Coverage Diff           @@
##            main   #3626     +/-  ##
=======================================
- Coverage   99.7%   99.7%    -0.0%
=======================================
  Files        335     335
  Lines      33750   33839      +89
=======================================
+ Hits       33627   33714      +87
- Misses       123     125       +2
```
ParthivNaresh left a comment
Awesome work man, this was truly next-level finessing. I think some outstanding questions/issues are:

- Why is the `ClassImbalanceDataCheck` potentially casting `2.0` to `2`?
- Issue has been filed to keep pseudo floats like `2.0, 12.0, -4.0`, etc. as `Double` instead of `Integer` or `IntegerNullable`.
- Issue has been filed for support of multiple column assignments.
- Issue has been filed for no exception when casting to a `Boolean`.
- Issue has been filed to match `ww.init()` behaviour across DataFrames and Series.
```python
X_numeric = X.ww[self._numeric_cols.tolist()]
imputed = self._numeric_imputer.transform(X_numeric)
X_no_all_null[X_numeric.columns] = imputed
for numeric_col in X_numeric.columns:
```
```python
).interpolate()  # Cast to float because Int64 not handled
imputed.bfill(inplace=True)  # Fill in the first value, if missing
X_not_all_null[X_interpolate.columns] = imputed
X_not_all_null.ww.init(schema=X_schema)
```
Reinitializes the dataframe with the original schema, excluding `IntegerNullable` and `BooleanNullable` types, so that those columns can be re-inferred post-imputation.
```python
y_imputed.bfill(inplace=True)
y_imputed.ww.init(schema=y.ww.schema)
```

```python
X_not_all_null.ww.init(schema=X_schema)
```
Covered as part of `test_numeric_only_input` and `test_imputer_bool_dtype_object`.
```python
cleaned_y = cleaned_y["target"]
cleaned_y = ww.init_series(cleaned_y)
```

```python
cleaned_x.ww.init()
```
Introduction of nulls makes initialization necessary here.
```python
return class_dic
```

```python
def downcast_int_nullable_to_double(X):
```
A helper function for components that either don't accept an `IntegerArray` or can't cast values from a float to an int.
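A rough, pandas-only sketch of what such a helper could look like (illustrative only, not the actual evalml implementation):

```python
import pandas as pd


def downcast_int_nullable_to_double(X):
    """Cast nullable-integer (Int64/Int32/...) columns to plain float64.

    Some components can't consume a pandas IntegerArray, and others fail
    when casting floats back to ints, so hand them ordinary doubles.
    """
    X = X.copy()
    for col in X.columns:
        dtype = X[col].dtype
        if pd.api.types.is_extension_array_dtype(dtype) and pd.api.types.is_integer_dtype(dtype):
            # pd.NA values become NaN under float64
            X[col] = X[col].astype("float64")
    return X


X = pd.DataFrame({"a": pd.array([1, None], dtype="Int64"), "b": [1, 2]})
out = downcast_int_nullable_to_double(X)
```

Plain `int64` columns are untouched; only the nullable extension integers are downcast.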
```diff
 {
     "int col": [0, 1, 2, 0, 3] * 4,
-    "float col": [0.0, 1.0, 0.0, -2.0, 5.0] * 4,
+    "float col": [0.1, 1.0, 0.0, -2.0, 5.0] * 4,
```

```diff
     "int col": [0, 1, 2, 0, 3] * 4,
     "object col": ["b", "b", "a", "c", "d"] * 4,
-    "float col": [0.0, 1.0, 0.0, -2.0, 5.0] * 4,
+    "float col": [0.1, 1.0, 0.0, -2.0, 5.0] * 4,
```
eccabay left a comment
LGTM! Just left some small efficiency and clarification questions.
```diff
 preds = baseline.predict(X_validation, None, X_train, y_train)
-pd.testing.assert_series_equal(expected_predictions, preds)
+pd.testing.assert_series_equal(expected_predictions, preds, check_dtype=False)
```
This worries me slightly - are there any scenarios where this would cause us issues down the road?
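For what it's worth, `check_dtype=False` only relaxes the dtype check; the values are still compared exactly as before. A toy illustration (the variable names mirror the test but the data is made up):

```python
import pandas as pd

expected_predictions = pd.Series([1.0, 0.0, 1.0])
preds = pd.Series([1, 0, 1])  # suppose these came back as int64

# The strict comparison fails on dtype alone...
try:
    pd.testing.assert_series_equal(expected_predictions, preds)
    strict_passed = True
except AssertionError:
    strict_passed = False

# ...while the relaxed one compares only the values
pd.testing.assert_series_equal(expected_predictions, preds, check_dtype=False)
```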
evalml/pipelines/components/transformers/samplers/base_sampler.py
```python
delayed_features = self._compute_delays(X_ww, y)
rolling_means = self._compute_rolling_transforms(X_ww, y, original_features)
features = ww.concat_columns([delayed_features, rolling_means])
features.ww.init()
```
Can we reuse any part of the initial schema, or use what we know about the dtypes of these features, to reduce the amount of type reinference this might introduce?
```python
self._fit(X, y)
self._classes_ = list(ww.init_series(np.unique(y)))
```

```python
# TODO: Added this in because numpy's unique() does not support pandas.NA
```
If there's a workaround for this error, why do we start off by attempting to use numpy? Are there downsides to just using `y.unique()` in all cases instead?
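For context, a small demonstration of the failure mode (assumes a reasonably recent pandas/numpy):

```python
import numpy as np
import pandas as pd

y = pd.Series([1, 2, 2, pd.NA], dtype="Int64")

# pandas' own unique() handles pd.NA without complaint
vals = y.unique()  # IntegerArray: [1, 2, <NA>]

# numpy's unique() sorts the values, and ordering comparisons
# against pd.NA are ambiguous, so it raises
try:
    np.unique(y)
    raised = False
except TypeError:
    raised = True
```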
jeremyliweishih left a comment
@chukarsten this looks good to me! Apologies for all the "did we file an issue" comments. Just wanted to make sure we're keeping track of removing these fixes once the appropriate WW changes are in. 😄
```python
self._fit(X, y)
self._classes_ = list(ww.init_series(np.unique(y)))
```

```python
# TODO: Added this in because numpy's unique() does not support pandas.NA
```
do we have an issue filed to resolve this?
```python
if self._interpolate_cols is not None:
    X_interpolate = X.ww[self._interpolate_cols]
    imputed = X_interpolate.interpolate()
    # TODO: Revert when pandas introduces Float64 dtype
```
do we have an issue filed to track this?
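The workaround the TODO refers to presumably looks something like casting to plain `float64` before interpolating (a sketch, not the actual evalml code):

```python
import pandas as pd

s = pd.Series([1, None, 3], dtype="Int64")

# The nullable Int64 dtype historically isn't handled by interpolate(),
# so cast to float64 first; revisit once pandas' Float64 covers this
out = s.astype("float64").interpolate()
```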
```diff
 preds = baseline.predict(X_validation, None, X_train, y_train)
-pd.testing.assert_series_equal(expected_predictions, preds)
+pd.testing.assert_series_equal(expected_predictions, preds, check_dtype=False)
```
A little confused here - is `preds` coming out as integers here, and why if so?
```python
        "target_delay_11": y_answer.shift(11),
    },
)
answer_only_y.ww.init()
```
Was the new `ww.init` call in `TimeSeriesFeaturizer` in response to this? If not, should we file an issue to cover this?
All the work required to get EvalML compatible with Woodwork 0.17.2.