-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise ValueError when predict/predict_proba input types don't match fit input #3036
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3036 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 312 312
Lines 30340 30421 +81
=======================================
+ Hits 30249 30330 +81
Misses 91 91
Continue to review full report at Codecov.
|
@@ -642,7 +643,8 @@ def test_score_nonlinear_regression( | |||
|
|||
@patch("evalml.pipelines.BinaryClassificationPipeline.fit") | |||
@patch("evalml.pipelines.components.Estimator.predict") | |||
def test_score_binary_single(mock_predict, mock_fit, X_y_binary): | |||
@patch("evalml.pipelines.component_graph._schema_is_equal", return_value=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add this when we mock pipeline fit in order to not raise the valueError
if first.types.index.tolist() != other.types.index.tolist(): | ||
return False | ||
logical = [ | ||
x if x != "Integer" else "Double" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion with @freddyaboulton and @angela97lin, we decided to treat Integer and Doubles as the same logical types until @chukarsten's work with NullableInteger
goes in. At that point, we can revisit this and see what the best way to accommodate that would be.
@@ -378,6 +384,14 @@ def _transform_features( | |||
dict: Outputs from each component. | |||
""" | |||
X = infer_feature_types(X) | |||
if not fit: | |||
if not _schema_is_equal(X.ww.schema, self._input_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bchen1116 I think we can get rid of schema_is_equal
and let woodwork do the comparison for us if we use exclude
. Something like this
if not fit:
if X.ww.select(exclude=['integer'], return_schema=True) != self._input_types:
raise ValueError(
"Input X data types are different from the input types the pipeline was fitted on."
)
else:
self._input_types = X.ww.select(exclude=['integer'], return_schema=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton I had to implement this because the schema for logical types included the logical type objects (ie Categorical(), Integer()), and the equality would fail here when the objects are different instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wild, thanks for explaining! Can we add a unit test for this scenario? The unit tests on this branch pass with using the schema equality method above. I think we should also file a woodwork issue? I feel like schema equality should work in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can add a test for this case! I brought it up to the woodwork team, but seems like this is something they want to keep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this yesterday and I think it's cause the DateTime format is None in X2 in the repro you shared.
I will try to come up with a minimal repro and share with ww team. This ends up working
from evalml.demos import load_fraud
X, y = load_fraud(1000)
X2 = X.ww.copy()
X.ww.schema == X2.ww.schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @bchen1116 ! I am excited to get this out the door, hopefully it'll help debug some tricky problems that can happen between fit and predict.
fix #2855
Running
now results in:
![image](https://user-images.githubusercontent.com/22552445/141530910-00129457-6a87-4fcf-b282-8a9b9f63f7a6.png)