
Raise ValueError when predict/predict_proba input types don't match fit input #3036

Merged
merged 27 commits into from
Nov 17, 2021

Conversation

bchen1116
Contributor

@bchen1116 bchen1116 commented Nov 11, 2021

fix #2855

Running

import pandas as pd
from evalml import AutoMLSearch
import logging
logging.basicConfig(level=logging.ERROR)

path = "/Users/bryan.chen/Downloads/string_nan_ex/1625078186889-mushroom_subset.csv"

df = pd.read_csv(path)
y_train = df['class']
X_train = df.drop('class', axis=1)

aml = AutoMLSearch(X_train, y_train, 'binary', verbose=False)
aml.search()


pipeline = aml.best_pipeline
holdout = pd.read_csv("/Users/bryan.chen/Downloads/string_nan_ex/mushroom_holdout.csv")
pipeline.predict(holdout)

now results in:

[screenshot: the new ValueError is raised, reporting that the input X data types differ from the types the pipeline was fitted on]
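The behavior this PR adds can be sketched in plain Python. `PipelineSketch` below is hypothetical and uses a `{column: type-name}` dict to stand in for a woodwork schema; the real check lives in `ComponentGraph._transform_features`:

```python
# Minimal sketch (hypothetical class, not evalml's API): record the
# input types at fit time and compare them on predict.
class PipelineSketch:
    def fit(self, X_types):
        # Remember the input types seen at fit time.
        self._input_types = dict(X_types)
        return self

    def predict(self, X_types):
        # Raise if predict-time types differ from fit-time types.
        if dict(X_types) != self._input_types:
            raise ValueError(
                "Input X data types are different from the input types "
                "the pipeline was fitted on."
            )
        return "predictions"

pipeline = PipelineSketch().fit({"cap-shape": "Categorical"})
print(pipeline.predict({"cap-shape": "Categorical"}))  # predictions
try:
    pipeline.predict({"cap-shape": "Double"})
except ValueError as err:
    print(err)
```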

@bchen1116 bchen1116 self-assigned this Nov 11, 2021
@codecov

codecov bot commented Nov 11, 2021

Codecov Report

Merging #3036 (76f733b) into main (d6682a4) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3036     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        312     312             
  Lines      30340   30421     +81     
=======================================
+ Hits       30249   30330     +81     
  Misses        91      91             
Impacted Files                                         Coverage Δ
evalml/utils/__init__.py                               100.0% <ø> (ø)
evalml/pipelines/component_graph.py                    99.8% <100.0%> (+0.1%) ⬆️
...valml/tests/pipeline_tests/test_component_graph.py  99.9% <100.0%> (+0.1%) ⬆️
evalml/tests/pipeline_tests/test_pipelines.py          99.8% <100.0%> (+0.1%) ⬆️
evalml/tests/utils_tests/test_woodwork_utils.py        100.0% <100.0%> (ø)
evalml/utils/woodwork_utils.py                         100.0% <100.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d6682a4...76f733b.

@@ -642,7 +643,8 @@ def test_score_nonlinear_regression(

@patch("evalml.pipelines.BinaryClassificationPipeline.fit")
@patch("evalml.pipelines.components.Estimator.predict")
@patch("evalml.pipelines.component_graph._schema_is_equal", return_value=True)
def test_score_binary_single(mock_predict, mock_fit, X_y_binary):
Contributor Author
Need to add this patch when we mock pipeline fit so that the ValueError isn't raised.
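A stdlib-only sketch of why the extra patch is needed: when `fit` is mocked, the fitted input types are never recorded, so the schema comparison has to be patched to pass. `demo_cg` below is a stand-in module, not evalml:

```python
# Stand-in module for the schema check (hypothetical; mirrors the idea
# of evalml.pipelines.component_graph._schema_is_equal).
import sys
import types
from unittest import mock

demo_cg = types.ModuleType("demo_cg")
demo_cg._schema_is_equal = lambda first, other: first == other
sys.modules["demo_cg"] = demo_cg

def transform(X_schema, fitted_schema):
    # Mirrors the new guard in _transform_features.
    if not demo_cg._schema_is_equal(X_schema, fitted_schema):
        raise ValueError("Input X data types are different from fit.")
    return "ok"

# With fit mocked out, fitted_schema stays None and the guard would
# raise; patching the comparison to return True lets the test proceed.
with mock.patch("demo_cg._schema_is_equal", return_value=True):
    print(transform({"a": "Double"}, None))  # ok
```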

if first.types.index.tolist() != other.types.index.tolist():
    return False
logical = [
    x if x != "Integer" else "Double"
Contributor Author

After discussion with @freddyaboulton and @angela97lin, we decided to treat Integer and Double as the same logical type until @chukarsten's work on NullableInteger goes in. At that point, we can revisit this and see how best to accommodate it.
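The coercion can be sketched standalone. The helper below is hypothetical and works on `{column: logical-type-name}` dicts rather than actual woodwork schemas:

```python
def schema_is_equal_sketch(first, other):
    """Compare two {column: logical type name} mappings, treating
    Integer and Double as the same logical type."""
    def normalize(schema):
        # Coerce Integer to Double on both sides before comparing.
        return {
            col: "Double" if ltype == "Integer" else ltype
            for col, ltype in schema.items()
        }
    return normalize(first) == normalize(other)

print(schema_is_equal_sketch({"a": "Integer"}, {"a": "Double"}))   # True
print(schema_is_equal_sketch({"a": "Integer"}, {"a": "Boolean"}))  # False
```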

@@ -378,6 +384,14 @@ def _transform_features(
        dict: Outputs from each component.
        """
        X = infer_feature_types(X)
        if not fit:
            if not _schema_is_equal(X.ww.schema, self._input_types):
Contributor

@bchen1116 I think we can get rid of `_schema_is_equal` and let woodwork do the comparison for us if we use `exclude`. Something like this:

        if not fit:
            if X.ww.select(exclude=['integer'], return_schema=True) != self._input_types:
                raise ValueError(
                    "Input X data types are different from the input types the pipeline was fitted on."
                )
        else:
            self._input_types = X.ww.select(exclude=['integer'], return_schema=True)

Contributor Author

@bchen1116 commented Nov 15, 2021

@freddyaboulton I had to implement this because the schema includes logical type objects (i.e. Categorical(), Integer()), and the equality check fails here when those objects are different instances.
[screenshot: schema equality returning False for otherwise matching schemas]
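The pitfall can be illustrated in plain Python (not woodwork itself; whether a class defines `__eq__` determines whether equivalent instances compare equal):

```python
# A class without __eq__ falls back to identity comparison, so two
# equivalent instances are not equal.
class Categorical:
    pass

print(Categorical() == Categorical())  # False: different instances

# Defining __eq__ (here: equal if same type) restores the comparison
# that schema equality would need.
class CategoricalEq:
    def __eq__(self, other):
        return type(self) is type(other)

print(CategoricalEq() == CategoricalEq())  # True
```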

Contributor

Wild, thanks for explaining! Can we add a unit test for this scenario? The unit tests on this branch pass with the schema equality method above. I think we should also file a woodwork issue; I feel like schema equality should work in this case.

Contributor Author

Can add a test for this case! I brought it up to the woodwork team, but it seems like this is something they want to keep.

Contributor

I looked into this yesterday and I think it's because the DateTime format is None in X2 in the repro you shared.

I will try to come up with a minimal repro and share it with the ww team. This ends up working:

from evalml.demos import load_fraud

X, y = load_fraud(1000)

X2 = X.ww.copy()

X.ww.schema == X2.ww.schema

Contributor

@freddyaboulton left a comment

Thank you @bchen1116 ! I am excited to get this out the door, hopefully it'll help debug some tricky problems that can happen between fit and predict.

evalml/utils/woodwork_utils.py (outdated review thread, resolved)
@bchen1116 bchen1116 merged commit 401457c into main Nov 17, 2021
@chukarsten chukarsten mentioned this pull request Nov 29, 2021
@freddyaboulton freddyaboulton deleted the bc_2855_types branch May 13, 2022 15:03