Better error msg for time series predict + tests #3579

Merged: 16 commits from 3443-Error-TimeSeries-Predict-y into main on Jun 23, 2022

Conversation

MichaelFu512 (Contributor)

Pull Request Description

Closes #3443


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

codecov bot commented Jun 21, 2022

Codecov Report

Merging #3579 (14a3ab9) into main (204c849) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3579     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        335     335             
  Lines      33350   33375     +25     
=======================================
+ Hits       33221   33246     +25     
  Misses       129     129             
Impacted Files Coverage Δ
evalml/pipelines/time_series_pipeline_base.py 100.0% <100.0%> (ø)
.../tests/pipeline_tests/test_time_series_pipeline.py 99.9% <100.0%> (+0.1%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@eccabay (Contributor) left a comment:

This is looking great! I just left a few suggestions on making this code as robust as it can be.

@@ -6,6 +6,7 @@ Release Notes
* Updated the Imputer and SimpleImputer to work with scikit-learn 1.1.1. :pr:`3525`
* Bumped the minimum versions of scikit-learn to 1.1.1 and imbalanced-learn to 0.9.1. :pr:`3525`
* Added a clearer error message when ``describe`` is called on an un-instantiated ComponentGraph :pr:`3569`
* Added a clear error message when ``predict`` is called without a y_train parameter :pr:`3579`
Contributor:

This should mention that the change is only for time series problems; predict should run just fine without y_train for other problem types!

Contributor:

Also, this should mention the error message for X_train as well, since we now have a clearer error message in that case too!

Contributor:

@eccabay your devotion to clear and thorough release notes is appreciated


        Returns:
            Predictions.
        """
        X_train, y_train = self._convert_to_woodwork(X_train, y_train)
        try:
            print("HELLO")
Contributor:

We don't need this line!

Contributor Author:

oh shoot sorry about that!

Comment on lines 193 to 195
            X_train, y_train = self._convert_to_woodwork(X_train, y_train)
        except AttributeError:
            raise ValueError(
Contributor:

While this does work to catch the correct bug, there are definitely easier/more Pythonic ways to do this. I would also recommend doing separate checks for missing X_train and y_train, so the user has more information about their specific error!
The easiest way to do these checks would be to check whether X_train and/or y_train are None.
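A minimal sketch of what those separate checks could look like (placement and exact messages here are illustrative, not the final implementation):

def predict(self, X, objective=None, X_train=None, y_train=None):
    """Sketch only; see the real method for the full docstring."""
    # Separate checks let the message name the specific missing argument.
    if X_train is None:
        raise ValueError("Make sure to have a value for X_train when calling predict")
    if y_train is None:
        raise ValueError("Make sure to have a value for y_train when calling predict")
    X_train, y_train = self._convert_to_woodwork(X_train, y_train)
    # ... rest of the existing predict logic continues here.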

Contributor:

This is a great callout @eccabay. As you start to get into our larger products, you'll notice that having specific, clear error messages is incredibly beneficial when EvalML consumers are handling these errors.

        ValueError,
        match="Make sure to have a value for both X_train and y_train when calling predict",
    ):
        clf.predict(X_train)
Contributor:

If you're throwing the error when either X_train or y_train is missing, it's a good idea to cover both cases in your test.
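For example, something along these lines (a rough sketch: the fixtures are the ones used in this file, but the pipeline parameters and match strings are assumptions):

import pytest

def test_predict_errors_without_X_train_or_y_train(
    time_series_binary_classification_pipeline_class, ts_data_binary
):
    X, y = ts_data_binary
    clf = time_series_binary_classification_pipeline_class(
        parameters={
            "Logistic Regression Classifier": {"n_jobs": 1},
            # Assumed time series settings; the real test may configure these differently.
            "pipeline": {"gap": 0, "max_delay": 0, "time_index": "date", "forecast_horizon": 1},
        },
    )
    # Missing X_train: only y_train is supplied.
    with pytest.raises(ValueError, match="X_train"):
        clf.predict(X, y_train=y)
    # Missing y_train: only X_train is supplied.
    with pytest.raises(ValueError, match="y_train"):
        clf.predict(X, X_train=X)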

Contributor Author:

It looks like if X_train is None, the convert_to_woodwork() function makes X_train a pd.DataFrame() and infers its type from there. It doesn't do the same for y_train, which is why this initial error occurs.

So probably, instead of checking both X_train and y_train, I just need to check y_train, since if X_train is None the code still runs (though it does make X_train a blank DataFrame).
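Roughly, the behavior being described looks something like this (an approximation for illustration only, not the actual evalml helper):

import pandas as pd

from evalml.utils import infer_feature_types  # evalml's woodwork type-inference utility

def _convert_to_woodwork_approximation(X_train, y_train):
    # A missing X_train is quietly replaced with an empty DataFrame, so the
    # conversion still succeeds (X_train just ends up as a blank frame).
    if X_train is None:
        X_train = pd.DataFrame()
    X_train = infer_feature_types(X_train)
    # y_train gets no such default, so a None here is what ultimately surfaces
    # as the AttributeError described above.
    y_train = infer_feature_types(y_train)
    return X_train, y_train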

Contributor:

That's very fair! For the overall predict function (outside of just convert_to_woodwork), we do still need X_train to not be None. I think it would be better to simply check to make sure neither X_train nor y_train is None instead of depending on convert_to_woodwork() failing. It's more future-proof, too, since someday we may update that sub-function to do the same "convert to empty series" behavior for y_train.


        Returns:
            Predictions.
        """
        X_train, y_train = self._convert_to_woodwork(X_train, y_train)
        try:
            print("HELLO")
Collaborator:

need to remove this print

Contributor:

I am glad to see that someone else does this as well

@@ -183,12 +183,18 @@ def predict(self, X, objective=None, X_train=None, y_train=None):
y_train (pd.Series or None): Training labels.
Collaborator:

@ParthivNaresh if we need X_train and y_train, should we still accept None as input? I don't have the context on why we allowed this before!

Contributor:

Yeah, I'm not sure why X_train and y_train are optional here; they're very much required.

Contributor:

Oh, I think it had to do with the fact that the parent PipelineBase has them as optional, because regular regression and classification problems don't need them.

        try:
            print("HELLO")
            X_train, y_train = self._convert_to_woodwork(X_train, y_train)
        except AttributeError:
Collaborator:

Think this is good, but we may need to reconsider allowing X_train or y_train as optional arguments. We can also consider just validating that X_train and y_train exist. Let's keep this for now and see what @ParthivNaresh has to say!

    X, y = ts_data_binary
    clf = time_series_binary_classification_pipeline_class(
        parameters={
            "Logistic Regression Classifier": {"n_jobs": 1},
Collaborator:

If all the parameters are the same for each of these pipelines, we can define the parameters up top and then reuse them to save some space.
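For instance (names here are just a suggestion):

# Defined once near the top of the test file...
PIPELINE_PARAMETERS = {
    "Logistic Regression Classifier": {"n_jobs": 1},
    "Random Forest Regressor": {"n_jobs": 1},
}

# ...then reused inside each test, e.g.:
# clf = time_series_binary_classification_pipeline_class(parameters=PIPELINE_PARAMETERS)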

@eccabay (Contributor) left a comment:

This is looking better! Since we've determined that both X_train and y_train should be required parameters (but we should probably keep the function signature the same as the other predict functions), we should shift this back to checking for both parameters, but separately this time!


Comment on lines 1084 to 1145
"Logistic Regression Classifier": {"n_jobs": 1},
"Random Forest Regressor": {"n_jobs": 1},
Contributor:

What's the reason for this (and the below) change?

Comment on lines 1063 to 1064
@pytest.fixture
def my_parameters():
Contributor:

Generally, if we're going to go through the effort of creating a pytest fixture, we try to use it in more than one test. I'd either put this at the top of the function instead of a fixture, or update the other tests that have the same parameters in this file to use the same fixture.

Also, my_parameters isn't a very useful name 😂. Could you name it something more problem-specific (and not first person)?
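One possible shape for this (the fixture name is just a suggestion, not from the PR):

import pytest

@pytest.fixture
def time_series_pipeline_parameters():
    # Descriptive, problem-specific name instead of my_parameters; intended to be
    # shared by every test in this file that builds pipelines with these settings.
    return {
        "Logistic Regression Classifier": {"n_jobs": 1},
        "Random Forest Regressor": {"n_jobs": 1},
    }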

@eccabay (Contributor) left a comment:

Awesome, Michael, thanks for making these updates! I just left a final few comments, but other than that this is good to go! 🚢

evalml/pipelines/time_series_pipeline_base.py (outdated comments, resolved)
evalml/tests/pipeline_tests/test_time_series_pipeline.py (outdated comments, resolved)
@jeremyliweishih (Collaborator) left a comment:

Sweet LGTM - just a style comment 😄


        Returns:
            Predictions.
        """
        if X_train is None:
            raise ValueError(
                "Make sure to have a non None value for X_train when calling time series' predict"
Collaborator:

nit: instead of "non None value", maybe just "to include an input for X_train". Double negatives can get a little confusing. Just a style thing, so up to you!

Contributor Author:

For sure!
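For reference, one possible rewording along those lines (illustrative; the final wording is the author's call):

if X_train is None:
    raise ValueError(
        "Make sure to include an input for X_train when calling time series' predict"
    )
if y_train is None:
    raise ValueError(
        "Make sure to include an input for y_train when calling time series' predict"
    )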

@MichaelFu512 MichaelFu512 merged commit 4f5f7a4 into main Jun 23, 2022
@MichaelFu512 MichaelFu512 deleted the 3443-Error-TimeSeries-Predict-y branch June 23, 2022 19:34
Development

Successfully merging this pull request may close these issues.

Provide a useful error when there is no y value passed to time series predict
5 participants