Better error msg for time series predict + tests #3579
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3579 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 335 335
Lines 33350 33375 +25
=======================================
+ Hits 33221 33246 +25
Misses 129 129
Continue to review full report at Codecov.
This is looking great! I just left a few suggestions on making this code as robust as it can be.
docs/source/release_notes.rst
@@ -6,6 +6,7 @@ Release Notes
* Updated the Imputer and SimpleImputer to work with scikit-learn 1.1.1. :pr:`3525`
* Bumped the minimum versions of scikit-learn to 1.1.1 and imbalanced-learn to 0.9.1. :pr:`3525`
* Added a clearer error message when ``describe`` is called on an un-instantiated ComponentGraph :pr:`3569`
* Added a clear error message when ``predict`` is called without a y_train parameter :pr:`3579`
This should mention the change is only for time series problems, predict should run just fine without y_train for other problem types!
Also, this should mention the error message for X_train as well, since we now have a clearer error message in that case too!
@eccabay your devotion to clear and thorough release notes is appreciated
Returns:
    Predictions.
"""
X_train, y_train = self._convert_to_woodwork(X_train, y_train)
try:
    print("HELLO")
We don't need this line!
oh shoot sorry about that!
    X_train, y_train = self._convert_to_woodwork(X_train, y_train)
except AttributeError:
    raise ValueError(
While this does work to catch the correct bug, there are definitely easier/more Pythonic ways to do this. I would also recommend doing separate checks for missing X_train and y_train, so the user has more information about their specific error! The easiest way to do these checks would be to check if X_train and/or y_train are None.
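To make that suggestion concrete, here is a rough sketch of separate None checks (illustrative only, not the PR's actual code; the message wording and surrounding method are assumptions):

def predict(self, X, objective=None, X_train=None, y_train=None):
    # Sketch of the reviewer's suggestion: validate X_train and y_train
    # separately so the user is told exactly which argument is missing.
    if X_train is None:
        raise ValueError(
            "X_train must be provided when calling predict on a time series pipeline"
        )
    if y_train is None:
        raise ValueError(
            "y_train must be provided when calling predict on a time series pipeline"
        )
    X_train, y_train = self._convert_to_woodwork(X_train, y_train)
    ...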
This is a great callout @eccabay. As you start to get into our larger products, you'll notice that having specific, clear error messages is incredibly beneficial when EvalML consumers are handling these errors.
    ValueError,
    match="Make sure to have a value for both X_train and y_train when calling predict",
):
    clf.predict(X_train)
If you're throwing the error if either X_train or y_train is missing, it's a good idea to cover both cases in your test.
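For instance, a test covering both cases could look roughly like this (a sketch only; the fixture names and pipeline parameters are borrowed from the snippets in this thread and may not match the final test exactly):

import pytest

def test_predict_requires_X_train_and_y_train(
    ts_data_binary, time_series_binary_classification_pipeline_class
):
    X, y = ts_data_binary
    clf = time_series_binary_classification_pipeline_class(
        parameters={"Logistic Regression Classifier": {"n_jobs": 1}},
    )
    clf.fit(X, y)

    # Missing y_train should raise a y_train-specific error
    with pytest.raises(ValueError, match="y_train"):
        clf.predict(X, X_train=X)

    # Missing X_train should raise an X_train-specific error
    with pytest.raises(ValueError, match="X_train"):
        clf.predict(X, y_train=y)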
It looks like if X_train is None, the convert_to_woodwork() function makes X_train a pd.DataFrame() and infers its type from there. It doesn't do the same for y_train, which is why this initial error occurs.
So probably, instead of checking both X_train and y_train, I just need to check y_train, since if X_train is null the code still runs (though it does make X_train a blank dataframe).
That's very fair! For the overall predict function (outside of just convert_to_woodwork), we do still need X_train to not be None. I think it would be better to simply check to make sure neither X_train nor y_train is None instead of depending on convert_to_woodwork() failing. It's more future-proof, too, since someday we may update that sub-function to do the same "convert to empty series" for y_train.
Returns:
    Predictions.
"""
X_train, y_train = self._convert_to_woodwork(X_train, y_train)
try:
    print("HELLO")
need to remove this print
I am glad to see that someone else does this as well
@@ -183,12 +183,18 @@ def predict(self, X, objective=None, X_train=None, y_train=None):
    y_train (pd.Series or None): Training labels.
@ParthivNaresh if we need X_train and y_train, should we still accept None as input? I don't have the context on why we allowed this before!
Yeah, I'm not sure why X_train and y_train are optional here, they're very much required.
Oh I think it had to do with the fact that the parent PipelineBase has them as optional, because regular regression and classification problems don't need them.
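To illustrate that layout (a simplified sketch, not EvalML's actual class hierarchy or method bodies), the parent can keep the permissive signature while the time series override enforces the requirement:

class PipelineBase:
    def predict(self, X, objective=None, X_train=None, y_train=None):
        # Regular regression/classification pipelines never use X_train or
        # y_train here, so None is a reasonable default.
        ...

class TimeSeriesPipelineBase(PipelineBase):
    def predict(self, X, objective=None, X_train=None, y_train=None):
        # Keep the parent's signature for compatibility, but time series
        # prediction genuinely needs the training data, so validate it.
        if X_train is None or y_train is None:
            raise ValueError(
                "X_train and y_train are required when calling predict on a time series pipeline"
            )
        ...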
try:
    print("HELLO")
    X_train, y_train = self._convert_to_woodwork(X_train, y_train)
except AttributeError:
Think this is good, but we may need to reconsider allowing X_train or y_train as optional arguments. We can also consider just validating that X_train and y_train exist. Let's keep this for now and see what @ParthivNaresh has to say!
X, y = ts_data_binary
clf = time_series_binary_classification_pipeline_class(
    parameters={
        "Logistic Regression Classifier": {"n_jobs": 1},
if all the parameters are the same for each of these pipelines, we can define the parameters once up top and reuse them to save some space.
This is looking better! Since we've determined that both X_train and y_train should be required parameters (but we should probably keep the function signature the same as other predict functions), we should shift this back to checking for both the parameters, but separately this time!
    "Logistic Regression Classifier": {"n_jobs": 1},
    "Random Forest Regressor": {"n_jobs": 1},
What's the reason for this (and the below) change?
@pytest.fixture
def my_parameters():
Generally, if we're going to go through the effort of creating a pytest fixture, we try to use it in more than one test. I'd either put this at the top of the function instead of a fixture, or update the other tests that have the same parameters in this file to use the same fixture.
Also, my_parameters isn't a very useful name 😂 could you name it something more problem-specific (and not first person!)?
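As a sketch of that cleanup (the fixture name, parameter values, and test names are illustrative assumptions, not the PR's final code), a shared, problem-specific fixture could be defined once and requested by every test that needs it:

import pytest

@pytest.fixture
def ts_pipeline_parameters():
    # One definition of the shared component parameters, with a
    # problem-specific name rather than a first-person one.
    return {
        "Logistic Regression Classifier": {"n_jobs": 1},
        "Random Forest Regressor": {"n_jobs": 1},
    }

def test_predict_errors_without_y_train(ts_pipeline_parameters):
    parameters = ts_pipeline_parameters  # reuse instead of redefining the dict
    ...

def test_predict_errors_without_x_train(ts_pipeline_parameters):
    parameters = ts_pipeline_parameters
    ...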
Awesome, Michael, thanks for making these updates! I just left a final few comments, but other than that this is good to go! 🚢
…alteryx/evalml into 3443-Error-TimeSeries-Predict-y
Sweet LGTM - just a style comment 😄
Returns:
    Predictions.
"""
if X_train is None:
    raise ValueError(
        "Make sure to have a non None value for X_train when calling time series' predict"
nit: instead of "non None value" maybe just "to include input for X_train". Double negatives can get a little confusing. Just a style thing, so up to you!
For sure!
Pull Request Description
Closes #3443
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.