New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate holdout dataset passed to predict/predict_proba for time series #2804
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2804 +/- ##
=======================================
+ Coverage 99.8% 99.8% +0.1%
=======================================
Files 297 297
Lines 27719 27754 +35
=======================================
+ Hits 27651 27686 +35
Misses 68 68
Continue to review full report at Codecov.
|
e9e3611
to
1059466
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I think you just need to fix up a copy-pasta doc string and validate your double validation of the holdout sets and I think you're gucci!
X (pd.DataFrame): Data of shape [n_samples, n_features]. | ||
objective (Object or string): The objective to use to make predictions. | ||
X_train (pd.DataFrame): Training data. | ||
y_train (pd.Series): Training labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring doesn't match the function sig!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
y_holdout = self._create_empty_series(y_train) | ||
X, y_holdout = self._convert_to_woodwork(X, y_holdout) | ||
y_holdout = infer_feature_types(y_holdout) | ||
self._validate_holdout_datasets(X, X_train) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend to validate the holdout datasets twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake lol
}, | ||
) | ||
X, y = ts_data | ||
X_train, y_train = X.iloc[:15], y.iloc[:15] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to just put the 15 into a variable in case we come back to change this for some reason or another down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
elif thing_thats_wrong == "not-separated-by-gap": | ||
X = X.iloc[15 + gap + 2 : 15 + gap + 2 + forecast_horizon] | ||
else: | ||
X = X.iloc[15 + gap + 2 : 15 + gap + 2 + forecast_horizon + 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me again what the right length looks like here? What subsetting of X passes the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. X.iloc[15 + gap: 15 + gap + forecast_horizon]
a17c87a
to
7ade293
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes! Looks solid.
Pull Request Description
Fixes #2732
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst
to include this pull request by adding :pr:123
.