Timeseries regression pipeline #1418
Conversation
Codecov Report
@@            Coverage Diff            @@
##             main    #1418     +/-  ##
=========================================
+ Coverage   100.0%   100.0%    +0.1%
=========================================
  Files         216      218       +2
  Lines       14226    14438     +212
=========================================
+ Hits        14219    14431     +212
  Misses          7        7
Continue to review full report at Codecov.
@@ -11,7 +11,7 @@ class StackedEnsembleRegressor(StackedEnsembleBase):
     """Stacked Ensemble Regressor."""
     name = "Stacked Ensemble Regressor"
     model_family = ModelFamily.ENSEMBLE
-    supported_problem_types = [ProblemTypes.REGRESSION]
+    supported_problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Need to make this change because of _validate_estimator_problem_type
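For context, a hedged sketch of the kind of check _validate_estimator_problem_type presumably performs (illustrative, not the actual evalml source):

def _validate_estimator_problem_type(estimator_cls, problem_type):
    # The pipeline's problem type must appear in the estimator's
    # supported_problem_types, hence adding TIME_SERIES_REGRESSION above.
    if problem_type not in estimator_cls.supported_problem_types:
        raise ValueError(f"{estimator_cls.name} does not support {problem_type}")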
     return X_t

-    def _fit(self, X, y):
+    def _compute_features_during_fit(self, X, y):
Need to make this small refactor because we need to make sure the NaNs are dropped before the estimator is fit. I think it makes the most sense to do that in the time series pipeline as opposed to the base class.
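Condensed from the diff that appears later in this review, the resulting time series fit flow looks roughly like this (helper names follow the final revision; details are illustrative):

def fit(self, X, y):
    # Run every component except the final estimator to build the features.
    X_t = self._compute_features_during_fit(X, y)
    # Align the target with the forecast gap; the delayed features and the
    # shifted target both introduce NaNs, which must be dropped before fit.
    y_shifted = y.shift(-self.gap)
    X_t, y_shifted = drop_rows_with_nans(X_t, y_shifted)
    self.estimator.fit(X_t, y_shifted)
    return self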
@@ -81,6 +82,8 @@ def _get_pipeline_base_class(problem_type):
         return MulticlassClassificationPipeline
     elif problem_type == ProblemTypes.REGRESSION:
         return RegressionPipeline
+    elif problem_type == ProblemTypes.TIME_SERIES_REGRESSION:
Need to make this change so that the unit tests for make_pipeline don't error out for TIME_SERIES_REGRESSION problem types.
I'll modify the logic of make_pipeline to handle time series when we integrate time series regression into automl search in #1382
@@ -60,6 +60,9 @@ def test_scikit_learn_wrapper(X_y_binary, X_y_multi, X_y_regression):
         num_classes = 3
     elif problem_type == ProblemTypes.REGRESSION:
         X, y = X_y_regression
+    elif problem_type == ProblemTypes.TIME_SERIES_REGRESSION:
+        # Skipping because make_pipeline_from_components does not yet work for time series.
The reason is the pipeline-level parameters of gap and max_delay. I imagine we'll use the same pattern to specify them that we will introduce in #1382.
return X, y


class MockDelayedFeatures(Transformer):
A lot of the complexity from time series comes from the nans that are introduced into the features. We need a component that does this to make the tests more robust. I'll delete this in favor of the bona-fide DelayedFeaturesTransformer when #1379 is merged.
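A hypothetical sketch of such a mock, assuming the usual Transformer constructor signature (the test component in the PR may differ):

import pandas as pd
from evalml.pipelines.components import Transformer

class MockDelayedFeatures(Transformer):
    """Test-only transformer: shift features down one row so the first row
    of each column becomes NaN, the way real delayed features would."""
    name = "Mock Delayed Features"
    hyperparameter_ranges = {}

    def __init__(self, random_state=0):
        super().__init__(parameters={}, component_obj=None, random_state=random_state)

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # shift(1) pushes every value down a row and fills the top with NaN.
        return pd.DataFrame(X).shift(1)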
Deleted
Cool! I read the impl and left some thoughts and questions.
evalml/pipelines/pipeline_base.py
Outdated
@@ -190,17 +190,24 @@ def compute_estimator_features(self, X):
         """
         X_t = X
         for component in self.component_graph[:-1]:
-            X_t = component.transform(X_t)
+            X_t = component.transform(X_t, y)
This is 100% a style nit-pick, but do you think we should say y=y given that y is a named arg to transform, just for clarity/consistency?
evalml/pipelines/time_series.py
Outdated
@@ -0,0 +1,82 @@
+import pandas as pd
@freddyaboulton our current pattern for file naming means we should call this file time_series_regression_pipeline.py. We can certainly change that pattern, just calling that out.
I was thinking of adding all of the time series pipelines in one file but let's stick with the regression vs classification separation we have with our files!
evalml/pipelines/__init__.py
Outdated
@@ -44,3 +44,5 @@
     BaselineRegressionPipeline,
     MeanBaselineRegressionPipeline
 )
+
+from .time_series import TimeSeriesRegressionPipeline
👍
evalml/pipelines/time_series.py
Outdated
def __init__(self, parameters, gap, max_delay, random_state=0):
    super().__init__(parameters, random_state)
    self.gap = gap
    self.max_delay = max_delay
Style nit-pick: do super last.
evalml/pipelines/time_series.py
Outdated
class TimeSeriesRegressionPipeline(RegressionPipeline):
    problem_type = ProblemTypes.TIME_SERIES_REGRESSION

    def __init__(self, parameters, gap, max_delay, random_state=0):
Please add a docstring
Also, didn't we say gap/max_delay/min_delay would go into parameters under the key 'pipeline' as "pipeline parameters"?
I was thinking of doing that once we got to the AutoML integration, but it's way better to do this now. I just pushed this change up. The one thing I'd like to hear your thoughts on is whether we should raise an exception when the pipeline key is missing. I vote yes, because it doesn't make sense to specify default values for these pipeline-level parameters.
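A sketch of what that check could look like (illustrative; the merged code may differ):

def __init__(self, parameters, random_state=0):
    if "pipeline" not in parameters:
        raise ValueError("parameters must include a 'pipeline' key specifying "
                         "gap and max_delay for time series pipelines")
    pipeline_params = parameters["pipeline"]
    self.gap = pipeline_params["gap"]
    self.max_delay = pipeline_params["max_delay"]
    super().__init__(parameters, random_state)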
Thanks!
Yeah, the timeseries pipelines need those timeseries parameters in order to function. So yes, if gap and max_delay aren't included in the parameters dict under the pipeline key, I think we should raise an exception.
Should we prefix these params with timeseries? I.e. timeseries_gap, timeseries_max_delay. An argument for this would be that we may add other pipeline-level params in the future.
Can you explain what you mean? This is how a user would instantiate a TS regression pipeline right now:
pl = Pipeline({"Delayed Feature Transformer": {"gap": gap, "max_delay": max_delay,
                                               "delay_features": True,
                                               "delay_target": True},
               "pipeline": {"gap": gap, "max_delay": max_delay}})

I think renaming the "pipeline" dict keys to "timeseries_gap" and "timeseries_max_delay" is a bit too verbose. We can add more pipeline-level params to the "pipeline" dict as we need them without conflicts.
Ok, I agree! Let's stick with the example you've provided here.
evalml/pipelines/time_series.py
Outdated
    X_t, _ = self._compute_features_during_fit(X, y=None)
    y = X.squeeze()
else:
    y = pd.Series(y)
Can you move this out, like we're doing for X on line 29? Will make it easier to normalize to a dataframe when that comes in.
Ooh, actually @freddyaboulton, we should be converting these to woodwork now that @angela97lin merged #1393. Check out the code in the other pipeline classes; that just got merged recently.
Thank you for reminding me! Just pushed this up.
evalml/pipelines/time_series.py
Outdated
X_t, _ = self._compute_features_during_fit(X, y)

y_shifted = y.shift(-self.gap)
X_t, y_shifted = drop_nan(X_t, y_shifted)
Nice. Perhaps drop_nan_rows or drop_rows_with_nans would be a good name for this helper function.
Definitely agree that this needs a clearer name. Let me know what you think of my suggestion below to use keep_non_nan_rows.
evalml/pipelines/time_series.py
Outdated
if any_values_are_nan(features):
    return pad_with_nans(predictions, self.max_delay)
else:
    return predictions
Why not just always run pad_with_nans, and in the case where there are no nans, nothing happens? Maybe I'm not following this code right yet.
predictions will never have NaNs. I'm doing this because our design says that the data passed into this method and the predictions should always have the same number of rows. Since we drop NaN rows before calling predict on the estimator, I need to pad with NaNs.

There are other ways to do this:

return pad_with_nans(predictions, self.max_delay if any_values_are_nan(features) else 0)

if features.shape[0] != len(predictions):
    return pad_with_nans(predictions, self.max_delay)
else:
    return predictions

return pad_with_nans(predictions, self.max_delay if features.shape[0] != len(predictions) else 0)

Which do you think is best? Open to suggestions!
Ah got it, thanks, will respond on new code since this comment is now outdated
evalml/pipelines/time_series.py
Outdated
        pd.Series: Predicted values.
    """
    features = self.compute_estimator_features(X, y)
    predictions = self.estimator.predict(features.dropna(axis=0, how="any"))
Should we use the drop_nan helper here too? Maybe I'm just confused about the names.
Maybe I should rename the drop_nan helper to keep_non_nan_rows? Since the target and features will have NaNs in different places in score, we need a way of selecting the rows that are not NaN in both of them. As far as I know, there isn't a pandas method that does this.

In this case, we just need to drop the NaN rows from a dataframe, and pandas has a built-in for it.
Yep, I like that name. Right, I don't think there's a convenient builtin in pandas for that particular use-case. I've always either used dropna or isnan.
Great! I changed the name
evalml/pipelines/time_series.py
Outdated
""" | ||
y_predicted = self.predict(X, y) | ||
if y is None: | ||
y = X.copy() |
Why do this? If y isn't provided, that's an error...
This is for the case where there are no features and only y is used, but I just cleaned this up!
X = _convert_to_woodwork_structure(X)
y = _convert_to_woodwork_structure(y)
X = _convert_woodwork_types_wrapper(X.to_pandas())
y = _convert_woodwork_types_wrapper(y.to_pandas())
Nice! @angela97lin does this look good to you?
Yup, looks good!
Looking good, just left a few docstring comments / questions for my own curiosity haha
    Returns:
        self
    """
    if X is None:
Interesting, what's the point of this? Is there a case where the user only passes in y and X is None for time series regression problems? 🤔
Yes! It's common to build ts models based only on delayed values of the target. Say you have a sequence of daily stock prices or temperature readings; you could model future values based solely on the previous values.
Yep. One common model type like this is ARIMA.
I think in practice, most timeseries pipelines will use the input features.
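A hypothetical usage sketch of that target-only case, mirroring the instantiation example earlier in the thread (Pipeline stands in for a concrete TimeSeriesRegressionPipeline subclass, and daily_prices is an illustrative pd.Series):

pl = Pipeline({"Delayed Feature Transformer": {"gap": 1, "max_delay": 7,
                                               "delay_features": False,
                                               "delay_target": True},
               "pipeline": {"gap": 1, "max_delay": 7}})
pl.fit(X=None, y=daily_prices)       # features are built from lagged y alone
predictions = pl.predict(X=None, y=daily_prices)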
Arguments:
    parameters (dict): Dictionary with component names as keys and dictionary of that component's parameters as values.
        An empty dictionary {} implies using all default values for component parameters. Pipeline-level
        parameters such as gap and max_delay must be specified as a dictionary with the "pipeline" key.
"must be specified in a dictionary with the "pipeline" key"? Hmm. This was a tad bit confusing to me--not sure best way to clear it up via wording. Maybe would be good to give an example?
Good suggestion - I added an example!
evalml/utils/gen_utils.py
Outdated
def pad_with_nans(pd_data, num_to_pad):
    """Pad the beginning num_to_pad rows with nans.

    Arguments:
        pd_data (pd.DataFrame or pd.Series): Data to pad.

    Returns:
        pd.DataFrame or pd.Series
    """
    if isinstance(pd_data, pd.Series):
        padding = pd.Series([None] * num_to_pad)
    else:
        padding = pd.DataFrame({col: [None] * num_to_pad
                                for col in pd_data.columns})
    return pd.concat([padding, pd_data], ignore_index=True).infer_objects()


def keep_non_nan_rows(pd_data_1, pd_data_2):
    """Keep nows that do not have any NaNs in both pd_data_1 and pd_data_2.

    Returns:
        tuple of pd.DataFrame or pd.Series
    """

    def _not_nan(pd_data):
        if isinstance(pd_data, pd.Series):
            return ~pd_data.isna().values
        else:
            return ~pd_data.isna().any(axis=1).values

    mask = np.logical_and(_not_nan(pd_data_1), _not_nan(pd_data_2))
    return pd_data_1.iloc[mask], pd_data_2.iloc[mask]


def any_values_are_nan(pd_data):
    """Determine if any rows have at least one NaN.
    Args:
        pd_data (pd.Dataframe or pd.Series): Pandas data to check.

    Returns:
        pd.Series with bool values. Each value corresponds to if any values in the row are NaN.
    """
    return pd_data.isna().values.any()
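For a quick sense of how these helpers behave, a toy run (values are made up):

import numpy as np
import pandas as pd

pad_with_nans(pd.Series([1.0, 2.0, 3.0]), 2)
# 0    NaN
# 1    NaN
# 2    1.0
# 3    2.0
# 4    3.0

X = pd.DataFrame({"a": [np.nan, 1, 2]})
y = pd.Series([0, 1, np.nan])
keep_non_nan_rows(X, y)
# keeps only row 1, the one row that is NaN-free in both X and y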
Let's either specify these in the API docs or keep them private? :d
Added them to the API docs! Thanks for reminding me.
evalml/utils/gen_utils.py
Outdated
def any_values_are_nan(pd_data):
    """Determine if any rows have at least one NaN.
    Args:
Args --> Arguments
evalml/utils/gen_utils.py
Outdated
    Returns:
        pd.Series with bool values. Each value corresponds to if any values in the row are NaN.
    """
    return pd_data.isna().values.any()
Arguably necessary/not since it's just one line? :d
Deleted!
evalml/utils/gen_utils.py
Outdated
def keep_non_nan_rows(pd_data_1, pd_data_2):
    """Keep nows that do not have any NaNs in both pd_data_1 and pd_data_2.
Keep nows --> Keep rows
pipeline_params = parameters.pop("pipeline")
self.gap = pipeline_params['gap']
self.max_delay = pipeline_params['max_delay']
super().__init__(parameters, random_state)
If we do parameters.pop as above, the pipeline-level params won't be saved in the parameters attribute of the pipeline instance. Can we make it so those parameters are saved just like any other component's parameters?
Agreed; we should add a test to make sure gap and max_delay are in the parameters dict after initialization to check for this.
Great point! I had to modify PipelineBase a little to get this to work (the pipeline parameters method would only read parameters from the component graph), but I'll push this up momentarily!

I decided to only save the pipeline-level parameters if they are present, to avoid introducing breaking changes.
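A hedged sketch of the sort of PipelineBase change described here (illustrative, not the exact diff; _pipeline_params is a hypothetical attribute name):

@property
def parameters(self):
    components_params = {c.name: c.parameters
                         for c in self.component_graph if c.parameters}
    # Only add a "pipeline" entry when pipeline-level parameters were
    # actually supplied, so other pipeline types are unaffected.
    if getattr(self, "_pipeline_params", None):
        components_params["pipeline"] = self._pipeline_params
    return components_params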
"I decided to only save the pipeline-level parameters if they are present to avoid introducing breaking changes."

I.e. whatever the user passes in is what's saved on the pipeline? If so, yep I agree. We shouldn't create an empty "pipeline" entry in parameters dicts, only create it if it's populated.
Yes that's what I meant!
features = self.compute_estimator_features(X, y)
predictions = self.estimator.predict(features.dropna(axis=0, how="any"))
return pad_with_nans(predictions, self.max_delay if features.isna().values.any() else 0)
@freddyaboulton this looks good! Why not say pad_with_nans(predictions, self.max_delay)? I figure we should rely on the configuration rather than on properties of the data. Fundamentally, we need the shape of the output to match the number of rows in X and y.
If the user doesn't have any delayed features (either the DelayedFeatureTransformer is missing from the component graph, or they passed delay_features: False and delay_target: False to the DelayedFeatureTransformer), then we don't need to pad the predictions. pad_with_nans(predictions, self.max_delay) will always add max_delay rows of NaNs to the beginning.
Got it, thanks.
Edge cases to consider:

1. delay_features and delay_target were False. And, due to a bug or unexpected component behavior, features happens to have a nan value somewhere, even though there were no nans created by the delayed featurizer.
2. delay_features or delay_target were True. And, due to a bug or unexpected component behavior, features happens to not have any nans. Maybe this case isn't important, but it could happen if someone put an imputer after the delayed feature eng.

Is there a foolproof way to avoid these sorts of scenarios? What about

n_rows_padding_pre = X.shape[0] - predictions.shape[0]
return pad_with_nans(predictions, n_rows_padding_pre)
Yea I think that works! Good suggestion.
@freddyaboulton colossal!
I like that you put parametrize in place in the unit tests in prep for the other problem types 👏
and I love the coverage which checks the data provided to the estimator is correct.
I think we've rounded out most of the open stuff. I left a comment in predict about the padding -- ideally we should find a way to tighten that up. I left a couple unit test suggestions too.

It would be nice to have coverage for using a time series pipeline which happens to not use the delayed feature transformer, but this isn't high priority since we know we want these features.
X_t = self._compute_features_during_fit(X, y)

y_shifted = y.shift(-self.gap)
X_t, y_shifted = keep_non_nan_rows(X_t, y_shifted)
Sorry to nit-pick on naming again lol, not critical, but: does this make more sense as a positive ("drop_rows_with_nans") rather than a negation ("keep_non_nan_rows")?
Changed! No worries I agree ehehe
X = _convert_woodwork_types_wrapper(X.to_dataframe())

y_predicted = self.predict(X, y)
y_shifted = y.shift(-self.gap)
Yep, makes sense. So we shift the target down by gap rows, so that the target value at row n headed into the estimator comes originally from row n + gap. And the final gap rows are nan.
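A toy illustration of that alignment (values made up):

import pandas as pd

y = pd.Series([10, 11, 12, 13])
y.shift(-2)   # gap = 2
# 0    12.0   <- the target for row 0 originally came from row 0 + gap
# 1    13.0
# 2     NaN   <- the final gap rows are NaN
# 3     NaN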
@pytest.mark.parametrize("only_use_y", [True, False])
@pytest.mark.parametrize("include_delayed_features", [True, False])
@pytest.mark.parametrize("gap,max_delay", [(1, 2), (2, 2), (7, 3), (2, 4)])
Do we have coverage where gap is set to 0 ("given historical features and today's features, predict today's target"), or where max_delay is set to 0 ("given today's features, predict a future target") or 1? I think we should add this to the tests in this file, fit/predict/score.
Added these combinations of gap and max_delay to the tests: (0, 0), (1, 0), (0, 2), (1, 1).
        estimator_name, gap, max_delay, include_delayed_features, only_use_y, ts_data):

    if only_use_y and not include_delayed_features:
        pytest.skip("Can't prevent label leakage when only the target is used without lagging.")
So this would happen when max_delay=0 and delay_target=True? Perhaps we should raise an error in that case?
Pull Request Description
Fix #1380
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.