
Timeseries regression pipeline #1418

Merged 21 commits into main from 1380-timeseries-regression-pipeline on Nov 12, 2020

Conversation

freddyaboulton
Contributor

Pull Request Description

Fix #1380



@freddyaboulton freddyaboulton self-assigned this Nov 9, 2020
@freddyaboulton freddyaboulton force-pushed the 1380-timeseries-regression-pipeline branch 2 times, most recently from e7a47d4 to c03e186 on November 9, 2020 20:10

codecov bot commented Nov 9, 2020

Codecov Report

Merging #1418 (35af009) into main (87cc77f) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1418     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         216      218      +2     
  Lines       14226    14438    +212     
=========================================
+ Hits        14219    14431    +212     
  Misses          7        7             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <100.0%> (ø)
evalml/pipelines/classification_pipeline.py 100.0% <100.0%> (ø)
.../components/ensemble/stacked_ensemble_regressor.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/baseline_regressor.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/catboost_regressor.py 100.0% <100.0%> (ø)
...s/estimators/regressors/decision_tree_regressor.py 100.0% <100.0%> (ø)
...ents/estimators/regressors/elasticnet_regressor.py 100.0% <100.0%> (ø)
...s/components/estimators/regressors/et_regressor.py 100.0% <100.0%> (ø)
...mponents/estimators/regressors/linear_regressor.py 100.0% <100.0%> (ø)
...s/components/estimators/regressors/rf_regressor.py 100.0% <100.0%> (ø)
... and 18 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@freddyaboulton freddyaboulton changed the title 1380 timeseries regression pipeline Timeseries regression pipeline Nov 9, 2020
@@ -11,7 +11,7 @@ class StackedEnsembleRegressor(StackedEnsembleBase):
     """Stacked Ensemble Regressor."""
     name = "Stacked Ensemble Regressor"
     model_family = ModelFamily.ENSEMBLE
-    supported_problem_types = [ProblemTypes.REGRESSION]
+    supported_problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Contributor Author

Need to make this change because of _validate_estimator_problem_type

         return X_t

-    def _fit(self, X, y):
+    def _compute_features_during_fit(self, X, y):
Contributor Author

Need to make this small refactor because we need to make sure the NaNs are dropped before the estimator is fit. I think it makes the most sense to do that in the time series pipeline as opposed to the base class.
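For reference, here is a minimal sketch of the fit flow this refactor enables, based on the snippets quoted later in this review (helper names follow the final drop_rows_with_nans naming; treat it as an illustration, not the exact implementation):

    def fit(self, X, y):
        # Compute features for everything but the estimator; delayed features
        # introduce NaNs at the start of X_t.
        X_t = self._compute_features_during_fit(X, y)
        # Shift the target up by `gap` so row t is paired with the target at t + gap.
        y_shifted = y.shift(-self.gap)
        # Drop rows that are NaN in either the features or the shifted target
        # before the estimator is fit.
        X_t, y_shifted = drop_rows_with_nans(X_t, y_shifted)
        self.estimator.fit(X_t, y_shifted)
        return self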

@@ -81,6 +82,8 @@ def _get_pipeline_base_class(problem_type):
         return MulticlassClassificationPipeline
     elif problem_type == ProblemTypes.REGRESSION:
         return RegressionPipeline
+    elif problem_type == ProblemTypes.TIME_SERIES_REGRESSION:
Contributor Author

Need to make this change so that the unit tests for make_pipeline don't error out for TIME_SERIES_REGRESSION problem types.

I'll modify the logic of make_pipeline to handle time series when we integrate time series regression into automl search in #1382

@@ -60,6 +60,9 @@ def test_scikit_learn_wrapper(X_y_binary, X_y_multi, X_y_regression):
         num_classes = 3
     elif problem_type == ProblemTypes.REGRESSION:
         X, y = X_y_regression
+    elif problem_type == ProblemTypes.TIME_SERIES_REGRESSION:
+        # Skipping because make_pipeline_from_components does not yet work for time series.
Contributor Author

This is because of the pipeline-level parameters gap and max_delay. I imagine we'll use the same pattern to specify them that we will introduce in #1382.

    return X, y


class MockDelayedFeatures(Transformer):
Contributor Author

@freddyaboulton freddyaboulton Nov 9, 2020

A lot of the complexity in time series comes from the NaNs that are introduced into the features. We need a component that does this to make the tests more robust. I'll delete this in favor of the bona-fide DelayedFeaturesTransformer when #1379 is merged.

Deleted
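For illustration, a mock along these lines might look like the following sketch (not the deleted code itself; the import path and the base-class constructor call are assumptions):

    import pandas as pd

    from evalml.pipelines.components import Transformer


    class MockDelayedFeatures(Transformer):
        """Shift every column down two rows, leaving NaNs at the top, to mimic
        the NaNs a real delayed-features component introduces."""
        name = "Mock Delayed Features"
        hyperparameter_ranges = {}

        def __init__(self, random_state=0):
            super().__init__(parameters={}, component_obj=None, random_state=random_state)

        def fit(self, X, y=None):
            return self

        def transform(self, X, y=None):
            # shift(2) pushes values down two rows; the vacated rows become NaN
            return pd.DataFrame(X).shift(2)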

Contributor

@dsherry dsherry left a comment

Cool! I read the impl and left some thoughts and questions.

@@ -190,17 +190,24 @@ def compute_estimator_features(self, X):
         """
         X_t = X
         for component in self.component_graph[:-1]:
-            X_t = component.transform(X_t)
+            X_t = component.transform(X_t, y)
Contributor

This is 100% a style nit-pick, but do you think we should say y=y given that y is a named arg to transform, just for clarity/consistency?

@@ -0,0 +1,82 @@
+import pandas as pd
Contributor

@freddyaboulton our current pattern for file naming means we should call this file time_series_regression_pipeline.py

We can certainly change that pattern, just calling that out.

Contributor Author

I was thinking of adding all of the time series pipelines in one file but let's stick with the regression vs classification separation we have with our files!

@@ -44,3 +44,5 @@
     BaselineRegressionPipeline,
     MeanBaselineRegressionPipeline
 )
+
+from .time_series import TimeSeriesRegressionPipeline
Contributor

👍

    def __init__(self, parameters, gap, max_delay, random_state=0):
        super().__init__(parameters, random_state)
        self.gap = gap
        self.max_delay = max_delay
Contributor

Style nit-pick, do super last

class TimeSeriesRegressionPipeline(RegressionPipeline):
    problem_type = ProblemTypes.TIME_SERIES_REGRESSION

    def __init__(self, parameters, gap, max_delay, random_state=0):
Contributor

Please add a docstring

Contributor

Also didn't we say gap/max_delay/min_delay would go into parameters under the key 'pipeline' as "pipeline parameters"?

Contributor Author

I was thinking of doing that once we got to the AutoML integration but it's way better to do this now. I just pushed this change up. The one thing I'd like to hear your thoughts on is whether we should raise an exception when the pipeline key is missing. I vote yes because it doesn't make sense to specify default arguments for these pipeline-level arguments.

Contributor

Thanks!

Yeah, the timeseries pipelines need those timeseries parameters in order to function. So yes, if gap and max_delay aren't included in the parameters dict under the pipeline key, I think we should raise an exception.

Should we prefix these params with timeseries? I.e. timeseries_gap, timeseries_max_delay. An argument for this would be that we may add other pipeline-level params in the future.

Contributor Author

@freddyaboulton freddyaboulton Nov 11, 2020

Can you explain what you mean? This is how a user would instantiate a TS regression pipeline right now:

    pl = Pipeline({"Delayed Feature Transformer": {"gap": gap, "max_delay": max_delay,
                                                   "delay_features": True,
                                                   "delay_target": True},
                   "pipeline": {"gap": gap, "max_delay": max_delay}})

I think renaming the "pipeline" dict keys to "timeseries_gap" and "timeseries_max_delay" is a bit too verbose. We can add more pipeline-level params to the "pipeline" dict as we need them without conflicts.

Contributor

Ok, I agree! Let's stick with the example you've provided here.
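For concreteness, a minimal sketch of the agreed-upon __init__ (the exact exception type and message are assumptions for illustration):

    def __init__(self, parameters, random_state=0):
        if "pipeline" not in parameters:
            # gap and max_delay have no sensible defaults, so fail loudly.
            raise ValueError("TimeSeriesRegressionPipeline requires a 'pipeline' key "
                             "in parameters containing 'gap' and 'max_delay'.")
        pipeline_params = parameters["pipeline"]
        self.gap = pipeline_params["gap"]
        self.max_delay = pipeline_params["max_delay"]
        super().__init__(parameters, random_state)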

            X_t, _ = self._compute_features_during_fit(X, y=None)
            y = X.squeeze()
        else:
            y = pd.Series(y)
Contributor

Can you move this out, like we're doing for X on line 29? It will make it easier to normalize to a dataframe when that comes in.

Contributor

Ooh actually @freddyaboulton we should be converting these to woodwork now that @angela97lin merged #1393. Check out the code in the other pipeline classes. That just got merged recently.

Contributor Author

Thank you for reminding me! Just pushed this up.

        X_t, _ = self._compute_features_during_fit(X, y)

        y_shifted = y.shift(-self.gap)
        X_t, y_shifted = drop_nan(X_t, y_shifted)
Contributor

Nice.

Perhaps drop_nan_rows or drop_rows_with_nans would be a good name for this helper function

Contributor Author

Definitely agree that this needs a clearer name. Let me know what you think of my suggestion below to use keep_non_nan_rows.

        if any_values_are_nan(features):
            return pad_with_nans(predictions, self.max_delay)
        else:
            return predictions
Contributor

Why not just always run pad_with_nans, and in the case where there are no nans, nothing happens? Maybe I'm not following this code right yet

Contributor Author

@freddyaboulton freddyaboulton Nov 10, 2020

predictions will never have NaNs. I'm doing this because our design says that the data passed into this method and the predictions should always have the same number of rows. Since we drop NaN rows before calling predict on the estimator, I need to pad with NaNs.

There are other ways to do this:

    return pad_with_nans(predictions, self.max_delay if any_values_are_nan(features) else 0)

or

    if features.shape[0] != len(predictions):
        return pad_with_nans(predictions, self.max_delay)
    else:
        return predictions

or

    return pad_with_nans(predictions, self.max_delay if features.shape[0] != len(predictions) else 0)

Which do you think is best? Open to suggestions!

Contributor

Ah got it, thanks, will respond on new code since this comment is now outdated

            pd.Series: Predicted values.
        """
        features = self.compute_estimator_features(X, y)
        predictions = self.estimator.predict(features.dropna(axis=0, how="any"))
Contributor

Should we use the drop_nan helper here too? Maybe I'm just confused about the names.

Contributor Author

@freddyaboulton freddyaboulton Nov 10, 2020

Maybe I should rename the drop_nan helper to keep_non_nan_rows? Since the target and features will have NaNs in different places in score, we need a way to select the rows that are not NaN in both of them. As far as I know, there isn't a pandas method that does this.

In this case, we just need to drop the NaN rows from a dataframe and pandas has a built-in for it.

Contributor

Yep I like that name. Right, I don't think there's a convenient builtin in pandas for that particular use-case. I've always either used dropna or isnan.

Contributor Author

Great! I changed the name

"""
y_predicted = self.predict(X, y)
if y is None:
y = X.copy()
Contributor

Why do this? If y isn't provided, that's an error...

Contributor Author

This is for the case where there are no features and only y is used but I just cleaned this up!

        X = _convert_to_woodwork_structure(X)
        y = _convert_to_woodwork_structure(y)
        X = _convert_woodwork_types_wrapper(X.to_pandas())
        y = _convert_woodwork_types_wrapper(y.to_pandas())
Contributor

Nice! @angela97lin does this look good to you?

Contributor

Yup, looks good!

Contributor

@angela97lin angela97lin left a comment

Looking good, just left a few docstring comments / questions for my own curiosity haha

        Returns:
            self
        """
        if X is None:
Contributor

Interesting, what's the point of this? Is there a case where the user only passes in y and X is None for time series regression problems? 🤔

Contributor Author

Yes! It's common to build ts models based on only delayed values of the target. Like let's say you have a sequence of daily stock prices or temperature readings, you could model future values based solely on the previous values.

Contributor

Yep. One common model type like this is ARIMA.

I think in practice, most timeseries pipelines will use the input features.
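For example, a target-only pipeline could be fit roughly like this (a sketch; the component graph and data are illustrative, and the parameters mirror the example earlier in this thread):

    import pandas as pd

    from evalml.pipelines import TimeSeriesRegressionPipeline


    class TargetOnlyPipeline(TimeSeriesRegressionPipeline):
        component_graph = ["Delayed Feature Transformer", "Random Forest Regressor"]


    y = pd.Series(range(100), dtype="float64")  # e.g. daily temperature readings
    pl = TargetOnlyPipeline({"Delayed Feature Transformer": {"gap": 1, "max_delay": 2,
                                                             "delay_features": False,
                                                             "delay_target": True},
                             "pipeline": {"gap": 1, "max_delay": 2}})
    pl.fit(None, y)  # X=None: future values are modeled solely from delayed target values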

        Arguments:
            parameters (dict): Dictionary with component names as keys and dictionary of that component's parameters as values.
                An empty dictionary {} implies using all default values for component parameters. Pipeline-level
                parameters such as gap and max_delay must be specified as a dictionary with the "pipeline" key.
Contributor

"must be specified in a dictionary with the "pipeline" key"? Hmm. This was a tad bit confusing to me--not sure best way to clear it up via wording. Maybe would be good to give an example?

Contributor Author

Good suggestion - I added an example!

Comment on lines 319 to 361
def pad_with_nans(pd_data, num_to_pad):
    """Pad the beginning num_to_pad rows with nans.

    Arguments:
        pd_data (pd.DataFrame or pd.Series): Data to pad.
        num_to_pad (int): Number of rows of NaNs to prepend.

    Returns:
        pd.DataFrame or pd.Series
    """
    if isinstance(pd_data, pd.Series):
        padding = pd.Series([None] * num_to_pad)
    else:
        padding = pd.DataFrame({col: [None] * num_to_pad
                                for col in pd_data.columns})
    return pd.concat([padding, pd_data], ignore_index=True).infer_objects()


def keep_non_nan_rows(pd_data_1, pd_data_2):
    """Keep nows that do not have any NaNs in both pd_data_1 and pd_data_2.

    Returns:
        tuple of pd.DataFrame or pd.Series
    """

    def _not_nan(pd_data):
        if isinstance(pd_data, pd.Series):
            return ~pd_data.isna().values
        else:
            return ~pd_data.isna().any(axis=1).values

    mask = np.logical_and(_not_nan(pd_data_1), _not_nan(pd_data_2))
    return pd_data_1.iloc[mask], pd_data_2.iloc[mask]


def any_values_are_nan(pd_data):
    """Determine if any rows have at least one NaN.
    Args:
        pd_data (pd.Dataframe or pd.Series): Pandas data to check.

    Returns:
        bool: True if any value in pd_data is NaN.
    """
    return pd_data.isna().values.any()
Contributor

Let's either specify these in the API docs or keep them private? :d

Contributor Author

Added them to the API docs! Thanks for reminding me.
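A quick usage sketch of these helpers (illustrative only, assuming the definitions above):

    import numpy as np
    import pandas as pd

    features = pd.DataFrame({"a": [np.nan, 1.0, 2.0], "b": [np.nan, np.nan, 3.0]})
    target = pd.Series([0.5, np.nan, 1.5])

    X_clean, y_clean = keep_non_nan_rows(features, target)
    # Only the last row survives: it is the only row that is NaN-free in both inputs.

    padded = pad_with_nans(pd.Series([7.0, 8.0]), 2)
    # -> NaN, NaN, 7.0, 8.0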


def any_values_are_nan(pd_data):
    """Determine if any rows have at least one NaN.
    Args:
Contributor

Args --> Arguments

    Returns:
        bool: True if any value in pd_data is NaN.
    """
    return pd_data.isna().values.any()
Contributor

Arguably necessary/not since it's just one line? :d

Contributor Author

Deleted!



def keep_non_nan_rows(pd_data_1, pd_data_2):
    """Keep nows that do not have any NaNs in both pd_data_1 and pd_data_2.
Contributor

Keep nows --> Keep rows

        pipeline_params = parameters.pop("pipeline")
        self.gap = pipeline_params['gap']
        self.max_delay = pipeline_params['max_delay']
        super().__init__(parameters, random_state)
Contributor

If we do parameters.pop as above, the pipeline-level params won't be saved in the parameters attribute of the pipeline instance. Can we make it so those parameters are saved just like any other component's parameters?

Contributor

Agreed; we should add a test to make sure gap and max_delay are in the parameters dict after initialization to check for this

Contributor Author

@freddyaboulton freddyaboulton Nov 11, 2020

Great point! I had to modify PipelineBase a little to get this to work (the pipeline parameters method would only read parameters from the component graph) but I'll push this up momentarily!

I decided to only save the pipeline-level parameters if they are present to avoid introducing breaking changes.

Contributor

I decided to only save the pipeline-level parameters if they are present to avoid introducing breaking changes.

I.e. whatever the user passes in is what's saved on the pipeline? If so, yep I agree. We shouldn't create an empty "pipeline" entry in parameters dicts, only create it if it's populated.

Contributor Author

Yes that's what I meant!
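A rough sketch of that change (hypothetical names: _pipeline_params stands for wherever __init__ stashes the "pipeline" dict; the real PipelineBase property may differ):

    import copy

    @property
    def parameters(self):
        """Parameters for each component, plus pipeline-level parameters if given."""
        params = {component.name: copy.copy(component.parameters)
                  for component in self.component_graph if component.parameters}
        # Only add a "pipeline" entry when pipeline-level parameters were supplied,
        # so pipelines without them keep the same parameters dict shape as before.
        if getattr(self, "_pipeline_params", None):
            params["pipeline"] = self._pipeline_params
        return params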


        features = self.compute_estimator_features(X, y)
        predictions = self.estimator.predict(features.dropna(axis=0, how="any"))
        return pad_with_nans(predictions, self.max_delay if features.isna().values.any() else 0)
Contributor

@freddyaboulton this looks good!

Why not say pad_with_nans(predictions, self.max_delay) ? I figure we should rely on the configuration rather than on properties of the data. Fundamentally, we need the shape of the output to match the number of rows in X and y.

Contributor Author

If the user doesn't have any delayed features (either the DelayedFeatureTransformer is missing from the component graph, or they passed delay_features: False and delay_target: False to the DelayedFeatureTransformer), then we don't need to pad the predictions.

pad_with_nans(predictions, self.max_delay) will always add max_delay rows of NaNs to the beginning.

Contributor

Got it, thanks.

Edge cases to consider:

  • delay_features and delay_target were False. And, due to a bug or unexpected component behavior, features happens to have a nan value somewhere, even though there were no nans created by the delayed featurizer.
  • delay_features or delay_target were True. And, due to a bug or unexpected component behavior, features happens to not have any nans. Maybe this case isn't important, but it could happen if someone put an imputer after the delayed feature eng.

Is there a foolproof way to avoid these sorts of scenarios? What about

n_rows_padding_pre = X.shape[0] - predictions.shape[0]
return pad_with_nans(predictions, n_rows_padding_pre)

Contributor Author

Yea I think that works! Good suggestion.
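Putting that together, predict would look roughly like this (a sketch combining the snippets above, not the final merged code):

    def predict(self, X, y=None):
        features = self.compute_estimator_features(X, y)
        # Rows with NaNs (e.g. from delayed features) are dropped before predicting.
        predictions = self.estimator.predict(features.dropna(axis=0, how="any"))
        # Pad so the output has exactly as many rows as the input, no matter
        # how many rows the NaN-drop removed.
        n_rows_padding_pre = X.shape[0] - predictions.shape[0]
        return pad_with_nans(predictions, n_rows_padding_pre)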

@freddyaboulton freddyaboulton force-pushed the 1380-timeseries-regression-pipeline branch from f54673c to 198400a on November 12, 2020 18:43
Contributor

@dsherry dsherry left a comment

@freddyaboulton colossal!

I like that you put parametrize in place in the unit tests in prep for the other problem types 👏
and I love the coverage which checks the data provided to the estimator is correct.
I think we've rounded out most of the open stuff. I left a comment in predict about the padding -- ideally we should find a way to tighten that up. I left a couple unit test suggestions too.

It would be nice to have coverage for using a time series pipeline which happens to not use delayed feature transformer, but this isn't high priority since we know we want these features.

        X_t = self._compute_features_during_fit(X, y)

        y_shifted = y.shift(-self.gap)
        X_t, y_shifted = keep_non_nan_rows(X_t, y_shifted)
Contributor

Sorry to nit-pick on naming again lol, not critical, but: does this make more sense as a positive ("drop_rows_with_nans") rather than a negation ("keep_non_nan_rows")?

Contributor Author

Changed! No worries I agree ehehe


        X = _convert_woodwork_types_wrapper(X.to_dataframe())

        y_predicted = self.predict(X, y)
        y_shifted = y.shift(-self.gap)
Contributor

Yep, makes sense. So we shift the target down by gap rows, so that the target value at row n headed into the estimator comes originally from row n + gap. And the final gap rows are nan.
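A tiny worked example of that shift:

    import pandas as pd

    y = pd.Series([10, 11, 12, 13, 14])
    gap = 2
    y_shifted = y.shift(-gap)
    # Row 0 now holds the target from row 2, row 1 from row 3, and so on.
    # y_shifted: [12.0, 13.0, 14.0, NaN, NaN] -- the final `gap` rows are NaN.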


@pytest.mark.parametrize("only_use_y", [True, False])
@pytest.mark.parametrize("include_delayed_features", [True, False])
@pytest.mark.parametrize("gap,max_delay", [(1, 2), (2, 2), (7, 3), (2, 4)])
Contributor

Do we have coverage where gap is set to 0 ("given historical features and today's features, predict today's target"), or where max_delay is set to 0 ("given today's features, predict a future target") or 1?

I think we should add this to the tests in this file, fit/predict/score

Contributor Author

Added these combinations of gap and max_delay to the tests!

(0, 0), (1, 0), (0, 2), (1, 1)

        estimator_name, gap, max_delay, include_delayed_features, only_use_y, ts_data):

    if only_use_y and not include_delayed_features:
        pytest.skip("Can't prevent label leakage when only the target is used without lagging.")
Contributor

So this would happen when max_delay=0 and delay_target=True? Perhaps we should raise an error in that case?

Contributor Author

@freddyaboulton freddyaboulton Nov 12, 2020

Yea, the problem is that all of our (current) estimators require X to be non-empty. If you pass X=None to the pipeline and you are not computing any features with a DelayedFeaturesTransformer, then our estimators will freak out. I'll raise an error in fit if we're in that case, but we may need to rework this in the future when we add time-series-specific estimators that compute their own features internally. In that case, we'll likely want to pass in X=None and not include a DelayedFeatureTransformer in the pipeline.

Contributor

That's a good point @freddyaboulton . Sounds good, raising an error in pipeline fit makes sense, and then when we add timeseries support to make_pipelines for automl we'll just have to handle this edge case and not add delayed feature eng to the pipeline.
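A minimal sketch of that fit-time guard (the _uses_delayed_features flag is hypothetical; the exact check and message are illustrative):

    def fit(self, X, y):
        if X is None and not self._uses_delayed_features:
            # No input features and no delayed-feature engineering means there
            # is nothing for the estimator to train on.
            raise RuntimeError("Pipeline computes no features: pass X or include a "
                               "DelayedFeatureTransformer in the component graph.")
        ...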

    if only_use_y:
        pl.fit(None, y)
    else:
        pl.fit(X, y)
Contributor

Nice

        expected_target = np.arange(1 + gap + max_delay, 32)
    else:
        train_index = pd.date_range(f"2020-10-01", f"2020-10-{31-gap}")
        expected_target = np.arange(1 + gap, 32)
Contributor

This data gen is great, easy to follow

@freddyaboulton freddyaboulton merged commit af9400b into main Nov 12, 2020
@freddyaboulton freddyaboulton deleted the 1380-timeseries-regression-pipeline branch November 12, 2020 23:29
christopherbunn pushed a commit that referenced this pull request Nov 19, 2020
* Adding TimeSeriesRegressionPipeline.

* Simplifying MockDelayedFeatures transformer.

* Adding PR 1418 to release notes.

* Removing artifact after rebase.

* Fixing broken tests.

* Adding docstrings for TimeSeriesRegressionPipeline.

* Removing pytest.skip that was skipping all subsequent estimators in favor of continue.

* Add docstrings for NaN helper functions in gen_utils.

* Reverting change that breaks unit tests.

* Adding time series regression pipeline to api ref.

* Adopting convention that X must be None when there are no features apart from target for time series.

* Refactoring how pipeline is defined in tests.

* Specifying pipeline parameters as the pipeline key in the parameters dict.

* Rename drop_nan to keep_non_nan_rows.

* Fixing import order in tests/utils_tests/test_gen_utils.py

* Deleting unnecessary series conversion in fit.

* Pipeline parameters are now saved in the parameters dict.

* Upgrading woodwork conversion to 0.0.5

* Adding test coverage for gap, max_delay = 0

* Changing skip message in test_time_series_pipeline.py

* Fixing typo in api ref.
@dsherry dsherry mentioned this pull request Nov 24, 2020
Successfully merging this pull request may close these issues.

Add a TimeSeriesRegressionPipeline

3 participants