
Classification Pipelines for Time Series #1528

Merged: 22 commits into main from 1474-classification-pipelines-for-ts on Dec 17, 2020

Conversation

@freddyaboulton (Contributor, Author) commented Dec 8, 2020

Pull Request Description

Fixes #1474

I'll add to AutoML in a separate PR since this is a big diff.

Most of the changes come from adding the time series binary and time series multiclass problem types to the list of supported problem types for each existing classifier. I would ignore the diff in the estimators and estimator tests and focus on the diff in pipeline_base_meta.py, time_series_classification_pipelines.py, and test_time_series_pipeline.py.



@codecov (bot) commented Dec 8, 2020

Codecov Report

Merging #1528 (f9733d4) into main (a48ebfb) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1528     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         238      239      +1     
  Lines       17136    17406    +270     
=========================================
+ Hits        17128    17398    +270     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/tests/utils_tests/test_gen_utils.py 100.0% <ø> (ø)
evalml/utils/__init__.py 100.0% <ø> (ø)
...alml/objectives/binary_classification_objective.py 100.0% <100.0%> (ø)
.../objectives/multiclass_classification_objective.py 100.0% <100.0%> (ø)
evalml/pipelines/__init__.py 100.0% <100.0%> (ø)
evalml/pipelines/classification_pipeline.py 100.0% <100.0%> (ø)
...components/ensemble/stacked_ensemble_classifier.py 100.0% <100.0%> (ø)
...ents/estimators/classifiers/catboost_classifier.py 100.0% <100.0%> (ø)
...estimators/classifiers/decision_tree_classifier.py 100.0% <100.0%> (ø)
...ts/estimators/classifiers/elasticnet_classifier.py 100.0% <100.0%> (ø)
... and 22 more

Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a48ebfb...f9733d4.

@freddyaboulton force-pushed the 1474-classification-pipelines-for-ts branch 3 times, most recently from ed2d59e to 1d380ca, on December 10, 2020 16:11
@@ -24,5 +25,9 @@ def _check_for_fit(self, X=None, y=None):
elif y is None:
return method(self, X)
else:
# For time series classification pipelines, predict will take X, y, objective
freddyaboulton (Author):

The problem is that TimeSeriesBinaryClassification.predict accepts X, y and objective. All other pipeline classes accept X and y (TimeSeriesRegressionPipeline) or X and objective (BinaryClassificationPipeline, MulticlassClassificationPipeline) or X (ClassificationPipeline).

I think adding this extra check for the number of arguments in predict is the lowest-touch way of handling the complexity (as opposed to adding another metaclass) especially since this class is not user-facing and we don't expect users to define their own pipeline base classes.

That being said, maybe we can refactor this metaclass definition to make it a little clearer what's going on?

jeremyliweishih (Collaborator):

I think checking the number of arguments is a good idea. Maybe we can just refactor it into the usual if/else logic?

            if not self._is_fitted:
                raise PipelineNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
            elif X is None and y is None:
                return method(self)
            elif y is None:
                return method(self, X)
            # For time series classification pipelines, predict will take X, y, objective
            elif len(inspect.getfullargspec(method).args) == 4:
                return method(self, X, y, objective)
            else:
                # For other pipelines, predict will take X, y or X, objective
                return method(self, X, y)

freddyaboulton (Author):

Done!

X = _convert_woodwork_types_wrapper(X.to_dataframe())
y = _convert_woodwork_types_wrapper(y.to_series())
self._encoder.fit(y)
y = self._encode_targets(y)
freddyaboulton (Author) commented Dec 10, 2020:

Throughout, we need to encode targets before computing features because we allow the user to pass in string-valued targets. If there is a DelayedFeatureTransformer in the pipeline, those string values will get added to the feature matrix, and if there isn't a OneHotEncoder, the estimator will crash.

Long story short, this is a repeat of #1477. I think it would be better to pass this complexity to the DelayedFeatureTransformer and have it encode the target before delaying, but we should not let that block this PR. Let me know what you guys think. I'm happy to file something for updating the DelayedFeatureTransformer!
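For illustration, here's a minimal sketch of the failure mode, using sklearn's LabelEncoder as a stand-in for the pipeline's target encoder and a plain shift(1) as a stand-in for DelayedFeatureTransformer (the data and variable names are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

y = pd.Series(["yes", "no", "yes", "no"])

# Delaying the raw target injects string values into the feature matrix;
# without a OneHotEncoder downstream, a numeric estimator crashes on these.
delayed_raw = y.shift(1)  # [NaN, "yes", "no", "yes"], dtype: object

# Encoding the target before delaying keeps the delayed feature numeric.
y_encoded = pd.Series(LabelEncoder().fit_transform(y), index=y.index)
delayed_encoded = y_encoded.shift(1)  # [NaN, 1.0, 0.0, 1.0], dtype: float64

print(delayed_raw.dtype, delayed_encoded.dtype)  # object float64
```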

jeremyliweishih (Collaborator):

Agree with you here! I like the idea of filing an issue and updating the DelayedFeatureTransformer as well as removing encoding in the pipelines.

dsherry (Contributor):

@freddyaboulton ah agreed that it would be great to have DelayedFeatureTransformer be able to handle this.

Can you call super().fit(X, y) here instead of duplicating code with ClassificationPipeline.fit? ClassificationPipeline already sets up the encoder.

dsherry (Contributor):

Oh, my bad, I forgot about _compute_features_during_fit

In any case, I wonder if there's a way to avoid duplicating code for the classification target encoder.

freddyaboulton (Author):

When the dust settles and we build more features, I want to see if there is a way to refactor these classes to avoid duplication between classification and time series classification pipelines.

It's not apparent right now because the "time-series-specific" code gets sprinkled in between the "non-time-series" code so it's not easy to just isolate the time series code into a helper function/mix-in yet. But maybe a new design will be clearer when this one becomes more stable and we test it!

self.estimator.fit(X_t, y_shifted)
return self

def _predict(self, X, y, objective=None, pad=False):
freddyaboulton (Author):

I have added a pad parameter to _predict because in predict we want to return un-padded predictions, or else self._decode_targets will error out. However, in _compute_predictions (called by score) we want to return padded predictions, so that the predictions, predicted probabilities, and the shifted target all have the same length; this makes dropping nans much easier.
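Roughly, the padding works like the following sketch, an illustrative stand-in for the actual pad_with_nans helper whose exact signature I'm assuming here:

```python
import numpy as np
import pandas as pd

def pad_with_nans(predictions: pd.Series, num_to_pad: int) -> pd.Series:
    """Prepend num_to_pad nans so predictions line up with the shifted target."""
    padding = pd.Series([np.nan] * num_to_pad)
    return pd.concat([padding, predictions], ignore_index=True)

# With max_delay=2, the first two rows have no delayed features, so the
# estimator only produces len(y) - 2 predictions; padding restores alignment.
preds = pd.Series([0, 1, 1])
print(pad_with_nans(preds, 2).tolist())  # [nan, nan, 0.0, 1.0, 1.0]
```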

y_shifted = y_encoded.shift(-self.gap)
y_pred, y_pred_proba = self._compute_predictions(X, y, objectives)
non_nan_mask = _get_rows_without_nans(y_shifted, y_pred, y_pred_proba)
if y_pred is not None:
freddyaboulton (Author):

This is a bit ugly, but depending on the objectives that are passed in, y_pred or y_pred_proba can be None, so we can only drop nans where they exist.

I've thought about avoiding this complexity by having _compute_predictions return un-padded predictions so that we don't have to worry about dropping anything. However, that doesn't really work because the predictions and y_shifted will have a different number of nans depending on the value of gap and the number of delayed features computed in the pipeline. The least complex way of finding the common non-nan subset is to look at the nans in the predictions and targets.

In short, we may be able to hide this complexity in a helper function, but I don't think we can avoid it.

if not objective.is_defined_for_problem_type(self.problem_type):
raise ValueError(f"Objective {objective.name} is not defined for time series binary classification.")

if self.threshold is None:
freddyaboulton (Author):

This looks a bit different from BinaryClassificationPipeline so that I only have to call pad_with_nans once.

@patch('evalml.objectives.BinaryClassificationObjective.decision_function')
@patch('evalml.pipelines.components.Estimator.predict_proba', return_value=pd.DataFrame({0: [1.]}))
@patch('evalml.pipelines.components.Estimator.predict', return_value=pd.Series([1.]))
def test_binary_classification_predictions_thresholded_properly(mock_predict, mock_predict_proba,
freddyaboulton (Author):

This is to achieve 100% coverage on the binary classification _predict method.

(TimeSeriesRegressionPipeline, ['R2']),
(TimeSeriesRegressionPipeline, ['R2', "Mean Absolute Percentage Error"])])
@pytest.mark.parametrize("use_ww", [True, False])
def test_score_works(pipeline_class, objectives, use_ww, X_y_binary, X_y_multi, X_y_regression):
freddyaboulton (Author) commented Dec 10, 2020:

I think it's good to have a unit test where we pass data through a minimal time series pipeline and make sure nothing errors out. There's a lot of stuff happening in these pipelines lol. I'm setting n_jobs=1 here and the test takes 2.5 seconds to run so I think the overall impact on test runtime is small.

@jeremyliweishih (Collaborator) left a comment:

LGTM @freddyaboulton! Just left some questions and comments.


"""Pipeline base class for time series classifcation problems."""

def __init__(self, parameters, random_state=0):
"""Machine learning pipeline for time series regression problems made out of transformers and a classifier.
jeremyliweishih (Collaborator):

classification instead of regression

freddyaboulton (Author):

Good catch!



X_t = self._compute_features_during_fit(X, y)
if X_t.empty:
raise RuntimeError("Pipeline computed empty features during call to .fit. This means "
jeremyliweishih (Collaborator):

@freddyaboulton do we want behavior here to be the same as the current regression pipelines or should it be like after the fixes for #1517? I think this is fine for now but if we want to change behavior in the future we should add that info to #1517.

freddyaboulton (Author):

Good point! I think we should get this merged and then fix #1517 for all ts pipelines at once. I don't want the pipelines to have differing behavior if we make the changes needed for #1517 here and not simultaneously for regression.

@freddyaboulton force-pushed the 1474-classification-pipelines-for-ts branch 2 times, most recently from db0c862 to 6c3d569, on December 15, 2020 15:22
@bchen1116 (Contributor) left a comment:

Great work, Freddy! The logic for when to add padding versus not seemed challenging, but everything looks good to me! I left a couple of questions and a comment, but solid implementation and test coverage!

return X, y

def fit(self, X, y):
"""Fit a time series regression pipeline.
bchen1116 (Contributor):

classification

else:
return pd_data

mask = reduce(lambda a, b: np.logical_and(_not_nan(a), _not_nan(b)), data)
bchen1116 (Contributor):

This is cool! So this means that if the second DataFrame/Series in the sequence has a nan value in its last row, then the last value in the mask will be False? (screenshot omitted)

Might be nice to update the return docstring to say that each entry of the mask is True if and only if all corresponding entries at that index in data are non-nan, or something of the sort. I wasn't sure what this was supposed to do.

freddyaboulton (Author):

Yes exactly. Great suggestion on the docstring!
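To make that concrete, here's a simplified sketch of the mask computation (flattening the reduce from the snippet above; the data is made up):

```python
from functools import reduce

import numpy as np
import pandas as pd

def _not_nan(pd_data):
    # Per-row mask: True where the row contains no nans.
    if isinstance(pd_data, pd.Series):
        return ~pd_data.isna().values
    return ~pd_data.isna().any(axis=1).values

y_shifted = pd.Series([1.0, 0.0, np.nan])  # trailing nan from shifting by gap
y_pred = pd.Series([1.0, 0.0, 1.0])
y_pred_proba = pd.DataFrame({0: [0.2, 0.9, np.nan], 1: [0.8, 0.1, np.nan]})

mask = reduce(np.logical_and, [_not_nan(d) for d in (y_shifted, y_pred, y_pred_proba)])
print(mask)  # [ True  True False] -- the last row gets dropped everywhere
```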

mock_regressor_predict.side_effect = mock_predict
else:
mock_classifier_predict.side_effect = mock_predict

if only_use_y:
pl.fit(None, y)
bchen1116 (Contributor):

How does this work? Looking at the code for fit(), it seems like X is necessary to pass in, since you do call estimator.fit() on the computed features of X?

freddyaboulton (Author) commented Dec 16, 2020:

Great question! When X is None, it's assumed that the pipeline will create features from y before passing them to the estimator (typically via DelayedFeatureTransformer). We're getting rid of this assumption in #1517, however, because some (future) estimators will naturally support the X=None case.
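For intuition, a toy sketch of what "features from y alone" could look like (a hypothetical stand-in for DelayedFeatureTransformer, not its actual implementation):

```python
import pandas as pd

def delayed_target_features(y: pd.Series, max_delay: int) -> pd.DataFrame:
    # Every feature is a lagged copy of the target.
    return pd.DataFrame({f"target_delay_{d}": y.shift(d)
                         for d in range(1, max_delay + 1)})

y = pd.Series([10, 20, 30, 40, 50])
X_t = delayed_target_features(y, max_delay=2)
# target_delay_1: [NaN, 10, 20, 30, 40]
# target_delay_2: [NaN, NaN, 10, 20, 30]
```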

bchen1116 (Contributor):

Gotcha, that makes sense. Thanks!

@freddyaboulton force-pushed the 1474-classification-pipelines-for-ts branch from 6c3d569 to 8c87b40 on December 16, 2020 15:31
@freddyaboulton added the label: new feature (Features which don't yet exist.) on Dec 16, 2020
@freddyaboulton added this to the December 2020 milestone on Dec 16, 2020
@freddyaboulton force-pushed the 1474-classification-pipelines-for-ts branch from 8c87b40 to e65ddcf on December 17, 2020 17:52
@dsherry (Contributor) left a comment:

@freddyaboulton I left a few comments, got halfway through the impl so far :)

@@ -50,7 +50,7 @@ def fit(self, X, y):
def _encode_targets(self, y):
"""Converts target values from their original values to integer values that can be processed."""
try:
return pd.Series(self._encoder.transform(y))
return pd.Series(self._encoder.transform(y), index=y.index)
dsherry (Contributor):

Oh neat.

Do we need to do this elsewhere?

freddyaboulton (Author):

I know LightGBM and Catboost also use encoders internally. I'll make a note to see if we need to do something similar there and if so file an issue!

Great thinking!

klass = type(self).__name__
if not self._is_fitted:
raise PipelineNotYetFittedError(f'This {klass} is not fitted yet. You must fit {klass} before calling {method.__name__}.')
elif X is None and y is None:
return method(self)
elif y is None:
return method(self, X)
# For time series classification pipelines, predict will take X, y, objective
elif len(inspect.getfullargspec(method).args) == 4:
dsherry (Contributor):

@freddyaboulton does args list positional args only? Or does it include named args?

freddyaboulton (Author):

Positional arguments! So it includes any arguments that don't have * or ** in front of their name.
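For reference, a quick demonstration of that standard-library behavior (the classes here are made up for the example, not evalml's):

```python
import inspect

class TimeSeriesBinaryLike:
    def predict(self, X, y, objective=None): ...

class BinaryLike:
    def predict(self, X, objective=None): ...

# args lists self plus every named positional parameter (defaulted or not),
# but excludes *args and **kwargs.
print(inspect.getfullargspec(TimeSeriesBinaryLike.predict).args)  # ['self', 'X', 'y', 'objective']
print(inspect.getfullargspec(BinaryLike.predict).args)            # ['self', 'X', 'objective']
```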

# For time series classification pipelines, predict will take X, y, objective
elif len(inspect.getfullargspec(method).args) == 4:
return method(self, X, y, objective)
# For other pipelines, predict will take X, y or X, objective
dsherry (Contributor):

Is this correct? The else below doesn't pass along the objective

freddyaboulton (Author) commented Dec 17, 2020:

Yea, it's correct but confusing 😅.

For non-time-series classification pipelines, the arguments will be X and objective, so the y in this else is actually the objective the user passed in!

I left this comment earlier but I guess it got overwritten when I pushed new changes:

The problem is that TimeSeriesBinaryClassification.predict accepts X, y and objective. All other pipeline classes accept X and y (TimeSeriesRegressionPipeline) or X and objective (BinaryClassificationPipeline, MulticlassClassificationPipeline) or X (ClassificationPipeline).

I think adding this extra check for the number of arguments in predict is the lowest-touch way of handling the complexity (as opposed to adding another metaclass) especially since this class is not user-facing and we don't expect users to define their own pipeline base classes.

That being said, maybe we can refactor this metaclass definition to make it a little clearer what's going on?

"""
if "pipeline" not in parameters:
raise ValueError("gap and max_delay parameters cannot be omitted from the parameters dict. "
"Please specify them as a dictionary with the key 'pipeline'.")
dsherry (Contributor):

👍 awesome

X = pd.DataFrame()
X = _convert_to_woodwork_structure(X)
y = _convert_to_woodwork_structure(y)
return X, y
dsherry (Contributor):

Yep. I wonder if this would be good to move to PipelineBase or even utils, and use it throughout the codebase. @angela97lin FYI :)

angela97lin (Contributor):

Hm... could be, but the conversion to an empty DataFrame if X is None is a time series specific thing, no? In other cases, we'd probably want to raise something if the user passes None as X?


@@ -0,0 +1,204 @@
import pandas as pd
dsherry (Contributor):

@freddyaboulton typo in filename: evalml/pipelines/time_series_classifcation_pipelines.py --> evalml/pipelines/time_series_classification_pipelines.py

freddyaboulton (Author):

Great catch! 😬

X, y = self._convert_to_woodwork(X, y)
X = _convert_woodwork_types_wrapper(X.to_dataframe())
y = _convert_woodwork_types_wrapper(y.to_series())
n_features = max(len(y), X.shape[0])
dsherry (Contributor):

@freddyaboulton this is the same nan padding math we use for TS regression, yes?

freddyaboulton (Author):

Yes!

@angela97lin (Contributor) left a comment:

Left a few nit-picky comments but otherwise looks good! (Will come back and read more into impl later when I get the chance hehe)

parameters (dict): Dictionary with component names as keys and dictionary of that component's parameters as values.
An empty dictionary {} implies using all default values for component parameters. Pipeline-level
parameters such as gap and max_delay must be specified with the "pipeline" key. For example:
Pipeline(parameters={"pipeline": {"max_delay": 4, "gap": 2}}).
angela97lin (Contributor):

Great example, nice and clear :)

Pipeline(parameters={"pipeline": {"max_delay": 4, "gap": 2}}).
random_state (int, np.random.RandomState): The random seed/state. Defaults to 0.
"""
if "pipeline" not in parameters:
angela97lin (Contributor):

Really nit-picky, but what if a user passes in a "pipeline" key mapping to a dictionary with "gap" or "max_delay" missing? Ex: {"pipeline": {"gap": 1}}. I think we get a KeyError in the following lines, right?

freddyaboulton (Author):

Yep you're right!

freddyaboulton (Author):

We'll have to fix the same thing in regression pipelines so I'll just file an issue and change it for both after this is merged!


@@ -50,7 +50,7 @@ def fit(self, X, y):
def _encode_targets(self, y):
"""Converts target values from their original values to integer values that can be processed."""
try:
return pd.Series(self._encoder.transform(y))
return pd.Series(self._encoder.transform(y), index=y.index)
angela97lin (Contributor):

Curious, did this break / how did you catch this? If so, could be good to add a test for it if there isn't one already 😬

freddyaboulton (Author) commented Dec 17, 2020:

The time series tests caught this! We pass a y with a custom index there.
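For anyone curious, a small sketch of that failure mode (using sklearn's LabelEncoder and made-up data for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

y = pd.Series(["a", "b", "a"], index=[10, 11, 12])  # custom, non-default index
encoder = LabelEncoder().fit(y)

# Without passing the index, the encoded target gets a fresh RangeIndex, so
# later index-aligned operations against X or delayed features misalign.
bad = pd.Series(encoder.transform(y))                  # index [0, 1, 2]
good = pd.Series(encoder.transform(y), index=y.index)  # index [10, 11, 12]
print(bad.index.tolist(), good.index.tolist())
```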

@freddyaboulton merged commit 913919c into main on Dec 17, 2020
@freddyaboulton deleted the 1474-classification-pipelines-for-ts branch on December 17, 2020 20:46
@dsherry mentioned this pull request on Dec 29, 2020
Labels: new feature (Features which don't yet exist.)
Projects: None yet
Development: successfully merging this pull request may close the issue "Binary and Multiclass pipelines for time series" (#1474)
5 participants