Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix feature importance for time series baseline #3285

Merged
merged 2 commits into from
Jan 31, 2022

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fixes #3284 , Fixes #3282


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Jan 26, 2022

Codecov Report

Merging #3285 (4a0e79c) into main (e2bca54) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3285     +/-   ##
=======================================
+ Coverage   99.8%   99.8%   +0.1%     
=======================================
  Files        322     322             
  Lines      31600   31614     +14     
=======================================
+ Hits       31510   31524     +14     
  Misses        90      90             
Impacted Files Coverage Δ
evalml/tests/component_tests/test_components.py 99.3% <ø> (ø)
...ators/regressors/time_series_baseline_estimator.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.6% <100.0%> (+0.1%) ⬆️
...peline_tests/test_time_series_baseline_pipeline.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2bca54...4a0e79c. Read the comment docs.

@@ -91,6 +95,8 @@ def predict(self, X):
"Time Series Baseline Estimator is meant to be used in a pipeline with "
"a Time Series Featurizer"
)
self._num_features = X.shape[1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically don't set fields like this in predict but since the feature_name is not used until predict, it seemed better to keep this logic together.

Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left one comment but nothing blocking.

@@ -253,7 +253,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The pipeline found by AutoMLSearch has a 268% improvement over the naive forecast!"
"The pipeline found by AutoMLSearch has a 31% improvement over the naive forecast!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm hesitant on putting numeric values for these since if we change/add components or change the way things are calculated, these numbers will likely change. Not blocking, but something to keep in mind!

@tamargrey
Copy link
Contributor

Ran this branch with my time series demo, and I'm seeing these feature importances for the baseline
image
Should target_rolling_mean be there? And if so, how does it have 0 importance when, in the other pipelines, it's one of the most important features?

@ParthivNaresh
Copy link
Contributor

ParthivNaresh commented Jan 28, 2022

Ran this branch with my time series demo, and I'm seeing these feature importances for the baseline image Should target_rolling_mean be there? And if so, how does it have 0 importance when, in the other pipelines, it's one of the most important features?

We include the target_rolling_mean from the TimeSeriesFeaturizer if the target is numeric.

Also with the baseline pipeline, predictions are based entirely off of the delayed feature, so we'd have to set that to 1 I believe.

Copy link
Contributor

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks solid!

@freddyaboulton freddyaboulton force-pushed the 3282-fix-feature-importance-ts-estimator branch from a589833 to 4a0e79c Compare January 31, 2022 19:21
@tamargrey
Copy link
Contributor

We include the target_rolling_mean from the TimeSeriesFeaturizer if the target is numeric.

Also with the baseline pipeline, predictions are based entirely off of the delayed feature, so we'd have to set that to 1 I believe.

So If I'm understanding correctly: The target_rolling_mean feature shows up in the baseline pipeline importances because its added via the TimeSeriesFetureizer even though it's not actually included in the model and should always have zero importance?

@freddyaboulton
Copy link
Contributor Author

@tamargrey Exactly. The estimator technically "sees" the rolling mean feature, but it doesn't use it, so it gets an importance of 0.

@freddyaboulton freddyaboulton merged commit a18efa8 into main Jan 31, 2022
@freddyaboulton freddyaboulton deleted the 3282-fix-feature-importance-ts-estimator branch January 31, 2022 21:44
@chukarsten chukarsten mentioned this pull request Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants