Clean up decomposer index logic #3829
Codecov Report
@@ Coverage Diff @@
## main #3829 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 344 344
Lines 36080 36116 +36
=======================================
+ Hits 35942 35979 +37
+ Misses 138 137 -1
@@ -1072,7 +1072,7 @@ def test_binary_predict_pipeline_objective_mismatch(
ValueError,
match="Objective Precision Micro is not defined for time series binary classification.",
):
binary_pipeline.predict(X[30:32], "precision micro", X[:30], y[:30])
What's this change for? Is it because we can't infer frequencies with only two values? Should we handle this somewhere?
That's correct - pandas needs 3 or more values to infer a frequency. I'm not sure where we could handle this, though.
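For reference, a minimal sketch of the pandas behavior described above. It is not code from this PR; the helper name `try_infer` and the sample dates are illustrative assumptions. `pd.infer_freq` raises a `ValueError` when given fewer than three timestamps, which is why a two-row slice cannot yield a frequency.

```python
import pandas as pd

# Two timestamps: too few for pandas to infer a frequency.
two_dates = pd.DatetimeIndex(["2021-01-01", "2021-01-02"])
# Three evenly spaced timestamps: enough to infer a daily frequency.
three_dates = pd.DatetimeIndex(["2021-01-01", "2021-01-02", "2021-01-03"])


def try_infer(index):
    """Hypothetical helper: return the inferred frequency, or None on failure."""
    try:
        return pd.infer_freq(index)
    except ValueError:
        # pandas raises ValueError when fewer than 3 dates are supplied.
        return None


freq_two = try_infer(two_dates)
freq_three = try_infer(three_dates)
```

Here `freq_two` ends up as `None` because the call raised, while `freq_three` holds the inferred daily frequency string.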
OK, fair enough. How does this affect our ability to forecast out 1 unit? Is it still possible?
@eccabay I thought we infer from y_train in fit? How does this predict line play into that?
@jeremyliweishih we infer from y_train in the decomposer's fit, so the pipeline has no frequency knowledge.
@fjlanasa that was a good catch - this change would have prevented us from forecasting out 1 unit. Fortunately, the frequency detection isn't actually required within drop_time_index thanks to the other changes here, so I've just reverted it.
LGTM just a little suggestion!
if X.ww.schema is not None:
X = X.ww.drop([self.time_index])
X.ww.set_index(self.time_index)
X = X.ww.drop(self.time_index)
I thought setting the index dropped the column in woodwork as well? I might be remembering incorrectly though!
Looking at the _set_underlying_index() function here, it looks like it doesn't drop it automatically.
X, _, y = ts_data()
datetime_index = pd.date_range(start="01-01-2002", periods=len(X), freq="M")
datetime_index.freq = None
Should we check the case where there is a freqstr as well?
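To illustrate the two cases being discussed, here is a small sketch in plain pandas (not the test from this PR). It assumes a month-start index of 12 periods, chosen only for illustration, and shows an index that carries a `freqstr` next to the same index after `freq` is cleared, where the frequency has to be inferred from the values instead.

```python
import pandas as pd

# Index created with an explicit frequency, so freqstr is populated.
idx = pd.date_range(start="2002-01-01", periods=12, freq="MS")
declared = idx.freqstr

# Clearing the frequency mimics data that arrives without frequency metadata.
idx.freq = None
cleared = idx.freqstr

# With enough evenly spaced values, pandas can still recover the frequency.
inferred = pd.infer_freq(idx)
```

A test covering both branches would assert on the declared case (`freqstr` present) and on the cleared case (`freqstr` is `None`, frequency recovered by inference).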
Ensures we set y's time index as well as X's during drop_time_index, and saves the frequency in STLDecomposer so we don't mis-assume the frequency when determining the seasonality offset in inverse_transform.