-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get PolynomialDecomposer to Parity with STLDecomposer #3768
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3768 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 343 343
Lines 35680 35696 +16
=======================================
+ Hits 35544 35560 +16
Misses 136 136
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@@ -16,6 +16,10 @@ | |||
decomposer_list = [STLDecomposer, PolynomialDecomposer] | |||
|
|||
|
|||
def get_trend_dataframe_format_correct(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moved this up from the child class test modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this test testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more a utility for the tests to use, just to make sure that get_trend_df()
returns a dataframe with the proper columns. get_trend_dataframe_format_correct()
will be used in the get_trend_dataframe()
tests.
@@ -34,6 +38,21 @@ def test_set_time_index(decomposer_child_class): | |||
assert isinstance(y_time_index.index, pd.DatetimeIndex) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all this up from child class test modules.
@@ -481,3 +500,125 @@ def test_decomposer_bad_target_index( | |||
match="doesn't support target data with index of type", | |||
): | |||
dec._choose_proper_index(y) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all this up from child class test modules....
pd.Series(np.zeros(len(output_y_t))).set_axis(y_new.index), | ||
output_y_t, | ||
check_exact=False, | ||
atol=0.1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did have to loosen this tolerance (atol
) here a bit. Part of the problem was that the different decomposers are able to more or less properly capture the trend. Based on how much of the trend they capture, the answers will be closer or further away from a horizontal line. Maybe this isn't a good way to test the transform...you tell me!
output_y, | ||
check_dtype=False, | ||
check_names=False, | ||
atol=0.2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tolerance was still good enough for both decomposers.
period=seasonal_period, | ||
freq_str="D", | ||
set_time_index=True, | ||
seasonal_scale=0.05, # Increasing this value causes the decomposer to miscalculate trend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to try and get the decomposer to perform a little better and do a better job capturing the trend. This just reduces the amplitude of the sinusoidal oscillations added to the synthetic signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does increasing seasonal_scale cause the decomposer to miscalculate the trend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That has yet to be determined. What we've noticed in some datasets is that there is a tendency for the periodic signal to "leak" into the trend. Loosely, I've noticed that a larger relative seasonal signal vs. trend exacerbates this to a degree.
) | ||
|
||
|
||
def build_test_target(subset_y, seasonal_period, transformer_fit_on_data, to_test): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has been moved up to the base test module class unchanged.
@@ -348,25 +326,3 @@ def test_polynomial_decomposer_uses_time_index( | |||
assert not isinstance(y.index, pd.DatetimeIndex) | |||
else: | |||
assert isinstance(y.index, pd.DatetimeIndex) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all this up to child class test modules in the base STLDecomposer PR. This just needed to be removed.
797ea73
to
4084bf4
Compare
0902f77
to
d535e7e
Compare
@@ -95,6 +95,7 @@ def fit(self, X: pd.DataFrame, y: pd.Series = None) -> PolynomialDecomposer: | |||
""" | |||
self.original_index = y.index if y is not None else None | |||
X, y = self._check_target(X, y) | |||
self._map_dt_to_integer(self.original_index, y.index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent behind this function is to build a mapping between integer indices and datetime indices and store it so that it's easier to handle out of sample calls to inverse_transform()
that have integer indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_map_dt_to_integer
is called in both STL and PD but _convert_int_index_to_dt_index
is only called for the PD? What's the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, to be fair, it's probably not needed in STL. Yet. It kind of depends on how we want to move forward with standardizing the fit functions and what attributes we want to be defined in the class. I sort of want to do an additional PR to completely standardize both fit functions and see if I can pass that up into.
trend = self._component_obj.inverse_transform( | ||
y_t_in_sample.set_axis(y_t_dt_ind), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been the main offender and is one of the main differences between the STLDecomposer
and PolynomialDecomposer
. With respect to the trend, the PD's underlying object has the same fit/transform/inverse_transform API that our components do, so it fits well. The STLD does not have this so requires a lot of code to do manual trend projection. I believe that both required the detrended target signal to have datetime indices, which is challenging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to clarify - this new block ensures that both STL and PD are using the same logic when calculating inverse_transform
and not depend on PD's native inverse_transform
?
222977d
to
5ff4982
Compare
0e17897
to
1853983
Compare
1853983
to
39e6913
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some random questions but I think overall it seems to be on track, nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM - seeing the tests pass like this gives me good confidence in parity 👍
trend = self._component_obj.inverse_transform( | ||
y_t_in_sample.set_axis(y_t_dt_ind), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to clarify - this new block ensures that both STL and PD are using the same logic when calculating inverse_transform
and not depend on PD's native inverse_transform
?
A follow up PR to move more of the tests over to
test_decomposer.py
and parametrize them between the two child decomposers in order to prepare for more decomposers.