Get PolynomialDecomposer to Parity with STLDecomposer#3768
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3768 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 343 343
Lines 35680 35696 +16
=======================================
+ Hits 35544 35560 +16
Misses 136 136
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
| decomposer_list = [STLDecomposer, PolynomialDecomposer] | ||
|
|
||
|
|
||
| def get_trend_dataframe_format_correct(df): |
There was a problem hiding this comment.
Just moved this up from the child class test modules.
There was a problem hiding this comment.
What is this test testing?
There was a problem hiding this comment.
It's more a utility for the tests to use, just to make sure that get_trend_df() returns a dataframe with the proper columns. get_trend_dataframe_format_correct() will be used in the get_trend_dataframe() tests.
| assert isinstance(y_time_index.index, pd.DatetimeIndex) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Moved all this up from child class test modules.
| dec._choose_proper_index(y) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Moved all this up from child class test modules....
| pd.Series(np.zeros(len(output_y_t))).set_axis(y_new.index), | ||
| output_y_t, | ||
| check_exact=False, | ||
| atol=0.1, |
There was a problem hiding this comment.
I did have to loosen this tolerance (atol) here a bit. Part of the problem was that the different decomposers are able to more or less properly capture the trend. Based on how much of the trend they capture, the answers will be closer or further away from a horizontal line. Maybe this isn't a good way to test the transform...you tell me!
| output_y, | ||
| check_dtype=False, | ||
| check_names=False, | ||
| atol=0.2, |
There was a problem hiding this comment.
This tolerance was still good enough for both decomposers.
| period=seasonal_period, | ||
| freq_str="D", | ||
| set_time_index=True, | ||
| seasonal_scale=0.05, # Increasing this value causes the decomposer to miscalculate trend |
There was a problem hiding this comment.
I added this to try and get the decomposer to perform a little better and do a better job capturing the trend. This just reduces the amplitude of the sinusoidal oscillations added to the synthetic signal.
There was a problem hiding this comment.
Why does increasing seasonal_scale cause the decomposer to miscalculate the trend?
There was a problem hiding this comment.
That has yet to be determined. What we've noticed in some datasets is that there is a tendency for the periodic signal to "leak" into the trend. Loosely, I've noticed that a larger relative seasonal signal vs. trend exacerbates this to a degree.
| ) | ||
|
|
||
|
|
||
| def build_test_target(subset_y, seasonal_period, transformer_fit_on_data, to_test): |
There was a problem hiding this comment.
This function has been moved up to the base test module class unchanged.
| assert isinstance(y.index, pd.DatetimeIndex) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Moved all this up to child class test modules in the base STLDecomposer PR. This just needed to be removed.
797ea73 to
4084bf4
Compare
0902f77 to
d535e7e
Compare
| """ | ||
| self.original_index = y.index if y is not None else None | ||
| X, y = self._check_target(X, y) | ||
| self._map_dt_to_integer(self.original_index, y.index) |
There was a problem hiding this comment.
The intent behind this function is to build a mapping between integer indices and datetime indices and store it so that it's easier to handle out of sample calls to inverse_transform() that have integer indices.
There was a problem hiding this comment.
_map_dt_to_integer is called in both STL and PD but _convert_int_index_to_dt_index is only called for the PD? What's the reason for this?
There was a problem hiding this comment.
Yea, to be fair, it's probably not needed in STL. Yet. It kind of depends on how we want to move forward with standardizing the fit functions and what attributes we want to be defined in the class. I sort of want to do an additional PR to completely standardize both fit functions and see if I can pass that up into.
| trend = self._component_obj.inverse_transform( | ||
| y_t_in_sample.set_axis(y_t_dt_ind), | ||
| ) |
There was a problem hiding this comment.
This has been the main offender and is one of the main differences between the STLDecomposer and PolynomialDecomposer. With respect to the trend, the PD's underlying object has the same fit/transform/inverse_transform API that our components do, so it fits well. The STLD does not have this so requires a lot of code to do manual trend projection. I believe that both required the detrended target signal to have datetime indices, which is challenging.
There was a problem hiding this comment.
just to clarify - this new block ensures that both STL and PD are using the same logic when calculating inverse_transform and not depend on PD's native inverse_transform?
222977d to
5ff4982
Compare
0e17897 to
1853983
Compare
1853983 to
39e6913
Compare
ParthivNaresh
left a comment
There was a problem hiding this comment.
Left some random questions but I think overall it seems to be on track, nice work!
jeremyliweishih
left a comment
There was a problem hiding this comment.
This LGTM - seeing the tests pass like this gives me good confidence in parity 👍
| trend = self._component_obj.inverse_transform( | ||
| y_t_in_sample.set_axis(y_t_dt_ind), | ||
| ) |
There was a problem hiding this comment.
just to clarify - this new block ensures that both STL and PD are using the same logic when calculating inverse_transform and not depend on PD's native inverse_transform?
A follow up PR to move more of the tests over to
test_decomposer.pyand parametrize them between the two child decomposers in order to prepare for more decomposers.