Improve automatic periodicity determination#3912
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3912 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 347 347
Lines 36755 36745 -10
=======================================
- Hits 36625 36619 -6
+ Misses 130 126 -4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
jeremyliweishih
left a comment
There was a problem hiding this comment.
Some clarification questions and quick fixes but otherwise looks good to me!
| """Determines the relative maxima of the target's autocorrelation.""" | ||
| acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y))) | ||
| filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))] | ||
| filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))] |
There was a problem hiding this comment.
curious why we need this change!
There was a problem hiding this comment.
Great question! We were seeing that datasets with no seasonality were still getting a result due to having some very small autocorrelations. This is to filter those out so we correctly say these datasets have no seasonality. Datasets with a seasonal period will be more on the scale of 0.1-0.6, so we aren't filtering out anything significant here.
There was a problem hiding this comment.
We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.
chukarsten
left a comment
There was a problem hiding this comment.
Just a few minor comments! Nice work!!
| """Determines the relative maxima of the target's autocorrelation.""" | ||
| acf = sm.tsa.acf(y, nlags=np.maximum(400, len(y))) | ||
| filter_acf = [acf[i] if (acf[i] > 0) else 0 for i in range(len(acf))] | ||
| filter_acf = [acf[i] if (acf[i] > 0.01) else 0 for i in range(len(acf))] |
There was a problem hiding this comment.
We might want this 0.01 to be a variable that's defined in the decomposer. I think that's probably a better practice.
| """Uses a moving average to determine the target's trend and remove it.""" | ||
| # A larger moving average will be less likely to remove the seasonal signal | ||
| # but we need to make sure we're not passing in a window that's larger than the data | ||
| moving_avg = min(51, len(y) // 3) |
There was a problem hiding this comment.
The 51 and 3 seem a little arbitrary and kind of trigger my "should this be hardcoded" sense, but I think in this case it's probably fine.
Simplifies and improves performance of our
decomposer.determine_periodmethod by removing the need to pre-fit the model for detrending, using a moving average to detrend instead.