Correct seasonal decomposition #18
@yy910616 I have done the easy part and put in a placeholder for STLDecomposition. One thing I noticed is that the extraction of trend and seasonal components in STL decomposition is coupled, so the model cannot be fit first and then used to predict. (By contrast, classic decomposition can fit the seasonal periodic pattern first and then reuse it in prediction, because detrending and deseasonalizing are two separate steps.) It means that the core algorithm of STLDecomposition should go to
Hi @tailaiw, the documentation looks really good to me.
I'm still unfamiliar with large parts of your code base, so I have no input there.
I'm not entirely sure it was necessary to introduce a breaking change just to rename "Naive" to "Classic", since they seem similar enough to me. But that's a very minor issue, and I'll leave that up to you.
For "Implement the real STLDecomposition", I have a couple of notes. @tailaiw Please let me know what you think:
Now for the STL algorithm, I've found some concerning posts about STLComposition:
I'm not sure what the next step should be. We could proceed to implement the LowessSmoothing (trend) + autocorrelation (seasonality) method, but we would have to test it against some seasonal+trend data. Or we could shelve this method for now and wait to see whether statsmodels' decompose method will incorporate STLDecompose. Let me know what you think.
@yy910616 It is fine if we want to call
We assume that ADTK users fall into two groups. The first says: I have a time series, and I want to find anomalies in it. The second says: I have a historical time series (say, the temperature over the past few years), and I want to train a model on it so that the model can help me find anomalies in the future (say, in future temperatures). For the first user group, they only need
In classic seasonal decomposition, we assume that what the model needs to learn is the seasonal pattern, which does not change over time (from past to future). The trend is something you always need to extract on the fly. Under this assumption, classic decomposition can serve both user groups. In particular, for the second group, the model learns the seasonal pattern from historical data, and when the model is applied to new data, the seasonal pattern is removed directly, because it is irrelevant to the process of extracting the trend. In STL, however, extracting the trend from new data is automatically bundled with extracting the seasonal pattern. The second group's assumption that the seasonal pattern doesn't change over time is then violated, and it is unclear what we should do. So my proposal is putting everything of STL in
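To make the fit-then-apply idea for the second user group concrete, here is a minimal sketch in plain Python. The class name `SeasonalPatternSketch`, its methods, and the `start_phase` parameter are hypothetical and are not ADTK's API. It also assumes the historical series has no trend, which is a simplification of what a real implementation would do.

```python
from statistics import mean


class SeasonalPatternSketch:
    """Hypothetical fit-then-apply helper; not ADTK's actual API."""

    def __init__(self, period):
        self.period = period
        self.pattern = None

    def fit(self, history):
        # Assumption: the history is trend-free, so the mean at each phase
        # minus the overall mean approximates the seasonal pattern.
        overall = mean(history)
        self.pattern = [
            mean(history[phase::self.period]) - overall
            for phase in range(self.period)
        ]
        return self

    def remove_seasonal(self, new_values, start_phase=0):
        # Subtract the stored pattern from new data. The trend in new_values
        # is left untouched and can be extracted afterwards, on the fly.
        return [
            value - self.pattern[(start_phase + i) % self.period]
            for i, value in enumerate(new_values)
        ]


model = SeasonalPatternSketch(period=3).fit([1, 2, 3] * 4)
deseasonalized = model.remove_seasonal([11, 12, 13, 21, 22, 23])
```

The learned pattern is reused unchanged on the new data, which is exactly the step that has no counterpart in STL, where trend and seasonal extraction happen together.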
Regarding the STL algorithm implementation, I also noticed the package by @jrmontag. It seems to be the only reasonably robust package I can find. But I notice that statsmodels is developing its own STL, and it looks like it may come with the v0.11 release (hopefully soon?). One of our core developers happens to know @jrmontag, and according to him the statsmodels implementation is expected to be more robust. So what do you think about holding off on this issue until their release?
👋 howdy! just to add a bit more context here, I'd encourage you to prefer the (forthcoming?) statsmodels implementation over the STLDecompose library that I authored. While I don't have any issue, per se, with folks building on top of STLDecompose, it was a bit of a hacky project and I'm confident you'll have better support in the longer term using whatever comes out of the statsmodels authors. Feel free to use any/all/none of STLDecompose as inspiration, but I hope the statsmodels release will have a friendlier API than my version which is more of a "reach inside the internals and tie some pieces together" implementation 😄
@jrmontag Thanks for sharing your insights. That matches our understanding.
@yy910616 Great. Here is what I will do. I will clean up the code and docs (including merging the new API introduced in the master branch by #19) and keep only the classic decomposition for now. I will remove the STL placeholder for now. When it is ready, I will request your review. Upon your approval, I will close this PR. But I will keep issue #13 open for a future implementation of STLDecomposition (after a robust STL is released in statsmodels). Does that sound good to you?
@tailaiw Thank you so much for checking and working on this. I currently have no project that depends on STL, so feel free to make the changes. We've been using the package internally, and it has been giving us some great insights. Please let me know if you need help anywhere!
As @yy910616 pointed out in #13, our implementation of STL decomposition was wrong and it was basically classic seasonal decomposition (trend as moving average, seasonal as average across periods, plus residual). In this PR, we want to correct it.
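For reference, the classic decomposition described above (trend as moving average, seasonal as average across periods, plus residual) can be sketched in plain Python as follows. This is only an illustration, not the code in this PR: the function name `classic_decompose` is hypothetical, the model is assumed to be additive, and the period is assumed to be odd so that a simple centered window can be used.

```python
from statistics import mean


def classic_decompose(values, period):
    """Additive classic decomposition sketch (assumes an odd period)."""
    half = period // 2
    n = len(values)
    # Trend: centered moving average; undefined (None) near the edges.
    trend = [
        mean(values[i - half:i + half + 1]) if half <= i < n - half else None
        for i in range(n)
    ]
    # Detrended values, kept with their index so the phase can be recovered.
    detrended = [
        (i, v - t) for i, (v, t) in enumerate(zip(values, trend)) if t is not None
    ]
    # Seasonal: average detrended value at each phase across all periods,
    # shifted so the pattern averages to zero over one period.
    phase_means = [
        mean(d for i, d in detrended if i % period == phase)
        for phase in range(period)
    ]
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    # Residual: what is left after removing trend and seasonal components.
    residual = [
        v - t - s if t is not None else None
        for v, t, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Linear trend plus a repeating [-1, 0, 1] pattern with period 3.
series = [i + [-1, 0, 1][i % 3] for i in range(12)]
trend, seasonal, residual = classic_decompose(series, period=3)
```

Here the trend and the seasonal pattern are computed in two separate steps, which is why the seasonal part can be stored and reused later.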
Implement the real STLDecomposition
Updated: for item 2, we decided to hold off until the new statsmodels release (details in the thread).