The larger a dataset's seasonal period, the longer STL takes to train. There's some wiggle room here: given the same period, STL takes longer to fit on a random dataset than on a perfectly seasonal one, but the relationship holds in general.
We have a few example datasets where the detected period is around 1800, yet visual inspection shows no real seasonality at all. Pipelines that include STL also perform worse on average than those without the decomposer.
Therefore, we should cap how large the seasonal period can be for the STL decomposer to be included in pipelines.
Here is a plot of the time it takes to run STLDecomposer.fit() in the worst- and best-case scenarios: totally random data (worst) and perfectly seasonal data (best).
With a standard run of search for time series, we run 10 datasets with the STLDecomposer, and within each pipeline we train it 3 times (once per CV split). With a dataset where it takes 30 seconds to fit the STLDecomposer once, we will spend 15 minutes fitting the decomposer across all datasets. For a random dataset, we hit that at a period of around 3700.
I propose that we set the bar much lower, at a seasonal period of 1000. From my testing, that takes 3 s/fit in the best case and 8 s/fit in the worst, which means we will spend at most 1.5-4 minutes just fitting the STLDecomposer.
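The fit-time budget above follows from quick arithmetic; the dataset and CV-split counts are the ones stated in this issue:

```python
# Fit counts from a standard time series search run (per this issue).
N_DATASETS = 10
CV_FITS_PER_DATASET = 3  # the decomposer is fit once per CV split
TOTAL_FITS = N_DATASETS * CV_FITS_PER_DATASET  # 30 fits total


def total_minutes(seconds_per_fit):
    """Total wall time spent fitting STLDecomposer across the whole search."""
    return TOTAL_FITS * seconds_per_fit / 60


print(total_minutes(30))  # ~30 s/fit at period ~3700 on random data -> 15.0 min
print(total_minutes(3))   # proposed cap of 1000, best case  -> 1.5 min
print(total_minutes(8))   # proposed cap of 1000, worst case -> 4.0 min
```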
Conceptually, this limit is reasonable, as seasonal periods that large are rare. We will need to run performance tests on this before merging, but I'm confident this will be purely a performance improvement.
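The proposed gate could look something like the sketch below. Note that `MAX_STL_PERIOD` and `should_include_stl` are hypothetical names for illustration, not existing evalml API:

```python
# Illustrative sketch only: MAX_STL_PERIOD and should_include_stl are
# hypothetical names, not existing evalml API.
MAX_STL_PERIOD = 1000


def should_include_stl(detected_period):
    """Include the STL decomposer in pipelines only when a seasonal
    period was detected and it falls under the proposed cap."""
    return detected_period is not None and detected_period <= MAX_STL_PERIOD


print(should_include_stl(365))   # ordinary yearly period on daily data
print(should_include_stl(1800))  # the spurious detected periods described above
```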