Cap seasonal period for inclusion of STL in search #4146

Closed
eccabay opened this issue Apr 12, 2023 · 1 comment · Fixed by #4147

@eccabay
Contributor

eccabay commented Apr 12, 2023

The larger the seasonal period of a dataset, the longer STL takes to train. There's some wiggle room here, as STL takes longer to fit on a random dataset than on a perfectly seasonal dataset with the same period, but the relationship holds in general.

We have a few example datasets where the detected period is around 1800, but visual inspection shows no real seasonality at all. The pipelines that include STL also perform worse on average than those without the decomposer.

Therefore, we should impose a cap on the seasonal period: the STL decomposer is included in pipelines only when the detected period is within that cap.

@eccabay
Contributor Author

eccabay commented Apr 12, 2023

[Image: decomp_timing]
(There are axes on this graph, you just have to click the image to see them?)
Here is a plot of the time it takes to run STLDecomposer.fit() in the worst and best case scenarios: the data is totally random (worst) or perfectly seasonal (best).

With a standard run of search for time series, we run 10 datasets with the STLDecomposer, and within each pipeline we train it 3 times (once per CV split). With a dataset where it takes 30 seconds to fit the STLDecomposer once, we will spend 15 minutes (10 × 3 × 30 s = 900 s) fitting the decomposer across all datasets. For a random dataset, we hit that at a period of around 3700.

I propose that we set the bar much lower, at a seasonal period of 1000. From my testing, that takes 3s/fit in the best case and 8s/fit in the worst. That means we will spend at most 1.5 minutes (best case) to 4 minutes (worst case) in total just fitting the STLDecomposer.
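For reference, the time budget above can be reproduced with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the counts (10 datasets, 3 CV splits) and per-fit timings are the ones quoted in this comment.

```python
# Assumed constants, taken from the numbers quoted in this comment.
N_DATASETS = 10   # datasets run with the STLDecomposer in a standard search
N_CV_SPLITS = 3   # fits per pipeline, one per CV split


def total_fit_seconds(seconds_per_fit: float) -> float:
    """Total time spent fitting the decomposer across a standard search run."""
    return N_DATASETS * N_CV_SPLITS * seconds_per_fit


# Hypothetical helper name; the per-fit timings are the measurements above.
for label, secs in [
    ("30 s/fit", 30),
    ("best case at period 1000", 3),
    ("worst case at period 1000", 8),
]:
    print(f"{label}: {total_fit_seconds(secs) / 60:.1f} min")
```

This gives 15 minutes at 30 s/fit, and 1.5 and 4 minutes for the best and worst cases at a period of 1000.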

Conceptually, this limit is reasonable as seasonal periods that large are rare. We will need to run performance tests on this before merging, but I'm confident this will be exclusively a performance improvement.
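A minimal sketch of what the check could look like, assuming the cap is applied to the detected period before the decomposer is chosen. The names `STL_MAX_SEASONAL_PERIOD` and `should_include_stl` are hypothetical and not part of evalml, and treating a period exactly equal to the cap as allowed is also an assumption.

```python
# Hypothetical constant: the cap proposed in this comment, not an evalml setting.
STL_MAX_SEASONAL_PERIOD = 1000


def should_include_stl(detected_period: int, cap: int = STL_MAX_SEASONAL_PERIOD) -> bool:
    """Return True if the STL decomposer should be included in pipelines.

    Assumption: a period equal to the cap is still allowed.
    """
    return detected_period <= cap


# Periods mentioned in this thread: 1800 (no visible seasonality), 3700 (30 s/fit).
for period in (7, 1000, 1800, 3700):
    print(period, should_include_stl(period))
```

With this cap, the datasets with a detected period around 1800 would skip STL entirely.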
