[timeseries] Minor bugfixes & improvements for local forecasting models #3252

Merged
merged 5 commits into autogluon:master from the fix-local-models branch
Jun 1, 2023

Conversation

shchur
Collaborator

@shchur shchur commented May 31, 2023

Description of changes:

  • Expose use_fallback_model as an optional hyperparameter for all local models (default True). When set to False, the fallback model is disabled and any exception raised by the underlying model propagates. This is important for testing: one model always failed because of a bug, but CI did not catch this because the fallback model masked the failure.
  • Fix typos in docstrings.
  • All local models are now trained on at most the last 2500 entries of each time series. This significantly reduces the training time without degrading the accuracy:
    (two images attached in the original PR)
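
The two behavioral changes listed above can be illustrated with a minimal sketch. This is not the AutoGluon implementation: the function names `truncate_series` and `predict_one`, and the dict-of-lists data layout, are assumptions made for illustration. Only the hyperparameter names `use_fallback_model` and `max_ts_length` and their defaults come from this PR.

```python
def truncate_series(series_by_item, max_ts_length=2500):
    """Keep at most the last `max_ts_length` entries of each time series.

    `series_by_item` is a hypothetical mapping of item id -> list of values,
    assumed to be ordered from oldest to newest.
    """
    return {
        item_id: values[-max_ts_length:]
        for item_id, values in series_by_item.items()
    }


def predict_one(series, model_fn, fallback_fn, use_fallback_model=True):
    """Forecast one series, optionally falling back when the model fails.

    `model_fn` and `fallback_fn` are hypothetical callables standing in for
    the local model and the fallback model.
    """
    try:
        return model_fn(series)
    except Exception:
        if not use_fallback_model:
            # Let the original error propagate so that tests can catch it.
            raise
        return fallback_fn(series)
```

With `use_fallback_model=False`, a model that always fails raises its error immediately instead of silently returning the fallback forecast, which is the property the tests rely on.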

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@tonyhoo
Collaborator

tonyhoo commented May 31, 2023

Thank you for addressing the issue. I have a question: why doesn't our CI detect the local model's failure and subsequent use of the naive model, given that this occurs consistently? Does this suggest that the naive model's performance is on par with the local model, or is the issue specific to the dataset being used?

@@ -71,6 +68,10 @@ def __init__(
            self.n_jobs = n_jobs
        else:
            raise ValueError(f"n_jobs must be a float between 0 and 1 or an integer (received n_jobs = {n_jobs})")
        # Default values, potentially overridden inside _fit()
        self.use_fallback_model = True
        self.max_ts_length = 2500
Collaborator

Should we adjust it based on the frequency? For example, for minutely data, 2500 entries cover less than 2 days, which is not enough to capture weekly trends or seasonality.

Contributor

+1 on at least making it customizable

Collaborator Author

@tonyhoo The models that we currently use (AutoETS, AutoARIMA, Theta, ETS, ARIMA, Naive, SeasonalNaive) cannot capture multiple seasonalities anyway: they only capture the seasonality at the seasonal_period that we provide. This is at most 24 * 60 = 1440 for minutely data, but usually much smaller (<= 24).

I think that the example you described would apply to models like MSTL that consider multiple seasonalities, and I agree that we would need to increase the max_ts_length for such models if we add them.
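
The arithmetic behind this exchange can be checked with a short sketch. The helper names `history_span` and `seasonal_cycles` are hypothetical; only the numbers 2500, 1440 and 24 come from the discussion.

```python
from datetime import timedelta


def history_span(max_ts_length, step):
    """Wall-clock time covered by `max_ts_length` observations spaced `step` apart."""
    return max_ts_length * step


def seasonal_cycles(max_ts_length, seasonal_period):
    """Number of full or partial seasonal cycles contained in the retained history."""
    return max_ts_length / seasonal_period


# Minutely data: 2500 observations cover less than two days ...
minutely_span = history_span(2500, timedelta(minutes=1))
print(minutely_span < timedelta(days=2))

# ... which is still more than one cycle at the largest seasonal_period (1440),
# and over a hundred cycles at seasonal_period = 24.
print(seasonal_cycles(2500, 1440))
print(seasonal_cycles(2500, 24))
```

Under these assumptions the retained window always contains at least one full cycle of the single seasonal period the current models use, while a weekly pattern in minutely data (7 * 1440 = 10080 steps) would not fit.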

@gradientsky I've moved the parameter override code from the _fit method to __init__.

Number of CPU cores used to fit the models in parallel.
When set to a float between 0.0 and 1.0, that fraction of available CPU cores is used.
When set to a positive integer, that many cores are used.
When set to -1, all CPU cores are used.
max_ts_length : int, default = 2500
Collaborator

same here
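
For reference, the n_jobs semantics described in the docstring excerpt above can be sketched as follows. This is an illustrative reading of the docstring, not the code from the PR: the function name `resolve_n_jobs`, the `cpu_count` argument, the treatment of the interval boundaries, and the rejection of other integers are all assumptions.

```python
import os


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Translate an n_jobs setting into a concrete number of CPU cores."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_jobs, bool):
        raise ValueError(f"n_jobs must be a float between 0 and 1 or an integer (received n_jobs = {n_jobs})")
    if isinstance(n_jobs, float) and 0 < n_jobs <= 1:
        # Fraction of the available cores, but never fewer than one.
        return max(int(cpu_count * n_jobs), 1)
    if isinstance(n_jobs, int) and n_jobs == -1:
        return cpu_count  # -1 means "use all cores"
    if isinstance(n_jobs, int) and n_jobs > 0:
        return n_jobs
    raise ValueError(f"n_jobs must be a float between 0 and 1 or an integer (received n_jobs = {n_jobs})")
```

For example, on a machine with 8 cores this sketch maps `0.5` to 4 cores, `-1` to 8 cores, and `3` to 3 cores.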

@github-actions

Job PR-3252-05a8937 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3252/05a8937/index.html

@@ -3,6 +3,7 @@
import pandas as pd
import pytest
Collaborator Author

@tonyhoo Regarding the tests for model performance: currently, CI for time series does not include any regression tests. The reasoning was that the models were changing quite frequently, and keeping track of the performance ranges for individual models would be tedious. However, I agree that we should now look into adding these tests, as the model set is becoming more stable. Do you think it's fine if we add these tests after v0.8, or does this have higher priority and we should do it asap?

Collaborator

We can add these after 0.8 and make sure they are incorporated into the benchmark and dashboard project.
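
As a rough illustration of the kind of performance regression test discussed in this thread, the sketch below checks that a forecaster's error on a fixed synthetic dataset stays below a recorded tolerance. Everything here is hypothetical: the seasonal naive forecaster is a stand-in written from scratch, and the dataset, the metric, and the 0.5 threshold are assumptions rather than values from the AutoGluon test suite.

```python
import math
import random


def seasonal_naive_forecast(history, horizon, seasonal_period):
    """Repeat the last observed seasonal cycle over the forecast horizon."""
    last_cycle = history[-seasonal_period:]
    return [last_cycle[i % seasonal_period] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def test_seasonal_naive_error_stays_in_expected_range():
    rng = random.Random(0)  # fixed seed keeps the test deterministic
    period, horizon = 24, 24
    # Synthetic series: one seasonal pattern plus a small amount of noise.
    series = [
        10 + 5 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 0.1)
        for t in range(240)
    ]
    history, future = series[:-horizon], series[-horizon:]
    forecast = seasonal_naive_forecast(history, horizon, period)
    mae = mean_absolute_error(future, forecast)
    # Hypothetical tolerance; a real test would record it from a known-good run.
    assert mae < 0.5, f"MAE {mae:.3f} is outside the expected range"
```

A test of this shape would fail if a model silently degraded to a much weaker forecast, which an exception-only check cannot detect.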

@github-actions

github-actions bot commented Jun 1, 2023

Job PR-3252-156ba1e is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3252/156ba1e/index.html

@github-actions

github-actions bot commented Jun 1, 2023

Job PR-3252-bc47956 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3252/bc47956/index.html

@github-actions

github-actions bot commented Jun 1, 2023

Job PR-3252-5c3cf7f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3252/5c3cf7f/index.html

@shchur shchur merged commit 5f51edc into autogluon:master Jun 1, 2023
28 checks passed
@shchur shchur deleted the fix-local-models branch June 1, 2023 17:13
Labels
module: timeseries related to the timeseries module

3 participants