
[timeseries] Add wrappers for Statsforecast models #2758

Merged · 10 commits merged into autogluon:master on Jan 27, 2023

Conversation

@shchur (Collaborator) commented on Jan 25, 2023

Description of changes:

  • Add AutoETS, AutoARIMA and DynamicOptimizedTheta models from StatsForecast.

To Do:

  • Add tests
  • Benchmark & add models to presets
  • Expose the n_jobs parameter

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions

Job PR-2758-7212948 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2758/7212948/index.html

@github-actions

Job PR-2758-a376e62 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2758/a376e62/index.html

@tonyhoo (Collaborator) left a comment:

Are we planning to add them to any popular presets such as medium_quality or high_quality?

@@ -32,6 +32,7 @@
"torch>=1.9,<1.14",
"pytorch-lightning>=1.7.4,<1.9.0",
"networkx",
"statsforecast==1.4.0",
Collaborator:
Can we make 1.4.0 the lower bound?

@shchur (Collaborator, Author):
This way we protect ourselves from potentially breaking changes / regressions caused by newer versions of the dependencies. Both have already happened to us a few times (caused by minor releases of sktime & GluonTS), so I would rather be extra cautious here.
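
For illustration only, the two pinning strategies being discussed would look roughly like this in the install requirements (a sketch, not part of the diff):

# Illustrative sketch of the two dependency-pinning strategies discussed above.
install_requires_exact = ["statsforecast==1.4.0"]        # exact pin, as used in this PR
install_requires_ranged = ["statsforecast>=1.4.0,<1.5"]  # hypothetical lower bound with an upper cap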

"""
# TODO: Find a way to ensure that SF models respect time_limit
# Fitting usually takes >= 20 seconds
if time_limit is not None and time_limit < 20:
Collaborator:
The fit time should also depend on the instance type. If time_limit is not implemented and we expect the local model fit to be quick compared with other models, should we emit a warning message instead?

@shchur (Collaborator, Author) commented on Jan 26, 2023:

Updated this logic based on the discussion below.

# Fitting usually takes >= 20 seconds
if time_limit is not None and time_limit < 20:
    raise TimeLimitExceeded
super()._fit(train_data=train_data, time_limit=time_limit, verbosity=verbosity, **kwargs)
Collaborator:
One quick idea for enforcing time_limit is to take advantage of the timeout argument of ThreadPoolExecutor or ProcessPoolExecutor. Details can be found here.
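
For reference, a minimal sketch of this idea (not the implementation that landed): run the blocking StatsForecast call in a worker and give up after time_limit seconds. fit_fn below is a hypothetical zero-argument callable that wraps StatsForecast.forecast.

import concurrent.futures

def fit_with_time_limit(fit_fn, time_limit):
    # Run the blocking fit in a separate process and wait at most `time_limit` seconds.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fit_fn)
        try:
            return future.result(timeout=time_limit)
        except concurrent.futures.TimeoutError:
            # Caveat: cancel() cannot stop an already-running task, and the executor's
            # shutdown still waits for the worker to finish, which is why a clean
            # solution would likely require changes inside StatsForecast itself.
            future.cancel()
            raise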

@shchur (Collaborator, Author):
I agree but, if I understand correctly, this would require monkey-patching StatsForecast. Is it fine if we leave it as-is for now and add a TODO comment?

Collaborator:
Let's add a TODO and follow up on that.

Comment on lines 84 to 86
AutoARIMA=30,
AutoETS=70,
DynamicOptimizedTheta=60,
Collaborator:
How do we determine the priorities of these stats models? How will they fit alongside the existing ARIMA and ETS implementations? Some accuracy metric comparison might be helpful to give insight into the ensemble results.

@shchur (Collaborator, Author):
I set the priorities inversely proportional to the average time it takes to fit these models (slower model -> lower priority).

@shchur (Collaborator, Author):
Also added models to the presets based on the discussion below.
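
For context, a hedged illustration of how priority values like the ones quoted above translate into an ordering (the dict name here is assumed, not taken from the diff): a higher priority means the model is attempted earlier, so the slowest model (AutoARIMA) comes last.

# Assumed illustration: higher priority -> attempted earlier within the time budget.
MODEL_PRIORITY = {"AutoARIMA": 30, "AutoETS": 70, "DynamicOptimizedTheta": 60}
training_order = sorted(MODEL_PRIORITY, key=MODEL_PRIORITY.get, reverse=True)
print(training_order)  # ['AutoETS', 'DynamicOptimizedTheta', 'AutoARIMA']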

@shchur (Collaborator, Author) commented on Jan 26, 2023:

After benchmarking AutoETS, AutoARIMA and DynamicOptimizedTheta models on 28 datasets on m5.4xlarge:

  • Fit time (i.e., prediction time for the validation set)
    • ≥25s for all datasets, median time is 45s (these are approximately the same for all models)
    • For 90% of the datasets, these times are slower than those of the respective Statsmodels-based models (ETS, ARIMA, Theta), by 30s in the median case
    • The runtime numbers are similar on my p3.8xlarge cloud desktop
    • 99% of the time is spent inside StatsForecast.forecast, so our wrapper does not introduce any noticeable overhead
  • Performance comparison (MASE) on the test set
    • AutoETS > ETS winrate = 71%
    • AutoARIMA > ARIMA winrate = 62%
    • DynamicOptimizedTheta > Theta winrate = 57%

Based on these findings, I suggest that we:

  • Add AutoETS to the medium_quality preset.
  • Add AutoETS and AutoARIMA to the high_quality & best_quality presets, and remove HPO for ARIMA and ETS in these presets.
  • Add DynamicOptimizedTheta to the best_quality preset only.
  • Raise TimeLimitExceeded if less than 10s remains, and warn that the model might exceed the time limit if less than 30s remains (see the sketch after this list).
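
A minimal sketch of that last rule, with thresholds taken from the bullet above and an assumed import path for AutoGluon's TimeLimitExceeded exception:

import logging

from autogluon.core.utils.exceptions import TimeLimitExceeded  # assumed import path

logger = logging.getLogger(__name__)

def check_time_limit(time_limit):
    # Hard stop below 10s, warning below 30s, as proposed above.
    if time_limit is None:
        return
    if time_limit < 10:
        raise TimeLimitExceeded
    if time_limit < 30:
        logger.warning("Model may exceed the given time_limit (less than 30s remaining).")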

I will do another round of benchmarking after adding TFT & updating the SFF model from the new GluonTS release. We can update the presets based on these results.
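
As a usage illustration (not part of the PR), the new models can be requested explicitly through the hyperparameters argument of TimeSeriesPredictor; the dataset below is a made-up placeholder, and the model keys are assumed to match the wrapper names added here.

import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Placeholder long-format data: one item, 200 hourly observations.
df = pd.DataFrame({
    "item_id": ["A"] * 200,
    "timestamp": pd.date_range("2023-01-01", periods=200, freq="H"),
    "target": range(200),
})
train_data = TimeSeriesDataFrame.from_data_frame(df, id_column="item_id", timestamp_column="timestamp")

predictor = TimeSeriesPredictor(prediction_length=24, target="target")
predictor.fit(
    train_data,
    hyperparameters={"AutoETS": {}, "AutoARIMA": {}, "DynamicOptimizedTheta": {}},
)
predictions = predictor.predict(train_data)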

@github-actions

Job PR-2758-ed4706a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2758/ed4706a/index.html

@shchur merged commit 106ba2f into autogluon:master on Jan 27, 2023
@shchur deleted the statsforecast branch on January 27, 2023 at 08:32