Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto_arima's error_action="ignore" does not work when alternative training methods are specified #312

Closed
adgustaf opened this issue Mar 20, 2020 · 15 comments
Labels

Comments

@adgustaf
Copy link

Describe the bug
If I specify error_action="ignore" in auto_arima when method is anything other than the default of CSS-MLE, errors from statsmodels are no longer ignored.

To Reproduce
Execute the following python code (this is using pmdarima 1.5.3)

import pmdarima

y = pmdarima.datasets.load_wineind()
model = pmdarima.auto_arima(y=y, error_action="ignore", suppress_warnings=True, trace=True, method='CSS')

Versions

System:
python: 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 15:18:16) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\adgustaf\AppData\Local\Continuum\anaconda3\envs\pmdarima_v153\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 20.0.2
setuptools: 46.0.0.post20200309
sklearn: 0.22.2.post1
statsmodels: 0.11.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.15
pandas: 1.0.1
joblib: 0.14.1
pmdarima: 1.5.3

Expected behavior
I expect the training to complete just as when the default method of CSS-MLE is applied.

Actual behavior
Fitting fails

Additional context
None

@tgsmith61591
Copy link
Member

Interesting, can you share the stacktrace?

@adgustaf
Copy link
Author

adgustaf commented Mar 20, 2020

Performing stepwise search to minimize aic
Fit ARIMA: (2, 1, 2)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (0, 1, 0)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (1, 1, 0)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (0, 1, 1)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (0, 1, 0)x(0, 0, 0, 0) (constant=False); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (1, 1, 1)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (2, 1, 1)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (2, 1, 0)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (3, 1, 0)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (4, 1, 0)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (5, 1, 0)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (5, 1, 1)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (4, 1, 1)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (3, 1, 1)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (3, 1, 2)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (4, 1, 2)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (5, 1, 2)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (5, 1, 3)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (4, 1, 3)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (3, 1, 3)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (2, 1, 3)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (1, 1, 3)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (0, 1, 3)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (0, 1, 2)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds
Fit ARIMA: (1, 1, 2)x(0, 0, 0, 0) (constant=True); AIC=nan, BIC=nan, Time=nan seconds

ValueError Traceback (most recent call last)
in
----> 1 model = pmdarima.auto_arima(y=y, error_action="ignore", suppress_warnings=True, trace=True, method='CSS')

~\AppData\Local\Continuum\anaconda3\envs\pmdarima_v153\lib\site-packages\pmdarima\arima\auto.py in auto_arima(y, exogenous, start_p, d, start_q, max_p, max_d, max_q, start_P, D, start_Q, max_P, max_D, max_Q, max_order, m, seasonal, stationary, information_criterion, alpha, test, seasonal_test, stepwise, n_jobs, start_params, trend, method, maxiter, offset_test_args, seasonal_test_args, suppress_warnings, error_action, trace, random, random_state, n_fits, return_valid_fits, out_of_sample_size, scoring, scoring_args, with_intercept, sarimax_kwargs, **fit_args)
582
583 # filter the non-successful ones
--> 584 filtered = _post_ppc_arima(all_res)
585
586 # sort by the criteria - lower is better for both AIC and BIC

~\AppData\Local\Continuum\anaconda3\envs\pmdarima_v153\lib\site-packages\pmdarima\arima\auto.py in _post_ppc_arima(a)
627 # if the list is empty, or if it was an ARIMA and it's None
628 if not a: # check for truthiness rather than None explicitly
--> 629 raise ValueError('Could not successfully fit ARIMA to input data. '
630 'It is likely your data is non-stationary. Please '
631 'induce stationarity or try a different '

ValueError: Could not successfully fit ARIMA to input data. It is likely your data is non-stationary. Please induce stationarity or try a different range of model order params. If your data is seasonal, check the period (m) of the data

@adgustaf
Copy link
Author

adgustaf commented Mar 20, 2020

Is the error just that no fits at all occurred? Perhaps this is an issue with CSS fitting in statsmodels...

@tgsmith61591
Copy link
Member

When I try to run one of those models outside of the auto_arima loop, I get:

ValueError: Unknown fit method css

Your method is invalid. I'll admit, the confusion is probably the default err message auto_arima is returning. The valid methods are documented here

method : str, optional (default=’lbfgs’)

The method determines which solver from scipy.optimize is used, and it can be chosen from among the following strings:

‘newton’ for Newton-Raphson
‘nm’ for Nelder-Mead
‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
‘lbfgs’ for limited-memory BFGS with optional box constraints
‘powell’ for modified Powell’s method
‘cg’ for conjugate gradient
‘ncg’ for Newton-conjugate gradient
‘basinhopping’ for global basin-hopping solver
The explicit arguments in fit are passed to the solver, with the exception of the basin-hopping solver. Each solver has several optional arguments that are not the same across solvers. These can be passed as **fit_kwargs

@adgustaf
Copy link
Author

adgustaf commented Mar 21, 2020

Aha! Okay thanks, forgive me I thought I was specifying method as in statsmodels.

However, even using the format I think you're supposed to use, I still cannot get conditional sum of squares (CSS) fitting to work. See the following:

fit_kwargs = {'method': 'css'}
model = pmdarima.auto_arima(y=y, error_action="ignore", suppress_warnings=True, trace=True, **fit_kwargs)

I get the same kind of trace where no models are fit. The reason I think is that the method argument to statsmodels (where one chooses CSS, CSS-MLE, and MLE) conflicts with your models. If I try something like

fit_args = {'method': 'css'}
model = pmdarima.auto_arima(y=y, error_action="ignore", suppress_warnings=True, trace=True, method='lbfgs', **fit_args)

then I get an error for multiple values for keyword argument method. Am I doing something obvious wrong here? I'm from more of an R than Python background, so forgive me if this is something obvious.

If such fitting methods aren't possible (at the very least MLE should be as it has a likelihood and thus an AIC score can be computed -- default is CSS-MLE, where CSS is used as a starting point for computing MLE), please let me know.

@tgsmith61591
Copy link
Member

In Python, this:

do_foo(x=1)

is functionally the same as:

do_foo(**{"x": 1})

So by passing method in kwargs, it won't help you at all. We pass method through to the statsmodels SARIMAX fit method, so we only support what they do, which is the following:

  • 'newton' for Newton-Raphson, 'nm' for Nelder-Mead
  • 'bfgs' for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
  • 'lbfgs' for limited-memory BFGS with optional box constraints
  • 'powell' for modified Powell's method
  • 'cg' for conjugate gradient
  • 'ncg' for Newton-conjugate gradient
  • 'basinhopping' for global basin-hopping solver

CSS and CSS-MLE are not supported by the SARIMAX class.

@adgustaf
Copy link
Author

adgustaf commented Mar 21, 2020

Hi,

Thanks for your comments. Indeed it seems I missed what options would be passed to what fit method. I was looking at

https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMA.fit.html

which certainly seems to support training method 'CSS' which is much faster for long time series, though it has some issues associated with it outlined here:

https://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf

Forgive me if I am missing something in the API, but unless there is a means to actually specify CSS training, perhaps this means that pmdarima does not replicate R's auto.arima in a very meaningful way. Namely, see the following documentation:

https://www.rdocumentation.org/packages/forecast/versions/8.11/topics/auto.arima

Auto.arima's "approximation" field triggers CSS training:

approximation = (length(x) > 150 | frequency(x) > 12),
If TRUE, estimation is via conditional sums of squares and the information criteria used for model selection are approximated. The final model is still computed using maximum likelihood estimation. Approximation should be used for long time series or a high seasonal period to avoid excessive computation times.

It's possible that the above describes what CSS-MLE is supposed to be, but I'm unsure. From the same API linked above:

methodstr {‘css-mle’,’mle’,’css’}
This is the loglikelihood to maximize. If “css-mle”, the conditional sum of squares likelihood is maximized and its values are used as starting values for the computation of the exact likelihood via the Kalman filter. If “mle”, the exact likelihood is maximized via the Kalman Filter. If “css” the conditional sum of squares likelihood is maximized. All three methods use start_params as starting parameters. See above for more information.

Can you verify that these are equivalent? If not, shall I close this thread and instead create a feature request?

@tgsmith61591
Copy link
Member

tgsmith61591 commented Mar 21, 2020

As I mentioned above, we use SARIMAX under the hood, not the statmodels ARIMA class. This was changed several minor versions ago for a number of reasons, not the least of which being that the statsmodels ARIMA class is functionally deprecated.

No need to open a feature request; we use statsmodels, so we're at their mercy for supported optimization methods.

perhaps this means that pmdarima does not replicate R's auto.arima in a very meaningful way

If the lack of CSS poses a problem for you, perhaps you should consider submitting such a feature in statsmodels' SARIMAX class.

@tgsmith61591
Copy link
Member

Or perhaps you could implement the approximation strategy here? PRs are welcome

@adgustaf
Copy link
Author

Hi @tgsmith61591,

Firstly, thanks for your help and correspondence. Some background on my situation may be helpful...

I work for Microsoft right now and I'm looking into speeding up ARIMA training for AutoML; we use pmdarima currently. Hence my interest into looking into alternative training methods, some of which maybe faster for use in our platform.

There's obviously a bit of a learning curve for me as I'm more comfortable with R than Python (come from a stats background), so forgive me for any confusion regarding holes in my Python knowledge.

We are tied to pmdarima version 1.1.1, whereas I think the discussion in this thread applied to 1.5.3; this is my fault -- I hadn't changed the conda environment to the correct one when reporting the above, so forgive me for that please. Note however that indeed the options I mentioned were available in 1.1.1:

https://alkaline-ml.com/pmdarima/1.1.1/_modules/pmdarima/arima/auto.html

The nevertheless persists in 1.1.1. Here is a minimal example to reproduce:

import pmdarima

y = pmdarima.datasets.load_wineind()
model = pmdarima.auto_arima(y=y, error_action="ignore", method='CSS')

This produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-12514017638a> in <module>
      2 
      3 y = pmdarima.datasets.load_wineind()
----> 4 model = pmdarima.auto_arima(y=y)

~\AppData\Local\Continuum\anaconda3\envs\pmdarima_v111\lib\site-packages\pmdarima\arima\auto.py in auto_arima(y, exogenous, start_p, d, start_q, max_p, max_d, max_q, start_P, D, start_Q, max_P, max_D, max_Q, max_order, m, seasonal, stationary, information_criterion, alpha, test, seasonal_test, stepwise, n_jobs, start_params, trend, method, transparams, solver, maxiter, disp, callback, offset_test_args, seasonal_test_args, suppress_warnings, error_action, trace, random, random_state, n_fits, return_valid_fits, out_of_sample_size, scoring, scoring_args, with_intercept, **fit_args)
    663 
    664     # filter the non-successful ones
--> 665     filtered = _post_ppc_arima(all_res)
    666 
    667     # sort by the criteria - lower is better for both AIC and BIC

~\AppData\Local\Continuum\anaconda3\envs\pmdarima_v111\lib\site-packages\pmdarima\arima\auto.py in _post_ppc_arima(a)
    928     # if the list is empty, or if it was an ARIMA and it's None
    929     if not a:  # check for truthiness rather than None explicitly
--> 930         raise ValueError('Could not successfully fit ARIMA to input data. '
    931                          'It is likely your data is non-stationary. Please '
    932                          'induce stationarity or try a different '

ValueError: Could not successfully fit ARIMA to input data. It is likely your data is non-stationary. Please induce stationarity or try a different range of model order params. If your data is seasonal, check the period (m) of the data.

I believe however that this error is just issue 294, as this error happens irrespective of the method argument (or whether it's supplied).

At any rate, the need for investigating these options is just an investigation on my part regarding speed. If you have recommendations regarding the "fastest" means of training (as opposed to the most accurate), that'd be helpful and appreciated. In my experience with R, specifying CSS worked a bit better for larger datasets. There also was another approach used by McLeod in the FitAR package outlined in this paper; I haven't seen this in any Python packages.

Right now I'm alternatively checking out the two state-space representations in SARIMAX for version 1.5.3, though we cannot upgrade to that bc the most recent version of scikit-learn breaks other things in our pipeline (and thus are tied to earlier versions of pmdarima until these problems are fixed on our end). Nevertheless, recommendations for fitting speed for the most recent version are appreciated for when we are able to upgrade. Thanks again for your help!

@adgustaf
Copy link
Author

Hi @tgsmith61591,

I just wrote a notebook to compare the various fitting options for pmdarima v1.1.1 on the NYC energy data used in this notebook. I was able to get this fitting on a subset of the data (the first 10k samples) w/o issue without the errors happening above (strangely seemed to be an issue for short series or just the wine series?).

I used only the first 10k samples for fitting for each of the three options for method (CSS, CSS-MLE, and MLE), and repeated fitting 10 times, storing the min, median, mean, and max times.

As I suspected, CSS training is appreciably faster (7 times as much) as compared to CSS-MLE, at least on this data for my machine:

  • CSS: about 20 sec
  • CSS-MLE: about 146 sec
  • MLE: about 241 sec

Incidentally, runtimes didn't vary much due to any processes going on in my system.

I'm now running a similar timings notebook for the two statespace options in statsmodels (via pmdarima v1.5.3 -- I presume this works given that SARIMAX provides them as options) to see whether they make a difference (Hamilton vs. Harvey). See information on the two representations here:

https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

I don't think there are any other obvious knobs to turn, other than optimizers which I don't want to touch b/c in my experience the quasinewton method BFGS (or its limited memory variant) works very well.

The upshot of all of this is that for use for large data, it may be the case that removing the CSS training option is problematic. Even if CSS isn't a maximum likelihood fit, it at least is some fit in a principled manner which may make sense for certain applications.

I'll report back here tomorrow with complete timings for v1.5.3 with the two state-space representations. I'll happily upload the notebooks, the data csv, the csv files exported by the notebooks, etc. once those runs have completed.

@adgustaf
Copy link
Author

adgustaf commented Mar 23, 2020

Hi @tgsmith61591,

Unfortunately it seems there is indeed a major degradation in terms of speed when omitting CSS from your training methods supported. I ran a similar notebook pegged to pmdarima v1.5.3 using the Hamilton vs. Harvey options for state-space model representation. I'm not sure they actually got passed to SARIMAX correctly (timings look very similar, and hopefully I'm not doing anything naive due to my relative lack of proficiency in Python). Nevertheless, the timings were about 156 seconds, worse than CSS-MLE on ARIMA, and far worse than CSS on ARIMA.

Attached are the notebooks, data, timings, etc., in a zip. Feel free to verify on your own or let me know if there's a better approach with SARIMAX that you may be aware of (forgive me if I've missed it!).

Shall I raise this as an issue? I personally believe this is a major degradation in performance that merits the old method being supported, but ultimately it is your package. Let me know what you think!

Thanks...
Timings.zip

EDIT: There are some copy and paste typos in the markdown of the description of the attached notebooks. Apologies for this.

@tgsmith61591
Copy link
Member

I appreciate the great lengths you've gone to create some detailed timings. This is super helpful. Something you said that I want to address:

I personally believe this is a major degradation in performance that merits the old method being supported, but ultimately it is your package

I want to point out again, it's not that we chose to stop supporting CSS/CSS-MLE; the statsmodels.tsa.arima.ARIMA class (per the thread I linked above) was deprecated, so we are using the SARIMAX class only under the hood, which does not support CSS(-MLE). It's worth noting that as of statsmodels version 0.11.*, the statsmodels sm.tsa.arima.ARIMA actually extends the SARIMAX model. This means that CSS is no longer a valid option, even if we were to revert to using that class rather than SARIMAX under the hood.

What is really comes down to is statsmodels's use of scipy to solve likelihood models. They only support a subset of the scipy.optimize methods, of which CSS is not one.

Now to actually get to your problem... I agree that we need a way to approximate for longer series. It would seem that we are limited by the scipy/statsmodels gods in the way of not having CSS at our disposal. I don't have a great solution for that right off the top of my head; I am always open to ideas. I also think that staying pinned to an old version of the package is not the best solution.

This seems like an issue that should be raised with statsmodels. I'll look into the paper you linked, but I would imagine we'll suffer the same problem there. This would be something we could get around if statsmodels allowed us to specify our own optimizer 🤔

@adgustaf
Copy link
Author

adgustaf commented Mar 23, 2020

Hi @tgsmith61591.

I must correct a misconception you have so we're operating on the same page here. CSS is not a solver. It's an objective function (as is MLE). CSS-MLE is not an objective function, but actually two different objective functions (the solution to the first optimization routine w/ the CSS objective function is used as a starting point for the second routine w/ as MLE an objective function). The solver in the CSS case is still (at least in R anyway) one of the many various optimization routines one can use for non-linear programming (e.g., BFGS, Nelder-Mead, etc.).

I recognize that you're just wrapping statsmodels, but this is a crucial distinction. See, for example, the documentation for arima in R:

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/arima.html

Note the fields method and optim.method. They refer to much different things. To understand the former, consider (e.g.) the following paper (and assuredly other references):

https://www.nuffield.ox.ac.uk/economics/Papers/1997/w6/ma.pdf

I have little idea what is going on in statsmodels world, and I appreciate your attempt to provide clarity. However given the misconception above, can you restate it? That way perhaps I can direct my timings to the statsmodels folks.

Incidentally, I performed the same timings in R's auto.arima. I'm not quite sure I mimicked the exact same setup or if you do slightly different things than Hyndman does, because I wound up getting ARIMA(2, 1, 3) models rather than ARIMA(5, 1, 5). R was about 5 times faster for each of the objective function options (i.e., each of the method arguments not the optim.method arguments). I'm sure this too needs to be directed to statsmodels, and may be just a manifestation of using Cython (as they do) rather than pure C libraries (as R's arima does and Hyndman wraps I believe), I'm not sure. Any words of wisdom as to where I should provide the issue is greatly appreciated.

Thanks for your help!

  • Adam

@tgsmith61591
Copy link
Member

tgsmith61591 commented Mar 23, 2020

given the misconception above, can you restate it?

I'll refer you to my previous comment:

As of statsmodels version 0.11.*, the statsmodels sm.tsa.arima.ARIMA actually extends the SARIMAX model. This means that CSS is no longer a valid method, even if we were to revert to using that class rather than SARIMAX under the hood.

I don't know how to better state this. If you follow the links I littered throughout my comments, you'll see that it's not supported anymore by them, save for in ARMA models. You might also actually take a look at how the LikelihoodModel in statsmodels is minimizing the objective function. There is no way to parameterize it, currently.

I feel like this thread has devolved into passive aggressive, academic one-upmanship, and nothing I'm saying is really moving it forward, so I'm going to lock it. I don't feel like this is making any progress towards solving your problem at this point, as I've stated and restated the above in at least 3 different ways now.

I'll leave the following question (the root question of the matter): what is your suggestion for how we should handle approximation of long time series, given the statsmodels limitations we're bumping into?

Open source depends on the collaborative input from multiple folks, and that means brainstorming solutions. I would think the easiest place for us to solve this here is by bringing this to the attention of the statsmodels developers, but I am open to suggestions (read: proposed solutions, which is not what this thread has garnered thus far). My proposed solution would be that statsmodels allows parameterizing the objective. Perhaps you have a better one. In any case, my email address can be located on my Git page. Please feel free to send me a note to continue this conversation.

@alkaline-ml alkaline-ml locked as too heated and limited conversation to collaborators Mar 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants