The AutoRegressive (AR), Integrated (I), and Moving Average (MA) components together form the ARIMA model.

The AR part indicates that the observed value depends on its own lagged values.

The MA component indicates that a relationship exists between the current value and a linear combination of prior stochastic terms. Each stochastic term is drawn from white noise, a stationary random process with mean zero and constant variance (usually assumed to be normally distributed). The terms are independent of each other and represent random shocks or error terms in the process (variation we cannot account for). Note that the MA in ARIMA isn't the same as the moving-average filter, which smooths out a time series.
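To make the AR and MA definitions concrete, here is a minimal simulation sketch using NumPy. The coefficients (phi = 0.7, theta = 0.6), the sample size, and the helper name autocorr are illustrative assumptions, not values taken from the datasets in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
eps = rng.normal(loc=0.0, scale=1.0, size=n)  # Gaussian white noise shocks

phi, theta = 0.7, 0.6  # assumed illustrative coefficients

# AR(1): each value depends on the previous value plus a fresh shock.
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# MA(1): each value depends on the current shock and the previous shock.
ma = np.zeros(n)
ma[0] = eps[0]
ma[1:] = eps[1:] + theta * eps[:-1]


def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


print("AR(1) lag-1, lag-2:", autocorr(ar, 1), autocorr(ar, 2))
print("MA(1) lag-1, lag-2:", autocorr(ma, 1), autocorr(ma, 2))
```

In theory the AR(1) autocorrelation decays geometrically (phi, phi^2, ...), while the MA(1) autocorrelation is theta / (1 + theta^2) at lag 1 and zero beyond it, so the sample values should land near those figures.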

The I component refers to the number of differencing levels required to make the series stationary so that the AR and MA components can be applied.
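As a small illustration of differencing, the sketch below builds a random walk with drift (a non-stationary series) and takes its first difference with np.diff. The drift value, seed, and series length are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
shocks = rng.normal(size=500)        # stationary white noise
drift = 0.5                          # assumed illustrative drift
walk = np.cumsum(drift + shocks)     # random walk with drift: the level wanders

diffed = np.diff(walk, n=1)          # first difference, i.e. d = 1

# Differencing recovers the stationary increments (drift + shock).
print(np.allclose(diffed, drift + shocks[1:]))

# Cumulative summation undoes the differencing, given the first value.
rebuilt = np.concatenate(([walk[0]], walk[0] + np.cumsum(diffed)))
print(np.allclose(rebuilt, walk))
```

Each level of differencing shortens the series by one observation, which is why the first value has to be kept to rebuild the original series.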

Non-seasonal ARIMA models are denoted ARIMA(p, d, q), where p, d, and q are non-negative integers: p is the order of the AR component, d is the number of differencing levels needed, and q is the order of the MA component.

ARIMA models are an additive combination of these components, and solving for the parameters of each component allows forecasting the following time point.
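To show how the components add up to a forecast, here is a hedged sketch of a one-step-ahead forecast for an ARIMA(1,1,1) model without an intercept. The function name forecast_arima_111 and the toy series are hypothetical. The residuals are reconstructed recursively with zero starting values, which is a simplification and not the exact state-space method that statsmodels or pmdarima use internally.

```python
def forecast_arima_111(y, phi, theta):
    """One-step-ahead forecast for ARIMA(1,1,1) with no intercept.

    With w[t] = y[t] - y[t-1], the model is
        w[t] = phi * w[t-1] + eps[t] + theta * eps[t-1].
    """
    if len(y) < 2:
        raise ValueError("need at least two observations to difference once")

    # I component: difference once (d = 1).
    w = [y[t] - y[t - 1] for t in range(1, len(y))]

    # Rebuild the shocks recursively, assuming zero pre-sample values.
    prev_w, prev_eps = 0.0, 0.0
    for w_t in w:
        eps_t = w_t - phi * prev_w - theta * prev_eps
        prev_w, prev_eps = w_t, eps_t

    # AR term plus MA term gives the forecast of the next difference.
    w_next = phi * prev_w + theta * prev_eps

    # Undo the differencing to return to the original scale.
    return y[-1] + w_next


print(forecast_arima_111([10.0, 12.0, 13.0], phi=0.5, theta=0.0))  # 13.5
```

With phi = 0 and theta = 0 the function reduces to a random-walk forecast, which simply repeats the last observed value.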

Denote the observed outcomes by O and the set of parameters that describe some stochastic process by theta. To compute the probability of an event, we evaluate P(O | theta): given some parameters, we can figure out the probability of observing O.

In practice the situation is reversed: we are given observations O and want to estimate the parameters to obtain a model for forecasting or prediction. In that case, we choose the parameters that maximize the likelihood of obtaining the observations O. The likelihood is defined as:

L(theta | O) = P(O | theta)

By fixing the random variables to the observed values, we can solve for the parameters (theta) through maximum likelihood estimation (MLE), which picks the theta that maximizes L(theta | O). MLE is analogous to minimizing a loss function in machine learning: the negative log-likelihood plays the role of the loss.
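A minimal sketch of MLE in the discrete case, assuming a toy sequence of coin flips (the data and the grid resolution are made up for illustration). The parameter theta is the probability of heads, and the grid search picks the value that maximizes the log-likelihood.

```python
import math

observations = [1, 0, 1, 1, 0, 1, 1, 1]  # assumed toy data: 1 = heads, 0 = tails


def log_likelihood(theta, obs):
    """log L(theta | O) = log P(O | theta) for independent Bernoulli trials."""
    heads = sum(obs)
    tails = len(obs) - heads
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


# Evaluate the likelihood on a grid of candidate parameters in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda th: log_likelihood(th, observations))

print("grid-search MLE:", theta_hat)
print("closed-form MLE:", sum(observations) / len(observations))
```

For this model the maximizer has a closed form (the fraction of heads), so the grid search is only there to show the optimization view of the same estimate.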

In general, MLE is an optimization problem that aims to produce solution equations that work for any values the random variables take.

The same idea applies in both discrete and continuous spaces. In the continuous case, however, we use a probability density function f(O | theta) rather than a probability, because the probability of any single exact value of a continuous random variable is zero.
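For the continuous case, the sketch below evaluates a Gaussian log-likelihood built from the density f(O | theta), with theta = (mu, sigma2). The observations are assumed toy values, and the closed-form estimates (sample mean and the variance with divisor n) are the standard Gaussian MLE results.

```python
import math


def gaussian_log_likelihood(mu, sigma2, obs):
    """Sum of log densities log f(x | mu, sigma2) over independent observations."""
    n = len(obs)
    squared_error = sum((x - mu) ** 2 for x in obs)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - squared_error / (2.0 * sigma2)


observations = [4.1, 5.3, 4.8, 5.9, 5.0]  # assumed toy data

# Closed-form maximum likelihood estimates for a Gaussian.
mu_hat = sum(observations) / len(observations)
sigma2_hat = sum((x - mu_hat) ** 2 for x in observations) / len(observations)

best = gaussian_log_likelihood(mu_hat, sigma2_hat, observations)
print("mu_hat:", mu_hat, "sigma2_hat:", sigma2_hat, "log-likelihood:", best)

# Any other parameter choice gives a lower log-likelihood.
print(best > gaussian_log_likelihood(mu_hat + 0.1, sigma2_hat, observations))
print(best > gaussian_log_likelihood(mu_hat, sigma2_hat * 1.2, observations))
```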

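When evaluating and selecting ARIMA models by grid search, use the Akaike information criterion (AIC) to compare the candidate models. AIC combines the number of estimated parameters k with the maximized likelihood L_hat, AIC = 2k - 2 ln(L_hat), so it rewards goodness of fit while penalizing model complexity. Lower values are better.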
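The AIC formula can be checked by hand against the model summaries printed later in this notebook. In the sketch below, the function name aic is hypothetical; the log-likelihood values come from those summaries, and the parameter counts assume every coefficient row, including sigma2, counts as one estimated parameter.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L_hat)."""
    return 2 * n_params - 2 * log_likelihood


# Births model, ARIMA(1,1,1): ar.L1, ma.L1, sigma2 -> k = 3
print(round(aic(-1226.537, 3), 3))  # 2459.074

# Passengers model, ARIMA(0,1,1)(2,1,0)[12]: ma.L1, ar.S.L12, ar.S.L24, sigma2 -> k = 4
print(round(aic(-505.589, 4), 3))  # 1019.178
```

Both results agree with the AIC values reported in the corresponding summaries.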

In [1]:
import pandas as pd
import numpy as np
passenger_data = pd.read_csv('/content/airline_passengers.csv', index_col='Month', parse_dates=True)
birth_data = pd.read_csv('/content/DailyTotalFemaleBirths.csv', index_col='Date', parse_dates=True)

In [4]:
pip install pmdarima


In [5]:
from pmdarima import auto_arima

auto_arima uses AIC for model selection. Its stepwise search stops exploring higher-order models once none of the neighboring candidates improves the AIC.

In [8]:
stepwise_fit = auto_arima(birth_data['Births'], start_p=0, start_q=0,
                          max_p=6, max_q=3, seasonal=False, trace=True)

Performing stepwise search to minimize aic
 ARIMA(0,1,0)(0,0,0)[0] intercept   : AIC=2650.760, Time=0.02 sec
 ARIMA(1,1,0)(0,0,0)[0] intercept   : AIC=2565.234, Time=0.03 sec
 ARIMA(0,1,1)(0,0,0)[0] intercept   : AIC=2463.584, Time=0.09 sec
 ARIMA(0,1,0)(0,0,0)[0]             : AIC=2648.768, Time=0.01 sec
 ARIMA(1,1,1)(0,0,0)[0] intercept   : AIC=2460.154, Time=0.18 sec
 ARIMA(2,1,1)(0,0,0)[0] intercept   : AIC=2461.271, Time=0.25 sec
 ARIMA(1,1,2)(0,0,0)[0] intercept   : AIC=inf, Time=0.46 sec
 ARIMA(0,1,2)(0,0,0)[0] intercept   : AIC=2460.722, Time=0.18 sec
 ARIMA(2,1,0)(0,0,0)[0] intercept   : AIC=2536.154, Time=0.12 sec
 ARIMA(2,1,2)(0,0,0)[0] intercept   : AIC=2463.065, Time=0.58 sec
 ARIMA(1,1,1)(0,0,0)[0]             : AIC=2459.074, Time=0.07 sec
 ARIMA(0,1,1)(0,0,0)[0]             : AIC=2462.221, Time=0.04 sec
 ARIMA(1,1,0)(0,0,0)[0]             : AIC=2563.261, Time=0.03 sec
 ARIMA(2,1,1)(0,0,0)[0]             : AIC=2460.367, Time=0.11 sec
 ARIMA(1,1,2)(0,0,0)[0]             : 

In [9]:
stepwise_fit.summary()

Dep. Variable:                    y   No. Observations:         365
Model:             SARIMAX(1, 1, 1)   Log Likelihood      -1226.537
Date:              Tue, 31 Aug 2021   AIC                  2459.074
Time:                      20:13:15   BIC                  2470.766
Sample:                           0   HQIC                 2463.721
                              - 365
Covariance Type:                opg

             coef   std err         z   P>|z|    [0.025    0.975]
ar.L1      0.1252     0.060     2.097   0.036     0.008     0.242
ma.L1     -0.9624     0.017   -56.429   0.000    -0.996    -0.929
sigma2    49.1512     3.250    15.122   0.000    42.781    55.522

Ljung-Box (L1) (Q):        0.04   Jarque-Bera (JB):   25.33
Prob(Q):                   0.84   Prob(JB):           0.0
Heteroskedasticity (H):    0.96   Skew:               0.57
Prob(H) (two-sided):       0.81   Kurtosis:           3.6


In [12]:
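# m is the number of periods in one seasonal cycle; the data is monthly with a yearly pattern, so m=12

# The keyword argument is `seasonal` (not `season`); it enables the seasonal (P, D, Q) terms.
stepwise_fit2 = auto_arima(passenger_data['Thousands of Passengers'], start_p=0, start_q=0,
                           max_p=4, max_q=4, seasonal=True, trace=True, m=12)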

Performing stepwise search to minimize aic
 ARIMA(0,1,0)(1,1,1)[12]             : AIC=1032.128, Time=0.19 sec
 ARIMA(0,1,0)(0,1,0)[12]             : AIC=1031.508, Time=0.02 sec
 ARIMA(1,1,0)(1,1,0)[12]             : AIC=1020.393, Time=0.09 sec
 ARIMA(0,1,1)(0,1,1)[12]             : AIC=1021.003, Time=0.15 sec
 ARIMA(1,1,0)(0,1,0)[12]             : AIC=1020.393, Time=0.03 sec
 ARIMA(1,1,0)(2,1,0)[12]             : AIC=1019.239, Time=0.25 sec
 ARIMA(1,1,0)(2,1,1)[12]             : AIC=inf, Time=1.76 sec
 ARIMA(1,1,0)(1,1,1)[12]             : AIC=1020.493, Time=0.33 sec
 ARIMA(0,1,0)(2,1,0)[12]             : AIC=1032.120, Time=0.18 sec
 ARIMA(2,1,0)(2,1,0)[12]             : AIC=1021.120, Time=0.34 sec
 ARIMA(1,1,1)(2,1,0)[12]             : AIC=1021.032, Time=0.44 sec
 ARIMA(0,1,1)(2,1,0)[12]             : AIC=1019.178, Time=0.28 sec
 ARIMA(0,1,1)(1,1,0)[12]             : AIC=1020.425, Time=0.10 sec
 ARIMA(0,1,1)(2,1,1)[12]             : AIC=inf, Time=2.04 sec
 ARIMA(0,1,1)(1,1,1)[12]     

In [13]:
stepwise_fit2.summary()

Dep. Variable:                                  y   No. Observations:        144
Model:             SARIMAX(0, 1, 1)x(2, 1, [], 12)   Log Likelihood      -505.589
Date:                            Tue, 31 Aug 2021   AIC                 1019.178
Time:                                    20:16:36   BIC                 1030.679
Sample:                                         0   HQIC                1023.851
                                            - 144
Covariance Type:                              opg

                coef   std err        z   P>|z|    [0.025    0.975]
ma.L1        -0.3634     0.074   -4.945   0.000    -0.508    -0.219
ar.S.L12     -0.1239     0.090   -1.372   0.170    -0.301     0.053
ar.S.L24      0.1911     0.107    1.783   0.075    -0.019     0.401
sigma2      130.4480    15.527    8.402   0.000   100.016   160.880

Ljung-Box (L1) (Q):        0.01   Jarque-Bera (JB):   4.59
Prob(Q):                   0.92   Prob(JB):           0.1
Heteroskedasticity (H):    2.7    Skew:               0.15
Prob(H) (two-sided):       0.0    Kurtosis:           3.87
