### Outline of seasonal ARIMA modeling:

The seasonal part of an ARIMA model has the same structure as the non-seasonal part: it may have an AR factor, an MA factor, and/or an order of differencing. In the seasonal part of the model, all of these factors operate across multiples of lag s (the number of periods in a season).

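A seasonal ARIMA model is classified as an ARIMA(p,d,q)x(P,D,Q) model, where p, d, and q are the numbers of non-seasonal AR terms, non-seasonal differences, and non-seasonal MA terms, and P=number of seasonal autoregressive (SAR) terms, D=number of seasonal differences, Q=number of seasonal moving average (SMA) terms.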
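For reference, one common way to write the ARIMA(p,d,q)x(P,D,Q) model with season length s is in backshift notation, where $B y_t = y_{t-1}$. The sign convention below (minus signs in the AR polynomials, plus signs in the MA polynomials) is an assumption chosen to match how statsmodels reports coefficients; many textbooks write the MA polynomials with minus signs instead, which flips the sign of the MA coefficients.

$$
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t = c + \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
$$

$$
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad \Phi_P(B^s) = 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps}
$$

$$
\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q, \qquad \Theta_Q(B^s) = 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}
$$

For example, an ARIMA(0,1,1)x(0,1,1) model for monthly data (s = 12) with a constant reduces to

$$
(1-B)(1-B^{12})\,y_t = c + (1+\theta_1 B)(1+\Theta_1 B^{12})\,\varepsilon_t
$$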

In identifying a seasonal model, the first step is to determine whether or not a seasonal difference is needed, in addition to or perhaps instead of a non-seasonal difference. You should look at time series plots and ACF and PACF plots for all possible combinations of 0 or 1 non-seasonal difference and 0 or 1 seasonal difference. Caution: don't EVER use more than ONE seasonal difference, nor more than TWO total differences (seasonal and non-seasonal combined).
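The four combinations of differencing described above can be sketched as follows. This is a minimal illustration, not part of the original analysis: the helper names `candidate_differences` and `plot_candidates` are hypothetical, and the demo series is synthetic (a linear trend plus a fixed 12-month pattern), not the auto sales data.

```python
import numpy as np
import pandas as pd


def candidate_differences(series, s=12):
    """Return the four combinations of 0/1 non-seasonal (d) and 0/1 seasonal (D) differences."""
    return {
        (0, 0): series,
        (1, 0): series.diff(1).dropna(),
        (0, 1): series.diff(s).dropna(),
        (1, 1): series.diff(1).diff(s).dropna(),
    }


def plot_candidates(candidates, lags=36):
    """Plot the series, ACF and PACF for each (d, D) combination, one row per combination."""
    # Imported here so the differencing helper can be used without the plotting libraries
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    fig, axes = plt.subplots(len(candidates), 3, figsize=(15, 3 * len(candidates)))
    for row, ((d, D), x) in zip(axes, candidates.items()):
        x.plot(ax=row[0], title=f"d={d}, D={D}")
        plot_acf(x, lags=lags, ax=row[1])
        plot_pacf(x, lags=lags, ax=row[2])
    fig.tight_layout()
    return fig


# Synthetic monthly series (illustration only): linear trend plus a fixed 12-month pattern
t = np.arange(120)
demo = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12))

candidates = candidate_differences(demo, s=12)
for (d, D), x in candidates.items():
    print(f"d={d}, D={D}: {len(x)} observations, std={x.std():.3f}")
```

With a real series, calling `plot_candidates(candidate_differences(df['Sales']))` would produce the twelve plots to compare; each difference costs observations (1 for a non-seasonal difference, s for a seasonal one), which is one practical reason to keep the total number of differences small.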

If the seasonal pattern is both strong and stable over time (e.g., high in the summer and low in the winter, or vice versa), then you probably should use a seasonal difference regardless of whether you use a non-seasonal difference, since this will prevent the seasonal pattern from "dying out" in the long-term forecasts. Let's add this to our list of rules for identifying models.
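To see why a seasonal difference keeps the pattern alive, here is a deliberately simplified sketch under stated assumptions: no noise, no other ARMA terms, and a zero-mean series. The function name `seasonal_recursion_forecast` is hypothetical. With a pure seasonal difference the forecast recursion is $\hat{y}_t = \hat{y}_{t-s}$, while a stationary seasonal AR(1) term without differencing gives $\hat{y}_t = \Phi\,\hat{y}_{t-s}$ with $|\Phi| < 1$.

```python
import numpy as np


def seasonal_recursion_forecast(history, s, horizon, phi):
    """Forecast with y_hat[t] = phi * y_hat[t - s].

    phi = 1 corresponds to a pure seasonal difference (seasonal random walk);
    |phi| < 1 corresponds to a zero-mean stationary seasonal AR(1) with no differencing.
    """
    path = list(history)
    for _ in range(horizon):
        # path[-s] is the value one full season before the point being forecast
        path.append(phi * path[-s])
    return np.array(path[len(history):])


# One observed season of a synthetic monthly pattern (illustration only)
last_season = 10 * np.sin(2 * np.pi * np.arange(12) / 12)

fc_diff = seasonal_recursion_forecast(last_season, s=12, horizon=36, phi=1.0)
fc_ar = seasonal_recursion_forecast(last_season, s=12, horizon=36, phi=0.8)

for year in range(3):
    window = slice(12 * year, 12 * (year + 1))
    print(
        f"forecast year {year + 1}: "
        f"peak with seasonal difference = {np.abs(fc_diff[window]).max():.2f}, "
        f"peak with seasonal AR(1) only = {np.abs(fc_ar[window]).max():.2f}"
    )
```

In this toy setting the seasonally differenced forecast repeats the last observed season indefinitely, while the seasonal AR(1) forecast shrinks by a factor of 0.8 each year.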


In [2]:
import pandas as pd
from pandas import Series, DataFrame, read_csv
from matplotlib import pyplot as plt
# pandas.datetime and pandas.core.datetools no longer exist in current pandas and are not used below
# autocorrelation_plot now lives in pandas.plotting (formerly pandas.tools.plotting)
from pandas.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import statsmodels.api as sm

In [21]:
ts = pd.period_range(start='1/1/1970', end='5/1/1998', freq='M')

In [22]:
df=pd.read_csv('Auto_sales_data.csv')

In [32]:
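# Create a monthly date index
df['Date'] = ts
df.set_index('Date', inplace=True)
# Display the frame without the CSV's 'Index' column
# (drop returns a copy, so df itself still contains that column)
df.drop('Index', axis=1)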

Date,Sales
1970-01,16.151
1970-02,16.612
1970-03,18.807
1970-04,19.772
1970-05,20.054
1970-06,21.500
1970-07,19.943
1970-08,17.526
1970-09,16.806
1970-10,17.720


In [36]:
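# Start by fitting an ARIMA(0,1,1)x(0,1,1) model with s=12 and a constant:
# one non-seasonal difference, one seasonal difference, one MA term and one SMA term
model = sm.tsa.statespace.SARIMAX(
    endog=df['Sales'],
    order=(0, 1, 1),
    seasonal_order=(0, 1, 1, 12),
    trend='c',
    enforce_invertibility=False,
)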
results=model.fit()
print(results.summary())

                                 Statespace Model Results                                 
Dep. Variable:                              Sales   No. Observations:                  341
Model:             SARIMAX(0, 1, 1)x(0, 1, 1, 12)   Log Likelihood                -624.900
Date:                            Thu, 22 Mar 2018   AIC                           1257.799
Time:                                    14:55:33   BIC                           1273.127
Sample:                                01-31-1970   HQIC                          1263.906
                                     - 05-31-1998                                         
Covariance Type:                              opg                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.0029      0.008      0.342      0.732      -0.014       0.019
ma.L1         -0.5143      0.036   