# __AR(I)MA__

In [2]:
import pandas as pd
import numpy as np

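# Note: statsmodels.tsa.arima_model (ARMA, ARIMA) was deprecated in statsmodels 0.12 and removed in 0.13;
# on newer versions the closest replacement is statsmodels.tsa.arima.model.ARIMA with order=(p, 0, q).
from statsmodels.tsa.arima_model import ARMA,ARMAResults,ARIMA,ARIMAResults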
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
from pmdarima import auto_arima  # in order to determine ARIMA orders

import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

In [3]:
df1 = pd.read_csv('../data/DailyTotalFemaleBirths.csv',index_col='Date',parse_dates=True)
df1.index.freq = 'D'
df1 = df1[:120]  # we only want the first four months

df2 = pd.read_csv('../data/TradeInventories.csv',index_col='Date',parse_dates=True)
df2.index.freq='MS'
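`read_csv(..., parse_dates=True)` does not attach a frequency to the index by itself, which is why the cell above sets `index.freq` by hand. A quick sanity check, sketched here on small synthetic indexes rather than the actual CSV files, is to let `pd.infer_freq` confirm that the dates really are evenly spaced at the frequency you are about to assign:

```python
import pandas as pd

# Small synthetic stand-ins for the parsed Date columns (not the real CSV contents)
daily_dates = pd.DatetimeIndex(["1959-01-01", "1959-01-02", "1959-01-03", "1959-01-04"])
monthly_dates = pd.DatetimeIndex(["1997-01-01", "1997-02-01", "1997-03-01", "1997-04-01"])

# Like an index parsed by read_csv, these carry no frequency yet
print(daily_dates.freq, monthly_dates.freq)

# infer_freq returns a frequency alias when the spacing is regular, otherwise None
daily_alias = pd.infer_freq(daily_dates)
monthly_alias = pd.infer_freq(monthly_dates)
print(daily_alias, monthly_alias)

# Assign the frequency only after it has been confirmed, as done for df1/df2 above
df = pd.DataFrame({"Births": [35, 32, 30, 31]}, index=daily_dates)
if daily_alias is not None:
    df.index.freq = daily_alias
print(df.index.freq)
```

If `infer_freq` returns `None`, the index has gaps or duplicates, and assigning a frequency would raise an error instead.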

In [4]:
print(df1.info())
df1.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 120 entries, 1959-01-01 to 1959-04-30
Freq: D
Data columns (total 1 columns):
Births    120 non-null int64
dtypes: int64(1)
memory usage: 1.9 KB
None


| Date | Births |
|:---|---:|
| 1959-01-01 | 35 |
| 1959-01-02 | 32 |
| 1959-01-03 | 30 |
| 1959-01-04 | 31 |
| 1959-01-05 | 44 |


In [5]:
print(df2.info())
df2.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 264 entries, 1997-01-01 to 2018-12-01
Freq: MS
Data columns (total 1 columns):
Inventories    264 non-null int64
dtypes: int64(1)
memory usage: 4.1 KB
None


| Date | Inventories |
|:---|---:|
| 1997-01-01 | 1301161 |
| 1997-02-01 | 1307080 |
| 1997-03-01 | 1303978 |
| 1997-04-01 | 1319740 |
| 1997-05-01 | 1327294 |


### __Test / train split__
Apart from supplying the data itself, there is no room to tweak these time series forecasts through feature engineering. The risk of overfitting to the existing dataset is therefore low, which is why we do not split the dataset into train / validation / test here, but only into train and test sets.

Rule of thumb: set the length of your test set equal to your intended forecast horizon. Here: one month (the last 30 days, 1959-04-01 to 1959-04-30).

In [6]:
train = df1.iloc[:90]
test = df1.iloc[90:]
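The split above hard-codes position 90. A minimal sketch that derives the split from the forecast horizon instead, following the rule of thumb; `split_by_horizon` is a hypothetical helper and the frame is synthetic, with the same 120 daily dates as `df1`:

```python
import pandas as pd

def split_by_horizon(frame: pd.DataFrame, horizon: int):
    """Hypothetical helper: hold out the last `horizon` rows as the test set."""
    if not 0 < horizon < len(frame):
        raise ValueError("horizon must be between 1 and len(frame) - 1")
    return frame.iloc[:-horizon], frame.iloc[-horizon:]

# Synthetic daily frame shaped like df1 (120 days starting 1959-01-01)
index = pd.date_range("1959-01-01", periods=120, freq="D")
frame = pd.DataFrame({"Births": range(120)}, index=index)

train_part, test_part = split_by_horizon(frame, horizon=30)
print(len(train_part), len(test_part))
print(train_part.index[-1].date(), test_part.index[0].date())
```

Because the split is positional, the test set always contains the most recent observations, so the model is never trained on data from after the period it is evaluated on.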

### __Fit ARMA(p,q) model__
Also check out help(ARMA) to learn which arguments it accepts and what it returns.
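For reference, the ARMA(2,2) model fitted below can be written in mean-centred form, with process mean $\mu$, AR coefficients $\phi_1, \phi_2$, MA coefficients $\theta_1, \theta_2$ and white-noise errors $\varepsilon_t$ with variance $\sigma^2$. Reading the summary's `const` row as $\mu$ (rather than as a regression intercept) is our interpretation; it is consistent with the forecasts further down levelling off at the `const` value.

```latex
(y_t - \mu) = \phi_1 (y_{t-1} - \mu) + \phi_2 (y_{t-2} - \mu)
            + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2},
\qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2)
```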

In [7]:
model = ARMA(train['Births'],order=(2,2))
results = model.fit()
results.summary()

| Dep. Variable: | Births | No. Observations: | 90 |
|:---|:---|:---|---:|
| Model: | ARMA(2, 2) | Log Likelihood | -307.905 |
| Method: | css-mle | S.D. of innovations | 7.405 |
| Date: | Fri, 10 Jul 2020 | AIC | 627.809 |
| Time: | 19:02:11 | BIC | 642.808 |
| Sample: | 01-01-1959 - 03-31-1959 | HQIC | 633.858 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|:---|---:|---:|---:|---:|---:|---:|
| const | 39.7549 | 0.912 | 43.607 | 0.000 | 37.968 | 41.542 |
| ar.L1.Births | -0.1850 | 1.087 | -0.170 | 0.865 | -2.315 | 1.945 |
| ar.L2.Births | 0.4352 | 0.644 | 0.675 | 0.501 | -0.828 | 1.698 |
| ma.L1.Births | 0.2777 | 1.097 | 0.253 | 0.801 | -1.872 | 2.427 |
| ma.L2.Births | -0.3999 | 0.679 | -0.589 | 0.557 | -1.730 | 0.930 |

| | Real | Imaginary | Modulus | Frequency |
|:---|---:|---:|---:|---:|
| AR.1 | -1.3181 | +0.0000j | 1.3181 | 0.5000 |
| AR.2 | 1.7434 | +0.0000j | 1.7434 | 0.0000 |
| MA.1 | -1.2718 | +0.0000j | 1.2718 | 0.5000 |
| MA.2 | 1.9662 | +0.0000j | 1.9662 | 0.0000 |
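The roots table can be reproduced by hand. Assuming statsmodels reports the roots of the AR lag polynomial $1 - \phi_1 z - \phi_2 z^2$ and of the MA lag polynomial $1 + \theta_1 z + \theta_2 z^2$ (this convention reproduces the printed values), the model is stationary when all AR roots lie outside the unit circle and invertible when all MA roots do. A minimal pure-Python sketch using the rounded coefficients from the summary; differences in the last printed digit come from that rounding:

```python
import cmath

def quadratic_lag_roots(c1: float, c2: float) -> list[complex]:
    """Roots of the lag polynomial 1 + c1*z + c2*z**2 (requires c2 != 0)."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2)
    return [(-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)]

# Rounded coefficients copied from the ARMA(2,2) summary above
phi1, phi2 = -0.1850, 0.4352      # ar.L1.Births, ar.L2.Births
theta1, theta2 = 0.2777, -0.3999  # ma.L1.Births, ma.L2.Births

ar_roots = quadratic_lag_roots(-phi1, -phi2)    # 1 - phi1*z - phi2*z**2
ma_roots = quadratic_lag_roots(theta1, theta2)  # 1 + theta1*z + theta2*z**2

for name, roots in (("AR", ar_roots), ("MA", ma_roots)):
    for r in sorted(roots, key=lambda z: z.real):
        # adding 0.0 turns a possible -0.0 imaginary part into 0.0 for display
        print(f"{name}: real {r.real:+.4f}, imag {r.imag + 0.0:+.4f}, modulus {abs(r):.4f}")

stationary = all(abs(r) > 1 for r in ar_roots)
invertible = all(abs(r) > 1 for r in ma_roots)
print("stationary:", stationary, "invertible:", invertible)
```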

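The information criteria in the summary are what order-selection tools such as `auto_arima` (imported at the top; by default it compares AIC) use to rank candidate models, lower being better. They can be recomputed from the log-likelihood $\ell$, the number of observations $n = 90$ and the number of estimated parameters $k$ via $\mathrm{AIC} = -2\ell + 2k$, $\mathrm{BIC} = -2\ell + k\ln n$ and $\mathrm{HQIC} = -2\ell + 2k\ln\ln n$. Taking $k = 6$ (const, two AR and two MA coefficients, plus the noise variance) is an assumption on our part, but it reproduces the printed values to within the rounding of $\ell$:

```python
import math

def information_criteria(llf: float, k: int, n: int) -> dict[str, float]:
    """AIC, BIC and HQIC from a log-likelihood, parameter count and sample size."""
    return {
        "aic": -2.0 * llf + 2.0 * k,
        "bic": -2.0 * llf + k * math.log(n),
        "hqic": -2.0 * llf + 2.0 * k * math.log(math.log(n)),
    }

# Log-likelihood and sample size from the ARMA(2,2) summary; k = 6 is an assumed count
crit = information_criteria(llf=-307.905, k=6, n=90)
for name, value in crit.items():
    print(f"{name.upper()}: {value:.3f}")
```

BIC and HQIC penalise extra parameters more heavily than AIC once $n$ is moderately large, so they tend to favour smaller orders.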

### __Predicted values for a single month__

__What are we doing here?__

The results object from the previous cell knows the training data it was fitted on. When we call its `.predict` method, `start` and `end` are integer positions counted from the first training observation: `start` marks where the forecast begins and `end` (inclusive) where it stops. Compare the last index value of `train`, 1959-03-31, with the first index value of `predictions`, 1959-04-01: `.predict` takes the last date and the frequency of the index from the results object and extends the index accordingly until it reaches the position given by `end`. The prediction range may overlap the training set, but it cannot leave a gap after it. For example, try running the cell below with `start=len(train)-10` and with `start=len(train)+1`.

In [24]:
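start = len(train)                  # first position after the training data (1959-04-01)
end = len(train) + len(test) - 1    # last position to forecast; end is inclusive, hence -1
predictions = results.predict(start=start, end=end).rename('ARMA(2,2) Predictions')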
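A sketch of the position arithmetic behind `start` and `end`, on a synthetic index with the same 90 daily dates as `train`. `positions_to_dates` is a hypothetical helper that only mirrors how positions map onto dates (inclusive `end`, index extended at its own frequency); it does not call statsmodels and does not enforce the no-gap rule:

```python
import pandas as pd

def positions_to_dates(index: pd.DatetimeIndex, start: int, end: int) -> pd.DatetimeIndex:
    """Hypothetical helper: dates that predict(start=start, end=end) would label.

    Positions count from index[0]; `end` is inclusive, matching the output below.
    """
    # Extend the index at its own frequency so positions beyond the data still get dates
    extended = pd.date_range(index[0], periods=max(end + 1, len(index)), freq=index.freq)
    return extended[start:end + 1]  # +1 because Python slicing excludes the stop position

# Synthetic index with the same 90 daily dates as `train`
train_index = pd.date_range("1959-01-01", periods=90, freq="D")

out_of_sample = positions_to_dates(train_index, start=90, end=119)  # start=len(train)
overlapping = positions_to_dates(train_index, start=80, end=119)    # start=len(train)-10

print(out_of_sample[0].date(), out_of_sample[-1].date(), len(out_of_sample))
print(overlapping[0].date(), len(overlapping))
```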

In [25]:
train

| Date | Births |
|:---|---:|
| 1959-01-01 | 35 |
| 1959-01-02 | 32 |
| 1959-01-03 | 30 |
| 1959-01-04 | 31 |
| 1959-01-05 | 44 |
| ... | ... |
| 1959-03-27 | 56 |
| 1959-03-28 | 36 |
| 1959-03-29 | 32 |
| 1959-03-30 | 50 |


In [26]:
predictions

1959-04-01    39.982226
1959-04-02    39.992613
1959-04-03    39.809832
1959-04-04    39.848174
1959-04-05    39.761539
1959-04-06    39.794255
1959-04-07    39.750500
1959-04-08    39.772833
1959-04-09    39.749661
1959-04-10    39.763667
1959-04-11    39.750991
1959-04-12    39.759432
1959-04-13    39.752354
1959-04-14    39.757337
1959-04-15    39.753335
1959-04-16    39.756244
1959-04-17    39.753964
1959-04-18    39.755651
1959-04-19    39.754347
1959-04-20    39.755323
1959-04-21    39.754575
1959-04-22    39.755138
1959-04-23    39.754708
1959-04-24    39.755032
1959-04-25    39.754785
1959-04-26    39.754972
1959-04-27    39.754830
1959-04-28    39.754938
1959-04-29    39.754856
1959-04-30    39.754918
Freq: D, Name: ARMA(2,2) Predictions, dtype: float64
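The forecasts flatten out at about 39.75 for a reason. Beyond the MA order ($h > q = 2$) no known residuals enter the forecast any more, so each step follows the AR recursion on deviations from the mean, $\hat{y}_{t+h} - \mu = \phi_1(\hat{y}_{t+h-1} - \mu) + \phi_2(\hat{y}_{t+h-2} - \mu)$, and those deviations shrink because the AR roots lie outside the unit circle. The sketch below seeds that recursion with the first two printed forecasts and the rounded summary coefficients, treating `const` as $\mu$ (our reading of the summary); it reproduces the printed values to within about $10^{-4}$:

```python
def arma_forecast_tail(first_two: tuple[float, float], mu: float,
                       phi1: float, phi2: float, steps: int) -> list[float]:
    """Extend ARMA(2, q<=2) forecasts past h = 2 with the AR recursion on deviations from mu."""
    forecasts = list(first_two)
    while len(forecasts) < steps:
        dev1 = forecasts[-1] - mu  # deviation at step h-1
        dev2 = forecasts[-2] - mu  # deviation at step h-2
        forecasts.append(mu + phi1 * dev1 + phi2 * dev2)
    return forecasts

# First two forecasts (1959-04-01, 1959-04-02) and rounded coefficients from the output above
reproduced = arma_forecast_tail((39.982226, 39.992613), mu=39.7549,
                                phi1=-0.1850, phi2=0.4352, steps=30)

# A few values copied from the printed predictions, keyed by forecast step h
printed = {3: 39.809832, 4: 39.848174, 30: 39.754918}
for h, value in printed.items():
    print(f"h={h:2d}: reproduced {reproduced[h - 1]:.6f} vs printed {value:.6f}")
```

This is also why a pure ARMA model on its own produces an almost flat long-range forecast: it reverts to the estimated mean.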