# Homework 1
In this assignment, you'll build an ARMA model to forecast future birth trends.

You'll use the ADF test to check for stationarity, determine which ARMA orders to use, and plot your forecast against known data.
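
For reference, the general form of the model fitted in this assignment can be written as below. The symbols $c$, $\varphi_i$, $\theta_j$ and $\varepsilon_t$ are notation assumed here for illustration; they do not appear elsewhere in this notebook.

$$
y_t = c + \sum_{i=1}^{p} \varphi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
$$

Here $p$ is the number of autoregressive lags, $q$ is the number of moving-average lags, and $\varepsilon_t$ is a white-noise error term. The model assumes $y_t$ is stationary, which is why the ADF test comes first.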

## Imports

In [2]:
import pandas as pd
import numpy as np
import pmdarima
%matplotlib inline

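# Legacy API: these classes only work on statsmodels < 0.13; newer releases provide statsmodels.tsa.arima.model.ARIMA instead
from statsmodels.tsa.arima_model import ARMA,ARMAResults,ARIMA,ARIMAResults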
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
from pmdarima import auto_arima
from statsmodels.tsa.stattools import adfuller

import warnings
warnings.filterwarnings("ignore")


## Data

In [None]:
# Load datasets
df1 = pd.read_csv('./DailyTotalFemaleBirths.csv',index_col='Date',parse_dates=True)
df1.index.freq = 'D'
df1 = df1[:120]  # we only want the first four months
df1['Births'].plot(figsize=(12,5));

## Automate the augmented Dickey-Fuller Test

In [2]:
def adf_test(series,title=''):
    """
    Pass in a time series and an optional title, returns an ADF report
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(),autolag='AIC') # .dropna() handles differenced data
    
    labels = ['ADF test statistic','p-value','# lags used','# observations']
    out = pd.Series(result[0:4],index=labels)

    for key,val in result[4].items():
        out[f'critical value ({key})']=val
        
    print(out.to_string())          # .to_string() removes the line "dtype: float64"
    
    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data has a unit root and is non-stationary")

### Problem 1: `Run the augmented Dickey-Fuller Test to confirm stationarity`

In [4]:
adf_test(df1['Births'])

Augmented Dickey-Fuller Test: 
ADF test statistic     -9.855384e+00
p-value                 4.373545e-17
# lags used             0.000000e+00
# observations          1.190000e+02
critical value (1%)    -3.486535e+00
critical value (5%)    -2.886151e+00
critical value (10%)   -2.579896e+00
Strong evidence against the null hypothesis
Reject the null hypothesis
Data has no unit root and is stationary


### Problem 2: `Determine the (p,q) ARMA Orders using` <tt>pmdarima.auto_arima</tt>
Because the series is stationary, this tool should recommend only $p$ and $q$ orders for this dataset, with no differencing ($d=0$).

In [5]:
auto_arima(df1['Births'],seasonal=False).summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 120 |
| Model: | ARMA(2, 2) | Log Likelihood | -405.37 |
| Method: | css-mle | S.D. of innovations | 6.991 |
| Date: | Sat, 23 Mar 2019 | AIC | 822.741 |
| Time: | 12:02:45 | BIC | 839.466 |
| Sample: | 0 | HQIC | 829.533 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 39.8162 | 0.108 | 368.841 | 0.000 | 39.605 | 40.028 |
| ar.L1.y | 1.8568 | 0.081 | 22.933 | 0.000 | 1.698 | 2.016 |
| ar.L2.y | -0.8814 | 0.073 | -12.030 | 0.000 | -1.025 | -0.738 |
| ma.L1.y | -1.8634 | 0.109 | -17.126 | 0.000 | -2.077 | -1.650 |
| ma.L2.y | 0.8634 | 0.108 | 8.020 | 0.000 | 0.652 | 1.074 |

| | Real | Imaginary | Modulus | Frequency |
|---|---|---|---|---|
| AR.1 | 1.0533 | -0.1582j | 1.0652 | -0.0237 |
| AR.2 | 1.0533 | +0.1582j | 1.0652 | 0.0237 |
| MA.1 | 1.0000 | +0.0000j | 1.0000 | 0.0000 |
| MA.2 | 1.1583 | +0.0000j | 1.1583 | 0.0000 |


### Problem 3: `Split the data into train/test sets`
Set the size of your test set to generate a 1-month forecast.

In [6]:
# Set one month for testing
train = df1.iloc[:90]
test = df1.iloc[90:]

### Problem 4: `Fit an ARMA(p,q) Model`

In [7]:
model = ARMA(train['Births'],order=(2,2))
results = model.fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Births | No. Observations: | 90 |
| Model: | ARMA(2, 2) | Log Likelihood | -307.905 |
| Method: | css-mle | S.D. of innovations | 7.405 |
| Date: | Sat, 23 Mar 2019 | AIC | 627.809 |
| Time: | 12:08:30 | BIC | 642.808 |
| Sample: | 01-01-1959 - 03-31-1959 | HQIC | 633.858 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 39.7549 | 0.912 | 43.607 | 0.000 | 37.968 | 41.542 |
| ar.L1.Births | -0.1850 | 1.087 | -0.170 | 0.865 | -2.315 | 1.945 |
| ar.L2.Births | 0.4352 | 0.644 | 0.675 | 0.501 | -0.828 | 1.698 |
| ma.L1.Births | 0.2777 | 1.097 | 0.253 | 0.801 | -1.872 | 2.427 |
| ma.L2.Births | -0.3999 | 0.679 | -0.589 | 0.557 | -1.730 | 0.930 |

| | Real | Imaginary | Modulus | Frequency |
|---|---|---|---|---|
| AR.1 | -1.3181 | +0.0000j | 1.3181 | 0.5000 |
| AR.2 | 1.7434 | +0.0000j | 1.7434 | 0.0000 |
| MA.1 | -1.2718 | +0.0000j | 1.2718 | 0.5000 |
| MA.2 | 1.9662 | +0.0000j | 1.9662 | 0.0000 |


### Problem 5: `Obtain a month's worth of predicted values`

In [10]:
start=len(train)
end=len(train)+len(test)-1
predictions = results.predict(start=start, end=end).rename('ARMA(2,2) Predictions')

### Plot predictions against known values

In [None]:
title = 'Daily Total Female Births'
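ylabel='Births'
xlabel=''  # left empty: the date axis needs no label, but the name must be defined for ax.set() below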

ax = test['Births'].plot(legend=True,figsize=(12,6),title=title)
predictions.plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);