# ARIMA - Chapter 3
## AR or MA
In this exercise, you will use the ACF and PACF to decide whether some data is best suited to an MA model or an AR model. Selecting the right model order is of great importance to our predictions.

### Expected Behavior of ACF and PACF:
| Model | ACF | PACF |
|---|---|---|
| AR(p) | Tails off | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Tails off |
| ARMA(p,q) | Tails off | Tails off |
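
To make "tails off" and "cuts off" concrete, the sketch below evaluates the textbook theoretical ACF for the two simplest cases. It assumes a stationary AR(1) process `y_t = phi * y_{t-1} + e_t` and an MA(1) process written as `y_t = e_t + theta * e_{t-1}`; the function names `ar1_acf` and `ma1_acf` are illustrative helpers, not part of statsmodels.

```python
def ar1_acf(phi, max_lag):
    """Theoretical ACF of a stationary AR(1) process: rho_k = phi ** k."""
    return [phi ** k for k in range(1, max_lag + 1)]


def ma1_acf(theta, max_lag):
    """Theoretical ACF of an MA(1) process: nonzero at lag 1 only."""
    rho1 = theta / (1 + theta ** 2)
    return [rho1 if k == 1 else 0.0 for k in range(1, max_lag + 1)]


# AR(1): the autocorrelation decays geometrically ("tails off")
print(ar1_acf(0.5, 4))

# MA(1): the autocorrelation is zero beyond lag 1 ("cuts off")
print(ma1_acf(0.5, 4))
```

Sample ACF plots from real data will only approximate these shapes, which is why the shaded confidence bands in the plots matter.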


In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# Create figure
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(12,8))

# Plot the ACF of df
plot_acf(df, lags=10, zero=False, ax=ax1)

# Plot the PACF of df
plot_pacf(df, lags=10, zero=False, ax=ax2)

plt.show()

## Order of Earthquakes
In this exercise, you will use the ACF and PACF plots to decide on the most appropriate order to forecast the earthquakes time series.

In [None]:
# Create figure
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(12,8))

# Plot ACF and PACF
plot_acf(earthquake, lags=15, zero=False, ax=ax1)
plot_pacf(earthquake, lags=15, zero=False, ax=ax2)

# Show plot
plt.show()

## Creating and Training a SARIMAX Model
Now, you will create and train a SARIMAX model for the earthquake time series.

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Instantiate model
model = SARIMAX(earthquake, order=(1,0,0))

# Train model
results = model.fit()

## Searching over Model Order
In this exercise, you will search over different values of `p` and `q` to find the best model order. Choosing the best order means fitting many candidate models and comparing them using AIC and BIC.
The SARIMAX model class and the time series DataFrame df are available in your environment.


In [None]:
# Create empty list to store search results
order_aic_bic=[]

# Loop over p values from 0-2
for p in range(3):
    # Loop over q values from 0-2
    for q in range(3):
        # Create and fit ARMA(p,q) model
        model = SARIMAX(df, order=(p,0,q))
        results = model.fit()

        # Append order and results tuple
        order_aic_bic.append((p,q,results.aic, results.bic))

## Choosing Order with AIC and BIC
Now that you have performed a search over many model orders, you will evaluate your results to find the best model order.

In [None]:
import pandas as pd

# Construct DataFrame from order_aic_bic
order_df = pd.DataFrame(order_aic_bic, columns=['p', 'q', 'AIC', 'BIC'])

# Print order_df in order of increasing AIC
print(order_df.sort_values('AIC'))

# Print order_df in order of increasing BIC
print(order_df.sort_values('BIC'))
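
As a reminder of what the two criteria measure, here is a minimal sketch of the textbook definitions, assuming a model with `n_params` estimated parameters, a maximized log-likelihood `log_likelihood`, and `n_obs` observations. The helper names and the input numbers are made up for illustration; how statsmodels counts parameters internally is not shown here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical inputs: same log-likelihood, different numbers of parameters
llf, n_obs = -100.0, 100
for k in (2, 3):
    print(k, aic(llf, k), bic(llf, k, n_obs))
```

Lower values are better for both. Because `ln(n)` exceeds 2 once there are 8 or more observations, BIC penalizes each extra parameter more heavily than AIC and tends to favor simpler models.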

## Mean Absolute Error
In the last lesson you decided that the earthquakes time series looked like an AR(1) process, and an AIC-BIC grid search over model orders is one way to check that choice. Before you use a model to predict, you also want to know how accurate its predictions are. The mean absolute error (MAE) is a good statistic for this: it is the mean of the absolute differences between your predictions and the true values.

In this exercise you will calculate the MAE for an ARMA(1,1) model fit to the earthquakes time series.

In [None]:
import numpy as np

# Fit model
model = SARIMAX(earthquake, order=(1,0,1))
results = model.fit()

# Calculate the mean absolute error from residuals
mae = np.mean(np.abs(results.resid))

# Print mean absolute error
print(mae)

# Make plot of time series for comparison
earthquake.plot()
plt.show()
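
The calculation above works from the residuals, which are understood here to be the differences between the observed values and the model's in-sample one-step-ahead predictions. The same statistic can be written out directly from predictions and true values, as in this small sketch; `mean_absolute_error` is an illustrative helper and the numbers are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only
y_true = [10, 12, 9, 11]
y_pred = [11, 12, 7, 12]

# Absolute errors are 1, 0, 2, 1, so the MAE is 4 / 4 = 1.0
print(mean_absolute_error(y_true, y_pred))
```

Because the MAE is in the same units as the series, comparing it with the typical size of the values in the plot gives a quick sense of whether the error is large or small.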

## Diagnostic Summary Statistics
It is important to know when you need to go back to the drawing board in model design. In this exercise you will use the residual test statistics in the results summary to decide whether a model is a good fit to a time series.

Here is a reminder of the tests in the model summary:

| Test | Null hypothesis | P-value name |
|---|---|---|
| Ljung-Box | There are no correlations in the residuals | Prob(Q) |
| Jarque-Bera | The residuals are normally distributed | Prob(JB) |

An unknown time series df and the SARIMAX model class are available for you in your environment.

In [None]:
# Create and fit model
model1 = SARIMAX(df, order=(3,0,1))
results1 = model1.fit()

# Print summary
print(results1.summary())
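
To show what lies behind Prob(JB), the sketch below computes the textbook Jarque-Bera statistic from sample skewness and kurtosis. It assumes the statistic is compared against a chi-squared distribution with 2 degrees of freedom, whose upper-tail probability is `exp(-JB / 2)`. The function `jarque_bera` is an illustrative helper; the statsmodels implementation may differ in details.

```python
import math


def jarque_bera(x):
    """Return the Jarque-Bera statistic and its chi-squared(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    # Biased (divide-by-n) central moments
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Upper-tail probability of a chi-squared distribution with 2 degrees of freedom
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Tiny made-up sample, for illustration only
jb, p = jarque_bera([-2, -1, 0, 1, 2])
print(jb, p)
```

For both tests in the table, a p-value below 0.05 rejects the null hypothesis, which is a sign that the residuals are correlated (Ljung-Box) or not normally distributed (Jarque-Bera).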

## Plot Diagnostics
It is important to know when you need to go back to the drawing board in model design. In this exercise you will create 4 common diagnostic plots and use them to decide whether a model is a good fit to some data.

Here is a reminder of what you would like to see in each of the plots for a model that fits well:

| Plot | Good fit |
|---|---|
| Standardized residual | There are no obvious patterns in the residuals |
| Histogram plus kde estimate | The KDE curve should be very similar to the normal distribution |
| Normal Q-Q | Most of the data points should lie on the straight line |
| Correlogram | 95% of correlations for lag greater than zero should not be significant |

In [None]:
# Create and fit model
model = SARIMAX(df, order=(1,1,1))
results = model.fit()

# Create the 4 diagnostics plots
results.plot_diagnostics()
plt.show()
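
The correlogram criterion can also be checked numerically. The sketch below computes the sample autocorrelation of a series and flags lags that fall outside the common white-noise approximation of `±1.96 / sqrt(n)`. This is an assumption made for illustration: the band drawn by statsmodels may be computed differently, and `sample_acf` and `significant_lags` are hypothetical helpers.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# A strongly patterned (alternating) series: every lag is flagged
print(significant_lags([1, -1] * 20, 3))
```

Applied to the residuals of a well-fitting model, a check like this should flag only about 5% of lags.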

## Identification using Box-Jenkins Methodology
Use the Dickey-Fuller test to check for stationarity.
In the following exercises you will apply the Box-Jenkins methodology to go from an unknown dataset to a model which is ready to make forecasts.

You will be using a new time series. This is the personal savings as % of disposable income 1955-1979 in the US.

The first step of the Box-Jenkins methodology is Identification. In this exercise you will use the tools at your disposal to test whether this new time series is stationary.

The time series has been loaded in as a DataFrame savings and the adfuller() function has been imported.

Plot the time series using the DataFrame's .plot() method.
Apply the Dickey-Fuller test to the 'savings' column of the savings DataFrame and assign the test outcome to result.
Print the Dickey-Fuller test statistic and the associated p-value.


In [None]:
from statsmodels.tsa.stattools import adfuller

# Plot time series
savings.plot()
plt.show()

# Run Dickey-Fuller test
result = adfuller(savings['savings'])

# Print test statistic
print(result[0])

# Print p-value
print(result[1])
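
A p-value below 0.05 rejects the Dickey-Fuller null hypothesis that the series has a unit root, so the series can be treated as stationary. If the test does not reject, the usual next step is to difference the series and test again; the number of differences taken corresponds to `d` in `order=(p,d,q)`. The helper `difference` below is a plain-Python sketch of that operation.

```python
def difference(series, d=1):
    """Apply first differencing d times: y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


print(difference([1, 3, 6, 10]))       # [2, 3, 4]
print(difference([1, 3, 6, 10], d=2))  # [1, 1]
```

Each round of differencing shortens the series by one observation.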

## Identification II
You learned that the savings time series is stationary without differencing. Now that you have this information you can try to identify what order of model will be the best fit.

The plot_acf() and the plot_pacf() functions have been imported and the time series has been loaded into the DataFrame savings.

Make a plot of the ACF, for lags 1-10 and plot it on axis ax1.
Do the same for the PACF.

In [None]:
# Create figure
fig, (ax1, ax2) = plt.subplots(2,1, figsize=(12,8))

# Plot the ACF of savings on ax1
plot_acf(savings, lags=10, zero=False, ax=ax1)

# Plot the PACF of savings on ax2
plot_pacf(savings, lags=10, zero=False, ax=ax2)

plt.show()

## Estimation

In the last exercise, the ACF and PACF were a little inconclusive. The results suggest your data could be an ARMA(p,q) model or could be an imperfect AR(3) model. In this exercise you will search over a range of model orders to find the best one according to AIC.

The time series savings has been loaded and the SARIMAX class has been imported into your environment.
Loop over values of p from 0 to 3 and values of q from 0 to 3.
Inside the loop, create an ARMA(p,q) model with a constant trend.
Then fit the model to the time series savings.
At the end of each loop print the values of p and q and the AIC and BIC.

In [None]:
# Loop over p values from 0-3
for p in range(4):
    # Loop over q values from 0-3
    for q in range(4):
        try:
            # Create and fit ARMA(p,q) model
            model = SARIMAX(savings, order=(p,0,q), trend='c')
            results = model.fit()

            # Print p, q, AIC, BIC
            print(p, q, results.aic, results.bic)

        except Exception:
            # Fitting can fail for some orders; record the failure and continue
            print(p, q, None, None)

## Diagnostics
You have arrived at the model diagnostic stage. So far you have found that the initial time series was stationary, but may have one outlying point. You identified promising model orders using the ACF and PACF and confirmed these insights by training a lot of models and using the AIC and BIC.

You found that the ARMA(1,2) model was the best fit to our data and now you want to check over the predictions it makes before you would move it into production.

The time series savings has been loaded and the SARIMAX class has been imported into your environment.

Retrain the ARMA(1,2) model on the time series, setting the trend to constant.
Create the 4 standard diagnostics plots.
Print the model residual summary statistics.

You can make the 4 standard diagnostics plots by using the .plot_diagnostics() method of the fitted model results object.
The residual test statistics can be found by using the .summary() method of the fitted model results object.

In [None]:
# Create and fit model
model = SARIMAX(savings, order=(1,0,2), trend='c')
results = model.fit()

# Create the 4 diagnostics plots
results.plot_diagnostics()
plt.show()

# Print summary
print(results.summary())