# About Unit 6

Welcome to the Marquette University AIM time series analysis curriculum! In this unit, you will learn about **Autoregressive Integrated Moving Average (ARIMA) Models**, a versatile framework for time series forecasting. ARIMA combines:
- Autoregression (AR)
- Differencing (Integration, I)
- Moving Averages (MA)

By the end of this unit, you will:
- Understand ARIMA model components.
- Apply the **Box-Jenkins methodology** for model selection.
- Conduct model diagnostics and validation.

# Getting Started

**Import Packages**

Run the following code to bring the necessary packages into your environment. Ensure you are running a Python 3 kernel with NumPy, pandas, Matplotlib, and statsmodels installed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
```

# ARIMA Model Components

### ARIMA
An ARIMA(p, d, q) model combines:
- **Autoregression (AR)**: Uses past values to predict the current value.
- **Integration (I)**: Differencing to make the series stationary.
- **Moving Average (MA)**: Models the relationship between current and past error terms.

The parameters of ARIMA:
- \( p \): Number of AR terms (lags).
- \( d \): Degree of differencing.
- \( q \): Number of MA terms.
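Written compactly with the backshift operator \( B \) (defined by \( B y_t = y_{t-1} \)), an ARIMA(p, d, q) model for a series \( y_t \) with white-noise errors \( \varepsilon_t \) and no constant term is:

\[
\left(1 - \phi_1 B - \cdots - \phi_p B^p\right)(1 - B)^d \, y_t = \left(1 + \theta_1 B + \cdots + \theta_q B^q\right) \varepsilon_t
\]

With \( d = 1 \), \( (1 - B) y_t = y_t - y_{t-1} \), so ARIMA(1,1,1) says that the differenced series \( \Delta y_t = y_t - y_{t-1} \) follows an ARMA(1,1) process:

\[
\Delta y_t = \phi_1 \, \Delta y_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1}
\]

This sign convention (minus on the AR side, plus on the MA side) matches the one statsmodels uses, which is why the simulation code passes `ar = [1, -0.5]` for \( \phi_1 = 0.5 \) and `ma = [1, 0.3]` for \( \theta_1 = 0.3 \).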

```python
# Simulate an ARIMA(1,1,1) process
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(42)
ar = [1, -0.5]  # AR polynomial 1 - 0.5B, i.e. phi = 0.5
ma = [1, 0.3]   # MA polynomial 1 + 0.3B, i.e. theta = 0.3
arma_process = ArmaProcess(ar=ar, ma=ma)
data_arima = arma_process.generate_sample(nsample=100)  # stationary ARMA(1,1) sample

# Integrate once (cumulative sum) so that differencing once (d = 1)
# recovers the ARMA(1,1) series; this yields an ARIMA(1,1,1) series
data_arima = np.cumsum(data_arima)

# Plot the simulated ARIMA(1,1,1) data
plt.plot(data_arima)
plt.title('Simulated ARIMA(1,1,1) Process')
plt.show()
```
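Because `np.cumsum` integrates a series rather than differencing it, a quick sanity check is to difference the simulated series once with `np.diff` and confirm that the original ARMA(1,1) sample comes back (minus its first value). A minimal sketch that re-runs the same simulation so it stands alone:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Same ARMA(1,1) setup as the simulation cell: phi = 0.5, theta = 0.3
np.random.seed(42)
arma_sample = ArmaProcess(ar=[1, -0.5], ma=[1, 0.3]).generate_sample(nsample=100)

# Integrate once (d = 1): y_t = x_1 + ... + x_t
integrated = np.cumsum(arma_sample)

# Differencing once undoes the integration: y_t - y_{t-1} = x_t for t >= 2
recovered = np.diff(integrated)

print(np.allclose(recovered, arma_sample[1:]))  # True
```

This is the sense in which \( d \) counts differences: one `np.diff` turns the integrated series back into a stationary ARMA series.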

# Box-Jenkins Methodology

The Box-Jenkins methodology provides a systematic approach to building ARIMA models:

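1. **Model Identification**:
   - Ensure the series is stationary, differencing it if necessary; the number of differences needed gives \( d \).
   - Use plots of the (differenced) series, its ACF, and its PACF to choose \( p \) and \( q \). As a rule of thumb, a PACF that cuts off after lag \( p \) while the ACF decays gradually suggests AR(\( p \)), and an ACF that cuts off after lag \( q \) while the PACF decays gradually suggests MA(\( q \)).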

2. **Model Estimation**:
   - Fit the ARIMA model with chosen parameters.

3. **Model Diagnostics**:
   - Analyze residuals to check that no patterns remain.
   - Perform a Ljung-Box test to check whether the residuals are consistent with white noise.

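```python
# Example: Fitting an ARIMA Model
# Simulated data
data = data_arima

# The level series is non-stationary, so inspect the ACF and PACF
# of the once-differenced series (d = 1) to choose p and q
diff_data = np.diff(data)
plot_acf(diff_data, lags=20)
plot_pacf(diff_data, lags=20)
plt.show()

# Fit ARIMA(1,1,1); the model differences internally, so pass the original series
model = ARIMA(data, order=(1, 1, 1))
fitted_model = model.fit()

# Print summary
print(fitted_model.summary())
```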

# Model Diagnostics

### Residual Analysis
Residuals from a well-fitted ARIMA model should behave like white noise, that is, they should:
- Be uncorrelated with one another.
- Have mean zero and constant variance.

### Ljung-Box Test
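The Ljung-Box test checks whether the residual autocorrelations up to a chosen lag \( h \) are jointly zero. A small p-value (for example, below 0.05) is evidence against \( H_0 \):

- \( H_0 \): Residuals are uncorrelated up to lag \( h \).
- \( H_1 \): At least one residual autocorrelation up to lag \( h \) is non-zero.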

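```python
# Residual Analysis
# Drop the first residual: with d = 1 it typically reflects the model's
# start-up rather than a genuine one-step forecast error
residuals = fitted_model.resid[1:]

# Plot residuals
plt.figure(figsize=(10, 6))
plt.plot(residuals)
plt.axhline(0, color='gray', linewidth=0.8)  # reference line at zero
plt.title('Residuals of ARIMA(1,1,1)')
plt.show()

# Ljung-Box test at lag 10; model_df = p + q = 2 accounts for the
# two estimated ARMA coefficients in the degrees of freedom
ljung_box_results = acorr_ljungbox(residuals, lags=[10], model_df=2, return_df=True)
print(ljung_box_results)
```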

# Model Refinement and Validation

If diagnostics reveal issues (e.g., patterns in residuals or a small Ljung-Box p-value):
- Adjust the orders \( p, d, q \).
- Refit the model and repeat diagnostics.

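Validation checks how well the model generalizes to unseen data, for example by fitting on an earlier part of the series and comparing its forecasts with the later observations that were held out.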

```python
# Forecasting with ARIMA
forecast_steps = 10
forecast = fitted_model.get_forecast(steps=forecast_steps)
forecast_mean = forecast.predicted_mean
# conf_int() can return a NumPy array (NumPy input) or a DataFrame (pandas input);
# np.asarray handles both. The default alpha=0.05 gives 95% intervals.
forecast_ci = np.asarray(forecast.conf_int())

# Plot forecast
forecast_index = np.arange(len(data), len(data) + forecast_steps)
plt.plot(data, label='Original Data')
plt.plot(forecast_index, forecast_mean, label='Forecast', color='red')
plt.fill_between(forecast_index,
                 forecast_ci[:, 0], forecast_ci[:, 1],
                 color='pink', alpha=0.3, label='95% CI')
plt.legend()
plt.title('ARIMA Forecast')
plt.show()
```

# Summary

In this unit, you learned about **ARIMA Models**:
- ARIMA combines AR, MA, and differencing to handle non-stationary data.
- The Box-Jenkins methodology guides model identification, estimation, and diagnostics.
- Diagnostics such as residual analysis and the Ljung-Box test check whether the model has captured the dependence structure in the data.

ARIMA models are versatile tools for forecasting in time series analysis.