# ARIMA

Here’s a complete ARIMA module for your notebook, covering theory, diagnostics, and Python code using statsmodels.

⸻

ARIMA: AutoRegressive Integrated Moving Average

⸻

1. Components of ARIMA
	•	AR (AutoRegression): Regresses the series on its own past values:
$$
Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \epsilon_t
$$
	•	I (Integration): Differencing to make the time series stationary:
$$
Y_t' = Y_t - Y_{t-1}
$$
	•	MA (Moving Average): Models the series as a linear combination of the current and past error terms:
$$
Y_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}
$$
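	•	Combined: ARIMA(p, d, q), where p is the AR order, d is the number of times the series is differenced, and q is the MA order.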
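
Putting the three pieces together, the full model can be written compactly with the backshift operator $B$, defined by $B Y_t = Y_{t-1}$. The operator notation is introduced here only for this summary and is not used elsewhere in this module:
$$
\left(1 - \phi_1 B - \dots - \phi_p B^p\right)(1 - B)^d \, Y_t = \left(1 + \theta_1 B + \dots + \theta_q B^q\right)\epsilon_t
$$
For d = 1, the factor $(1 - B)Y_t$ is just $Y_t - Y_{t-1}$, the first difference from the I component.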

⸻

2. When to Use ARIMA
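	•	Data shows a trend but no seasonality (seasonal data usually calls for a seasonal extension such as SARIMA).
	•	The series must be stationary after differencing d times. Check with the Augmented Dickey-Fuller (ADF) test: its null hypothesis is a unit root (non-stationarity), so a small p-value supports stationarity.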

⸻

3. Model Fitting Pipeline
	1.	Plot and inspect the series.
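	2.	Make the series stationary: run the ADF test and difference until it passes. The number of differences is d.
	3.	Use the PACF to suggest p (for an AR(p) process it cuts off after lag p) and the ACF to suggest q (for an MA(q) process it cuts off after lag q).
	4.	Fit the model and check that the residuals look like white noise.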
	5.	Forecast future values.

⸻

4. Python: ARIMA with statsmodels

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Simulated data with trend
np.random.seed(0)
time = np.arange(100)
data = 0.1*time + np.random.normal(scale=1.0, size=100)

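# ADF test on the raw series (null hypothesis: unit root, i.e. non-stationary)
result = adfuller(data)
print("ADF Statistic:", result[0])
print("p-value:", result[1])

# Difference once and re-test; a small p-value here suggests d=1 is enough
diff = np.diff(data)
print("p-value after differencing:", adfuller(diff)[1])

# Fit ARIMA(p=1, d=1, q=1). The model differences internally, so pass the raw data.
# With d=1, statsmodels includes no drift term by default, so forecasts level off;
# passing trend="t" adds a drift if the trend should continue.
model = ARIMA(data, order=(1,1,1))
model_fit = model.fit()
print(model_fit.summary())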

# Forecast
forecast = model_fit.forecast(steps=10)
plt.plot(np.arange(len(data)), data, label="Original")
plt.plot(np.arange(len(data), len(data)+10), forecast, label="Forecast")
plt.legend()
plt.title("ARIMA Forecast")
plt.show()



⸻

5. Residual Diagnostics: Ljung-Box Test

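Check whether the residuals are white noise. The null hypothesis of the Ljung-Box test is that there is no autocorrelation up to the chosen lag, so a large p-value (for example, above 0.05) means there is no evidence of leftover structure: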

from statsmodels.stats.diagnostic import acorr_ljungbox

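# Skip the first d=1 residual: it reflects the differencing start-up, not the fitted dynamics
residuals = model_fit.resid[1:]
# model_df = p + q adjusts the test's degrees of freedom for the estimated ARMA parameters
ljung_box = acorr_ljungbox(residuals, lags=[10], model_df=2, return_df=True)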
print(ljung_box)


