# Time Series Forecasting with ARIMA

## 1. Introduction
This notebook demonstrates a time series forecasting project using the classic Airline Passengers dataset. The goal is to predict the number of monthly airline passengers using an ARIMA model. The project covers data loading, visualization, time series decomposition, model training, and evaluation.

## 2. Data Loading and Preparation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

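# Load the Airline Passengers dataset, using the 'Month' column as a datetime index
data = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv', index_col='Month', parse_dates=True)
df = data.rename(columns={'Passengers': 'n_passengers'})
# Declare the month-start frequency explicitly so statsmodels does not have to infer it
df = df.asfreq('MS')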

df.head()

## 3. Exploratory Data Analysis (EDA)

In [None]:
# Plot the time series data
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['n_passengers'])
plt.title('Monthly Airline Passengers')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.grid(True)
plt.show()

In [None]:
# Decompose the time series
decomposition = seasonal_decompose(df['n_passengers'], model='multiplicative')
fig = decomposition.plot()
fig.set_size_inches(14, 7)
plt.show()
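
To make the multiplicative model concrete: it assumes observed = trend × seasonal × residual. The sketch below rebuilds those components by hand on a small synthetic monthly series (an assumption for illustration, not the airline data), using a centered 2×12 moving average for the trend and per-month averages of the detrended ratio for the seasonal indices. The names `synthetic`, `pattern`, and `seasonal_index` are hypothetical, and `seasonal_decompose` may differ in edge handling and normalization details.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption for illustration, not the airline data):
# a linear trend multiplied by a fixed 12-month seasonal pattern.
idx = pd.date_range("2000-01-01", periods=72, freq="MS")
trend_true = np.linspace(100, 300, len(idx))
pattern = 1 + 0.2 * np.sin(2 * np.pi * np.arange(12) / 12)
synthetic = pd.Series(trend_true * np.tile(pattern, 6), index=idx)

# Trend: centered 2x12 moving average (a 12-term mean, then a 2-term mean to center it).
trend = synthetic.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal indices: average the detrended ratio per calendar month,
# then rescale so the 12 indices average to 1.
ratio = synthetic / trend
seasonal_index = ratio.groupby(ratio.index.month).mean()
seasonal_index = seasonal_index / seasonal_index.mean()

# Residual: whatever the trend and seasonal factors do not explain.
seasonal = pd.Series(seasonal_index.to_numpy()[idx.month - 1], index=idx)
resid = synthetic / (trend * seasonal)

print(seasonal_index.round(3))
```

The trend is undefined for the first and last six months because the centered window does not fit there, which is also why the trend and residual panels of a decomposition plot are shorter than the observed series.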

## 4. Stationarity Check

In [None]:
# Perform the Augmented Dickey-Fuller (ADF) test to check for stationarity
def test_stationarity(timeseries):
    result = adfuller(timeseries, autolag='AIC')
    print('Augmented Dickey-Fuller Test:')
    print(f'Test Statistic: {result[0]}')
    print(f'p-value: {result[1]}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'\t{key}: {value}')

test_stationarity(df['n_passengers'])

The p-value is greater than 0.05, so we fail to reject the null hypothesis of the ADF test (that the series has a unit root). We therefore treat the series as non-stationary and apply differencing.
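
As a small illustration of why differencing helps, the sketch below applies a first difference, y_t − y_{t−1}, to a made-up series with a pure linear trend (the `toy` values are an assumption, not the airline data). The trend turns into a constant in the differenced series, and the first value is `NaN` because it has no predecessor.

```python
import pandas as pd

# Hypothetical toy series with a pure linear trend (slope 5); not the airline data.
toy = pd.Series([100, 105, 110, 115, 120, 125], dtype=float)

# First-order difference: y_t - y_{t-1}. The first entry has no predecessor, so it is NaN.
toy_diff = toy.diff()

print(toy_diff.tolist())            # first element is NaN, the rest equal the slope
print(toy_diff.dropna().nunique())  # one distinct value: the trend has been removed
```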

In [None]:
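# Apply first-order differencing; the first value is NaN because it has no predecessor
df['diff_1'] = df['n_passengers'].diff()
# Drop the leading NaN before testing, since adfuller cannot handle missing values
test_stationarity(df['diff_1'].dropna())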

## 5. ARIMA Model

In [None]:
# Plot ACF and PACF to determine ARIMA parameters
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(df['diff_1'].dropna(), ax=ax1, lags=40)
plot_pacf(df['diff_1'].dropna(), ax=ax2, lags=40)
plt.show()
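
For reference, the bars in an ACF plot are sample autocorrelations. A minimal NumPy sketch of that calculation follows; `sample_acf` is a hypothetical helper written for illustration, the input is a made-up noisy sine wave, and the ±1.96/√n band is the common large-sample approximation for white noise rather than necessarily the exact band that `plot_acf` draws.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_k = c_k / c_0 for k = 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    c0 = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / c0 for k in range(nlags + 1)])

# Toy input (an assumption for illustration): a sine wave with period 12 plus noise.
rng = np.random.default_rng(0)
t = np.arange(144)
toy = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(len(t))

acf = sample_acf(toy, nlags=24)
band = 1.96 / np.sqrt(len(toy))  # rough white-noise significance band

print(acf[[0, 6, 12]].round(2))
print(f"approximate 95% band: +/-{band:.3f}")
```

A series with a 12-month cycle shows a strong positive autocorrelation at lag 12 and a strong negative one at lag 6, which is the kind of pattern to look for when judging whether seasonality remains after differencing.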

In [None]:
# Build and train the ARIMA model
# Based on the ACF/PACF plots, we can try p=2, d=1, q=2
model = ARIMA(df['n_passengers'], order=(2, 1, 2))
results = model.fit()
print(results.summary())
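
Written out, and assuming the statsmodels default of no constant term when d > 0, ARIMA(2, 1, 2) models the first difference $w_t = y_t - y_{t-1}$ as:

$$
w_t = \phi_1 w_{t-1} + \phi_2 w_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
$$

Here $\varepsilon_t$ is white noise with variance $\sigma^2$, and $\phi_1, \phi_2$ and $\theta_1, \theta_2$ correspond to the `ar.L1`, `ar.L2`, `ma.L1`, and `ma.L2` rows of the summary table. The symbols are introduced here for illustration and do not appear elsewhere in the notebook.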

## 6. Forecasting and Evaluation

In [None]:
# Forecast future values
forecast = results.get_forecast(steps=36)
forecast_ci = forecast.conf_int()

# Plot the forecast
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['n_passengers'], label='Observed')
plt.plot(forecast.predicted_mean.index, forecast.predicted_mean, color='r', label='Forecast')
plt.fill_between(forecast_ci.index, forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1], color='pink', label='95% confidence interval')
plt.title('Airline Passengers Forecast')
plt.legend()
plt.show()

## 7. Conclusion
The ARIMA(2, 1, 2) model provides a baseline forecast for the airline passenger data. Because it is a non-seasonal model, it has no explicit terms for the 12-month seasonal pattern visible in the decomposition, so it should not be expected to reproduce that pattern fully. A natural next step is a seasonal ARIMA (SARIMA) model, which is specifically designed to handle seasonality.