# Time Series Forecasting with SARIMA

This notebook demonstrates how to fit a Seasonal ARIMA (SARIMA) model to a time series and generate forecasts using the `statsmodels` library. We will use the classic **AirPassengers** dataset, which contains monthly totals of international airline passengers from 1949 to 1960. The series exhibits clear seasonality and trend, making it an ideal example for SARIMA models.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Download the AirPassengers dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
df = pd.read_csv(url)
df.rename(columns={"Passengers": "value", "Month": "date"}, inplace=True)

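# Convert to datetime and set index
# (infer_datetime_format is deprecated in pandas 2.x; the default parser handles "YYYY-MM")
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# Declare the month-start frequency explicitly so statsmodels does not have to infer it
series = df['value'].astype(float).asfreq('MS')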

# Plot the original series
series.plot(figsize=(10, 4), title='Monthly International Airline Passengers')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.show()

## Fit a SARIMA model

We will fit a SARIMA model with non-seasonal order `(p, d, q) = (1, 1, 1)` and seasonal order `(P, D, Q, s) = (1, 1, 1, 12)`. Here `p`, `d`, and `q` are the autoregressive order, the degree of differencing, and the moving-average order; `P`, `D`, and `Q` are their seasonal counterparts; and `s` is the seasonal period (12 for monthly data with a yearly cycle).
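To make the differencing terms concrete, here is a minimal sketch of what `d = 1` and `D = 1` with a period of 12 do to a series. It uses a synthetic series named `toy` (an assumption for illustration, not the AirPassengers data) built from an exact linear trend plus an exact 12-month cycle, so both differences together should leave essentially nothing behind.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustrative assumption, not the AirPassengers data):
# a linear trend plus a pattern that repeats every 12 months.
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
toy = pd.Series(10.0 + 2.0 * t + 5.0 * np.sin(2 * np.pi * t / 12), index=idx)

# d = 1: a first difference removes the linear trend.
# D = 1 with period 12: a seasonal difference removes the repeating yearly pattern.
differenced = toy.diff(1).diff(12)

# Differencing costs 1 + 12 = 13 observations at the start of the series.
n_lost = int(differenced.isna().sum())
print("observations lost:", n_lost)

# Trend and seasonality are exact here, so the remainder is zero up to rounding.
max_residual = float(differenced.dropna().abs().max())
print("largest remaining value:", max_residual)
```

On real data the differenced series will not be zero; the AR and MA terms of the model are then fitted to whatever structure remains.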

In [None]:
# Define SARIMA order and seasonal order
order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)

# Fit the model
model = SARIMAX(
    series,
    order=order,
    seasonal_order=seasonal_order,
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

# Display the model summary
results.summary()

## Forecast future values

After fitting the model, we can forecast the next 24 months. We also compute the 95% confidence intervals for the forecast.

In [None]:
# Forecast 24 future periods
forecast_steps = 24
forecast = results.get_forecast(steps=forecast_steps)

# The forecast results already carry a monthly DatetimeIndex that continues the series,
# so no manual index construction is needed
forecast_series = forecast.predicted_mean
conf_int = forecast.conf_int()

# Plot historical data and forecast
plt.figure(figsize=(10, 5))
plt.plot(series.index, series.values, label='Historical')
plt.plot(forecast_series.index, forecast_series.values, label='Forecast', color='C1')

# Confidence intervals
plt.fill_between(forecast_series.index, conf_int.iloc[:, 0], conf_int.iloc[:, 1], color='C1', alpha=0.3, label='95% CI')

plt.title('SARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.show()

You can experiment with different values of `(p, d, q)` and `(P, D, Q, s)` to improve the model, and you can replace the AirPassengers dataset with your own time-series data by loading it into `series` before fitting the model.