<a href="https://colab.research.google.com/github/Seetureddy/SARIMA_Model/blob/main/SARIMA__MODEL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller

from sklearn.metrics import mean_absolute_error, mean_squared_error

import warnings
warnings.filterwarnings("ignore")

LOAD THE DATASET

In [None]:
df = pd.read_csv('/content/sample_data/price_data.csv')
df.head()

DATA PROCESSING


In [None]:
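# NOTE: read_csv above creates a default integer index. This step assumes the
# index already holds the dates; if the dates are in a column, set that column
# as the index first, otherwise every row is parsed as a 1970 timestamp.
df.index = pd.to_datetime(df.index)
df = df.sort_index()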

### Missing Value Handling

Check whether any monthly prices are missing.

In [None]:
df.isnull().sum()


Outlier detection

In [None]:
Q1 = df['avg_monthly_price'].quantile(0.25)
Q3 = df['avg_monthly_price'].quantile(0.75)
IQR = Q3 - Q1

Lower = Q1 - 1.5*IQR
Upper = Q3 + 1.5*IQR

# A value cannot be below Lower AND above Upper at once, so use OR (|)
Outliers = df[(df['avg_monthly_price'] < Lower) | (df['avg_monthly_price'] > Upper)]
Outliers.head(5)

Trend Visualization

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(df['avg_monthly_price'],label="Average Monthly Price" )
plt.title("Monthly Price Trend")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
plt.show()


## Seasonality Analysis

In [None]:
df['month'] = df.index.month



In [None]:
plt.figure(figsize=(10,4))
sns.boxplot(x='month', y ='avg_monthly_price', data=df)
plt.title("Monthly Price Boxplot")
plt.xlabel("Month")
plt.ylabel("Price")
plt.show()

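Stationarity Test

The Augmented Dickey-Fuller (ADF) test has the null hypothesis that the series has a unit root, i.e. that it is non-stationary.

Interpretation:

p-value > 0.05 → the null hypothesis cannot be rejected; treat the data as non-stationary

Differencing is then required

SARIMA handles differencing internally through its d and D terms, so the series does not have to be differenced by hand.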

In [None]:
result = adfuller(df['avg_monthly_price'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
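
What "differencing" does can be sketched with NumPy alone. The series below is synthetic (a linear trend plus a 12-month cycle) and only stands in for the price data; the names `seasonal_diff` and `both_diff` are illustrative. A seasonal difference corresponds to D=1 with s=12, and a first difference corresponds to d=1.

```python
import numpy as np

# Synthetic monthly series (illustrative only): linear trend + 12-month cycle
t = np.arange(48)
y = 100 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)

# Seasonal difference (D=1, s=12): y_t - y_{t-12} cancels the repeating yearly
# pattern and leaves the trend's 12-month increase (2 * 12 = 24)
seasonal_diff = y[12:] - y[:-12]

# First difference (d=1): z_t - z_{t-1} removes what is left of the trend
both_diff = np.diff(seasonal_diff)

print(seasonal_diff[:3])        # values very close to 24
print(np.abs(both_diff).max())  # very close to 0
```

After both differences nothing systematic is left in this toy series, which is the kind of stationary input the AR and MA terms are meant to model.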

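## Time-Based Split
Ordinary ML models are usually split at random:
X_train, X_test, y_train, y_test = train_test_split(X, y)

Why this works there:

Rows are independent
Order does not matter
Shuffling is allowed

Why this does NOT work for time series ❌

A shuffled split lets the model see future data while training
This causes data leakage
The measured forecast accuracy becomes unrealistically optimistic

In SARIMA / ARIMA, you do NOT manually create X. The model builds its inputs from the series itself:

X = past values of y (lags)
y = current value

So the last 12 months are held out as the test set and everything before them is used for training.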

In [None]:
train = df.iloc[:-12]

test = df.iloc[-12:]
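
The split above, and the idea that the lags of y play the role of X, can be sketched on a small stand-in series. Everything here is hypothetical: the 36-month `prices` series, the dates, and the names `train_s`, `test_s`, `no_leakage`, and `lags` are for illustration and are not part of the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 36-month series standing in for df['avg_monthly_price']
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
prices = pd.Series(np.arange(36, dtype=float), index=idx, name="avg_monthly_price")

# Chronological holdout: the last 12 months form the test set
train_s, test_s = prices.iloc[:-12], prices.iloc[-12:]

# Leakage check: every training timestamp must precede every test timestamp
no_leakage = train_s.index.max() < test_s.index.min()

# "X = past values of y, y = current value", written out as explicit lag columns
lags = pd.DataFrame({
    "y": prices,
    "lag_1": prices.shift(1),    # value one month earlier
    "lag_12": prices.shift(12),  # value one year earlier
})

print(len(train_s), len(test_s), no_leakage)
print(lags.dropna().head(3))
```

SARIMAX never needs the `lags` frame; it is shown only to make visible what the model derives internally from the ordered series.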

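## Creating and training the model
What do these parameters mean?

order=(1,1,1) → non-seasonal (p, d, q): one autoregressive lag, one first difference, one moving-average term; captures short-term dependency

seasonal_order=(1,1,1,12) → seasonal (P, D, Q, s): the same three kinds of terms applied at the seasonal lag; captures yearly seasonality

s = 12 → the season repeats every 12 periods, because the data is monthly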
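
For readers who prefer the equation, the (1,1,1) and (1,1,1,12) settings above can be written in the usual multiplicative SARIMA form with the backshift operator $B$, where $B\,y_t = y_{t-1}$. The symbols $\phi_1$, $\Phi_1$, $\theta_1$, $\Theta_1$ and $\varepsilon_t$ are notation introduced here, using one common sign convention; they do not appear in the notebook.

$$
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
$$

Reading left to right: $(1 - \phi_1 B)$ is the non-seasonal AR term (p=1), $(1 - \Phi_1 B^{12})$ the seasonal AR term (P=1), $(1 - B)$ the first difference (d=1), and $(1 - B^{12})$ the seasonal difference (D=1). On the right, $(1 + \theta_1 B)$ and $(1 + \Theta_1 B^{12})$ are the non-seasonal and seasonal MA terms (q=1, Q=1).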

In [None]:
model = SARIMAX(train['avg_monthly_price'],
    order=(1,1,1),
    seasonal_order=(1,1,1,12),
    enforce_stationarity=False,
    enforce_invertibility=False)

model_fit = model.fit()
model_fit.summary()  # call the method; without () only the method object is shown

## Forecast on test data
The test data is not passed to the model here. The model predicts the next 12 values from the last observed values and the patterns it has learned; the test data is used only to measure accuracy and validate the forecast.

Forecast_mean = Forecast.predicted_mean: extracts the point forecasts (the predicted average monthly prices) for the 12 steps.
Forecast_ci = Forecast.conf_int(): extracts the confidence interval for each of the 12 forecast steps (95% by default). Each interval gives a range within which the actual future value is expected to fall, indicating the uncertainty of the prediction.

In [None]:
Forecast = model_fit.get_forecast(steps = 12)
Forecast_mean = Forecast.predicted_mean
Forecast_ci = Forecast.conf_int()
Forecast_ci
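
To build intuition for what the interval columns contain, here is a minimal sketch of a normal-approximation interval: the point forecast plus or minus a multiple of its standard error. The means and standard errors are made-up numbers, and the formula is a simplified picture of how such intervals are commonly formed, not a reproduction of the library's internals.

```python
from statistics import NormalDist

# Hypothetical point forecasts and standard errors for three future months
means = [100.0, 102.0, 105.0]
std_errors = [2.0, 3.0, 4.5]

alpha = 0.05                             # 95% interval
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# Lower and upper bound for each step: mean -/+ z * standard error
intervals = [(m - z * se, m + z * se) for m, se in zip(means, std_errors)]
widths = [upper - lower for lower, upper in intervals]

for (lower, upper), width in zip(intervals, widths):
    print(f"[{lower:.2f}, {upper:.2f}]  width={width:.2f}")
```

The interval widens as the standard error grows, which is why bands typically fan out for forecasts further into the future.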

## Forecast vs Actual Visualization

In [None]:
plt.figure(figsize=(12,5))
plt.plot(test.index, test['avg_monthly_price'], label="Actual")
plt.plot(test.index, Forecast_mean, label="Forecast")
plt.fill_between(
    test.index,
    Forecast_ci.iloc[:,0],
    Forecast_ci.iloc[:,1],
    alpha=0.3
)
plt.title("Actual vs Forecast")
plt.legend()
plt.show()


## Model Evaluation

In [None]:
MAE = mean_absolute_error(test['avg_monthly_price'], Forecast_mean)
RMSE = np.sqrt(mean_squared_error(test['avg_monthly_price'], Forecast_mean))
mape = np.mean(np.abs((test['avg_monthly_price'] - Forecast_mean) / test['avg_monthly_price'])) * 100

print("Mean Absolute Error (MAE):", MAE)
print("Root Mean Squared Error (RMSE):", RMSE)
print("Mean Absolute Percentage Error (MAPE):", mape)
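
The three metrics are easy to verify by hand on a tiny example. The `actual` and `forecast` values below are invented for illustration and are chosen so the arithmetic can be followed in the comments.

```python
import numpy as np

# Made-up values, not taken from the price data
actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 380.0])

errors = actual - forecast                             # [-10, 10, 20]
mae = float(np.mean(np.abs(errors)))                   # (10 + 10 + 20) / 3
rmse = float(np.sqrt(np.mean(errors ** 2)))            # sqrt((100 + 100 + 400) / 3)
mape = float(np.mean(np.abs(errors / actual)) * 100)   # mean(10%, 5%, 5%)

print(round(mae, 2), round(rmse, 2), round(mape, 2))   # 13.33 14.14 6.67
```

RMSE exceeds MAE here because squaring gives the single 20-unit miss more weight. MAPE divides by the actual values, so it is undefined whenever an actual value is zero.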


| MAPE Range | Interpretation |
|------------|----------------|
| < 5% | Excellent (rare in real markets) |
| 5-10% | Very good |
| 10-15% | Good / Acceptable ✅ |
| 15-20% | Usable with caution |
| > 20% | Needs improvement |

Business insight:

Rising trend → hedge costs and adjust pricing (increase safety stock before price peaks; raise prices gradually instead of through sudden hikes)

Falling trend → delay procurement (shift to short-term contracts; run price-sensitive promotions)

Wide CI → higher risk (use scenario planning)