# Store Sales Forecasting using AutoARIMA & Prophet

## Problem Statement
Retail businesses require accurate sales forecasts to optimize inventory planning,
staffing, and revenue strategies. Poor forecasting can lead to stock shortages
or overstocking.

## Objective
- Download the dataset programmatically using kagglehub
- Aggregate daily sales into monthly totals
- Test the series for stationarity (ADF test)
- Apply AutoARIMA for automatic parameter selection
- Compare results with Prophet
- Evaluate models using MAE and RMSE


In [None]:
!pip install pmdarima  # install first; pmdarima is not preinstalled in many notebook environments

import os

import kagglehub
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf
from pmdarima import auto_arima
from prophet import Prophet

plt.rcParams["figure.figsize"] = (10, 5)

In [None]:
# Download dataset
path = kagglehub.dataset_download("rohitsahoo/sales-forecasting")
print("Dataset Path:", path)

print("Files inside dataset folder:")
print(os.listdir(path))

csv_files = [f for f in os.listdir(path) if f.endswith(".csv")]
print("CSV files:", csv_files)
csv_file_path = os.path.join(path, csv_files[0])

df = pd.read_csv(csv_file_path)
df.head()

In [None]:
df.info()

In [None]:
df.columns

In [None]:
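# Data cleaning: parse order dates (stored as day/month/year)
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%d/%m/%Y")

# Total sales per day, indexed by date
df = df.groupby("Order Date")["Sales"].sum().reset_index()
df.set_index("Order Date", inplace=True)

# Convert daily to monthly sales; "M" labels each bin with its month-end date
# (pandas >= 2.2 prefers the alias "ME" and emits a FutureWarning for "M")
df = df.resample("M").sum()

df.head(5)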
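To see what the daily-to-monthly step does, here is a minimal sketch on synthetic data (the dates and the constant daily value of 1.0 are invented for illustration). It uses the `"MS"` (month-start) alias, which both older and newer pandas versions accept, instead of the notebook's `"M"`; only the bin labels differ. It also shows that a month with no orders is summed to 0 rather than NaN, so a null check after resampling will not flag empty months.

```python
import pandas as pd

# Synthetic daily sales: 1.0 per day from Jan 1 to Mar 31, 2017
days = pd.date_range("2017-01-01", "2017-03-31", freq="D")
daily = pd.Series(1.0, index=days, name="Sales")

# Drop every February day to simulate a month with no orders
daily = daily[daily.index.month != 2]

# Sum each calendar month; "MS" labels every bin with the first day of the month
monthly = daily.resample("MS").sum()
print(monthly)

# Empty months are summed to 0.0 (not NaN), so isnull() does not flag them
print("Nulls after resampling:", monthly.isnull().sum())
```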

In [None]:
df.isnull().sum()

In [None]:
# Sales trend visualization
plt.figure()
plt.plot(df.index, df["Sales"])
plt.title("Monthly Store Sales")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.show()

In [None]:
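# Additive decomposition with a 12-month season: Sales = trend + seasonal + residual
# (store the result under a new name so the seasonal_decompose function is not overwritten)
decomposition = seasonal_decompose(df["Sales"], model="additive", period=12)
decomposition.plot()
plt.show()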

In [None]:
# Stationarity test using the Augmented Dickey-Fuller (ADF) test
result = adfuller(df["Sales"])
print("ADF Statistic:", result[0])
print("p-value:", result[1])

In [None]:
# H0 (null hypothesis): the series has a unit root, i.e. it is non-stationary
# H1 (alternative hypothesis): the series is stationary
alpha = 0.05
if result[1] < alpha:
    print("Reject H0: the data is stationary")
else:
    print("Fail to reject H0: the data is non-stationary")

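In the ADF test, the null hypothesis is that the series has a unit root (i.e. it is non-stationary). A p-value below 0.05 lets us reject that hypothesis, so the series can be treated as stationary and modelled with ARIMA without differencing. A p-value above 0.05 means the series should be differenced before an ARIMA model is fitted.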

In [None]:
# Train-test split: hold out the last 6 months as the test set (chronological, no shuffling)
train = df["Sales"].iloc[:-6]
test = df["Sales"].iloc[-6:]

print("Train size:", len(train))
print("Test size:", len(test))


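# AutoARIMA Model
AutoARIMA (`pmdarima.auto_arima`) handles non-stationarity itself: it selects the differencing order `d` with a unit-root test (KPSS by default) and, because `seasonal=True`, the seasonal differencing order `D` (OCSB test by default). It then searches over the non-seasonal (p, q) and seasonal (P, Q) orders, stepwise by default, and keeps the model with the lowest AIC.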
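To make the differencing orders concrete, the sketch below applies by hand the two transformations that `D = 1` (with `m = 12`) and `d = 1` correspond to, using pandas `diff`. The series is synthetic and noise-free (a trend of +2 per month plus a 12-month sine wave), chosen so the effect is exact: the seasonal difference `y_t - y_{t-12}` cancels the sine wave and leaves a constant 24 (twelve months of trend), and a further first difference leaves zeros. This only illustrates what the orders mean; it is not how pmdarima is implemented internally.

```python
import numpy as np
import pandas as pd

# Noise-free monthly series: linear trend (+2 per month) plus 12-month seasonality
t = np.arange(36)
y = pd.Series(2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12))

# Seasonal difference (D = 1, m = 12): y_t - y_{t-12} cancels the seasonal pattern
seasonal_diff = y.diff(12).dropna()

# Ordinary first difference (d = 1) on top removes the remaining constant level
both_diff = seasonal_diff.diff(1).dropna()

print(seasonal_diff.round(6).unique())  # the constant left by 12 months of trend
print(both_diff.abs().max())            # essentially zero (floating-point noise)
```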

In [None]:
model_arima = auto_arima(
    train,
    seasonal=True,
    m=12,
    trace=True,
    suppress_warnings=True
)

forecast_arima = model_arima.predict(n_periods=len(test))  # forecast horizon = length of the test set

mae_arima = mean_absolute_error(test, forecast_arima)
rmse_arima = np.sqrt(mean_squared_error(test, forecast_arima))

print("AutoARIMA MAE:", mae_arima)
print("AutoARIMA RMSE:", rmse_arima)
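For reference, MAE is the mean of the absolute forecast errors and RMSE is the square root of the mean squared error, so RMSE penalises large misses more heavily; RMSE is always at least as large as MAE, and a wide gap between them points to a few large errors. A minimal NumPy sketch of both (the function names are illustrative) that computes the same quantities as `mean_absolute_error` and `np.sqrt(mean_squared_error(...))` above, checked on a small hand-computed example:

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return np.mean(np.abs(errors))


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean(errors ** 2))


actual = [100, 200, 300]
predicted = [100, 210, 320]  # errors: 0, -10, -20

print(mae(actual, predicted))   # (0 + 10 + 20) / 3 = 10.0
print(rmse(actual, predicted))  # sqrt((0 + 100 + 400) / 3), about 12.91
```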

# Prophet Model

In [None]:
# Prophet expects a DataFrame with a date column named "ds" and a value column named "y"
prophet_df = df.reset_index()
prophet_df.columns = ["ds", "y"]

# Same training window as AutoARIMA: everything except the last 6 months
train_prophet = prophet_df.iloc[:-6]

model_prophet = Prophet()
model_prophet.fit(train_prophet)

future = model_prophet.make_future_dataframe(periods=6, freq="M")
forecast = model_prophet.predict(future)

forecast_prophet = forecast["yhat"].iloc[-6:].values  # the forecast covers history + future; keep only the last 6 months

mae_prophet = mean_absolute_error(test, forecast_prophet)
rmse_prophet = np.sqrt(mean_squared_error(test, forecast_prophet))

print("Prophet MAE:", mae_prophet)
print("Prophet RMSE:", rmse_prophet)
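Slicing the last six `yhat` values assumes that the dates produced by `make_future_dataframe` line up exactly with the test months. A slightly safer option is to align the forecast to the test index by date, so that a mismatch (for example month-start versus month-end labels) raises an error instead of silently comparing the wrong months. `align_forecast` is a hypothetical helper, shown with small made-up frames standing in for the notebook's `forecast` and `test` objects.

```python
import pandas as pd


def align_forecast(forecast_df, test_series):
    """Hypothetical helper: return Prophet's yhat for exactly the dates in test_series."""
    yhat = forecast_df.set_index("ds")["yhat"].reindex(test_series.index)
    if yhat.isna().any():
        missing = list(yhat.index[yhat.isna()])
        raise ValueError(f"No forecast for test dates: {missing}")
    return yhat.to_numpy()


# Made-up stand-ins: a forecast frame (history + future) and a month-end test series
forecast_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2018-09-30", "2018-10-31", "2018-11-30", "2018-12-31"]),
    "yhat": [10.0, 11.0, 12.0, 13.0],
})
test_demo = pd.Series([9.5, 12.5], index=pd.to_datetime(["2018-11-30", "2018-12-31"]))
print(align_forecast(forecast_demo, test_demo))  # [12. 13.]

# Month-start labels do not match the month-end forecast dates, so this raises
month_start_demo = pd.Series([9.5, 12.5], index=pd.to_datetime(["2018-11-01", "2018-12-01"]))
try:
    align_forecast(forecast_demo, month_start_demo)
except ValueError as err:
    print(err)
```

Alternatively, Prophet can predict directly on the test dates by passing a DataFrame with a `ds` column to `predict`, e.g. `model_prophet.predict(pd.DataFrame({"ds": test.index}))`.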


In [None]:
# Forecast comparison
plt.figure()
plt.plot(train.index, train, label='Train')
plt.plot(test.index, test, label='Actual')
plt.plot(test.index, forecast_arima, label='AutoARIMA Forecast')
plt.plot(test.index, forecast_prophet, label='Prophet Forecast')
plt.title('Forecast Comparison')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
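The two models' scores can also be gathered into a single table, which is easier to compare than separate print statements. A minimal sketch using the same scikit-learn metrics as the notebook; `compare_models` is a hypothetical helper, and the numbers below are invented placeholders, not results from this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error


def compare_models(actual, forecasts):
    """Hypothetical helper: MAE and RMSE per model, sorted so the lowest RMSE comes first."""
    rows = {}
    for name, predicted in forecasts.items():
        rows[name] = {
            "MAE": mean_absolute_error(actual, predicted),
            "RMSE": np.sqrt(mean_squared_error(actual, predicted)),
        }
    # Columns are models before transposing; .T gives one row per model
    return pd.DataFrame(rows).T.sort_values("RMSE")


# Placeholder values for illustration only
actual_demo = [100.0, 120.0, 90.0]
table = compare_models(
    actual_demo,
    {"Model A": [110.0, 115.0, 95.0], "Model B": [100.0, 100.0, 100.0]},
)
print(table)
```

In the notebook, the call would be `compare_models(test, {"AutoARIMA": forecast_arima, "Prophet": forecast_prophet})`.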


# Conclusion
* Both AutoARIMA and Prophet successfully captured the overall upward sales trend.
* However, neither model was able to fully capture the sharp volatility observed in the test period.
* Prophet demonstrated slightly better adaptability to trend changes, while AutoARIMA produced smoother and more conservative forecasts.
* The large deviations suggest that external factors such as promotions or holidays may need to be incorporated to improve forecasting accuracy.

