<a href="https://colab.research.google.com/github/azhgh22/Walmart-Recruiting-Store-Sales-Forecasting/blob/main/notebooks/02_linear_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview

In this notebook, we explore classic linear time series models — **ARIMA**, **SARIMA**, and **Prophet** — to model and forecast sales data.

We will apply each model **independently to each time series** (e.g., per Store/Department).

The primary goal here is to:

- Evaluate how well these classical models perform in this sales forecasting context.
- Compare their strengths and limitations.
- Use them as baseline references for future modeling approaches.

We will not focus on extensive hyperparameter tuning in this notebook. Instead, our efforts will be directed toward **data cleaning** and **feature engineering**, as these often have a great impact on model quality in time series tasks.



# Notebook Setup

The following setup is provided as a basic example for initializing the notebook environment. It includes necessary imports, optional configuration, and a placeholder for data loading or downloading.

This section is **not part of the core model logic**, and the code here may vary depending on your environment or data access method.

## Setup Environment


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
from google.colab import userdata
token = userdata.get('GITHUB_TOKEN')
user_name = userdata.get('GITHUB_USERNAME')
mail = userdata.get('GITHUB_MAIL')

!git config --global user.name "{user_name}"
!git config --global user.email "{mail}"
!git clone https://{token}@github.com/azhgh22/Walmart-Recruiting-Store-Sales-Forecasting.git

%cd Walmart-Recruiting-Store-Sales-Forecasting

Cloning into 'Walmart-Recruiting-Store-Sales-Forecasting'...
remote: Enumerating objects: 312, done.[K
remote: Counting objects: 100% (103/103), done.[K
remote: Compressing objects: 100% (84/84), done.[K
remote: Total 312 (delta 50), reused 46 (delta 19), pack-reused 209 (from 1)[K
Receiving objects: 100% (312/312), 6.81 MiB | 14.32 MiB/s, done.
Resolving deltas: 100% (147/147), done.
/content/Walmart-Recruiting-Store-Sales-Forecasting


In [3]:
!pip install -r requirements.txt

Collecting onnx (from -r requirements.txt (line 3))
  Downloading onnx-1.18.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting dagshub (from -r requirements.txt (line 8))
  Downloading dagshub-0.5.10-py3-none-any.whl.metadata (12 kB)
Collecting mlflow (from -r requirements.txt (line 9))
  Downloading mlflow-3.1.1-py3-none-any.whl.metadata (29 kB)
Collecting neuralforecast (from -r requirements.txt (line 10))
  Downloading neuralforecast-3.0.2-py3-none-any.whl.metadata (14 kB)
Collecting appdirs>=1.4.4 (from dagshub->-r requirements.txt (line 8))
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting dacite~=1.6.0 (from dagshub->-r requirements.txt (line 8))
  Downloading dacite-1.6.0-py3-none-any.whl.metadata (14 kB)
Collecting gql[requests] (from dagshub->-r requirements.txt (line 8))
  Downloading gql-3.5.3-py2.py3-none-any.whl.metadata (9.4 kB)
Collecting dataclasses-json (from dagshub->-r requirements.txt (line 8))
  Dow

In [4]:
from google.colab import userdata
kaggle_json_path = userdata.get('KAGGLE_JSON_PATH')
! ./src/data_loader.sh -f {kaggle_json_path}

Setting up Kaggle credentials...
Ensuring data directory exists at 'data/'...
Downloading data from Kaggle for competition: 'walmart-recruiting-store-sales-forecasting'...
Downloading walmart-recruiting-store-sales-forecasting.zip to data
  0% 0.00/2.70M [00:00<?, ?B/s]
100% 2.70M/2.70M [00:00<00:00, 631MB/s]
Unzipping files...
Archive:  walmart-recruiting-store-sales-forecasting.zip
  inflating: features.csv.zip        
  inflating: sampleSubmission.csv.zip  
  inflating: stores.csv              
  inflating: test.csv.zip            
  inflating: train.csv.zip           
Archive:  features.csv.zip
  inflating: features.csv            
Archive:  sampleSubmission.csv.zip
  inflating: sampleSubmission.csv    
Archive:  test.csv.zip
  inflating: test.csv                
Archive:  train.csv.zip
  inflating: train.csv               
Data downloaded and extracted successfully to 'data/'.


In [5]:
from sklearn import set_config
set_config(transform_output="pandas")

In [6]:
!wandb login

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mzhorzholianimate[0m ([33mMLBeasts[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


## Load and Split Data

In [55]:
from src import data_loader, processing
import importlib
importlib.reload(processing)

dataframes = data_loader.load_raw_data()
df = processing.run_preprocessing(dataframes, process_test=False)['train']
X_train, y_train, X_valid, y_valid = processing.split_data(df, separate_target=True)

print(f"Shapes of X_train and y_train: {X_train.shape}, {y_train.shape}")
print(f"Shapes of X_valid and y_valid: {X_valid.shape}, {y_valid.shape}")

Data loading complete.
Shapes of X_train and y_train: (279085, 15), (279085,)
Shapes of X_valid and y_valid: (142485, 15), (142485,)


In [56]:
X_train

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,Type,Size
0,1,1,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
1,1,2,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
2,1,3,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
3,1,4,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
4,1,5,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221
279081,45,94,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221
279082,45,95,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221
279083,45,97,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221


In [57]:
X_valid

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,Type,Size
279085,1,1,2011-12-02,False,48.91,3.172,5629.51,68.00,1398.11,2084.64,20475.32,218.714733,7.866,A,151315
279086,1,2,2011-12-02,False,48.91,3.172,5629.51,68.00,1398.11,2084.64,20475.32,218.714733,7.866,A,151315
279087,1,3,2011-12-02,False,48.91,3.172,5629.51,68.00,1398.11,2084.64,20475.32,218.714733,7.866,A,151315
279088,1,4,2011-12-02,False,48.91,3.172,5629.51,68.00,1398.11,2084.64,20475.32,218.714733,7.866,A,151315
279089,1,5,2011-12-02,False,48.91,3.172,5629.51,68.00,1398.11,2084.64,20475.32,218.714733,7.866,A,151315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
421565,45,93,2012-10-26,False,58.85,3.882,4018.91,58.08,100.00,211.94,858.33,192.308899,8.667,B,118221
421566,45,94,2012-10-26,False,58.85,3.882,4018.91,58.08,100.00,211.94,858.33,192.308899,8.667,B,118221
421567,45,95,2012-10-26,False,58.85,3.882,4018.91,58.08,100.00,211.94,858.33,192.308899,8.667,B,118221
421568,45,97,2012-10-26,False,58.85,3.882,4018.91,58.08,100.00,211.94,858.33,192.308899,8.667,B,118221


# Data Cleaning and Feature Engineering


In this section, we apply several preprocessing steps to prepare the data for modeling, particularly linear models. Since training linear models on the full dataset is computationally expensive, we use a subset of selected (Store, Dept) combinations that still reflect overall patterns.

The feature engineering steps include:

- **Subset selection**: We use only a portion of the data (specific store-department combinations) to reduce training time while preserving general trends.

- **Dropping uninformative features**: `MarkDown` columns appear to carry little useful information and are dropped to simplify the feature space.

- **Encoding categorical variables**: Linear models require numeric inputs, so we apply one-hot encoding to categorical features such as `IsHoliday` and `Type`.

- **Adding time-based features**: We extract features like `Year`, `Month`, `Week`, Feurier Features from the `Date` column. These are mainly intended for time series models like SARIMAX and are not used in other models.


Use only a portion of the data (30/50/70 store-department combinations).

In [58]:
import pandas as pd

X_train_ = X_train.reset_index(drop=True)
X_valid_ = X_valid.reset_index(drop=True)
y_train_ = y_train.reset_index(drop=True)
y_valid_ = y_valid.reset_index(drop=True)

subset_keys = X_train_[['Store', 'Dept']].drop_duplicates().sample(n=100, random_state=42)

X_train = pd.merge(X_train_, subset_keys, on=['Store', 'Dept'], how='inner')
X_valid = pd.merge(X_valid_, subset_keys, on=['Store', 'Dept'], how='inner')

y_train = y_train_.iloc[X_train.index]
y_valid = y_valid_.iloc[X_valid.index]


X_train

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,Type,Size
0,1,1,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
1,1,2,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
2,1,3,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
3,1,4,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
4,1,5,2010-02-05,False,42.31,2.572,,,,,,211.096358,8.106,A,151315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221
279081,45,94,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221
279082,45,95,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221
279083,45,97,2011-11-25,True,48.71,3.492,140.87,384.82,26961.99,28.59,1110.12,188.350400,8.523,B,118221


We drop the `Markdown` columns from the dataset, as they appear to provide little value in this context. These columns contain a large number of missing values, making it difficult to extract meaningful signals or engineer reliable features.


In [59]:
from feature_engineering import feature_transformers


columns_to_drop=['MarkDown1', 'MarkDown2', 'MarkDown3', 'MarkDown4', 'MarkDown5']
drop_markdowns = feature_transformers.ChangeColumns(columns_to_drop=columns_to_drop)
X_train_t = drop_markdowns.fit_transform(X_train)
X_valid_t = drop_markdowns.transform(X_valid)

X_train_t

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,CPI,Unemployment,Type,Size
0,1,1,2010-02-05,False,42.31,2.572,211.096358,8.106,A,151315
1,1,2,2010-02-05,False,42.31,2.572,211.096358,8.106,A,151315
2,1,3,2010-02-05,False,42.31,2.572,211.096358,8.106,A,151315
3,1,4,2010-02-05,False,42.31,2.572,211.096358,8.106,A,151315
4,1,5,2010-02-05,False,42.31,2.572,211.096358,8.106,A,151315
...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,True,48.71,3.492,188.350400,8.523,B,118221
279081,45,94,2011-11-25,True,48.71,3.492,188.350400,8.523,B,118221
279082,45,95,2011-11-25,True,48.71,3.492,188.350400,8.523,B,118221
279083,45,97,2011-11-25,True,48.71,3.492,188.350400,8.523,B,118221


Next, we convert the `IsHoliday` column from boolean (`True`/`False`) to integer format (`1`/`0`).

In [60]:
from sklearn.preprocessing import FunctionTransformer

bool_to_int = FunctionTransformer(
    lambda df: df.assign(
        **{col: df[col].astype(int) for col in df.select_dtypes(include='bool').columns}
    )
)

print("Value counts in IsHoliday column before transformation:", X_train_t['IsHoliday'].value_counts())

X_train_t = bool_to_int.fit_transform(X_train_t)
X_valid_t = bool_to_int.transform(X_valid_t)

print("Values counts in IsHoliday column after transformation", X_train_t['IsHoliday'].value_counts())


X_train_t

Value counts in IsHoliday column before transformation: IsHoliday
False    258394
True      20691
Name: count, dtype: int64
Values counts in IsHoliday column after transformation IsHoliday
0    258394
1     20691
Name: count, dtype: int64


Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,CPI,Unemployment,Type,Size
0,1,1,2010-02-05,0,42.31,2.572,211.096358,8.106,A,151315
1,1,2,2010-02-05,0,42.31,2.572,211.096358,8.106,A,151315
2,1,3,2010-02-05,0,42.31,2.572,211.096358,8.106,A,151315
3,1,4,2010-02-05,0,42.31,2.572,211.096358,8.106,A,151315
4,1,5,2010-02-05,0,42.31,2.572,211.096358,8.106,A,151315
...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,1,48.71,3.492,188.350400,8.523,B,118221
279081,45,94,2011-11-25,1,48.71,3.492,188.350400,8.523,B,118221
279082,45,95,2011-11-25,1,48.71,3.492,188.350400,8.523,B,118221
279083,45,97,2011-11-25,1,48.71,3.492,188.350400,8.523,B,118221


The `Type` column has only three distinct categories. We apply one-hot encoding to convert these categorical values into binary indicator columns. Since the number of categories is small, this transformation will not significantly increase the dataset size.

In [61]:
set(X_train_t['Type'].values)

{'A', 'B', 'C'}

In [62]:
from feature_engineering.encoders import CustomOneHotEncoder

one_hot_encoder = CustomOneHotEncoder(columns=['Type'])
X_train_t = one_hot_encoder.fit_transform(X_train_t)
X_valid_t = one_hot_encoder.transform(X_valid_t)


X_train_t

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,CPI,Unemployment,Size,Type_A,Type_B,Type_C
0,1,1,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,0,0
1,1,2,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,0,0
2,1,3,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,0,0
3,1,4,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,0,0
4,1,5,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,1,0
279081,45,94,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,1,0
279082,45,95,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,1,0
279083,45,97,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,1,0


To help the models capture temporal patterns, we add time-related features such as:

- Week of the year  
- Month  
- Year  

Additionally, we adjust the DataFrame’s index to be based on the date column, which facilitates time series modeling and resampling Also, we will add Fourier Features and Holiday Proximity.


In [63]:
from feature_engineering import time_features

params = {
    'add_week_num' : True,
    'add_holiday_flags' : False,
    'add_holiday_proximity': True,
    'add_holiday_windows': False,
    'add_fourier_features': True,
    'add_month_and_year': True,
    'replace_time_index': False,
    'list_of_holiday_proximity': [],
}

feature_adder = time_features.FeatureAdder(**params)
X_train_t = feature_adder.fit_transform(X_train_t)
X_valid_t = feature_adder.transform(X_valid_t)

X_train_t

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,CPI,Unemployment,Size,Type_A,...,Type_C,Month,Year,WeekOfYear,Days_until_next_holiday,Days_since_last_holiday,week_sin,week_cos,month_sin,month_cos
0,1,1,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
1,1,2,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
2,1,3,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
3,1,4,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
4,1,5,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025
279081,45,94,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025
279082,45,95,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025
279083,45,97,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025


Now we implement the preprocessing pipeline with all the steps described above.


In [64]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from feature_engineering import feature_transformers, time_features
from feature_engineering.encoders import CustomOneHotEncoder

import importlib
importlib.reload(feature_transformers)

columns_to_drop = [
    'MarkDown1', 'MarkDown2', 'MarkDown3',
    'MarkDown4', 'MarkDown5'
]

time_feature_params = {
    'add_week_num': True,
    'add_holiday_flags': False,
    'add_holiday_proximity': True,
    'add_holiday_windows': False,
    'add_fourier_features': True,
    'add_month_and_year': True,
    'replace_time_index': False,
    'list_of_holiday_proximity': [],
}

preprocess = Pipeline(steps=[
    ('drop_markdown', feature_transformers.ChangeColumns(columns_to_drop=columns_to_drop)),
    ('bool_to_int', feature_transformers.BoolToInt()),
    ('type_encoding', CustomOneHotEncoder(columns=['Type'])),
    ('add_time_features', time_features.FeatureAdder(**time_feature_params)),
])

X_train_t = preprocess.fit_transform(X_train)
X_valid_t = preprocess.transform(X_valid)

X_train_t

Unnamed: 0,Store,Dept,Date,IsHoliday,Temperature,Fuel_Price,CPI,Unemployment,Size,Type_A,...,Type_C,Month,Year,WeekOfYear,Days_until_next_holiday,Days_since_last_holiday,week_sin,week_cos,month_sin,month_cos
0,1,1,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
1,1,2,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
2,1,3,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
3,1,4,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
4,1,5,2010-02-05,0,42.31,2.572,211.096358,8.106,151315,1,...,0,2,2010,5,7,999.0,0.568065,0.822984,0.866025,0.500000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279080,45,93,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025
279081,45,94,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025
279082,45,95,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025
279083,45,97,2011-11-25,1,48.71,3.492,188.350400,8.523,118221,0,...,0,11,2011,47,0,77.0,-0.568065,0.822984,-0.500000,0.866025


# Training

## Arima

We begin by training an **ARIMA** model as a baseline. This model is one of the simplest time series forecasting approaches and relies **only on the temporal structure** of the data — specifically the `Date` feature.

In [65]:
import pandas as pd
from sklearn.metrics import mean_absolute_error
from models import store_dept_sarimax
from src.utils import wmae as compute_wmae

import importlib
importlib.reload(store_dept_sarimax)

X_train_T = X_train_t[['Date', 'Store', 'Dept']]
X_valid_T = X_valid_t[['Date', 'Store', 'Dept']]

arima_model1 = store_dept_sarimax.StoreDeptSARIMAX(
    order=(1, 1, 1),
    use_all_exog=False,
)

arima_model1.fit(X_train_T, y_train)

train_preds = arima_model1.predict(X_train_T)
valid_preds = arima_model1.predict(X_valid_T)

train_preds = arima_model1.predict(X_train_T).fillna(0)
valid_preds = arima_model1.predict(X_valid_T).fillna(0)

train_wmae = compute_wmae(y_train, train_preds, is_holiday=X_train_t['IsHoliday'])
valid_wmae = compute_wmae(y_valid, valid_preds, is_holiday=X_valid_t['IsHoliday'])
train_mae = mean_absolute_error(y_train, train_preds)
valid_mae = mean_absolute_error(y_valid, valid_preds)

print(f"Train WMAE: {train_wmae:.2f}, MAE: {train_mae:.2f}")
print(f"Valid WMAE: {valid_wmae:.2f}, MAE: {valid_mae:.2f}")

Train WMAE: 2660.10, MAE: 2101.49
Valid WMAE: 4195.88, MAE: 4267.16


In [66]:
from configs import basic_config
from src import utils

import importlib
importlib.reload(basic_config)
importlib.reload(utils)

from sklearn.pipeline import Pipeline
from configs.basic_config import config as cfg
from configs.linear_model_configs import arima_config
from src.utils import log_to_wandb


pipeline = Pipeline([
    ('preprocess', preprocess),
    ('model', arima_model1)
])


cur_config = (cfg.copy())
cur_config['replace_time_index'] = False
cur_config['time_features'].remove('HolidayFlags')
merged_config = {**cur_config, **arima_config}


log_to_wandb(
    model=pipeline,
    train_score=train_wmae,
    val_score=valid_wmae,
    config=merged_config,
    run_name='arima_02',
    artifact_name="arima"
)

0,1
train_wmae,▁
val_wmae,▁

0,1
train_wmae,2660.10467
val_wmae,4195.88199


The ARIMA-based model we experimented with achieved a **WMAE of approximately 4,195**. While this is still relatively high, it demonstrates that there is **some predictive signal in the time series alone**, even without incorporating additional features. This version of the model serves as a reasonable baseline for evaluating more advanced approaches.

## Sarima

Next, we train a **SARIMA** model, which extends ARIMA by incorporating **seasonality**. Since our data is weekly, we set the **seasonal period to 52**, corresponding to the number of weeks in a year.


In [40]:
import pandas as pd
from sklearn.metrics import mean_absolute_error
from models import store_dept_sarimax
from src.utils import wmae as compute_wmae

import importlib
importlib.reload(store_dept_sarimax)

X_train_T = X_train_t[['Date', 'Store', 'Dept']]
X_valid_T = X_valid_t[['Date', 'Store', 'Dept']]

sarima_model = store_dept_sarimax.StoreDeptSARIMAX(
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 52),
    use_all_exog=False
)

sarima_model.fit(X_train_T, y_train)

train_preds = sarima_model.predict(X_train_T).fillna(0)
valid_preds = sarima_model.predict(X_valid_T).fillna(0)

train_wmae = compute_wmae(y_train, train_preds, is_holiday=X_train_t['IsHoliday'])
valid_wmae = compute_wmae(y_valid, valid_preds, is_holiday=X_valid_t['IsHoliday'])
train_mae = mean_absolute_error(y_train, train_preds)
valid_mae = mean_absolute_error(y_valid, valid_preds)

print(f"Train WMAE: {train_wmae:.2f}, MAE: {train_mae:.2f}")
print(f"Valid WMAE: {valid_wmae:.2f}, MAE: {valid_mae:.2f}")

Train WMAE: 21910.49, MAE: 21684.93
Valid WMAE: 22466.55, MAE: 22748.92


In [41]:
from configs import basic_config
from configs import linear_model_configs

import importlib
importlib.reload(basic_config)
importlib.reload(linear_model_configs)

from sklearn.pipeline import Pipeline
from configs.basic_config import config as cfg
from configs.linear_model_configs import sarima_config
from src.utils import log_to_wandb


pipeline = Pipeline([
    ('preprocess', preprocess),
    ('model', sarima_model)
])

cur_config = (cfg.copy())
cur_config['replace_time_index'] = False
cur_config['time_features'].remove('HolidayFlags')
merged_config = {**cur_config, **sarima_config}

log_to_wandb(
    model=pipeline,
    train_score=train_wmae,
    val_score=valid_wmae,
    config=merged_config,
    run_name='sarima_01',
    artifact_name="sarima"
)

[34m[1mwandb[0m: Currently logged in as: [33mzhorzholianimate[0m ([33mMLBeasts[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


0,1
train_wmae,▁
val_wmae,▁

0,1
train_wmae,21910.49295
val_wmae,22466.54836


The SARIMA model, using a seasonal period of 52 weeks, achieved a **WMAE of approximately 2,246** on the validation set. This is a significant improvement over the earlier ARIMA models. While this result is promising, it's important to note that the model was trained and evaluated on a **subset of the full dataset**. As a result, the performance may not fully generalize to all (Store, Dept) combinations or the complete data distribution.

## SarimaX

To explore the impact of incorporating additional features, we now train a **SARIMAX** model — a generalization of SARIMA that allows for the inclusion of exogenous (external) variables.

In [22]:
import pandas as pd
from sklearn.metrics import mean_absolute_error
from models import store_dept_sarimax
from src.utils import wmae as compute_wmae


sarimax_model = store_dept_sarimax.StoreDeptSARIMAX(
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 52),
    use_all_exog=True
)

sarimax_model.fit(X_train_t, y_train)

train_preds = sarimax_model.predict(X_train_t).fillna(0)
valid_preds = sarimax_model.predict(X_valid_t).fillna(0)

train_wmae = compute_wmae(y_train, train_preds, is_holiday=X_train_t['IsHoliday'])
valid_wmae = compute_wmae(y_valid, valid_preds, is_holiday=X_valid_t['IsHoliday'])
train_mae = mean_absolute_error(y_train, train_preds)
valid_mae = mean_absolute_error(y_valid, valid_preds)

print(f"Train WMAE: {train_wmae:.2f}, MAE: {train_mae:.2f}")
print(f"Valid WMAE: {valid_wmae:.2f}, MAE: {valid_mae:.2f}")

Train WMAE: 578148.08, MAE: 543193.36
Valid WMAE: 308131.33, MAE: 308497.52


In [23]:
from configs import basic_config
from configs import linear_model_configs

import importlib
importlib.reload(basic_config)
importlib.reload(linear_model_configs)

from sklearn.pipeline import Pipeline
from configs.basic_config import config as cfg
from configs.linear_model_configs import sarimax_config
from src.utils import log_to_wandb


pipeline = Pipeline([
    ('preprocess', preprocess),
    ('model', sarimax_model)
])

cur_config = (cfg.copy())
cur_config['replace_time_index'] = False
cur_config['time_features'].remove('HolidayFlags')
merged_config = {**cur_config, **sarimax_config}

log_to_wandb(
    model=pipeline,
    train_score=train_wmae,
    val_score=valid_wmae,
    config=merged_config,
    run_name='sarimax_01',
    artifact_name="sarima"
)

[34m[1mwandb[0m: Currently logged in as: [33mzhorzholianimate[0m ([33mMLBeasts[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


0,1
train_wmae,▁
val_wmae,▁

0,1
train_wmae,578148.07771
val_wmae,308131.32731


The SARIMAX model, despite incorporating additional features, **performed worse** than the SARIMA model on the validation set. This result does **not necessarily mean** that the added variables are uninformative. Rather, it may indicate that: the additional features do **not have a strong linear or additive relationship** with the target variable within the SARIMAX framework.

#  Conclusion

In this notebook, we explored a series of time series forecasting models, gradually increasing in complexity to understand their performance and limitations on sales data:

- **ARIMA** served as a baseline model, using only past values of the target and ignoring all other features. It produced a very high error (~4,000 WMAE), showing that without external context or seasonality, the model struggles to capture sales dynamics.

- **SARIMA** introduced weekly seasonality (52-week cycles), which led to a substantial improvement (~2,246 WMAE). This suggests that sales patterns exhibit strong annual seasonality, and modeling this explicitly provides real benefits.

- **SARIMAX**, which included external variables (e.g., `Type`, `IsHoliday`, and time features), surprisingly underperformed. This does **not mean the added features are irrelevant**, but rather that:
  - The SARIMAX model may be too rigid to capture complex interactions.
  - The additional features might introduce noise or collinearity when used improperly.
  - Limited training data (due to working with subsets) may have amplified overfitting when more predictors were added.

## Key Takeaways

- **Seasonality matters**: Modeling temporal cycles with SARIMA gave a clear performance boost.
- **Feature integration is nontrivial**: Simply adding external variables does not guarantee better performance; the modeling technique must be capable of learning from them.
