<a href="https://colab.research.google.com/github/abarb2022/Walmart-Recruiting---Store-Sales-Forecasting/blob/main/model_experiment_arima.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Downloading Kaggle data sets directly into Colab**

Install the kaggle python library

In [None]:
! pip install kaggle



Mount the Google drive so you can store your kaggle API credentials for future use

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Make a directory for kaggle at the temporary instance location on Colab drive.

Download your kaggle API key (.json file). You can do this by going to your kaggle account page and clicking 'Create new API token' under the API section.

In [5]:
! mkdir ~/.kaggle

Upload the json file to Google Drive and then copy to the temporary location.

In [6]:
!cp /content/drive/MyDrive/ColabNotebooks/kaggle_API_credentials/kaggle.json ~/.kaggle/kaggle.json

Change the file permissions to read/write to the owner only

In [7]:
! chmod 600 ~/.kaggle/kaggle.json

**Competitions and Datasets are the two types of Kaggle data**

**1. Download competition data**

If you get 403 Forbidden error, you need to click 'Late Submission' on the Kaggle page for that competition.

In [8]:
! kaggle competitions download -c walmart-recruiting-store-sales-forecasting

Downloading walmart-recruiting-store-sales-forecasting.zip to /content
  0% 0.00/2.70M [00:00<?, ?B/s]
100% 2.70M/2.70M [00:00<00:00, 1.12GB/s]


Unzip, in case the downloaded file is zipped. Refresh the files on the left hand side to update the view.

In [9]:
! unzip walmart-recruiting-store-sales-forecasting

Archive:  walmart-recruiting-store-sales-forecasting.zip
  inflating: features.csv.zip        
  inflating: sampleSubmission.csv.zip  
  inflating: stores.csv              
  inflating: test.csv.zip            
  inflating: train.csv.zip           


In [10]:
import pandas as pd
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder # For Type encoding if not using category dtype directly
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt
import seaborn as sns
import gc # For garbage collection
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.expand_frame_repr', False)

In [11]:
stores = pd.read_csv('stores.csv')
train = pd.read_csv("train.csv.zip")
features = pd.read_csv('features.csv.zip')
sample = pd.read_csv('sampleSubmission.csv.zip')
test = pd.read_csv('test.csv.zip')

In [12]:
# Convert 'Date' columns to datetime objects for easier manipulation
train['Date'] = pd.to_datetime(train['Date'])
test['Date'] = pd.to_datetime(test['Date'])
features['Date'] = pd.to_datetime(features['Date'])

# Merge features with train and test data.
# Note: 'IsHoliday' is present in both train/test and features.csv.
# We'll merge on it to ensure consistency, but if there were discrepancies,
# we'd need a more careful merge strategy.
train_df = pd.merge(train, features, on=['Store', 'Date', 'IsHoliday'], how='left')
test_df = pd.merge(test, features, on=['Store', 'Date', 'IsHoliday'], how='left')

# Merge store information
train_df = pd.merge(train_df, stores, on='Store', how='left')
test_df = pd.merge(test_df, stores, on='Store', how='left')

print("\n--- Merged Train Data Head ---")
print(train_df.head())
print("\n--- Merged Test Data Head ---")
print(test_df.head())

print("\n--- Merged Train Data Info ---")
print(train_df.info())
print("\n--- Merged Test Data Info ---")
print(test_df.info())

# Free up memory
del train, test, features, stores
gc.collect()


--- Merged Train Data Head ---
   Store  Dept       Date  Weekly_Sales  IsHoliday  Temperature  Fuel_Price  MarkDown1  MarkDown2  MarkDown3  MarkDown4  MarkDown5         CPI  Unemployment Type    Size
0      1     1 2010-02-05      24924.50      False        42.31       2.572        NaN        NaN        NaN        NaN        NaN  211.096358         8.106    A  151315
1      1     1 2010-02-12      46039.49       True        38.51       2.548        NaN        NaN        NaN        NaN        NaN  211.242170         8.106    A  151315
2      1     1 2010-02-19      41595.55      False        39.93       2.514        NaN        NaN        NaN        NaN        NaN  211.289143         8.106    A  151315
3      1     1 2010-02-26      19403.54      False        46.63       2.561        NaN        NaN        NaN        NaN        NaN  211.319643         8.106    A  151315
4      1     1 2010-03-05      21827.90      False        46.50       2.625        NaN        NaN        NaN        Na

0

## **DATA CLEANING**


In [18]:
class MissingValueImputer(BaseEstimator, TransformerMixin):
    """
    Custom Transformer to handle missing values for specific columns.
    - MarkDown columns: fill with 0.
    - Other specified numerical columns: fill with ffill then bfill, fallback to mean.
    """
    def __init__(self, markdown_cols=None, numerical_cols_to_impute=None):
        self.markdown_cols = markdown_cols if markdown_cols is not None else [f'MarkDown{i}' for i in range(1, 6)]
        self.numerical_cols_to_impute = numerical_cols_to_impute if numerical_cols_to_impute is not None else ['Temperature', 'Fuel_Price', 'CPI', 'Unemployment']
        self.means = {} # To store means for fallback imputation during transform

    def fit(self, X, y=None):
        # Calculate means for fallback imputation from the training data
        for col in self.numerical_cols_to_impute:
            if col in X.columns:
                self.means[col] = X[col].mean()
        return self

    def transform(self, X):
        X_copy = X.copy()


        for col in self.markdown_cols:
          if col in X_copy.columns:
            X_copy[f"{col}_was_missing"] = X_copy[col].isna().astype(int)
            X_copy[col] = X_copy[col].fillna(0)


        # Impute other numerical columns with ffill then bfill, fallback to mean
        for col in self.numerical_cols_to_impute:
            if col in X_copy.columns:
                X_copy[col] = X_copy[col].ffill().bfill()
                # Fallback to mean if NaNs still exist (e.g., if all values were NaN in a column)
                if X_copy[col].isnull().any() and col in self.means:
                    X_copy[col] = X_copy[col].fillna(self.means[col])
        return X_copy

In [14]:
class DateFeatureExtractor(BaseEstimator, TransformerMixin):
    """
    Custom Transformer to extract temporal features from the 'Date' column.
    """
    def __init__(self, date_column='Date'):
        self.date_column = date_column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_copy = X.copy()
        if self.date_column not in X_copy.columns:
            raise ValueError(f"Date column '{self.date_column}' not found in DataFrame.")

        X_copy[self.date_column] = pd.to_datetime(X_copy[self.date_column])

        X_copy['Year'] = X_copy[self.date_column].dt.year
        X_copy['Month'] = X_copy[self.date_column].dt.month
        X_copy['Month_sin'] = np.sin(2 * np.pi * X_copy['Month'] / 12)
        X_copy['Month_cos'] = np.cos(2 * np.pi * X_copy['Month'] / 12)

        # Using .dt.isocalendar().week for consistent week numbering across years
        X_copy['Week'] = X_copy[self.date_column].dt.isocalendar().week.astype(int)
        X_copy['Day'] = X_copy[self.date_column].dt.day
        X_copy['DayOfWeek'] = X_copy[self.date_column].dt.dayofweek

        X_copy['Week_sin'] = np.sin(2 * np.pi * X_copy['Week'] / 52)
        X_copy['Week_cos'] = np.cos(2 * np.pi * X_copy['Week'] / 52)

        # Markdown aggregation
        X_copy['Total_MarkDown'] = X_copy[[f'MarkDown{i}' for i in range(1, 6)]].sum(axis=1)
        X_copy['MarkDown_Intensity'] = X_copy['Total_MarkDown'] / (X_copy['Total_MarkDown'].mean() + 1)

        # Economic indicators
        X_copy['Fuel_CPI_Ratio'] = X_copy['Fuel_Price'] / X_copy['CPI']
        X_copy['Economic_Index'] = (X_copy['CPI'] * 0.4 + (100 - X_copy['Unemployment']) * 0.6) / 100


        # Convert IsHoliday to integer if it exists and is boolean
        if 'IsHoliday' in X_copy.columns and X_copy['IsHoliday'].dtype == bool:
            X_copy['IsHoliday'] = X_copy['IsHoliday'].astype(int)

        # Keep the 'Date' column for ARIMA
        return X_copy # Removed .drop(columns=[self.date_column, "Month", "Week"])

In [15]:
y_train = train_df['Weekly_Sales']
X_train = train_df.drop(columns=['Weekly_Sales', 'Id'], errors='ignore')

temp_train_df = X_train.copy()
temp_train_df['Date'] = pd.to_datetime(train_df['Date']) # Get original dates back for sorting
temp_train_df['Weekly_Sales'] = y_train

temp_train_df = temp_train_df.sort_values(by='Date').reset_index(drop=True)

# Define a cutoff date for validation
validation_cutoff_date = pd.to_datetime('2012-09-01')

X_train_split = temp_train_df[temp_train_df['Date'] < validation_cutoff_date]
y_train_split = temp_train_df[temp_train_df['Date'] < validation_cutoff_date]['Weekly_Sales']

X_val_split = temp_train_df[temp_train_df['Date'] >= validation_cutoff_date]
y_val_split = temp_train_df[temp_train_df['Date'] >= validation_cutoff_date]['Weekly_Sales']

def weighted_mean_absolute_error(y_true, y_pred, weights):
    return np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights)

val_weights = np.where(X_val_split['IsHoliday'] == 1, 5, 1)


In [21]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
import numpy as np
import warnings
from tqdm import tqdm
from statsmodels.tools.sm_exceptions import ConvergenceWarning, ValueWarning
from joblib import Parallel, delayed


class ARIMAModelWrapper(BaseEstimator, TransformerMixin):
    def __init__(self, order, seasonal_order=(0,0,0,0), verbose=True):
        self.order = order
        self.seasonal_order = seasonal_order
        self.verbose = verbose
        self.models = {}
        self.last_values = {}
        self.global_average = None  # <- NEW


    def fit(self, X, y=None):
        # Suppress all statsmodels warnings
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=ConvergenceWarning)
            warnings.simplefilter("ignore", category=UserWarning)
            warnings.simplefilter("ignore", category=ValueWarning)

            grouped = X.groupby(['Store', 'Dept'])

            # Create progress bar if verbose
            if self.verbose:
                groups = tqdm(grouped, desc="Training ARIMA models", unit="store-dept")
            else:
                groups = grouped

            if 'Weekly_Sales' in X.columns:
                self.global_average = X['Weekly_Sales'].mean()


            for (store, dept), group in groups:
                ts_data = group.set_index('Date')['Weekly_Sales']

                # Force weekly frequency to prevent warnings
                ts_data = ts_data.asfreq('W-FRI')

                if len(ts_data.dropna()) < 3:
                  if self.verbose:
                      print(f"Skipping Store {store}, Dept {dept} due to insufficient data")
                  continue


                if len(ts_data) > 0:
                    try:
                        with warnings.catch_warnings():
                            warnings.simplefilter("ignore")
                            model = ARIMA(ts_data,
                                         order=self.order,
                                         seasonal_order=self.seasonal_order)
                            fitted_model = model.fit()
                            self.models[(store, dept)] = fitted_model
                            self.last_values[(store, dept)] = ts_data.iloc[-1]
                    except Exception as e:
                        if self.verbose:
                          print(f"Failed on Store {store}, Dept {dept} - {str(e)}")



                        continue
        return self

    def transform(self, X):
        # This will return ARIMA predictions for the existing dates
        # For production, you might want a separate predict method
        return X

    def predict(self, X):
        # Make predictions for each store-dept combination in X
        predictions = []
        for _, row in X.iterrows():
            store = row['Store']
            dept = row['Dept']
            date = row['Date']

            if (store, dept) in self.models and self.models[(store, dept)] is not None:
                try:
                     # Get the forecast for this specific date
                    model = self.models[(store, dept)]
                    # Calculate the number of steps from the last training data point to the prediction date
                    # Assuming weekly data frequency
                    # Find the last date the model was trained on
                    last_train_date = model.model.data.dates[-1]
                    steps = (date - last_train_date).days // 7


                    if steps >= 0: # Predict from the last training date onwards
                        forecast = model.forecast(steps=steps + 1) # Forecast up to the prediction date
                        pred = forecast.iloc[-1]
                    else: # If the date is before the last training date, use the observed value
                         # This case should ideally not happen in a standard forecast scenario,
                         # but included for robustness if predict is used on past dates.
                         # We would need to find the closest date in the training data
                        print(f"Warning: Predicting for a date before the last training date for Store {store}, Dept {dept}, Date {date}")
                        if (store, dept) in self.last_values:
                          pred = self.last_values[(store, dept)]
                        else:
                          # Safe fallback value (mean of all sales?)
                          print("Safe fallback value (mean of all sales?)")
                          pred = global_average


                except Exception as e:
                    print(f"Prediction failed for Store {store}, Dept {dept}, Date {date}: {str(e)}")
                    pred = self.last_values[(store, dept)] if (store, dept) in self.last_values else np.nan # Fallback to last value or NaN
            else:
                # Fallback - use last known value or NaN if no model was fitted
                pred = self.last_values[(store, dept)] if (store, dept) in self.last_values else np.nan


            predictions.append(pred)

        return np.array(predictions)

In [36]:
# Define the full pipeline
arima_order = (1,0,1)  # Simplified order

arima_seasonal_order=(0,0,0,0)
# Preprocessing steps
preprocessing = Pipeline([
    ('missing_value_imputer', MissingValueImputer()),
    ('date_feature_extractor', DateFeatureExtractor())
    ])



# Full pipeline with ARIMA
full_pipeline = Pipeline([
    ('preprocessing', preprocessing),
    ('arima_model', ARIMAModelWrapper(order=arima_order, seasonal_order=arima_seasonal_order))
])


full_pipeline.fit(X_train_split, y_train_split)
predictions = full_pipeline.predict(X_val_split)






  X_copy[col] = X_copy[col].fillna(method='ffill').fillna(method='bfill')
Training ARIMA models:   4%|▍         | 139/3326 [00:08<03:40, 14.47store-dept/s]

Skipping Store 2, Dept 77 due to insufficient data


Training ARIMA models:   6%|▋         | 214/3326 [00:14<02:42, 19.17store-dept/s]

Skipping Store 3, Dept 78 due to insufficient data


Training ARIMA models:   7%|▋         | 218/3326 [00:14<02:46, 18.62store-dept/s]

Skipping Store 3, Dept 83 due to insufficient data


Training ARIMA models:   8%|▊         | 266/3326 [00:16<02:33, 19.93store-dept/s]

Skipping Store 4, Dept 39 due to insufficient data


Training ARIMA models:  11%|█         | 366/3326 [00:24<02:11, 22.55store-dept/s]

Skipping Store 5, Dept 77 due to insufficient data
Skipping Store 5, Dept 78 due to insufficient data


Training ARIMA models:  13%|█▎        | 439/3326 [00:28<02:32, 18.99store-dept/s]

Skipping Store 6, Dept 77 due to insufficient data


Training ARIMA models:  15%|█▌        | 515/3326 [00:35<02:47, 16.76store-dept/s]

Skipping Store 7, Dept 78 due to insufficient data


Training ARIMA models:  16%|█▌        | 533/3326 [00:36<02:14, 20.70store-dept/s]

Skipping Store 7, Dept 99 due to insufficient data


Training ARIMA models:  20%|█▉        | 665/3326 [00:43<01:58, 22.42store-dept/s]

Skipping Store 9, Dept 77 due to insufficient data
Skipping Store 9, Dept 78 due to insufficient data


Training ARIMA models:  20%|██        | 674/3326 [00:45<06:26,  6.86store-dept/s]

Skipping Store 9, Dept 93 due to insufficient data


Training ARIMA models:  22%|██▏       | 740/3326 [00:49<02:06, 20.51store-dept/s]

Skipping Store 10, Dept 77 due to insufficient data


Training ARIMA models:  29%|██▊       | 951/3326 [01:04<02:15, 17.58store-dept/s]

Skipping Store 13, Dept 43 due to insufficient data


Training ARIMA models:  29%|██▉       | 970/3326 [01:05<02:19, 16.84store-dept/s]

Skipping Store 13, Dept 77 due to insufficient data


Training ARIMA models:  31%|███       | 1029/3326 [01:08<01:55, 19.81store-dept/s]

Skipping Store 14, Dept 43 due to insufficient data


Training ARIMA models:  33%|███▎      | 1101/3326 [01:15<01:43, 21.40store-dept/s]

Skipping Store 15, Dept 37 due to insufficient data


Training ARIMA models:  33%|███▎      | 1108/3326 [01:15<01:34, 23.36store-dept/s]

Skipping Store 15, Dept 43 due to insufficient data
Skipping Store 15, Dept 48 due to insufficient data


Training ARIMA models:  34%|███▍      | 1146/3326 [01:17<02:02, 17.83store-dept/s]

Skipping Store 15, Dept 99 due to insufficient data


Training ARIMA models:  36%|███▋      | 1206/3326 [01:21<01:50, 19.15store-dept/s]

Skipping Store 16, Dept 77 due to insufficient data
Skipping Store 16, Dept 78 due to insufficient data


Training ARIMA models:  37%|███▋      | 1220/3326 [01:21<02:01, 17.30store-dept/s]

Skipping Store 16, Dept 99 due to insufficient data


Training ARIMA models:  40%|████      | 1332/3326 [01:29<01:42, 19.42store-dept/s]

Skipping Store 18, Dept 39 due to insufficient data


Training ARIMA models:  40%|████      | 1341/3326 [01:30<01:42, 19.35store-dept/s]

Skipping Store 18, Dept 48 due to insufficient data


Training ARIMA models:  41%|████▏     | 1375/3326 [01:32<01:33, 20.78store-dept/s]

Skipping Store 18, Dept 99 due to insufficient data


Training ARIMA models:  42%|████▏     | 1413/3326 [01:33<01:41, 18.92store-dept/s]

Skipping Store 19, Dept 39 due to insufficient data


Training ARIMA models:  47%|████▋     | 1576/3326 [01:45<01:24, 20.66store-dept/s]

Skipping Store 21, Dept 48 due to insufficient data
Skipping Store 21, Dept 50 due to insufficient data


Training ARIMA models:  48%|████▊     | 1590/3326 [01:46<01:38, 17.65store-dept/s]

Skipping Store 21, Dept 77 due to insufficient data


Training ARIMA models:  48%|████▊     | 1602/3326 [01:47<02:12, 12.99store-dept/s]

Skipping Store 21, Dept 96 due to insufficient data


Training ARIMA models:  48%|████▊     | 1607/3326 [01:47<02:18, 12.39store-dept/s]

Skipping Store 21, Dept 99 due to insufficient data


Training ARIMA models:  51%|█████     | 1686/3326 [01:53<01:23, 19.55store-dept/s]

Skipping Store 22, Dept 99 due to insufficient data


Training ARIMA models:  53%|█████▎    | 1762/3326 [01:57<01:19, 19.67store-dept/s]

Skipping Store 23, Dept 99 due to insufficient data


Training ARIMA models:  57%|█████▋    | 1897/3326 [02:07<01:16, 18.65store-dept/s]

Skipping Store 25, Dept 77 due to insufficient data


Training ARIMA models:  59%|█████▉    | 1975/3326 [02:11<01:06, 20.38store-dept/s]

Skipping Store 26, Dept 78 due to insufficient data


Training ARIMA models:  61%|██████    | 2027/3326 [02:16<00:59, 21.71store-dept/s]

Skipping Store 27, Dept 39 due to insufficient data


Training ARIMA models:  63%|██████▎   | 2109/3326 [02:20<01:02, 19.38store-dept/s]

Skipping Store 28, Dept 43 due to insufficient data


Training ARIMA models:  67%|██████▋   | 2221/3326 [02:29<01:03, 17.45store-dept/s]

Skipping Store 29, Dept 99 due to insufficient data


Training ARIMA models:  67%|██████▋   | 2240/3326 [02:31<01:21, 13.31store-dept/s]

Skipping Store 30, Dept 19 due to insufficient data


Training ARIMA models:  68%|██████▊   | 2252/3326 [02:32<00:56, 19.12store-dept/s]

Skipping Store 30, Dept 33 due to insufficient data


Training ARIMA models:  73%|███████▎  | 2422/3326 [02:44<00:50, 17.83store-dept/s]

Skipping Store 32, Dept 77 due to insufficient data


Training ARIMA models:  74%|███████▍  | 2464/3326 [02:47<00:57, 14.99store-dept/s]

Skipping Store 33, Dept 27 due to insufficient data


Training ARIMA models:  74%|███████▍  | 2477/3326 [02:48<00:58, 14.58store-dept/s]

Skipping Store 33, Dept 49 due to insufficient data


Training ARIMA models:  75%|███████▍  | 2484/3326 [02:49<01:00, 14.03store-dept/s]

Skipping Store 33, Dept 71 due to insufficient data


Training ARIMA models:  77%|███████▋  | 2562/3326 [02:55<00:34, 22.00store-dept/s]

Skipping Store 34, Dept 77 due to insufficient data
Skipping Store 34, Dept 78 due to insufficient data


Training ARIMA models:  81%|████████  | 2678/3326 [03:03<00:45, 14.28store-dept/s]

Skipping Store 36, Dept 29 due to insufficient data


Training ARIMA models:  81%|████████  | 2683/3326 [03:03<00:47, 13.47store-dept/s]

Skipping Store 36, Dept 36 due to insufficient data


Training ARIMA models:  81%|████████  | 2696/3326 [03:06<01:13,  8.51store-dept/s]

Skipping Store 36, Dept 71 due to insufficient data


Training ARIMA models:  81%|████████▏ | 2704/3326 [03:07<01:06,  9.37store-dept/s]

Skipping Store 36, Dept 85 due to insufficient data


Training ARIMA models:  82%|████████▏ | 2716/3326 [03:08<00:41, 14.59store-dept/s]

Skipping Store 36, Dept 99 due to insufficient data


Training ARIMA models:  83%|████████▎ | 2758/3326 [03:11<00:36, 15.43store-dept/s]

Skipping Store 37, Dept 71 due to insufficient data


Training ARIMA models:  84%|████████▎ | 2779/3326 [03:12<00:32, 16.95store-dept/s]

Skipping Store 37, Dept 99 due to insufficient data


Training ARIMA models:  84%|████████▍ | 2809/3326 [03:15<00:38, 13.29store-dept/s]

Skipping Store 38, Dept 35 due to insufficient data


Training ARIMA models:  85%|████████▌ | 2842/3326 [03:19<00:38, 12.64store-dept/s]

Skipping Store 38, Dept 99 due to insufficient data


Training ARIMA models:  87%|████████▋ | 2898/3326 [03:24<00:27, 15.48store-dept/s]

Skipping Store 39, Dept 78 due to insufficient data


Training ARIMA models:  89%|████████▉ | 2974/3326 [03:29<00:26, 13.16store-dept/s]

Skipping Store 40, Dept 78 due to insufficient data


Training ARIMA models:  91%|█████████ | 3026/3326 [03:34<00:14, 20.42store-dept/s]

Skipping Store 41, Dept 37 due to insufficient data


Training ARIMA models:  93%|█████████▎| 3099/3326 [03:39<00:16, 13.79store-dept/s]

Skipping Store 42, Dept 34 due to insufficient data
Skipping Store 42, Dept 41 due to insufficient data


Training ARIMA models:  95%|█████████▍| 3153/3326 [03:45<00:14, 11.70store-dept/s]

Skipping Store 43, Dept 24 due to insufficient data


Training ARIMA models:  95%|█████████▌| 3167/3326 [03:46<00:12, 13.11store-dept/s]

Skipping Store 43, Dept 55 due to insufficient data


Training ARIMA models:  97%|█████████▋| 3221/3326 [03:50<00:07, 14.14store-dept/s]

Skipping Store 44, Dept 34 due to insufficient data


Training ARIMA models:  98%|█████████▊| 3255/3326 [03:53<00:05, 14.18store-dept/s]

Skipping Store 44, Dept 99 due to insufficient data


Training ARIMA models: 100%|█████████▉| 3323/3326 [03:59<00:00, 14.15store-dept/s]

Skipping Store 45, Dept 96 due to insufficient data


Training ARIMA models: 100%|██████████| 3326/3326 [04:00<00:00, 13.86store-dept/s]
  X_copy[col] = X_copy[col].fillna(method='ffill').fillna(method='bfill')


In [37]:
def weighted_mean_absolute_error(y_true, y_pred, weights):
    return np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights)

val_weights = np.where(X_val_split['IsHoliday'] == 1, 5, 1)
print (weighted_mean_absolute_error(y_val_split, predictions, val_weights))


1999.510169856433


In [25]:
!pip install dagshub


Collecting dagshub
  Downloading dagshub-0.5.10-py3-none-any.whl.metadata (12 kB)
Collecting appdirs>=1.4.4 (from dagshub)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting dacite~=1.6.0 (from dagshub)
  Downloading dacite-1.6.0-py3-none-any.whl.metadata (14 kB)
Collecting gql[requests] (from dagshub)
  Downloading gql-3.5.3-py2.py3-none-any.whl.metadata (9.4 kB)
Collecting dataclasses-json (from dagshub)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting treelib>=1.6.4 (from dagshub)
  Downloading treelib-1.8.0-py3-none-any.whl.metadata (3.3 kB)
Collecting pathvalidate>=3.0.0 (from dagshub)
  Downloading pathvalidate-3.3.1-py3-none-any.whl.metadata (12 kB)
Collecting boto3 (from dagshub)
  Downloading boto3-1.39.4-py3-none-any.whl.metadata (6.6 kB)
Collecting semver (from dagshub)
  Downloading semver-3.0.4-py3-none-any.whl.metadata (6.8 kB)
Collecting dagshub-annotation-converter>=0.1.5 (from dagshub)
  Downloading dagshub_an

In [26]:
!pip install mlflow


Collecting mlflow
  Downloading mlflow-3.1.1-py3-none-any.whl.metadata (29 kB)
Collecting mlflow-skinny==3.1.1 (from mlflow)
  Downloading mlflow_skinny-3.1.1-py3-none-any.whl.metadata (30 kB)
Collecting alembic!=1.10.0,<2 (from mlflow)
  Downloading alembic-1.16.4-py3-none-any.whl.metadata (7.3 kB)
Collecting docker<8,>=4.0.0 (from mlflow)
  Downloading docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Collecting graphene<4 (from mlflow)
  Downloading graphene-3.4.3-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting gunicorn<24 (from mlflow)
  Downloading gunicorn-23.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting databricks-sdk<1,>=0.20.0 (from mlflow-skinny==3.1.1->mlflow)
  Downloading databricks_sdk-0.58.0-py3-none-any.whl.metadata (39 kB)
Collecting opentelemetry-api<3,>=1.9.0 (from mlflow-skinny==3.1.1->mlflow)
  Downloading opentelemetry_api-1.35.0-py3-none-any.whl.metadata (1.5 kB)
Collecting opentelemetry-sdk<3,>=1.9.0 (from mlflow-skinny==3.1.1->mlflow)
  Downloading opentele

In [27]:

import dagshub, mlflow
# Try to get credentials from environment first
dagshub.init(
    repo_owner='abarb22',
    repo_name='Walmart-Recruiting---Store-Sales-Forecasting',
    mlflow=True
)
mlflow.set_experiment("ARIMA_Training")


Output()



Open the following link in your browser to authorize the client:
https://dagshub.com/login/oauth/authorize?state=d97ecdc7-59f0-4ba1-92d9-48b871aa37a1&client_id=32b60ba385aa7cecf24046d8195a71c07dd345d9657977863b52e7748e0f0f28&middleman_request_id=a86998497631a0d85891aae820edd2f8a3535f6a5d13671b86bc05af844acbfc




<Experiment: artifact_location='mlflow-artifacts:/6e01db02c8e240aebfa89d3184cdf829', creation_time=1751573162919, experiment_id='2', last_update_time=1751573162919, lifecycle_stage='active', name='ARIMA_Training', tags={}>

In [None]:
with mlflow.start_run(run_name="ARIMA_Data_Cleaning"):
    # Log data cleaning parameters
    mlflow.log_param("missing_value_strategy", "MarkDowns->0, others->ffill/bfill/mean")
    mlflow.log_param("date_features_extracted", True)


    # Log metrics about data quality
    mlflow.log_metric("cleaned_missing_values", train_df.isna().sum().sum())


In [None]:
with mlflow.start_run(run_name="ARIMA_Feature_Engineering"):
    # Log feature engineering parameters
    mlflow.log_params({
        "temporal_features": ["Year", "Month", "Week", "DayOfWeek"],
        "cyclical_features": ["Month_sin", "Month_cos", "Week_sin", "Week_cos"],
        "economic_features": ["Fuel_CPI_Ratio", "Economic_Index"],
        "markdown_features": ["Total_MarkDown", "MarkDown_Intensity"]
    })

    # Your feature engineering
    feature_pipeline = Pipeline([
        ('date_extractor', DateFeatureExtractor())
    ])

    X_featured = feature_pipeline.fit_transform(X_train_split)

    # Log results
    mlflow.log_metric("total_features", len(X_featured.columns))
    mlflow.log_metric("time_span_days", (X_featured['Date'].max() - X_featured['Date'].min()).days)

In [None]:


with mlflow.start_run(run_name="ARIMA_Model_Training"):
    # Log model parameters
    arima_params = {
        'order': (1,0,1),
        'seasonal_order': (0,0,0,0),
        'trend': 'c'
    }
    mlflow.log_params(arima_params)

    preprocessing = Pipeline([
        ('missing_value_imputer', MissingValueImputer()),
        ('date_feature_extractor', DateFeatureExtractor()),
    ])

    # Full pipeline with ARIMA
    full_pipeline = Pipeline([
        ('preprocessing', preprocessing),
        ('arima_model', ARIMAModelWrapper(order=arima_params['order'], seasonal_order=arima_params['seasonal_order']))
    ])


    full_pipeline.fit(X_train_split, y_train_split)
    val_preds = full_pipeline.predict(X_val_split)


    val_wmae = weighted_mean_absolute_error(y_val_split, val_preds, val_weights)
        # Log metrics
    mlflow.log_metrics({
        "train_samples": len(X_train_split),
        "val_samples": len(X_val_split),
        "val_wmae": val_wmae,
    })

    # Log model (as artifact since statsmodels doesn't have native MLflow support)
    import joblib
    joblib.dump(full_pipeline, "arima_pipeline.joblib")
    mlflow.log_artifact("arima_pipeline.joblib")


  X_copy[col] = X_copy[col].fillna(method='ffill').fillna(method='bfill')
  grouped = X.groupby(['Store', 'Dept'])
  X_copy[col] = X_copy[col].fillna(method='ffill').fillna(method='bfill')


In [41]:
with mlflow.start_run(run_name="ARIMA_Model_Training"):
    # Log model parameters
    arima_params = {
        'order': (1, 0, 1),
        'seasonal_order': (0, 0, 0, 0),
        'trend': 'c'
    }
    mlflow.log_params(arima_params)

    preprocessing = Pipeline([
        ('missing_value_imputer', MissingValueImputer()),
        ('date_feature_extractor', DateFeatureExtractor()),
    ])

    arima_model = ARIMAModelWrapper(order=arima_params['order'], seasonal_order=arima_params['seasonal_order'], verbose=True)

    # Full pipeline with ARIMA
    full_pipeline = Pipeline([
        ('preprocessing', preprocessing),
        ('arima_model', arima_model)
    ])

    full_pipeline.fit(X_train_split, y_train_split)
    val_preds = full_pipeline.predict(X_val_split)

    val_wmae = weighted_mean_absolute_error(y_val_split, val_preds, val_weights)

    # Extra info from your model
    trained_models_count = len(arima_model.models)
    fallback_preds_count = np.sum(pd.Series(val_preds).isna() | pd.Series(val_preds) == arima_model.global_average)
    skipped_count = (X_train_split.groupby(['Store', 'Dept']).ngroups) - trained_models_count

    # Log metrics
    mlflow.log_metrics({
        "train_samples": len(X_train_split),
        "val_samples": len(X_val_split),
        "val_wmae": val_wmae,
        "trained_groups": trained_models_count,
        "skipped_groups": skipped_count,
        "fallback_predictions": int(fallback_preds_count),
    })

    if arima_model.global_average is not None:
        mlflow.log_metric("global_average_fallback_value", arima_model.global_average)

    # Log model artifact
    import joblib
    joblib.dump(full_pipeline, "arima_pipeline.joblib")
    mlflow.log_artifact("arima_pipeline.joblib")


  X_copy[col] = X_copy[col].fillna(method='ffill').fillna(method='bfill')
Training ARIMA models:   4%|▍         | 136/3326 [00:28<25:41,  2.07store-dept/s]

Skipping Store 2, Dept 77 due to insufficient data


Training ARIMA models:   6%|▋         | 213/3326 [00:49<04:42, 11.03store-dept/s]

Skipping Store 3, Dept 78 due to insufficient data


Training ARIMA models:   7%|▋         | 218/3326 [00:49<04:54, 10.56store-dept/s]

Skipping Store 3, Dept 83 due to insufficient data


Training ARIMA models:   8%|▊         | 266/3326 [00:53<03:10, 16.03store-dept/s]

Skipping Store 4, Dept 39 due to insufficient data


Training ARIMA models:  11%|█         | 366/3326 [01:05<03:11, 15.47store-dept/s]

Skipping Store 5, Dept 77 due to insufficient data
Skipping Store 5, Dept 78 due to insufficient data


Training ARIMA models:  13%|█▎        | 437/3326 [01:17<05:36,  8.58store-dept/s]

Skipping Store 6, Dept 77 due to insufficient data


Training ARIMA models:  15%|█▌        | 515/3326 [01:24<02:55, 15.99store-dept/s]

Skipping Store 7, Dept 78 due to insufficient data


Training ARIMA models:  16%|█▌        | 532/3326 [01:25<02:38, 17.65store-dept/s]

Skipping Store 7, Dept 99 due to insufficient data


Training ARIMA models:  20%|█▉        | 662/3326 [01:39<04:03, 10.92store-dept/s]

Skipping Store 9, Dept 77 due to insufficient data
Skipping Store 9, Dept 78 due to insufficient data


Training ARIMA models:  20%|██        | 675/3326 [01:41<06:59,  6.33store-dept/s]

Skipping Store 9, Dept 93 due to insufficient data


Training ARIMA models:  22%|██▏       | 740/3326 [01:48<03:05, 13.91store-dept/s]

Skipping Store 10, Dept 77 due to insufficient data


Training ARIMA models:  29%|██▊       | 951/3326 [02:05<02:12, 17.93store-dept/s]

Skipping Store 13, Dept 43 due to insufficient data


Training ARIMA models:  29%|██▉       | 968/3326 [02:07<06:05,  6.45store-dept/s]

Skipping Store 13, Dept 77 due to insufficient data


Training ARIMA models:  31%|███       | 1028/3326 [02:12<01:53, 20.24store-dept/s]

Skipping Store 14, Dept 43 due to insufficient data


Training ARIMA models:  33%|███▎      | 1102/3326 [02:17<01:47, 20.78store-dept/s]

Skipping Store 15, Dept 37 due to insufficient data


Training ARIMA models:  33%|███▎      | 1108/3326 [02:17<01:39, 22.19store-dept/s]

Skipping Store 15, Dept 43 due to insufficient data


Training ARIMA models:  33%|███▎      | 1111/3326 [02:17<01:49, 20.18store-dept/s]

Skipping Store 15, Dept 48 due to insufficient data


Training ARIMA models:  34%|███▍      | 1144/3326 [02:20<03:54,  9.30store-dept/s]

Skipping Store 15, Dept 99 due to insufficient data


Training ARIMA models:  36%|███▋      | 1206/3326 [02:25<01:41, 20.80store-dept/s]

Skipping Store 16, Dept 77 due to insufficient data
Skipping Store 16, Dept 78 due to insufficient data


Training ARIMA models:  37%|███▋      | 1222/3326 [02:25<01:43, 20.33store-dept/s]

Skipping Store 16, Dept 99 due to insufficient data


Training ARIMA models:  40%|████      | 1332/3326 [02:31<01:37, 20.49store-dept/s]

Skipping Store 18, Dept 39 due to insufficient data


Training ARIMA models:  40%|████      | 1338/3326 [02:31<01:43, 19.29store-dept/s]

Skipping Store 18, Dept 48 due to insufficient data


Training ARIMA models:  41%|████▏     | 1376/3326 [02:36<01:35, 20.42store-dept/s]

Skipping Store 18, Dept 99 due to insufficient data


Training ARIMA models:  42%|████▏     | 1413/3326 [02:37<01:30, 21.18store-dept/s]

Skipping Store 19, Dept 39 due to insufficient data


Training ARIMA models:  47%|████▋     | 1576/3326 [02:49<01:18, 22.39store-dept/s]

Skipping Store 21, Dept 48 due to insufficient data
Skipping Store 21, Dept 50 due to insufficient data


Training ARIMA models:  48%|████▊     | 1589/3326 [02:49<01:28, 19.60store-dept/s]

Skipping Store 21, Dept 77 due to insufficient data


Training ARIMA models:  48%|████▊     | 1604/3326 [02:50<01:34, 18.27store-dept/s]

Skipping Store 21, Dept 96 due to insufficient data
Skipping Store 21, Dept 99 due to insufficient data


Training ARIMA models:  51%|█████     | 1685/3326 [02:54<01:16, 21.48store-dept/s]

Skipping Store 22, Dept 99 due to insufficient data


Training ARIMA models:  53%|█████▎    | 1762/3326 [03:00<02:29, 10.46store-dept/s]

Skipping Store 23, Dept 99 due to insufficient data


Training ARIMA models:  57%|█████▋    | 1898/3326 [03:08<01:17, 18.43store-dept/s]

Skipping Store 25, Dept 77 due to insufficient data


Training ARIMA models:  59%|█████▉    | 1973/3326 [03:14<01:18, 17.23store-dept/s]

Skipping Store 26, Dept 78 due to insufficient data


Training ARIMA models:  61%|██████    | 2027/3326 [03:16<00:56, 22.99store-dept/s]

Skipping Store 27, Dept 39 due to insufficient data


Training ARIMA models:  63%|██████▎   | 2110/3326 [03:21<01:03, 19.00store-dept/s]

Skipping Store 28, Dept 43 due to insufficient data


Training ARIMA models:  67%|██████▋   | 2223/3326 [03:30<01:01, 17.88store-dept/s]

Skipping Store 29, Dept 99 due to insufficient data


Training ARIMA models:  67%|██████▋   | 2240/3326 [03:31<01:15, 14.37store-dept/s]

Skipping Store 30, Dept 19 due to insufficient data


Training ARIMA models:  68%|██████▊   | 2254/3326 [03:32<01:00, 17.61store-dept/s]

Skipping Store 30, Dept 33 due to insufficient data


Training ARIMA models:  73%|███████▎  | 2420/3326 [03:43<00:42, 21.11store-dept/s]

Skipping Store 32, Dept 77 due to insufficient data


Training ARIMA models:  74%|███████▍  | 2464/3326 [03:47<00:51, 16.64store-dept/s]

Skipping Store 33, Dept 27 due to insufficient data


Training ARIMA models:  74%|███████▍  | 2474/3326 [03:48<01:54,  7.43store-dept/s]

Skipping Store 33, Dept 49 due to insufficient data


Training ARIMA models:  75%|███████▍  | 2484/3326 [03:50<02:10,  6.48store-dept/s]

Skipping Store 33, Dept 71 due to insufficient data


Training ARIMA models:  77%|███████▋  | 2562/3326 [03:54<00:35, 21.60store-dept/s]

Skipping Store 34, Dept 77 due to insufficient data
Skipping Store 34, Dept 78 due to insufficient data


Training ARIMA models:  81%|████████  | 2679/3326 [04:04<00:38, 16.91store-dept/s]

Skipping Store 36, Dept 29 due to insufficient data


Training ARIMA models:  81%|████████  | 2684/3326 [04:04<00:33, 19.12store-dept/s]

Skipping Store 36, Dept 36 due to insufficient data


Training ARIMA models:  81%|████████  | 2698/3326 [04:05<00:41, 15.07store-dept/s]

Skipping Store 36, Dept 71 due to insufficient data


Training ARIMA models:  81%|████████▏ | 2704/3326 [04:06<00:49, 12.63store-dept/s]

Skipping Store 36, Dept 85 due to insufficient data


Training ARIMA models:  82%|████████▏ | 2716/3326 [04:07<00:37, 16.14store-dept/s]

Skipping Store 36, Dept 99 due to insufficient data


Training ARIMA models:  83%|████████▎ | 2758/3326 [04:10<00:34, 16.70store-dept/s]

Skipping Store 37, Dept 71 due to insufficient data


Training ARIMA models:  84%|████████▎ | 2779/3326 [04:11<00:31, 17.41store-dept/s]

Skipping Store 37, Dept 99 due to insufficient data


Training ARIMA models:  84%|████████▍ | 2809/3326 [04:15<01:15,  6.81store-dept/s]

Skipping Store 38, Dept 35 due to insufficient data


Training ARIMA models:  85%|████████▌ | 2841/3326 [04:18<00:34, 14.21store-dept/s]

Skipping Store 38, Dept 99 due to insufficient data


Training ARIMA models:  87%|████████▋ | 2898/3326 [04:21<00:26, 16.39store-dept/s]

Skipping Store 39, Dept 78 due to insufficient data


Training ARIMA models:  89%|████████▉ | 2975/3326 [04:28<00:39,  8.88store-dept/s]

Skipping Store 40, Dept 78 due to insufficient data


Training ARIMA models:  91%|█████████ | 3026/3326 [04:31<00:13, 21.47store-dept/s]

Skipping Store 41, Dept 37 due to insufficient data


Training ARIMA models:  93%|█████████▎| 3099/3326 [04:36<00:14, 15.15store-dept/s]

Skipping Store 42, Dept 34 due to insufficient data
Skipping Store 42, Dept 41 due to insufficient data


Training ARIMA models:  95%|█████████▍| 3154/3326 [04:42<00:12, 13.90store-dept/s]

Skipping Store 43, Dept 24 due to insufficient data


Training ARIMA models:  95%|█████████▌| 3167/3326 [04:43<00:09, 16.06store-dept/s]

Skipping Store 43, Dept 55 due to insufficient data


Training ARIMA models:  97%|█████████▋| 3221/3326 [04:47<00:08, 13.01store-dept/s]

Skipping Store 44, Dept 34 due to insufficient data


Training ARIMA models:  98%|█████████▊| 3253/3326 [04:49<00:04, 16.38store-dept/s]

Skipping Store 44, Dept 99 due to insufficient data


Training ARIMA models: 100%|██████████| 3326/3326 [04:56<00:00, 11.23store-dept/s]


Skipping Store 45, Dept 96 due to insufficient data


  X_copy[col] = X_copy[col].fillna(method='ffill').fillna(method='bfill')


🏃 View run ARIMA_Model_Training at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2/runs/89b439f8294e4e10a4ce360e057cd16d
🧪 View experiment at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2


In [28]:
from itertools import product
import mlflow
import joblib

param_grid = list(product([0, 1, 2], [0, 1], [0, 1, 2]))  # (p, d, q)

preprocessing = Pipeline([
    ('missing_value_imputer', MissingValueImputer()),
    ('date_feature_extractor', DateFeatureExtractor())
])

best_order = None
best_wmae = float('inf')

with mlflow.start_run(run_name="ARIMA_Model_GridSearch"):
    for order in param_grid:
        print(f"Trying ARIMA order: {order}")
        arima_model = ARIMAModelWrapper(order=order, seasonal_order=(0,0,0,0), verbose=False)

        pipeline = Pipeline([
            ('preprocessing', preprocessing),
            ('arima_model', arima_model)
        ])

        try:
            # Start a nested run for this specific ARIMA(p,d,q) config
            with mlflow.start_run(run_name=f"ARIMA_{order}", nested=True):
                pipeline.fit(X_train_split, y_train_split)
                preds = pipeline.predict(X_val_split)
                val_wmae = weighted_mean_absolute_error(y_val_split, preds, val_weights)

                print(f"WMAE for order {order}: {val_wmae:.4f}")

                # Log parameters and metrics for this sub-run
                mlflow.log_params({
                    'order_p': order[0],
                    'order_d': order[1],
                    'order_q': order[2],
                })
                mlflow.log_metric("val_wmae", val_wmae)

                if val_wmae < best_wmae:
                    best_wmae = val_wmae
                    best_order = order

        except Exception as e:
            print(f"Order {order} failed: {e}")
            continue

    print(f"\n✅ Best ARIMA order found: {best_order} with WMAE: {best_wmae:.4f}")

    # Optional: train and log best model
    final_pipeline = Pipeline([
        ('preprocessing', preprocessing),
        ('arima_model', ARIMAModelWrapper(order=best_order, seasonal_order=(0,0,0,0)))
    ])

    final_pipeline.fit(X_train_split, y_train_split)
    final_preds = final_pipeline.predict(X_val_split)
    final_wmae = weighted_mean_absolute_error(y_val_split, final_preds, val_weights)

    mlflow.log_params({
        'best_order_p': best_order[0],
        'best_order_d': best_order[1],
        'best_order_q': best_order[2],
        'seasonal_order': (0, 0, 0, 0)
    })

    mlflow.log_metrics({
        "train_samples": len(X_train_split),
        "val_samples": len(X_val_split),
        "best_val_wmae": final_wmae
    })

    joblib.dump(final_pipeline, "best_arima_pipeline.joblib")
    mlflow.log_artifact("best_arima_pipeline.joblib")


Trying ARIMA order: (0, 0, 0)
WMAE for order (0, 0, 0): 2425.4089
🏃 View run ARIMA_(0, 0, 0) at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2/runs/81fa95ac04ad41ed8422f195a098c80c
🧪 View experiment at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2
Trying ARIMA order: (0, 0, 1)
WMAE for order (0, 0, 1): 2424.8848
🏃 View run ARIMA_(0, 0, 1) at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2/runs/2acc72532e8949c5abd697975e5e25b8
🧪 View experiment at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2
Trying ARIMA order: (0, 0, 2)
WMAE for order (0, 0, 2): 2298.0891
🏃 View run ARIMA_(0, 0, 2) at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2/runs/07caeed2bb5c4f988851f050229f421e
🧪 View experiment at: https://dagshub.com/abarb22/Walmart-Recruiting---

Training ARIMA models:   4%|▍         | 138/3326 [00:12<04:01, 13.23store-dept/s]

Skipping Store 2, Dept 77 due to insufficient data


Training ARIMA models:   6%|▋         | 213/3326 [00:20<04:26, 11.70store-dept/s]

Skipping Store 3, Dept 78 due to insufficient data


Training ARIMA models:   7%|▋         | 219/3326 [00:21<04:12, 12.32store-dept/s]

Skipping Store 3, Dept 83 due to insufficient data


Training ARIMA models:   8%|▊         | 266/3326 [00:24<04:05, 12.44store-dept/s]

Skipping Store 4, Dept 39 due to insufficient data


Training ARIMA models:  11%|█         | 361/3326 [00:35<04:53, 10.09store-dept/s]

Skipping Store 5, Dept 77 due to insufficient data
Skipping Store 5, Dept 78 due to insufficient data


Training ARIMA models:  13%|█▎        | 437/3326 [00:41<03:33, 13.56store-dept/s]

Skipping Store 6, Dept 77 due to insufficient data


Training ARIMA models:  15%|█▌        | 515/3326 [00:49<03:18, 14.13store-dept/s]

Skipping Store 7, Dept 78 due to insufficient data


Training ARIMA models:  16%|█▌        | 532/3326 [00:50<03:10, 14.64store-dept/s]

Skipping Store 7, Dept 99 due to insufficient data


Training ARIMA models:  20%|█▉        | 665/3326 [01:03<02:58, 14.94store-dept/s]

Skipping Store 9, Dept 77 due to insufficient data
Skipping Store 9, Dept 78 due to insufficient data


Training ARIMA models:  20%|██        | 676/3326 [01:04<03:09, 14.01store-dept/s]

Skipping Store 9, Dept 93 due to insufficient data


Training ARIMA models:  22%|██▏       | 740/3326 [01:10<02:42, 15.90store-dept/s]

Skipping Store 10, Dept 77 due to insufficient data


Training ARIMA models:  29%|██▊       | 951/3326 [01:30<02:50, 13.90store-dept/s]

Skipping Store 13, Dept 43 due to insufficient data


Training ARIMA models:  29%|██▉       | 969/3326 [01:31<02:56, 13.33store-dept/s]

Skipping Store 13, Dept 77 due to insufficient data


Training ARIMA models:  31%|███       | 1029/3326 [01:38<02:38, 14.47store-dept/s]

Skipping Store 14, Dept 43 due to insufficient data


Training ARIMA models:  33%|███▎      | 1102/3326 [01:44<02:31, 14.72store-dept/s]

Skipping Store 15, Dept 37 due to insufficient data


Training ARIMA models:  33%|███▎      | 1106/3326 [01:44<02:57, 12.53store-dept/s]

Skipping Store 15, Dept 43 due to insufficient data


Training ARIMA models:  33%|███▎      | 1110/3326 [01:45<07:14,  5.10store-dept/s]

Skipping Store 15, Dept 48 due to insufficient data


Training ARIMA models:  34%|███▍      | 1146/3326 [01:49<02:36, 13.91store-dept/s]

Skipping Store 15, Dept 99 due to insufficient data


Training ARIMA models:  36%|███▌      | 1203/3326 [01:54<02:24, 14.66store-dept/s]

Skipping Store 16, Dept 77 due to insufficient data
Skipping Store 16, Dept 78 due to insufficient data


Training ARIMA models:  37%|███▋      | 1221/3326 [01:55<02:35, 13.56store-dept/s]

Skipping Store 16, Dept 99 due to insufficient data


Training ARIMA models:  40%|████      | 1332/3326 [02:06<02:03, 16.08store-dept/s]

Skipping Store 18, Dept 39 due to insufficient data


Training ARIMA models:  40%|████      | 1338/3326 [02:09<10:44,  3.08store-dept/s]

Skipping Store 18, Dept 48 due to insufficient data


Training ARIMA models:  41%|████▏     | 1373/3326 [02:13<02:43, 11.95store-dept/s]

Skipping Store 18, Dept 99 due to insufficient data


Training ARIMA models:  42%|████▏     | 1412/3326 [02:16<02:25, 13.14store-dept/s]

Skipping Store 19, Dept 39 due to insufficient data


Training ARIMA models:  47%|████▋     | 1572/3326 [02:31<02:36, 11.20store-dept/s]

Skipping Store 21, Dept 48 due to insufficient data
Skipping Store 21, Dept 50 due to insufficient data


Training ARIMA models:  48%|████▊     | 1590/3326 [02:33<02:41, 10.72store-dept/s]

Skipping Store 21, Dept 77 due to insufficient data


Training ARIMA models:  48%|████▊     | 1602/3326 [02:34<02:15, 12.70store-dept/s]

Skipping Store 21, Dept 96 due to insufficient data


Training ARIMA models:  48%|████▊     | 1607/3326 [02:35<04:43,  6.06store-dept/s]

Skipping Store 21, Dept 99 due to insufficient data


Training ARIMA models:  51%|█████     | 1684/3326 [02:43<02:09, 12.70store-dept/s]

Skipping Store 22, Dept 99 due to insufficient data


Training ARIMA models:  53%|█████▎    | 1760/3326 [02:50<02:08, 12.18store-dept/s]

Skipping Store 23, Dept 99 due to insufficient data


Training ARIMA models:  57%|█████▋    | 1898/3326 [03:03<01:31, 15.64store-dept/s]

Skipping Store 25, Dept 77 due to insufficient data


Training ARIMA models:  59%|█████▉    | 1973/3326 [03:09<01:39, 13.66store-dept/s]

Skipping Store 26, Dept 78 due to insufficient data


Training ARIMA models:  61%|██████    | 2028/3326 [03:15<01:19, 16.43store-dept/s]

Skipping Store 27, Dept 39 due to insufficient data


Training ARIMA models:  63%|██████▎   | 2106/3326 [03:22<01:34, 12.93store-dept/s]

Skipping Store 28, Dept 43 due to insufficient data


Training ARIMA models:  67%|██████▋   | 2222/3326 [03:35<01:38, 11.17store-dept/s]

Skipping Store 29, Dept 99 due to insufficient data


Training ARIMA models:  67%|██████▋   | 2237/3326 [03:37<02:19,  7.81store-dept/s]

Skipping Store 30, Dept 19 due to insufficient data


Training ARIMA models:  68%|██████▊   | 2252/3326 [03:40<02:16,  7.84store-dept/s]

Skipping Store 30, Dept 33 due to insufficient data


Training ARIMA models:  73%|███████▎  | 2420/3326 [03:57<01:01, 14.70store-dept/s]

Skipping Store 32, Dept 77 due to insufficient data


Training ARIMA models:  74%|███████▍  | 2463/3326 [04:02<01:27,  9.85store-dept/s]

Skipping Store 33, Dept 27 due to insufficient data


Training ARIMA models:  74%|███████▍  | 2477/3326 [04:05<01:45,  8.06store-dept/s]

Skipping Store 33, Dept 49 due to insufficient data


Training ARIMA models:  75%|███████▍  | 2483/3326 [04:05<01:31,  9.19store-dept/s]

Skipping Store 33, Dept 71 due to insufficient data


Training ARIMA models:  77%|███████▋  | 2562/3326 [04:13<00:47, 16.09store-dept/s]

Skipping Store 34, Dept 77 due to insufficient data
Skipping Store 34, Dept 78 due to insufficient data


Training ARIMA models:  81%|████████  | 2678/3326 [04:27<01:04,  9.97store-dept/s]

Skipping Store 36, Dept 29 due to insufficient data


Training ARIMA models:  81%|████████  | 2681/3326 [04:27<00:48, 13.20store-dept/s]

Skipping Store 36, Dept 36 due to insufficient data


Training ARIMA models:  81%|████████  | 2696/3326 [04:30<01:28,  7.15store-dept/s]

Skipping Store 36, Dept 71 due to insufficient data


Training ARIMA models:  81%|████████▏ | 2704/3326 [04:31<01:14,  8.33store-dept/s]

Skipping Store 36, Dept 85 due to insufficient data


Training ARIMA models:  82%|████████▏ | 2716/3326 [04:33<00:52, 11.58store-dept/s]

Skipping Store 36, Dept 99 due to insufficient data


Training ARIMA models:  83%|████████▎ | 2758/3326 [04:37<00:54, 10.52store-dept/s]

Skipping Store 37, Dept 71 due to insufficient data


Training ARIMA models:  83%|████████▎ | 2777/3326 [04:39<00:33, 16.44store-dept/s]

Skipping Store 37, Dept 99 due to insufficient data


Training ARIMA models:  84%|████████▍ | 2809/3326 [04:44<00:51,  9.99store-dept/s]

Skipping Store 38, Dept 35 due to insufficient data


Training ARIMA models:  85%|████████▌ | 2841/3326 [04:48<00:44, 11.00store-dept/s]

Skipping Store 38, Dept 99 due to insufficient data


Training ARIMA models:  87%|████████▋ | 2896/3326 [04:55<01:25,  5.04store-dept/s]

Skipping Store 39, Dept 78 due to insufficient data


Training ARIMA models:  89%|████████▉ | 2976/3326 [05:02<00:26, 13.43store-dept/s]

Skipping Store 40, Dept 78 due to insufficient data


Training ARIMA models:  91%|█████████ | 3026/3326 [05:08<00:25, 11.91store-dept/s]

Skipping Store 41, Dept 37 due to insufficient data


Training ARIMA models:  93%|█████████▎| 3098/3326 [05:16<00:23,  9.86store-dept/s]

Skipping Store 42, Dept 34 due to insufficient data


Training ARIMA models:  93%|█████████▎| 3100/3326 [05:16<00:19, 11.61store-dept/s]

Skipping Store 42, Dept 41 due to insufficient data


Training ARIMA models:  95%|█████████▍| 3151/3326 [05:26<00:16, 10.67store-dept/s]

Skipping Store 43, Dept 24 due to insufficient data


Training ARIMA models:  95%|█████████▌| 3167/3326 [05:28<00:15, 10.55store-dept/s]

Skipping Store 43, Dept 55 due to insufficient data


Training ARIMA models:  97%|█████████▋| 3219/3326 [05:38<00:18,  5.70store-dept/s]

Skipping Store 44, Dept 34 due to insufficient data


Training ARIMA models:  98%|█████████▊| 3254/3326 [05:45<00:17,  4.18store-dept/s]

Skipping Store 44, Dept 99 due to insufficient data


Training ARIMA models: 100%|█████████▉| 3325/3326 [05:53<00:00,  9.75store-dept/s]

Skipping Store 45, Dept 96 due to insufficient data


Training ARIMA models: 100%|██████████| 3326/3326 [05:53<00:00,  9.41store-dept/s]


🏃 View run ARIMA_Model_GridSearch at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2/runs/1e212ae6ec984521b6dbb3e28069426f
🧪 View experiment at: https://dagshub.com/abarb22/Walmart-Recruiting---Store-Sales-Forecasting.mlflow/#/experiments/2
