# **Chapter 75: Retail Sales Forecasting System**

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the unique challenges of retail sales forecasting (multi‑location, promotions, seasonality, product hierarchies).
- Adapt the time‑series pipeline developed for financial data (NEPSE) to retail datasets.
- Engineer features specific to retail, such as promotional calendars, holiday effects, and store‑level attributes.
- Handle multiple time series (stores, products) efficiently.
- Implement a forecasting model that accounts for external factors (price changes, events).
- Evaluate forecasts using metrics relevant to retail (bias, forecast value added).
- Deploy and monitor a retail forecasting system in production.

---

## **75.1 Introduction to Retail Sales Forecasting**

Retail sales forecasting is a critical business function that drives inventory management, staffing, supply chain logistics, and financial planning. Unlike financial time series (e.g., stock prices), retail data exhibits several distinctive characteristics:

- **Hierarchical structure**: Sales can be aggregated by product category, store, region, or channel.
- **Strong seasonality**: Weekly patterns (weekend vs weekday), yearly patterns (holidays, seasons), and special events (Black Friday, Christmas).
- **Promotions and markdowns**: Temporary price reductions or marketing campaigns cause spikes that are difficult to model without external data.
- **External factors**: Weather, local events, competitor actions.
- **Multiple interrelated time series**: Sales of one product may be affected by promotions or stockouts of another (substitution, cannibalisation, or complementarity).

The NEPSE prediction system we built models a single series at a time (or several symbols, each treated independently). In retail, we often need to model hundreds or thousands of series simultaneously and share information across them.

In this chapter, we will build a retail sales forecasting system for a hypothetical chain of stores. We will reuse many components from the NEPSE system: data ingestion, feature engineering, model training, backtesting, and deployment. The key differences lie in the feature set and the modeling approach.

---

## **75.2 Understanding Retail Data**

A typical retail dataset might include:

- **Sales data** (daily or weekly): store ID, product ID, date, units sold, revenue.
- **Product attributes**: category, price, cost, promotion flag.
- **Store attributes**: location, size, type (e.g., mall, standalone).
- **Calendar**: holidays, promotional periods, events.
- **External data**: weather (temperature, precipitation), economic indicators.

For our example, we will use a synthetic dataset that mimics a real retail chain. The data will be in CSV format with the following columns:

- `store_id`: unique store identifier
- `product_id`: unique product identifier
- `date`: date of sale
- `sales`: number of units sold
- `price`: selling price (may vary due to promotions)
- `promotion`: binary flag (1 if product on promotion that day)
- `holiday`: binary flag for public holiday
- `temperature`: daily average temperature at store location
- `weekday`: day of week (0=Monday, ..., 6=Sunday)

We will generate this data for multiple stores and products.

```python
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

def generate_retail_data(num_stores=5, num_products=10, days=365*2, seed=42):
    """
    Generate synthetic daily retail sales data for every store-product pair.
    """
    rng = np.random.default_rng(seed)
    start_date = datetime(2022, 1, 1)
    dates = [start_date + timedelta(days=i) for i in range(days)]

    # Weekly seasonality indexed by weekday (0=Monday): Tuesday lowest, Saturday highest
    weekly_pattern = np.array([0.8, 0.7, 0.9, 1.0, 1.2, 1.5, 1.1])

    # Chain-wide holiday calendar, drawn once per date (simplified: ~20% of weekend days)
    holidays = {d: int(d.weekday() >= 5 and rng.random() < 0.2) for d in dates}

    records = []
    for store in range(1, num_stores + 1):
        store_base = rng.uniform(50, 200)            # base sales level
        store_trend = rng.uniform(-0.0005, 0.0005)   # daily trend, small enough to keep sales positive
        temps = {d: rng.normal(15, 10) for d in dates}  # one temperature per store per day

        for product in range(1, num_products + 1):
            product_popularity = rng.uniform(0.5, 2.0)
            regular_price = rng.uniform(10, 100)

            for date in dates:
                day_of_week = date.weekday()
                # Base sales with trend and weekly seasonality
                base = (store_base * (1 + store_trend * (date - start_date).days)
                        * weekly_pattern[day_of_week] * product_popularity)

                # Promotion on ~10% of days: 20% price cut and a 30% sales uplift
                promotion = int(rng.random() < 0.1)
                promo_mult = 1.3 if promotion else 1.0
                price = regular_price * (0.8 if promotion else 1.0)

                # Holiday effect: 20% uplift
                holiday = holidays[date]
                holiday_mult = 1.2 if holiday else 1.0

                # Temperature effect: warmer days slightly increase sales
                temp = temps[date]
                temp_effect = 1 + 0.01 * (temp - 15)

                # Multiplicative random noise
                noise = rng.normal(1, 0.1)

                sales = base * promo_mult * holiday_mult * temp_effect * noise
                sales = max(0, int(round(sales)))

                records.append({
                    'store_id': store,
                    'product_id': product,
                    'date': date,
                    'sales': sales,
                    'price': round(price, 2),
                    'promotion': promotion,
                    'holiday': holiday,
                    'temperature': temp,
                    'weekday': day_of_week
                })

    return pd.DataFrame(records)

# Generate one year of data for 3 stores x 5 products
retail_df = generate_retail_data(num_stores=3, num_products=5, days=365)
print(retail_df.head())
```

**Explanation:**

- This function generates a plausible retail dataset with multiple stores and products.
- It combines several factors: base store level, a small store-level trend, product popularity, weekly seasonality, promotions (which also cut the price by 20%), holidays, temperature, and multiplicative noise.
- Holidays are drawn once per date and temperature once per store and date, so all rows sharing a date (or a store and date) see the same value, as a real holiday calendar and weather feed would provide.
- The data can be saved as CSV and used for the rest of the pipeline.

---

## **75.3 Adapting the Data Ingestion Pipeline**

We can reuse the `NEPSEIngestion` class from Chapter 74, but we need to handle multiple identifiers (store, product) and possibly different time frequencies. For retail, we might have daily or weekly data. We'll create a generic `RetailIngestion` class.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional
import logging

import pandas as pd

logger = logging.getLogger(__name__)

class RetailIngestion:
    """
    Ingests retail sales data from CSV.
    """
    def __init__(self, data_dir: str = "./data/retail"):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)
    
    def load_csv(self, file_path: str) -> pd.DataFrame:
        """
        Load retail CSV and perform basic validation.
        Expected columns: store_id, product_id, date, sales, price, promotion, holiday, temperature, weekday
        Only store_id, product_id, date and sales are mandatory.
        """
        logger.info(f"Loading retail data from {file_path}")
        df = pd.read_csv(file_path, parse_dates=['date'])
        
        required = ['store_id', 'product_id', 'date', 'sales']
        missing = [c for c in required if c not in df.columns]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        
        # Ensure date is datetime (raises if a value cannot be parsed),
        # then sort so each store-product series is in time order
        df['date'] = pd.to_datetime(df['date'])
        df = df.sort_values(['store_id', 'product_id', 'date']).reset_index(drop=True)
        
        logger.info(f"Loaded {len(df)} records from {df['date'].min()} to {df['date'].max()}")
        return df
    
    def save_raw(self, df: pd.DataFrame, filename: Optional[str] = None) -> Path:
        """Save raw data as Parquet (requires pyarrow or fastparquet)."""
        if filename is None:
            filename = f"retail_raw_{datetime.now().strftime('%Y%m%d')}.parquet"
        path = self.data_dir / filename
        df.to_parquet(path, index=False)
        logger.info(f"Saved raw data to {path}")
        return path
```

**Explanation:**

- The ingestion is similar to NEPSE's, but we now expect `store_id` and `product_id` as key columns.
- Data is sorted by store, product, and date to ensure correct time ordering for each time series.
- We save as Parquet for efficiency.
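The lag and rolling features in the next section are computed with `shift` and `rolling`, which count rows rather than days. If a store-product series is missing a date, `sales_lag_7` silently becomes "seven rows ago". A minimal sketch of a gap check to run after ingestion, assuming daily data; `find_date_gaps` is a hypothetical helper, not part of the NEPSE code:

```python
import pandas as pd

def find_date_gaps(df: pd.DataFrame, freq: str = "D") -> pd.DataFrame:
    """
    Report store-product series whose dates are not contiguous at `freq`.
    Returns one row per affected series with the number of missing dates.
    """
    rows = []
    for (store, product), grp in df.groupby(["store_id", "product_id"]):
        dates = pd.DatetimeIndex(grp["date"].sort_values().unique())
        expected = pd.date_range(dates.min(), dates.max(), freq=freq)
        missing = expected.difference(dates)  # dates in the calendar but not in the data
        if len(missing) > 0:
            rows.append({"store_id": store, "product_id": product,
                         "n_missing": len(missing), "first_missing": missing[0]})
    return pd.DataFrame(rows, columns=["store_id", "product_id", "n_missing", "first_missing"])
```

If the result is non-empty, one option is to reindex each series to a full daily calendar before feature engineering; whether a missing day means zero sales or unknown sales depends on how the source system records days without transactions.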

---

## **75.4 Feature Engineering for Retail**

Retail feature engineering must capture patterns across multiple dimensions. We will create:

- **Temporal features**: day of week, month, quarter, holiday flags, days to the next holiday.
- **Lagged features**: past sales for the same store/product.
- **Rolling statistics**: moving averages, standard deviations over various windows (7, 14, 30 days).
- **Promotion features**: promotion flag, days since last promotion, promotion intensity (price discount).
- **Store/product attributes**: average price, category popularity.
- **External features**: temperature (lagged and rolling), weather conditions.

We'll also create features that combine information across products (e.g., total store sales) to capture substitution effects.
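Two items in this list, days since the last promotion and promotion intensity, need per-series logic that the class below leaves out. A sketch under stated assumptions: `add_promotion_features` is a hypothetical helper, rows are one per store-product per day, and the regular (non-promotional) price is approximated by the rolling maximum price over the last `ref_window` days. A real system would take the list price from product master data instead; the rolling proxy drifts if a promotion lasts longer than the window.

```python
import numpy as np
import pandas as pd

def add_promotion_features(df: pd.DataFrame, ref_window: int = 28) -> pd.DataFrame:
    """
    Add days_since_promo and discount_depth per store-product series.
    Assumes df is sorted by store_id, product_id, date with one row per day.
    """
    df = df.copy()
    keys = ["store_id", "product_id"]

    def days_since(promo: pd.Series) -> pd.Series:
        # Row position of the most recent promotion day (inclusive), forward-filled
        pos = pd.Series(np.arange(len(promo)), index=promo.index)
        last_promo_pos = pos.where(promo == 1).ffill()
        return pos - last_promo_pos  # NaN before the first promotion

    df["days_since_promo"] = df.groupby(keys)["promotion"].transform(days_since)

    # Assumed reference price: rolling max of recent prices in each series
    ref_price = df.groupby(keys)["price"].transform(
        lambda p: p.rolling(ref_window, min_periods=1).max()
    )
    df["discount_depth"] = 1 - df["price"] / ref_price  # 0.2 means 20% below reference
    return df
```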

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)

class RetailFeatureEngineer:
    """
    Feature engineering for retail sales data.
    """
    
    def __init__(self):
        self.feature_columns = []
    
    def compute_features(self, df: pd.DataFrame, drop_missing_target: bool = True) -> pd.DataFrame:
        """
        Main feature engineering function.
        Expects columns: store_id, product_id, date, sales, price, promotion, holiday, temperature, weekday
        Assumes one row per store-product per day: lags and windows count rows, not days.
        Pass drop_missing_target=False at inference time to keep the latest day,
        whose next-day target is not yet known.
        """
        keys = ['store_id', 'product_id']
        # sort_values returns a new frame, so the caller's df is not modified
        df = df.sort_values(keys + ['date']).reset_index(drop=True)
        grouped = df.groupby(keys)
        
        # --- Temporal features ---
        df['year'] = df['date'].dt.year
        df['month'] = df['date'].dt.month
        df['day'] = df['date'].dt.day
        df['day_of_week'] = df['date'].dt.dayofweek
        df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)  # avoid nullable UInt32
        df['quarter'] = df['date'].dt.quarter
        df['is_weekend'] = (df['day_of_week'] >= 5).astype(int)
        
        # Days until the next holiday, from the chain-wide calendar (assumes the
        # holiday flag is identical for all rows on a date). Holiday calendars are
        # known in advance, so looking forward does not leak the target.
        calendar = df[['date', 'holiday']].drop_duplicates('date').sort_values('date')
        next_holiday = calendar['date'].where(calendar['holiday'] == 1).bfill()
        days_to = (next_holiday - calendar['date']).dt.days.fillna(30).clip(upper=30)  # cap at 30
        df['days_to_holiday'] = df['date'].map(pd.Series(days_to.to_numpy(), index=calendar['date']))
        
        # --- Lag features (per store and product) ---
        for lag in [1, 2, 3, 7, 14, 28]:
            df[f'sales_lag_{lag}'] = grouped['sales'].shift(lag)
            df[f'price_lag_{lag}'] = grouped['price'].shift(lag)
            df[f'promotion_lag_{lag}'] = grouped['promotion'].shift(lag)
        
        # --- Rolling statistics (per store and product) ---
        # Windows include day t, which is known when forecasting day t+1
        for window in [7, 14, 28, 56]:
            df[f'sales_rolling_mean_{window}'] = grouped['sales'].transform(
                lambda x: x.rolling(window, min_periods=1).mean())
            df[f'sales_rolling_std_{window}'] = grouped['sales'].transform(
                lambda x: x.rolling(window, min_periods=1).std())
            df[f'sales_rolling_min_{window}'] = grouped['sales'].transform(
                lambda x: x.rolling(window, min_periods=1).min())
            df[f'sales_rolling_max_{window}'] = grouped['sales'].transform(
                lambda x: x.rolling(window, min_periods=1).max())
            # Share of days on promotion within the window
            df[f'promotion_rate_{window}'] = grouped['promotion'].transform(
                lambda x: x.rolling(window, min_periods=1).mean())
            df[f'price_rolling_mean_{window}'] = grouped['price'].transform(
                lambda x: x.rolling(window, min_periods=1).mean())
        
        # --- Store-level features (aggregated across products) ---
        df['store_total_sales'] = df.groupby(['store_id', 'date'])['sales'].transform('sum')
        df['store_avg_price'] = df.groupby(['store_id', 'date'])['price'].transform('mean')
        # Number of products with non-zero sales (proxy for effective assortment)
        df['store_products_sold'] = (df['sales'] > 0).astype(int).groupby(
            [df['store_id'], df['date']]).transform('sum')
        
        # --- Product-level features (across stores) ---
        df['product_total_sales'] = df.groupby(['product_id', 'date'])['sales'].transform('sum')
        df['product_avg_price'] = df.groupby(['product_id', 'date'])['price'].transform('mean')
        
        # --- Interaction features ---
        df['sales_promo_interaction'] = df['sales'] * df['promotion']  # same-day sales on promo days, else 0
        df['price_promo'] = df['price'] * df['promotion']
        
        # --- External features (weather) ---
        # Grouped per series so each 7-row window covers 7 days of one store
        df['temp_rolling_mean_7'] = grouped['temperature'].transform(
            lambda x: x.rolling(7, min_periods=1).mean())
        
        # --- Target: next day sales (for supervised learning) ---
        df['target_sales'] = grouped['sales'].shift(-1)
        
        # Feature columns: everything except the date, same-day sales and the target.
        # store_id and product_id stay in as categorical features for the global model.
        exclude = ['date', 'sales', 'target_sales']
        self.feature_columns = [c for c in df.columns if c not in exclude]
        
        # Drop warm-up rows with incomplete lags/rolling stats (and, by default,
        # the last day of each series, whose target is unknown)
        subset = self.feature_columns + (['target_sales'] if drop_missing_target else [])
        df = df.dropna(subset=subset).reset_index(drop=True)
        
        logger.info(f"Computed {len(self.feature_columns)} features")
        return df
```

**Explanation:**

- The feature engineering is per store‑product pair, using `groupby` to ensure correct lags and rolling statistics within each time series.
- We create both product‑specific features (lagged sales, rolling statistics) and aggregate features (store total sales, product total across stores) to capture cross‑series dependencies.
- The target is next‑day sales (`shift(-1)`).
- This approach is computationally intensive for many series; in production, you would use distributed computing (e.g., Spark, Dask) or a feature store that pre‑computes these.
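Because the target is next-day sales, columns such as `promotion` and `holiday` in each row describe day t, not the day being forecast. Promotions and holidays are usually planned ahead, so their t+1 values can often be used as features without leaking the target. A minimal sketch, assuming the promotion plan, holiday calendar, and planned price for t+1 really are known when the forecast is made; `add_target_day_features` and the `*_next` column names are illustrative:

```python
import pandas as pd

def add_target_day_features(df: pd.DataFrame,
                            known_ahead=("promotion", "holiday", "price")) -> pd.DataFrame:
    """
    Add columns describing the forecast day (t+1) for a next-day target.
    Only list columns whose t+1 values are genuinely known at forecast time.
    """
    df = df.sort_values(["store_id", "product_id", "date"]).copy()
    grouped = df.groupby(["store_id", "product_id"])
    for col in known_ahead:
        # Value on the next row of the same series, i.e. on day t+1
        df[f"{col}_next"] = grouped[col].shift(-1)
    # Calendar features of t+1 can be derived directly from the date
    df["day_of_week_next"] = (df["date"] + pd.Timedelta(days=1)).dt.dayofweek
    return df
```

The last day of each series gets `NaN` in the `*_next` columns; in training those rows are dropped along with the missing target, and at inference time the values would come from the promotion calendar.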

---

## **75.5 Model Selection and Training**

For retail, we often need to forecast thousands of time series. A common approach is to use a global model that trains on all series simultaneously, learning shared patterns. We can use tree‑based models like XGBoost or LightGBM, or deep learning models like DeepAR (Amazon) or N‑BEATS. Here we'll use LightGBM, which is fast and handles categorical features well.

We'll also incorporate categorical features like store_id, product_id, month, day_of_week, etc. LightGBM can handle these natively.

```python
import logging

import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error, mean_squared_error

logger = logging.getLogger(__name__)

class RetailForecaster:
    """
    Global forecasting model using LightGBM.
    """
    
    def __init__(self, feature_columns, categorical_features=None):
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features if categorical_features else []
        self.model = None
    
    def prepare_data(self, df: pd.DataFrame):
        """Select features and target. Row order is preserved."""
        if not df['date'].is_monotonic_increasing:
            logger.warning("Rows are not sorted by date; TimeSeriesSplit folds will not follow time")
        X = df[self.feature_columns].copy()  # copy avoids SettingWithCopyWarning
        y = df['target_sales']
        # Convert categorical columns to category dtype
        for col in self.categorical_features:
            if col in X.columns:
                X[col] = X[col].astype('category')
        return X, y
    
    def train_with_cv(self, X, y, n_splits=5):
        """
        Time-series cross-validation, then a refit on all rows.
        X and y must be ordered by date: TimeSeriesSplit splits by row position.
        """
        # Only pass categorical names that are actually present in X
        cat_cols = [c for c in self.categorical_features if c in X.columns]
        tscv = TimeSeriesSplit(n_splits=n_splits)
        cv_scores = []
        params = {
            'objective': 'regression',
            'metric': 'mae',
            'boosting_type': 'gbdt',
            'num_leaves': 31,
            'learning_rate': 0.05,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': -1
        }
        
        for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
            X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
            y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
            
            train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=cat_cols)
            val_data = lgb.Dataset(X_val, label=y_val, categorical_feature=cat_cols, reference=train_data)
            
            model = lgb.train(
                params,
                train_data,
                valid_sets=[val_data],
                num_boost_round=1000,
                callbacks=[lgb.early_stopping(10, verbose=False), lgb.log_evaluation(0)]
            )
            
            y_pred = model.predict(X_val, num_iteration=model.best_iteration)
            mae = mean_absolute_error(y_val, y_pred)
            rmse = np.sqrt(mean_squared_error(y_val, y_pred))
            cv_scores.append({'fold': fold, 'mae': mae, 'rmse': rmse, 'best_iteration': model.best_iteration})
            print(f"Fold {fold}: MAE={mae:.2f}, RMSE={rmse:.2f}, best_iter={model.best_iteration}")
        
        # Refit on all data with the average early-stopping point across folds.
        # Picking the single lowest-MAE fold would favour whichever period was easiest.
        n_estimators = max(1, int(round(np.mean([s['best_iteration'] for s in cv_scores]))))
        print(f"Mean CV MAE={np.mean([s['mae'] for s in cv_scores]):.2f}; refitting with {n_estimators} trees")
        
        self.model = lgb.LGBMRegressor(
            objective='regression',
            num_leaves=31,
            learning_rate=0.05,
            colsample_bytree=0.9,   # sklearn-API name for feature_fraction
            subsample=0.8,          # bagging_fraction
            subsample_freq=5,       # bagging_freq
            n_estimators=n_estimators,
            verbose=-1
        )
        self.model.fit(X, y, categorical_feature=cat_cols)
        
        return cv_scores
    
    def predict(self, X):
        if self.model is None:
            raise RuntimeError("Call train_with_cv before predict")
        return self.model.predict(X)
```

**Explanation:**

- The forecaster uses LightGBM with early stopping on time‑series cross‑validation folds. Because `TimeSeriesSplit` splits by row position, the rows must be sorted by date; otherwise later folds contain different stores or products rather than later days.
- `categorical_features` can include `store_id`, `product_id`, `month`, `day_of_week`, etc., which LightGBM handles internally.
- After CV, we retrain on the full dataset using the average number of boosting rounds selected by early stopping across the folds.
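`TimeSeriesSplit` splits by row position, so even with date-sorted rows a fold boundary can fall inside a single day, putting some stores' rows for that date in training and others in validation. A sketch of a date-aware alternative that applies `TimeSeriesSplit` to the unique dates and maps them back to rows; `date_based_splits` is a hypothetical helper, and it assumes `dates` shares its row order with `X` and `y`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

def date_based_splits(dates: pd.Series, n_splits: int = 5):
    """
    Yield (train_idx, val_idx) row positions such that every calendar day falls
    entirely in train or entirely in validation, and validation days come later.
    """
    unique_dates = np.sort(dates.unique())
    date_values = dates.to_numpy()
    for train_d, val_d in TimeSeriesSplit(n_splits=n_splits).split(unique_dates):
        # Map the selected dates back to the positions of their rows
        train_idx = np.flatnonzero(np.isin(date_values, unique_dates[train_d]))
        val_idx = np.flatnonzero(np.isin(date_values, unique_dates[val_d]))
        yield train_idx, val_idx
```

Inside `train_with_cv`, `tscv.split(X)` could be replaced by `date_based_splits(dates)` if the date column is passed in alongside `X` and `y`.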

---

## **75.6 Backtesting for Retail**

Backtesting in retail is similar to financial backtesting: we simulate how the forecast would have performed historically. However, instead of trading, we evaluate the forecast error over time and possibly simulate inventory decisions.

We can reuse the `NEPSEBacktester` with modifications for retail metrics. Important retail metrics include:

- **Forecast bias**: average error (should be near zero).
- **Forecast accuracy**: MAE, RMSE, MAPE.
- **Forecast value added**: comparison with naive benchmarks (e.g., seasonal naive).
- **Pinball loss** for quantile forecasts (if we produce prediction intervals).
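A sketch of the point-forecast metrics above, using the same sign convention as the backtester below (error = forecast minus actual, so a positive bias means over-forecasting). MAPE is undefined on days with zero sales, which are common at store-product level, so the sketch skips those rows; it also reports WAPE (total absolute error over total sales), a widely used retail alternative that stays defined when individual days are zero. `retail_metrics` is a hypothetical helper:

```python
import numpy as np

def retail_metrics(actual, forecast) -> dict:
    """
    Point-forecast metrics. Error is forecast - actual, so a positive
    bias means the model over-forecasts on average.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = forecast - actual
    nonzero = actual != 0  # MAPE is undefined where actual sales are zero
    return {
        "bias": error.mean(),
        "mae": np.abs(error).mean(),
        "rmse": np.sqrt((error ** 2).mean()),
        "mape": np.abs(error[nonzero] / actual[nonzero]).mean() * 100 if nonzero.any() else np.nan,
        # WAPE: total absolute error relative to total actual sales, in percent
        "wape": np.abs(error).sum() / actual.sum() * 100 if actual.sum() > 0 else np.nan,
    }
```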

We'll create a simple backtester that rolls through time, training on an expanding window and predicting the next day.

```python
import lightgbm as lgb
import pandas as pd

class RetailBacktester:
    """
    Walk‑forward backtesting for retail forecasts.
    """
    
    def __init__(self, forecaster, feature_columns, categorical_features, 
                 initial_train_days=365, step_days=1):
        self.forecaster = forecaster
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features
        self.initial_train_days = initial_train_days
        self.step_days = step_days
    
    def run(self, df):
        """
        Expanding-window backtest. A row dated d carries the target for d + 1,
        so a model fitted on rows up to train_end_date forecasts from test_date.
        initial_train_days counts distinct dates in df, not calendar days.
        """
        # Stable sort: deterministic store/product order within each date
        df = df.sort_values(['date', 'store_id', 'product_id'], kind='mergesort').reset_index(drop=True)
        unique_dates = df['date'].unique()
        cat_cols = [c for c in self.categorical_features if c in self.feature_columns]
        
        results = []
        
        for i in range(self.initial_train_days, len(unique_dates), self.step_days):
            train_end_date = unique_dates[i-1]
            test_date = unique_dates[i]
            
            # Training data: up to train_end_date; test data: exactly test_date
            train_df = df[df['date'] <= train_end_date]
            test_df = df[df['date'] == test_date]
            
            if test_df.empty:
                continue
            
            # Hold out the last 30 days of the window for early stopping, so the
            # validation rows are not also used to fit the trees
            val_start = pd.Timestamp(train_end_date) - pd.Timedelta(days=30)
            X_fit, y_fit = self.forecaster.prepare_data(train_df[train_df['date'] <= val_start])
            X_val, y_val = self.forecaster.prepare_data(train_df[train_df['date'] > val_start])
            
            # Train a new model on this expanding window
            model = lgb.LGBMRegressor(
                objective='regression',
                num_leaves=31,
                learning_rate=0.05,
                colsample_bytree=0.9,
                subsample=0.8,
                subsample_freq=5,
                n_estimators=500,  # upper bound; early stopping chooses the actual number
                verbose=-1
            )
            model.fit(
                X_fit, y_fit,
                eval_set=[(X_val, y_val)],
                callbacks=[lgb.early_stopping(10, verbose=False)],
                categorical_feature=cat_cols
            )
            
            # Predict on test; prepare_data keeps row order, so predictions align with test_df
            X_test, _ = self.forecaster.prepare_data(test_df)
            y_pred = model.predict(X_test)
            
            # Store results; error = forecast - actual, so positive means over-forecast
            test_df = test_df.copy()
            test_df['predicted_sales'] = y_pred
            test_df['error'] = test_df['predicted_sales'] - test_df['target_sales']
            test_df['abs_error'] = test_df['error'].abs()
            test_df['train_end_date'] = train_end_date
            results.append(test_df[['store_id', 'product_id', 'date', 'target_sales',
                                    'predicted_sales', 'error', 'abs_error', 'train_end_date']])
        
        if not results:
            raise ValueError("initial_train_days leaves no dates to test")
        return pd.concat(results, ignore_index=True)
```

**Explanation:**

- This backtester iterates through time, expanding the training window and predicting one day ahead.
- For each step, it trains a new LightGBM model with early stopping on a recent validation set.
- The results include actuals and predictions, which can be used to compute overall metrics.
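Forecast value added compares the model with a simple benchmark on the same rows. A sketch using a weekly seasonal-naive benchmark (the actual sales from seven days earlier), built from the backtest output itself; `forecast_value_added` is a hypothetical helper. It only scores rows whose date seven days earlier was also backtested, which holds for `step_days` of 1 or 7:

```python
import pandas as pd

def forecast_value_added(results_df: pd.DataFrame, season_days: int = 7) -> dict:
    """
    Compare backtest MAE against a seasonal-naive benchmark.
    Expects columns store_id, product_id, date, target_sales, predicted_sales,
    where the row dated d forecasts sales on d + 1.
    """
    keys = ["store_id", "product_id", "date"]
    # The naive forecast for row d is the target of row d - season_days,
    # i.e. actual sales season_days before the forecast day
    naive = results_df[keys + ["target_sales"]].copy()
    naive["date"] = naive["date"] + pd.Timedelta(days=season_days)
    naive = naive.rename(columns={"target_sales": "naive_forecast"})
    merged = results_df.merge(naive, on=keys, how="inner")
    model_mae = (merged["predicted_sales"] - merged["target_sales"]).abs().mean()
    naive_mae = (merged["naive_forecast"] - merged["target_sales"]).abs().mean()
    return {
        "n_rows": len(merged),
        "model_mae": model_mae,
        "naive_mae": naive_mae,
        # Positive FVA: the model reduces error by this share of the naive MAE
        "fva_pct": (naive_mae - model_mae) / naive_mae * 100,
    }
```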

---

## **75.7 Deployment and Monitoring**

Deployment for retail is similar to the NEPSE system: we can have a batch job that runs daily to produce forecasts for all stores/products, and possibly a REST API for ad‑hoc queries.

### **75.7.1 Batch Prediction**

We can create a batch predictor that loads the latest features and generates forecasts for the next day.

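```python
import numpy as np
import pandas as pd

class RetailBatchPredictor:
    """
    Next-day forecasts for every store-product pair.
    Assumption: `feature_store` is a pandas DataFrame with one feature row per
    (store_id, product_id, date), built like RetailFeatureEngineer output but
    keeping the latest day (whose target is still unknown). A real feature
    store service would replace this DataFrame lookup.
    """
    def __init__(self, model, feature_columns, categorical_features, feature_store):
        self.model = model
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features
        self.feature_store = feature_store
    
    def predict_next_day(self, current_date):
        """Predict sales for current_date + 1 day from features as of current_date."""
        current_date = pd.Timestamp(current_date)
        latest = self.feature_store[self.feature_store['date'] == current_date]
        if latest.empty:
            raise ValueError(f"No features available for {current_date.date()}")
        X = latest[self.feature_columns].copy()
        for col in self.categorical_features:
            if col in X.columns:
                X[col] = X[col].astype('category')
        forecasts = latest[['store_id', 'product_id']].reset_index(drop=True)
        forecasts['forecast_date'] = current_date + pd.Timedelta(days=1)
        # Unit sales cannot be negative
        forecasts['predicted_sales'] = np.maximum(self.model.predict(X), 0)
        return forecasts
```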

### **75.7.2 Monitoring**

Monitoring for retail should track forecast error over time, detect data drift (e.g., changes in promotion effectiveness), and alert when accuracy drops below thresholds. We can reuse the `ModelMonitor` class from Chapter 74, adapting it to store‑product level metrics.
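A minimal sketch of one such check, assuming backtest-style results with an `abs_error` column: compare each series' MAE over the most recent week with its MAE over the preceding four weeks, and flag series where it degrades by more than 50%. The function name and thresholds are illustrative, not part of the `ModelMonitor` class from Chapter 74:

```python
import pandas as pd

def flag_accuracy_drops(results_df: pd.DataFrame, baseline_days: int = 28,
                        recent_days: int = 7, ratio: float = 1.5) -> pd.DataFrame:
    """
    Flag store-product series whose recent MAE exceeds `ratio` times their
    baseline MAE. Expects columns store_id, product_id, date, abs_error.
    """
    end = results_df["date"].max()
    recent_start = end - pd.Timedelta(days=recent_days)
    baseline_start = recent_start - pd.Timedelta(days=baseline_days)
    keys = ["store_id", "product_id"]

    recent = results_df[results_df["date"] > recent_start]
    baseline = results_df[(results_df["date"] > baseline_start)
                          & (results_df["date"] <= recent_start)]
    summary = pd.concat(
        [baseline.groupby(keys)["abs_error"].mean().rename("baseline_mae"),
         recent.groupby(keys)["abs_error"].mean().rename("recent_mae")],
        axis=1,
    ).dropna()  # only series observed in both periods
    summary["alert"] = summary["recent_mae"] > ratio * summary["baseline_mae"]
    return summary.reset_index()
```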

---

## **75.8 Case Study: Implementation with Synthetic Data**

Let's put it all together with our synthetic retail data.

```python
# Generate two years of data
retail_df = generate_retail_data(num_stores=3, num_products=5, days=365*2)

# Ingest (save raw)
ingestor = RetailIngestion()
ingestor.save_raw(retail_df, "retail_sample.parquet")

# Feature engineering
engineer = RetailFeatureEngineer()
featured_df = engineer.compute_features(retail_df)

# Order rows by date so that TimeSeriesSplit folds move forward in time
featured_df = (featured_df
               .sort_values(['date', 'store_id', 'product_id'], kind='mergesort')
               .reset_index(drop=True))
print(featured_df.head())

# Define categorical features
cat_features = ['store_id', 'product_id', 'month', 'day_of_week', 'is_weekend', 'promotion', 'holiday']

# Prepare forecaster
forecaster = RetailForecaster(engineer.feature_columns, categorical_features=cat_features)
X, y = forecaster.prepare_data(featured_df)

# Train with CV
cv_scores = forecaster.train_with_cv(X, y)

# Backtest: a model is retrained at every step, so testing every 7th date keeps runtime manageable
backtester = RetailBacktester(forecaster, engineer.feature_columns, cat_features,
                              initial_train_days=365, step_days=7)
backtest_results = backtester.run(featured_df)

# Evaluate overall MAE and bias (mean of forecast - actual)
overall_mae = backtest_results['abs_error'].mean()
overall_bias = backtest_results['error'].mean()
print(f"Overall MAE: {overall_mae:.2f}, bias: {overall_bias:+.2f}")
```

---

## **75.9 Lessons Learned from Retail Forecasting**

1. **Data quality is even more critical**: Zero sales recorded during stockouts reflect missing stock, not missing demand. Unless these days are flagged and excluded or imputed, the model learns to under‑forecast products that were unavailable.
2. **Promotion effects are powerful but tricky**: Promotions can cause spikes that are hard to predict if not properly feature‑engineered (e.g., include promotion duration, discount depth).
3. **Hierarchical reconciliation**: Forecasts at different levels (store, product, total) may not add up. Using reconciliation methods (top‑down, bottom‑up, optimal combination) improves consistency.
4. **Global models often outperform local models** when many related series are available, because they share strength across sparse or short series.
5. **External data (weather, events) adds value** but must be available for the forecast horizon.
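Point 3 can be illustrated with the two simplest reconciliation schemes. A minimal sketch, assuming forecasts at store-product level: bottom-up defines every higher level as a sum of the store-product forecasts, so the hierarchy adds up by construction; top-down splits a chain-level forecast using each series' historical share of sales. Both function names are hypothetical, and optimal-combination methods are not shown:

```python
import pandas as pd

def bottom_up_reconcile(forecasts: pd.DataFrame) -> dict:
    """
    Bottom-up: store and chain forecasts are sums of store-product forecasts.
    Expects columns store_id, product_id, date, predicted_sales.
    """
    store_level = forecasts.groupby(["store_id", "date"], as_index=False)["predicted_sales"].sum()
    total_level = forecasts.groupby("date", as_index=False)["predicted_sales"].sum()
    return {"store_product": forecasts, "store": store_level, "total": total_level}

def top_down_split(total_forecast: float, history: pd.DataFrame) -> pd.Series:
    """
    Top-down: split a chain-level forecast by each store-product series'
    share of historical sales. Expects columns store_id, product_id, sales.
    """
    shares = history.groupby(["store_id", "product_id"])["sales"].sum()
    return total_forecast * shares / shares.sum()
```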

---

## **75.10 Future Improvements**

- **Quantile forecasting** to provide prediction intervals for safety stock.
- **Deep learning models** like DeepAR or Temporal Fusion Transformer.
- **Incorporating image/text data** from advertisements.
- **Automated promotion feature extraction** from calendar data.
- **Real‑time inventory feedback**: adjust forecasts based on current stock levels.

---

## **Chapter Summary**

In this chapter, we adapted our time‑series prediction system to the retail domain. We built a synthetic retail dataset, engineered features specific to retail (promotions, holidays, store/product aggregates), trained a global LightGBM model, and backtested it with a walk‑forward approach. We highlighted the differences from financial forecasting and discussed deployment and monitoring considerations.

The principles remain the same as in the NEPSE system: clean data, robust feature engineering, time‑aware validation, and production deployment. The retail example demonstrates the flexibility of our pipeline.

In the next chapter, we will explore **Weather and Climate Prediction**, another domain with unique characteristics.

---

**End of Chapter 75**