# Sales Forecast Model

This notebook calculates forecast unit sales from demand forecasts and available inventory (on-hand and on-order).

Forecast unit sales represent how much of the unit demand can be covered by available inventory.

**Output columns:**
- SKU: Product SKU
- MONTH: Forecast month
- UNIT DEMAND: Forecasted unit demand
- FORECAST UNIT SALES: Unit demand that can be captured by available inventory
- MISSED DEMAND: Unit demand that cannot be captured due to insufficient inventory

## Setup and Imports

In [69]:
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta
import warnings
warnings.filterwarnings('ignore')

## Configuration

Set file paths here. All data files should be in the `data` folder.

In [70]:
# File paths (relative to project root)
data_folder = '../data'
demand_forecast_file = f'{data_folder}/CZ Demand Forecast Sample.csv'
on_hand_inventory_file = f'{data_folder}/on hand inventory_sample.csv'
on_order_file = f'{data_folder}/CZ On Order Sample Data.csv'
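
Before loading, it can help to confirm that the configured files actually exist. The following is a minimal sketch that reuses the paths from the cell above; `missing_files` is a hypothetical helper introduced here for illustration and is not part of the notebook.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not exist on disk (hypothetical helper)."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Paths copied from the configuration cell above.
data_folder = '../data'
input_files = [
    f'{data_folder}/CZ Demand Forecast Sample.csv',
    f'{data_folder}/on hand inventory_sample.csv',
    f'{data_folder}/CZ On Order Sample Data.csv',
]

missing = missing_files(input_files)
if missing:
    print("Missing input files:", missing)
else:
    print("All input files found.")
```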

## Load Data Files

In [71]:
print("Loading data files...")

# Load demand forecast
demand_forecast = pd.read_csv(demand_forecast_file)
print(f"Demand forecast shape: {demand_forecast.shape}")
print(f"Demand forecast columns: {demand_forecast.columns.tolist()}")
print(f"\nDemand forecast sample:")
print(demand_forecast.head())

# Load on-hand inventory
on_hand_inventory = pd.read_csv(on_hand_inventory_file)
print(f"\nOn-hand inventory shape: {on_hand_inventory.shape}")
print(f"On-hand inventory columns: {on_hand_inventory.columns.tolist()}")
print(f"\nOn-hand inventory sample:")
print(on_hand_inventory.head())

# Load on-order inventory
on_order = pd.read_csv(on_order_file)
print(f"\nOn-order inventory shape: {on_order.shape}")
print(f"On-order inventory columns: {on_order.columns.tolist()}")
print(f"\nOn-order inventory sample:")
print(on_order.head())

Loading data files...
Demand forecast shape: (54376, 4)
Demand forecast columns: ['SKU', 'MONTH', 'UNIT DEMAND', 'UNIT SALES']

Demand forecast sample:
          SKU      MONTH  UNIT DEMAND  UNIT SALES
0   P20001-05  11/1/2025          458         489
1   P20001-24  11/1/2025          380         477
2   P20001-25  11/1/2025          345         384
3   P20001-23  11/1/2025          281         325
4  D104007-01  11/1/2025          126         302

On-hand inventory shape: (6109, 7)
On-hand inventory columns: ['SKU', 'ON_HAND_QTY', 'QTY_COMMITTED', 'QTY_BACKORDERED', 'UNFULFILLED_QTY', 'AVAILABLE_ON_HAND_QTY', 'CURRENT_INVENTORY_POSITION']

On-hand inventory sample:
                   SKU  ON_HAND_QTY  QTY_COMMITTED  QTY_BACKORDERED  \
0           R120041-01            0              0                0   
1  0118316-010-U-W-009            0              0                0   
2            R94001-03            0              0                0   
3  0118316-010-U-W-008            0      

## Data Cleaning and Preparation

In [72]:
# Clean and prepare demand forecast data
# Convert MONTH to datetime
demand_forecast['MONTH'] = pd.to_datetime(demand_forecast['MONTH'], errors='coerce')

# Remove any rows with missing critical data.
# This runs before the string cast below: astype(str) turns NaN into the
# literal string 'nan', which dropna would no longer catch.
demand_forecast = demand_forecast.dropna(subset=['SKU', 'MONTH', 'UNIT DEMAND']).copy()
# Ensure SKU is string
demand_forecast['SKU'] = demand_forecast['SKU'].astype(str)
print(f"Demand forecast after cleaning: {demand_forecast.shape}")

# Clean on-hand inventory data
on_hand_inventory['SKU'] = on_hand_inventory['SKU'].astype(str)
# Fill missing values with 0 for numeric columns
numeric_cols = ['ON_HAND_QTY', 'AVAILABLE_ON_HAND_QTY', 'CURRENT_INVENTORY_POSITION']
for col in numeric_cols:
    if col in on_hand_inventory.columns:
        on_hand_inventory[col] = pd.to_numeric(on_hand_inventory[col], errors='coerce').fillna(0)
print(f"On-hand inventory after cleaning: {on_hand_inventory.shape}")

# Clean on-order inventory data
# Parse Estimate Ship Date - handle the column name with spaces
ship_date_col = 'Estimate Ship Date Date'  # Based on the CSV header
on_order[ship_date_col] = pd.to_datetime(on_order[ship_date_col], errors='coerce')
# Fill missing quantities with 0
qty_col = 'Expected Shipment Quantity'
on_order[qty_col] = pd.to_numeric(on_order[qty_col], errors='coerce').fillna(0)
# Remove rows with missing SKU or ship date.
# This runs before the string cast: astype(str) would turn a missing SKU into 'nan'.
on_order = on_order.dropna(subset=['SKU', ship_date_col]).copy()
on_order['SKU'] = on_order['SKU'].astype(str)
print(f"On-order inventory after cleaning: {on_order.shape}")

Demand forecast after cleaning: (54376, 4)
On-hand inventory after cleaning: (6109, 7)
On-order inventory after cleaning: (772, 5)
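
Because `errors='coerce'` turns unparseable values into `NaT` or `NaN` without raising, rows can be dropped or zero-filled silently. The sketch below, on toy data, counts how many values were lost to coercion. The toy frame and the helper `count_coerced` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def count_coerced(raw, parsed):
    """Count values present in `raw` that became missing after parsing."""
    return int((raw.notna() & parsed.isna()).sum())


# Toy data standing in for the real CSV columns.
toy = pd.DataFrame({
    'MONTH': ['11/1/2025', 'not a date', None],
    'QTY': ['10', 'n/a', '5'],
})

parsed_month = pd.to_datetime(toy['MONTH'], errors='coerce')
parsed_qty = pd.to_numeric(toy['QTY'], errors='coerce')

print("Dates lost to coercion:", count_coerced(toy['MONTH'], parsed_month))
print("Quantities lost to coercion:", count_coerced(toy['QTY'], parsed_qty))
```

A non-zero count is a prompt to inspect the raw values before trusting the cleaned shapes.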


## Process On-Order Inventory by Month

On-order inventory is treated as sellable from the month of its Estimate Ship Date, so on-order quantities are aggregated by SKU and by the month they become available.

In [73]:
# Create a month column for on-order data based on Estimate Ship Date
on_order['SHIP_MONTH'] = on_order[ship_date_col].dt.to_period('M').dt.to_timestamp()

# Aggregate on-order quantities by SKU and month
on_order_monthly = on_order.groupby(['SKU', 'SHIP_MONTH'])[qty_col].sum().reset_index()
on_order_monthly = on_order_monthly.rename(columns={qty_col: 'ON_ORDER_QTY', 'SHIP_MONTH': 'MONTH'})

print(f"On-order inventory aggregated by SKU and month: {on_order_monthly.shape}")
print(f"\nSample on-order monthly data:")
print(on_order_monthly.head(10))

On-order inventory aggregated by SKU and month: (695, 3)

Sample on-order monthly data:
          SKU      MONTH  ON_ORDER_QTY
0  B100001-01 2026-05-01            90
1  B100002-01 2026-05-01            70
2  B100002-02 2026-05-01            60
3  B100005-01 2026-05-01            10
4  B100005-02 2026-05-01            25
5  B116001-01 2025-12-01            57
6  B116001-02 2025-12-01            55
7  B120001-01 2026-01-01            35
8  B120001-02 2026-01-01            65
9  B120002-01 2026-01-01            35
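
The month bucketing above maps each ship date to the first day of its month and then sums quantities per SKU. Here is a small sketch on made-up rows; the SKUs, dates, and column names `ship_date` and `qty` are illustrative and not taken from the data files.

```python
import pandas as pd

toy_orders = pd.DataFrame({
    'SKU': ['A-01', 'A-01', 'A-01', 'B-01'],
    'ship_date': pd.to_datetime(['2026-01-05', '2026-01-28', '2026-02-10', '2026-01-15']),
    'qty': [10, 5, 7, 3],
})

# Truncate each date to the first day of its month.
toy_orders['MONTH'] = toy_orders['ship_date'].dt.to_period('M').dt.to_timestamp()

# One row per SKU and month, with quantities summed.
toy_monthly = (
    toy_orders.groupby(['SKU', 'MONTH'])['qty']
    .sum()
    .reset_index()
    .rename(columns={'qty': 'ON_ORDER_QTY'})
)
print(toy_monthly)
```

Under this bucketing, a shipment estimated for the 28th counts toward the whole month's availability, which may be optimistic for late-month receipts.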


## Calculate Available Inventory by Month

For each SKU and month, we need to calculate:
- Starting available on-hand inventory
- Cumulative on-order inventory that has arrived by that month
- Total available inventory = available on-hand + cumulative on-order
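
The cumulative on-order term can be sketched with a per-SKU running sum. This is illustrative only, on toy values, and it deliberately ignores sales depletion and backorders, which the month-by-month loop later in the notebook handles. The column names `CUM_ON_ORDER` and `GROSS_AVAILABLE` are assumptions made for this example.

```python
import pandas as pd

toy = pd.DataFrame({
    'SKU': ['A-01'] * 3 + ['B-01'] * 3,
    'MONTH': pd.to_datetime(['2025-11-01', '2025-12-01', '2026-01-01'] * 2),
    'AVAILABLE_ON_HAND_QTY': [20, 20, 20, 0, 0, 0],
    'ON_ORDER_QTY': [0, 10, 5, 0, 0, 8],
})

toy = toy.sort_values(['SKU', 'MONTH']).reset_index(drop=True)
# Running total of on-order units that have arrived by each month, per SKU.
toy['CUM_ON_ORDER'] = toy.groupby('SKU')['ON_ORDER_QTY'].cumsum()
# Gross supply before any sales are subtracted.
toy['GROSS_AVAILABLE'] = toy['AVAILABLE_ON_HAND_QTY'] + toy['CUM_ON_ORDER']
print(toy[['SKU', 'MONTH', 'CUM_ON_ORDER', 'GROSS_AVAILABLE']])
```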

In [74]:
# Get unique SKUs and months from demand forecast
unique_skus = demand_forecast['SKU'].unique()
unique_months = sorted(demand_forecast['MONTH'].unique())

print(f"Unique SKUs: {len(unique_skus)}")
print(f"Unique months: {len(unique_months)}")
print(f"Month range: {unique_months[0]} to {unique_months[-1]}")

# Create a base dataframe with all SKU-month combinations from demand forecast
forecast_base = demand_forecast[['SKU', 'MONTH', 'UNIT DEMAND']].copy()

# Merge with on-hand inventory to get starting available inventory
# We'll use AVAILABLE_ON_HAND_QTY as the starting point
forecast_base = forecast_base.merge(
    on_hand_inventory[['SKU', 'AVAILABLE_ON_HAND_QTY', 'CURRENT_INVENTORY_POSITION']],
    on='SKU',
    how='left'
)
forecast_base['AVAILABLE_ON_HAND_QTY'] = forecast_base['AVAILABLE_ON_HAND_QTY'].fillna(0)
forecast_base['CURRENT_INVENTORY_POSITION'] = forecast_base['CURRENT_INVENTORY_POSITION'].fillna(0)

print(f"\nForecast base after merging on-hand inventory: {forecast_base.shape}")
print(f"\nSample forecast base:")
print(forecast_base.head(10))

Unique SKUs: 3884
Unique months: 14
Month range: 2025-11-01 00:00:00 to 2026-12-01 00:00:00

Forecast base after merging on-hand inventory: (54376, 5)

Sample forecast base:
          SKU      MONTH  UNIT DEMAND  AVAILABLE_ON_HAND_QTY  \
0   P20001-05 2025-11-01          458                      0   
1   P20001-24 2025-11-01          380                    147   
2   P20001-25 2025-11-01          345                      0   
3   P20001-23 2025-11-01          281                      0   
4  D104007-01 2025-11-01          126                      0   
5   P20001-36 2025-11-01          360                      0   
6  D104009-01 2025-11-01          136                      0   
7  D104008-01 2025-11-01           99                      0   
8  D104010-01 2025-11-01          105                      0   
9  D104011-01 2025-11-01          115                      0   

   CURRENT_INVENTORY_POSITION  
0                        -570  
1                         147  
2                        

In [75]:
# Merge with on-order monthly data
forecast_base = forecast_base.merge(
    on_order_monthly,
    on=['SKU', 'MONTH'],
    how='left'
)
forecast_base['ON_ORDER_QTY'] = forecast_base['ON_ORDER_QTY'].fillna(0)

# Sort by SKU and MONTH for proper processing
forecast_base = forecast_base.sort_values(['SKU', 'MONTH']).reset_index(drop=True)

print(f"\nForecast base after merging on-order data: {forecast_base.shape}")
print(f"\nSample forecast base:")
print(forecast_base.head(15))


Forecast base after merging on-order data: (54376, 6)

Sample forecast base:
                          SKU      MONTH  UNIT DEMAND  AVAILABLE_ON_HAND_QTY  \
0   000004556-004-FL-S-002-B1 2025-11-01            0                      0   
1   000004556-004-FL-S-002-B1 2025-12-01            0                      0   
2   000004556-004-FL-S-002-B1 2026-01-01            0                      0   
3   000004556-004-FL-S-002-B1 2026-02-01            0                      0   
4   000004556-004-FL-S-002-B1 2026-03-01            0                      0   
5   000004556-004-FL-S-002-B1 2026-04-01            0                      0   
6   000004556-004-FL-S-002-B1 2026-05-01            0                      0   
7   000004556-004-FL-S-002-B1 2026-06-01            0                      0   
8   000004556-004-FL-S-002-B1 2026-07-01            0                      0   
9   000004556-004-FL-S-002-B1 2026-08-01            0                      0   
10  000004556-004-FL-S-002-B1 2026-09-01  
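
One thing worth checking after this merge: a left join on `['SKU', 'MONTH']` keeps only on-order rows whose month also appears in the demand forecast for that SKU, so receipts dated outside the forecast horizon, or for SKUs with no forecast, would drop out. Whether that happens in this data set is not shown here. The sketch below, on toy frames with made-up SKUs, joins from the on-order side with `indicator=True` to surface such rows.

```python
import pandas as pd

toy_forecast = pd.DataFrame({
    'SKU': ['A-01', 'A-01'],
    'MONTH': pd.to_datetime(['2025-11-01', '2025-12-01']),
})
toy_on_order = pd.DataFrame({
    'SKU': ['A-01', 'A-01', 'Z-99'],
    'MONTH': pd.to_datetime(['2025-10-01', '2025-12-01', '2025-12-01']),
    'ON_ORDER_QTY': [40, 10, 5],
})

# Left join from the on-order side to see which receipts have no forecast row.
check = toy_on_order.merge(toy_forecast, on=['SKU', 'MONTH'], how='left', indicator=True)
unmatched = check[check['_merge'] == 'left_only']
print(f"Unmatched on-order units: {unmatched['ON_ORDER_QTY'].sum()}")
print(unmatched[['SKU', 'MONTH', 'ON_ORDER_QTY']])
```

If the unmatched total is non-zero on the real data, one option is to roll early receipts into the first forecast month before merging; that choice is a modeling decision and is not made by this notebook.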

## Calculate Forecast Unit Sales

Forecast Unit Sales = min(Unit Demand, Available Inventory)

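Available Inventory for a given month = inventory carried over from earlier months (starting from Available On-Hand, less units already sold) + on-order arriving that month that is not needed to fulfill backorders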

**Backorder/Pre-order Handling:**
- If CURRENT_INVENTORY_POSITION is negative, it indicates pre-orders/backorders exist
- Backorders are tracked month-to-month and persist until fully fulfilled
- When on-order inventory arrives, backorders are fulfilled FIRST before new sales
- Only inventory remaining after backorder fulfillment is available for new sales
- This ensures pre-sold commitments are honored before allocating inventory to new demand
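
The rules above can be sketched for a single SKU as a small pure function. This is a simplified illustration on toy numbers, not the notebook's implementation; `simulate_sku` and its argument names are hypothetical.

```python
def simulate_sku(on_hand, inventory_position, months):
    """Simulate one SKU month by month (illustrative helper).

    `months` is a list of (demand, on_order_qty) tuples in date order.
    Returns one dict of results per month.
    """
    inventory = max(0, on_hand)
    # A negative inventory position is read as units already pre-sold.
    backorders = abs(min(0, inventory_position))
    results = []
    for demand, on_order in months:
        demand = max(0, demand)
        on_order = max(0, on_order)
        pending = backorders
        # Incoming receipts go to backorders first.
        fulfilled = min(backorders, on_order)
        backorders -= fulfilled
        inventory += on_order - fulfilled
        available = inventory
        # Whatever is left can serve new demand.
        sales = min(demand, available)
        inventory -= sales
        results.append({
            'backorders_pending': pending,
            'backorders_fulfilled': fulfilled,
            'available': available,
            'sales': sales,
            'missed': demand - sales,
        })
    return results


# Toy SKU: nothing on hand, 50 units pre-sold, receipts of 30 and 40 in months 2 and 3.
for row in simulate_sku(on_hand=0, inventory_position=-50,
                        months=[(20, 0), (20, 30), (20, 40)]):
    print(row)
```

In this toy run the first receipt is consumed entirely by backorders, so new sales only begin in the third month, once the remaining 20 backordered units are cleared.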

In [None]:
# For each SKU, track inventory month by month
forecast_base['AVAILABLE_INVENTORY'] = 0.0
forecast_base['FORECAST_UNIT_SALES'] = 0.0
forecast_base['MISSED_DEMAND'] = 0.0
forecast_base['BACKORDERS_PENDING'] = 0.0  # Track backorders at start of each month
forecast_base['BACKORDERS_FULFILLED'] = 0.0  # Track backorders fulfilled each month

for sku in unique_skus:
    sku_mask = forecast_base['SKU'] == sku
    sku_data = forecast_base[sku_mask].copy().sort_values('MONTH').reset_index(drop=True)
    
    # Get starting inventory position
    starting_available = sku_data.iloc[0]['AVAILABLE_ON_HAND_QTY']
    starting_position = sku_data.iloc[0]['CURRENT_INVENTORY_POSITION']
    
    # Ensure starting available is non-negative
    starting_available = max(0, starting_available)
    
    # Track backorders/pre-orders (negative position means backorders exist)
    # If CURRENT_INVENTORY_POSITION is negative, that's the number of backorders to fulfill
    # Backorders persist across months until fully fulfilled
    backorders = abs(min(0, starting_position))
    
    # Starting available inventory (what we can sell now)
    # AVAILABLE_ON_HAND_QTY already excludes committed/backordered units
    # Ensure it's non-negative
    current_inventory = max(0, starting_available)
    
    # Process each month
    for i in range(len(sku_data)):
        row = sku_data.iloc[i]
        month = row['MONTH']
        
        # Add on-order inventory arriving this month
        on_order_this_month = max(0, row['ON_ORDER_QTY'])  # Ensure non-negative
        
        # Store backorders at start of month (before fulfillment)
        backorders_at_start = backorders
        backorders_fulfilled_this_month = 0
        
        # CRITICAL: If we have backorders, fulfill them FIRST with incoming on-order inventory
        # Backorders are pre-sold commitments, so they must be fulfilled before new sales
        # Backorders carry forward month-to-month until fully fulfilled
        if backorders > 0 and on_order_this_month > 0:
            # Fulfill as many backorders as possible with this month's on-order
            backorders_fulfilled_this_month = min(backorders, on_order_this_month)
            backorders -= backorders_fulfilled_this_month  # Reduce backorders (carries forward if not fully fulfilled)
            # Only the remaining on-order (after fulfilling backorders) is available for new sales
            available_from_on_order = on_order_this_month - backorders_fulfilled_this_month
            current_inventory += available_from_on_order
        elif backorders > 0:
            # We have backorders but no on-order this month - backorders carry forward
            # No new receipts are added; inventory already on hand stays available for new sales
            available_from_on_order = 0
            # current_inventory remains unchanged (backorders still pending)
        else:
            # No backorders, so all on-order is available for new sales
            current_inventory += on_order_this_month
        
        # Ensure inventory never goes negative
        current_inventory = max(0, current_inventory)
        
        # Available inventory at start of month (after on-order arrives, backorders fulfilled)
        # This is what's available for NEW sales (not committed/backordered)
        # Ensure it's non-negative
        available_at_start = max(0, current_inventory)
        
        # Calculate forecast sales: min(demand, available inventory)
        # Both should be non-negative, so forecast_sales will be non-negative
        demand = max(0, row['UNIT DEMAND'])  # Ensure demand is non-negative
        forecast_sales = min(demand, available_at_start)
        
        # Ensure forecast sales is non-negative (should already be, but double-check)
        forecast_sales = max(0, forecast_sales)
        
        # Update inventory after sales
        current_inventory = max(0, available_at_start - forecast_sales)
        
        # Calculate missed demand (demand not captured by available inventory)
        missed_demand = max(0, demand - forecast_sales)
        
        # Store results
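        # Look up the row label directly on the index (avoids copying the filtered frame each iteration)
        idx = forecast_base.index[sku_mask.to_numpy()][i]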
        forecast_base.loc[idx, 'AVAILABLE_INVENTORY'] = available_at_start
        forecast_base.loc[idx, 'FORECAST_UNIT_SALES'] = forecast_sales
        forecast_base.loc[idx, 'MISSED_DEMAND'] = missed_demand
        forecast_base.loc[idx, 'BACKORDERS_PENDING'] = backorders_at_start  # Backorders at start of month
        forecast_base.loc[idx, 'BACKORDERS_FULFILLED'] = backorders_fulfilled_this_month  # Backorders fulfilled this month

print("Calculated forecast unit sales and missed demand with proper inventory tracking")
print(f"\nSample results:")
print(forecast_base[['SKU', 'MONTH', 'UNIT DEMAND', 'FORECAST_UNIT_SALES', 'MISSED_DEMAND', 'AVAILABLE_INVENTORY', 'BACKORDERS_PENDING', 'BACKORDERS_FULFILLED']].head(20))

# Show examples of SKUs with backorders being fulfilled over time
skus_with_backorders = forecast_base[forecast_base['BACKORDERS_PENDING'] > 0]['SKU'].unique()
if len(skus_with_backorders) > 0:
    print(f"\n=== Examples of backorder fulfillment (showing first 3 SKUs with backorders) ===")
    print(f"Total SKUs with backorders: {len(skus_with_backorders)}")
    for sku in skus_with_backorders[:3]:
        sku_data = forecast_base[forecast_base['SKU'] == sku].sort_values('MONTH')
        print(f"\nSKU: {sku}")
        # Show columns that exist
        display_cols = ['MONTH', 'BACKORDERS_PENDING', 'BACKORDERS_FULFILLED', 'ON_ORDER_QTY', 'AVAILABLE_INVENTORY', 'FORECAST_UNIT_SALES']
        available_cols = [col for col in display_cols if col in sku_data.columns]
        print(sku_data[available_cols].to_string())
else:
    print("\nNo SKUs with backorders found in the data.")

## Create Final Output

Create the final output with the requested columns plus inventory and backorder tracking fields:
- SKU, MONTH, UNIT DEMAND, FORECAST UNIT SALES, MISSED DEMAND
- AVAILABLE INVENTORY, ON ORDER QTY, BACKORDERS PENDING, BACKORDERS FULFILLED

In [None]:
# Create final output dataframe
# Include inventory and backorder tracking fields for validation
final_output = forecast_base[['SKU', 'MONTH', 'UNIT DEMAND', 'FORECAST_UNIT_SALES', 'MISSED_DEMAND',
                              'AVAILABLE_INVENTORY', 'ON_ORDER_QTY', 'BACKORDERS_PENDING', 'BACKORDERS_FULFILLED']].copy()

# Ensure UNIT DEMAND is non-negative (clean any negative demand values)
final_output['UNIT DEMAND'] = final_output['UNIT DEMAND'].clip(lower=0)

# Round all numeric fields to whole numbers (since we're dealing with units)
# IMPORTANT: After rounding, ensure forecast sales never exceeds demand and is never negative
final_output['FORECAST_UNIT_SALES'] = final_output['FORECAST_UNIT_SALES'].round().astype(int)
final_output['MISSED_DEMAND'] = final_output['MISSED_DEMAND'].round().astype(int)
final_output['AVAILABLE_INVENTORY'] = final_output['AVAILABLE_INVENTORY'].round().astype(int)
final_output['ON_ORDER_QTY'] = final_output['ON_ORDER_QTY'].round().astype(int)
final_output['BACKORDERS_PENDING'] = final_output['BACKORDERS_PENDING'].round().astype(int)
final_output['BACKORDERS_FULFILLED'] = final_output['BACKORDERS_FULFILLED'].round().astype(int)

# Ensure forecast sales is non-negative (should already be, but enforce it)
final_output['FORECAST_UNIT_SALES'] = final_output['FORECAST_UNIT_SALES'].clip(lower=0)

# Cap forecast sales at demand (in case rounding caused it to exceed)
# This ensures forecast sales never exceeds demand
final_output['FORECAST_UNIT_SALES'] = final_output[['FORECAST_UNIT_SALES', 'UNIT DEMAND']].min(axis=1)

# Recalculate missed demand based on capped forecast sales to ensure consistency
final_output['MISSED_DEMAND'] = (final_output['UNIT DEMAND'] - final_output['FORECAST_UNIT_SALES']).clip(lower=0)

# Now rename columns for better readability
final_output = final_output.rename(columns={
    'FORECAST_UNIT_SALES': 'FORECAST UNIT SALES',
    'MISSED_DEMAND': 'MISSED DEMAND',
    'AVAILABLE_INVENTORY': 'AVAILABLE INVENTORY',
    'ON_ORDER_QTY': 'ON ORDER QTY',
    'BACKORDERS_PENDING': 'BACKORDERS PENDING',
    'BACKORDERS_FULFILLED': 'BACKORDERS FULFILLED'
})

# Sort by SKU and MONTH
final_output = final_output.sort_values(['SKU', 'MONTH']).reset_index(drop=True)

print("Final output shape:", final_output.shape)
print(f"\nFinal output columns: {final_output.columns.tolist()}")
print(f"\nFinal output sample (first 20 rows):")
print(final_output.head(20))
print(f"\nFinal output summary:")
print(final_output.describe())
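
The round-then-cap sequence above can be seen on a few toy rows. This sketch uses made-up fractional values purely to show the effect; the notebook's own values may already be whole numbers.

```python
import pandas as pd

toy = pd.DataFrame({
    'UNIT DEMAND': [10, 3, 0],
    'FORECAST_UNIT_SALES': [10.6, 2.4, -0.2],
})

# Round to whole units, then floor at zero.
toy['FORECAST_UNIT_SALES'] = toy['FORECAST_UNIT_SALES'].round().astype(int).clip(lower=0)
# Row-wise minimum caps sales at demand after rounding.
toy['FORECAST_UNIT_SALES'] = toy[['FORECAST_UNIT_SALES', 'UNIT DEMAND']].min(axis=1)
# Recompute missed demand from the capped value so the columns reconcile.
toy['MISSED_DEMAND'] = (toy['UNIT DEMAND'] - toy['FORECAST_UNIT_SALES']).clip(lower=0)
print(toy)
```

The first row shows why the cap is needed: 10.6 rounds up to 11, which would exceed a demand of 10 without the row-wise minimum.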

## Validation and Analysis

Let's check some statistics to validate the results.

In [None]:
# Validation checks
print("=== Validation Checks ===\n")

# Check 1: Forecast sales should never exceed demand
exceeds_demand = final_output[final_output['FORECAST UNIT SALES'] > final_output['UNIT DEMAND']]
if len(exceeds_demand) > 0:
    print(f"⚠️  WARNING: {len(exceeds_demand)} rows where forecast sales exceed demand")
    print(exceeds_demand.head())
else:
    print("✓ Forecast sales never exceed demand")

# Check 2: Forecast sales should be non-negative
negative_sales = final_output[final_output['FORECAST UNIT SALES'] < 0]
if len(negative_sales) > 0:
    print(f"⚠️  WARNING: {len(negative_sales)} rows with negative forecast sales")
else:
    print("✓ All forecast sales are non-negative")

# Check 3: Summary statistics
print(f"\n=== Summary Statistics ===")
print(f"Total SKUs: {final_output['SKU'].nunique()}")
print(f"Total months: {final_output['MONTH'].nunique()}")
print(f"Total rows: {len(final_output)}")
print(f"\nTotal Unit Demand: {final_output['UNIT DEMAND'].sum():,.0f}")
print(f"Total Forecast Unit Sales: {final_output['FORECAST UNIT SALES'].sum():,.0f}")
print(f"Total Missed Demand: {final_output['MISSED DEMAND'].sum():,.0f}")
print(f"Demand Coverage: {(final_output['FORECAST UNIT SALES'].sum() / final_output['UNIT DEMAND'].sum() * 100):.1f}%")

# Check 4: Show some examples where demand is not fully met (missed demand)
print(f"\n=== Examples with highest missed demand ===")
missed_demand_examples = final_output[final_output['MISSED DEMAND'] > 0].copy()
missed_demand_examples = missed_demand_examples.sort_values('MISSED DEMAND', ascending=False)
print(f"Rows with missed demand: {len(missed_demand_examples)}")
print(missed_demand_examples.head(20))

# Check 5: Verify missed demand calculation
print(f"\n=== Verification: MISSED DEMAND should equal UNIT DEMAND - FORECAST UNIT SALES ===")
verification = final_output.copy()
verification['CALCULATED_MISSED'] = verification['UNIT DEMAND'] - verification['FORECAST UNIT SALES']
verification['DIFF'] = verification['MISSED DEMAND'] - verification['CALCULATED_MISSED']
mismatches = verification[verification['DIFF'] != 0]
if len(mismatches) > 0:
    print(f"⚠️  WARNING: {len(mismatches)} rows where MISSED DEMAND calculation doesn't match")
    print(mismatches.head())
else:
    print("✓ MISSED DEMAND calculation is correct")
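
The checks above can also be collected into a reusable function that returns a list of problems instead of printing. This is a sketch under assumptions: `validate_forecast` is a hypothetical helper, and the duplicate SKU-month check is an extra that the notebook itself does not perform.

```python
import pandas as pd


def validate_forecast(df):
    """Return a list of problems found in a forecast frame (empty if none)."""
    problems = []
    if (df['FORECAST UNIT SALES'] > df['UNIT DEMAND']).any():
        problems.append('forecast sales exceed demand')
    if (df['FORECAST UNIT SALES'] < 0).any():
        problems.append('negative forecast sales')
    if not (df['MISSED DEMAND'] == df['UNIT DEMAND'] - df['FORECAST UNIT SALES']).all():
        problems.append('missed demand does not reconcile')
    # Extra check (assumption): each SKU-month pair should appear once.
    if df.duplicated(subset=['SKU', 'MONTH']).any():
        problems.append('duplicate SKU-month rows')
    return problems


toy_ok = pd.DataFrame({
    'SKU': ['A-01', 'A-01'],
    'MONTH': pd.to_datetime(['2025-11-01', '2025-12-01']),
    'UNIT DEMAND': [10, 5],
    'FORECAST UNIT SALES': [8, 5],
    'MISSED DEMAND': [2, 0],
})
# Same frame with one row deliberately broken.
toy_bad = toy_ok.assign(**{'FORECAST UNIT SALES': [12, 5]})

print(validate_forecast(toy_ok))
print(validate_forecast(toy_bad))
```

Returning the problems as data makes it straightforward to stop the export step when the list is non-empty.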

## Export Results

Save the final output to a CSV file.

In [None]:
# Export to CSV
output_file = f'{data_folder}/sales_forecast_output.csv'  # Reuse the folder set in Configuration
final_output.to_csv(output_file, index=False)
print(f"✓ Results exported to: {output_file}")
print(f"  Total rows: {len(final_output):,}")
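
When the exported file is read back, column types are inferred again, so it may be worth pinning them. The sketch below round-trips a toy frame through a temporary file; the SKU values are made up, and the all-digit SKU is included only to show the leading-zero pitfall, which may not occur in the real data.

```python
import os
import tempfile

import pandas as pd

toy_output = pd.DataFrame({
    'SKU': ['000123', '000456'],
    'MONTH': pd.to_datetime(['2025-11-01', '2025-12-01']),
    'FORECAST UNIT SALES': [5, 0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'roundtrip.csv')
    toy_output.to_csv(path, index=False)
    # dtype=str keeps leading zeros in SKU; parse_dates restores the month column.
    reloaded = pd.read_csv(path, dtype={'SKU': str}, parse_dates=['MONTH'])

print(reloaded.dtypes)
print(reloaded)
```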