# **Chapter 80: Supply Chain Optimization**

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the core components of supply chain optimization: demand forecasting, inventory management, and lead time prediction.
- Identify the different types of data required for supply chain modeling (sales, orders, shipments, inventory levels, supplier information).
- Engineer features that capture seasonality, trends, promotions, and external factors affecting demand.
- Build and evaluate models for demand forecasting at multiple levels (SKU, store, warehouse).
- Implement inventory optimization models (e.g., reorder point, safety stock) using forecast uncertainty.
- Predict lead times using historical shipment data and external factors (e.g., port congestion, weather).
- Design a multi‑echelon inventory system that optimizes stock across warehouses and retail locations.
- Integrate forecasting and optimization into a production pipeline with monitoring and alerting.

---

## **80.1 Introduction to Supply Chain Optimization**

Supply chain optimization aims to ensure that the right products are available at the right places, at the right times, and in the right quantities – while minimizing costs. It is a complex, multi‑faceted problem that involves:

- **Demand forecasting**: Predicting future customer demand at various levels (SKU, category, store, region).
- **Inventory optimization**: Determining optimal stock levels to meet demand while avoiding overstock and stockouts.
- **Lead time prediction**: Estimating how long it will take for orders to arrive from suppliers.
- **Replenishment planning**: Deciding when and how much to order.
- **Network design**: Structuring warehouses and distribution centers.

Machine learning plays a crucial role in modern supply chains. Accurate demand forecasts reduce waste and improve service levels. Predictive models for lead times help manage uncertainty. Optimization algorithms balance trade‑offs between holding costs and stockout costs.

In this chapter, we will build a simplified supply chain optimization system. We'll use synthetic data that mimics a retail chain with multiple products, warehouses, and stores. We'll integrate demand forecasting (building on Chapter 75), lead time prediction, and inventory optimization into a cohesive pipeline.

---

## **80.2 Supply Chain Data Characteristics**

Supply chain data typically includes:

- **Sales data**: SKU, store, date, quantity sold, price, promotion flag.
- **Inventory data**: Current stock levels, reorder points, safety stock, stockouts.
- **Order data**: Purchase orders to suppliers, order dates, expected delivery dates, actual delivery dates.
- **Shipment data**: Shipment ID, origin, destination, departure date, arrival date, carrier, mode (air, sea, land).
- **Supplier data**: Supplier lead times, reliability, minimum order quantities.
- **External data**: Holidays, weather, economic indicators, port congestion indices.

For our example, we'll generate synthetic data for a single warehouse supplying multiple stores with a few products.

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import random

def generate_supply_chain_data(num_stores=5, num_products=3, days=365*2, seed=42):
    """
    Generate synthetic supply chain data.
    
    Returns
    -------
    sales_df : pd.DataFrame
        Daily sales per store and product, with a promotion flag.
    orders_df : pd.DataFrame
        Purchase orders to suppliers, with expected and actual delivery dates.
    shipments_df : pd.DataFrame
        Shipments from suppliers to the warehouse (one per order).
    """
    np.random.seed(seed)
    random.seed(seed)
    
    start_date = datetime(2022, 1, 1)
    dates = [start_date + timedelta(days=i) for i in range(days)]
    
    # --- Sales data ---
    sales_records = []
    for store in range(1, num_stores+1):
        # Store-specific base demand
        store_base = np.random.uniform(50, 200)
        for product in range(1, num_products+1):
            product_mult = np.random.uniform(0.5, 2.0)
            # Weekly seasonality
            weekly_pattern = np.array([0.8, 0.7, 0.9, 1.0, 1.2, 1.5, 1.1])
            # Promotion effect
            promo_days = set(random.sample(range(days), int(days*0.1)))  # 10% promotion days (set for fast lookup)
            
            for i, date in enumerate(dates):
                dow = date.weekday()
                base = store_base * product_mult * weekly_pattern[dow]
                # Promotion
                promo = 1 if i in promo_days else 0
                promo_mult = 1.5 if promo else 1.0
                # Random noise
                noise = np.random.normal(1, 0.1)
                sales = base * promo_mult * noise
                sales = max(0, int(sales))
                
                sales_records.append({
                    'date': date,
                    'store_id': store,
                    'product_id': product,
                    'sales': sales,
                    'promotion': promo
                })
    
    sales_df = pd.DataFrame(sales_records)
    
    # --- Inventory data (warehouse level) ---
    # Inventory levels are not generated here: they are implied by sales (outflow)
    # and order deliveries (inflow) and can be derived from those two tables.
    
    # --- Orders data (purchase orders to suppliers) ---
    orders_records = []
    # Each product has a supplier with a lead time distribution
    supplier_lead_time = {p: np.random.uniform(5, 15) for p in range(1, num_products+1)}  # mean lead time
    order_quantity = {p: np.random.randint(500, 2000) for p in range(1, num_products+1)}
    
    # Generate orders every few weeks
    for product in range(1, num_products+1):
        order_interval = np.random.randint(20, 40)  # days between orders
        for i in range(0, days, order_interval):
            order_date = dates[i]
            # Lead time (days), stochastic, floored at 1 day
            lead_time = max(1, int(np.random.normal(supplier_lead_time[product], 2)))
            expected_delivery = order_date + timedelta(days=lead_time)
            # Actual delivery may vary, but never arrives before the day after the order
            actual_delay = np.random.normal(0, 2)  # days
            actual_delivery = max(expected_delivery + timedelta(days=int(actual_delay)),
                                  order_date + timedelta(days=1))
            orders_records.append({
                'order_date': order_date,
                'product_id': product,
                'quantity': order_quantity[product],
                'expected_delivery': expected_delivery,
                'actual_delivery': actual_delivery,
                'supplier_id': np.random.randint(1, 4)
            })
    
    orders_df = pd.DataFrame(orders_records)
    
    # --- Shipments data (for lead time prediction) ---
    shipments_records = []
    for _, order in orders_df.iterrows():
        # Simulate a shipment per order
        shipment_id = f"SHIP_{len(shipments_records)+1}"
        shipments_records.append({
            'shipment_id': shipment_id,
            'order_date': order['order_date'],
            'product_id': order['product_id'],
            'supplier_id': order['supplier_id'],
            'departure_date': order['order_date'] + timedelta(days=np.random.randint(1, 3)),
            'arrival_date': order['actual_delivery'],
            'origin': f"Supplier_{order['supplier_id']}",
            'destination': 'Warehouse',
            'mode': np.random.choice(['air', 'sea', 'land'], p=[0.2, 0.3, 0.5])
        })
    
    shipments_df = pd.DataFrame(shipments_records)
    
    return sales_df, orders_df, shipments_df

# Generate data
sales_df, orders_df, shipments_df = generate_supply_chain_data()
print("Sales shape:", sales_df.shape)
print(sales_df.head())
print("\nOrders shape:", orders_df.shape)
print(orders_df.head())
print("\nShipments shape:", shipments_df.shape)
print(shipments_df.head())
```

**Explanation:**

- We generate three related DataFrames: sales (daily per store/product), orders (purchase orders to suppliers), and shipments (details of each shipment).
- Sales include promotion flags and weekly seasonality.
- Orders have stochastic lead times; actual delivery may deviate from expected.
- Shipments include mode and dates, which we will use for lead time prediction.
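
The generator does not produce an inventory table, but warehouse stock levels follow from the other two tables if we assume that the warehouse ships each day's store sales and receives each purchase order in full on its `actual_delivery` date. The sketch below derives that running balance under exactly those assumptions. The function name `derive_warehouse_inventory` and the `initial_stock` default are hypothetical; negative values mark days on which the warehouse would have run short.

```python
import pandas as pd


def derive_warehouse_inventory(sales_df, orders_df, initial_stock=5000):
    """Hypothetical helper: end-of-day warehouse stock per product.

    Assumptions (not part of the generator): the warehouse ships each day's
    total store sales, and each purchase order arrives in full on its
    actual_delivery date. Negative stock means demand exceeded supply.
    """
    # Outflow: total units sold across all stores, per day and product
    shipped = sales_df.groupby(['date', 'product_id'])['sales'].sum()
    # Inflow: units received from suppliers, per delivery day and product
    received = orders_df.groupby(['actual_delivery', 'product_id'])['quantity'].sum()
    received.index = received.index.set_names(['date', 'product_id'])

    # Align both flows on (date, product_id); a missing flow counts as zero
    flows = pd.concat({'received': received, 'shipped': shipped}, axis=1).fillna(0)
    flows = flows.sort_index()
    flows['net'] = flows['received'] - flows['shipped']
    # Running balance per product, accumulated in date order
    flows['stock'] = initial_stock + flows.groupby(level='product_id')['net'].cumsum()
    return flows.reset_index()
```

Calling `derive_warehouse_inventory(sales_df, orders_df)` returns one row per date and product with `received`, `shipped`, and `stock` columns. With the synthetic parameters above, order quantities are not sized to store demand, so the derived stock will likely fall far below zero; that is a useful reminder of why the ordering policy needs optimizing.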

---

## **80.3 Demand Forecasting**

Demand forecasting is the foundation of supply chain optimization. We need forecasts at the SKU‑store level and, aggregated, at the warehouse level. The features are similar to those used for retail forecasting (Chapter 75), with added emphasis on supply chain drivers such as promotions and external events.

We'll build the model with LightGBM, as in Chapter 75, and include **causal factors** as features: here, the promotion flag and its lags. Holiday or weather indicators, where available, can be joined in the same way.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb


class DemandForecaster:
    """
    Demand forecasting at SKU-store level.
    """
    
    def __init__(self, feature_columns, categorical_features=None):
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features if categorical_features else []
        self.model = None
    
    def prepare_data(self, df, target='sales'):
        X = df[self.feature_columns].copy()  # copy so the dtype casts below don't modify df
        y = df[target]
        for col in self.categorical_features:
            if col in X.columns:
                X[col] = X[col].astype('category')
        return X, y
    
    def train(self, X, y, params=None):
        if params is None:
            params = {
                'objective': 'regression',
                'metric': 'mae',
                'boosting_type': 'gbdt',
                'num_leaves': 31,
                'learning_rate': 0.05,
                'feature_fraction': 0.8,
                'bagging_fraction': 0.8,
                'bagging_freq': 5,
                'verbose': -1
            }
        train_data = lgb.Dataset(X, label=y, categorical_feature=self.categorical_features)
        self.model = lgb.train(params, train_data, num_boost_round=100)
        return self.model
    
    def predict(self, X):
        return self.model.predict(X)

# Feature engineering for demand
def engineer_demand_features(sales_df, add_lags=True, add_rolling=True):
    df = sales_df.copy()
    # Sort
    df = df.sort_values(['store_id', 'product_id', 'date'])
    
    # Time features
    df['date'] = pd.to_datetime(df['date'])
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df['day'] = df['date'].dt.day
    df['dayofweek'] = df['date'].dt.dayofweek
    df['weekend'] = (df['dayofweek'] >= 5).astype(int)
    
    # Cyclical encoding
    df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
    df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
    df['dayofweek_sin'] = np.sin(2 * np.pi * df['dayofweek'] / 7)
    df['dayofweek_cos'] = np.cos(2 * np.pi * df['dayofweek'] / 7)
    
    # Promotion (already in data)
    # Lag features
    if add_lags:
        for lag in [1, 7, 14, 28]:
            df[f'sales_lag_{lag}'] = df.groupby(['store_id', 'product_id'])['sales'].shift(lag)
            df[f'promo_lag_{lag}'] = df.groupby(['store_id', 'product_id'])['promotion'].shift(lag)
    
    # Rolling features
    if add_rolling:
        for window in [7, 14, 28]:
            df[f'sales_rolling_mean_{window}'] = df.groupby(['store_id', 'product_id'])['sales'].transform(
                lambda x: x.rolling(window, min_periods=1).mean().shift(1)
            )
            df[f'sales_rolling_std_{window}'] = df.groupby(['store_id', 'product_id'])['sales'].transform(
                lambda x: x.rolling(window, min_periods=1).std().shift(1)
            )
    
    # Drop rows with NaN
    df = df.dropna().reset_index(drop=True)
    
    return df
```

**Explanation:**

- The feature engineer adds time features, lagged sales and promotions, and rolling statistics.
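- Lags of 7, 14, and 28 days align with the weekly cycle (28 days also approximates a month). Lag 1 captures short‑term momentum but is only available when forecasting one day ahead; for a horizon of h days, use only lags of at least h (the same applies to the rolling statistics, which are shifted by one day).
- The case study in Section 80.8 trains a single global model across all store‑product series, with store and product as categorical features; training one model per store‑product pair is an alternative when series behave very differently.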
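
The section opens by noting that store‑level forecasts must also be rolled up to the warehouse. A minimal sketch of that step follows, assuming a table of store‑level forecasts with hypothetical columns `forecast_mean` and `forecast_std` per date, store, and product. Means add directly; standard deviations do not, so the sketch sums variances. That is only valid if forecast errors are independent across stores; correlated errors (for example, from a chain‑wide promotion) make the true warehouse uncertainty larger.

```python
import numpy as np
import pandas as pd


def aggregate_to_warehouse(store_forecasts):
    """Roll store-level daily forecasts up to warehouse level per product.

    Expects hypothetical columns: date, store_id, product_id, forecast_mean,
    forecast_std. Assumes forecast errors are independent across stores,
    so variances (not standard deviations) are summed.
    """
    out = store_forecasts.groupby(['date', 'product_id']).agg(
        forecast_mean=('forecast_mean', 'sum'),
        forecast_var=('forecast_std', lambda s: (s ** 2).sum()),
    )
    # Convert the summed variance back into a standard deviation
    out['forecast_std'] = np.sqrt(out.pop('forecast_var'))
    return out.reset_index()


stores = pd.DataFrame({
    'date': ['2024-01-01'] * 2, 'store_id': [1, 2], 'product_id': [1, 1],
    'forecast_mean': [100.0, 50.0], 'forecast_std': [3.0, 4.0],
})
print(aggregate_to_warehouse(stores))
# -> forecast_mean 150.0 and forecast_std 5.0 (= sqrt(3**2 + 4**2))
```

Because variances add while means add linearly, the warehouse's relative uncertainty is smaller than any single store's, which is the risk‑pooling argument for holding part of the safety stock centrally.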

---

## **80.4 Inventory Optimization**

Given demand forecasts, we need to determine optimal inventory levels. The classic approach is to set:

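- **Reorder point (ROP)** = expected demand during the lead time + safety stock.
- **Safety stock** = z × σ_LT, where z is the standard normal quantile for the target service level (z ≈ 1.645 for 95%) and σ_LT is the standard deviation of demand during the lead time. If daily demand has standard deviation σ_d and is independent across days, σ_LT = σ_d × √L for a lead time of L days.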

But lead time itself is uncertain. We can predict lead times (next section) and incorporate that uncertainty.

We'll implement a simple inventory optimization function that uses the forecast mean and standard deviation, under a normal approximation, to set reorder points.

```python
import numpy as np
from scipy.stats import norm


def calculate_reorder_point(forecast_mean, forecast_std, lead_time_days, service_level=0.95):
    """
    Calculate reorder point using forecast uncertainty.
    
    Parameters
    ----------
    forecast_mean : float
        Mean daily demand forecast.
    forecast_std : float
        Standard deviation of daily demand (forecast error).
    lead_time_days : float
        Expected lead time (may be point estimate or mean).
    service_level : float
        Desired probability of no stockout during lead time.
    
    Returns
    -------
    rop : float
        Reorder point (units).
    safety_stock : float
        Safety stock component.
    """
    z = norm.ppf(service_level)  # z-score for the target service level
    demand_during_lead_time = forecast_mean * lead_time_days
    # Daily errors assumed independent, so variance grows linearly with lead time
    std_during_lead_time = forecast_std * np.sqrt(lead_time_days)
    safety_stock = z * std_during_lead_time
    rop = demand_during_lead_time + safety_stock
    return rop, safety_stock

# Example usage
rop, ss = calculate_reorder_point(forecast_mean=100, forecast_std=20, lead_time_days=7, service_level=0.95)
print(f"Reorder point: {rop:.0f} units (safety stock: {ss:.0f})")
# Reorder point: 787 units (safety stock: 87)
```

**Explanation:**

- This formula assumes demand is normally distributed and independent across days. More sophisticated methods use empirical distributions or simulation.
- The forecast mean and standard deviation should come from the demand forecasting model: the mean from its point forecast, and the standard deviation from its residuals on held‑out data (or from quantile models).
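
`calculate_reorder_point` treats the lead time as a fixed number. When the lead time is itself random, with mean L and standard deviation σ_L, and independent of daily demand (mean d, standard deviation σ_d), a standard textbook extension uses σ_LT = √(L·σ_d² + d²·σ_L²). The sketch below implements that formula and, following the point above about simulation, checks it against a Monte Carlo estimate of lead‑time demand. Normal daily demand, a normal lead time rounded to whole days, and both function names are assumptions for illustration; the z‑score comes from the standard library's `statistics.NormalDist` rather than SciPy.

```python
import numpy as np
from statistics import NormalDist


def rop_with_lead_time_variability(d_mean, d_std, lt_mean, lt_std, service_level=0.95):
    """Reorder point when both daily demand and lead time are uncertain.

    Assumes demand is i.i.d. across days and independent of the lead time.
    """
    z = NormalDist().inv_cdf(service_level)
    sigma_lt = np.sqrt(lt_mean * d_std ** 2 + d_mean ** 2 * lt_std ** 2)
    safety_stock = z * sigma_lt
    return d_mean * lt_mean + safety_stock, safety_stock


def rop_by_simulation(d_mean, d_std, lt_mean, lt_std, service_level=0.95,
                      n_sims=20_000, seed=0):
    """Monte Carlo alternative: empirical quantile of simulated lead-time demand."""
    rng = np.random.default_rng(seed)
    # Lead times rounded to whole days and floored at 1 day
    lead_times = np.maximum(1, np.rint(rng.normal(lt_mean, lt_std, n_sims))).astype(int)
    # Total demand over each simulated lead time (negative daily draws clipped to 0)
    totals = np.array([
        np.clip(rng.normal(d_mean, d_std, lt), 0, None).sum() for lt in lead_times
    ])
    return np.quantile(totals, service_level)


rop, ss = rop_with_lead_time_variability(100, 20, lt_mean=7, lt_std=2)
print(f"Analytic ROP: {rop:.0f} (safety stock {ss:.0f})")
# Analytic ROP: 1040 (safety stock 340)
print(f"Simulated ROP: {rop_by_simulation(100, 20, 7, 2):.0f}")
```

Compared with the fixed‑lead‑time example (forecast_mean=100, forecast_std=20, lead_time_days=7), whose safety stock is about 87 units, adding a 2‑day lead‑time standard deviation roughly quadruples the safety stock: the d²·σ_L² term (40,000) dwarfs L·σ_d² (2,800). The simulated value should land close to the analytic one; a large gap would suggest the normal approximation does not fit the data.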

---

## **80.5 Lead Time Prediction**

Lead time – the time between placing an order and receiving it – is a key source of uncertainty. We can predict lead times using historical shipment data and features such as:

- **Supplier**: historical performance.
- **Mode**: air, sea, land.
- **Origin/destination**: distance, port congestion.
- **Time of year**: weather, holidays.
- **Order characteristics**: quantity, product type.

We'll build a regression model to predict lead time (days).

```python
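def engineer_lead_time_features(shipments_df, seed=42):
    df = shipments_df.copy()
    df['order_date'] = pd.to_datetime(df['order_date'])
    df['departure_date'] = pd.to_datetime(df['departure_date'])
    df['arrival_date'] = pd.to_datetime(df['arrival_date'])
    
    # Actual lead time (target): days from order placement to arrival
    df['lead_time'] = (df['arrival_date'] - df['order_date']).dt.days
    
    # Drop rows with missing target (if any)
    df = df.dropna(subset=['lead_time'])
    
    # Calendar features
    df['month'] = df['order_date'].dt.month
    df['dayofweek'] = df['order_date'].dt.dayofweek
    
    # Distance proxy: a random placeholder that carries no signal in this synthetic
    # data; replace it with the real origin-destination distance if available
    rng = np.random.default_rng(seed)
    df['distance_km'] = rng.integers(100, 1000, len(df))
    
    # Historical supplier performance: mean lead time of the supplier's *earlier*
    # orders only, so the feature does not leak the target (a stricter version would
    # use only shipments that had already arrived by the order date)
    df = df.sort_values('order_date', kind='stable').reset_index(drop=True)
    df['supplier_mean_lead_time'] = df.groupby('supplier_id')['lead_time'].transform(
        lambda s: s.shift(1).expanding().mean()
    )
    # A supplier's first order has no history: fall back to the mean of all earlier orders
    df['supplier_mean_lead_time'] = df['supplier_mean_lead_time'].fillna(
        df['lead_time'].shift(1).expanding().mean()
    )
    df = df.dropna(subset=['supplier_mean_lead_time'])  # the very first order has no history
    
    # Categorical features (product is included because lead time varies by product)
    for col in ['mode', 'supplier_id', 'product_id']:
        df[col] = df[col].astype('category')
    
    feature_cols = ['month', 'dayofweek', 'mode', 'supplier_id', 'product_id',
                    'distance_km', 'supplier_mean_lead_time']
    return df, feature_cols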

# Example
# shipments_feat, feat_cols = engineer_lead_time_features(shipments_df)
# from sklearn.ensemble import RandomForestRegressor
# X = pd.get_dummies(shipments_feat[feat_cols], drop_first=True)
# y = shipments_feat['lead_time']
# model = RandomForestRegressor(n_estimators=100, random_state=42)
# model.fit(X, y)
# predictions = model.predict(X)
```

**Explanation:**

- Lead time is the target variable.
- Features include calendar, mode, supplier, and a simple distance proxy.
- Supplier mean lead time captures historical reliability.
- The model can then be used to predict lead times for new orders.
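
A point prediction of lead time says little about how late an order might be. One simple way to attach an upper bound, in the spirit of split conformal prediction, is to measure the model's residuals on a holdout set and add their empirical quantile to new predictions. The sketch below assumes holdout residuals are representative of future ones; the function name and the 90% coverage default are illustrative choices, not part of the chapter's pipeline.

```python
import numpy as np


def lead_time_upper_bound(new_preds, holdout_actual, holdout_pred, coverage=0.9):
    """Shift point lead-time predictions up by an empirical residual quantile.

    If holdout residuals resemble future residuals, the returned values should
    exceed the true lead time roughly `coverage` of the time.
    """
    residuals = np.asarray(holdout_actual, dtype=float) - np.asarray(holdout_pred, dtype=float)
    margin = np.quantile(residuals, coverage)
    return np.asarray(new_preds, dtype=float) + margin
```

In the case study of Section 80.8, `y_test_lt` and `y_pred_lt` could serve as the holdout pair. The resulting bound can be used as a conservative lead time when computing reorder points for suppliers with erratic delivery.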

---

## **80.6 Multi‑Echelon Inventory Optimization**

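In a multi‑echelon system (e.g., warehouse → stores), inventory decisions at different levels interact. Optimizing each level independently can lead to suboptimal results, such as safety stock duplicated at every level or the bullwhip effect, in which order variability grows as it moves upstream.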

A common approach is to use **distribution requirements planning (DRP)** or **multi‑echelon inventory optimization (MEIO)** algorithms. These consider:

- Demand at the lowest level (stores).
- Lead times between echelons.
- Service level targets.
- Cost trade‑offs.
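
The cost trade‑off in the last bullet is easiest to see at a single location, where the classical economic order quantity (EOQ) balances a fixed cost per order against the cost of holding stock. The sketch below uses the standard formula Q* = √(2DS/H) under its usual assumptions (constant, known demand; fixed ordering cost; linear holding cost). The example numbers are made up; in a multi‑echelon network EOQ is at best a per‑node starting point for the `order_quantity` values that the code in this chapter takes as given.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost_per_unit_year):
    """Classical EOQ: order size minimizing ordering + holding cost per year.

    annual_demand: units per year (D); order_cost: fixed cost per order (S);
    holding_cost_per_unit_year: cost to hold one unit for a year (H).
    """
    return math.sqrt(2 * annual_demand * order_cost / holding_cost_per_unit_year)


def annual_cost(q, annual_demand, order_cost, holding_cost_per_unit_year):
    """Ordering cost (D/Q orders x S) plus holding cost (average stock Q/2 x H)."""
    return annual_demand / q * order_cost + q / 2 * holding_cost_per_unit_year


q_star = economic_order_quantity(10_000, 50, 4)
print(q_star)  # 500.0
```

At the optimum, ordering and holding costs are equal (1,000 each in this example); ordering more often or less often raises the total.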

We'll implement a simplified two‑echelon system (warehouse and stores) using a periodic review policy.

```python
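class MultiEchelonInventory:
    """
    Simplified two‑echelon inventory policy (warehouse -> stores).
    """
    
    def __init__(self, warehouse, stores, lead_time_wh_to_store):
        # warehouse: dict with 'stock', 'reorder_point', 'order_quantity',
        #            and 'supplier_lead_time' (days, e.g. from the lead time model)
        self.warehouse = warehouse
        # stores: list of dicts with 'id', 'stock', 'reorder_point', 'order_quantity'
        self.stores = stores
        self.lead_time_wh_to_store = lead_time_wh_to_store  # days
        # Track units ordered but not yet received, so a node does not reorder
        # at every review while a shipment is still in transit
        self.warehouse.setdefault('on_order', 0)
        for store in self.stores:
            store.setdefault('on_order', 0)
    
    def review(self, daily_demand_forecast=None):
        """
        Run one periodic review (e.g., daily) and return replenishment actions.
        daily_demand_forecast: dict store_id -> forecast; unused by this fixed
        reorder-point rule, but a forecast-driven policy would use it here.
        """
        actions = []
        # Check each store: compare inventory position (on hand + on order) to its ROP
        for store in self.stores:
            if store['stock'] + store['on_order'] <= store['reorder_point']:
                # Ship only what the warehouse actually has on hand
                qty = min(store['order_quantity'], self.warehouse['stock'])
                if qty > 0:
                    self.warehouse['stock'] -= qty
                    store['on_order'] += qty
                    actions.append({
                        'type': 'store_order',
                        'store_id': store['id'],
                        'quantity': qty,
                        'delivery_in': self.lead_time_wh_to_store
                    })
        
        # Check warehouse (after this review's store shipments have been deducted)
        if self.warehouse['stock'] + self.warehouse['on_order'] <= self.warehouse['reorder_point']:
            qty = self.warehouse['order_quantity']
            self.warehouse['on_order'] += qty
            actions.append({
                'type': 'warehouse_order',
                'quantity': qty,
                'lead_time': self.warehouse['supplier_lead_time']  # from lead time model
            })
        
        return actions
    
    def receive(self, quantity, store_id=None):
        """Record an arrival at a store (if store_id is given) or at the warehouse."""
        node = self.warehouse if store_id is None else next(
            s for s in self.stores if s['id'] == store_id)
        node['stock'] += quantity
        node['on_order'] -= quantity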
```

**Explanation:**

- This class simulates a simple inventory policy: when stock at a store falls below reorder point, an order is placed to the warehouse.
- The warehouse itself orders from a supplier when its stock is low.
- In practice, you would use optimization algorithms to set reorder points and order quantities based on demand forecasts and service level targets.
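
The bullwhip effect mentioned at the start of this section can be reproduced with a policy much like the one above. The simulation below assumes three stores facing independent Poisson daily demand, each following a (reorder point, order quantity) rule with replenishment arriving before the next day; all parameters are made up. It compares the day‑to‑day variance of customer demand with the variance of the orders the warehouse receives.

```python
import numpy as np


def simulate_bullwhip(n_stores=3, days=365, mean_demand=20, reorder_point=40,
                      order_quantity=100, seed=0):
    """Return (variance of total daily store demand, variance of total daily store orders)."""
    rng = np.random.default_rng(seed)
    stock = np.full(n_stores, reorder_point + order_quantity, dtype=float)
    demand_totals, order_totals = [], []
    for _ in range(days):
        demand = rng.poisson(mean_demand, n_stores)
        stock -= demand  # unmet demand is backordered, so stock may go negative
        # Each store orders a fixed batch when stock falls to or below its reorder point;
        # replenishment is assumed to arrive before the next day
        orders = np.where(stock <= reorder_point, order_quantity, 0)
        stock += orders
        demand_totals.append(demand.sum())
        order_totals.append(orders.sum())
    return np.var(demand_totals), np.var(order_totals)


demand_var, order_var = simulate_bullwhip()
print(f"Demand variance: {demand_var:.0f}, warehouse order variance: {order_var:.0f}")
```

Average volumes match (each store orders 100 units roughly every five days to cover 20 units a day), yet the warehouse sees lumpy orders whose variance is many times the variance of customer demand. Order batching is only one of several causes of the bullwhip effect; lead times and forecast updating at each level amplify it further.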

---

## **80.7 Integration and Deployment**

A production supply chain optimization system typically runs daily, performing:

1. **Demand forecasting** for all SKU‑store combinations (or aggregated).
2. **Lead time prediction** for outstanding orders.
3. **Inventory optimization** to generate replenishment recommendations.
4. **Publish recommendations** to a dashboard or ERP system.

We can use a batch pipeline orchestrated with Airflow, similar to earlier chapters.

```python
class SupplyChainOptimizer:
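    def __init__(self, demand_model, lead_time_model, inventory_config):
        self.demand_model = demand_model
        self.lead_time_model = lead_time_model
        # inventory_config keys used below: 'service_level', 'default_lead_time',
        # and 'order_quantity' (dict: sku -> units)
        self.inventory_config = inventory_config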
    
    def run_daily(self, sales_history, open_orders, current_inventory):
        """
        Main daily optimization routine.
        """
        # 1. Demand forecast for next N days
        demand_forecasts = self.forecast_demand(sales_history)
        
        # 2. Lead time predictions for open orders
        lead_time_preds = self.predict_lead_times(open_orders)
        
        # 3. Inventory optimization
        reorder_recommendations = self.optimize_inventory(
            current_inventory, demand_forecasts, lead_time_preds
        )
        
        return reorder_recommendations
    
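    def forecast_demand(self, sales_history):
        """Return {sku: {'mean': daily demand, 'std': daily forecast error}}.

        Placeholder: apply feature engineering and self.demand_model here.
        """
        raise NotImplementedError
    
    def predict_lead_times(self, open_orders):
        """Return {sku: predicted lead time in days}.

        Placeholder: apply feature engineering and self.lead_time_model here.
        """
        raise NotImplementedError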
    
    def optimize_inventory(self, current_inventory, forecasts, lead_times):
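        # Apply reorder point logic per SKU. Note: this compares on-hand stock only;
        # a production version would use inventory position (on hand + open orders)
        # so that SKUs with an order already in transit are not reordered.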
        recommendations = []
        for sku, stock in current_inventory.items():
            forecast_mean = forecasts[sku]['mean']
            forecast_std = forecasts[sku]['std']
            lead_time = lead_times.get(sku, self.inventory_config['default_lead_time'])
            rop, ss = calculate_reorder_point(forecast_mean, forecast_std, lead_time, 
                                               service_level=self.inventory_config['service_level'])
            if stock <= rop:
                recommendations.append({
                    'sku': sku,
                    'action': 'order',
                    'quantity': self.inventory_config['order_quantity'][sku],
                    'priority': 'high' if stock < ss else 'medium'
                })
        return recommendations
```

**Explanation:**

- The optimizer integrates the three components.
- It uses demand forecasts (including uncertainty) and lead time predictions to compute reorder points.
- Recommendations can be sent to procurement systems or reviewed by planners.
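
The learning objectives also call for monitoring, which the optimizer above leaves out. One minimal check that a daily job could run once actual sales arrive is sketched below: compare each SKU's recent mean absolute error with the MAE measured at validation time, and flag SKUs whose accuracy has degraded. The column names, the 28‑day window, and the 1.25× tolerance are assumptions, not settings from this chapter.

```python
import pandas as pd


def forecast_drift_alerts(history, baseline_mae, window_days=28, tolerance=1.25):
    """Hypothetical monitoring check for forecast degradation.

    history: DataFrame with columns date, sku, actual, forecast.
    baseline_mae: MAE measured on the validation set when the model was trained.
    Returns the recent MAE of SKUs exceeding tolerance * baseline_mae, worst first.
    """
    cutoff = history['date'].max() - pd.Timedelta(days=window_days - 1)
    recent = history[history['date'] >= cutoff]
    abs_err = (recent['actual'] - recent['forecast']).abs()
    recent_mae = abs_err.groupby(recent['sku']).mean()
    alerts = recent_mae[recent_mae > tolerance * baseline_mae]
    return alerts.sort_values(ascending=False)
```

A non‑empty result could be posted to the planners' dashboard or a notification channel, and repeated alerts are a natural trigger for the periodic retraining discussed in Section 80.9.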

---

## **80.8 Case Study: Optimizing a Retail Supply Chain**

Let's run a simplified end‑to‑end example with synthetic data.

```python
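import pandas as pd
import lightgbm as lgb  # used by DemandForecaster.train
from sklearn.metrics import mean_absolute_error

# Generate data
sales_df, orders_df, shipments_df = generate_supply_chain_data(num_stores=3, num_products=2, days=365*2)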

# --- Demand forecasting ---
# Engineer features
demand_feat = engineer_demand_features(sales_df)

# Split (last 60 days for test)
split_date = demand_feat['date'].max() - pd.Timedelta(days=60)
train_demand = demand_feat[demand_feat['date'] < split_date]
test_demand = demand_feat[demand_feat['date'] >= split_date]

# Categorical features
cat_cols = ['store_id', 'product_id', 'month', 'dayofweek', 'promotion']

# Prepare data
feature_cols = [c for c in train_demand.columns if c not in ['date', 'sales'] and c not in cat_cols] + cat_cols
X_train, y_train = train_demand[feature_cols], train_demand['sales']
X_test, y_test = test_demand[feature_cols], test_demand['sales']

# Train model
forecaster = DemandForecaster(feature_cols, categorical_features=cat_cols)
forecaster.train(X_train, y_train)

# Predict on test
y_pred = forecaster.predict(X_test)
test_mae = mean_absolute_error(y_test, y_pred)
print(f"Demand forecast MAE: {test_mae:.2f}")

# --- Lead time prediction ---
shipments_feat, lt_feat_cols = engineer_lead_time_features(shipments_df)

# Time-based split: orders before the 80th percentile of order dates form the training set
split_date_lt = shipments_feat['order_date'].quantile(0.8)
train_lt = shipments_feat[shipments_feat['order_date'] < split_date_lt]
test_lt = shipments_feat[shipments_feat['order_date'] >= split_date_lt]

# One-hot encode categoricals; reindex so the test columns match the training columns
X_train_lt = pd.get_dummies(train_lt[lt_feat_cols], drop_first=True)
y_train_lt = train_lt['lead_time']
X_test_lt = pd.get_dummies(test_lt[lt_feat_cols], drop_first=True).reindex(
    columns=X_train_lt.columns, fill_value=0)
y_test_lt = test_lt['lead_time']

from sklearn.ensemble import RandomForestRegressor
lt_model = RandomForestRegressor(n_estimators=100, random_state=42)
lt_model.fit(X_train_lt, y_train_lt)
y_pred_lt = lt_model.predict(X_test_lt)
lt_mae = mean_absolute_error(y_test_lt, y_pred_lt)
print(f"Lead time MAE: {lt_mae:.2f} days")

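# --- Inventory optimization (one store-product pair) ---
# Daily forecast mean and forecast-error std for store 1, product 1, taken from the
# demand model's test-period predictions and residuals
mask = ((test_demand['store_id'] == 1) & (test_demand['product_id'] == 1)).to_numpy()
forecast_mean = y_pred[mask].mean()
forecast_std = (y_test.to_numpy()[mask] - y_pred[mask]).std()

# Predict the supplier lead time for a hypothetical new order. Categorical columns reuse
# the training categories so the one-hot columns line up with X_train_lt.
new_order = pd.DataFrame([{
    'month': 6, 'dayofweek': 2, 'mode': 'land', 'supplier_id': 1, 'product_id': 1,
    'distance_km': 500,
    'supplier_mean_lead_time': train_lt.loc[train_lt['supplier_id'] == 1, 'lead_time'].mean(),
}])
for col in lt_feat_cols:
    if isinstance(shipments_feat[col].dtype, pd.CategoricalDtype):
        new_order[col] = pd.Categorical(new_order[col], categories=shipments_feat[col].cat.categories)
X_new = pd.get_dummies(new_order[lt_feat_cols], drop_first=True).reindex(
    columns=X_train_lt.columns, fill_value=0)
lead_time = lt_model.predict(X_new)[0]

rop, ss = calculate_reorder_point(forecast_mean, forecast_std, lead_time, service_level=0.95)
print(f"For product 1 at store 1: reorder point = {rop:.0f}, safety stock = {ss:.0f}")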
```

**Explanation:**

- We train a demand model and a lead time model.
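- Using their outputs, we compute a reorder point for a specific SKU‑store. For simplicity, the store is treated as if it were replenished directly by the supplier, so the predicted supplier lead time stands in for the warehouse‑to‑store lead time.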
- In a real system, this would be automated for all SKUs.
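
A reorder point is only as good as the service it actually delivers. Before automating it across SKUs, one simple check is to replay a demand history through a (reorder point, order quantity) policy and count stockouts. The sketch below makes simplifying assumptions that are not part of the chapter's pipeline: a fixed integer lead time, lost sales when stock runs out, and at most one outstanding order at a time. The function name and the returned metric names are hypothetical.

```python
import numpy as np


def backtest_reorder_policy(daily_demand, reorder_point, order_quantity,
                            lead_time_days, initial_stock):
    """Replay demand through an (ROP, Q) policy with lost sales.

    Returns the fill rate (share of demand served) and the number of stockout days.
    At most one order is outstanding; it arrives lead_time_days after it is placed.
    """
    stock = float(initial_stock)
    arrival_day = None  # day the outstanding order arrives (None = nothing on order)
    served = stockout_days = 0
    for day, demand in enumerate(daily_demand):
        if arrival_day == day:  # receive the outstanding order at the start of the day
            stock += order_quantity
            arrival_day = None
        sold = min(stock, demand)
        served += sold
        stock -= sold
        if sold < demand:
            stockout_days += 1
        # Reorder at the end of the day if stock is at or below ROP and nothing is on order
        if stock <= reorder_point and arrival_day is None:
            arrival_day = day + lead_time_days
    total = float(np.sum(daily_demand))
    return {'fill_rate': served / total if total else 1.0, 'stockout_days': stockout_days}
```

For example, replaying the test‑period sales of store 1, product 1 with the computed `rop` gives a first check on the 95% target. Note that the fill rate measured here is not the same metric as the cycle service level that `calculate_reorder_point` targets (the probability of no stockout per replenishment cycle); a policy can meet one and miss the other.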

---

## **80.9 Lessons Learned from Supply Chain Optimization**

1. **Forecast uncertainty is crucial**: Point forecasts alone are insufficient for inventory decisions. Quantiles or prediction intervals are needed to set safety stock.
2. **Lead time variability matters**: Treating lead time as fixed understates the uncertainty in lead‑time demand, and therefore the safety stock, which leads to stockouts. Predicting lead times, and their spread, improves inventory decisions.
3. **Multi‑echelon effects**: Optimizing each level independently can duplicate safety stock and amplify the bullwhip effect. Consider system‑wide optimization.
4. **Data integration is challenging**: Supply chain data often resides in multiple systems (ERP, WMS, TMS). Building a unified data pipeline is a major effort.
5. **Model retraining frequency**: Demand patterns and lead times change; retrain models periodically (e.g., monthly).
6. **Collaboration with planners**: AI recommendations should augment, not replace, human planners. Provide explanations and allow overrides.

---

## **80.10 Future Directions**

- **Reinforcement learning for inventory control**: Learn optimal policies through simulation.
- **End‑to‑end deep learning**: Models that jointly forecast demand and optimize inventory.
- **Incorporating sustainability**: Optimize for carbon footprint as well as cost.
- **Real‑time replenishment**: Using IoT and point‑of‑sale data to trigger immediate orders.
- **Resilience planning**: Simulating disruptions (e.g., port closures) to build robust supply chains.

---

## **Chapter Summary**

In this chapter, we built a supply chain optimization system that integrates demand forecasting, lead time prediction, and inventory optimization. We generated synthetic data, engineered features, trained predictive models, and demonstrated how to compute reorder points using forecast uncertainty. We discussed multi‑echelon considerations and deployment. The principles and code patterns extend the time‑series forecasting pipeline developed in earlier chapters to a new, impactful domain.

This concludes Part X on case studies. The next chapters will cover advanced implementation patterns and industry best practices.

---

**End of Chapter 80**