# AI Demand Forecasting and Inventory Optimization
### Interactive Notebook for Learning and Experimentation

This notebook provides an interactive way to explore demand forecasting and inventory optimization.

**Learning Objectives:**
- Understand time series forecasting
- Compare multiple ML models
- Learn inventory optimization concepts
- Visualize business insights

## 📦 Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from scipy import stats

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✅ All libraries imported successfully!")

## 📊 Step 2: Load and Explore the Dataset

In [None]:
# Load the dataset
df = pd.read_csv('retail_sales_data.csv')
df['date'] = pd.to_datetime(df['date'])

print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nDate range: {df['date'].min()} to {df['date'].max()}")
print(f"\nProducts: {df['product'].unique()}")

df.head(10)
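
The lag features built later in this notebook use row shifts, so they implicitly assume one row per product per day with no gaps. A minimal sketch of a gap check follows; the `demo` frame and the `gaps` dictionary are hypothetical stand-ins, not part of `retail_sales_data.csv`.

```python
import pandas as pd

# Hypothetical frame with one missing day (2024-01-03) for Product_A
demo = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04']),
    'product': ['Product_A'] * 3,
    'demand': [10, 12, 9],
})

gaps = {}
for product, group in demo.groupby('product'):
    # Every calendar day between the first and last observation
    expected = pd.date_range(group['date'].min(), group['date'].max(), freq='D')
    gaps[product] = expected.difference(pd.DatetimeIndex(group['date']))
    n_dupes = group['date'].duplicated().sum()
    print(f"{product}: {len(gaps[product])} missing day(s), {n_dupes} duplicate date(s)")
```

If gaps show up in real data, one option is to reindex each product to a daily calendar before creating lags, so that `shift(7)` really means "seven days ago".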

## 📈 Step 3: Exploratory Data Analysis

In [None]:
# Summary statistics by product
print("Demand Statistics by Product:")
df.groupby('product')['demand'].agg(['mean', 'std', 'min', 'max', 'count'])

In [None]:
# Visualize demand over time for all products
products = df['product'].unique()
n_rows = int(np.ceil(len(products) / 2))
fig, axes = plt.subplots(n_rows, 2, figsize=(15, 4 * n_rows), squeeze=False)
axes = axes.flatten()

for idx, product in enumerate(products):
    product_data = df[df['product'] == product]
    axes[idx].plot(product_data['date'], product_data['demand'], alpha=0.7)
    axes[idx].set_title(f'{product} - Demand Over Time')
    axes[idx].set_xlabel('Date')
    axes[idx].set_ylabel('Demand')
    axes[idx].grid(True, alpha=0.3)
    axes[idx].tick_params(axis='x', rotation=45)

# Hide any unused subplots (only present when the product count is odd)
for ax in axes[len(products):]:
    ax.axis('off')
plt.tight_layout()
plt.show()

In [None]:
# Analyze patterns
print("📊 Demand Patterns Analysis\n")

# Weekend vs Weekday
print("Weekend vs Weekday Demand:")
print(df.groupby('is_weekend')['demand'].mean())

print("\nPromotion Impact:")
print(df.groupby('is_promotion')['demand'].mean())

print("\nMonthly Demand Pattern:")
print(df.groupby('month')['demand'].mean().sort_values(ascending=False))

## 🤖 Step 4: Feature Engineering

Create lag features and rolling statistics for better forecasting.

In [None]:
def create_features(df, product_name):
    """Create time series features"""
    product_df = df[df['product'] == product_name].copy()
    product_df = product_df.sort_values('date').reset_index(drop=True)
    
    # Lag features
    for lag in [1, 7, 14, 30]:
        product_df[f'lag_{lag}'] = product_df['demand'].shift(lag)
    
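    # Rolling statistics over past days only: shift by 1 so the current
    # day's demand (the target) does not leak into its own features
    past_demand = product_df['demand'].shift(1)
    for window in [7, 14, 30]:
        product_df[f'rolling_mean_{window}'] = past_demand.rolling(window=window).mean()
        product_df[f'rolling_std_{window}'] = past_demand.rolling(window=window).std()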
    
    # Date features
    product_df['year'] = product_df['date'].dt.year
    product_df['day_of_year'] = product_df['date'].dt.dayofyear
    
    # Drop NaN values
    product_df = product_df.dropna()
    
    return product_df

# Try it on Product_A
product_df = create_features(df, 'Product_A')
print(f"Features created! Shape: {product_df.shape}")
print(f"\nFeature columns:")
print(product_df.columns.tolist())
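
A pandas rolling window includes the current row by default, so a rolling mean of `demand` computed on the same day carries part of the target into the features. A minimal sketch on a made-up series (the `toy` frame is hypothetical) shows the difference once the series is shifted by one step first:

```python
import pandas as pd

# Hypothetical toy series, not taken from retail_sales_data.csv
toy = pd.DataFrame({'demand': [10, 12, 11, 15, 14, 13]})

# Window that includes the current row: the target leaks into the feature
toy['rolling_incl_today'] = toy['demand'].rolling(window=3).mean()

# Window over the previous 3 rows only: usable as a predictor
toy['rolling_past_only'] = toy['demand'].shift(1).rolling(window=3).mean()

print(toy)
```

The past-only column has one extra leading `NaN`, which `dropna()` removes along with the other warm-up rows.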

## 🎯 Step 5: Train Forecasting Models

**Try This:** Change the product name below to analyze different products!

In [None]:
# Choose a product to analyze
PRODUCT = 'Product_A'  # 👈 CHANGE THIS TO TRY DIFFERENT PRODUCTS

# Prepare data
product_df = create_features(df, PRODUCT)
feature_cols = [col for col in product_df.columns 
               if col not in ['date', 'product', 'demand', 'price']]

X = product_df[feature_cols]
y = product_df['demand']

# Train-test split (80-20)
split_idx = int(len(X) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]
dates_test = product_df['date'].iloc[split_idx:].values

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"\n✅ Data prepared for training!")
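
A single chronological 80/20 split yields one estimate of test error. One possible complement, sketched here under assumptions, is expanding-window cross-validation with scikit-learn's `TimeSeriesSplit`. The 20-row array `X_demo` is a hypothetical stand-in for the real feature matrix.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the feature matrix: 20 time-ordered rows
X_demo = np.arange(20).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
folds = list(tscv.split(X_demo))

for fold, (train_idx, test_idx) in enumerate(folds, start=1):
    # Training rows always come before test rows, so no future data is used
    print(f"Fold {fold}: train rows {train_idx[0]}-{train_idx[-1]}, "
          f"test rows {test_idx[0]}-{test_idx[-1]}")
```

Averaging a metric over the folds gives a steadier picture than one split, at the cost of fitting each model several times.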

In [None]:
# Train models
models = {
    'Random Forest': RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, max_depth=5, random_state=42),
    'Linear Regression': LinearRegression()
}

results = {}

for model_name, model in models.items():
    print(f"\nTraining {model_name}...")
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    
    # Calculate metrics
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)
    mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100
    
    results[model_name] = {
        'predictions': y_pred,
        'MAE': mae,
        'RMSE': rmse,
        'R2': r2,
        'MAPE': mape
    }
    
    print(f"  MAE: {mae:.2f} | RMSE: {rmse:.2f} | R²: {r2:.4f} | MAPE: {mape:.2f}%")

print("\n✅ All models trained!")
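
Model metrics are easier to judge against a simple baseline. The sketch below is not part of the original pipeline. It scores a naive forecast (yesterday's value) and a seasonal-naive forecast (the value from 7 days earlier) on synthetic demand with a weekly cycle; the series and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical daily demand with a weekly cycle (synthetic, illustration only)
rng = np.random.default_rng(42)
days = np.arange(140)
demand = 100 + 20 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 5, size=days.size)

# Chronological 80/20 split, as in the cell above
split = int(len(demand) * 0.8)
y_true = demand[split:]

# Seasonal-naive forecast: repeat the value observed 7 days earlier
seasonal_naive = demand[split - 7:-7]
# Naive forecast: repeat yesterday's value
naive = demand[split - 1:-1]

mae_seasonal = mean_absolute_error(y_true, seasonal_naive)
mae_naive = mean_absolute_error(y_true, naive)
print(f"Seasonal-naive MAE: {mae_seasonal:.2f}")
print(f"Naive MAE: {mae_naive:.2f}")
```

A trained model that cannot beat the seasonal-naive MAE on the same test window is probably not adding much.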

## 📊 Step 6: Compare Model Performance

In [None]:
# Create comparison dataframe
comparison_df = pd.DataFrame({
    'Model': list(results.keys()),
    'MAE': [results[m]['MAE'] for m in results.keys()],
    'RMSE': [results[m]['RMSE'] for m in results.keys()],
    'R²': [results[m]['R2'] for m in results.keys()],
    'MAPE': [results[m]['MAPE'] for m in results.keys()]
})

print("\n🏆 Model Performance Comparison:")
print(comparison_df.sort_values('MAPE'))

# Best model
best_model = comparison_df.loc[comparison_df['MAPE'].idxmin(), 'Model']
print(f"\n⭐ Best Model: {best_model}")

In [None]:
# Visualize model comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# MAPE comparison
axes[0].bar(comparison_df['Model'], comparison_df['MAPE'], color=['purple', 'teal', 'gold'])
axes[0].set_ylabel('MAPE (%)')
axes[0].set_title('Model Accuracy Comparison (Lower is Better)')
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(True, alpha=0.3, axis='y')

# R² comparison
axes[1].bar(comparison_df['Model'], comparison_df['R²'], color=['purple', 'teal', 'gold'])
axes[1].set_ylabel('R² Score')
axes[1].set_title('Model Fit Comparison (Higher is Better)')
axes[1].tick_params(axis='x', rotation=45)
axes[1].set_ylim([0, 1])
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 🎨 Step 7: Visualize Forecasts

In [None]:
# Plot actual vs predicted for best model
best_predictions = results[best_model]['predictions']

plt.figure(figsize=(15, 6))
plt.plot(dates_test, y_test.values, label='Actual Demand', 
         linewidth=2, marker='o', markersize=4, alpha=0.7)
plt.plot(dates_test, best_predictions, label=f'{best_model} Forecast', 
         linewidth=2, marker='s', markersize=4, alpha=0.7)

plt.xlabel('Date')
plt.ylabel('Demand')
plt.title(f'{PRODUCT}: Actual vs Predicted Demand ({best_model})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Approximate accuracy as 100 - MAPE (a rough heuristic; unreliable when MAPE is large)
accuracy = 100 - results[best_model]['MAPE']
print(f"\n✅ Forecast Accuracy: {accuracy:.2f}%")
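
The accuracy figure above is 100 minus MAPE, and MAPE divides by actual demand, so it becomes infinite if any test day has zero demand. One commonly used alternative is WAPE (total absolute error divided by total actual demand). The helper `wape` and the sample values below are hypothetical and not part of the notebook's pipeline.

```python
import numpy as np

def wape(y_true, y_pred):
    """Weighted absolute percentage error: total |error| over total |actual|, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)) * 100

# Hypothetical values including a zero-demand day, where MAPE would divide by zero
actual = [0, 10, 20, 30]
forecast = [2, 8, 25, 30]
print(f"WAPE: {wape(actual, forecast):.1f}%")
```

WAPE stays finite as long as total demand over the window is nonzero, which makes it a safer ranking metric for slow-moving products.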

## 📦 Step 8: Inventory Optimization

Calculate optimal inventory levels based on forecasts.

**Try This:** Adjust the service level below (0.90 = 90%, 0.95 = 95%, 0.99 = 99%)
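
In symbols, the next cell computes three textbook quantities. The notation here is an assumption introduced for readability and does not appear in the code:

$$
\text{SS} = z\,\sigma_e\sqrt{L}, \qquad
\text{ROP} = \bar{d}\,L + \text{SS}, \qquad
\text{EOQ} = \sqrt{\frac{2\,D\,S}{H}}
$$

Here $z = \Phi^{-1}(\text{service level})$ is the standard normal quantile, $\sigma_e$ is the standard deviation of the daily forecast errors, $L$ is the lead time in days, $\bar{d}$ is the mean daily forecast, $D$ is annualized demand, $S$ is the cost per order, and $H$ is the holding cost per unit per year. The safety stock formula assumes daily forecast errors are independent and roughly normal.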

In [None]:
# Configuration
LEAD_TIME = 7  # days 👈 CHANGE THIS
SERVICE_LEVEL = 0.95  # 95% 👈 CHANGE THIS

# Calculate forecast error
errors = y_test.values - best_predictions
error_std = np.std(errors)

# Average demand during lead time
avg_demand_lead_time = np.mean(best_predictions) * LEAD_TIME

# Safety stock
z_score = stats.norm.ppf(SERVICE_LEVEL)
safety_stock = z_score * error_std * np.sqrt(LEAD_TIME)

# Reorder point
reorder_point = avg_demand_lead_time + safety_stock

# Economic Order Quantity
annual_demand = np.sum(best_predictions) * (365 / len(best_predictions))
holding_cost = 2  # $ per unit per year
order_cost = 50  # $ per order
eoq = np.sqrt((2 * annual_demand * order_cost) / holding_cost)

print("\n📦 INVENTORY OPTIMIZATION RESULTS")
print("="*50)
print(f"Average Daily Demand: {np.mean(best_predictions):.0f} units")
print(f"Safety Stock: {safety_stock:.0f} units")
print(f"Reorder Point: {reorder_point:.0f} units")
print(f"Economic Order Quantity: {eoq:.0f} units")
print(f"Service Level: {SERVICE_LEVEL*100:.0f}%")
print(f"Stockout Risk per Order Cycle: {(1-SERVICE_LEVEL)*100:.1f}%")
print("="*50)
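
The EOQ is the order quantity that minimizes annual ordering cost plus annual holding cost of cycle stock. A quick numerical check, sketched with a hypothetical `annual_demand` and the same `order_cost` and `holding_cost` defaults as the cell above, evaluates that cost on a grid of order quantities:

```python
import numpy as np

# Hypothetical annual demand; the two cost values mirror the notebook's defaults
annual_demand = 36_500   # units per year (assumed)
order_cost = 50          # $ per order
holding_cost = 2         # $ per unit per year

def total_cost(q):
    """Annual ordering cost plus annual holding cost of average cycle stock (q / 2)."""
    return (annual_demand / q) * order_cost + (q / 2) * holding_cost

eoq = np.sqrt(2 * annual_demand * order_cost / holding_cost)

# Evaluate the cost on a grid of order quantities around the EOQ
candidates = np.linspace(0.5 * eoq, 2 * eoq, 301)
best_q = candidates[np.argmin(total_cost(candidates))]

print(f"EOQ: {eoq:.0f} units, annual cost ${total_cost(eoq):.2f}")
print(f"Cheapest grid quantity: {best_q:.0f} units")
```

At the EOQ the ordering and holding components are equal, which is a handy sanity check when changing either cost.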

In [None]:
# Visualize inventory policy
metrics = ['Avg Daily\nDemand', 'Safety\nStock', 'Reorder\nPoint', 'EOQ']
values = [np.mean(best_predictions), safety_stock, reorder_point, eoq]

plt.figure(figsize=(10, 6))
bars = plt.bar(metrics, values, color=['#3498db', '#e74c3c', '#f39c12', '#2ecc71'], 
               edgecolor='black', linewidth=2)
plt.ylabel('Units')
plt.title(f'Inventory Policy for {PRODUCT}')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.0f}', ha='center', va='bottom', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.show()

## 💡 Step 9: Business Recommendations

In [None]:
print(f"\n🎯 ACTIONABLE RECOMMENDATIONS FOR {PRODUCT}")
print("="*60)
print(f"\n1. FORECASTING:")
print(f"   • Use {best_model} for demand prediction")
print(f"   • Expected forecast accuracy: {100-results[best_model]['MAPE']:.1f}%")
print(f"   • Update forecasts weekly for best results")

print(f"\n2. INVENTORY POLICY:")
print(f"   • Maintain {safety_stock:.0f} units as safety stock")
print(f"   • Place order when inventory reaches {reorder_point:.0f} units")
print(f"   • Order {eoq:.0f} units each time")

print(f"\n3. EXPECTED OUTCOMES:")
print(f"   • About {SERVICE_LEVEL*100:.0f}% of order cycles expected to finish without a stockout")
print(f"   • About {(1-SERVICE_LEVEL)*100:.1f}% stockout risk per order cycle")
print(f"   • Ordering and holding costs balanced by the EOQ")

print(f"\n4. COST IMPLICATIONS:")
holding_cost_total = safety_stock * holding_cost
print(f"   • Annual safety stock holding cost: ${holding_cost_total:.2f}")
print(f"   • Orders per year: {annual_demand/eoq:.0f}")
print(f"   • Annual ordering cost: ${(annual_demand/eoq)*order_cost:.2f}")

print("\n" + "="*60)

## 🎓 Learning Exercises

Try these experiments:

### Exercise 1: Compare All Products
Change the `PRODUCT` variable and rerun the analysis for each product. Which product has:
- Best forecast accuracy?
- Highest safety stock requirement?
- Most stable demand?
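
For the stability question, one possible yardstick is the coefficient of variation (standard deviation divided by mean) per product. The sketch below uses a hypothetical mini frame `demo` with the same `product` and `demand` column names as the dataset:

```python
import pandas as pd

# Hypothetical mini dataset with the same `product` and `demand` columns
demo = pd.DataFrame({
    'product': ['Product_A'] * 4 + ['Product_B'] * 4,
    'demand': [100, 102, 98, 100, 50, 90, 20, 40],
})

stats_by_product = demo.groupby('product')['demand'].agg(['mean', 'std'])
# Coefficient of variation: lower means demand is more stable relative to its level
stats_by_product['cv'] = stats_by_product['std'] / stats_by_product['mean']
print(stats_by_product.sort_values('cv'))
```

Swapping `demo` for `df` should give the same table for the real products.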

### Exercise 2: Service Level Impact
Try different service levels (0.90, 0.95, 0.99):
- How does safety stock change?
- What's the cost vs. risk tradeoff?
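
A small sweep makes the effect visible without rerunning the whole notebook. The `error_std` value is a hypothetical placeholder; in the notebook it comes from the Step 8 cell.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs; in the notebook these come from the Step 8 cell
error_std = 10.0
LEAD_TIME = 7

safety_by_level = {}
for level in [0.90, 0.95, 0.99]:
    z = stats.norm.ppf(level)  # standard normal quantile for this service level
    safety_by_level[level] = z * error_std * np.sqrt(LEAD_TIME)
    print(f"Service level {level:.0%}: z = {z:.2f}, "
          f"safety stock = {safety_by_level[level]:.0f} units")
```

Safety stock grows faster than the service level itself: moving from 95% to 99% costs more extra stock than moving from 90% to 95%.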

### Exercise 3: Lead Time Sensitivity
Change `LEAD_TIME` (3, 7, 14 days):
- How does it affect reorder point?
- Impact on safety stock?
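
The same kind of sweep works for lead time. The values of `avg_daily_demand` and `error_std` are hypothetical placeholders for the quantities computed in Step 8.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs standing in for the Step 8 values
avg_daily_demand = 100.0
error_std = 10.0
z_score = stats.norm.ppf(0.95)

rows = []
for lead_time in [3, 7, 14]:
    safety_stock = z_score * error_std * np.sqrt(lead_time)
    reorder_point = avg_daily_demand * lead_time + safety_stock
    rows.append((lead_time, safety_stock, reorder_point))
    print(f"Lead time {lead_time:>2} days: safety stock = {safety_stock:.0f}, "
          f"reorder point = {reorder_point:.0f}")
```

Expected lead-time demand scales linearly with lead time, while safety stock scales with its square root, so doubling the lead time raises safety stock by a factor of about 1.41.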

### Exercise 4: Feature Importance
Which features matter most for prediction?
```python
# For Random Forest
rf_model = models['Random Forest']
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)
print(feature_importance)
```

### Exercise 5: Seasonal Analysis
Plot monthly demand patterns:
```python
monthly_avg = df[df['product']==PRODUCT].groupby('month')['demand'].mean()
monthly_avg.plot(kind='bar')
plt.title('Average Demand by Month')
plt.show()
```

## 🚀 Next Steps

To take this project further:

1. **Real Data**: Try with your own sales data
2. **More Models**: Add XGBoost, LSTM, Prophet
3. **Dashboard**: Build with Streamlit or Dash
4. **Automation**: Schedule daily forecasts
5. **Multi-location**: Optimize across warehouses
6. **Price Optimization**: Link pricing to demand
7. **API**: Create REST API for predictions

---

**Happy Learning! 🎉**