# KPI Engine - Complete KPI Calculation System

This notebook builds a comprehensive KPI Engine that calculates:
- **Revenue KPIs**: Gross Revenue, Net Revenue, ASP, Revenue Mix
- **Cost KPIs**: Purchase Cost, Landed Cost, Cost Variance, Supplier Spend
- **Profit KPIs**: Gross Profit, Margin %, Contribution Margin
- **Inventory KPIs**: Inventory Turnover, Days of Inventory, Stockout/Overstock Risk
- **Supplier KPIs**: Lead Time, Lead Time Variability, Supplier Reliability
- **Store KPIs**: Store Revenue, Margin, Efficiency, Ranking
- **Product KPIs**: Velocity, Revenue Contribution, ABC/XYZ Classification

## Step 1: Load Required Libraries and Master Dataset

In [1]:
import pandas as pd
import numpy as np
import sys
import os
from datetime import datetime
import warnings

warnings.filterwarnings('ignore')

# Add src directory to path
base_path = os.path.dirname(os.getcwd())
sys.path.insert(0, os.path.join(base_path, 'src'))

print(f"Base Path: {base_path}")

Base Path: c:\Users\Asim\Music\Inventory Analysis Case Study📈🕵🏼‍♂️👨🏼‍💻\inventory-optimization


### Load Master Dataset

In [2]:
# Load the master dataset
master_dataset_path = os.path.join(base_path, 'data', 'data_model', 'master_dataset.parquet')
print(f"Loading dataset from: {master_dataset_path}")

df = pd.read_parquet(master_dataset_path)

print(f"\n✓ Master Dataset loaded successfully!")
print(f"\n{'='*80}")
print(f"STEP 1: LOADING AND VALIDATING MASTER DATASET")
print(f"{'='*80}")

# Display key metrics
print(f"\n✓ Number of rows: {len(df):,}")
print(f"✓ Number of columns: {len(df.columns)}")

print(f"\nColumns:")
for col in df.columns:
    print(f"  - {col}: {df[col].dtype}")

Loading dataset from: c:\Users\Asim\Music\Inventory Analysis Case Study📈🕵🏼‍♂️👨🏼‍💻\inventory-optimization\data\data_model\master_dataset.parquet

✓ Master Dataset loaded successfully!

STEP 1: LOADING AND VALIDATING MASTER DATASET

✓ Number of rows: 1,048,575
✓ Number of columns: 41

Columns:
  - Sales_Order: str
  - Sales_Date: datetime64[us]
  - date_key: str
  - product_key: str
  - Brand: int64
  - Product_Name: str
  - Product_Size: str
  - store_key: int64
  - Store_City: str
  - Store_State: str
  - Store_Region: str
  - Delivery_location: str
  - Year: int64
  - Quarter: int64
  - Month: int64
  - Month_Name: str
  - Week: int64
  - Day_of_Week: int64
  - Day_Name: str
  - Sales_Quantity: int64
  - Sales_Price: float64
  - Sales_Amount: float64
  - Gross_Revenue: float64
  - Purchase_Orders: str
  - Purchase_Quantity: float64
  - Purchase_Price: float64
  - Purchase_Amount: float64
  - Purchase_Cost: float64
  - Landed_Cost: float64
  - vendor_key

### Validate Dataset Structure

In [3]:
# Check date range
date_cols = [col for col in df.columns if 'date' in col.lower()]
if date_cols:
    date_col = date_cols[0]
    if pd.api.types.is_datetime64_any_dtype(df[date_col]):
        min_date = df[date_col].min()
        max_date = df[date_col].max()
        print(f"✓ Date range: {min_date.date()} to {max_date.date()}")

# Check that the key columns exist and contain no missing values
# (column names as they appear in the master dataset)
key_cols = ['product_key', 'store_key', 'Sales_Date']
missing_keys = [col for col in key_cols if col not in df.columns]
if not missing_keys:
    null_checks = df[key_cols].isnull().sum()
    if null_checks.sum() == 0:
        print(f"✓ No missing product/store/date keys")
    else:
        print(f"⚠ Missing values found in keys: {null_checks.to_dict()}")
else:
    print(f"⚠ Missing key columns: {missing_keys}")

print(f"\n{'='*80}")

# Display first few rows
print(f"\nFirst few rows:")
df.head()

✓ Date range: 2016-01-01 to 2016-02-29
✓ No missing product/store/date keys


First few rows:


Unnamed: 0,Sales_Order,Sales_Date,date_key,product_key,Brand,Product_Name,Product_Size,store_key,Store_City,Store_State,...,Vendor_Lead_Time,Supplier_Spend,On_Hand_Quantity,Inventory_Unit_Price,Inventory_Value,Snapshot_Type,Gross_Profit,Margin_Percent,Inventory_Turnover,Days_of_Inventory
0,SO-0000001,2016-01-01,01/01/2016,1004,1004,Jim Beam w/2 Rocks Glasses,750mL,1,HARDERSFIELD,England,...,,,17.0,16.49,280.33,Beginning,16.49,100.0,,
1,SO-0000002,2016-01-01,01/01/2016,13795,13795,Yellow Tail Tree Free Chard,1.5L,66,EANVERNESS,Scotland,...,,,16.0,9.99,159.84,Beginning,9.99,100.0,,
2,SO-0000003,2016-01-01,01/01/2016,13793,13793,Yellow Tail Svgn Bl,1.5L,66,EANVERNESS,Scotland,...,,,23.0,9.99,229.77,Beginning,9.99,100.0,,
3,SO-0000004,2016-01-01,01/01/2016,3877,3877,Smirnoff Green Apple Vodka,750mL,28,LARNWICK,England,...,,,16.0,12.99,207.84,Beginning,12.99,100.0,,
4,SO-0000005,2016-01-01,01/01/2016,3878,3878,Smirnoff 80 Proof,750mL,28,LARNWICK,England,...,,,36.0,12.99,467.64,Beginning,12.99,100.0,,


## Step 2: Create Revenue KPIs

In [5]:
print(f"\n{'='*80}")
print(f"STEP 2: CREATING REVENUE KPIs")
print(f"{'='*80}")

# Gross Revenue: reuse the stored column if present, otherwise compute quantity x price.
# Note: the stored column sums to 0 in this dataset (see the Step 10 summary),
# so the existence check alone does not guarantee usable values.
if 'Gross_Revenue' not in df.columns:
    df['Gross_Revenue'] = (df['Sales_Quantity'] * df['Sales_Price']).fillna(0)
    print(f"✓ Gross_Revenue created")
else:
    print(f"✓ Gross_Revenue already exists")

# Net Revenue: no returns or discounts are deducted, so it equals Gross_Revenue
df['Net_Revenue'] = df['Gross_Revenue']
print(f"✓ Net_Revenue calculated")

# Average Selling Price (ASP) = revenue / units sold; rows with no units sold get 0
df['ASP'] = np.where(
    df['Sales_Quantity'] > 0,
    df['Gross_Revenue'] / df['Sales_Quantity'],
    0
)
df['ASP'] = df['ASP'].replace([np.inf, -np.inf], 0).fillna(0)
print(f"✓ ASP (Average Selling Price) calculated")

# Revenue by Product (total per product, repeated on each of the product's rows)
revenue_by_product = df.groupby('product_key')['Gross_Revenue'].sum().reset_index()
revenue_by_product.columns = ['product_key', 'Revenue_by_Product']
df = df.merge(revenue_by_product, on='product_key', how='left')
print(f"✓ Revenue_by_Product calculated")

# Revenue by Store
revenue_by_store = df.groupby('store_key')['Gross_Revenue'].sum().reset_index()
revenue_by_store.columns = ['store_key', 'Revenue_by_Store']
df = df.merge(revenue_by_store, on='store_key', how='left')
print(f"✓ Revenue_by_Store calculated")

# Revenue by Vendor
revenue_by_vendor = df.groupby('vendor_key')['Gross_Revenue'].sum().reset_index()
revenue_by_vendor.columns = ['vendor_key', 'Revenue_by_Vendor']
df = df.merge(revenue_by_vendor, on='vendor_key', how='left')
print(f"✓ Revenue_by_Vendor calculated")

print(f"\n{'='*80}")


STEP 2: CREATING REVENUE KPIs
✓ Gross_Revenue already exists
✓ Net_Revenue calculated
✓ ASP (Average Selling Price) calculated
✓ Revenue_by_Product calculated
✓ Revenue_by_Store calculated
✓ Revenue_by_Vendor calculated
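The Step 10 summary later reports a total gross revenue of $0.00 while 2,451,169 units were sold, which indicates that the stored `Gross_Revenue` column holds only zeros. Because the cell above recomputes the column only when it is missing, those zeros flow into `ASP`, the revenue rollups, and the ABC classes. A minimal sketch of a stricter guard, assuming quantity × price (the fallback formula above) is the intended definition; `ensure_gross_revenue` is a hypothetical helper shown on toy data:

```python
import numpy as np
import pandas as pd


def ensure_gross_revenue(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: (re)compute Gross_Revenue when it is missing or all zeros."""
    out = frame.copy()
    stored_unusable = (
        "Gross_Revenue" not in out.columns
        or (out["Gross_Revenue"].fillna(0) == 0).all()
    )
    if stored_unusable:
        # Assumed definition: units sold x unit selling price
        out["Gross_Revenue"] = (out["Sales_Quantity"] * out["Sales_Price"]).fillna(0)

    # ASP: divide only where quantity is positive (NaN elsewhere), then use 0 for those rows
    qty = out["Sales_Quantity"]
    out["ASP"] = np.where(qty > 0, out["Gross_Revenue"] / qty.where(qty > 0), 0.0)
    return out


toy = pd.DataFrame({
    "Sales_Quantity": [2, 0, 5],
    "Sales_Price": [10.0, 8.0, 4.0],
    "Gross_Revenue": [0.0, 0.0, 0.0],  # mimics the all-zero stored column
})
fixed = ensure_gross_revenue(toy)
print(fixed[["Gross_Revenue", "ASP"]])
```

Dividing by `qty.where(qty > 0)` avoids producing `inf` in the first place, so the `replace([np.inf, -np.inf], 0)` clean-up step is not needed.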



## Step 3: Create Cost KPIs

In [6]:
print(f"\n{'='*80}")
print(f"STEP 3: CREATING COST KPIs")
print(f"{'='*80}")

# Purchase Cost: reuse the stored column, otherwise quantity x price
if 'Purchase_Cost' not in df.columns:
    df['Purchase_Cost'] = (df['Purchase_Quantity'] * df['Purchase_Price']).fillna(0)
    print(f"✓ Purchase_Cost created")
else:
    print(f"✓ Purchase_Cost already exists")

# Landed Cost: without freight or duty data, falls back to Purchase_Cost
if 'Landed_Cost' not in df.columns:
    df['Landed_Cost'] = df['Purchase_Cost']
    print(f"✓ Landed_Cost calculated")
else:
    print(f"✓ Landed_Cost already exists")

# Cost Variance: per-unit difference between selling price and purchase price
df['Cost_Variance'] = (df['Sales_Price'] - df['Purchase_Price']).fillna(0)
print(f"✓ Cost_Variance calculated")

# Supplier Spend: total Purchase_Cost per vendor, repeated on each of the vendor's rows
supplier_spend = df.groupby('vendor_key')['Purchase_Cost'].sum().reset_index()
supplier_spend.columns = ['vendor_key', 'Supplier_Spend_Total']
df = df.merge(supplier_spend, on='vendor_key', how='left')
print(f"✓ Supplier_Spend_Total calculated")

print(f"\n{'='*80}")


STEP 3: CREATING COST KPIs
✓ Purchase_Cost already exists
✓ Landed_Cost already exists
✓ Cost_Variance calculated
✓ Supplier_Spend_Total calculated
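`Landed_Cost` currently falls back to `Purchase_Cost`, which amounts to assuming no freight, duty, or handling charges. If such charges were known per purchase order, one common approach is to spread them across the order's lines in proportion to purchase cost. A sketch, where the `Freight_Cost` amounts are hypothetical and not part of the master dataset:

```python
import pandas as pd

lines = pd.DataFrame({
    "Purchase_Orders": ["PO-1", "PO-1", "PO-2"],
    "Purchase_Cost": [300.0, 100.0, 50.0],
})
# Hypothetical freight charge per purchase order (not a column in the master dataset)
freight = pd.Series({"PO-1": 40.0, "PO-2": 5.0}, name="Freight_Cost")

# Each line's share of its order's total purchase cost
order_cost = lines.groupby("Purchase_Orders")["Purchase_Cost"].transform("sum")
share = lines["Purchase_Cost"] / order_cost

# Allocate the order's freight by that share and add it to the purchase cost
lines["Allocated_Freight"] = share * lines["Purchase_Orders"].map(freight)
lines["Landed_Cost"] = lines["Purchase_Cost"] + lines["Allocated_Freight"]
print(lines)
```

Allocating by cost share keeps the total allocated freight equal to the freight billed; allocating by quantity or weight are common alternatives when those drive shipping charges.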



## Step 4: Create Profit KPIs

In [7]:
print(f"\n{'='*80}")
print(f"STEP 4: CREATING PROFIT KPIs")
print(f"{'='*80}")

# Gross Profit: reuse the stored column, otherwise revenue minus purchase cost
if 'Gross_Profit' not in df.columns:
    df['Gross_Profit'] = (df['Gross_Revenue'] - df['Purchase_Cost']).fillna(0)
    print(f"✓ Gross_Profit created")
else:
    print(f"✓ Gross_Profit already exists")

# Margin Percent: gross profit as a percentage of revenue (0 where revenue is 0)
if 'Margin_Percent' not in df.columns:
    df['Margin_Percent'] = np.where(
        df['Gross_Revenue'] > 0,
        (df['Gross_Profit'] / df['Gross_Revenue']) * 100,
        0
    )
    df['Margin_Percent'] = df['Margin_Percent'].replace([np.inf, -np.inf], 0).fillna(0)
    print(f"✓ Margin_Percent created")
else:
    print(f"✓ Margin_Percent already exists")

# Contribution Margin: no variable selling costs are deducted, so it equals Gross_Profit
df['Contribution_Margin'] = df['Gross_Profit']
print(f"✓ Contribution_Margin calculated")

print(f"\n{'='*80}")


STEP 4: CREATING PROFIT KPIs
✓ Gross_Profit already exists
✓ Margin_Percent already exists
✓ Contribution_Margin calculated
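The Step 10 summary reports `Margin_Percent.mean()`, an unweighted average of row-level margins, so a small sale counts as much as a large one. A revenue-weighted margin (total gross profit divided by total revenue) is often the more meaningful aggregate. A sketch on toy numbers:

```python
import pandas as pd

rows = pd.DataFrame({
    "Gross_Revenue": [100.0, 10.0],
    "Gross_Profit": [10.0, 9.0],
})
rows["Margin_Percent"] = rows["Gross_Profit"] / rows["Gross_Revenue"] * 100  # 10% and 90%

# Each row counts equally, regardless of its size
simple_mean = rows["Margin_Percent"].mean()
# Each row counts in proportion to its revenue
weighted = rows["Gross_Profit"].sum() / rows["Gross_Revenue"].sum() * 100

print(f"Simple mean margin: {simple_mean:.1f}%")  # 50.0%
print(f"Weighted margin:    {weighted:.1f}%")     # 17.3%
```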



## Step 5: Create Inventory KPIs

In [8]:
print(f"\n{'='*80}")
print(f"STEP 5: CREATING INVENTORY KPIs")
print(f"{'='*80}")

# Inventory Turnover: reuse the stored column; the fallback is a row-level
# ratio of units sold to units on hand
if 'Inventory_Turnover' not in df.columns:
    df['Inventory_Turnover'] = np.where(
        df['On_Hand_Quantity'] > 0,
        df['Sales_Quantity'] / df['On_Hand_Quantity'],
        0
    )
    df['Inventory_Turnover'] = df['Inventory_Turnover'].replace([np.inf, -np.inf], 0).fillna(0)
    print(f"✓ Inventory_Turnover created")
else:
    print(f"✓ Inventory_Turnover already exists")

# Days of Inventory: the fallback annualises with 365 days, although the data
# cover only 2016-01-01 to 2016-02-29
if 'Days_of_Inventory' not in df.columns:
    df['Days_of_Inventory'] = np.where(
        df['Inventory_Turnover'] > 0,
        365 / df['Inventory_Turnover'],
        0
    )
    df['Days_of_Inventory'] = df['Days_of_Inventory'].replace([np.inf, -np.inf], 0).fillna(0)
    print(f"✓ Days_of_Inventory created")
else:
    print(f"✓ Days_of_Inventory already exists")

# Stockout Risk Flag: on-hand stock is below the quantity sold
# (rows with missing On_Hand_Quantity compare as False and get 0)
df['Stockout_Risk_Flag'] = np.where(
    df['On_Hand_Quantity'] < df['Sales_Quantity'],
    1,
    0
)
print(f"✓ Stockout_Risk_Flag calculated")

# Overstock Risk Flag: on-hand stock exceeds twice the quantity sold
df['Overstock_Risk_Flag'] = np.where(
    df['On_Hand_Quantity'] > (df['Sales_Quantity'] * 2),
    1,
    0
)
print(f"✓ Overstock_Risk_Flag calculated")

print(f"\n{'='*80}")


STEP 5: CREATING INVENTORY KPIs
✓ Inventory_Turnover already exists
✓ Days_of_Inventory already exists
✓ Stockout_Risk_Flag calculated
✓ Overstock_Risk_Flag calculated
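The stored `Inventory_Turnover` column turns out to be entirely empty (count 0 in the KPI statistics table), and the fallback formulas above work per transaction row and annualise with 365 days although the data span only 60 days. A more conventional formulation works per product over the period: turnover = cost of goods sold / average inventory value, and days of inventory = days in period / turnover. A sketch, assuming hypothetical per-product `COGS` and beginning/ending inventory values:

```python
import pandas as pd

PERIOD_DAYS = 60  # 2016-01-01 to 2016-02-29 in this dataset

products = pd.DataFrame({
    "product_key": ["1004", "3878", "9999"],
    "COGS": [1200.0, 300.0, 0.0],                 # hypothetical cost of goods sold
    "Begin_Inventory_Value": [500.0, 400.0, 0.0],  # hypothetical snapshot values
    "End_Inventory_Value": [300.0, 200.0, 0.0],
})

avg_inventory = (products["Begin_Inventory_Value"] + products["End_Inventory_Value"]) / 2

# Turnover is undefined without inventory: leave NaN instead of forcing 0
products["Inventory_Turnover"] = products["COGS"] / avg_inventory.where(avg_inventory > 0)

# Days of inventory over the actual period length, not a 365-day year
turnover = products["Inventory_Turnover"]
products["Days_of_Inventory"] = PERIOD_DAYS / turnover.where(turnover > 0)
print(products[["product_key", "Inventory_Turnover", "Days_of_Inventory"]])
```

Keeping undefined values as NaN (rather than 0) prevents a product with no stock from looking like one with zero days of cover.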



## Step 6: Create Supplier KPIs

In [9]:
print(f"\n{'='*80}")
print(f"STEP 6: CREATING SUPPLIER KPIs")
print(f"{'='*80}")

# Lead Time Days: missing vendor lead times are filled with 0, so rows without
# a purchase record count as zero-day lead times and pull the averages down
df['Lead_Time_Days'] = df['Vendor_Lead_Time'].fillna(0).astype('int64')
print(f"✓ Lead_Time_Days calculated")

# Lead Time Variability (standard deviation of lead time per vendor)
lead_time_var = df.groupby('vendor_key')['Lead_Time_Days'].std().reset_index()
lead_time_var.columns = ['vendor_key', 'Lead_Time_Variability']
df = df.merge(lead_time_var, on='vendor_key', how='left')
print(f"✓ Lead_Time_Variability calculated")

# Supplier Reliability: placeholder constant of 95%, not derived from delivery data
df['Supplier_Reliability'] = 95.0
print(f"✓ Supplier_Reliability placeholder set")

print(f"\n{'='*80}")


STEP 6: CREATING SUPPLIER KPIs
✓ Lead_Time_Days calculated
✓ Lead_Time_Variability calculated
✓ Supplier_Reliability placeholder set
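Filling missing `Vendor_Lead_Time` with 0 treats rows without a purchase record as same-day deliveries, which likely explains the 0.3-day average lead time in the Step 10 summary. Leaving the gaps as NaN lets pandas skip them in `mean()` and `std()`; this means dropping the `astype('int64')` cast, which cannot hold NaN (pandas' nullable `Int64` dtype is an alternative). A sketch contrasting the two on toy data:

```python
import numpy as np
import pandas as pd

lead = pd.DataFrame({
    "vendor_key": ["V1", "V1", "V1", "V2", "V2"],
    "Vendor_Lead_Time": [7.0, np.nan, 9.0, np.nan, 5.0],
})

# As in the cell above: missing lead times become 0 days
filled = lead["Vendor_Lead_Time"].fillna(0)

# Alternative: keep NaN; mean() and std() skip missing values by default
observed = lead["Vendor_Lead_Time"]

print(f"Mean with fillna(0): {filled.mean():.2f} days")    # 4.20 days
print(f"Mean of observed:    {observed.mean():.2f} days")  # 7.00 days

# Per-vendor variability (sample standard deviation) from observed values only:
# V1 is about 1.41 days; V2 is NaN because it has a single observation
variability = lead.groupby("vendor_key")["Vendor_Lead_Time"].std()
print(variability)
```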



## Step 7: Create Store KPIs

In [10]:
print(f"\n{'='*80}")
print(f"STEP 7: CREATING STORE KPIs")
print(f"{'='*80}")

# Store Revenue (same totals as Revenue_by_Store from Step 2)
store_revenue = df.groupby('store_key')['Gross_Revenue'].sum().reset_index()
store_revenue.columns = ['store_key', 'Store_Total_Revenue']
df = df.merge(store_revenue, on='store_key', how='left')
print(f"✓ Store_Total_Revenue calculated")

# Store Margin (total gross profit per store)
store_margin = df.groupby('store_key')['Gross_Profit'].sum().reset_index()
store_margin.columns = ['store_key', 'Store_Total_Margin']
df = df.merge(store_margin, on='store_key', how='left')
print(f"✓ Store_Total_Margin calculated")

# Store Efficiency (Revenue / Inventory Value)
if 'Inventory_Value' in df.columns:
    store_inventory_value = df.groupby('store_key')['Inventory_Value'].sum().reset_index()
    store_inventory_value.columns = ['store_key', 'Store_Inventory_Value']
    df = df.merge(store_inventory_value, on='store_key', how='left')

    df['Store_Efficiency'] = np.where(
        df['Store_Inventory_Value'] > 0,
        df['Store_Total_Revenue'] / df['Store_Inventory_Value'],
        0
    )
    df['Store_Efficiency'] = df['Store_Efficiency'].replace([np.inf, -np.inf], 0).fillna(0)
    print(f"✓ Store_Efficiency calculated")
else:
    df['Store_Efficiency'] = 0
    print(f"✓ Store_Efficiency placeholder created")

# Store Ranking by total revenue (1 = highest). rank() defaults to
# method='average', so tied stores share their mean rank before the int cast.
store_rank = df.groupby('store_key')['Gross_Revenue'].sum().reset_index()
store_rank.columns = ['store_key', 'Store_Revenue']
store_rank['Store_Revenue_Rank'] = store_rank['Store_Revenue'].rank(ascending=False).astype('int64')
store_rank = store_rank[['store_key', 'Store_Revenue_Rank']]
df = df.merge(store_rank, on='store_key', how='left')
print(f"✓ Store_Revenue_Rank calculated")

print(f"\n{'='*80}")


STEP 7: CREATING STORE KPIs
✓ Store_Total_Revenue calculated
✓ Store_Total_Margin calculated
✓ Store_Efficiency calculated
✓ Store_Revenue_Rank calculated
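In the final table every store has `Store_Revenue_Rank` 40: with store revenue at zero everywhere, all 79 stores tie, and `rank()`'s default `method='average'` gives each the mean rank (1 + 79) / 2 = 40. Ranking a one-row-per-store table with an explicit tie method such as `'min'` makes ties easier to read. The sketch below, on toy data, also shows why `df['Store_Total_Revenue'].mean()` in Step 10 is a row-weighted figure rather than the average store's revenue:

```python
import pandas as pd

sales = pd.DataFrame({
    "store_key": [1, 1, 2, 3, 3, 4],
    "Gross_Revenue": [50.0, 50.0, 100.0, 30.0, 10.0, 100.0],
})

# One row per store before ranking, so every store counts once
store_revenue = sales.groupby("store_key", as_index=False)["Gross_Revenue"].sum()

# 'average' (the pandas default) gives tied stores a shared fractional rank;
# 'min' gives them the best rank of the tie group (competition ranking)
store_revenue["Rank_Average"] = store_revenue["Gross_Revenue"].rank(ascending=False)
store_revenue["Rank_Min"] = (
    store_revenue["Gross_Revenue"].rank(ascending=False, method="min").astype("int64")
)
print(store_revenue)

# Average over stores vs. average of store totals repeated on every transaction row
per_store_avg = store_revenue["Gross_Revenue"].mean()                                # 85.0
row_weighted = sales.groupby("store_key")["Gross_Revenue"].transform("sum").mean()  # 80.0
```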



## Step 8: Create Product KPIs

In [12]:
print(f"\n{'='*80}")
print(f"STEP 8: CREATING PRODUCT KPIs")
print(f"{'='*80}")

# Make sure Sales_Date is a datetime (unparseable values become NaT)
df['Sales_Date'] = pd.to_datetime(df['Sales_Date'], errors='coerce')

# Number of calendar days covered by the data, counting both end dates
num_days = (df['Sales_Date'].max() - df['Sales_Date'].min()).days + 1
if pd.isna(num_days) or num_days < 1:
    num_days = 1

# Velocity = units sold per product / days in period
product_velocity = df.groupby('product_key')['Sales_Quantity'].sum().reset_index()
product_velocity['Velocity'] = product_velocity['Sales_Quantity'] / num_days
product_velocity = product_velocity[['product_key', 'Velocity']]
df = df.merge(product_velocity, on='product_key', how='left')
print(f"✓ Velocity calculated (days in period: {num_days})")

# Revenue Contribution: each product's share (%) of total revenue,
# using Revenue_by_Product from Step 2
total_revenue = df['Gross_Revenue'].sum()
if total_revenue > 0:
    df['Revenue_Contribution'] = (df['Revenue_by_Product'] / total_revenue) * 100
else:
    df['Revenue_Contribution'] = 0
print(f"✓ Revenue_Contribution calculated")

# ABC Class by cumulative revenue share, with products sorted by revenue (descending):
# A = within the first 20% of cumulative revenue, B = 20-50%, C = the rest.
# If total revenue is 0, Cumulative_Percent is NaN and every product stays C.
product_abc = df.groupby('product_key')['Gross_Revenue'].sum().reset_index()
product_abc = product_abc.sort_values('Gross_Revenue', ascending=False)
product_abc['Cumulative_Revenue'] = product_abc['Gross_Revenue'].cumsum()
product_abc['Cumulative_Percent'] = (product_abc['Cumulative_Revenue'] / product_abc['Gross_Revenue'].sum()) * 100

product_abc['ABC_Class'] = 'C'
product_abc.loc[product_abc['Cumulative_Percent'] <= 20, 'ABC_Class'] = 'A'
product_abc.loc[
    (product_abc['Cumulative_Percent'] > 20) & (product_abc['Cumulative_Percent'] <= 50),
    'ABC_Class'
] = 'B'

product_abc = product_abc[['product_key', 'ABC_Class']]
df = df.merge(product_abc, on='product_key', how='left')
print(f"✓ ABC_Class calculated")

# XYZ Class by demand variability: coefficient of variation (CV) of
# transaction-level Sales_Quantity per product, split into thirds by CV
# (lowest third = X, middle third = Y, remainder = Z)
product_xyz = df.groupby('product_key')['Sales_Quantity'].agg(['mean', 'std']).reset_index()
product_xyz['CV'] = product_xyz['std'] / (product_xyz['mean'] + 0.001)  # +0.001 avoids division by zero
product_xyz = product_xyz.sort_values('CV')

product_xyz['XYZ_Class'] = 'Z'
product_xyz.loc[product_xyz.index[:len(product_xyz)//3], 'XYZ_Class'] = 'X'
product_xyz.loc[
    product_xyz.index[len(product_xyz)//3:2*len(product_xyz)//3],
    'XYZ_Class'
] = 'Y'

product_xyz = product_xyz[['product_key', 'XYZ_Class']]
df = df.merge(product_xyz, on='product_key', how='left')
print(f"✓ XYZ_Class calculated")

# Combined ABC-XYZ class, e.g. 'AX' or 'CZ'
df['AX_AY_AZ_Class'] = df['ABC_Class'].astype(str) + df['XYZ_Class'].astype(str)
print(f"✓ AX_AY_AZ_Class calculated")

print(f"\n{'='*80}")


STEP 8: CREATING PRODUCT KPIs
✓ Velocity calculated (days in period: 60)
✓ Revenue_Contribution calculated
✓ ABC_Class calculated
✓ XYZ_Class calculated
✓ AX_AY_AZ_Class calculated
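The cell above labels as A only the products within the first 20% of cumulative revenue, and B those up to 50%. Textbook Pareto-style ABC analysis usually uses wider cut-offs (for example A up to about 80% of cumulative revenue, B up to about 95%), and XYZ is usually based on the coefficient of variation of demand per time period (for example per week) with fixed thresholds, rather than on transaction-level quantities split into thirds. With the all-zero `Gross_Revenue`, the cumulative percentage is 0/0 = NaN, so every product falls through to C, which matches the 100% C-class result in Step 10. A sketch of both classifications; `abc_class` and `xyz_class` are hypothetical helpers, and the 80/95 and 0.5/1.0 thresholds are common conventions, not values taken from this notebook:

```python
import numpy as np
import pandas as pd


def abc_class(revenue: pd.Series, a_cut: float = 80.0, b_cut: float = 95.0) -> pd.Series:
    """Pareto-style ABC: revenue indexed by product, cut-offs in % of cumulative revenue."""
    total = revenue.sum()
    if total <= 0:
        raise ValueError("ABC classification needs a positive total revenue")
    ordered = revenue.sort_values(ascending=False)
    # Share of revenue already covered by higher-ranked products (top product starts at 0%)
    share_before = (ordered.cumsum() - ordered) / total * 100
    labels = np.select([share_before < a_cut, share_before < b_cut], ["A", "B"], default="C")
    return pd.Series(labels, index=ordered.index).reindex(revenue.index)


def xyz_class(weekly_qty: pd.DataFrame, x_cut: float = 0.5, y_cut: float = 1.0) -> pd.Series:
    """XYZ on the coefficient of variation of demand; rows = products, columns = weeks."""
    mean = weekly_qty.mean(axis=1)
    cv = weekly_qty.std(axis=1) / mean.where(mean > 0)  # NaN CV (no demand) falls through to Z
    labels = np.select([cv <= x_cut, cv <= y_cut], ["X", "Y"], default="Z")
    return pd.Series(labels, index=weekly_qty.index)


revenue = pd.Series({"P3": 110.0, "P1": 650.0, "P4": 40.0, "P2": 200.0})
weekly = pd.DataFrame(
    {"W1": [10, 0, 5], "W2": [10, 20, 1], "W3": [10, 0, 9], "W4": [10, 0, 5]},
    index=["P1", "P2", "P3"],
)

abc = abc_class(revenue)
xyz = xyz_class(weekly)
print(abc.to_dict())
print(xyz.to_dict())
```

On the master dataset, the weekly demand matrix could be built with `df.pivot_table(index='product_key', columns='Week', values='Sales_Quantity', aggfunc='sum', fill_value=0)`; `fill_value=0` counts weeks without sales as zero demand. `abc_class` deliberately refuses to run on zero total revenue instead of silently labelling everything C.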




## Step 9: Validate the KPI Engine

In [11]:
print(f"\n{'='*80}")
print(f"STEP 9: VALIDATING KPI ENGINE")
print(f"{'='*80}")

validation_results = {}

# Count infinite values in numeric columns (typically the result of dividing by zero)
infinite_cols = df.select_dtypes(include=[np.number]).columns
divide_by_zero_errors = 0
for col in infinite_cols:
    inf_count = np.isinf(df[col]).sum()
    divide_by_zero_errors += inf_count
print(f"✓ Divide-by-zero errors: {divide_by_zero_errors}")
validation_results['divide_by_zero_errors'] = divide_by_zero_errors

# Count rows with negative margins (loss-making rows)
if 'Margin_Percent' in df.columns:
    neg_margins = (df['Margin_Percent'] < 0).sum()
    print(f"✓ Negative margins: {neg_margins}")
    validation_results['negative_margins'] = neg_margins
    validation_results['negative_margins'] = neg_margins

# Check for missing KPIs
expected_kpis = [
    'Gross_Revenue', 'Net_Revenue', 'ASP',
    'Purchase_Cost', 'Landed_Cost', 'Cost_Variance',
    'Gross_Profit', 'Margin_Percent', 'Contribution_Margin',
    'Inventory_Turnover', 'Days_of_Inventory', 'Stockout_Risk_Flag',
    'Lead_Time_Days', 'Supplier_Spend_Total',
    'Store_Total_Revenue', 'Store_Total_Margin',
    'Velocity', 'Revenue_Contribution', 'ABC_Class', 'XYZ_Class'
]

missing_kpis = [kpi for kpi in expected_kpis if kpi not in df.columns]
if missing_kpis:
    print(f"⚠ Missing KPIs: {missing_kpis}")
else:
    print(f"✓ All expected KPIs present")
validation_results['missing_kpis'] = missing_kpis

# Check for duplicated rows
duplicated = df.duplicated().sum()
print(f"✓ Duplicated rows: {duplicated}")
validation_results['duplicated_rows'] = duplicated

# No dtype checks are implemented here; this line only prints a marker
print(f"\n✓ Data types validated")

print(f"\n{'='*80}")


STEP 9: VALIDATING KPI ENGINE
✓ Divide-by-zero errors: 0
✓ Negative margins: 43409
⚠ Missing KPIs: ['Velocity', 'Revenue_Contribution', 'ABC_Class', 'XYZ_Class']
✓ Duplicated rows: 0

✓ Data types validated
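This cell ran as In [11], before the Step 8 cell (In [12]), which is why `Velocity`, `Revenue_Contribution`, `ABC_Class`, and `XYZ_Class` are reported missing. Its checks also only print counts, and the last line reports data types as validated without checking any. A sketch of the same checks folded into a function that returns a list of problems, with explicit dtype, all-missing, and all-zero checks added; `validate_kpis` and its parameters are illustrative names, not part of the notebook:

```python
import numpy as np
import pandas as pd


def validate_kpis(frame: pd.DataFrame, numeric_kpis: list, label_kpis: list) -> list:
    """Hypothetical validator: return a list of problems instead of only printing counts."""
    problems = []

    # Infinite values in any numeric column (typically from dividing by zero)
    numeric = frame.select_dtypes(include=[np.number])
    inf_count = int(np.isinf(numeric.to_numpy(dtype=float)).sum())
    if inf_count:
        problems.append(f"{inf_count} infinite value(s)")

    # Numeric KPIs must exist, be numeric, and not be entirely missing or all zeros
    for col in numeric_kpis:
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
            continue
        values = frame[col]
        if not pd.api.types.is_numeric_dtype(values):
            problems.append(f"not numeric: {col}")
        elif values.isna().all():
            problems.append(f"all values missing: {col}")
        elif values.dropna().eq(0).all():
            problems.append(f"all zeros: {col}")

    # Label KPIs (e.g. ABC/XYZ classes) only need to exist
    for col in label_kpis:
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
    return problems


toy = pd.DataFrame({
    "Gross_Revenue": [0.0, 0.0],
    "Gross_Profit": [5.0, np.inf],
    "Inventory_Turnover": [np.nan, np.nan],
    "ABC_Class": ["C", "C"],
})
issues = validate_kpis(
    toy,
    numeric_kpis=["Gross_Revenue", "Gross_Profit", "Inventory_Turnover", "Velocity"],
    label_kpis=["ABC_Class", "XYZ_Class"],
)
for issue in issues:
    print("⚠", issue)
```

Applied to `df` after Step 8, checks like these would flag problems that are otherwise visible only in the KPI statistics table further down, such as the all-zero `Gross_Revenue` and the entirely empty `Inventory_Turnover`.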



## Step 10: KPI Summary Statistics

In [13]:
print(f"\n{'='*80}")
print(f"STEP 10: KPI SUMMARY STATISTICS")
print(f"{'='*80}")

summary = {}

# Revenue Summary
total_revenue = df['Gross_Revenue'].sum()
summary['Total_Revenue'] = total_revenue
print(f"\n📊 REVENUE METRICS")
print(f"  Total Gross Revenue: ${total_revenue:,.2f}")
print(f"  Average ASP: ${df['ASP'].mean():.2f}")
print(f"  Total Sales Quantity: {df['Sales_Quantity'].sum():,.0f}")

# Cost Summary
total_purchase_cost = df['Purchase_Cost'].sum()
summary['Total_Purchase_Cost'] = total_purchase_cost
print(f"\n💰 COST METRICS")
print(f"  Total Purchase Cost: ${total_purchase_cost:,.2f}")
print(f"  Average Purchase Price: ${df['Purchase_Price'].mean():.2f}")

# Profit Summary (Average Margin % is an unweighted mean of row-level margins)
total_gross_profit = df['Gross_Profit'].sum()
summary['Total_Gross_Profit'] = total_gross_profit
avg_margin = df['Margin_Percent'].mean()
summary['Average_Margin_Percent'] = avg_margin
print(f"\n📈 PROFIT METRICS")
print(f"  Total Gross Profit: ${total_gross_profit:,.2f}")
print(f"  Average Margin %: {avg_margin:.2f}%")

# Inventory Summary
avg_inventory_turnover = df['Inventory_Turnover'].mean()
summary['Average_Inventory_Turnover'] = avg_inventory_turnover
print(f"\n📦 INVENTORY METRICS")
print(f"  Average Inventory Turnover: {avg_inventory_turnover:.2f}x")
print(f"  Total On-Hand Quantity: {df['On_Hand_Quantity'].sum():,.0f}")
print(f"  Stockout Risk Items: {df['Stockout_Risk_Flag'].sum():,.0f}")
print(f"  Overstock Risk Items: {df['Overstock_Risk_Flag'].sum():,.0f}")

# Supplier Summary
avg_lead_time = df['Lead_Time_Days'].mean()
summary['Average_Lead_Time_Days'] = avg_lead_time
print(f"\n🤝 SUPPLIER METRICS")
print(f"  Average Lead Time: {avg_lead_time:.1f} days")
print(f"  Average Supplier Reliability: {df['Supplier_Reliability'].mean():.1f}%")

# Store Summary (store totals repeat on every row, so these means are row-weighted)
num_stores = df['store_key'].nunique()
print(f"\n🏬 STORE METRICS")
print(f"  Number of Stores: {num_stores}")
print(f"  Average Store Revenue: ${df['Store_Total_Revenue'].mean():,.2f}")
print(f"  Average Store Margin: ${df['Store_Total_Margin'].mean():,.2f}")
print(f"  Average Store Efficiency: {df['Store_Efficiency'].mean():.2f}x")

# Product Summary
num_products = df['product_key'].nunique()
print(f"\n📊 PRODUCT METRICS")
print(f"  Number of Products: {num_products}")
print(f"  Average Velocity: {df['Velocity'].mean():.2f} units/day")

print(f"\n  ABC Classification:")
for cls in ['A', 'B', 'C']:
    count = (df['ABC_Class'] == cls).sum()
    pct = (count / len(df)) * 100
    print(f"    {cls}-Class Items: {count:,} ({pct:.1f}%)")

print(f"\n  XYZ Classification:")
for cls in ['X', 'Y', 'Z']:
    count = (df['XYZ_Class'] == cls).sum()
    pct = (count / len(df)) * 100
    print(f"    {cls}-Class Items: {count:,} ({pct:.1f}%)")

print(f"\n{'='*80}")


STEP 10: KPI SUMMARY STATISTICS

📊 REVENUE METRICS
  Total Gross Revenue: $0.00
  Average ASP: $0.00
  Total Sales Quantity: 2,451,169

💰 COST METRICS
  Total Purchase Cost: $7,082,261.00
  Average Purchase Price: $10.15

📈 PROFIT METRICS
  Total Gross Profit: $26,057,114.29
  Average Margin %: 70.54%

📦 INVENTORY METRICS
  Average Inventory Turnover: nanx
  Total On-Hand Quantity: 1,066,739
  Stockout Risk Items: 0
  Overstock Risk Items: 23,444

🤝 SUPPLIER METRICS
  Average Lead Time: 0.3 days
  Average Supplier Reliability: 95.0%

🏬 STORE METRICS
  Number of Stores: 79
  Average Store Revenue: $0.00
  Average Store Margin: $514,494.18
  Average Store Efficiency: 0.00x

📊 PRODUCT METRICS
  Number of Products: 7658
  Average Velocity: 31.54 units/day

  ABC Classification:
    A-Class Items: 0 (0.0%)
    B-Class Items: 0 (0.0%)
    C-Class Items: 1,048,575 (100.0%)

  XYZ Classification:
    X-Class Items: 133,879 (12.8%)
    Y-Class Items: 543,059 (51.8%)
    Z-
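The ABC and XYZ counts above are numbers of rows, not products: each product's class is repeated on every one of its transactions, so the percentages describe transaction share. Counting distinct `product_key` values per class gives the product-level split. A sketch on toy data:

```python
import pandas as pd

rows = pd.DataFrame({
    "product_key": ["1004", "1004", "1004", "3878", "13795"],
    "XYZ_Class": ["Y", "Y", "Y", "Z", "X"],
})

# Transactions per class (what the summary prints)
rows_per_class = rows["XYZ_Class"].value_counts()

# Distinct products per class
products_per_class = rows.groupby("XYZ_Class")["product_key"].nunique()

print(rows_per_class.to_dict())
print(products_per_class.to_dict())
```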

### Summary Variant With Column Guards

In [None]:
print(f"\n{'='*80}")
print(f"STEP 10: KPI SUMMARY STATISTICS")
print(f"{'='*80}")

summary = {}

# Revenue Summary
total_revenue = df['Gross_Revenue'].sum()
summary['Total_Revenue'] = total_revenue
print(f"\n📊 REVENUE METRICS")
print(f"  Total Gross Revenue: ${total_revenue:,.2f}")
print(f"  Average ASP: ${df['ASP'].mean():.2f}")
print(f"  Total Sales Quantity: {df['Sales_Quantity'].sum():,.0f}")

# Cost Summary
total_purchase_cost = df['Purchase_Cost'].sum()
summary['Total_Purchase_Cost'] = total_purchase_cost
print(f"\n💰 COST METRICS")
print(f"  Total Purchase Cost: ${total_purchase_cost:,.2f}")
print(f"  Average Purchase Price: ${df['Purchase_Price'].mean():.2f}")

# Profit Summary
total_gross_profit = df['Gross_Profit'].sum()
summary['Total_Gross_Profit'] = total_gross_profit
avg_margin = df['Margin_Percent'].mean()
summary['Average_Margin_Percent'] = avg_margin
print(f"\n📈 PROFIT METRICS")
print(f"  Total Gross Profit: ${total_gross_profit:,.2f}")
print(f"  Average Margin %: {avg_margin:.2f}%")

# Inventory Summary
avg_inventory_turnover = df['Inventory_Turnover'].mean()
summary['Average_Inventory_Turnover'] = avg_inventory_turnover
print(f"\n📦 INVENTORY METRICS")
print(f"  Average Inventory Turnover: {avg_inventory_turnover:.2f}x")
print(f"  Total On-Hand Quantity: {df['On_Hand_Quantity'].sum():,.0f}")
print(f"  Stockout Risk Items: {df['Stockout_Risk_Flag'].sum():,.0f}")

# Supplier Summary: average spend per vendor (vendor_key identifies the vendor)
num_vendors = df['vendor_key'].nunique()
if num_vendors > 0:
    total_supplier_spend = df['Supplier_Spend'].sum() / num_vendors
else:
    total_supplier_spend = df['Supplier_Spend'].sum()
summary['Average_Supplier_Spend'] = total_supplier_spend
avg_lead_time = df['Lead_Time_Days'].mean()
summary['Average_Lead_Time_Days'] = avg_lead_time
print(f"\n🤝 SUPPLIER METRICS")
print(f"  Average Supplier Spend: ${total_supplier_spend:,.2f}")
print(f"  Average Lead Time: {avg_lead_time:.1f} days")
if 'Supplier_Reliability' in df.columns:
    print(f"  Average Supplier Reliability: {df['Supplier_Reliability'].mean():.1f}%")

# Store Summary: average over stores (one row per store), not over transactions
if 'Store_Total_Revenue' in df.columns:
    stores = df.drop_duplicates('store_key')
    num_stores = len(stores)
    print(f"\n🏬 STORE METRICS")
    print(f"  Number of Stores: {num_stores}")
    print(f"  Average Store Revenue: ${stores['Store_Total_Revenue'].mean():,.2f}")
    print(f"  Average Store Margin: ${stores['Store_Total_Margin'].mean():,.2f}")
    if 'Store_Efficiency' in df.columns:
        print(f"  Average Store Efficiency: {stores['Store_Efficiency'].mean():.2f}x")

# Product Summary
if 'ABC_Class' in df.columns:
    num_products = df['product_key'].nunique()
    print(f"\n📊 PRODUCT METRICS")
    print(f"  Number of Products: {num_products}")
    print(f"  Average Velocity: {df['Velocity'].mean():.2f} units/day")
    print(f"\n  ABC Classification:")
    for cls in ['A', 'B', 'C']:
        count = (df['ABC_Class'] == cls).sum()
        pct = (count / len(df)) * 100
        print(f"    {cls}-Class Items: {count:,} ({pct:.1f}%)")
    
    print(f"\n  XYZ Classification:")
    for cls in ['X', 'Y', 'Z']:
        count = (df['XYZ_Class'] == cls).sum()
        pct = (count / len(df)) * 100
        print(f"    {cls}-Class Items: {count:,} ({pct:.1f}%)")

print(f"\n{'='*80}")

## Step 11: Export the KPI-Enhanced Dataset

In [14]:
print(f"\n{'='*80}")
print(f"STEP 11: EXPORTING KPI DATASET")
print(f"{'='*80}")

# Define output path
output_path = os.path.join(base_path, 'data', 'data_model', 'master_dataset_kpi.parquet')

# Export to parquet
df.to_parquet(output_path, index=False)

print(f"\n✓ KPI dataset exported to: {output_path}")
print(f"  Total rows: {len(df):,}")
print(f"  Total columns: {len(df.columns)}")
print(f"  File exported successfully!")

print(f"\n{'='*80}")
print(f"✅ KPI ENGINE EXECUTION COMPLETED SUCCESSFULLY")
print(f"{'='*80}")


STEP 11: EXPORTING KPI DATASET

✓ KPI dataset exported to: c:\Users\Asim\Music\Inventory Analysis Case Study📈🕵🏼‍♂️👨🏼‍💻\inventory-optimization\data\data_model\master_dataset_kpi.parquet
  Total rows: 1,048,575
  Total columns: 64
  File exported successfully!

✅ KPI ENGINE EXECUTION COMPLETED SUCCESSFULLY


## Step 12: Display Final DataFrame Summary

In [15]:
print(f"\n{'='*80}")
print(f"FINAL KPI-ENHANCED DATASET SUMMARY")
print(f"{'='*80}")
print(f"\nDataset Shape: {df.shape}")
print(f"\nFirst 5 Rows:")
df.head()


FINAL KPI-ENHANCED DATASET SUMMARY

Dataset Shape: (1048575, 64)

First 5 Rows:


Unnamed: 0,Sales_Order,Sales_Date,date_key,product_key,Brand,Product_Name,Product_Size,store_key,Store_City,Store_State,...,Store_Total_Revenue,Store_Total_Margin,Store_Inventory_Value,Store_Efficiency,Store_Revenue_Rank,Velocity,Revenue_Contribution,ABC_Class,XYZ_Class,AX_AY_AZ_Class
0,SO-0000001,2016-01-01,01/01/2016,1004,1004,Jim Beam w/2 Rocks Glasses,750mL,1,HARDERSFIELD,England,...,0,742087.1,37983.15,0.0,40,0.7,0,C,Y,CY
1,SO-0000002,2016-01-01,01/01/2016,13795,13795,Yellow Tail Tree Free Chard,1.5L,66,EANVERNESS,Scotland,...,0,985021.38,749183.34,0.0,40,0.983333,0,C,X,CX
2,SO-0000003,2016-01-01,01/01/2016,13793,13793,Yellow Tail Svgn Bl,1.5L,66,EANVERNESS,Scotland,...,0,985021.38,749183.34,0.0,40,1.633333,0,C,X,CX
3,SO-0000004,2016-01-01,01/01/2016,3877,3877,Smirnoff Green Apple Vodka,750mL,28,LARNWICK,England,...,0,70197.48,50025.49,0.0,40,14.416667,0,C,Y,CY
4,SO-0000005,2016-01-01,01/01/2016,3878,3878,Smirnoff 80 Proof,750mL,28,LARNWICK,England,...,0,70197.48,50025.49,0.0,40,72.933333,0,C,Z,CZ


## Column Overview

In [16]:
print(f"\nAll Columns in KPI-Enhanced Dataset:")
print(f"\n{'='*80}")
for i, col in enumerate(df.columns, 1):
    print(f"{i:2d}. {col:40s} | {str(df[col].dtype):15s} | Non-null: {df[col].notna().sum():,}")

print(f"\n{'='*80}")


All Columns in KPI-Enhanced Dataset:

 1. Sales_Order                              | str             | Non-null: 1,048,575
 2. Sales_Date                               | datetime64[us]  | Non-null: 1,048,575
 3. date_key                                 | str             | Non-null: 1,048,575
 4. product_key                              | str             | Non-null: 1,048,575
 5. Brand                                    | int64           | Non-null: 1,048,575
 6. Product_Name                             | str             | Non-null: 1,048,575
 7. Product_Size                             | str             | Non-null: 1,048,575
 8. store_key                                | int64           | Non-null: 1,048,575
 9. Store_City                               | str             | Non-null: 1,048,575
10. Store_State                              | str             | Non-null: 1,048,575
11. Store_Region                             | str             | Non-null: 1,048,575
12. Delivery_location     

In [17]:
# Display key KPI statistics
print(f"\nKEY KPI STATISTICS:")
print(f"\n{'='*80}")

kpi_cols = [
    'Gross_Revenue', 'Purchase_Cost', 'Gross_Profit', 'Margin_Percent',
    'Inventory_Turnover', 'Lead_Time_Days', 'ASP', 'Velocity'
]

kpi_stats = df[[col for col in kpi_cols if col in df.columns]].describe()
print(kpi_stats)


KEY KPI STATISTICS:

       Gross_Revenue  Purchase_Cost  Gross_Profit  Margin_Percent  \
count      1048575.0   1.048575e+06  1.048575e+06    1.048575e+06   
mean             0.0   6.754177e+00  2.485002e+01    7.053807e+01   
std              0.0   6.628775e+01  8.919218e+01    2.426741e+02   
min              0.0   0.000000e+00 -1.490073e+04   -2.847143e+04   
25%              0.0   0.000000e+00  9.990000e+00    1.000000e+02   
50%              0.0   0.000000e+00  1.699000e+01    1.000000e+02   
75%              0.0   0.000000e+00  2.999000e+01    1.000000e+02   
max              0.0   1.541056e+04  1.327997e+04    1.000000e+02   

       Inventory_Turnover  Lead_Time_Days        ASP      Velocity  
count                 0.0    1.048575e+06  1048575.0  1.048575e+06  
mean                  NaN    3.135799e-01        0.0  3.153597e+01  
std                   NaN    1.452063e+00        0.0  4.972647e+01  
min                   NaN    0.000000e+00        0.0  1.666667e-02  
25%        

In [18]:
# Return the final KPI-enhanced dataset
print(f"\n✅ KPI Engine Complete!")
print(f"\nFinal Dataset:")
print(f"  - Rows: {len(df):,}")
print(f"  - Columns: {len(df.columns)}")
# 41 = number of columns in the master dataset when it was loaded in Step 1
print(f"  - New KPI columns added: {len(df.columns) - 41}")
print(f"  - Exported to: master_dataset_kpi.parquet")

df


✅ KPI Engine Complete!

Final Dataset:
  - Rows: 1,048,575
  - Columns: 64
  - New KPI columns added: 23
  - Exported to: master_dataset_kpi.parquet


Unnamed: 0,Sales_Order,Sales_Date,date_key,product_key,Brand,Product_Name,Product_Size,store_key,Store_City,Store_State,...,Store_Total_Revenue,Store_Total_Margin,Store_Inventory_Value,Store_Efficiency,Store_Revenue_Rank,Velocity,Revenue_Contribution,ABC_Class,XYZ_Class,AX_AY_AZ_Class
0,SO-0000001,2016-01-01,01/01/2016,1004,1004,Jim Beam w/2 Rocks Glasses,750mL,1,HARDERSFIELD,England,...,0,742087.10,37983.15,0.0,40,0.700000,0,C,Y,CY
1,SO-0000002,2016-01-01,01/01/2016,13795,13795,Yellow Tail Tree Free Chard,1.5L,66,EANVERNESS,Scotland,...,0,985021.38,749183.34,0.0,40,0.983333,0,C,X,CX
2,SO-0000003,2016-01-01,01/01/2016,13793,13793,Yellow Tail Svgn Bl,1.5L,66,EANVERNESS,Scotland,...,0,985021.38,749183.34,0.0,40,1.633333,0,C,X,CX
3,SO-0000004,2016-01-01,01/01/2016,3877,3877,Smirnoff Green Apple Vodka,750mL,28,LARNWICK,England,...,0,70197.48,50025.49,0.0,40,14.416667,0,C,Y,CY
4,SO-0000005,2016-01-01,01/01/2016,3878,3878,Smirnoff 80 Proof,750mL,28,LARNWICK,England,...,0,70197.48,50025.49,0.0,40,72.933333,0,C,Z,CZ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048570,SO-1048571,2016-02-29,29/02/2016,36771,36771,Yellow Tail Merlot Ausl,1.5L,17,OLDHAM,England,...,0,224633.83,46533.04,0.0,40,15.133333,0,C,Z,CZ
1048571,SO-1048572,2016-02-29,29/02/2016,26463,26463,Ravenswood Vints Blend Znfdl,750mL,16,LUNDY,England,...,0,193858.98,49757.23,0.0,40,20.166667,0,C,Z,CZ
1048572,SO-1048573,2016-02-29,29/02/2016,18106,18106,Barefoot Cellars Pink Moscat,1.5L,10,HORNSEY,England,...,0,838234.82,348986.05,0.0,40,10.233333,0,C,Y,CY
1048573,SO-1048574,2016-02-29,29/02/2016,14701,14701,Cupcake Red Velvet,750mL,1,HARDERSFIELD,England,...,0,742087.10,37983.15,0.0,40,32.450000,0,C,Z,CZ
