# Supply Chain Introduction for Data Scientists

Welcome to the Supply Chain Learning Repository! This notebook provides a hands-on introduction to supply chain concepts with practical Python examples.

## Learning Objectives
By the end of this notebook, you will be able to:
1. Explain basic supply chain terminology and concepts
2. Model supply chain components in Python
3. Calculate key supply chain performance metrics
4. Visualize supply chain data effectively

Let's get started!

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os

# Add src directory to path for importing our modules
sys.path.append('../src')

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)

print("📦 Supply Chain Learning Environment Ready!")
print("Python version:", sys.version.split()[0])
print("Pandas version:", pd.__version__)
print("NumPy version:", np.__version__)

## 1. What is a Supply Chain?

A **supply chain** is a network of organizations, people, activities, information, and resources involved in moving a product or service from supplier to customer.

### Key Components:
- **Suppliers** - Provide raw materials and components
- **Manufacturers** - Transform materials into finished products
- **Distributors** - Store and transport products
- **Retailers** - Sell products to end customers
- **Customers** - Final consumers of products
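
These components form a directed network: each one has upstream partners it receives goods from and downstream partners it delivers to. The sketch below shows one way such a network could be represented. The `Component` class and `link` helper are hypothetical names chosen for illustration and are not necessarily how `supply_chain_basics.introduction` is implemented; only the keys returned by `get_info()` mirror the ones used in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical supply chain node (illustration only, not the repository's class)."""
    name: str
    kind: str
    description: str
    # Partner lists are excluded from repr/compare because nodes reference each other.
    upstream: list = field(default_factory=list, repr=False, compare=False)
    downstream: list = field(default_factory=list, repr=False, compare=False)

    def get_info(self) -> dict:
        """Return a summary dict with the same keys this notebook prints."""
        return {
            "name": self.name,
            "type": self.kind,
            "description": self.description,
            "upstream_count": len(self.upstream),
            "downstream_count": len(self.downstream),
        }


def link(seller: Component, buyer: Component) -> None:
    """Record that goods flow from seller to buyer."""
    seller.downstream.append(buyer)
    buyer.upstream.append(seller)


supplier = Component("Raw Material Supplier", "Supplier", "Provides raw materials")
manufacturer = Component("Factory", "Manufacturer", "Transforms materials into products")
retailer = Component("Store", "Retailer", "Sells products to end customers")

link(supplier, manufacturer)
link(manufacturer, retailer)

print(manufacturer.get_info())
```

Storing both directions on each node makes the upstream and downstream counts cheap to report, at the cost of keeping the two lists in sync through a single helper such as `link`.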

Let's create a simple supply chain model:

In [None]:
# Import our supply chain basics module
from supply_chain_basics.introduction import create_basic_supply_chain, get_supply_chain_definitions

# Create a basic supply chain
supply_chain = create_basic_supply_chain()

print("🏭 Basic Supply Chain Structure:")
print("=" * 40)

for component_name, component in supply_chain.items():
    info = component.get_info()
    print(f"\n{info['name']}")
    print(f"  Type: {info['type']}")
    print(f"  Description: {info['description']}")
    print(f"  Upstream partners: {info['upstream_count']}")
    print(f"  Downstream partners: {info['downstream_count']}")

## 2. Key Supply Chain Terminology

Understanding the language of supply chains is essential for data scientists working in this field:

In [None]:
# Get supply chain definitions
definitions = get_supply_chain_definitions()

print("📚 Key Supply Chain Terms:")
print("=" * 50)

for term, details in definitions.items():
    print(f"\n🔹 {term}")
    print(f"   Definition: {details['definition']}")
    print(f"   Why it matters: {details['importance']}")

## 3. Supply Chain Performance Metrics

Data scientists need to understand key metrics used to measure supply chain performance. Let's calculate some important KPIs:

In [None]:
from supply_chain_basics.introduction import calculate_key_metrics

# Sample company data
company_data = {
    'Company A': {'revenue': 5000000, 'cogs': 3000000, 'inventory': 500000, 'lead_time': 10},
    'Company B': {'revenue': 2000000, 'cogs': 1200000, 'inventory': 300000, 'lead_time': 21},
    'Company C': {'revenue': 8000000, 'cogs': 5200000, 'inventory': 650000, 'lead_time': 7}
}

print("📊 Supply Chain Performance Comparison:")
print("=" * 60)

# Collect one metrics dict per company
metric_rows = []
for company, data in company_data.items():
    metrics = calculate_key_metrics(
        data['revenue'], data['cogs'], data['inventory'], data['lead_time']
    )
    metrics['company'] = company
    metric_rows.append(metrics)

# Create DataFrame for easier analysis, indexed by company name
df = pd.DataFrame(metric_rows).set_index('company')

print(df)
print("\n🎯 Key Insights:")
print(f"   - Best inventory turnover: {df['inventory_turnover'].idxmax()} ({df['inventory_turnover'].max():.1f}x)")
print(f"   - Lowest days of supply: {df['days_of_supply'].idxmin()} ({df['days_of_supply'].min():.1f} days)")
print(f"   - Highest gross margin: {df['gross_margin_percent'].idxmax()} ({df['gross_margin_percent'].max():.1f}%)")
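
`calculate_key_metrics` lives in the repository's `src` module, and its implementation is not shown in this notebook. As a rough guide, the sketch below uses common textbook definitions of the three ratios reported above. The function name `sketch_key_metrics` and the 365-day year are assumptions, and the repository's version may differ in detail (for example, by using average rather than ending inventory).

```python
def sketch_key_metrics(revenue: float, cogs: float, inventory: float) -> dict:
    """Textbook-style KPI definitions (an assumed stand-in, not the repository's code)."""
    # How many times per year inventory is sold and replaced
    turnover = cogs / inventory
    return {
        "inventory_turnover": turnover,
        # Days the current inventory would last at the current rate of sales
        "days_of_supply": 365 / turnover,
        # Share of revenue left after cost of goods sold, in percent
        "gross_margin_percent": (revenue - cogs) / revenue * 100,
    }


# Company A's figures from the cell above
example = sketch_key_metrics(revenue=5_000_000, cogs=3_000_000, inventory=500_000)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

Under these definitions Company A turns its inventory 6 times a year, which corresponds to roughly 61 days of supply and a 40% gross margin.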

## 4. Visualizing Supply Chain Data

Visual analysis is crucial for understanding supply chain patterns. Let's create some charts:

In [None]:
# Create visualizations of the metrics
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Supply Chain Performance Dashboard', fontsize=16, fontweight='bold')

# 1. Inventory Turnover
df['inventory_turnover'].plot(kind='bar', ax=axes[0,0], color='steelblue')
axes[0,0].set_title('Inventory Turnover Ratio')
axes[0,0].set_ylabel('Times per Year')
axes[0,0].tick_params(axis='x', rotation=45)

# 2. Days of Supply
df['days_of_supply'].plot(kind='bar', ax=axes[0,1], color='orange')
axes[0,1].set_title('Days of Supply')
axes[0,1].set_ylabel('Days')
axes[0,1].tick_params(axis='x', rotation=45)

# 3. Gross Margin
df['gross_margin_percent'].plot(kind='bar', ax=axes[1,0], color='green')
axes[1,0].set_title('Gross Margin %')
axes[1,0].set_ylabel('Percentage')
axes[1,0].tick_params(axis='x', rotation=45)

# 4. Lead Time vs Inventory Turnover (Scatter)
axes[1,1].scatter(df['lead_time_days'], df['inventory_turnover'], 
                  s=100, c=['red', 'blue', 'green'], alpha=0.7)
axes[1,1].set_xlabel('Lead Time (Days)')
axes[1,1].set_ylabel('Inventory Turnover')
axes[1,1].set_title('Lead Time vs Inventory Turnover')

# Add company labels to scatter plot
for company in df.index:
    axes[1,1].annotate(company, 
                       (df.loc[company, 'lead_time_days'], df.loc[company, 'inventory_turnover']),
                       xytext=(5, 5), textcoords='offset points')

plt.tight_layout()
plt.show()

print("📈 Visualization Insights:")
print("   - In this sample, shorter lead times coincide with higher inventory turnover (only 3 data points)")
print("   - Companies can benchmark performance against peers")
print("   - No single metric is sufficient; evaluate several together")
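
The relationship between lead time and inventory turnover can be measured rather than read off the chart. The sketch below computes the Pearson correlation for the three sample companies, assuming turnover is COGS divided by inventory (an assumption about what `calculate_key_metrics` returns). With only three points the coefficient is illustrative, not statistical evidence.

```python
import numpy as np

# Figures copied from the company_data dictionary (Company A, B, C)
lead_time_days = np.array([10, 21, 7])
cogs = np.array([3_000_000, 1_200_000, 5_200_000])
inventory = np.array([500_000, 300_000, 650_000])

# Assumed definition of inventory turnover: COGS / inventory
turnover = cogs / inventory

# Off-diagonal entry of the 2x2 correlation matrix is the Pearson r
r = np.corrcoef(lead_time_days, turnover)[0, 1]
print(f"Pearson r between lead time and turnover: {r:.2f}")
```

For this sample the coefficient is strongly negative (about -0.95), consistent with shorter lead times going together with higher turnover.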

## 5. Creating Sample Supply Chain Data

As a data scientist, you'll often need to work with supply chain datasets. Let's generate a synthetic sample dataset with randomly drawn values:

In [None]:
# Create sample supply chain dataset
np.random.seed(42)  # For reproducible results

n_products = 50
n_suppliers = 10

# Product data
products_df = pd.DataFrame({
    'product_id': [f'P{i:03d}' for i in range(1, n_products + 1)],
    'category': np.random.choice(['Electronics', 'Clothing', 'Home', 'Sports', 'Books'], n_products),
    'unit_cost': np.round(np.random.uniform(10, 500, n_products), 2),
    'annual_demand': np.random.poisson(1000, n_products),
    'lead_time_days': np.random.randint(5, 30, n_products),
    'supplier_id': [f'S{i:03d}' for i in np.random.randint(1, n_suppliers + 1, n_products)]
})

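# Calculate additional metrics
products_df['annual_cost'] = products_df['unit_cost'] * products_df['annual_demand']

# Economic Order Quantity estimate: sqrt(2 * D * S / H)
# The constants are read as S = $50 fixed cost per order and
# H = 20% of unit cost as the annual holding cost per unit (illustrative values)
order_cost = 50
holding_rate = 0.2
products_df['eoq_estimate'] = np.sqrt(
    2 * products_df['annual_demand'] * order_cost / (products_df['unit_cost'] * holding_rate)
)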

print("📦 Sample Supply Chain Dataset Created!")
print(f"Dataset shape: {products_df.shape}")
print("\nFirst 5 products:")
print(products_df.head())

print("\n📊 Dataset Summary:")
print(products_df.describe())
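
The `eoq_estimate` column has the shape of the classic Economic Order Quantity formula, `EOQ = sqrt(2 * D * S / H)`, where `D` is annual demand, `S` is the fixed cost of placing one order, and `H` is the annual cost of holding one unit. Reading the cell above this way, it uses `S = 50` and a holding cost of 20% of unit cost; that interpretation is inferred from the formula and is not stated in the source. A standalone sketch with those assumptions named (the function `eoq` is hypothetical):

```python
import math


def eoq(annual_demand: float, unit_cost: float,
        order_cost: float = 50.0, holding_rate: float = 0.2) -> float:
    """Economic Order Quantity under assumed order cost and holding rate."""
    holding_cost = unit_cost * holding_rate  # annual cost of holding one unit
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


# Worked example: 1,000 units per year at $100 per unit
q = eoq(annual_demand=1000, unit_cost=100.0)
print(f"EOQ ≈ {q:.1f} units, about {1000 / q:.1f} orders per year")
```

Here the holding cost is $20 per unit per year, so the order quantity is the square root of 5,000, or about 70.7 units. Because unit cost sits in the denominator, more expensive products get smaller, more frequent orders.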

In [None]:
# Analyze the sample data
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Supply Chain Data Analysis', fontsize=16, fontweight='bold')

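# 1. Distribution of annual demand by category
products_df.boxplot(column='annual_demand', by='category', ax=axes[0,0])
# A grouped pandas boxplot replaces the figure title, so set it again
fig.suptitle('Supply Chain Data Analysis', fontsize=16, fontweight='bold')
axes[0,0].set_title('Annual Demand by Category')
axes[0,0].set_xlabel('Category')
axes[0,0].tick_params(axis='x', rotation=45)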

# 2. Lead time distribution
products_df['lead_time_days'].hist(bins=15, ax=axes[0,1], alpha=0.7, color='orange')
axes[0,1].set_title('Lead Time Distribution')
axes[0,1].set_xlabel('Lead Time (Days)')
axes[0,1].set_ylabel('Frequency')

# 3. Unit cost vs demand scatter
scatter = axes[1,0].scatter(products_df['unit_cost'], products_df['annual_demand'], 
                           c=products_df['lead_time_days'], alpha=0.6, cmap='viridis')
axes[1,0].set_xlabel('Unit Cost ($)')
axes[1,0].set_ylabel('Annual Demand')
axes[1,0].set_title('Cost vs Demand (colored by Lead Time)')
plt.colorbar(scatter, ax=axes[1,0], label='Lead Time (Days)')

# 4. Annual cost by category
category_costs = products_df.groupby('category')['annual_cost'].sum().sort_values(ascending=False)
category_costs.plot(kind='bar', ax=axes[1,1], color='green')
axes[1,1].set_title('Total Annual Cost by Category')
axes[1,1].set_ylabel('Annual Cost ($)')
axes[1,1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n🔍 Key Findings:")
print(f"   - Highest value category: {category_costs.index[0]} (${category_costs.iloc[0]:,.0f})")
print(f"   - Average lead time: {products_df['lead_time_days'].mean():.1f} days")
print(f"   - Most expensive product: ${products_df['unit_cost'].max():.2f}")
print(f"   - Highest demand product: {products_df['annual_demand'].max():,} units")

## 6. Next Steps in Your Supply Chain Journey

Congratulations! You've completed the introduction to supply chain concepts. Here's what you should explore next:

### 📚 Continue Learning:
1. **Pareto Analysis (80/20 Rule)** - Learn how to identify the vital few vs. the trivial many
2. **Supply Chain Optimization** - Explore mathematical models for optimal decision making
3. **Demand Forecasting** - Use machine learning for predicting future demand
4. **Risk Management** - Understand and mitigate supply chain risks

### 🛠️ Practical Applications:
- Apply these concepts to your organization's data
- Build dashboards for supply chain KPIs
- Develop optimization models for inventory management
- Create predictive models for demand planning

In [None]:
# Save our sample dataset for future use
os.makedirs('../data', exist_ok=True)  # create the data directory if it does not exist
products_df.to_csv('../data/sample_products.csv', index=False)
print("💾 Sample dataset saved to '../data/sample_products.csv'")
print("\n🎯 You're ready to explore the next notebooks:")
print("   📈 02_pareto_analysis_tutorial.ipynb")
print("   🔧 03_inventory_optimization.ipynb")
print("   📊 04_supplier_analysis.ipynb")

print("\n🎉 Happy Learning!")