# Supply Chain Analysis - Exploratory Data Analysis

This notebook performs exploratory data analysis (EDA) to support User Story 1: analyzing shipping costs by transportation mode and route.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## Load Data

In [None]:
# Load supply chain data
path = "../data/raw/supply_chain_data.csv"
df = pd.read_csv(path)

# Display basic information
print(f"Dataset shape: {df.shape}")
print("\nColumn names:")
print(df.columns.tolist())
print("\nFirst few rows:")
df.head()
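
Before aggregating, it can help to confirm that the columns the later cells group on are present and to see how many values are missing. `groupby(...).mean()` skips `NaN` silently, so a high null count would otherwise go unnoticed. The sketch below assumes the column names used throughout this notebook. `check_required_columns` and `REQUIRED_COLUMNS` are hypothetical helpers, and the tiny `sample` frame is made-up data, not rows from `supply_chain_data.csv`.

```python
import pandas as pd

# Columns the analysis cells in this notebook rely on (assumed from their groupby calls)
REQUIRED_COLUMNS = ('Transportation modes', 'Routes', 'Shipping costs', 'Shipping times', 'SKU')

def check_required_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> pd.Series:
    """Raise KeyError if any required column is missing; otherwise return null counts per column."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return frame[list(required)].isna().sum()

# Tiny hypothetical sample (not real data); one shipping cost is missing
sample = pd.DataFrame({
    'SKU': ['SKU0', 'SKU1', 'SKU2'],
    'Transportation modes': ['Road', 'Air', 'Road'],
    'Routes': ['Route A', 'Route B', 'Route A'],
    'Shipping costs': [2.5, 8.0, None],
    'Shipping times': [4, 2, 5],
})

null_counts = check_required_columns(sample)
print(null_counts)
```

In the notebook itself, `check_required_columns(df)` would surface a renamed or missing column before the `groupby` calls fail with a less descriptive error.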

## User Story 1: Shipping Cost Analysis by Transportation Mode

Analyze shipping costs across different transportation modes to identify cost-effective options.

In [None]:
# Calculate average shipping costs by transportation mode
shipping_summary = df.groupby('Transportation modes').agg({
    'Shipping costs': ['mean', 'median', 'std'],
    'Shipping times': ['mean', 'median'],
    'SKU': 'count'
}).round(2)

shipping_summary.columns = ['Avg_Cost', 'Median_Cost', 'Std_Cost', 'Avg_Time', 'Median_Time', 'Count']
shipping_summary = shipping_summary.reset_index()
print("Shipping Cost Summary by Transportation Mode:")
print(shipping_summary)
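
The dict passed to `agg` produces a two-level column index, which the cell then flattens by assigning six names in order. If a statistic is later added or reordered, that name list silently goes out of sync. A minimal sketch of pandas named aggregation (pandas 0.25 and later) follows. It attaches each output name to its `(column, function)` pair. It runs on a small made-up sample rather than the real dataset.

```python
import pandas as pd

# Tiny hypothetical sample standing in for the supply chain CSV
sample = pd.DataFrame({
    'SKU': ['SKU0', 'SKU1', 'SKU2', 'SKU3'],
    'Transportation modes': ['Road', 'Road', 'Air', 'Air'],
    'Shipping costs': [2.0, 4.0, 8.0, 10.0],
    'Shipping times': [5, 7, 1, 3],
})

# Each keyword names an output column and maps it to (source column, aggregation),
# so no positional renaming step is needed afterwards.
summary = (
    sample.groupby('Transportation modes')
    .agg(
        Avg_Cost=('Shipping costs', 'mean'),
        Median_Cost=('Shipping costs', 'median'),
        Std_Cost=('Shipping costs', 'std'),
        Avg_Time=('Shipping times', 'mean'),
        Median_Time=('Shipping times', 'median'),
        Count=('SKU', 'count'),
    )
    .round(2)
    .reset_index()
)
print(summary)
```

Applied to `df`, this should yield the same columns as `shipping_summary`.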

In [None]:
# Visualization 1: Average Shipping Costs by Transportation Mode
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Average shipping costs
ax1 = axes[0]
modes = shipping_summary['Transportation modes']
avg_costs = shipping_summary['Avg_Cost']
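# One color per mode: the number of modes comes from the data, and scatter's c= must match the point count
colors = sns.color_palette('tab10', n_colors=len(modes))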

bars = ax1.bar(modes, avg_costs, color=colors, alpha=0.7, edgecolor='black')
ax1.set_xlabel('Transportation Mode', fontsize=12)
ax1.set_ylabel('Average Shipping Cost ($)', fontsize=12)
ax1.set_title('Average Shipping Costs by Transportation Mode', fontsize=14, fontweight='bold')
ax1.grid(axis='y', alpha=0.3)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'${height:.2f}',
             ha='center', va='bottom', fontsize=10)

# Plot 2: Cost vs Time efficiency
ax2 = axes[1]
scatter = ax2.scatter(shipping_summary['Avg_Time'], 
                      shipping_summary['Avg_Cost'],
                      s=shipping_summary['Count']*10,
                      c=colors, alpha=0.6, edgecolor='black')

# Add labels for each point
for idx, row in shipping_summary.iterrows():
    ax2.annotate(row['Transportation modes'], 
                 (row['Avg_Time'], row['Avg_Cost']),
                 xytext=(5, 5), textcoords='offset points', fontsize=10)

ax2.set_xlabel('Average Shipping Time (days)', fontsize=12)
ax2.set_ylabel('Average Shipping Cost ($)', fontsize=12)
ax2.set_title('Shipping Cost vs Time by Transportation Mode', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../img/shipping_cost_analysis.png', dpi=300, bbox_inches='tight')
plt.show()
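
Matplotlib 3.4 added `Axes.bar_label`, which places one label per bar and can replace the manual `ax1.text` loop in the visualization cell. The following minimal sketch uses made-up averages and the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical mode averages standing in for shipping_summary (not real data)
modes = ['Air', 'Rail', 'Road']
avg_costs = [6.0, 5.5, 5.0]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(modes, avg_costs, edgecolor='black')

# bar_label annotates each bar at its top edge; fmt is applied with %-style formatting
labels = ax.bar_label(bars, fmt='$%.2f', fontsize=10)
print([label.get_text() for label in labels])
plt.close(fig)
```

A `%`-style `fmt` string is accepted by both older and newer Matplotlib releases, so this form avoids depending on the newer `{}`-style support.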

## Analysis by Route

Examine how shipping costs vary across different routes for each transportation mode.

In [None]:
# Analyze costs by route and transportation mode
route_analysis = df.groupby(['Routes', 'Transportation modes']).agg({
    'Shipping costs': 'mean',
    'SKU': 'count'
}).round(2)

route_analysis.columns = ['Avg_Shipping_Cost', 'Shipment_Count']
route_analysis = route_analysis.reset_index()
print("\nShipping Cost by Route and Transportation Mode:")
print(route_analysis)
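
One way to surface route-mode combinations worth a closer look is to compare each route's average cost with the average for its transportation mode, and to flag averages built from only a handful of shipments. The sketch below uses a small hypothetical table shaped like `route_analysis`. The `Diff_vs_Mode_Mean` and `Low_Sample` columns and the `MIN_SHIPMENTS` threshold of 5 are illustrative assumptions. The mode mean here is an unweighted average of route averages, not a shipment-weighted one.

```python
import pandas as pd

# Hypothetical table with the same columns as route_analysis (not real data)
route_table = pd.DataFrame({
    'Routes': ['Route A', 'Route B', 'Route A', 'Route B'],
    'Transportation modes': ['Air', 'Air', 'Road', 'Road'],
    'Avg_Shipping_Cost': [6.0, 9.0, 3.0, 3.0],
    'Shipment_Count': [12, 3, 20, 15],
})

# Unweighted mean of the route averages within each mode, aligned back to every row
mode_mean = route_table.groupby('Transportation modes')['Avg_Shipping_Cost'].transform('mean')

# Positive values: this route costs more than is typical for its mode
route_table['Diff_vs_Mode_Mean'] = (route_table['Avg_Shipping_Cost'] - mode_mean).round(2)

# Flag averages that rest on few shipments (threshold chosen arbitrarily)
MIN_SHIPMENTS = 5
route_table['Low_Sample'] = route_table['Shipment_Count'] < MIN_SHIPMENTS

print(route_table.sort_values('Diff_vs_Mode_Mean', ascending=False))
```

Weighting each route by `Shipment_Count` would bring the mode mean closer to the per-shipment averages in `shipping_summary`. Which weighting is preferable depends on whether routes or individual shipments are the unit being compared.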

In [None]:
# Visualization 2: Heatmap of shipping costs by route and mode
pivot_data = route_analysis.pivot(index='Routes', 
                                   columns='Transportation modes', 
                                   values='Avg_Shipping_Cost')

plt.figure(figsize=(10, 6))
sns.heatmap(pivot_data, annot=True, fmt='.2f', cmap='RdYlGn_r', 
            cbar_kws={'label': 'Average Shipping Cost ($)'},
            linewidths=0.5, linecolor='gray')
plt.title('Average Shipping Costs: Routes vs Transportation Modes', fontsize=14, fontweight='bold')
plt.xlabel('Transportation Mode', fontsize=12)
plt.ylabel('Route', fontsize=12)
plt.tight_layout()
plt.savefig('../img/route_mode_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()
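
`DataFrame.pivot` reshapes the long route-mode table into a grid with one row per route and one column per mode. Two behaviours matter for the heatmap. First, a route-mode pair with no shipments becomes `NaN`, which `sns.heatmap` leaves as a blank cell. Second, `pivot` raises a `ValueError` if an index/column pair appears more than once. That cannot happen here, because `route_analysis` comes from a `groupby` on exactly those two keys. The small made-up table below illustrates the first point.

```python
import pandas as pd

# Hypothetical long-format table: Route B has no Road shipments (not real data)
long_table = pd.DataFrame({
    'Routes': ['Route A', 'Route A', 'Route B'],
    'Transportation modes': ['Air', 'Road', 'Air'],
    'Avg_Shipping_Cost': [6.0, 3.0, 9.0],
})

# One row per route, one column per mode; missing combinations become NaN
grid = long_table.pivot(index='Routes', columns='Transportation modes',
                        values='Avg_Shipping_Cost')
print(grid)
print("Empty cells:", int(grid.isna().sum().sum()))
```

Filling the gaps with `fillna(0)` would make the heatmap suggest free shipping on unused combinations, so leaving them blank is usually the safer choice.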


## Key Findings

Based on this analysis:

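1. **Transportation Mode Efficiency**: The summary table and bar chart show how average shipping cost differs across transportation modes, and the cost-vs-time scatter plot shows which modes offer a better cost-time tradeoff. These are comparisons of group means and medians. No significance test is run, so small differences should not be over-interpreted.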

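2. **Route Optimization Opportunities**: The heatmap highlights route-mode combinations whose average cost is notably higher or lower than the others, which makes them candidates for route optimization. Averages backed by few shipments (low `Shipment_Count`) are less reliable and are worth checking before acting on them.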

3. **Decision Support**: Supply chain managers can use these visualizations to:
   - Identify overpaying scenarios (high cost, slow delivery)
   - Compare alternative shipping strategies
   - Negotiate better rates with carriers on specific routes

These insights support User Story 1 by making it possible to compare shipping methods on cost and delivery time and to identify cost-effective options.
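
The "overpaying scenarios" item in the decision-support list can be made concrete with a dominance check. A mode is dominated when some other mode is at least as cheap and at least as fast, and strictly better on one of the two. The sketch below applies that rule to a hypothetical table shaped like `shipping_summary`. The numbers are invented and are not results from this dataset. `pareto_efficient` is an illustrative helper, not part of any library.

```python
import pandas as pd

def pareto_efficient(table: pd.DataFrame, cost_col: str = 'Avg_Cost',
                     time_col: str = 'Avg_Time') -> pd.Series:
    """Return True for rows that no other row beats on both cost and time."""
    costs = table[cost_col].to_numpy()
    times = table[time_col].to_numpy()
    flags = []
    for i in range(len(table)):
        # A row dominates row i if it is no worse on both axes and strictly better on one;
        # row i never dominates itself because neither strict comparison holds.
        dominates_i = (costs <= costs[i]) & (times <= times[i]) & ((costs < costs[i]) | (times < times[i]))
        flags.append(not dominates_i.any())
    return pd.Series(flags, index=table.index, name='Pareto_Efficient')

# Hypothetical per-mode averages (invented numbers, not from the dataset)
summary = pd.DataFrame({
    'Transportation modes': ['Air', 'Rail', 'Road', 'Sea'],
    'Avg_Cost': [6.0, 5.5, 5.0, 7.0],
    'Avg_Time': [4.0, 6.0, 5.0, 7.0],
})
summary['Pareto_Efficient'] = pareto_efficient(summary)
print(summary)
```

Run on `shipping_summary`, the modes flagged `False` would be the high-cost, slow-delivery cases. Because the check compares averages only, a dominated mode with a noticeably lower `Std_Cost` might still be worth considering when predictable costs matter.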