# Supply Chain Anomaly Detection Demo
    
This notebook demonstrates the complete workflow of the Supply Chain Anomaly Detection system using a sample dataset. It walks through:

1. Sample data creation
2. Data preprocessing and feature engineering
3. Anomaly detection
4. Issue classification
5. Recommendation generation
6. Visualization and analysis

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
!conda install -n .conda ipykernel --update-deps --force-reinstall

In [0]:
# Upgrade NumPy to the latest version
%pip install --upgrade numpy

In [0]:
%pip install mlflow

In [0]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Add the project root to Python path to import our package
# This assumes the notebook is in the notebooks/ directory
sys.path.append("../supply-chain-anomaly-detection")

# Configure plot style
# ('seaborn-whitegrid' was renamed in Matplotlib 3.6 and later removed, so set the style through seaborn)
sns.set_style("whitegrid")
sns.set_context("notebook", font_scale=1.2)
plt.rcParams['figure.figsize'] = (10, 6)

# Import our main class
from src.models.sc_issue_detection import SupplyChainIssueDetection

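# Optional: Import MLflow utilities if available
# NOTE: the mlflow package itself does not ship a module named `mlflow_utils`;
# this import presumably targets the project's own helper, so adjust the path to your layout.
try:
    from mlflow import mlflow_utils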
    MLFLOW_AVAILABLE = True
except ImportError:
    MLFLOW_AVAILABLE = False
    print("MLflow not available. Install with 'pip install mlflow' for experiment tracking.")

## 1. Load Data

### 1.1 Generate Sample Data

First, let's create a sample dataset of supply chain metrics that contains a few known anomalies, so we can demonstrate the detection capabilities.

In [0]:
def create_sample_data(n_records=50, seed=42):
    """Create a sample dataset with supply chain metrics."""
    np.random.seed(seed)  # For reproducibility
    
    # Define sample values for categorical features
    skus = [f"SKU_{i}" for i in range(1, 6)]
    countries = ['US', 'UK', 'DE']
    quarters = ['2024Q1', '2024Q2']
    
    # Create base data
    data = pd.DataFrame({
        'SKU': np.random.choice(skus, n_records),
        'Country': np.random.choice(countries, n_records),
        'Quarter': np.random.choice(quarters, n_records),
        'SellThru': np.random.randint(100, 1000, n_records),
        'SellTo': np.random.randint(80, 950, n_records),
        'T2Inventory': np.random.randint(200, 2000, n_records),
        'DistributorInventory': np.random.randint(100, 1500, n_records),
        'Backlog': np.random.randint(0, 500, n_records),
        'Shipments': np.random.randint(50, 800, n_records),
        'AgedInventory': np.random.randint(0, 300, n_records),
        'WeeksOfStockT1': np.random.uniform(1, 10, n_records),
        'WeeksOfStockT2': np.random.uniform(2, 12, n_records),
        'NumCompetitors': np.random.randint(1, 10, n_records),
        'PricePositioning': np.random.uniform(80, 120, n_records),
        'TargetQty': np.random.randint(200, 1200, n_records)
    })
    
    # Introduce five anomalies, one per issue type (10% of the default 50 records)
    anomaly_indices = np.random.choice(n_records, size=5, replace=False)
    
    # Inventory imbalances
    data.loc[anomaly_indices[0], 'WeeksOfStockT1'] = 15
    
    # Sales performance gaps
    data.loc[anomaly_indices[1], 'SellTo'] = int(data.loc[anomaly_indices[1], 'TargetQty'] * 0.4)
    
    # Pricing issues
    data.loc[anomaly_indices[2], 'PricePositioning'] = 130
    
    # Supply chain disruptions
    data.loc[anomaly_indices[3], 'Backlog'] = 800
    
    # Sell-through bottlenecks
    data.loc[anomaly_indices[4], 'SellThru'] = int(data.loc[anomaly_indices[4], 'SellTo'] * 0.5)
    
    return data
    
# Create the sample data
sample_data = create_sample_data(n_records=50)

# Create directory for data if it doesn't exist
sample_dir = '/Workspace/Repos/mohammed.jeddi@hp.com/supply-chain-anomaly-detection/src/data/sample/'
os.makedirs(sample_dir, exist_ok=True)

# Save the sample data
sample_data_path = os.path.join(sample_dir, 'sample_supply_chain_data.csv')
sample_data.to_csv(sample_data_path, index=False)

print(f"Sample data created with shape: {sample_data.shape}")
sample_data.head()

## 2. Initialize and Configure the Supply Chain Issue Detection System

Now, let's initialize our anomaly detection system. We'll configure it with parameters suitable for our sample dataset.

In [0]:
# Initialize the detector
# Note: Set use_llm=True if you have an OpenAI API key and want LLM-enhanced recommendations
detector = SupplyChainIssueDetection(
    use_llm=False,  # Set to True if you have an OpenAI API key in your environment
    contamination=0.3,  # Expected anomaly fraction (30%); note that create_sample_data injects only about 10%
    random_state=42  # For reproducibility
)

print("Supply Chain Issue Detection system initialized")
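To make the `contamination` setting more concrete, here is a minimal sketch using scikit-learn's `IsolationForest` directly. It assumes the detector forwards `contamination` to its underlying models, which this notebook does not confirm, and it uses synthetic data rather than the sample dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))  # synthetic stand-in for scaled supply chain features

flagged_share = {}
for contamination in (0.1, 0.3):
    model = IsolationForest(contamination=contamination, random_state=42)
    labels = model.fit_predict(X)  # -1 marks an anomaly, 1 marks a normal row
    flagged_share[contamination] = float((labels == -1).mean())
    print(f"contamination={contamination}: flagged {flagged_share[contamination]:.0%} of rows")
```

The share of flagged training rows tracks `contamination` closely, so a value well above the true anomaly rate will label ordinary records as anomalous.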

## 3. Start MLflow Tracking (Optional)

If MLflow is available, we'll use it to track our experiment.

In [0]:
# Start MLflow tracking if available
if MLFLOW_AVAILABLE:
    mlflow_run = mlflow_utils.start_run(experiment_name="supply_chain_demo")
    
    # Log parameters
    params = {
        'contamination': 0.3,  # Matches the value passed to the detector above
        'random_state': 42,
        'use_llm': False,
        'data_source': 'sample_data'
    }
    mlflow_utils.log_parameters(params)
    
    print(f"MLflow tracking started with run ID: {mlflow_run.info.run_id}")
else:
    print("MLflow tracking not available. Continuing without experiment tracking.")

## 4. Data Processing Pipeline

Now, let's run each step of the pipeline on our sample data.

### 4.1 Load and Preprocess Data

In [0]:
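# Load data (sample_data_path is the CSV written in section 1; the original cell passed an undefined name)
data = detector.load_data(sample_data_path)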
print(f"Loaded data with shape: {data.shape}")

# Preprocess data
processed_data = detector.preprocess_data()
print(f"Preprocessed data with shape: {processed_data.shape}")

# Show engineered features
engineered_features = [
    'SellThruToRatio',
    'InventoryTurnoverRate',
    'TargetAchievement',
    'TargetAchievement_ship',
    'SupplyChainEfficiency',
    'AgedInventoryPct'
]

processed_data[engineered_features].describe()
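The notebook does not show how `preprocess_data()` derives these features. As a rough sketch, ratio features with these names could be built from the sample columns as follows. Every formula here is an assumption made for illustration, not the package's definition, and `SupplyChainEfficiency` is left out because its meaning cannot be inferred from the notebook.

```python
import pandas as pd

def nonzero(series: pd.Series) -> pd.Series:
    """Turn zeros into NaN so a division yields NaN instead of inf."""
    return series.where(series != 0)

def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative ratio features; all formulas are assumptions."""
    out = df.copy()
    total_inventory = nonzero(out["T2Inventory"] + out["DistributorInventory"])
    target = nonzero(out["TargetQty"])

    out["SellThruToRatio"] = out["SellThru"] / nonzero(out["SellTo"])
    out["TargetAchievement"] = out["SellTo"] / target
    out["TargetAchievement_ship"] = out["Shipments"] / target
    out["InventoryTurnoverRate"] = out["SellThru"] / total_inventory
    out["AgedInventoryPct"] = out["AgedInventory"] / total_inventory * 100
    return out

example = pd.DataFrame({
    "SellThru": [500, 100],
    "SellTo": [400, 0],
    "TargetQty": [800, 200],
    "T2Inventory": [1000, 300],
    "DistributorInventory": [1000, 200],
    "Shipments": [400, 50],
    "AgedInventory": [100, 50],
})
features = add_ratio_features(example)
print(features[["SellThruToRatio", "TargetAchievement", "AgedInventoryPct"]])
```

Guarding the denominators matters because a single zero in `SellTo` or `TargetQty` would otherwise produce infinite values that distort any scaler or detector downstream.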

### 4.2 Detect Anomalies

In [0]:
# Train anomaly detector
detector.train_anomaly_detector()

# Count anomalies
anomaly_count = detector.data['is_anomaly'].sum()
print(f"Detected {anomaly_count} anomalies out of {len(detector.data)} records ({anomaly_count/len(detector.data)*100:.1f}%)")

# Look at model agreement
agreement_df = pd.DataFrame({
    'Isolation Forest': detector.data['if_anomaly'],
    'LOF': detector.data['lof_anomaly'] if 'lof_anomaly' in detector.data.columns else np.zeros(len(detector.data)),
    'One-Class SVM': detector.data['ocsvm_anomaly'],
    'Ensemble Decision': detector.data['is_anomaly'],
    'Anomaly Score': detector.data['anomaly_score']
})

# Display anomalies
# NOTE: the identifier columns below follow a production schema; the sample data
# generated in section 1 uses 'SKU', 'Country', and 'Quarter' instead.
id_columns = [
    'Global_Distributor_Group_Name', 'reporter_name', 'reporter_hq_id',
    'PRODUCT_GROUP_BMT', 'fiscal_year_quarter', 'product_number',
    'reporter_country_code',
]
anomalies_df = detector.data[detector.data['is_anomaly'] == 1].copy()
anomalies_df[id_columns + engineered_features + ['anomaly_score']]
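The columns `if_anomaly`, `lof_anomaly`, and `ocsvm_anomaly` suggest that three detectors vote on each record. The sketch below shows one way such an ensemble could work with scikit-learn, using a two-out-of-three majority vote on synthetic data. The voting rule and all hyperparameters are assumptions; the actual logic inside `train_anomaly_detector()` is not shown in this notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(95, 2))
outliers = np.array([[6, 6], [-6, 5], [7, -6], [-7, -7], [0, 9]], dtype=float)
X_scaled = StandardScaler().fit_transform(np.vstack([inliers, outliers]))

contamination = 0.1  # hypothetical expected anomaly fraction

# Each detector returns -1 for anomalies; convert to boolean votes
votes = np.column_stack([
    IsolationForest(contamination=contamination, random_state=42).fit_predict(X_scaled) == -1,
    LocalOutlierFactor(n_neighbors=20, contamination=contamination).fit_predict(X_scaled) == -1,
    OneClassSVM(nu=contamination, gamma="scale").fit_predict(X_scaled) == -1,
])

# Flag a row when at least two of the three detectors agree
is_anomaly = votes.sum(axis=1) >= 2
print(f"Flagged {is_anomaly.sum()} of {len(is_anomaly)} rows by majority vote")
```

Requiring agreement between detectors trades some recall for fewer false alarms, since each algorithm has different blind spots.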

### 4.3 Visualize Anomalies

Key Insights:
- Critical stock-outs (<1 week) are associated with poor target achievement
- Overstocking (>7 weeks) indicates potential forecasting issues
- Optimal inventory levels fall between 2-5 weeks of stock
- Clustering of anomalies reveals distinct supply chain problem patterns

In [0]:
# Visualize anomalies by WeeksOfStock vs TargetAchievement
detector.visualize_anomalies('t1_wos', 'TargetAchievement')

# Visualize anomalies by SellThruToRatio vs InventoryTurnoverRate
detector.visualize_anomalies('SellThruToRatio', 'InventoryTurnoverRate')
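The weeks-of-stock thresholds listed under Key Insights can be expressed as a simple banding function. This is a sketch only: the band labels are hypothetical, and the "borderline" band for values the insights do not cover (1 to 2 weeks and 5 to 7 weeks) is an assumption.

```python
import pandas as pd

def weeks_of_stock_band(weeks: float) -> str:
    """Map weeks of stock to a band using the thresholds from Key Insights."""
    if weeks < 1:
        return "critical_stock_out"
    if weeks > 7:
        return "overstock"
    if 2 <= weeks <= 5:
        return "optimal"
    return "borderline"  # ranges not covered by the stated thresholds

wos = pd.Series([0.5, 1.5, 3.0, 6.0, 15.0], name="WeeksOfStockT1")
bands = wos.map(weeks_of_stock_band).rename("band")
print(pd.concat([wos, bands], axis=1))
```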

### 4.4 Classify Issues

In [0]:
# Classify issues
classified_anomalies = detector.classify_issues(use_ml=True)

# Display issue distribution
issue_counts = classified_anomalies['issue_type'].value_counts()
print("Issue Type Distribution:")
print(issue_counts)

# Plot issue distribution
plt.figure(figsize=(10, 6))
issue_counts.plot(kind='bar', color='teal')
plt.title('Distribution of Issue Types')
plt.xlabel('Issue Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Display classified anomalies
classified_anomalies[[
    'Global_Distributor_Group_Name', 'reporter_name', 'reporter_hq_id',
    'PRODUCT_GROUP_BMT', 'fiscal_year_quarter', 'product_number',
    'reporter_country_code', 'issue_type', 'anomaly_score',
]]
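`classify_issues(use_ml=True)` relies on a trained classifier whose internals are not shown here. For intuition, the following sketch applies a purely rule-based alternative to the five issue types injected by `create_sample_data`. The thresholds and rule order are hypothetical and chosen only to match the injected values; they are not the package's rules.

```python
import pandas as pd

def classify_issue(row: pd.Series) -> str:
    """Assign the first matching issue type; thresholds are illustrative."""
    if row["WeeksOfStockT1"] > 10:
        return "Inventory Imbalance"
    if row["Backlog"] > 500:
        return "Supply Chain Disruption"
    if row["PricePositioning"] > 120:
        return "Pricing Issue"
    if row["SellTo"] < 0.5 * row["TargetQty"]:
        return "Sales Performance Gap"
    if row["SellThru"] < 0.6 * row["SellTo"]:
        return "Sell-Through Bottleneck"
    return "Unclassified"

example = pd.DataFrame({
    "WeeksOfStockT1":   [15, 5, 5, 5, 5, 5],
    "Backlog":          [100, 800, 100, 100, 100, 100],
    "PricePositioning": [100, 100, 130, 100, 100, 100],
    "SellTo":           [500, 500, 500, 200, 500, 500],
    "TargetQty":        [600, 600, 600, 600, 600, 600],
    "SellThru":         [450, 450, 450, 180, 250, 450],
})
issues = example.apply(classify_issue, axis=1)
print(issues.value_counts())
```

Because the rules are evaluated in order, a record that matches several conditions receives only the first label, which is one reason a learned classifier may be preferable on real data.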

### 4.5 Generate Recommendations

In [0]:
# Generate recommendations
recommendations = detector.generate_recommendations(classified_anomalies)

# Display recommendations
recommendations[[
    'Global_Distributor_Group_Name', 'reporter_name', 'reporter_hq_id',
    'PRODUCT_GROUP_BMT', 'fiscal_year_quarter', 'product_number',
    'reporter_country_code', 'Issue_Type', 'Priority', 'Final_Recommendation',
]]

In [0]:
display(recommendations)

### 4.6 PCA Visualization and Analysis

In [0]:
# Reassign detector.data so the PCA steps below operate on the recommendations table
detector.data = recommendations

In [0]:
# Visualize using PCA
pca, pca_anomalies = detector.visualize_with_pca()

# Analyze PCA results
pca_analysis = detector.analyze_pca_results(pca, pca_anomalies)

# Print key insights
print(f"Total explained variance: {pca_analysis['total_explained_variance']:.2f}%")
if pca_analysis['silhouette_score'] is not None:
    print(f"Silhouette score: {pca_analysis['silhouette_score']:.3f}")
print(f"Separation quality: {pca_analysis['separation_quality']}")

print("\nTop features driving categorization:")
top_features = sorted(pca_analysis['feature_importance'], key=lambda x: x['Total_Importance'], reverse=True)[:5]
for feature in top_features:
    print(f"  - {feature['Feature']}: {feature['Total_Importance']:.3f}")
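The quantities reported above (explained variance, silhouette score, feature importance) can be reproduced in isolation with scikit-learn. The sketch below uses two synthetic groups as stand-ins for issue types. Defining importance as the summed absolute loadings is an assumption; the notebook does not say how `Total_Importance` is computed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two well-separated synthetic groups standing in for two issue types
X = np.vstack([rng.normal(0, 1, size=(30, 5)), rng.normal(4, 1, size=(30, 5))])
labels = np.array([0] * 30 + [1] * 30)

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Share of total variance captured by the two components, in percent
explained = pca.explained_variance_ratio_.sum() * 100
# Silhouette near 1 means the labelled groups are compact and well separated
score = silhouette_score(components, labels)
# Assumed importance measure: absolute loadings summed over both components
importance = np.abs(pca.components_).sum(axis=0)

print(f"Total explained variance: {explained:.2f}%")
print(f"Silhouette score: {score:.3f}")
print("Feature importance:", np.round(importance, 3))
```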

## 5. Save Models and Results

In [0]:
# Create output directory if it doesn't exist
path = ''  # Set this to your output directory before running; os.makedirs('') raises FileNotFoundError
os.makedirs(path, exist_ok=True)

# Save models
model_path = path + '/demo_supply_chain_model'
detector.save_models(model_path)
print(f"Models saved to {model_path}")

# Save recommendations to CSV
output_dir = '../supply-chain-anomaly-detection/data/processed'
os.makedirs(output_dir, exist_ok=True)
recommendations_path = os.path.join(output_dir, 'demo_recommendations.csv')
recommendations.to_csv(recommendations_path, index=False)
print(f"Recommendations saved to {recommendations_path}")

# Log to MLflow if available
if MLFLOW_AVAILABLE:
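    # Log metrics
    # NOTE: detector.data was reassigned to `recommendations` in section 4.6, so
    # len(detector.data) below counts recommendation rows, not all input records.
    metrics = {
        'anomaly_count': anomaly_count,
        'anomaly_percentage': anomaly_count/len(detector.data)*100
    }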
    
    # Add issue type counts
    for issue, count in issue_counts.items():
        metrics[f'issue_count_{issue}'] = count
    
    # Add PCA metrics if available
    if pca_analysis:
        metrics['pca_explained_variance'] = pca_analysis['total_explained_variance']
        if pca_analysis['silhouette_score'] is not None:
            metrics['silhouette_score'] = pca_analysis['silhouette_score']
    
    mlflow_utils.log_metrics(metrics)
    
    # Log recommendations as artifact
    mlflow_utils.log_dataframe(recommendations, 'recommendations', 'csv')
    
    # Log model
    mlflow_utils.log_model(detector, "supply_chain_detector")
    
    print("Results logged to MLflow")

## 6. Test Model Loading

In [0]:

# Load the saved model
loaded_detector = SupplyChainIssueDetection.load_models(model_path)
print("Model loaded successfully")

# Test on new data - we'll use the same data for demo purposes
# In real scenarios, you would use new, unseen data
data, new_recommendations = loaded_detector.process_new_data(sample_data_path)

print(f"Processed new data: Found {len(new_recommendations)} anomalies")
new_recommendations[['SKU', 'Country', 'issue_type', 'Final_Recommendation']].head()

In [0]:
expected_features = loaded_detector.preprocessor.numerical_features
expected_features
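Checking `numerical_features` after loading is useful because a reloaded model only behaves correctly when new data supplies the same features in the same order. The sketch below illustrates the general idea with `joblib` and a plain `IsolationForest`. The bundle layout and file name are hypothetical and do not reflect how `save_models` and `load_models` actually store their state.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

feature_names = ["SellThruToRatio", "InventoryTurnoverRate",
                 "TargetAchievement", "AgedInventoryPct"]
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=feature_names)
model = IsolationForest(contamination=0.1, random_state=42).fit(X)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_file = os.path.join(tmp_dir, "demo_model.joblib")  # hypothetical file name
    # Store the expected feature list next to the model
    joblib.dump({"model": model, "feature_names": feature_names}, model_file)
    restored = joblib.load(model_file)

# Select and order columns exactly as the model saw them during training
new_data = X[restored["feature_names"]]
same_predictions = np.array_equal(model.predict(X), restored["model"].predict(new_data))
print(f"Predictions identical after reload: {same_predictions}")
```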

## 7. End MLflow Run (if started)

In [0]:
# End MLflow run if it was started
if MLFLOW_AVAILABLE and 'mlflow_run' in locals():
    mlflow_run.__exit__(None, None, None)
    print("MLflow run ended")

## 8. Summary and Conclusions

In this notebook, we've demonstrated the complete workflow of the Supply Chain Anomaly Detection system:

1. **Data Processing**: We loaded and preprocessed supply chain data, creating engineered features to improve anomaly detection.

2. **Anomaly Detection**: Using an ensemble approach that combines multiple algorithms (Isolation Forest, Local Outlier Factor (LOF), and One-Class SVM), we identified anomalous patterns in the data.

3. **Issue Classification**: We categorized the detected anomalies into specific issue types such as Inventory Imbalance, Sales Performance Gap, Pricing Issue, etc.

4. **Recommendation Generation**: For each anomaly, we generated tailored recommendations to address the underlying issues.

5. **Visualization and Analysis**: We created visualizations to understand anomalies and their patterns, and used PCA to validate issue categorization quality.

6. **Model Management**: We saved the models for future use and demonstrated how to load them for processing new data.

7. **MLflow Integration**: We showed how MLflow can be used to track experiments, metrics, and models.

This end-to-end workflow demonstrates the capabilities of the Supply Chain Anomaly Detection system for identifying and addressing issues in supply chain operations.