# Distribution Analysis

This notebook explores distribution shape, quantiles, and risk quantification using the RustLab ecosystem. It demonstrates techniques for characterizing data patterns, tail behavior, and financial risk.

## Learning Objectives

- **Shape Analysis**: Skewness, kurtosis, and distribution classification
- **Quantile Analysis**: Percentiles, quartiles, and robust statistics
- **Risk Metrics**: Value at Risk (VaR), Conditional VaR, and tail risk measures
- **Outlier Detection**: Statistical methods for identifying anomalous observations
- **Financial Applications**: Portfolio assessment, return distributions, and risk management

## Mathematical Foundation

Distribution analysis relies on higher-order moments and order statistics:

- **Skewness (γ₁)**: Third standardized moment measuring asymmetry
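- **Excess Kurtosis (γ₂)**: Fourth standardized moment minus 3, measuring tail heaviness relative to a normal distribution
- **Quantiles**: Cut points that divide a distribution into intervals of equal probability, estimated from order statistics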
- **Risk Measures**: Quantile-based metrics for downside risk assessment
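
As a minimal sketch of these definitions, the helper below computes skewness and excess kurtosis from population (biased) central moments using only the standard library. The name `shape_moments` is hypothetical and not part of rustlab-stats; the library's `skewness()` and `kurtosis()` methods may apply sample-size corrections, so their values can differ from this sketch.

```rust
/// Population (biased) skewness and excess kurtosis from central moments.
/// Hypothetical helper for illustration; not part of rustlab-stats.
/// Returns None for fewer than two points or zero variance.
fn shape_moments(data: &[f64]) -> Option<(f64, f64)> {
    if data.len() < 2 {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;

    // Accumulate the second, third, and fourth central moments.
    let (mut m2, mut m3, mut m4) = (0.0_f64, 0.0_f64, 0.0_f64);
    for &x in data {
        let d = x - mean;
        m2 += d * d;
        m3 += d * d * d;
        m4 += d * d * d * d;
    }
    m2 /= n;
    m3 /= n;
    m4 /= n;

    if m2 == 0.0 {
        return None; // all values identical: shape is undefined
    }

    let skewness = m3 / m2.powf(1.5); // third standardized moment
    let excess_kurtosis = m4 / (m2 * m2) - 3.0; // fourth standardized moment minus 3
    Some((skewness, excess_kurtosis))
}
```

For the evenly spaced values 1.0 through 10.0, this sketch gives a skewness of 0 and an excess kurtosis of about -1.22, the light-tailed signature of a uniform-like sample.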

In [2]:
// 📦 Setup: Dependencies and Imports
:dep rustlab-stats = { path = ".." }
:dep rustlab-math = { path = "../../rustlab-math" }
:dep rustlab-plotting = { path = "../../rustlab-plotting" }

// Global imports - these persist across all cells
use rustlab_stats::prelude::*;
use rustlab_math::*;
use rustlab_plotting::*;

// Test that everything is working
{
    let test_data = vec64![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
    let skew_val = test_data.skewness();
    let kurt_val = test_data.kurtosis();
    let median_val = test_data.median();
    
    let setup_msg = format!("🎯 Setup complete! Test skewness: {:.3}, kurtosis: {:.3}, median: {:.1}", skew_val, kurt_val, median_val);
    println!("{}", setup_msg);
    println!("📊 Ready for comprehensive distribution analysis");
}

🎯 Setup complete! Test skewness: 0.000, kurtosis: -4.200, median: 5.5
📊 Ready for comprehensive distribution analysis


()

## 1. Shape Analysis: Skewness and Kurtosis

Understanding distribution asymmetry and tail behavior through higher-order moments.

In [3]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    println!("🔬 Distribution Shape Analysis: Skewness and Kurtosis");
    println!("{}", "=".repeat(55));
    
    // Create different distribution shapes for comparison
    
    // Symmetric distribution (evenly spaced values, so tails are lighter than normal)
    let symmetric_data = vec64![12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0];
    
    // Right-skewed distribution (income-like)
    let right_skewed = vec64![25000.0, 30000.0, 32000.0, 35000.0, 38000.0, 42000.0, 45000.0, 120000.0, 150000.0, 200000.0];
    
    // Left-skewed distribution (test scores on easy exam)
    let left_skewed = vec64![45.0, 65.0, 75.0, 82.0, 85.0, 88.0, 90.0, 92.0, 94.0, 96.0, 98.0];
    
    // Heavy-tailed distribution (financial returns with extreme events)
    let heavy_tailed = vec64![-0.15, -0.08, -0.03, -0.01, 0.01, 0.02, 0.03, 0.05, 0.08, 0.12, -0.20, 0.18];
    
    println!("📊 Dataset 1: Symmetric Distribution (Normal-like)");
    
    let sym_skew = symmetric_data.skewness();
    let sym_kurt = symmetric_data.kurtosis();
    let sym_kurt_raw = symmetric_data.kurtosis_raw();
    
    let sym_summary = format!("   Skewness: {:.3} (close to 0 = symmetric)", sym_skew);
    println!("{}", sym_summary);
    let sym_kurt_msg = format!("   Excess Kurtosis: {:.3} (close to 0 = normal-like tails)", sym_kurt);
    println!("{}", sym_kurt_msg);
    let sym_kurt_raw_msg = format!("   Raw Kurtosis: {:.3} (close to 3 = normal reference)", sym_kurt_raw);
    println!("{}", sym_kurt_raw_msg);
    
    // Interpret skewness
    let skew_interp = match sym_skew.abs() {
        s if s < 0.5 => "approximately symmetric",
        s if s < 1.0 => "moderately skewed",
        _ => "highly skewed",
    };
    let sym_interp = format!("   Interpretation: Distribution is {}", skew_interp);
    println!("{}", sym_interp);
    
    println!();
    println!("📊 Dataset 2: Right-Skewed Distribution (Income Data)");
    
    let right_skew = right_skewed.skewness();
    let right_kurt = right_skewed.kurtosis();
    let right_mean = right_skewed.mean();
    let right_median = right_skewed.median();
    
    let right_summary = format!("   Skewness: {:.3} (positive = right tail)", right_skew);
    println!("{}", right_summary);
    let right_kurt_msg = format!("   Excess Kurtosis: {:.3}", right_kurt);
    println!("{}", right_kurt_msg);
    let right_central = format!("   Mean: ${:.0}, Median: ${:.0} (mean > median)", right_mean, right_median);
    println!("{}", right_central);
    
    if right_skew > 0.5 {
        println!("   → Classic income distribution: few high earners create right tail");
    }
    
    println!();
    println!("📊 Dataset 3: Left-Skewed Distribution (Easy Test Scores)");
    
    let left_skew = left_skewed.skewness();
    let left_kurt = left_skewed.kurtosis();
    let left_mean = left_skewed.mean();
    let left_median = left_skewed.median();
    
    let left_summary = format!("   Skewness: {:.3} (negative = left tail)", left_skew);
    println!("{}", left_summary);
    let left_kurt_msg = format!("   Excess Kurtosis: {:.3}", left_kurt);
    println!("{}", left_kurt_msg);
    let left_central = format!("   Mean: {:.1}, Median: {:.1} (mean < median)", left_mean, left_median);
    println!("{}", left_central);
    
    if left_skew < -0.5 { // same magnitude threshold as the right-skew check
        println!("   → Easy exam pattern: most students score high, few score low");
    }
    
    println!();
    println!("📊 Dataset 4: Heavy-Tailed Distribution (Financial Returns)");
    
    let heavy_skew = heavy_tailed.skewness();
    let heavy_kurt = heavy_tailed.kurtosis();
    
    let heavy_summary = format!("   Skewness: {:.3}", heavy_skew);
    println!("{}", heavy_summary);
    let heavy_kurt_msg = format!("   Excess Kurtosis: {:.3}", heavy_kurt);
    println!("{}", heavy_kurt_msg);
    
    // Interpret kurtosis for financial risk
    let tail_risk = match heavy_kurt {
        k if k > 1.0 => "High tail risk - extreme events more likely",
        k if k > 0.5 => "Moderate tail risk - some extreme events",
        k if k > -0.5 => "Normal tail risk - typical for financial data",
        _ => "Low tail risk - fewer extreme events than normal",
    };
    let risk_msg = format!("   Risk Assessment: {}", tail_risk);
    println!("{}", risk_msg);
    
    if heavy_kurt > 1.0 {
        println!("   → Warning: Fat tails detected - consider risk management strategies");
    }
    
    println!();
    println!("📈 Shape Analysis Summary:");
    println!("{}", "-".repeat(25));
    println!("🔵 Skewness Interpretation:");
    println!("   • γ₁ ≈ 0: Symmetric distribution");
    println!("   • γ₁ > 0: Right-skewed (longer right tail)");
    println!("   • γ₁ < 0: Left-skewed (longer left tail)");
    println!("   • |γ₁| > 1: Highly skewed, consider transformation");
    
    println!();
    println!("🔴 Kurtosis Interpretation:");
    println!("   • γ₂ ≈ 0: Normal-like tails (mesokurtic)");
    println!("   • γ₂ > 0: Heavy tails (leptokurtic) - more extreme events");
    println!("   • γ₂ < 0: Light tails (platykurtic) - fewer extreme events");
    println!("   • γ₂ > 2: Very heavy tails - significant risk");
    
    println!();
    println!("💡 Key Insight: Shape analysis reveals risk patterns invisible to mean/variance alone!");
}

🔬 Distribution Shape Analysis: Skewness and Kurtosis
📊 Dataset 1: Symmetric Distribution (Normal-like)
   Skewness: 0.000 (close to 0 = symmetric)
   Excess Kurtosis: -4.200 (close to 0 = normal-like tails)
   Raw Kurtosis: -1.200 (close to 3 = normal reference)
   Interpretation: Distribution is approximately symmetric

📊 Dataset 2: Right-Skewed Distribution (Income Data)
   Skewness: 1.366 (positive = right tail)
   Excess Kurtosis: -2.457
   Mean: $71700, Median: $40000 (mean > median)
   → Classic income distribution: few high earners create right tail

📊 Dataset 3: Left-Skewed Distribution (Easy Test Scores)
   Skewness: -1.573 (negative = left tail)
   Excess Kurtosis: -0.626
   Mean: 82.7, Median: 88.0 (mean < median)
   → Easy exam pattern: most students score high, few score low

📊 Dataset 4: Heavy-Tailed Distribution (Financial Returns)
   Skewness: -0.380
   Excess Kurtosis: -2.917
   Risk Assessment: Low tail risk - fewer extreme events than normal

📈 Shape Analysis Summary

()

## 2. Quantile Analysis and Risk Metrics

Exploring percentiles, quartiles, and financial risk measures using order statistics.
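
As a rough sketch of how a percentile can be computed from order statistics, the function below sorts the data and interpolates linearly between the two nearest ranks (the rule used by default in NumPy and R). The name `percentile_linear` is hypothetical, and it is an assumption that `percentile(p, None)` in rustlab-stats follows the same rule.

```rust
/// Percentile by linear interpolation between order statistics.
/// Hypothetical helper for illustration; not part of rustlab-stats.
/// `p` is given in percent (0.0 to 100.0).
fn percentile_linear(data: &[f64], p: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional rank in [0, n - 1].
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;

    // Blend the two neighbouring order statistics.
    Some(sorted[lo] + frac * (sorted[hi] - sorted[lo]))
}
```

With four values 1.0, 2.0, 3.0, 4.0, the median falls halfway between the second and third order statistics (2.5), and the 25th percentile lands three quarters of the way from 1.0 to 2.0 (1.75).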

In [4]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    println!("🔬 Quantile Analysis and Financial Risk Metrics");
    println!("{}", "=".repeat(50));
    
    // Portfolio daily returns (hypothetical data with realistic patterns)
    let portfolio_returns = vec64![
        0.012, 0.008, -0.005, 0.015, 0.003, -0.002, 0.018, 0.001, -0.008, 0.025,
        -0.012, 0.007, 0.022, -0.015, 0.009, 0.004, -0.020, 0.011, 0.006, -0.003,
        0.013, -0.007, 0.019, 0.002, -0.025, 0.016, 0.005, -0.011, 0.008, 0.014,
        -0.018, 0.021, 0.003, -0.009, 0.017, 0.001, -0.013, 0.024, -0.004, 0.010,
        0.007, -0.016, 0.012, 0.020, -0.006, 0.015, 0.009, -0.022, 0.018, -0.001
    ];
    
    println!("📊 Portfolio Return Distribution Analysis:");
    
    // Basic statistics
    let mean_return = portfolio_returns.mean();
    let vol = portfolio_returns.std(None);
    let median_return = portfolio_returns.median();
    
    let basic_stats = format!("   Mean Return: {:.3}% daily ({:.1}% annualized)", mean_return * 100.0, mean_return * 252.0 * 100.0);
    println!("{}", basic_stats);
    let vol_stats = format!("   Volatility: {:.3}% daily ({:.1}% annualized)", vol * 100.0, vol * (252.0_f64).sqrt() * 100.0);
    println!("{}", vol_stats);
    let median_stats = format!("   Median Return: {:.3}%", median_return * 100.0);
    println!("{}", median_stats);
    
    println!();
    println!("📈 Quartile Analysis:");
    
    // Compute quartiles
    let (q1, q2, q3) = portfolio_returns.quartiles();
    let iqr = portfolio_returns.iqr();
    
    let q1_msg = format!("   Q1 (25th percentile): {:.3}% (25% of days worse than this)", q1 * 100.0);
    println!("{}", q1_msg);
    let q2_msg = format!("   Q2 (Median): {:.3}% (half of days above/below)", q2 * 100.0);
    println!("{}", q2_msg);
    let q3_msg = format!("   Q3 (75th percentile): {:.3}% (75% of days worse than this)", q3 * 100.0);
    println!("{}", q3_msg);
    let iqr_msg = format!("   IQR: {:.3}% (middle 50% range)", iqr * 100.0);
    println!("{}", iqr_msg);
    
    println!();
    println!("💰 Value at Risk (VaR) Analysis:");
    
    // Calculate VaR at different confidence levels using percentiles
    let var_95 = portfolio_returns.percentile(5.0, None);   // 95% VaR (5th percentile)
    let var_99 = portfolio_returns.percentile(1.0, None);   // 99% VaR (1st percentile)
    let var_90 = portfolio_returns.percentile(10.0, None);  // 90% VaR (10th percentile)
    
    let var95_msg = format!("   95% VaR: {:.3}% (5% chance of losing more than this daily)", var_95 * 100.0);
    println!("{}", var95_msg);
    let var99_msg = format!("   99% VaR: {:.3}% (1% chance of losing more than this daily)", var_99 * 100.0);
    println!("{}", var99_msg);
    let var90_msg = format!("   90% VaR: {:.3}% (10% chance of losing more than this daily)", var_90 * 100.0);
    println!("{}", var90_msg);
    
    // Calculate Conditional VaR (Expected Shortfall)
    // CVaR is the average of losses beyond VaR
    let returns_sorted = {
        let mut sorted = portfolio_returns.clone();
        let slice = sorted.as_mut_slice_unchecked();
        slice.sort_by(|a, b| a.total_cmp(b)); // total order, so a NaN cannot cause a panic
        sorted
    };
    
    // CVaR 95%: average of worst 5% of returns
    let worst_5_percent_count = (portfolio_returns.len() as f64 * 0.05).ceil() as usize;
    let worst_returns = &returns_sorted.as_slice_unchecked()[..worst_5_percent_count];
    let cvar_95 = worst_returns.iter().sum::<f64>() / worst_returns.len() as f64;
    
    println!();
    println!("🚨 Conditional Value at Risk (CVaR / Expected Shortfall):");
    let cvar_msg = format!("   95% CVaR: {:.3}% (expected loss when in worst 5% of outcomes)", cvar_95 * 100.0);
    println!("{}", cvar_msg);
    
    if cvar_95.abs() > var_95.abs() * 1.2 {
        println!("   → Warning: Tail losses are significantly worse than VaR suggests");
    }
    
    println!();
    println!("📊 Performance Percentiles:");
    
    // Performance analysis using various percentiles
    let p10 = portfolio_returns.percentile(10.0, None);
    let p25 = portfolio_returns.percentile(25.0, None);
    let p75 = portfolio_returns.percentile(75.0, None);
    let p90 = portfolio_returns.percentile(90.0, None);
    let p95 = portfolio_returns.percentile(95.0, None);
    let p99 = portfolio_returns.percentile(99.0, None);
    
    let perf_10 = format!("   P10: {:.3}% (worst 10% threshold)", p10 * 100.0);
    println!("{}", perf_10);
    let perf_90 = format!("   P90: {:.3}% (best 10% threshold)", p90 * 100.0);
    println!("{}", perf_90);
    let perf_95 = format!("   P95: {:.3}% (best 5% threshold)", p95 * 100.0);
    println!("{}", perf_95);
    let perf_99 = format!("   P99: {:.3}% (best 1% threshold)", p99 * 100.0);
    println!("{}", perf_99);
    
    // Risk-Return Assessment
    println!();
    println!("⚖️ Risk-Return Assessment:");
    
    let upside_potential = p90 - median_return;
    let downside_risk = median_return - p10;
    let risk_reward_ratio = upside_potential / downside_risk;
    
    let upside_msg = format!("   Upside Potential (P90 - Median): {:.3}%", upside_potential * 100.0);
    println!("{}", upside_msg);
    let downside_msg = format!("   Downside Risk (Median - P10): {:.3}%", downside_risk * 100.0);
    println!("{}", downside_msg);
    let ratio_msg = format!("   Risk-Reward Ratio: {:.2}", risk_reward_ratio);
    println!("{}", ratio_msg);
    
    if risk_reward_ratio > 1.0 {
        println!("   → Favorable risk-reward profile (upside > downside)");
    } else {
        println!("   → Unfavorable risk-reward profile (downside > upside)");
    }
    
    println!();
    println!("💡 Key Insight: Quantile-based risk metrics provide robust, non-parametric risk assessment!");
}

🔬 Quantile Analysis and Financial Risk Metrics
📊 Portfolio Return Distribution Analysis:
   Mean Return: 0.356% daily (89.7% annualized)
   Volatility: 1.301% daily (20.7% annualized)
   Median Return: 0.550%

📈 Quartile Analysis:
   Q1 (25th percentile): -0.575% (25% of days worse than this)
   Q2 (Median): 0.550% (half of days above/below)
   Q3 (75th percentile): 1.375% (75% of days worse than this)
   IQR: 1.950% (middle 50% range)

💰 Value at Risk (VaR) Analysis:
   95% VaR: -1.910% (5% chance of losing more than this daily)
   99% VaR: -2.353% (1% chance of losing more than this daily)
   90% VaR: -1.510% (10% chance of losing more than this daily)

🚨 Conditional Value at Risk (CVaR / Expected Shortfall):
   95% CVaR: -2.233% (expected loss when in worst 5% of outcomes)

📊 Performance Percentiles:
   P10: -1.510% (worst 10% threshold)
   P90: 1.910% (best 10% threshold)
   P95: 2.155% (best 5% threshold)
   P99: 2.451% (best 1% threshold)

⚖️ Risk-Return Assessment:
   Upside Pot

()
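
The VaR and CVaR steps in the cell above can be gathered into one reusable function. The following is a minimal sketch using only the standard library; `historical_var_cvar` is a hypothetical name, the VaR uses linear interpolation between order statistics (an assumption about the quantile rule), and the CVaR averages the worst `ceil(tail * n)` returns as the cell does.

```rust
/// Historical VaR and CVaR for a tail probability such as 0.05 (95% confidence).
/// Hypothetical helper for illustration; not part of rustlab-stats.
/// Both values are expressed as returns, so losses are negative.
fn historical_var_cvar(returns: &[f64], tail: f64) -> Option<(f64, f64)> {
    if returns.is_empty() || !(tail > 0.0 && tail < 1.0) {
        return None;
    }
    let mut sorted = returns.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b)); // worst returns first
    let n = sorted.len();

    // VaR: the tail-quantile, interpolated between neighbouring order statistics.
    let rank = tail * (n - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let var = sorted[lo] + (rank - lo as f64) * (sorted[hi] - sorted[lo]);

    // CVaR: mean of the worst ceil(tail * n) returns (at least one observation).
    let k = ((tail * n as f64).ceil() as usize).clamp(1, n);
    let cvar = sorted[..k].iter().sum::<f64>() / k as f64;

    Some((var, cvar))
}
```

Called with the 50 portfolio returns above and a tail probability of 0.05, this sketch gives a VaR of -1.910% and a CVaR of -2.233%, in line with the recorded output. Because the CVaR averages the worst observations, it is generally at least as severe as the VaR at the same level.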

## 3. Outlier Detection and Data Quality Assessment

Using statistical methods to identify anomalous observations and assess data quality.

In [5]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    println!("🔬 Outlier Detection and Data Quality Assessment");
    println!("{}", "=".repeat(50));
    
    // Manufacturing quality control data with some outliers
    let measurements = vec64![
        // Normal measurements
        10.1, 10.0, 9.9, 10.2, 9.8, 10.3, 9.7, 10.1, 10.0, 9.9,
        10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.2, 10.0, 9.8,
        // Some outliers (measurement errors or process issues)
        12.5, 10.1, 9.9, 7.2, 10.0, 10.2, 15.8, 9.8, 10.1, 6.1
    ];
    
    println!("📊 Dataset: Manufacturing Quality Control Measurements");
    
    // Basic statistics
    let mean_val = measurements.mean();
    let median_val = measurements.median();
    let std_val = measurements.std(None);
    let mad_val = measurements.mad();
    
    let basic_mean = format!("   Mean: {:.2} (affected by outliers)", mean_val);
    println!("{}", basic_mean);
    let basic_median = format!("   Median: {:.2} (robust to outliers)", median_val);
    println!("{}", basic_median);
    let basic_std = format!("   Std Dev: {:.2} (inflated by outliers)", std_val);
    println!("{}", basic_std);
    let basic_mad = format!("   MAD: {:.2} (robust spread measure)", mad_val);
    println!("{}", basic_mad);
    
    println!();
    println!("🎯 Method 1: IQR-Based Outlier Detection (Tukey's Method)");
    
    // IQR method for outlier detection
    let (q1, _q2, q3) = measurements.quartiles(); // the median is not needed for the fences
    let iqr = q3 - q1;
    let lower_fence = q1 - 1.5 * iqr;
    let upper_fence = q3 + 1.5 * iqr;
    
    let iqr_msg = format!("   Q1: {:.2}, Q3: {:.2}, IQR: {:.2}", q1, q3, iqr);
    println!("{}", iqr_msg);
    let fences_msg = format!("   Lower Fence: {:.2}, Upper Fence: {:.2}", lower_fence, upper_fence);
    println!("{}", fences_msg);
    
    // Identify outliers
    let mut iqr_outliers = Vec::new();
    let mut iqr_normal = Vec::new();
    
    for &value in measurements.as_slice_unchecked() {
        if value < lower_fence || value > upper_fence {
            iqr_outliers.push(value);
        } else {
            iqr_normal.push(value);
        }
    }
    
    let outlier_count = format!("   Outliers found: {} out of {} measurements", iqr_outliers.len(), measurements.len());
    println!("{}", outlier_count);
    
    if !iqr_outliers.is_empty() {
        println!("   Outlier values: {:?}", iqr_outliers);
    }
    
    println!();
    println!("🎯 Method 2: Z-Score Based Outlier Detection");
    
    // Z-score method (traditional but sensitive to outliers)
    let z_threshold = 2.5; // Common threshold for outlier detection
    let mut zscore_outliers = Vec::new();
    
    for &value in measurements.as_slice_unchecked() {
        let z_score = (value - mean_val) / std_val;
        if z_score.abs() > z_threshold {
            zscore_outliers.push((value, z_score));
        }
    }
    
    let z_threshold_msg = format!("   Z-score threshold: ±{:.1}", z_threshold);
    println!("{}", z_threshold_msg);
    let z_outliers_msg = format!("   Outliers found: {} measurements", zscore_outliers.len());
    println!("{}", z_outliers_msg);
    
    if !zscore_outliers.is_empty() {
        for (value, z_score) in &zscore_outliers {
            let z_detail = format!("     Value: {:.2}, Z-score: {:.2}", value, z_score);
            println!("{}", z_detail);
        }
    }
    
    println!();
    println!("🎯 Method 3: Modified Z-Score (Robust Method)");
    
    // Modified Z-score using median and MAD (more robust)
    let modified_threshold = 3.5; // Common threshold for modified Z-score
    let mut modified_outliers = Vec::new();
    
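    // The modified Z-score is undefined when MAD is zero (over half the values
    // equal the median), so only scan when the MAD is positive.
    if mad_val > 0.0 {
        for &value in measurements.as_slice_unchecked() {
            // 0.6745 rescales MAD to be comparable to the std dev for normal data
            let modified_z = 0.6745 * (value - median_val) / mad_val;
            if modified_z.abs() > modified_threshold {
                modified_outliers.push((value, modified_z));
            }
        }
    }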
    
    let mod_threshold_msg = format!("   Modified Z-score threshold: ±{:.1}", modified_threshold);
    println!("{}", mod_threshold_msg);
    let mod_outliers_msg = format!("   Outliers found: {} measurements", modified_outliers.len());
    println!("{}", mod_outliers_msg);
    
    if !modified_outliers.is_empty() {
        for (value, mod_z) in &modified_outliers {
            let mod_detail = format!("     Value: {:.2}, Modified Z-score: {:.2}", value, mod_z);
            println!("{}", mod_detail);
        }
    }
    
    println!();
    println!("📈 Data Quality Assessment:");
    
    // Calculate outlier percentage
    let outlier_percentage = (iqr_outliers.len() as f64 / measurements.len() as f64) * 100.0;
    let quality_pct = format!("   Outlier rate: {:.1}% using IQR method", outlier_percentage);
    println!("{}", quality_pct);
    
    // Data quality classification
    let quality_assessment = match outlier_percentage {
        p if p < 5.0 => "Excellent - typical outlier rate",
        p if p < 10.0 => "Good - moderate outlier rate",
        p if p < 20.0 => "Fair - high outlier rate, investigate process",
        _ => "Poor - excessive outliers, process control needed",
    };
    let quality_msg = format!("   Quality Assessment: {}", quality_assessment);
    println!("{}", quality_msg);
    
    // Impact of outliers on central tendency
    let iqr_normal_vec = VectorF64::from_slice(&iqr_normal);
    let clean_mean = iqr_normal_vec.mean();
    let mean_impact = ((mean_val - clean_mean) / clean_mean * 100.0).abs();
    
    let impact_msg = format!("   Outlier impact on mean: {:.1}% distortion", mean_impact);
    println!("{}", impact_msg);
    let clean_comparison = format!("   Clean data mean: {:.2} vs Outlier-affected mean: {:.2}", clean_mean, mean_val);
    println!("{}", clean_comparison);
    
    println!();
    println!("🛠️ Outlier Detection Method Comparison:");
    println!("{}", "-".repeat(35));
    println!("🔵 IQR Method (Tukey's):");
    println!("   • Pros: Robust, distribution-free, standard for box plots");
    println!("   • Cons: May miss mild outliers in small samples");
    println!("   • Best for: Exploratory analysis, visual inspection");
    
    println!();
    println!("🟡 Z-Score Method:");
    println!("   • Pros: Easy to interpret, works well for normal data");
    println!("   • Cons: Sensitive to outliers (masking effect)");
    println!("   • Best for: Normal distributions, theoretical work");
    
    println!();
    println!("🟢 Modified Z-Score (MAD-based):");
    println!("   • Pros: Robust to outliers; median and MAD are barely moved by extreme values");
    println!("   • Cons: Less familiar; undefined when MAD is zero");
    println!("   • Best for: Real-world data, process monitoring");
    
    println!();
    println!("💡 Key Insight: Use multiple methods and domain knowledge for robust outlier detection!");
}

🔬 Outlier Detection and Data Quality Assessment
📊 Dataset: Manufacturing Quality Control Measurements
   Mean: 10.06 (affected by outliers)
   Median: 10.00 (robust to outliers)
   Std Dev: 1.48 (inflated by outliers)
   MAD: 0.20 (robust spread measure)

🎯 Method 1: IQR-Based Outlier Detection (Tukey's Method)
   Q1: 9.83, Q3: 10.17, IQR: 0.35
   Lower Fence: 9.30, Upper Fence: 10.70
   Outliers found: 4 out of 30 measurements
   Outlier values: [12.5, 7.2, 15.8, 6.1]

🎯 Method 2: Z-Score Based Outlier Detection
   Z-score threshold: ±2.5
   Outliers found: 2 measurements
     Value: 15.80, Z-score: 3.88
     Value: 6.10, Z-score: -2.67

🎯 Method 3: Modified Z-Score (Robust Method)
   Modified Z-score threshold: ±3.5
   Outliers found: 4 measurements
     Value: 12.50, Modified Z-score: 8.43
     Value: 7.20, Modified Z-score: -9.44
     Value: 15.80, Modified Z-score: 19.56
     Value: 6.10, Modified Z-score: -13.15

📈 Data Quality Assessment:
   Outlier rate: 13.3% using IQR method


()
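
The modified Z-score from Method 3 can also be written without any library support. The sketch below is a standard-library-only illustration; `median_of` and `modified_z_outliers` are hypothetical names, and the MAD here is the unscaled median absolute deviation, which is assumed to match what `mad()` returns.

```rust
/// Median of a slice; None when the slice is empty.
/// Hypothetical helper for illustration; not part of rustlab-stats.
fn median_of(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Returns (value, modified z-score) for every point whose absolute
/// modified z-score exceeds `threshold`.
fn modified_z_outliers(data: &[f64], threshold: f64) -> Vec<(f64, f64)> {
    let med = match median_of(data) {
        Some(m) => m,
        None => return Vec::new(),
    };
    // Unscaled MAD: median of absolute deviations from the median.
    let deviations: Vec<f64> = data.iter().map(|x| (x - med).abs()).collect();
    let mad = median_of(&deviations).unwrap_or(0.0);
    if mad == 0.0 {
        // Over half the values equal the median, so the score is undefined.
        return Vec::new();
    }
    data.iter()
        .map(|&x| (x, 0.6745 * (x - med) / mad))
        .filter(|(_, z)| z.abs() > threshold)
        .collect()
}
```

Returning an empty list when the MAD is zero is one possible design choice; a caller that needs to distinguish "no outliers" from "score undefined" could return an `Option` instead.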

## 4. Summary and Best Practices

Comprehensive guidelines for distribution analysis and practical recommendations.

In [6]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    println!("🎯 Distribution Analysis: Summary and Best Practices");
    println!("{}", "=".repeat(55));
    
    // Comprehensive example demonstrating all concepts
    let real_world_data = vec64![
        // Core data cluster (30 of 42 observations, about 71%)
        45.2, 46.1, 44.8, 47.3, 45.9, 46.7, 44.5, 47.8, 45.1, 46.3,
        46.8, 45.4, 47.1, 44.9, 46.5, 45.7, 47.0, 44.7, 46.2, 45.8,
        46.9, 45.3, 47.4, 44.6, 46.6, 45.5, 47.2, 45.0, 46.4, 46.0,
        // Right tail (8 of 42, about 19% - higher values)
        48.5, 49.2, 50.1, 48.8, 49.7, 51.3, 48.9, 49.5,
        // Extreme outliers (4 of 42, about 10% - anomalous values)
        55.8, 58.2, 42.1, 39.5
    ];
    
    println!("📊 Complete Distribution Analysis Workflow:");
    println!("{}", "-".repeat(40));
    
    // Step 1: Shape characterization
    println!("1. 📈 Shape Characterization:");
    let skew = real_world_data.skewness();
    let kurt = real_world_data.kurtosis();
    let mean_val = real_world_data.mean();
    let median_val = real_world_data.median();
    
    let shape_skew = format!("   Skewness: {:.3} → Distribution is {}", skew, 
        if skew.abs() < 0.5 { "approximately symmetric" }
        else if skew > 0.0 { "right-skewed" }
        else { "left-skewed" });
    println!("{}", shape_skew);
    
    let shape_kurt = format!("   Kurtosis: {:.3} → Tails are {}", kurt,
        if kurt.abs() < 0.5 { "normal-like" }
        else if kurt > 0.0 { "heavier than normal" }
        else { "lighter than normal" });
    println!("{}", shape_kurt);
    
    let central_comparison = format!("   Mean: {:.2}, Median: {:.2} → {}", mean_val, median_val,
        if (mean_val - median_val).abs() < 0.1 { "Symmetric center" }
        else if mean_val > median_val { "Right tail pulls mean up" }
        else { "Left tail pulls mean down" });
    println!("{}", central_comparison);
    
    println!();
    
    // Step 2: Quantile analysis
    println!("2. 📊 Quantile-Based Analysis:");
    let (q1, q2, q3) = real_world_data.quartiles();
    let iqr = real_world_data.iqr();
    
    let quartile_summary = format!("   Quartiles: Q1={:.2}, Q2={:.2}, Q3={:.2}", q1, q2, q3);
    println!("{}", quartile_summary);
    let iqr_summary = format!("   IQR: {:.2} (middle 50% spread)", iqr);
    println!("{}", iqr_summary);
    
    // Risk percentiles
    let p5 = real_world_data.percentile(5.0, None);
    let p95 = real_world_data.percentile(95.0, None);
    let p99 = real_world_data.percentile(99.0, None);
    
    let risk_percentiles = format!("   Risk metrics: P5={:.2}, P95={:.2}, P99={:.2}", p5, p95, p99);
    println!("{}", risk_percentiles);
    
    println!();
    
    // Step 3: Outlier assessment
    println!("3. 🔍 Outlier Assessment:");
    let lower_fence = q1 - 1.5 * iqr;
    let upper_fence = q3 + 1.5 * iqr;
    
    let mut outlier_count = 0;
    let mut outlier_values = Vec::new();
    
    for &value in real_world_data.as_slice_unchecked() {
        if value < lower_fence || value > upper_fence {
            outlier_count += 1;
            outlier_values.push(value);
        }
    }
    
    let outlier_analysis = format!("   Outliers: {} found using IQR method ({:.1}%)", 
        outlier_count, (outlier_count as f64 / real_world_data.len() as f64) * 100.0);
    println!("{}", outlier_analysis);
    
    if !outlier_values.is_empty() {
        let extreme_values = format!("   Extreme values: {:?}", outlier_values);
        println!("{}", extreme_values);
    }
    
    println!();
    
    // Step 4: Risk assessment
    println!("4. ⚠️ Risk Assessment:");
    
    // Quartile (Bowley) skewness: compares the upper and lower halves of the IQR
    let lower_tail_length = q2 - q1;
    let upper_tail_length = q3 - q2;
    let tail_asymmetry = (upper_tail_length - lower_tail_length) / iqr;
    
    let tail_assessment = format!("   Tail asymmetry: {:.3} (0=symmetric, >0=right heavy, <0=left heavy)", tail_asymmetry);
    println!("{}", tail_assessment);
    
    // Risk classification: only heavy tails (positive excess kurtosis) or a high outlier rate raise the level
    let risk_level = match (kurt, outlier_count as f64 / real_world_data.len() as f64) {
        (k, o) if k > 2.0 || o > 0.1 => "High",
        (k, o) if k > 1.0 || o > 0.05 => "Moderate",
        _ => "Low",
    };
    
    let risk_classification = format!("   Overall risk level: {} (based on kurtosis and outlier rate)", risk_level);
    println!("{}", risk_classification);
    
    println!();
    
    // Best practices summary
    println!("🏆 Distribution Analysis Best Practices:");
    println!("{}", "-".repeat(35));
    
    println!("📊 Essential Analysis Steps:");
    println!("   1. Always examine shape (skewness, kurtosis) before assuming normality");
    println!("   2. Use robust statistics (median, MAD, IQR) for initial exploration");
    println!("   3. Apply multiple outlier detection methods for validation");
    println!("   4. Calculate risk metrics using quantiles (VaR, CVaR)");
    println!("   5. Compare classical vs robust measures to assess outlier impact");
    
    println!();
    println!("⚠️ Common Pitfalls to Avoid:");
    println!("   • Assuming normality without testing distribution shape");
    println!("   • Using only mean/std for skewed or heavy-tailed data");
    println!("   • Removing outliers without understanding their source");
    println!("   • Ignoring tail behavior in risk-sensitive applications");
    println!("   • Over-relying on single outlier detection method");
    
    println!();
    println!("🎯 Method Selection Guidelines:");
    println!("   📈 For shape analysis: Always compute skewness & kurtosis");
    println!("   📊 For central tendency: Use median for skewed data");
    println!("   📏 For spread: Use IQR/MAD for robust measurement");
    println!("   🔍 For outliers: Combine IQR and modified Z-score methods");
    println!("   💰 For risk: Use quantile-based VaR and CVaR measures");
    
    println!();
    println!("🚀 Next Steps: Apply these concepts to your domain-specific risk management needs!");
    println!("   • Financial: Portfolio risk assessment and stress testing");
    println!("   • Quality: Process control and capability analysis");
    println!("   • Medical: Reference intervals and diagnostic thresholds");
    println!("   • Operations: Performance monitoring and SLA management");
}

🎯 Distribution Analysis: Summary and Best Practices
📊 Complete Distribution Analysis Workflow:
----------------------------------------
1. 📈 Shape Characterization:
   Skewness: 1.436 → Distribution is right-skewed
   Kurtosis: 1.925 → Tails are heavier than normal
   Mean: 46.96, Median: 46.45 → Right tail pulls mean up

2. 📊 Quantile-Based Analysis:
   Quartiles: Q1=45.32, Q2=46.45, Q3=47.70
   IQR: 2.38 (middle 50% spread)
   Risk metrics: P5=44.51, P95=51.24, P99=57.22

3. 🔍 Outlier Assessment:
   Outliers: 4 found using IQR method (9.5%)
   Extreme values: [51.3, 55.8, 58.2, 39.5]

4. ⚠️ Risk Assessment:
   Tail asymmetry: 0.053 (0=symmetric, >0=right heavy, <0=left heavy)
   Overall risk level: Moderate (based on kurtosis and outlier rate)

🏆 Distribution Analysis Best Practices:
-----------------------------------
📊 Essential Analysis Steps:
   1. Always examine shape (skewness, kurtosis) before assuming normality
   2. Use robust statistics (median, MAD, IQR) for initial explor

()
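
The tail asymmetry figure in step 4 is the quartile (Bowley) skewness. A minimal sketch of that formula as a standalone function follows; `quartile_skewness` is a hypothetical name, and the guard for a non-positive IQR is an added assumption rather than something the cell above does.

```rust
/// Quartile (Bowley) skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1).
/// Hypothetical helper for illustration; not part of rustlab-stats.
/// The result lies in [-1, 1] when Q1 <= Q2 <= Q3; None when the IQR is not positive.
fn quartile_skewness(q1: f64, q2: f64, q3: f64) -> Option<f64> {
    let iqr = q3 - q1;
    if !(iqr > 0.0) {
        return None; // zero, negative, or NaN spread: the ratio is undefined
    }
    Some(((q3 - q2) - (q2 - q1)) / iqr)
}
```

Because it depends only on the quartiles, this measure describes asymmetry in the middle 50% of the data and is unaffected by extreme outliers, which makes it a useful complement to moment-based skewness.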