# Descriptive Statistics Showcase

A comprehensive exploration of advanced descriptive statistics using the RustLab ecosystem. This notebook demonstrates robust statistical measures, distribution analysis, and statistical visualization techniques.

## Learning Objectives

- **Advanced Descriptives**: Beyond mean and standard deviation
- **Robust Statistics**: Handling outliers and non-normal data
- **Distribution Analysis**: Shape, spread, and central tendency measures
- **Statistical Visualization**: Publication-ready plots for data exploration
- **Real-World Applications**: Financial, scientific, and quality control examples

## Mathematical Foundation

We'll explore statistics that provide deeper insights into data characteristics:

- **Location**: Mean, median, mode, trimmed mean
- **Spread**: Standard deviation, MAD, IQR, range
- **Shape**: Skewness (asymmetry), kurtosis (tail behavior)
- **Robustness**: Breakdown points and influence functions
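
For reference, here are the textbook forms of the spread and shape measures used below, with $\tilde{x}$ the median and $Q_1, Q_3$ the first and third quartiles. The library may apply small-sample corrections. The printed standard deviations match the $n-1$ form, and the printed skewness values appear to match the bias-adjusted $G_1$ rather than $g_1$. $g_2$ is excess kurtosis (0 for a normal distribution); whether rustlab-stats reports excess or raw kurtosis, and with which correction, is not shown in this notebook.

$$
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\mathrm{MAD} = \operatorname{median}_i \lvert x_i - \tilde{x} \rvert, \qquad
\mathrm{IQR} = Q_3 - Q_1
$$

$$
m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
G_1 = g_1\,\frac{\sqrt{n(n-1)}}{n-2}, \qquad
g_2 = \frac{m_4}{m_2^{2}} - 3
$$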

In [2]:
// 📦 Setup: Dependencies and Imports
:dep rustlab-stats = { path = ".." }
:dep rustlab-math = { path = "../../rustlab-math" }
:dep rustlab-plotting = { path = "../../rustlab-plotting" }

// Global imports - these persist across all cells
use rustlab_stats::prelude::*;
use rustlab_math::*;
use rustlab_plotting::*;

// Test that everything is working
{
    let test_data = vec64![1.0, 2.0, 3.0, 4.0, 5.0];
    let mean_val = test_data.mean();
    let median_val = test_data.median();
    
    let setup_msg = format!("🎯 Setup complete! Test mean: {:.1}, median: {:.1}", mean_val, median_val);
    println!("{}", setup_msg);
    println!("📊 Ready for statistical analysis with visualization");
}

🎯 Setup complete! Test mean: 3.0, median: 3.0
📊 Ready for statistical analysis with visualization

## 1. Basic vs Advanced Descriptive Statistics

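We start by comparing classical statistics (mean, standard deviation) with robust alternatives (median, median absolute deviation) on two versions of the same sensor dataset: one clean and one with two anomalous readings.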
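
Before calling the rustlab-stats methods, it helps to see what the robust measures compute. The following is a minimal std-only sketch; `median`, `mad`, and `scaled_mad` are hypothetical helpers written for illustration, not rustlab-stats functions. The outputs below are consistent with the raw (unscaled) MAD, so the sketch also shows the conventional 1.4826 scaling that makes MAD comparable to a standard deviation for normally distributed data.

```rust
// Std-only sketch of the robust statistics used in Section 1.
// `median`, `mad`, and `scaled_mad` are hypothetical helpers, not rustlab-stats APIs.

/// Median of a non-empty slice; averages the two middle values when the length is even.
fn median(data: &[f64]) -> f64 {
    assert!(!data.is_empty(), "median of an empty slice is undefined");
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN is not supported"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Raw median absolute deviation: median(|x_i - median(x)|).
fn mad(data: &[f64]) -> f64 {
    let center = median(data);
    let deviations: Vec<f64> = data.iter().map(|&x| (x - center).abs()).collect();
    median(&deviations)
}

/// MAD multiplied by ~1.4826 so it estimates the standard deviation for normal data.
fn scaled_mad(data: &[f64]) -> f64 {
    1.4826 * mad(data)
}

fn main() {
    let normal = [22.1, 22.3, 21.9, 22.5, 22.0, 21.8, 22.4, 22.2, 21.7, 22.6];
    let with_outliers = [22.1, 22.3, 21.9, 22.5, 35.0, 21.8, 22.4, 22.2, 18.5, 22.6];
    for (label, data) in [("normal", &normal[..]), ("with outliers", &with_outliers[..])] {
        println!(
            "{}: median = {:.2}, MAD = {:.2}, scaled MAD = {:.2}",
            label,
            median(data),
            mad(data),
            scaled_mad(data)
        );
    }
}
```

On the two sensor datasets this yields medians of 22.15 and 22.25 and raw MADs of 0.25 and 0.30, matching the notebook output. The scaled MADs (roughly 0.37 and 0.44) are the fairer comparison with the standard deviations of 0.30 and 4.34.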

In [3]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    // Create dataset with outliers (sensor measurements with anomalies)
    let normal_readings = vec64![22.1, 22.3, 21.9, 22.5, 22.0, 21.8, 22.4, 22.2, 21.7, 22.6];
    let outlier_readings = vec64![22.1, 22.3, 21.9, 22.5, 35.0, 21.8, 22.4, 22.2, 18.5, 22.6]; // 35.0 and 18.5 are outliers
    
    println!("📊 Comparing Classical vs Robust Statistics");
    println!("{}", "=".repeat(50));
    
    // Classical statistics
    let normal_mean = normal_readings.mean();
    let normal_std = normal_readings.std(None);
    let outlier_mean = outlier_readings.mean();
    let outlier_std = outlier_readings.std(None);
    
    // Robust statistics
    let normal_median = normal_readings.median();
    let normal_mad = normal_readings.mad();
    let outlier_median = outlier_readings.median();
    let outlier_mad = outlier_readings.mad();
    
    println!("🔵 Normal Data (no outliers):");
    let normal_summary = format!("   Mean: {:.2}, Std: {:.2}", normal_mean, normal_std);
    println!("{}", normal_summary);
    let normal_robust = format!("   Median: {:.2}, MAD: {:.2}", normal_median, normal_mad);
    println!("{}", normal_robust);
    
    println!();
    println!("🔴 Data with Outliers:");
    let outlier_summary = format!("   Mean: {:.2}, Std: {:.2} (affected by outliers)", outlier_mean, outlier_std);
    println!("{}", outlier_summary);
    let outlier_robust = format!("   Median: {:.2}, MAD: {:.2} (robust to outliers)", outlier_median, outlier_mad);
    println!("{}", outlier_robust);
    
    // Calculate impact
    let mean_change = ((outlier_mean - normal_mean) / normal_mean * 100.0).abs();
    let median_change = ((outlier_median - normal_median) / normal_median * 100.0).abs();
    
    println!();
    println!("📈 Impact Analysis:");
    let impact_mean = format!("   Mean changed by: {:.1}%", mean_change);
    println!("{}", impact_mean);
    let impact_median = format!("   Median changed by: {:.1}%", median_change);
    println!("{}", impact_median);
    
    println!();
    println!("💡 Key Insight: Robust statistics (median, MAD) are much less affected by outliers!");
}

📊 Comparing Classical vs Robust Statistics
🔵 Normal Data (no outliers):
   Mean: 22.15, Std: 0.30
   Median: 22.15, MAD: 0.25

🔴 Data with Outliers:
   Mean: 23.13, Std: 4.34 (affected by outliers)
   Median: 22.25, MAD: 0.30 (robust to outliers)

📈 Impact Analysis:
   Mean changed by: 4.4%
   Median changed by: 0.5%

💡 Key Insight: Robust statistics (median, MAD) are much less affected by outliers!
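
The trimmed mean listed under Location in the Mathematical Foundation is another robust alternative to the mean; it is not computed in this notebook. Here is a minimal std-only sketch, assuming the common convention of dropping floor(proportion × n) values from each end of the sorted data (`trimmed_mean` is a hypothetical helper, not a rustlab-stats API):

```rust
// Std-only sketch of a symmetric trimmed mean.
// `trimmed_mean` is a hypothetical helper, not a rustlab-stats API.

/// Sort, drop floor(proportion * n) values from each end, then average the rest.
/// `proportion` must lie in [0, 0.5) so at least one value always remains.
fn trimmed_mean(data: &[f64], proportion: f64) -> f64 {
    assert!(!data.is_empty(), "trimmed mean of an empty slice is undefined");
    assert!((0.0..0.5).contains(&proportion), "proportion must be in [0, 0.5)");
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN is not supported"));
    let k = (proportion * sorted.len() as f64).floor() as usize;
    let kept = &sorted[k..sorted.len() - k];
    kept.iter().sum::<f64>() / kept.len() as f64
}

fn main() {
    let with_outliers = [22.1, 22.3, 21.9, 22.5, 35.0, 21.8, 22.4, 22.2, 18.5, 22.6];
    // 0% trimming is the ordinary mean
    for p in [0.0, 0.1, 0.2] {
        println!("{:>3.0}% trimmed mean: {:.3}", p * 100.0, trimmed_mean(&with_outliers, p));
    }
}
```

With 10% trimming the sketch drops 18.5 and 35.0 and returns 22.225, close to the median of 22.25 rather than the mean of 23.13.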

## 2. Statistical Visualization: Distribution Plots

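Histograms show a distribution's shape directly. Below we plot a roughly symmetric sample and a right-skewed sample, annotating each title with its mean, median, and either its standard deviation or its skewness.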
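
The `.histogram(&data, n)` calls below take a sample and what appears to be a bin count. The sketch below shows equal-width binning using only the standard library. It illustrates the general technique, not rustlab-plotting's actual rules, since the notebook does not show how that crate chooses its bin edges; `histogram_counts` is a hypothetical helper.

```rust
// Std-only sketch of equal-width histogram binning.
// `histogram_counts` is a hypothetical helper; rustlab-plotting may place edges differently.

/// Returns (bin edges, counts) for `bins` equal-width bins spanning [min, max].
/// The maximum value is counted in the last bin (right edge inclusive).
fn histogram_counts(data: &[f64], bins: usize) -> (Vec<f64>, Vec<usize>) {
    assert!(bins > 0 && !data.is_empty(), "need at least one bin and one value");
    let min = data.iter().copied().fold(f64::INFINITY, f64::min);
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Guard against zero width when all values are equal
    let width = if max > min { (max - min) / bins as f64 } else { 1.0 };
    let edges: Vec<f64> = (0..=bins).map(|i| min + i as f64 * width).collect();
    let mut counts = vec![0usize; bins];
    for &x in data {
        // Truncate to a bin index, clamping x == max into the last bin
        let idx = (((x - min) / width) as usize).min(bins - 1);
        counts[idx] += 1;
    }
    (edges, counts)
}

fn main() {
    let skewed = [1.2, 1.5, 1.8, 2.0, 2.1, 2.3, 2.5, 2.8, 3.2, 3.8,
                  4.5, 5.2, 6.0, 7.5, 9.0, 11.0, 13.5, 16.0, 20.0, 25.0];
    let (edges, counts) = histogram_counts(&skewed, 10);
    // Crude text histogram: one '#' per observation
    for (i, &c) in counts.iter().enumerate() {
        println!("[{:5.2}, {:5.2}) {}", edges[i], edges[i + 1], "#".repeat(c));
    }
}
```

For the right-skewed sample, 9 of the 20 values land in the first of 10 bins while single observations stretch the range out to 25.0; that long right tail is what a positive skewness measures.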

In [8]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    // Create sample datasets for visualization
    let normal_sample = vec64![12.1, 11.8, 12.5, 11.9, 12.3, 12.0, 12.2, 11.7, 12.4, 12.6, 
                              11.5, 12.8, 12.1, 11.6, 12.7, 12.3, 11.9, 12.0, 12.5, 12.2];
    
    let skewed_sample = vec64![1.2, 1.5, 1.8, 2.0, 2.1, 2.3, 2.5, 2.8, 3.2, 3.8, 
                              4.5, 5.2, 6.0, 7.5, 9.0, 11.0, 13.5, 16.0, 20.0, 25.0];
    
    println!("📊 Creating Statistical Distribution Plots");
    println!("{}", "=".repeat(45));
    
    // Calculate statistics for annotation
    let normal_mean = normal_sample.mean();
    let normal_median = normal_sample.median();
    let normal_std = normal_sample.std(None);
    
    let skewed_mean = skewed_sample.mean();
    let skewed_median = skewed_sample.median();
    let skewed_skewness = skewed_sample.skewness();
    
    // Title annotation for the normal-like histogram
    let hist_title = format!("Normal-like Distribution\nMean: {:.2}, Median: {:.2}, Std: {:.2}", 
                             normal_mean, normal_median, normal_std);
    
    // Build and display the histogram (8 bins) via Plot::new() method chaining
    let _result = Plot::new()
        .histogram(&normal_sample, 8)
        .title(&hist_title)
        .xlabel("Value")
        .ylabel("Frequency")
        .show();
    
    // Same plot for the right-skewed data, with 10 bins
    let skewed_title = format!("Right-Skewed Distribution\nMean: {:.2}, Median: {:.2}, Skewness: {:.3}", 
                               skewed_mean, skewed_median, skewed_skewness);
    
    let _result2 = Plot::new()
        .histogram(&skewed_sample, 10)
        .title(&skewed_title)
        .xlabel("Value")
        .ylabel("Frequency")
        .show();
    
    // Summary statistics for both distributions
    println!();
    println!("📈 Distribution Comparison:");
    println!("{}", "-".repeat(30));
    
    println!("🔵 Normal-like Distribution:");
    let normal_summary = format!("   Mean: {:.2}, Median: {:.2} (close values = symmetric)", normal_mean, normal_median);
    println!("{}", normal_summary);
    let normal_spread = format!("   Standard deviation: {:.2}", normal_std);
    println!("{}", normal_spread);
    
    println!();
    println!("🔴 Right-Skewed Distribution:");
    let skewed_summary = format!("   Mean: {:.2}, Median: {:.2} (mean > median = right skew)", skewed_mean, skewed_median);
    println!("{}", skewed_summary);
    let skewed_shape = format!("   Skewness: {:.3} (positive = right tail)", skewed_skewness);
    println!("{}", skewed_shape);
    
    println!();
    println!("📊 Visualization Benefits:");
    println!("• Histograms reveal distribution shape and modality");
    println!("• Statistical annotations provide quantitative context");
    println!("• Comparison enables pattern recognition");
}

📊 Creating Statistical Distribution Plots

📈 Distribution Comparison:
------------------------------
🔵 Normal-like Distribution:
   Mean: 12.15, Median: 12.15 (close values = symmetric)
   Standard deviation: 0.36

🔴 Right-Skewed Distribution:
   Mean: 7.04, Median: 4.15 (mean > median = right skew)
   Skewness: 1.480 (positive = right tail)

📊 Visualization Benefits:
• Histograms reveal distribution shape and modality
• Statistical annotations provide quantitative context
• Comparison enables pattern recognition
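
The skewness reported above is a moment-based statistic. Below is a minimal std-only sketch of two common estimators, the moment coefficient g1 and its bias-adjusted form G1; `central_moment`, `skewness_g1`, and `skewness_adjusted` are hypothetical helpers, not rustlab-stats APIs.

```rust
// Std-only sketch of moment-based skewness estimators.
// All helper names are hypothetical, not rustlab-stats APIs.

fn mean(data: &[f64]) -> f64 {
    data.iter().sum::<f64>() / data.len() as f64
}

/// k-th central moment: m_k = (1/n) * sum of (x_i - mean)^k
fn central_moment(data: &[f64], k: i32) -> f64 {
    let m = mean(data);
    data.iter().map(|&x| (x - m).powi(k)).sum::<f64>() / data.len() as f64
}

/// Moment coefficient of skewness: g1 = m3 / m2^(3/2)
fn skewness_g1(data: &[f64]) -> f64 {
    central_moment(data, 3) / central_moment(data, 2).powf(1.5)
}

/// Bias-adjusted sample skewness: G1 = g1 * sqrt(n(n-1)) / (n-2), defined for n >= 3
fn skewness_adjusted(data: &[f64]) -> f64 {
    let n = data.len() as f64;
    assert!(n >= 3.0, "adjusted skewness needs at least 3 values");
    skewness_g1(data) * (n * (n - 1.0)).sqrt() / (n - 2.0)
}

fn main() {
    let skewed = [1.2, 1.5, 1.8, 2.0, 2.1, 2.3, 2.5, 2.8, 3.2, 3.8,
                  4.5, 5.2, 6.0, 7.5, 9.0, 11.0, 13.5, 16.0, 20.0, 25.0];
    println!(
        "mean = {:.3}, g1 = {:.3}, G1 = {:.3}",
        mean(&skewed),
        skewness_g1(&skewed),
        skewness_adjusted(&skewed)
    );
}
```

On this sample the adjusted G1 comes out at about 1.480, and on the Section 3 dataset at about 3.14, matching the printed values, while the unadjusted g1 is lower (about 1.37 and 2.65). That suggests rustlab-stats reports the adjusted estimator, though this is an inference from the outputs rather than documented behavior.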


## 3. Summary and Best Practices

This final cell runs a complete descriptive analysis on one dataset containing a single outlier, then summarizes best practices.
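
The IQR in the next cell is the difference between the 0.75 and 0.25 quantiles, and several quantile conventions exist. This sketch uses linear interpolation between order statistics (type 7 in R, the default `linear` method in NumPy). Whether `quantile(p, None)` in rustlab-stats uses the same default is an assumption, and `quantile` here is a hypothetical helper.

```rust
// Std-only sketch of quartiles and the IQR via linear interpolation (R type 7).
// `quantile` is a hypothetical helper; rustlab-stats may default to another rule.

fn quantile(data: &[f64], p: f64) -> f64 {
    assert!(!data.is_empty(), "quantile of an empty slice is undefined");
    assert!((0.0..=1.0).contains(&p), "p must be in [0, 1]");
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN is not supported"));
    // Fractional position of the p-th quantile among the sorted values
    let h = (sorted.len() - 1) as f64 * p;
    let lo = h.floor() as usize;
    let hi = h.ceil() as usize;
    sorted[lo] + (h - lo as f64) * (sorted[hi] - sorted[lo])
}

fn main() {
    let mixed = [10.2, 10.5, 10.1, 10.8, 10.3, 10.6, 10.4, 10.7, 25.0, 10.0];
    let q1 = quantile(&mixed, 0.25);
    let q3 = quantile(&mixed, 0.75);
    println!("Q1 = {:.3}, Q3 = {:.3}, IQR = {:.2}", q1, q3, q3 - q1);
}
```

For `mixed_data` this gives Q1 = 10.225 and Q3 = 10.675, so IQR = 0.45, which agrees with the printed result. The outlier at 25.0 is the sample maximum and does not enter either quartile.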

In [5]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    println!("🎯 Descriptive Statistics: Summary and Best Practices");
    println!("{}", "=".repeat(55));
    
    // Create a comprehensive example dataset
    let mixed_data = vec64![10.2, 10.5, 10.1, 10.8, 10.3, 10.6, 10.4, 10.7, 25.0, 10.0]; // Contains one outlier
    
    println!("📊 Complete Statistical Analysis Example:");
    println!("{}", "-".repeat(35));
    
    // Basic statistics
    let mean = mixed_data.mean();
    let median = mixed_data.median();
    let std_dev = mixed_data.std(None);
    let mad = mixed_data.mad();
    
    // Shape statistics
    let skewness = mixed_data.skewness();
    let kurtosis = mixed_data.kurtosis();
    
    // Quantile statistics
    let q1 = mixed_data.quantile(0.25, None);
    let q3 = mixed_data.quantile(0.75, None);
    let iqr = q3 - q1;
    
    println!("📈 Location Measures:");
    let location = format!("   Mean: {:.2} (affected by outlier)", mean);
    println!("{}", location);
    let robust_location = format!("   Median: {:.2} (robust to outlier)", median);
    println!("{}", robust_location);
    
    println!();
    println!("📏 Spread Measures:");
    let classical_spread = format!("   Standard Deviation: {:.2} (inflated by outlier)", std_dev);
    println!("{}", classical_spread);
    let robust_spread = format!("   MAD: {:.2} (robust to outlier)", mad);
    println!("{}", robust_spread);
    let quartile_spread = format!("   IQR: {:.2} (quartile-based spread)", iqr);
    println!("{}", quartile_spread);
    
    println!();
    println!("📊 Shape Measures:");
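    // Rule of thumb: within ±0.5 is roughly symmetric, 0.5 to 1 moderate, beyond 1 strong
    let skew_interpretation = if skewness > 1.0 {
        "strong right skew (outlier effect)"
    } else if skewness > 0.5 {
        "moderate right skew"
    } else if skewness < -0.5 {
        "left skewed"
    } else {
        "approximately symmetric"
    };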
    let shape_skew = format!("   Skewness: {:.2} ({})", skewness, skew_interpretation);
    println!("{}", shape_skew);
    let shape_kurt = format!("   Kurtosis: {:.2} (tail heaviness)", kurtosis);
    println!("{}", shape_kurt);
    
    println!();
    println!("🏆 Best Practices Summary:");
    println!("{}", "-".repeat(25));
    
    println!("1. 📊 Always examine multiple statistics:");
    println!("   • Compare mean vs median for symmetry");
    println!("   • Compare std dev vs MAD for outlier sensitivity");
    println!("   • Check skewness and kurtosis for shape");
    
    println!();
    println!("2. 🎨 Use visualization effectively:");
    println!("   • Histograms for distribution shape");
    println!("   • Statistical annotations provide context");
    
    println!();
    println!("3. 🔧 Choose appropriate statistics:");
    println!("   • Robust statistics for real-world data");
    println!("   • Classical statistics for theoretical work");
    println!("   • Quantiles for risk and quality analysis");
    
    println!();
    println!("🎯 Next Steps: Explore hypothesis testing and correlation analysis!");
}

🎯 Descriptive Statistics: Summary and Best Practices
📊 Complete Statistical Analysis Example:
-----------------------------------
📈 Location Measures:
   Mean: 11.86 (affected by outlier)
   Median: 10.45 (robust to outlier)

📏 Spread Measures:
   Standard Deviation: 4.62 (inflated by outlier)
   MAD: 0.25 (robust to outlier)
   IQR: 0.45 (quartile-based spread)

📊 Shape Measures:
   Skewness: 3.14 (strong right skew (outlier effect))
   Kurtosis: 6.91 (tail heaviness)

🏆 Best Practices Summary:
-------------------------
1. 📊 Always examine multiple statistics:
   • Compare mean vs median for symmetry
   • Compare std dev vs MAD for outlier sensitivity
   • Check skewness and kurtosis for shape

2. 🎨 Use visualization effectively:
   • Histograms for distribution shape
   • Statistical annotations provide context

3. 🔧 Choose appropriate statistics:
   • Robust statistics for real-world data
   • Classical statistics for theoretical work
   • Quantiles for risk and quality analysis

🎯 Next Steps: Explore hypothesis testing and correlation analysis!
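
The best practices above pair quantiles with outlier screening. One standard quantile-based rule is Tukey's fences, which flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. Below is a minimal std-only sketch that reuses the linear-interpolation quantile; `quantile` and `tukey_outliers` are hypothetical helpers, not rustlab-stats APIs.

```rust
// Std-only sketch of Tukey's fences for outlier screening.
// `quantile` and `tukey_outliers` are hypothetical helpers, not rustlab-stats APIs.

/// Linear-interpolation quantile (R type 7).
fn quantile(data: &[f64], p: f64) -> f64 {
    assert!(!data.is_empty() && (0.0..=1.0).contains(&p));
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN is not supported"));
    let h = (sorted.len() - 1) as f64 * p;
    let lo = h.floor() as usize;
    let hi = h.ceil() as usize;
    sorted[lo] + (h - lo as f64) * (sorted[hi] - sorted[lo])
}

/// Returns the values outside [Q1 - k*IQR, Q3 + k*IQR], in their original order.
fn tukey_outliers(data: &[f64], k: f64) -> Vec<f64> {
    let q1 = quantile(data, 0.25);
    let q3 = quantile(data, 0.75);
    let iqr = q3 - q1;
    let (lower, upper) = (q1 - k * iqr, q3 + k * iqr);
    data.iter().copied().filter(|&x| x < lower || x > upper).collect()
}

fn main() {
    let sensor = [22.1, 22.3, 21.9, 22.5, 35.0, 21.8, 22.4, 22.2, 18.5, 22.6];
    let mixed = [10.2, 10.5, 10.1, 10.8, 10.3, 10.6, 10.4, 10.7, 25.0, 10.0];
    println!("sensor outliers: {:?}", tukey_outliers(&sensor, 1.5));
    println!("mixed outliers:  {:?}", tukey_outliers(&mixed, 1.5));
}
```

With k = 1.5 the sketch flags 35.0 and 18.5 in the Section 1 sensor readings and 25.0 in the Section 3 dataset, the same points those cells identify as outliers.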