# Reductions and Statistics

This notebook covers reductions and descriptive statistics in RustLab: collapsing a vector to a single value, measuring spread, summarizing a distribution with quantiles, and applying all three to a small sales dataset.

## What You'll Learn

1. **Basic Reductions** - Sum, mean, min, max, product, and L2 norm
2. **Measures of Variability** - Variance, standard deviation, range, coefficient of variation
3. **Quantiles and Percentiles** - Five-number summary, IQR, outlier fences
4. **Real-World Example** - A quarterly sales analysis workflow

## Setup

**Important**: This notebook follows Rust notebook best practices:
- Dependencies and imports persist across all cells
- Each code cell is self-contained and rust-analyzer compatible
- No lint directives needed - clean, explicit code throughout

In [2]:
// Setup Cell - dependencies and imports persist across all cells
:dep rustlab-math = { path = ".." }
:dep rustlab-stats = { path = "../../rustlab-stats" }

// Top-level imports - these persist across all cells!
use rustlab_math::*;
use rustlab_stats::*;
use rustlab_stats::advanced::Quantiles;
use std::f64::consts::PI;

let setup_msg = "✅ Setup complete! Ready to explore statistics.";
println!("{}", setup_msg);

✅ Setup complete! Ready to explore statistics.


## 1. Basic Reduction Operations

Start with fundamental operations that reduce arrays to single values:

In [3]:
{
    // Create sample data for demonstrations
    let sample_data = vec64![1.2, 3.4, 2.1, 5.6, 4.3, 2.8, 6.1, 1.9, 3.7, 4.5];
    let data_msg = format!("Sample data: {:?}", sample_data.to_slice());
    println!("{}", data_msg);

    // Basic aggregation operations
    let sum_val = sample_data.sum_elements();
    let mean_val = sample_data.mean();
    let min_val = sample_data.min().unwrap_or(0.0);
    let max_val = sample_data.max().unwrap_or(0.0);
    let range_val = max_val - min_val;
    let count_val = sample_data.len();

    println!();
    let header = "Basic Reductions:";
    println!("{}", header);
    let sum_msg = format!("  Sum:     {:.2}", sum_val);
    println!("{}", sum_msg);
    let mean_msg = format!("  Mean:    {:.3}", mean_val);
    println!("{}", mean_msg);
    let min_msg = format!("  Min:     {:.1}", min_val);
    println!("{}", min_msg);
    let max_msg = format!("  Max:     {:.1}", max_val);
    println!("{}", max_msg);
    let range_msg = format!("  Range:   {:.1}", range_val);
    println!("{}", range_msg);
    let count_msg = format!("  Count:   {}", count_val);
    println!("{}", count_msg);

    // Product and norms
    let positive_data = vec64![2.0, 4.0, 8.0, 16.0];
    let product_val = positive_data.product();  // Product of all elements
    let norm_val = sample_data.norm();

    println!();
    let pos_data_msg = format!("Positive data: {:?}", positive_data.to_slice());
    println!("{}", pos_data_msg);
    let product_msg = format!("  Product: {:.0}", product_val);
    println!("{}", product_msg);
    let norm_msg = format!("  L2 norm: {:.3}", norm_val);
    println!("{}", norm_msg);
}

Sample data: [1.2, 3.4, 2.1, 5.6, 4.3, 2.8, 6.1, 1.9, 3.7, 4.5]

Basic Reductions:
  Sum:     35.60
  Mean:    3.560
  Min:     1.2
  Max:     6.1
  Range:   4.9
  Count:   10

Positive data: [2.0, 4.0, 8.0, 16.0]
  Product: 1024
  L2 norm: 12.242


()
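To make explicit what each reduction computes, here is a minimal sketch in plain standard-library Rust. It is not RustLab's implementation; the function names (`sum`, `mean`, `product`, `l2_norm`, `min_max`) are hypothetical and operate on ordinary slices.

```rust
/// Sum of all elements (0.0 for an empty slice).
fn sum(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// Arithmetic mean; `None` for an empty slice.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        None
    } else {
        Some(sum(xs) / xs.len() as f64)
    }
}

/// Product of all elements (1.0 for an empty slice).
fn product(xs: &[f64]) -> f64 {
    xs.iter().product()
}

/// Euclidean (L2) norm: square root of the sum of squares.
fn l2_norm(xs: &[f64]) -> f64 {
    xs.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Smallest and largest element in one pass; `None` for an empty slice.
fn min_max(xs: &[f64]) -> Option<(f64, f64)> {
    let first = *xs.first()?;
    Some(
        xs.iter()
            .fold((first, first), |(lo, hi), &x| (lo.min(x), hi.max(x))),
    )
}

fn main() {
    let data = [1.2, 3.4, 2.1, 5.6, 4.3, 2.8, 6.1, 1.9, 3.7, 4.5];
    println!("sum     = {:.2}", sum(&data));
    println!("mean    = {:?}", mean(&data));
    println!("min/max = {:?}", min_max(&data));
    println!("product = {}", product(&[2.0, 4.0, 8.0, 16.0]));
    println!("L2 norm = {:.3}", l2_norm(&data));
    // The empty case stays visible instead of collapsing to a number.
    println!("mean of empty = {:?}", mean(&[]));
}
```

Returning `Option` keeps the empty-input case distinct from real data. The `unwrap_or(0.0)` pattern in the cell above is convenient for a demo, but it would report a minimum of `0.0` for an empty vector, so an explicit `match` may be preferable in production code.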

## 2. Measures of Variability

Understanding data spread with variance, standard deviation, and other measures:

In [4]:
{
    // Sample datasets with different variability
    let low_var_data = vec64![5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9];
    let high_var_data = vec64![1.0, 8.0, 3.0, 9.0, 2.0, 7.0, 4.0, 6.0];

    let header = "Comparing datasets with different variability:";
    println!("{}", header);
    println!();

    // Low variability analysis
    let low_mean = low_var_data.mean();
    let low_var = low_var_data.var(None);  // Default options; the result below is the sample (n - 1) variance
    let low_std = low_var.sqrt();
    let low_min = low_var_data.min().unwrap_or(0.0);
    let low_max = low_var_data.max().unwrap_or(0.0);
    let low_range = low_max - low_min;

    let low_data_msg = format!("Low variability: {:?}", low_var_data.to_slice());
    println!("{}", low_data_msg);
    let low_mean_msg = format!("  Mean:     {:.3}", low_mean);
    println!("{}", low_mean_msg);
    let low_var_msg = format!("  Variance: {:.6}", low_var);
    println!("{}", low_var_msg);
    let low_std_msg = format!("  Std Dev:  {:.6}", low_std);
    println!("{}", low_std_msg);
    let low_range_msg = format!("  Range:    {:.1}", low_range);
    println!("{}", low_range_msg);

    println!();

    // High variability analysis
    let high_mean = high_var_data.mean();
    let high_var = high_var_data.var(None);  // Default options, as for the low-variability data
    let high_std = high_var.sqrt();
    let high_min = high_var_data.min().unwrap_or(0.0);
    let high_max = high_var_data.max().unwrap_or(0.0);
    let high_range = high_max - high_min;

    let high_data_msg = format!("High variability: {:?}", high_var_data.to_slice());
    println!("{}", high_data_msg);
    let high_mean_msg = format!("  Mean:     {:.3}", high_mean);
    println!("{}", high_mean_msg);
    let high_var_msg = format!("  Variance: {:.6}", high_var);
    println!("{}", high_var_msg);
    let high_std_msg = format!("  Std Dev:  {:.6}", high_std);
    println!("{}", high_std_msg);
    let high_range_msg = format!("  Range:    {:.1}", high_range);
    println!("{}", high_range_msg);

    // Coefficient of variation
    let cv_low = low_std / low_mean;
    let cv_high = high_std / high_mean;

    println!();
    let cv_header = "Coefficient of Variation:";
    println!("{}", cv_header);
    let cv_low_msg = format!("  Low var dataset:  {:.3} ({:.1}%)", cv_low, cv_low * 100.0);
    println!("{}", cv_low_msg);
    let cv_high_msg = format!("  High var dataset: {:.3} ({:.1}%)", cv_high, cv_high * 100.0);
    println!("{}", cv_high_msg);
}

Comparing datasets with different variability:

Low variability: [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9]
  Mean:     5.000
  Variance: 0.017143
  Std Dev:  0.130931
  Range:    0.4

High variability: [1.0, 8.0, 3.0, 9.0, 2.0, 7.0, 4.0, 6.0]
  Mean:     5.000
  Variance: 8.571429
  Std Dev:  2.927700
  Range:    8.0

Coefficient of Variation:
  Low var dataset:  0.026 (2.6%)
  High var dataset: 0.586 (58.6%)


()
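The variances printed above (0.017143 and 8.571429) equal the sum of squared deviations divided by n − 1 = 7, so the results are sample variances. The sketch below reproduces that arithmetic in plain Rust and contrasts it with the population (divide by n) variant. The `variance` function and its `ddof` parameter are hypothetical names for illustration; this notebook does not show what the `Option` argument of RustLab's `var` controls.

```rust
/// Variance with `ddof` "delta degrees of freedom": the sum of squared
/// deviations from the mean is divided by (n - ddof).
/// ddof = 1 gives the sample variance, ddof = 0 the population variance.
/// Returns `None` when there are too few points (n <= ddof).
fn variance(xs: &[f64], ddof: usize) -> Option<f64> {
    let n = xs.len();
    if n <= ddof {
        return None;
    }
    let mean = xs.iter().sum::<f64>() / n as f64;
    let sum_sq: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
    Some(sum_sq / (n - ddof) as f64)
}

fn main() {
    let high_var = [1.0, 8.0, 3.0, 9.0, 2.0, 7.0, 4.0, 6.0];

    // Squared deviations from the mean of 5.0 sum to 60.
    let sample = variance(&high_var, 1).unwrap(); // 60 / 7
    let population = variance(&high_var, 0).unwrap(); // 60 / 8

    println!("sample variance     = {:.6}", sample);
    println!("population variance = {:.6}", population);
    println!("sample std dev      = {:.6}", sample.sqrt());
}
```

The two denominators differ noticeably for small n, which is worth checking whenever results are compared against another tool.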

## 3. Quantiles and Percentiles

Understanding data distribution through quartiles and percentiles:

In [5]:
{
    // Test scores for quantile analysis
    let test_scores = vec64![65.0, 72.0, 78.0, 81.0, 85.0, 88.0, 90.0, 92.0, 94.0, 97.0];
    let scores_msg = format!("Test Scores: {:?}", test_scores.to_slice());
    println!("{}", scores_msg);

    // Five-number summary
    let min_score = test_scores.min().unwrap_or(0.0);
    let q1_score = test_scores.quantile(0.25, None);  // None selects the default QuantileMethod
    let median_score = test_scores.median();
    let q3_score = test_scores.quantile(0.75, None);  // None selects the default QuantileMethod
    let max_score = test_scores.max().unwrap_or(0.0);

    println!();
    let summary_header = "Five-Number Summary:";
    println!("{}", summary_header);
    let min_msg = format!("  Minimum (Q0):    {:.1}", min_score);
    println!("{}", min_msg);
    let q1_msg = format!("  First quartile (Q1): {:.1}", q1_score);
    println!("{}", q1_msg);
    let median_msg = format!("  Median (Q2):     {:.1}", median_score);
    println!("{}", median_msg);
    let q3_msg = format!("  Third quartile (Q3): {:.1}", q3_score);
    println!("{}", q3_msg);
    let max_msg = format!("  Maximum (Q4):    {:.1}", max_score);
    println!("{}", max_msg);

    // Interquartile range (IQR)
    let iqr_val = q3_score - q1_score;
    let range_val = max_score - min_score;

    println!();
    let var_header = "Variability measures:";
    println!("{}", var_header);
    let iqr_msg = format!("  IQR (Q3 - Q1):   {:.1}", iqr_val);
    println!("{}", iqr_msg);
    let range_msg = format!("  Range (Max-Min): {:.1}", range_val);
    println!("{}", range_msg);

    // Common percentiles
    let percentile_values = [10.0, 25.0, 50.0, 75.0, 90.0, 95.0, 99.0];
    println!();
    let perc_header = "Percentiles:";
    println!("{}", perc_header);
    for percentile in percentile_values {
        let percentile_val = test_scores.quantile(percentile / 100.0, None);  // Convert percent to a fraction in [0, 1]
        let p_msg = format!("  {:2.0}th percentile: {:.1}", percentile, percentile_val);
        println!("{}", p_msg);
    }

    // Outlier detection using IQR method
    let outlier_fence_low = q1_score - 1.5 * iqr_val;
    let outlier_fence_high = q3_score + 1.5 * iqr_val;

    println!();
    let outlier_header = "Outlier Detection (IQR method):";
    println!("{}", outlier_header);
    let fence_low_msg = format!("  Lower fence: {:.1}", outlier_fence_low);
    println!("{}", fence_low_msg);
    let fence_high_msg = format!("  Upper fence: {:.1}", outlier_fence_high);
    println!("{}", fence_high_msg);

    // Check for outliers
    let outlier_count = test_scores.iter().filter(|&&x| x < outlier_fence_low || x > outlier_fence_high).count();
    if outlier_count == 0 {
        let no_outliers_msg = "  No outliers detected";
        println!("{}", no_outliers_msg);
    } else {
        let outlier_msg = format!("  {} outlier(s) detected", outlier_count);
        println!("{}", outlier_msg);
    }
}

Test Scores: [65.0, 72.0, 78.0, 81.0, 85.0, 88.0, 90.0, 92.0, 94.0, 97.0]

Five-Number Summary:
  Minimum (Q0):    65.0
  First quartile (Q1): 78.8
  Median (Q2):     86.5
  Third quartile (Q3): 91.5
  Maximum (Q4):    97.0

Variability measures:
  IQR (Q3 - Q1):   12.8
  Range (Max-Min): 32.0

Percentiles:
  10th percentile: 71.3
  25th percentile: 78.8
  50th percentile: 86.5
  75th percentile: 91.5
  90th percentile: 94.3
  95th percentile: 95.6
  99th percentile: 96.7

Outlier Detection (IQR method):
  Lower fence: 59.6
  Upper fence: 110.6
  No outliers detected


()
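The quartiles above (78.75, 86.5, 91.5 before rounding) are what linear interpolation between order statistics gives for these ten scores. The sketch below implements that rule and the 1.5 × IQR fences in plain Rust. It assumes, based only on the printed output, that the default `QuantileMethod` behaves this way; `quantile_linear` and `iqr_outliers` are hypothetical helper names, not RustLab functions.

```rust
/// Quantile by linear interpolation between order statistics.
/// The fractional position on the sorted data is h = q * (n - 1).
/// Returns `None` for empty input or q outside [0, 1].
fn quantile_linear(xs: &[f64], q: f64) -> Option<f64> {
    if xs.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let h = q * (sorted.len() - 1) as f64;
    let lo = h.floor() as usize;
    let hi = h.ceil() as usize;
    Some(sorted[lo] + (h - lo as f64) * (sorted[hi] - sorted[lo]))
}

/// Values lying outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
fn iqr_outliers(xs: &[f64]) -> Vec<f64> {
    let (q1, q3) = match (quantile_linear(xs, 0.25), quantile_linear(xs, 0.75)) {
        (Some(q1), Some(q3)) => (q1, q3),
        _ => return Vec::new(),
    };
    let iqr = q3 - q1;
    let (low, high) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    xs.iter().copied().filter(|&x| x < low || x > high).collect()
}

fn main() {
    let scores = [65.0, 72.0, 78.0, 81.0, 85.0, 88.0, 90.0, 92.0, 94.0, 97.0];
    for q in [0.25, 0.5, 0.75] {
        println!("q = {:.2}: {:?}", q, quantile_linear(&scores, q));
    }

    // Add one very low score to see the fences flag it.
    let mut with_low_score = scores.to_vec();
    with_low_score.push(20.0);
    println!("outliers: {:?}", iqr_outliers(&with_low_score));
}
```

Other interpolation rules (nearest rank, midpoint, and so on) give slightly different quartiles on small samples, so the fences and any flagged outliers depend on the method chosen.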

## 4. Real-World Example: Sales Data Analysis

Complete statistical analysis workflow using quarterly sales data:

In [6]:
{
    // Quarterly sales data: [Q1, Q2, Q3, Q4] for different regions
    let north_sales = vec64![245_000.0, 267_000.0, 289_000.0, 231_000.0];
    let south_sales = vec64![189_000.0, 201_000.0, 225_000.0, 198_000.0];
    let east_sales = vec64![312_000.0, 329_000.0, 345_000.0, 298_000.0];
    let west_sales = vec64![278_000.0, 295_000.0, 318_000.0, 276_000.0];

    // Region labels and data, kept in the same order
    let regions = ["North", "South", "East", "West"];
    let all_regions = [&north_sales, &south_sales, &east_sales, &west_sales];

    let title = "Quarterly Sales Data Analysis";
    println!("{}", title);
    let separator = "=".repeat(50);
    println!("{}", separator);
    println!();

    // Regional performance analysis
    let regional_header = "Regional Performance Summary:";
    println!("{}", regional_header);
    let header_sep = "-".repeat(65);
    println!("{}", header_sep);
    let table_header = format!("{:<8} {:>12} {:>12} {:>12} {:>8}", 
             "Region", "Total Sales", "Avg/Quarter", "Std Dev", "CV%");
    println!("{}", table_header);
    println!("{}", header_sep);

    for (i, region_name) in regions.iter().enumerate() {
        let region_data = all_regions[i];
        let total_sales = region_data.sum_elements();
        let avg_quarterly = region_data.mean();
        let std_dev = region_data.var(None).sqrt();  // Standard deviation = sqrt(variance)
        let cv_percent = (std_dev / avg_quarterly) * 100.0;
        
        let region_row = format!("{:<8} {:>12.0} {:>12.0} {:>12.0} {:>7.1}%", 
                               region_name, total_sales, avg_quarterly, std_dev, cv_percent);
        println!("{}", region_row);
    }

    // Overall statistics
    let all_sales_data = vec64![
        245_000.0, 267_000.0, 289_000.0, 231_000.0,  // North
        189_000.0, 201_000.0, 225_000.0, 198_000.0,  // South
        312_000.0, 329_000.0, 345_000.0, 298_000.0,  // East
        278_000.0, 295_000.0, 318_000.0, 276_000.0   // West
    ];

    let total_revenue = all_sales_data.sum_elements();
    let avg_performance = all_sales_data.mean();
    let sales_std = all_sales_data.var(None).sqrt();  // Standard deviation across all 16 values
    let sales_median = all_sales_data.median();

    println!();
    let summary_header = "Overall Statistical Summary:";
    println!("{}", summary_header);
    let summary_sep = "-".repeat(40);
    println!("{}", summary_sep);
    let total_msg = format!("Total Annual Revenue:     ${:>12.0}", total_revenue);
    println!("{}", total_msg);
    let avg_msg = format!("Average per Quarter/Region: ${:>10.0}", avg_performance);
    println!("{}", avg_msg);
    let std_msg = format!("Standard Deviation:       ${:>12.0}", sales_std);
    println!("{}", std_msg);
    let median_msg = format!("Median:                   ${:>12.0}", sales_median);
    println!("{}", median_msg);

    // Performance insights
    let best_region_idx = all_regions.iter().enumerate()
        .max_by(|(_, a), (_, b)| a.sum_elements().total_cmp(&b.sum_elements()))  // total_cmp cannot panic on NaN
        .map(|(idx, _)| idx).unwrap_or(0);
        
    let best_region_total = all_regions[best_region_idx].sum_elements();

    println!();
    let insights_header = "Key Performance Insights:";
    println!("{}", insights_header);
    let insights_sep = "=".repeat(40);
    println!("{}", insights_sep);
    let best_region_msg = format!("• {} is the top performer with ${:.0} total", 
                                 regions[best_region_idx], best_region_total);
    println!("{}", best_region_msg);
    let revenue_msg = format!("• Total annual revenue: ${:.0}", total_revenue);
    println!("{}", revenue_msg);
    let variability_msg = format!("• Sales variability: {:.1}% coefficient of variation", 
                                 (sales_std / avg_performance) * 100.0);
    println!("{}", variability_msg);
}

Quarterly Sales Data Analysis
==================================================

Regional Performance Summary:
-----------------------------------------------------------------
Region    Total Sales  Avg/Quarter      Std Dev      CV%
-----------------------------------------------------------------
North         1032000       258000        25430     9.9%
South          813000       203250        15370     7.6%
East          1284000       321000        20412     6.4%
West          1167000       291750        19466     6.7%

Overall Statistical Summary:
----------------------------------------
Total Annual Revenue:     $     4296000
Average per Quarter/Region: $    268500
Standard Deviation:       $       48781
Median:                   $      277000

Key Performance Insights:
========================================
• East is the top performer with $1284000 total
• Total annual revenue: $4296000
• Sales variability: 18.2% coefficient of variation


()
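The cell above retypes all sixteen figures to build `all_sales_data`, so the pooled data can silently drift from the regional data if one copy is edited. A minimal sketch of deriving the pooled values from a single source of truth follows, using plain arrays because this notebook does not show a RustLab concatenation call. `sales_data`, `pooled`, and `top_region` are hypothetical helpers.

```rust
/// Quarterly sales per region, defined once.
fn sales_data() -> [(&'static str, [f64; 4]); 4] {
    [
        ("North", [245_000.0, 267_000.0, 289_000.0, 231_000.0]),
        ("South", [189_000.0, 201_000.0, 225_000.0, 198_000.0]),
        ("East", [312_000.0, 329_000.0, 345_000.0, 298_000.0]),
        ("West", [278_000.0, 295_000.0, 318_000.0, 276_000.0]),
    ]
}

/// Flatten every region's quarters into one vector, in region order.
fn pooled(regions: &[(&str, [f64; 4])]) -> Vec<f64> {
    regions
        .iter()
        .flat_map(|(_, sales)| sales.iter().copied())
        .collect()
}

/// Region with the largest annual total; `None` if there are no regions.
fn top_region<'a>(regions: &[(&'a str, [f64; 4])]) -> Option<(&'a str, f64)> {
    regions
        .iter()
        .map(|(name, sales)| (*name, sales.iter().sum::<f64>()))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

fn main() {
    let regions = sales_data();
    let all_sales = pooled(&regions);

    println!("pooled values: {}", all_sales.len());
    println!("total revenue: {:.0}", all_sales.iter().sum::<f64>());
    if let Some((name, total)) = top_region(&regions) {
        println!("top region: {} ({:.0})", name, total);
    }
}
```

Pairing each name with its data in one tuple also removes the need to index two parallel arrays by position.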

## Summary

This notebook demonstrated essential statistical operations in RustLab:

### ✅ **Core Reductions Mastered:**
- **Basic aggregations**: `sum_elements()`, `mean()`, `min()`, `max()`, `product()`, `norm()`
- **Variability measures**: `var()`, standard deviation, range
- **Distribution analysis**: `median()`, `quantile()`, percentiles
- **Advanced operations**: Coefficient of variation, outlier detection

### ✅ **Key Statistical Concepts:**
- **Central tendency**: Mean, median for different data distributions
- **Spread measures**: Variance, standard deviation, IQR for variability
- **Robust statistics**: Median and IQR less sensitive to outliers
- **Data quality**: Outlier detection using statistical fences
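
To illustrate the robustness point, this small plain-Rust sketch compares how the mean and the median react when one value in a dataset is replaced by an extreme one. The data and the helper names `mean` and `median` are made up for the illustration.

```rust
/// Arithmetic mean; `None` for an empty slice.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        None
    } else {
        Some(xs.iter().sum::<f64>() / xs.len() as f64)
    }
}

/// Median: middle value of the sorted data, or the average of the two
/// middle values when the length is even. `None` for an empty slice.
fn median(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

fn main() {
    let clean = [10.0, 11.0, 12.0, 13.0, 14.0];
    // Same data with the largest value replaced by an extreme reading.
    let contaminated = [10.0, 11.0, 12.0, 13.0, 1000.0];

    println!("clean:        mean = {:?}, median = {:?}", mean(&clean), median(&clean));
    println!(
        "contaminated: mean = {:?}, median = {:?}",
        mean(&contaminated),
        median(&contaminated)
    );
}
```

A single extreme value moves the mean from 12 to about 209 while the median stays at 12, which is why the median and IQR are the usual choices for skewed or contaminated data.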

### ✅ **Best Practices Applied:**
- **Self-contained cells** - Each cell works independently
- **Clean variable usage** - All variables used in output
- **Explicit formatting** - Clear, readable output messages
- **Error handling** - `unwrap_or` fallbacks for the `Option` results of `min()` and `max()`

### ✅ **Real-World Applications:**
- **Business analytics**: Sales performance analysis and insights
- **Quality control**: Statistical process monitoring
- **Research data**: Experimental results analysis
- **Financial analysis**: Risk assessment and portfolio analysis

**Next**: Slicing and Views for efficient data manipulation →