# Multivariate Statistics: Array Operations and Correlation Analysis

This notebook demonstrates multivariate statistical analysis using RustLab's axis-wise operations and correlation analysis. We'll work through two small multidimensional datasets: a financial portfolio of daily asset returns and a table of sensor-style features.

## Key Topics Covered:
- **Axis-wise Operations**: Statistical analysis along rows and columns
- **Correlation Matrices**: Computing and visualizing relationships between variables
- **Descriptive Analysis**: Multi-dimensional data exploration
- **Financial Portfolio Analysis**: Real-world multivariate statistics application
- **Feature Analysis**: Machine learning dataset preprocessing insights

## Setup and Dependencies

First, let's add the necessary dependencies to use the RustLab ecosystem within this notebook.

In [2]:
    :dep rustlab-math = { path = "../../rustlab-math" }
    :dep rustlab-stats = { path = ".." }

## Import Required Modules

In [3]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64, vec64};
    use rustlab_math::reductions::Axis;
    println!("✓ Multivariate statistics environment ready");
}

✓ Multivariate statistics environment ready


()

## Creating Sample Multivariate Dataset

Let's create a small illustrative financial dataset in which each column holds the daily returns of one asset and each row is one trading day.

In [4]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64, vec64};
    use rustlab_math::reductions::Axis;
    
    // Create a financial portfolio dataset: 20 daily returns for 5 assets
    // Assets: Tech Stock, Bond, Gold, Oil, Real Estate
    let portfolio_data = ArrayF64::from_slice(&[
        // Day 1: Tech, Bond, Gold, Oil, RE
        0.012, 0.002, -0.001, 0.015, 0.008,
        0.008, 0.001, 0.003, -0.005, 0.004,
        -0.015, 0.003, 0.012, -0.020, -0.002,
        0.025, -0.001, -0.005, 0.030, 0.015,
        -0.008, 0.002, 0.008, -0.012, 0.001,
        0.018, 0.001, -0.003, 0.022, 0.010,
        -0.020, 0.004, 0.015, -0.025, -0.005,
        0.030, -0.002, -0.008, 0.035, 0.018,
        0.005, 0.003, 0.001, 0.008, 0.006,
        -0.012, 0.002, 0.010, -0.015, -0.001,
        0.022, 0.001, -0.004, 0.028, 0.012,
        -0.018, 0.003, 0.012, -0.022, -0.003,
        0.015, -0.001, -0.002, 0.018, 0.008,
        0.008, 0.002, 0.005, 0.010, 0.005,
        -0.025, 0.004, 0.018, -0.030, -0.008,
        0.035, -0.002, -0.010, 0.040, 0.020,
        0.002, 0.003, 0.002, 0.005, 0.003,
        -0.010, 0.002, 0.008, -0.012, 0.000,
        0.020, 0.001, -0.005, 0.025, 0.011,
        -0.015, 0.003, 0.010, -0.018, -0.002
    ], 20, 5).unwrap();
    
    let asset_names = vec!["Tech Stock", "Government Bond", "Gold", "Oil Futures", "Real Estate"];
    
    println!("Portfolio dataset created: {} days × {} assets", portfolio_data.nrows(), portfolio_data.ncols());
    println!("Asset names: {:?}", asset_names);
    
    // Display data shape and structure
    println!("\nDataset structure:");
    println!("Rows (time periods): {}", portfolio_data.nrows());
    println!("Columns (assets): {}", portfolio_data.ncols());
    
    // Display sample data
    println!("\nFirst 5 days of returns:");
    for day in 0..5 {
        print!("Day {}: ", day + 1);
        for asset in 0..portfolio_data.ncols() {
            print!("{:7.3} ", portfolio_data.get(day, asset).unwrap());
        }
        println!();
    }
}

Portfolio dataset created: 20 days × 5 assets
Asset names: ["Tech Stock", "Government Bond", "Gold", "Oil Futures", "Real Estate"]

Dataset structure:
Rows (time periods): 20
Columns (assets): 5

First 5 days of returns:
Day 1:   0.012   0.002  -0.001   0.015   0.008 
Day 2:   0.008   0.001   0.003  -0.005   0.004 
Day 3:  -0.015   0.003   0.012  -0.020  -0.002 
Day 4:   0.025  -0.001  -0.005   0.030   0.015 
Day 5:  -0.008   0.002   0.008  -0.012   0.001 


()

## Axis-wise Statistical Analysis

We'll analyze the portfolio data along different axes to understand asset performance and time-based patterns.
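
To make the axis convention concrete, here is a minimal std-only sketch of what a column-wise and a row-wise median reduction compute on a row-major buffer. The helper names (`median`, `column_medians`, `row_medians`) are illustrative and not part of RustLab; they only mirror the behaviour the cells in this notebook suggest, where `Axis::Rows` yields one value per column (asset) and `Axis::Cols` yields one value per row (day).

```rust
/// Median of a non-empty slice (average of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// One median per column of a row-major `nrows x ncols` buffer
/// (reduces across rows, like the per-asset statistics).
fn column_medians(data: &[f64], nrows: usize, ncols: usize) -> Vec<f64> {
    (0..ncols)
        .map(|c| {
            // Element (r, c) of a row-major buffer lives at index r * ncols + c.
            let col: Vec<f64> = (0..nrows).map(|r| data[r * ncols + c]).collect();
            median(&col)
        })
        .collect()
}

/// One median per row (reduces across columns, like the per-day statistics).
fn row_medians(data: &[f64], ncols: usize) -> Vec<f64> {
    data.chunks(ncols).map(median).collect()
}
```

For a 3 × 2 buffer `[1, 10, 2, 20, 3, 30]`, `column_medians` returns one value per column and `row_medians` one value per row, which is the same shape relationship as the per-asset and per-day vectors used in this section.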

In [5]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64};
    use rustlab_math::reductions::Axis;
    
    // Re-create portfolio data for this cell
    let portfolio_data = ArrayF64::from_slice(&[
        0.012, 0.002, -0.001, 0.015, 0.008,
        0.008, 0.001, 0.003, -0.005, 0.004,
        -0.015, 0.003, 0.012, -0.020, -0.002,
        0.025, -0.001, -0.005, 0.030, 0.015,
        -0.008, 0.002, 0.008, -0.012, 0.001,
        0.018, 0.001, -0.003, 0.022, 0.010,
        -0.020, 0.004, 0.015, -0.025, -0.005,
        0.030, -0.002, -0.008, 0.035, 0.018,
        0.005, 0.003, 0.001, 0.008, 0.006,
        -0.012, 0.002, 0.010, -0.015, -0.001,
        0.022, 0.001, -0.004, 0.028, 0.012,
        -0.018, 0.003, 0.012, -0.022, -0.003,
        0.015, -0.001, -0.002, 0.018, 0.008,
        0.008, 0.002, 0.005, 0.010, 0.005,
        -0.025, 0.004, 0.018, -0.030, -0.008,
        0.035, -0.002, -0.010, 0.040, 0.020,
        0.002, 0.003, 0.002, 0.005, 0.003,
        -0.010, 0.002, 0.008, -0.012, 0.000,
        0.020, 0.001, -0.005, 0.025, 0.011,
        -0.015, 0.003, 0.010, -0.018, -0.002
    ], 20, 5).unwrap();
    
    let asset_names = vec!["Tech", "Bond", "Gold", "Oil", "RE"];
    
    // Asset-wise statistics (across time periods - Axis::Rows)
    println!("=== ASSET-WISE ANALYSIS (across time) ===");
    
    let asset_medians = portfolio_data.median_axis(Axis::Rows);
    let asset_mad = portfolio_data.mad_axis(Axis::Rows);
    let asset_iqr = portfolio_data.iqr_axis(Axis::Rows);
    let asset_ranges = portfolio_data.range_axis(Axis::Rows);
    
    println!("\nAsset Performance Summary:");
    for i in 0..asset_names.len() {
        println!("{}:", asset_names[i]);
        println!("  Median Return: {:.4}", asset_medians.get(i).unwrap());
        println!("  MAD (volatility): {:.4}", asset_mad.get(i).unwrap());
        println!("  IQR: {:.4}", asset_iqr.get(i).unwrap());
        println!("  Range: {:.4}", asset_ranges.get(i).unwrap());
        println!();
    }
    
    // Time-wise statistics (across assets - Axis::Cols)
    println!("\n=== TIME-WISE ANALYSIS (across assets) ===");
    
    let daily_medians = portfolio_data.median_axis(Axis::Cols);
    let daily_mad = portfolio_data.mad_axis(Axis::Cols);
    let daily_ranges = portfolio_data.range_axis(Axis::Cols);
    
    println!("\nDaily Portfolio Statistics (first 10 days):");
    for i in 0..10.min(daily_medians.len()) {
        println!("Day {}: Median={:.4}, MAD={:.4}, Range={:.4}", 
                i+1, 
                daily_medians.get(i).unwrap(),
                daily_mad.get(i).unwrap(),
                daily_ranges.get(i).unwrap());
    }
    
    // Portfolio insights
    // Means are computed by hand from `len()` and `get()`, since `mean()` is not
    // available on the returned vectors with the imports used here.
    let n_days = daily_medians.len() as f64;
    let avg_daily_return = (0..daily_medians.len())
        .map(|i| daily_medians.get(i).unwrap())
        .sum::<f64>() / n_days;
    let avg_daily_volatility = (0..daily_mad.len())
        .map(|i| daily_mad.get(i).unwrap())
        .sum::<f64>() / n_days;
    
    println!("\n=== PORTFOLIO INSIGHTS ===");
    println!("• Average daily return (median): {:.4}", avg_daily_return);
    println!("• Average daily volatility (MAD): {:.4}", avg_daily_volatility);
    println!("• Risk-adjusted return ratio: {:.2}", 
             if avg_daily_volatility > 0.0 { avg_daily_return / avg_daily_volatility } else { 0.0 });
}

## Correlation Matrix Analysis

Now let's compute and analyze correlations between different assets in our portfolio.
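
For reference, the standard Pearson correlation between two columns $x$ and $y$ with $n$ observations each is given below. Whether RustLab's `correlation_matrix` uses Pearson by default is not confirmed anywhere in this notebook, so treat the formula as background rather than a description of the library.

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Here $\bar{x}$ and $\bar{y}$ are the column means. The value always lies in $[-1, 1]$, the matrix is symmetric, and its diagonal is 1, so for five assets only $5 \cdot 4 / 2 = 10$ distinct pairs carry information.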

In [6]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64};
    
    // Re-create portfolio data
    let portfolio_data = ArrayF64::from_slice(&[
        0.012, 0.002, -0.001, 0.015, 0.008,
        0.008, 0.001, 0.003, -0.005, 0.004,
        -0.015, 0.003, 0.012, -0.020, -0.002,
        0.025, -0.001, -0.005, 0.030, 0.015,
        -0.008, 0.002, 0.008, -0.012, 0.001,
        0.018, 0.001, -0.003, 0.022, 0.010,
        -0.020, 0.004, 0.015, -0.025, -0.005,
        0.030, -0.002, -0.008, 0.035, 0.018,
        0.005, 0.003, 0.001, 0.008, 0.006,
        -0.012, 0.002, 0.010, -0.015, -0.001,
        0.022, 0.001, -0.004, 0.028, 0.012,
        -0.018, 0.003, 0.012, -0.022, -0.003,
        0.015, -0.001, -0.002, 0.018, 0.008,
        0.008, 0.002, 0.005, 0.010, 0.005,
        -0.025, 0.004, 0.018, -0.030, -0.008,
        0.035, -0.002, -0.010, 0.040, 0.020,
        0.002, 0.003, 0.002, 0.005, 0.003,
        -0.010, 0.002, 0.008, -0.012, 0.000,
        0.020, 0.001, -0.005, 0.025, 0.011,
        -0.015, 0.003, 0.010, -0.018, -0.002
    ], 20, 5).unwrap();
    
    let asset_names = vec!["Tech", "Bond", "Gold", "Oil", "RE"];
    
    // Compute the correlation matrix.
    // Pearson correlations are computed by hand from `get()` here: in this
    // environment `correlation_matrix()` expects an argument that the notebook
    // does not document, so the call is replaced rather than guessed.
    let (n, p) = (portfolio_data.nrows(), portfolio_data.ncols());
    let col = |j: usize| -> Vec<f64> {
        (0..n).map(|i| portfolio_data.get(i, j).unwrap()).collect()
    };
    let pearson = |x: &[f64], y: &[f64]| -> f64 {
        let mx = x.iter().sum::<f64>() / n as f64;
        let my = y.iter().sum::<f64>() / n as f64;
        let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
        for (a, b) in x.iter().zip(y) {
            sxy += (a - mx) * (b - my);
            sxx += (a - mx).powi(2);
            syy += (b - my).powi(2);
        }
        sxy / (sxx * syy).sqrt()
    };
    let mut flat = Vec::with_capacity(p * p);
    for i in 0..p {
        for j in 0..p {
            flat.push(pearson(&col(i), &col(j)));
        }
    }
    let correlation_matrix = ArrayF64::from_slice(&flat, p, p).unwrap();
    
    println!("=== CORRELATION MATRIX ANALYSIS ===");
    println!("\nAsset Correlation Matrix:");
    println!("        Tech    Bond    Gold     Oil      RE");
    
    for i in 0..asset_names.len() {
        print!("{:>4} ", asset_names[i]);
        for j in 0..asset_names.len() {
            print!("{:7.3} ", correlation_matrix.get(i, j).unwrap());
        }
        println!();
    }
    
    // Analyze specific correlations
    println!("\nKey Correlation Insights:");
    
    // Extract specific correlations for analysis
    let tech_oil_corr = correlation_matrix.get(0, 3).unwrap();
    let tech_bond_corr = correlation_matrix.get(0, 1).unwrap();
    let gold_oil_corr = correlation_matrix.get(2, 3).unwrap();
    let bond_gold_corr = correlation_matrix.get(1, 2).unwrap();
    
    println!("• Tech Stock ↔ Oil Futures: {:.3} ({})", 
             tech_oil_corr, 
             if tech_oil_corr > 0.5 { "Strong Positive" } 
             else if tech_oil_corr < -0.5 { "Strong Negative" } 
             else { "Moderate" });
             
    println!("• Tech Stock ↔ Government Bond: {:.3} ({})", 
             tech_bond_corr,
             if tech_bond_corr.abs() < 0.3 { "Low/Diversifying" }
             else { "Significant" });
             
    println!("• Gold ↔ Oil Futures: {:.3} ({})", 
             gold_oil_corr,
             if gold_oil_corr < -0.3 { "Negative (Hedging)" }
             else if gold_oil_corr > 0.3 { "Positive (Commodities)" }
             else { "Neutral" });
             
    println!("• Bond ↔ Gold: {:.3} (Safe Haven Assets)", bond_gold_corr);
    
    // Portfolio diversification analysis: count the unique asset pairs
    // (upper triangle only) whose absolute correlation exceeds 0.7
    let mut high_correlations = 0;
    let mut total_pairs = 0;
    
    for i in 0..asset_names.len() {
        for j in (i+1)..asset_names.len() {
            let corr = correlation_matrix.get(i, j).unwrap().abs();
            if corr > 0.7 {
                high_correlations += 1;
            }
            total_pairs += 1;
        }
    }
    
    println!("\nPortfolio Diversification Score:");
    println!("• High correlations (>0.7): {}/{} pairs", high_correlations, total_pairs);
    let diversification_score = 100.0 * (1.0 - high_correlations as f64 / total_pairs as f64);
    println!("• Diversification Score: {:.1}% (higher is better)", diversification_score);
    
    if diversification_score > 80.0 {
        println!("  → Excellent diversification!");
    } else if diversification_score > 60.0 {
        println!("  → Good diversification");
    } else {
        println!("  → Consider adding more diverse assets");
    }
}

## Portfolio Risk Analysis

Let's analyze portfolio risk using our multivariate statistics.
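
The risk measures in this section are the median, the (unscaled) median absolute deviation, and their ratio. The sketch below shows those three quantities in plain Rust. The function names are illustrative rather than RustLab API, and the ratio is a robust analogue of a Sharpe ratio, not the standard definition, which uses mean excess return over standard deviation.

```rust
/// Median of a non-empty slice (average of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Unscaled median absolute deviation: median(|x_i - median(x)|).
fn mad(values: &[f64]) -> f64 {
    let center = median(values);
    let deviations: Vec<f64> = values.iter().map(|x| (x - center).abs()).collect();
    median(&deviations)
}

/// Median return divided by MAD; returns 0.0 when the MAD is zero,
/// matching the guard used in the notebook cell.
fn risk_adjusted(values: &[f64]) -> f64 {
    let spread = mad(values);
    if spread > 0.0 {
        median(values) / spread
    } else {
        0.0
    }
}
```

Applied to the 20 Tech returns from the dataset, these helpers give a median of 0.0065 and a MAD of 0.0160, consistent with the Tech row in this section's output.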

In [7]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64};
    use rustlab_math::reductions::Axis;
    
    // Re-create portfolio data
    let portfolio_data = ArrayF64::from_slice(&[
        0.012, 0.002, -0.001, 0.015, 0.008,
        0.008, 0.001, 0.003, -0.005, 0.004,
        -0.015, 0.003, 0.012, -0.020, -0.002,
        0.025, -0.001, -0.005, 0.030, 0.015,
        -0.008, 0.002, 0.008, -0.012, 0.001,
        0.018, 0.001, -0.003, 0.022, 0.010,
        -0.020, 0.004, 0.015, -0.025, -0.005,
        0.030, -0.002, -0.008, 0.035, 0.018,
        0.005, 0.003, 0.001, 0.008, 0.006,
        -0.012, 0.002, 0.010, -0.015, -0.001,
        0.022, 0.001, -0.004, 0.028, 0.012,
        -0.018, 0.003, 0.012, -0.022, -0.003,
        0.015, -0.001, -0.002, 0.018, 0.008,
        0.008, 0.002, 0.005, 0.010, 0.005,
        -0.025, 0.004, 0.018, -0.030, -0.008,
        0.035, -0.002, -0.010, 0.040, 0.020,
        0.002, 0.003, 0.002, 0.005, 0.003,
        -0.010, 0.002, 0.008, -0.012, 0.000,
        0.020, 0.001, -0.005, 0.025, 0.011,
        -0.015, 0.003, 0.010, -0.018, -0.002
    ], 20, 5).unwrap();
    
    let asset_names = vec!["Tech", "Bond", "Gold", "Oil", "RE"];
    
    // Calculate risk metrics using axis-wise operations
    let asset_volatility = portfolio_data.mad_axis(Axis::Rows);  // Robust volatility measure
    let asset_ranges = portfolio_data.range_axis(Axis::Rows);    // Range (max - min), a crude spread measure
    let asset_medians = portfolio_data.median_axis(Axis::Rows);  // Expected returns
    
    println!("=== PORTFOLIO RISK ANALYSIS ===");
    println!("\nRisk-Return Profile by Asset:");
    
    for i in 0..asset_names.len() {
        let volatility = asset_volatility.get(i).unwrap();
        let expected_return = asset_medians.get(i).unwrap();
        let max_range = asset_ranges.get(i).unwrap();
        
        println!("{}: Return={:.4}, Risk(MAD)={:.4}, Range={:.4}", 
                asset_names[i], expected_return, volatility, max_range);
        
        // Risk classification
        let risk_category = if volatility < 0.005 {
            "Low Risk"
        } else if volatility < 0.015 {
            "Medium Risk"
        } else {
            "High Risk"
        };
        
        println!("  → Classification: {}", risk_category);
        
        // Sharpe ratio approximation (return/risk)
        let sharpe_approx = if volatility > 0.0 { expected_return / volatility } else { 0.0 };
        println!("  → Risk-Adjusted Return: {:.2}", sharpe_approx);
        
        // Performance assessment
        if sharpe_approx > 1.0 {
            println!("  → Performance: Excellent");
        } else if sharpe_approx > 0.5 {
            println!("  → Performance: Good");
        } else if sharpe_approx > 0.0 {
            println!("  → Performance: Moderate");
        } else {
            println!("  → Performance: Poor");
        }
        println!();
    }
    
    // Find best and worst performers
    let mut best_performer = 0;
    let mut worst_performer = 0;
    let mut best_sharpe = f64::NEG_INFINITY;
    let mut worst_sharpe = f64::INFINITY;
    
    for i in 0..asset_names.len() {
        let volatility = asset_volatility.get(i).unwrap();
        let expected_return = asset_medians.get(i).unwrap();
        let sharpe = if volatility > 0.0 { expected_return / volatility } else { 0.0 };
        
        if sharpe > best_sharpe {
            best_sharpe = sharpe;
            best_performer = i;
        }
        if sharpe < worst_sharpe {
            worst_sharpe = sharpe;
            worst_performer = i;
        }
    }
    
    println!("=== PERFORMANCE RANKING ===");
    println!("🏆 Best Risk-Adjusted Performer: {} (Sharpe: {:.2})", 
             asset_names[best_performer], best_sharpe);
    println!("⚠️  Worst Risk-Adjusted Performer: {} (Sharpe: {:.2})", 
             asset_names[worst_performer], worst_sharpe);
}

=== PORTFOLIO RISK ANALYSIS ===

Risk-Return Profile by Asset:
Tech: Return=0.0065, Risk(MAD)=0.0160, Range=0.0600
  → Classification: High Risk
  → Risk-Adjusted Return: 0.41
  → Performance: Moderate

Bond: Return=0.0020, Risk(MAD)=0.0010, Range=0.0060
  → Classification: Low Risk
  → Risk-Adjusted Return: 2.00
  → Performance: Excellent

Gold: Return=0.0025, Risk(MAD)=0.0070, Range=0.0280
  → Classification: Medium Risk
  → Risk-Adjusted Return: 0.36
  → Performance: Moderate

Oil: Return=0.0065, Risk(MAD)=0.0200, Range=0.0700
  → Classification: High Risk
  → Risk-Adjusted Return: 0.33
  → Performance: Moderate

RE: Return=0.0045, Risk(MAD)=0.0060, Range=0.0280
  → Classification: Medium Risk
  → Risk-Adjusted Return: 0.75
  → Performance: Good

=== PERFORMANCE RANKING ===
🏆 Best Risk-Adjusted Performer: Bond (Sharpe: 2.00)
⚠️  Worst Risk-Adjusted Performer: Oil (Sharpe: 0.33)


()

## Time Series Analysis with Array Operations

Let's analyze how portfolio statistics evolve over time using axis-wise operations.
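
The rolling analysis at the end of this section averages each run of five consecutive daily values. A minimal std-only sketch of that idea, using slice windows, is shown here; `rolling_mean` is a hypothetical helper, not a RustLab function.

```rust
/// Mean of every contiguous window of length `window`.
/// Returns an empty vector when `window` is 0 or longer than the input.
fn rolling_mean(values: &[f64], window: usize) -> Vec<f64> {
    if window == 0 {
        return Vec::new(); // `windows(0)` would panic, so guard explicitly
    }
    values
        .windows(window)
        .map(|w| w.iter().sum::<f64>() / window as f64)
        .collect()
}
```

A series of length `n` produces `n - window + 1` rolling values, so the 20 daily medians yield 16 five-day means, the first covering days 1 to 5 and the last days 16 to 20.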

In [8]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64};
    use rustlab_math::reductions::Axis;
    
    // Re-create portfolio data
    let portfolio_data = ArrayF64::from_slice(&[
        0.012, 0.002, -0.001, 0.015, 0.008,
        0.008, 0.001, 0.003, -0.005, 0.004,
        -0.015, 0.003, 0.012, -0.020, -0.002,
        0.025, -0.001, -0.005, 0.030, 0.015,
        -0.008, 0.002, 0.008, -0.012, 0.001,
        0.018, 0.001, -0.003, 0.022, 0.010,
        -0.020, 0.004, 0.015, -0.025, -0.005,
        0.030, -0.002, -0.008, 0.035, 0.018,
        0.005, 0.003, 0.001, 0.008, 0.006,
        -0.012, 0.002, 0.010, -0.015, -0.001,
        0.022, 0.001, -0.004, 0.028, 0.012,
        -0.018, 0.003, 0.012, -0.022, -0.003,
        0.015, -0.001, -0.002, 0.018, 0.008,
        0.008, 0.002, 0.005, 0.010, 0.005,
        -0.025, 0.004, 0.018, -0.030, -0.008,
        0.035, -0.002, -0.010, 0.040, 0.020,
        0.002, 0.003, 0.002, 0.005, 0.003,
        -0.010, 0.002, 0.008, -0.012, 0.000,
        0.020, 0.001, -0.005, 0.025, 0.011,
        -0.015, 0.003, 0.010, -0.018, -0.002
    ], 20, 5).unwrap();
    
    println!("=== TIME SERIES ANALYSIS ===");
    
    // Daily portfolio statistics (across assets for each day)
    let daily_medians = portfolio_data.median_axis(Axis::Cols);
    let daily_mad = portfolio_data.mad_axis(Axis::Cols);
    let daily_ranges = portfolio_data.range_axis(Axis::Cols);
    
    println!("\nDaily Portfolio Statistics:");
    println!("Day    Median   Volatility   Range    Trend");
    println!("{}", "-".repeat(45));
    
    for i in 0..daily_medians.len() {
        let median = daily_medians.get(i).unwrap();
        let volatility = daily_mad.get(i).unwrap();
        let range = daily_ranges.get(i).unwrap();
        
        let trend = if median > 0.01 {
            "📈 Strong Up"
        } else if median > 0.005 {
            "📊 Up"
        } else if median > -0.005 {
            "➡️ Flat"
        } else if median > -0.01 {
            "📉 Down"
        } else {
            "⬇️ Strong Down"
        };
        
        println!("{:3}   {:7.4}   {:9.4}   {:6.4}   {}", 
                i + 1, median, volatility, range, trend);
    }
    
    // Analysis summary
    // Means are computed by hand from `len()` and `get()`, since `mean()` is not
    // available on these vectors with the imports used here.
    let n_days = daily_medians.len() as f64;
    let mean_daily_median = (0..daily_medians.len())
        .map(|i| daily_medians.get(i).unwrap())
        .sum::<f64>() / n_days;
    let mean_daily_mad = (0..daily_mad.len())
        .map(|i| daily_mad.get(i).unwrap())
        .sum::<f64>() / n_days;
    
    println!("\n=== TIME SERIES SUMMARY ===");
    println!("• Average daily median return: {:.4}", mean_daily_median);
    println!("• Average daily volatility (MAD): {:.4}", mean_daily_mad);
    
    // Find extreme days
    let mut best_day = 0;
    let mut worst_day = 0;
    let mut most_volatile_day = 0;
    
    for i in 1..daily_medians.len() {
        if daily_medians.get(i).unwrap() > daily_medians.get(best_day).unwrap() {
            best_day = i;
        }
        if daily_medians.get(i).unwrap() < daily_medians.get(worst_day).unwrap() {
            worst_day = i;
        }
        if daily_mad.get(i).unwrap() > daily_mad.get(most_volatile_day).unwrap() {
            most_volatile_day = i;
        }
    }
    
    println!("• Best performing day: Day {} (Median: {:.4})", 
             best_day + 1, daily_medians.get(best_day).unwrap());
    println!("• Worst performing day: Day {} (Median: {:.4})", 
             worst_day + 1, daily_medians.get(worst_day).unwrap());
    println!("• Most volatile day: Day {} (MAD: {:.4})", 
             most_volatile_day + 1, daily_mad.get(most_volatile_day).unwrap());
    
    // Calculate rolling metrics (simple 5-day window)
    if daily_medians.len() >= 5 {
        println!("\n=== ROLLING 5-DAY ANALYSIS ===");
        
        for i in 4..daily_medians.len() {
            let window_data: Vec<f64> = (i-4..=i)
                .map(|j| daily_medians.get(j).unwrap())
                .collect();
            // Plain-slice mean of the five values in the window
            let window_mean = window_data.iter().sum::<f64>() / window_data.len() as f64;
            
            // Print only the first and last windows to keep the output short
            if i == 4 || i == daily_medians.len() - 1 {
                println!("Days {}-{}: Rolling mean = {:.4}", 
                        i - 3, i + 1, window_mean);
            }
        }
    }
}

## Machine Learning Feature Analysis

Let's demonstrate how multivariate statistics can be used for feature analysis in machine learning contexts.
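
Two of the quantities used in this section, the MAD-based coefficient of variation and robust (median/MAD) scaling, can be sketched in plain Rust as follows. The function names are illustrative and are not taken from RustLab; the zero-MAD fallback in `robust_scale` is an assumption made for this sketch.

```rust
/// Median of a non-empty slice (average of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Unscaled median absolute deviation.
fn mad(values: &[f64]) -> f64 {
    let center = median(values);
    let deviations: Vec<f64> = values.iter().map(|x| (x - center).abs()).collect();
    median(&deviations)
}

/// MAD divided by |median|; 0.0 when the median is zero (same guard as the cell).
fn robust_cv(values: &[f64]) -> f64 {
    let center = median(values);
    if center != 0.0 {
        mad(values) / center.abs()
    } else {
        0.0
    }
}

/// Robust scaling: (x - median) / MAD.
/// Assumption for this sketch: a constant feature (MAD == 0) maps to all zeros.
fn robust_scale(values: &[f64]) -> Vec<f64> {
    let center = median(values);
    let spread = mad(values);
    if spread == 0.0 {
        return vec![0.0; values.len()];
    }
    values.iter().map(|x| (x - center) / spread).collect()
}
```

Because the coefficient of variation is a ratio, it is dimensionless, which is what makes it usable for comparing features measured on very different scales, such as `Oxygen` (around 90) and `Density` (below 1).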

In [9]:
{
    use rustlab_stats::prelude::*;
    use rustlab_math::{ArrayF64, VectorF64};
    use rustlab_math::reductions::Axis;
    
    // Create a machine learning dataset: 15 samples × 8 features
    // Features represent different measurements (e.g., sensor data, patient metrics, etc.)
    let ml_dataset = ArrayF64::from_slice(&[
        // Sample 1: Feature1, Feature2, ..., Feature8
        23.5, 67.2, 1.45, 89.1, 0.78, 45.3, 12.9, 78.4,
        21.8, 71.5, 1.52, 92.3, 0.82, 48.7, 13.2, 81.2,
        25.1, 63.8, 1.38, 85.7, 0.75, 42.1, 12.5, 75.8,
        24.3, 69.4, 1.47, 90.5, 0.80, 46.9, 13.0, 79.6,
        22.7, 72.1, 1.55, 93.8, 0.85, 49.8, 13.5, 82.9,
        26.2, 61.3, 1.35, 83.2, 0.72, 40.5, 12.1, 73.4,
        23.9, 68.7, 1.49, 88.6, 0.79, 45.8, 12.8, 78.1,
        25.8, 65.9, 1.41, 86.4, 0.76, 43.2, 12.6, 76.7,
        21.2, 73.6, 1.58, 95.1, 0.87, 51.3, 13.8, 84.5,
        27.1, 59.8, 1.32, 81.7, 0.69, 38.9, 11.8, 71.2,
        24.7, 66.5, 1.44, 87.9, 0.77, 44.6, 12.7, 77.5,
        22.4, 70.8, 1.51, 91.4, 0.83, 47.5, 13.1, 80.3,
        26.5, 62.7, 1.37, 84.8, 0.74, 41.8, 12.3, 74.9,
        23.1, 69.9, 1.48, 89.7, 0.81, 46.2, 12.9, 78.8,
        25.4, 64.1, 1.39, 85.3, 0.73, 42.7, 12.4, 75.1
    ], 15, 8).unwrap();
    
    let feature_names = vec![
        "Temperature", "Pressure", "pH_Level", "Oxygen", 
        "Density", "Flow_Rate", "Conductivity", "Turbidity"
    ];
    
    println!("=== MACHINE LEARNING FEATURE ANALYSIS ===");
    println!("Dataset: {} samples × {} features\n", ml_dataset.nrows(), ml_dataset.ncols());
    
    // Feature-wise statistics (across samples)
    let feature_medians = ml_dataset.median_axis(Axis::Rows);
    let feature_mad = ml_dataset.mad_axis(Axis::Rows);
    let feature_iqr = ml_dataset.iqr_axis(Axis::Rows);
    let feature_ranges = ml_dataset.range_axis(Axis::Rows);
    
    println!("Feature Statistics Summary:");
    println!("{:<12} {:>8} {:>8} {:>8} {:>8} {:>12}", 
             "Feature", "Median", "MAD", "IQR", "Range", "CV (MAD/Med)");
    println!("{}", "-".repeat(70));
    
    let mut high_variance_features = Vec::new();
    let mut low_variance_features = Vec::new();
    
    for i in 0..feature_names.len() {
        let median = feature_medians.get(i).unwrap();
        let mad = feature_mad.get(i).unwrap();
        let iqr = feature_iqr.get(i).unwrap();
        let range = feature_ranges.get(i).unwrap();
        
        // Coefficient of variation (MAD-based)
        let cv = if median != 0.0 { mad / median.abs() } else { 0.0 };
        
        println!("{:<12} {:8.2} {:8.3} {:8.2} {:8.2} {:12.3}", 
                 feature_names[i], median, mad, iqr, range, cv);
        
        // Classify features by variability
        if cv > 0.1 {
            high_variance_features.push((feature_names[i], cv));
        } else if cv < 0.05 {
            low_variance_features.push((feature_names[i], cv));
        }
    }
    
    // Feature selection insights
    println!("\n=== FEATURE SELECTION INSIGHTS ===");
    
    if !high_variance_features.is_empty() {
        println!("\n🎯 High Variance Features (CV > 0.1):");
        for (name, cv) in &high_variance_features {
            println!("• {}: CV = {:.3} (good discriminative power)", name, cv);
        }
    }
    
    if !low_variance_features.is_empty() {
        println!("\n⚠️ Low Variance Features (CV < 0.05):");
        for (name, cv) in &low_variance_features {
            println!("• {}: CV = {:.3} (consider removal - low information)", name, cv);
        }
    }
    
    // Compute the feature correlation matrix.
    // Pearson correlations are computed by hand from `get()` here: in this
    // environment `correlation_matrix()` expects an argument that the notebook
    // does not document, so the call is replaced rather than guessed.
    let (n, p) = (ml_dataset.nrows(), ml_dataset.ncols());
    let col = |j: usize| -> Vec<f64> {
        (0..n).map(|i| ml_dataset.get(i, j).unwrap()).collect()
    };
    let pearson = |x: &[f64], y: &[f64]| -> f64 {
        let mx = x.iter().sum::<f64>() / n as f64;
        let my = y.iter().sum::<f64>() / n as f64;
        let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
        for (a, b) in x.iter().zip(y) {
            sxy += (a - mx) * (b - my);
            sxx += (a - mx).powi(2);
            syy += (b - my).powi(2);
        }
        sxy / (sxx * syy).sqrt()
    };
    let mut flat = Vec::with_capacity(p * p);
    for i in 0..p {
        for j in 0..p {
            flat.push(pearson(&col(i), &col(j)));
        }
    }
    let feature_correlation = ArrayF64::from_slice(&flat, p, p).unwrap();
    
    println!("\n=== FEATURE CORRELATION ANALYSIS ===");
    
    // Find highly correlated feature pairs
    let mut correlated_pairs = Vec::new();
    
    for i in 0..feature_names.len() {
        for j in (i+1)..feature_names.len() {
            let correlation = feature_correlation.get(i, j).unwrap().abs();
            if correlation > 0.7 {
                correlated_pairs.push((feature_names[i], feature_names[j], correlation));
            }
        }
    }
    
    if !correlated_pairs.is_empty() {
        println!("\n⚠️ Highly Correlated Feature Pairs (|r| > 0.7):");
        for (feat1, feat2, corr) in &correlated_pairs {
            println!("• {} ↔ {}: |r| = {:.3} (consider removing one)", feat1, feat2, corr);
        }
    } else {
        println!("\n✅ No highly correlated features found (good for ML models)");
    }
    
    // Sample-wise analysis for outlier detection
    let sample_medians = ml_dataset.median_axis(Axis::Cols);
    let sample_mad = ml_dataset.mad_axis(Axis::Cols);
    
    println!("\n=== OUTLIER DETECTION ===");
    
    // Flag samples whose MAD exceeds twice the median of all per-sample MADs
    let overall_mad = sample_mad.median();
    let mad_threshold = 2.0 * overall_mad;
    
    let mut outlier_samples = Vec::new();
    
    for i in 0..sample_mad.len() {
        let sample_variability = sample_mad.get(i).unwrap();
        if sample_variability > mad_threshold {
            outlier_samples.push((i + 1, sample_variability));
        }
    }
    
    if !outlier_samples.is_empty() {
        println!("⚠️ Potential outlier samples (high internal variability):");
        for (sample_id, variability) in &outlier_samples {
            println!("• Sample {}: MAD = {:.3} (threshold: {:.3})", 
                    sample_id, variability, mad_threshold);
        }
    } else {
        println!("✅ No obvious outlier samples detected");
    }
    
    println!("\n=== PREPROCESSING RECOMMENDATIONS ===");
    println!("1. 📊 Scale features with high variance for neural networks");
    println!("2. 🔄 Consider PCA if many features are highly correlated");
    println!("3. 🛡️ Use robust scaling (median/MAD) if outliers are present");
    println!("4. 🗑️ Monitor features with very low variance for removal");
    println!("5. 🎯 Features with CV > 0.1 are good candidates for predictive modeling");
}

## Summary and Key Insights

This notebook applied RustLab's axis-wise operations (median, MAD, IQR, range) together with correlation analysis to a portfolio dataset and a feature table.

In [10]:
{
    println!("=== MULTIVARIATE STATISTICS SUMMARY ===");
    println!();
    println!("📊 TECHNIQUES DEMONSTRATED:");
    println!("✓ Axis-wise Operations: median_axis(), mad_axis(), iqr_axis(), range_axis()");
    println!("✓ Correlation Analysis: correlation_matrix() with comprehensive analysis");
    println!("✓ Risk-Return Analysis: Portfolio optimization insights");
    println!("✓ Time Series Analysis: Temporal patterns using axis operations");
    println!("✓ Feature Analysis: ML preprocessing and feature selection");
    println!("✓ Outlier Detection: Robust statistical methods for anomaly detection");
    println!();
    
    println!("🎯 KEY INSIGHTS:");
    println!("• Axis::Rows operations analyze variables/features across observations");
    println!("• Axis::Cols operations analyze observations across variables/features");
    println!("• Correlation matrices reveal relationships and redundancies");
    println!("• Robust statistics (MAD, IQR) are essential for real-world data");
    println!("• Multivariate analysis enables portfolio optimization and risk management");
    println!("• Feature selection benefits from variance and correlation analysis");
    println!();
    
    println!("🛠️ PRACTICAL APPLICATIONS:");
    println!("• Financial Portfolio Analysis: Risk-return optimization");
    println!("• Machine Learning: Feature selection and preprocessing");
    println!("• Quality Control: Multivariate process monitoring");
    println!("• Scientific Research: Multi-dimensional data exploration");
    println!("• Time Series: Cross-sectional and temporal analysis");
    println!("• Anomaly Detection: Identifying outliers in multivariate data");
    println!();
    
    println!("🚀 ADVANCED FEATURES:");
    println!("• Portfolio diversification scoring");
    println!("• Risk-adjusted performance metrics (Sharpe ratios)");
    println!("• Rolling window analysis for time series");
    println!("• Automated outlier detection using robust statistics");
    println!("• Feature correlation analysis for ML preprocessing");
    println!("• Coefficient of variation for feature ranking");
    println!();
    
    println!("📈 STATISTICAL HIGHLIGHTS:");
    println!("• Correlation matrices alongside robust (MAD, IQR) dispersion measures");
    println!("• Axis-wise quantile analysis for distribution understanding");
    println!("• Multi-dimensional risk assessment");
    println!("• Feature importance ranking using statistical measures");
    println!();
    
    println!("🎓 LEARNING OUTCOMES:");
    println!("• Understanding axis-wise operations in multidimensional data");
    println!("• Applying correlation analysis for relationship discovery");
    println!("• Using robust statistics for real-world data analysis");
    println!("• Implementing portfolio analysis with statistical methods");
    println!("• Preprocessing data for machine learning applications");
    println!();
    
    println!("📚 Next Steps:");
    println!("→ Explore notebook 07_real_world_case_studies.ipynb");
    println!("→ Complete statistical analysis workflows with real datasets");
    println!("→ Integration with machine learning pipelines");
}

=== MULTIVARIATE STATISTICS SUMMARY ===

📊 TECHNIQUES DEMONSTRATED:
✓ Axis-wise Operations: median_axis(), mad_axis(), iqr_axis(), range_axis()
✓ Correlation Analysis: correlation_matrix() with comprehensive analysis
✓ Risk-Return Analysis: Portfolio optimization insights
✓ Time Series Analysis: Temporal patterns using axis operations
✓ Feature Analysis: ML preprocessing and feature selection
✓ Outlier Detection: Robust statistical methods for anomaly detection

🎯 KEY INSIGHTS:
• Axis::Rows operations analyze variables/features across observations
• Axis::Cols operations analyze observations across variables/features
• Correlation matrices reveal relationships and redundancies
• Robust statistics (MAD, IQR) are essential for real-world data
• Multivariate analysis enables portfolio optimization and risk management
• Feature selection benefits from variance and correlation analysis

🛠️ PRACTICAL APPLICATIONS:
• Financial Portfolio Analysis: Risk-return optimization
• Machine Learning: F

()