# Broadcasting and Advanced Array Operations

This notebook explores advanced array manipulation techniques in rustlab-math:
- Broadcasting semantics and rules
- Advanced indexing and slicing
- Reshaping and dimension manipulation
- Memory-efficient operations
- Vectorized operations

**Prerequisites**: Basic understanding of arrays and linear algebra

## Setup

**Important**: The setup cell below follows Rust notebook best practices:
- Dependencies and imports are declared at the **top level** (outside braces) so they persist across all cells
- Test code is wrapped in braces `{}` to avoid persistence issues with complex types
- This pattern ensures compatibility with both rust-analyzer and evcxr

In [2]:
// Setup Cell - dependencies and imports persist across all cells
:dep rustlab-math = { path = ".." }

// Top-level imports - these persist across all cells!
use rustlab_math::*;
use std::time::Instant;

// Test setup in braces (variables don't persist, but confirms setup works)
{
    let test_array = array64![[1.0, 2.0], [3.0, 4.0]];
    println!("✅ RustLab Broadcasting and Advanced Operations Demo");
    println!("Test array shape: {}x{}", test_array.nrows(), test_array.ncols());
}

✅ RustLab Broadcasting and Advanced Operations Demo
Test array shape: 2x2


()

## Broadcasting Fundamentals

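Broadcasting lets you combine arrays of different shapes in a single operation: the smaller operand is conceptually repeated to match the larger one, so you do not have to write the loop or build the expanded array yourself.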
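
To make the idea concrete, here is a minimal sketch in plain Rust (standard library only). It is not rustlab-math's implementation, and `add_scalar` is a hypothetical helper name. The scalar is treated as if it were a matrix of the same shape, but that matrix is never allocated.

```rust
/// Adds `s` to every element of a matrix stored as nested Vecs.
/// Hypothetical helper for illustration; not part of rustlab-math.
fn add_scalar(matrix: &[Vec<f64>], s: f64) -> Vec<Vec<f64>> {
    matrix
        .iter()
        .map(|row| row.iter().map(|&x| x + s).collect())
        .collect()
}
```

Calling `add_scalar(&m, 10.0)` on a 2x2 matrix returns a new 2x2 matrix, mirroring what `&matrix + 10.0` does in the cell that follows.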

In [3]:
// Basic broadcasting examples
let matrix = array64![
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
];

println!("Original matrix (3x3):");
for i in 0..matrix.nrows() {
    for j in 0..matrix.ncols() {
        print!("{:6.1} ", matrix.get(i, j).unwrap());
    }
    println!();
}

// Scalar broadcasting - adds 10 to every element
let scalar_broadcast = &matrix + 10.0;
println!("\nMatrix + 10 (scalar broadcast):");
for i in 0..scalar_broadcast.nrows() {
    for j in 0..scalar_broadcast.ncols() {
        print!("{:6.1} ", scalar_broadcast.get(i, j).unwrap());
    }
    println!();
}

// Scalar multiplication
let scaled_matrix = &matrix * 2.0;
println!("\nMatrix * 2 (scalar multiplication):");
for i in 0..scaled_matrix.nrows() {
    for j in 0..scaled_matrix.ncols() {
        print!("{:6.1} ", scaled_matrix.get(i, j).unwrap());
    }
    println!();
}

Original matrix (3x3):
   1.0    2.0    3.0 
   4.0    5.0    6.0 
   7.0    8.0    9.0 

Matrix + 10 (scalar broadcast):
  11.0   12.0   13.0 
  14.0   15.0   16.0 
  17.0   18.0   19.0 

Matrix * 2 (scalar multiplication):
   2.0    4.0    6.0 
   8.0   10.0   12.0 
  14.0   16.0   18.0 


()

## Broadcasting Rules and Compatibility

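Scalar operations work with an array of any shape, and element-wise operations work between arrays of identical shape. The cell below demonstrates both cases, then uses `map()` to apply arbitrary element-wise transformations.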
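
The same-shape requirement can be sketched in plain Rust as follows. This is an illustration under assumptions: `add_same_shape` is a hypothetical name, and returning `None` on a mismatch is a choice made for the sketch, since this notebook does not show how rustlab-math reports incompatible shapes.

```rust
/// Element-wise sum of two matrices; returns None when the shapes differ.
/// Illustrative only: rustlab-math's behavior on a mismatch is not shown here.
fn add_same_shape(a: &[Vec<f64>], b: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    // Shapes match when the row counts agree and every pair of rows has equal length.
    let same_shape = a.len() == b.len()
        && a.iter().zip(b).all(|(ra, rb)| ra.len() == rb.len());
    if !same_shape {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| x + y).collect())
            .collect(),
    )
}
```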

In [4]:
// Broadcasting rules demonstration
println!("Broadcasting compatibility examples:");

// Scalar with any shape - always works
let any_shape = ArrayF64::zeros(2, 3);
let scalar_result = &any_shape * PI;
println!("\n(2,3) * π → (2,3): Scalar broadcasting always works");

// Same shape arrays - works perfectly
let a1 = ArrayF64::ones(3, 3);
let a2 = ArrayF64::ones(3, 3);
let same_shape = &a1 + &a2;
println!("(3,3) + (3,3) → (3,3): Same shape works");
println!("Result shape: {}x{}", same_shape.nrows(), same_shape.ncols());

// Element-wise operations
let base = array64![
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0]
];

println!("\nBase matrix for operations:");
for i in 0..base.nrows() {
    for j in 0..base.ncols() {
        print!("{:5.1} ", base.get(i, j).unwrap());
    }
    println!();
}

// Apply transformations using map()
let doubled = base.map(|x| x * 2.0);
println!("\nDoubled (using map):");
for i in 0..doubled.nrows() {
    for j in 0..doubled.ncols() {
        print!("{:5.1} ", doubled.get(i, j).unwrap());
    }
    println!();
}

// Apply mathematical functions
let exp_values = base.map(|x| (x / 4.0).exp());
println!("\nExp(x/4) values:");
for i in 0..exp_values.nrows() {
    for j in 0..exp_values.ncols() {
        print!("{:7.3} ", exp_values.get(i, j).unwrap());
    }
    println!();
}

Broadcasting compatibility examples:

(2,3) * π → (2,3): Scalar broadcasting always works
(3,3) + (3,3) → (3,3): Same shape works
Result shape: 3x3

Base matrix for operations:
  1.0   2.0   3.0   4.0 
  5.0   6.0   7.0   8.0 

Doubled (using map):
  2.0   4.0   6.0   8.0 
 10.0  12.0  14.0  16.0 

Exp(x/4) values:
  1.284   1.649   2.117   2.718 
  3.490   4.482   5.755   7.389 


()

## Array + Vector Broadcasting

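RustLab broadcasts automatically between arrays and vectors. The direction follows from the vector's length: a vector whose length equals the number of columns is applied to every row (one value per feature), while a vector whose length equals the number of rows is applied to every column (one value per sample). This is the pattern behind feature centering, scaling, and per-sample weighting in data science and machine learning.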
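
A rough plain-Rust sketch of choosing the direction from the vector length is shown below. The names `Axis`, `broadcast_axis`, and `broadcast_sub` are hypothetical, and the rows are assumed to have equal length. For a square matrix both directions match; this sketch arbitrarily prefers row-wise, which may or may not be what rustlab-math does.

```rust
/// Which way a vector lines up with a matrix (names are hypothetical).
#[derive(Debug, PartialEq)]
enum Axis {
    /// Vector length equals the number of columns: one value per feature.
    RowWise,
    /// Vector length equals the number of rows: one value per sample.
    ColumnWise,
}

/// Picks the broadcast direction from the vector length alone.
/// A square matrix matches both; this sketch prefers RowWise.
fn broadcast_axis(nrows: usize, ncols: usize, len: usize) -> Option<Axis> {
    if len == ncols {
        Some(Axis::RowWise)
    } else if len == nrows {
        Some(Axis::ColumnWise)
    } else {
        None
    }
}

/// Subtracts `v` from `m` in the direction chosen by `broadcast_axis`.
/// Assumes every row of `m` has the same length.
fn broadcast_sub(m: &[Vec<f64>], v: &[f64]) -> Option<Vec<Vec<f64>>> {
    let ncols = m.first().map_or(0, |row| row.len());
    let axis = broadcast_axis(m.len(), ncols, v.len())?;
    let out: Vec<Vec<f64>> = m
        .iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| match axis {
                    Axis::RowWise => x - v[j],
                    Axis::ColumnWise => x - v[i],
                })
                .collect()
        })
        .collect();
    Some(out)
}
```

With a 3x4 matrix, a length-4 vector is subtracted from every row and a length-3 vector from every column; any other length yields `None`.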

In [5]:
// Array + Vector Broadcasting Examples
println!("=== Array + Vector Broadcasting ===");

// Create sample data matrix (3 samples, 4 features)
let data = array64![
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0]
];

println!("Original data matrix (3×4):");
for i in 0..data.nrows() {
    for j in 0..data.ncols() {
        print!("{:5.1} ", data.get(i, j).unwrap());
    }
    println!();
}

// 1. ROW-WISE BROADCASTING (Vector length = Matrix columns)
println!("\n1. Row-wise Broadcasting (Most Common in ML)");
let feature_means = vec64![2.0, 4.0, 6.0, 8.0];  // Length 4 = ncols; illustrative values, not the true column means
println!("Feature means: {:?}", feature_means.to_slice());

// Automatic broadcasting: subtract mean from each feature (column)
let centered_data = &data - &feature_means;
println!("\nCentered data (data - means):");
for i in 0..centered_data.nrows() {
    for j in 0..centered_data.ncols() {
        print!("{:6.1} ", centered_data.get(i, j).unwrap());
    }
    println!();
}

// Feature scaling example
let feature_scales = vec64![1.0, 0.5, 2.0, 0.1];  // Length 4 = ncols
let scaled_data = &centered_data * &feature_scales;
println!("\nScaled data (centered * scales):");
for i in 0..scaled_data.nrows() {
    for j in 0..scaled_data.ncols() {
        print!("{:6.2} ", scaled_data.get(i, j).unwrap());
    }
    println!();
}

// 2. COLUMN-WISE BROADCASTING (Vector length = Matrix rows)
println!("\n2. Column-wise Broadcasting");
let sample_weights = vec64![1.0, 1.5, 0.8];  // Length 3 = nrows
println!("Sample weights: {:?}", sample_weights.to_slice());

// Weight each sample (row) differently
let weighted_data = &data * &sample_weights;
println!("\nWeighted data (each sample scaled):");
for i in 0..weighted_data.nrows() {
    for j in 0..weighted_data.ncols() {
        print!("{:6.1} ", weighted_data.get(i, j).unwrap());
    }
    println!();
}

// 3. ALL ARITHMETIC OPERATIONS SUPPORT BROADCASTING
println!("\n3. All Arithmetic Operations");

// Addition (bias terms)
let bias = vec64![10.0, 20.0, 30.0, 40.0];  // Feature-wise bias
let biased = &data + &bias;
println!("Addition (data + bias):");
for i in 0..biased.nrows().min(2) {
    for j in 0..biased.ncols() {
        print!("{:5.1} ", biased.get(i, j).unwrap());
    }
    println!();
}

// Division (normalization)
let normalizers = vec64![2.0, 3.0, 4.0, 6.0];
let normalized = &data / &normalizers;
println!("\nDivision (data / normalizers):");
for i in 0..normalized.nrows().min(2) {
    for j in 0..normalized.ncols() {
        print!("{:5.2} ", normalized.get(i, j).unwrap());
    }
    println!();
}

// 4. COMMUTATIVE OPERATIONS
println!("\n4. Commutative Operations Work Both Ways");
let result1 = &data + &bias;
let result2 = &bias + &data;
// Spot check on element (0, 0) only; a full check would compare every element
println!("data + bias == bias + data? {}",
         result1.get(0, 0) == result2.get(0, 0));

// 5. REAL-WORLD: FEATURE STANDARDIZATION
println!("\n5. Real-World Example: Z-Score Standardization");

// Calculate column means and stds
let mut col_means = Vec::new();
let mut col_stds = Vec::new();

for j in 0..data.ncols() {
    let mut sum = 0.0;
    for i in 0..data.nrows() {
        sum += data.get(i, j).unwrap();
    }
    let mean = sum / data.nrows() as f64;
    col_means.push(mean);
    
    let mut sum_sq_diff = 0.0;
    for i in 0..data.nrows() {
        let diff = data.get(i, j).unwrap() - mean;
        sum_sq_diff += diff * diff;
    }
    let std_dev = (sum_sq_diff / (data.nrows() - 1) as f64).sqrt();
    col_stds.push(std_dev);
}

let means_vec = VectorF64::from_slice(&col_means);
let stds_vec = VectorF64::from_slice(&col_stds);

println!("Column means: {:?}", col_means);
println!("Column stds: {:?}", col_stds);

// Z-score standardization using broadcasting
let z_scored = (&data - &means_vec) / &stds_vec;
println!("\nZ-score standardized data:");
for i in 0..z_scored.nrows() {
    for j in 0..z_scored.ncols() {
        print!("{:7.3} ", z_scored.get(i, j).unwrap());
    }
    println!();
}

// Verify: z-scored data should have ~0 mean and ~1 std
let mut verify_sum = 0.0;
for i in 0..z_scored.nrows() {
    verify_sum += z_scored.get(i, 0).unwrap();
}
let verify_mean = verify_sum / z_scored.nrows() as f64;
println!("\nVerification - First column mean after z-score: {:.6}", verify_mean);

println!("\n✨ Key Broadcasting Benefits:");
println!("• Natural mathematical syntax: data - means");
println!("• Automatic dimension detection (row-wise vs column-wise)");
println!("• All arithmetic operations supported (+, -, *, /)");
println!("• Commutative operations work both ways");
println!("• Essential for ML: feature normalization, batch processing");
println!("• Zero-cost: optimized implementations");

=== Array + Vector Broadcasting ===
Original data matrix (3×4):
  1.0   2.0   3.0   4.0 
  5.0   6.0   7.0   8.0 
  9.0  10.0  11.0  12.0 

1. Row-wise Broadcasting (Most Common in ML)
Feature means: [2.0, 4.0, 6.0, 8.0]



## Broadcasting vs Matrix Multiplication

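**Critical distinction**: `*` performs element-wise broadcasting and preserves the matrix shape, while `^` performs matrix multiplication and can change it. Note that in Rust `^` binds more loosely than `*`, `/`, `+`, and `-`, so an expression such as `&a ^ &b + &c` parses as `&a ^ (&b + &c)`; add parentheses when mixing `^` with arithmetic operators.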
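
The two operations can be contrasted in plain Rust without the library. The sketch below uses hypothetical helper names (`scale_columns`, `mat_vec`) and nested `Vec`s; it illustrates the arithmetic only, not how rustlab-math implements `*` or `^`.

```rust
/// Element-wise scaling: column j is multiplied by weights[j]; the shape is preserved.
fn scale_columns(m: &[Vec<f64>], weights: &[f64]) -> Vec<Vec<f64>> {
    m.iter()
        .map(|row| row.iter().zip(weights).map(|(x, w)| x * w).collect())
        .collect()
}

/// Matrix-vector product: each row collapses to a single weighted sum.
fn mat_vec(m: &[Vec<f64>], weights: &[f64]) -> Vec<f64> {
    m.iter()
        .map(|row| row.iter().zip(weights).map(|(x, w)| x * w).sum::<f64>())
        .collect()
}
```

Both functions compute the same per-element products; the only difference is whether the products in each row are kept (a 2x3 result for a 2x3 input) or summed (a length-2 result).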

In [6]:
// Broadcasting vs Matrix Multiplication - Critical Distinction!
println!("=== Broadcasting vs Matrix Multiplication ===");

let data = array64![
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0]
];  // 2×3 matrix

let weights = vec64![0.1, 0.2, 0.3];  // 3-element vector

println!("Data matrix (2×3):");
for i in 0..data.nrows() {
    for j in 0..data.ncols() {
        print!("{:5.1} ", data.get(i, j).unwrap());
    }
    println!();
}
println!("Weights vector: {:?}", weights.to_slice());

// 1. ELEMENT-WISE BROADCASTING (using *)
println!("\n1. Element-wise Broadcasting (data * weights):");
println!("   Uses * operator → produces MATRIX output (same shape as input)");
let element_wise = &data * &weights;  // Broadcasting: each row scaled by weights
println!("   Result shape: {}×{}", element_wise.nrows(), element_wise.ncols());
for i in 0..element_wise.nrows() {
    for j in 0..element_wise.ncols() {
        print!("{:7.2} ", element_wise.get(i, j).unwrap());
    }
    println!();
}
println!("   Interpretation: Each feature scaled by its weight");

// 2. MATRIX MULTIPLICATION (using ^)
println!("\n2. Matrix Multiplication (data ^ weights):");
println!("   Uses ^ operator → produces VECTOR output");
let matrix_mult = &data ^ &weights;    // Linear combination: weighted sum per row
println!("   Result shape: vector of length {}", matrix_mult.len());
for i in 0..matrix_mult.len() {
    print!("{:7.2} ", matrix_mult.get(i).unwrap());
}
println!();
println!("   Interpretation: Weighted sum of features per sample");

// 3. WHEN TO USE WHICH
println!("\n3. When to Use Which Operation:");

println!("\n   ✅ Use BROADCASTING (*) for:");
println!("     • Feature scaling: scale each feature by different factor");
println!("     • Normalization: divide by standard deviations");
println!("     • Element-wise operations: mask, multiply by probabilities");
println!("     • Output: Same shape as input matrix");

println!("\n   ✅ Use MATRIX MULTIPLICATION (^) for:");
println!("     • Linear transformations: neural network layers");
println!("     • Weighted combinations: feature importance scoring");  
println!("     • Dot products: similarity calculations");
println!("     • Output: Reduced dimensions (matrix→vector, matrix→matrix)");

// 4. PRACTICAL EXAMPLES
println!("\n4. Practical Examples:");

// Feature normalization (broadcasting)
let feature_stds = vec64![1.0, 2.0, 3.0];
let normalized = &data / &feature_stds;
println!("\n   Feature normalization (broadcasting):");
println!("   data / stds = element-wise division");
for i in 0..normalized.nrows() {
    for j in 0..normalized.ncols() {
        print!("{:7.2} ", normalized.get(i, j).unwrap());
    }
    println!();
}

// Linear prediction (matrix multiplication)
let linear_weights = vec64![0.5, 0.3, 0.2];  // Feature importance weights
let predictions = &data ^ &linear_weights;
println!("\n   Linear prediction (matrix multiplication):");
println!("   data ^ weights = weighted sum per sample");
println!("   Predictions: {:?}", predictions.to_slice());

// Calculate manually to verify
let manual_pred_0 = 1.0*0.5 + 2.0*0.3 + 3.0*0.2;  // First row
let manual_pred_1 = 4.0*0.5 + 5.0*0.3 + 6.0*0.2;  // Second row
println!("   Manual verification:");
println!("     Sample 0: 1*0.5 + 2*0.3 + 3*0.2 = {:.2}", manual_pred_0);
println!("     Sample 1: 4*0.5 + 5*0.3 + 6*0.2 = {:.2}", manual_pred_1);

// 5. COMMON MISTAKES TO AVOID
println!("\n5. ⚠️  Common Mistakes to Avoid:");
println!("   ❌ Using * when you want linear combinations (use ^ instead)");
println!("   ❌ Using ^ when you want element-wise scaling (use * instead)");
println!("   ❌ Expecting vector output from broadcasting (always produces matrix)");
println!("   ❌ Expecting matrix output from matrix mult (depends on dimensions)");

println!("\n📝 Summary:");
println!("   • * = Element-wise broadcasting (Hadamard product)");
println!("   • ^ = Matrix multiplication (linear algebra)");
println!("   • Broadcasting preserves input matrix shape");
println!("   • Matrix multiplication reduces/transforms dimensions");
println!("   • Both are essential for different ML operations!");

Centered data (data - means):
  -1.0   -2.0   -3.0   -4.0 
   3.0    2.0    1.0    0.0 
   7.0    6.0    5.0    4.0 

Scaled data (centered * scales):
 -1.00  -1.00  -6.00  -0.40 
  3.00   1.00   2.00   0.00 
  7.00   3.00  10.00   0.40 

2. Column-wise Broadcasting
Sample weights: [1.0, 1.5, 0.8]

Weighted data (each sample scaled):
   1.0    2.0    3.0    4.0 
   7.5    9.0   10.5   12.0 
   7.2    8.0    8.8    9.6 

3. All Arithmetic Operations
Addition (data + bias):
 11.0  22.0  33.0  44.0 
 15.0  26.0  37.0  48.0 

Division (data / normalizers):
 0.50  0.67  0.75  0.67 
 2.50  2.00  1.75  1.33 

4. Commutative Operations Work Both Ways
data + bias == bias + data? true

5. Real-World Example: Z-Score Standardization
Column means: [5.0, 6.0, 7.0, 8.0]
Column stds: [4.0, 4.0, 4.0, 4.0]

Z-score standardized data:
 -1.000  -1.000  -1.000  -1.000 
  0.000   0.000   0.000   0.000 
  1.000   1.000   1.000   1.000 

Verification - First column mean after z-score: 0.000000

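✨ Key Broadcasting Benefits:
• Natural mathematical syntax: data - means
• Automatic dimension detection (row-wise vs column-wise)
• All arithmetic operations supported (+, -, *, /)
• Commutative operations work both ways
• Essential for ML: feature normalization, batch processing
• Zero-cost: optimized implementations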

## Advanced Indexing and Slicing

This cell reads individual elements, whole rows, whole columns, and the four corners of a matrix using `get(i, j)`, which returns an `Option`.
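
As a rough illustration of `Option`-returning access, the sketch below builds row and column extraction on top of a bounds-checked `get`. The `Grid` type, its row-major layout, and the `row`/`col` helpers are assumptions made for this example, not rustlab-math internals.

```rust
/// A tiny row-major matrix used only to illustrate Option-returning access.
/// The layout and names are assumptions, not rustlab-math internals.
struct Grid {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>,
}

impl Grid {
    /// Returns the element at (i, j), or None if either index is out of range.
    fn get(&self, i: usize, j: usize) -> Option<f64> {
        if i < self.nrows && j < self.ncols {
            self.data.get(i * self.ncols + j).copied()
        } else {
            None
        }
    }

    /// Collects row i into a Vec, or None if the row does not exist.
    fn row(&self, i: usize) -> Option<Vec<f64>> {
        if i >= self.nrows {
            return None;
        }
        (0..self.ncols).map(|j| self.get(i, j)).collect()
    }

    /// Collects column j into a Vec, or None if the column does not exist.
    fn col(&self, j: usize) -> Option<Vec<f64>> {
        if j >= self.ncols {
            return None;
        }
        (0..self.nrows).map(|i| self.get(i, j)).collect()
    }
}
```

Collecting an iterator of `Option<f64>` into `Option<Vec<f64>>` short-circuits on the first `None`, so an invalid index never produces a partial row.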

In [7]:
// Create test data for indexing examples
let data = ArrayF64::zeros(5, 6);
println!("Created test matrix (5x6) filled with zeros");

// Basic element access
if let Some(val) = data.get(2, 3) {
    println!("Element at [2,3]: {}", val);
}

// Test with actual data
let test_data = array64![
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0, 17.0, 18.0],
    [19.0, 20.0, 21.0, 22.0, 23.0, 24.0],
    [25.0, 26.0, 27.0, 28.0, 29.0, 30.0]
];

println!("\nTest data matrix (5x6):");
for i in 0..test_data.nrows() {
    for j in 0..test_data.ncols() {
        print!("{:4.0} ", test_data.get(i, j).unwrap());
    }
    println!();
}

// Row access example
println!("\nRow 2 elements:");
for j in 0..test_data.ncols() {
    print!("{:4.0} ", test_data.get(2, j).unwrap());
}
println!();

// Column access example  
println!("\nColumn 3 elements:");
for i in 0..test_data.nrows() {
    print!("{:4.0} ", test_data.get(i, 3).unwrap());
}
println!();

// Corner elements
println!("\nCorner elements:");
println!("  Top-left [0,0]: {}", test_data.get(0, 0).unwrap());
println!("  Top-right [0,{}]: {}", test_data.ncols()-1, test_data.get(0, test_data.ncols()-1).unwrap());
println!("  Bottom-left [{},0]: {}", test_data.nrows()-1, test_data.get(test_data.nrows()-1, 0).unwrap());
println!("  Bottom-right [{},{}]: {}", test_data.nrows()-1, test_data.ncols()-1, 
         test_data.get(test_data.nrows()-1, test_data.ncols()-1).unwrap());

     • Element-wise operations: mask, multiply by probabilities
     • Output: Same shape as input matrix

   ✅ Use MATRIX MULTIPLICATION (^) for:
     • Linear transformations: neural network layers
     • Weighted combinations: feature importance scoring
     • Dot products: similarity calculations
     • Output: Reduced dimensions (matrix→vector, matrix→matrix)

4. Practical Examples:

   Feature normalization (broadcasting):
   data / stds = element-wise division
   1.00    1.00    1.00 
   4.00    2.50    2.00 

   Linear prediction (matrix multiplication):
   data ^ weights = weighted sum per sample
   Predictions: [1.7000000000000002, 4.7]
   Manual verification:
     Sample 0: 1*0.5 + 2*0.3 + 3*0.2 = 1.70
     Sample 1: 4*0.5 + 5*0.3 + 6*0.2 = 4.70

5. ⚠️  Common Mistakes to Avoid:
   ❌ Using * when you want linear combinations (use ^ instead)
   ❌ Using ^ when you want element-wise scaling (use * instead)
   ❌ Expecting vector output from broadcasting (always produces matrix)


## Reshaping and Views

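Transposing arrays and building row-like and column-like shapes. The cell describes `transpose()` as a view that copies no data; the timing at the end of the cell is a rough check of that, since a true metadata-only operation should take about the same time regardless of matrix size.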
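
The difference between a transpose that is a view and one that copies can be sketched in plain Rust. The types `Dense` and `Transposed` and their methods are hypothetical and exist only for this illustration; which strategy rustlab-math's `transpose()` uses is not established by this sketch.

```rust
/// Row-major matrix that owns its data (illustrative type, not rustlab-math's).
struct Dense {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>,
}

/// A transposed view: borrows the original and swaps indices on access.
struct Transposed<'a> {
    inner: &'a Dense,
}

impl Dense {
    fn get(&self, i: usize, j: usize) -> Option<f64> {
        if i < self.nrows && j < self.ncols {
            Some(self.data[i * self.ncols + j])
        } else {
            None
        }
    }

    /// O(1): no element is read or copied.
    fn transpose_view(&self) -> Transposed<'_> {
        Transposed { inner: self }
    }

    /// O(nrows * ncols): allocates a new buffer and copies every element.
    fn transpose_copy(&self) -> Dense {
        let mut data = Vec::with_capacity(self.data.len());
        for j in 0..self.ncols {
            for i in 0..self.nrows {
                data.push(self.data[i * self.ncols + j]);
            }
        }
        Dense { nrows: self.ncols, ncols: self.nrows, data }
    }
}

impl Transposed<'_> {
    fn nrows(&self) -> usize {
        self.inner.ncols
    }

    fn ncols(&self) -> usize {
        self.inner.nrows
    }

    /// Element (i, j) of the transpose is element (j, i) of the original.
    fn get(&self, i: usize, j: usize) -> Option<f64> {
        self.inner.get(j, i)
    }
}
```

A view costs the same for any matrix size but ties the result's lifetime to the original; a copy costs time proportional to the element count but yields an independent, contiguous matrix.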

In [8]:
// Original array for reshaping and views
let original = array64![
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
];

println!("Original array (2x6):");
for i in 0..original.nrows() {
    for j in 0..original.ncols() {
        print!("{:5.1} ", original.get(i, j).unwrap());
    }
    println!();
}

// Transpose (efficient view operation)
let transposed = original.transpose();
println!("\nTransposed view (6x2) - no data copying:");
for i in 0..transposed.nrows().min(4) {  // Show first 4 rows
    for j in 0..transposed.ncols() {
        print!("{:5.1} ", transposed.get(i, j).unwrap());
    }
    println!();
}
println!("... (showing first 4 rows of 6)");

// Create different sized arrays to demonstrate shape flexibility
let vector_like = array64![[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]];  // 1x8
println!("\nVector-like array (1x8):");
for j in 0..vector_like.ncols() {
    print!("{:4.1} ", vector_like.get(0, j).unwrap());
}
println!();

let column_like = array64![
    [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]
];  // 6x1
println!("\nColumn-like array (6x1):");
for i in 0..column_like.nrows() {
    println!("{:4.1}", column_like.get(i, 0).unwrap());
}

// Demonstrate efficient transpose
println!("\nTranspose of column becomes row (1x6):");
let transposed_col = column_like.transpose();
for j in 0..transposed_col.ncols() {
    print!("{:4.1} ", transposed_col.get(0, j).unwrap());
}
println!();

// Time the transpose: if it is a pure view, the cost should not depend on matrix size
use std::time::Instant;
let large_test = ArrayF64::ones(100, 200);
let start = Instant::now();
let _transposed_large = large_test.transpose();
let transpose_time = start.elapsed();
println!("\nTransposing 100x200 matrix: {:?} (metadata-only operation)", transpose_time);

   7    8    9   10   11   12 
  13   14   15   16   17   18 
  19   20   21   22   23   24 
  25   26   27   28   29   30 

Row 2 elements:
  13   14   15   16   17   18 

Column 3 elements:
   4   10   16   22   28 

Corner elements:
  Top-left [0,0]: 1
  Top-right [0,5]: 6
  Bottom-left [4,0]: 25
  Bottom-right [4,5]: 30
Original array (2x6):
  1.0   2.0   3.0   4.0   5.0   6.0 
  7.0   8.0   9.0  10.0  11.0  12.0 

Transposed view (6x2) - no data copying:
  1.0   7.0 
  2.0   8.0 
  3.0   9.0 
  4.0  10.0 


## Memory-Efficient Operations

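Timing comparisons for scalar, array-array, `map`, transpose, and manual element access on a 100×100 array. Each measurement is a single run, so treat the numbers as rough indications that vary between runs and build settings, not as benchmarks.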
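
Single-shot timings are noisy, so one option is to repeat each measurement and report the median. The helper below is a minimal sketch using only the standard library; `median_time` is a hypothetical name, and a dedicated benchmarking harness would be more rigorous.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the median wall-clock duration
/// (the upper middle sample when `runs` is even).
/// Hypothetical helper; `runs` must be at least 1.
fn median_time<T>(runs: usize, mut f: impl FnMut() -> T) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            // black_box discourages the optimizer from discarding the result.
            black_box(f());
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}
```

For example, `median_time(11, || &large_array + 1.0)` would time the scalar addition from the cell below eleven times and return the middle sample.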

In [9]:
use std::time::Instant;

// Create test arrays for memory efficiency demonstration
let size = 100;  // Using moderate size for demonstration
let large_array = ArrayF64::zeros(size, size);

println!("Memory efficiency examples with {}x{} arrays:", size, size);
println!("Array size: {:.2} KB", (size * size * 8) as f64 / 1024.0);

// Compare different operation approaches
println!("\nOperation timing comparisons:");

// Scalar operations (very efficient)
let start = Instant::now();
let scalar_add_result = &large_array + 1.0;
let scalar_add_time = start.elapsed();
println!("Scalar addition (array + 1.0): {:?}", scalar_add_time);

let start = Instant::now();
let scalar_mult_result = &large_array * 2.0;
let scalar_mult_time = start.elapsed();
println!("Scalar multiplication (array * 2.0): {:?}", scalar_mult_time);

// Array-array operations (same size)
let another_array = ArrayF64::ones(size, size);
let start = Instant::now();
let array_add_result = &large_array + &another_array;
let array_add_time = start.elapsed();
println!("Array addition (array + array): {:?}", array_add_time);

// Element-wise map operations
let start = Instant::now();
let map_result = large_array.map(|x| x + 1.0);
let map_time = start.elapsed();
println!("Map operation (x + 1.0): {:?}", map_time);

// More complex map operation
let start = Instant::now();
let complex_map = large_array.map(|x| (x + 1.0).exp().sin());
let complex_map_time = start.elapsed();
println!("Complex map ((x+1).exp().sin()): {:?}", complex_map_time);

// Transpose operation (expected to be metadata only; compare its time with the copying operations above)
let start = Instant::now();
let transpose_result = large_array.transpose();
let transpose_time = start.elapsed();
println!("Transpose operation: {:?} (metadata-only)", transpose_time);

// Manual element access for comparison
let start = Instant::now();
let mut manual_sum = 0.0;
for i in 0..size {
    for j in 0..size {
        manual_sum += large_array.get(i, j).unwrap();
    }
}
let manual_time = start.elapsed();
println!("Manual sum via get(): {:?} (sum = {})", manual_time, manual_sum);

println!("\nMemory efficiency insights:");
println!("- Scalar operations: Very fast, optimized");
println!("- Array operations: Efficient for same-size arrays");
println!("- Map operations: Good for element-wise transforms");
println!("- Transpose: Essentially free (metadata change)");
println!("- Manual access: Slower but flexible");

... (showing first 4 rows of 6)

Vector-like array (1x8):
 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0 

Column-like array (6x1):
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0

Transpose of column becomes row (1x6):
 1.0  2.0  3.0  4.0  5.0  6.0 

Transposing 100x200 matrix: 85.439µs (metadata-only operation)
Memory efficiency examples with 100x100 arrays:
Array size: 78.12 KB

Operation timing comparisons:
Scalar addition (array + 1.0): 90.278µs
Scalar multiplication (array * 2.0): 92.732µs
Array addition (array + array): 124.528µs
Map operation (x + 1.0): 98.15µs
Complex map ((x+1).exp().sin()): 405.381µs
Transpose operation: 64.411µs (metadata-only)
Manual sum via get(): 30.263µs (sum = 0)

Memory efficiency insights:
- Scalar operations: Very fast, optimized
- Array operations: Efficient for same-size arrays
- Map operations: Good for element-wise transforms
- Transpose: Essentially free (metadata change)
- Manual access: Slower but flexible


## Vectorized Operations

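Batch operations that apply a function to every element in one call via `map`. Handing the whole loop to the library gives it and the compiler the opportunity to optimize (for example with SIMD), although whether that happens depends on the closure and build settings.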
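
As a plain-Rust stand-in for an array `map`, the sketch below applies a closure to a contiguous buffer in a single pass. `map_flat` is a hypothetical name and the flat `&[f64]` layout is an assumption; it shows the shape of the operation, not rustlab-math's implementation.

```rust
/// Applies `f` to every element of a contiguous buffer in one pass.
/// A plain-Rust stand-in for an array `map`; not rustlab-math's implementation.
fn map_flat(data: &[f64], f: impl Fn(f64) -> f64) -> Vec<f64> {
    data.iter().map(|&x| f(x)).collect()
}
```

Because the iterator walks the slice directly, there is no per-element index computation or `Option` to unwrap, which is one plausible reason a `map` can beat a nested `get(i, j)` loop.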

In [10]:
use std::time::Instant;

// Create test data for vectorized operations
let test_array = array64![
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0]
];

println!("Vectorized operations demonstration:");
println!("Test array (3x4):");
for i in 0..test_array.nrows() {
    for j in 0..test_array.ncols() {
        print!("{:4.0} ", test_array.get(i, j).unwrap());
    }
    println!();
}

// Mathematical functions using map (vectorized)
println!("\nVectorized mathematical functions:");

let start = Instant::now();
let sin_values = test_array.map(|x| x.sin());
let sin_time = start.elapsed();
println!("Sin values (timing: {:?}):", sin_time);
for i in 0..sin_values.nrows() {
    for j in 0..sin_values.ncols() {
        print!("{:7.3} ", sin_values.get(i, j).unwrap());
    }
    println!();
}

let start = Instant::now();
let exp_values = test_array.map(|x| (x / 10.0).exp());
let exp_time = start.elapsed();
println!("\nExp(x/10) values (timing: {:?}):", exp_time);
for i in 0..exp_values.nrows() {
    for j in 0..exp_values.ncols() {
        print!("{:7.3} ", exp_values.get(i, j).unwrap());
    }
    println!();
}

// Complex vectorized expressions
let start = Instant::now();
let complex_expr = test_array.map(|x| (x.sin().powi(2) + x.cos().powi(2)).sqrt());
let complex_time = start.elapsed();
println!("\nComplex expression sqrt(sin²(x) + cos²(x)) (timing: {:?}):", complex_time);
for i in 0..complex_expr.nrows() {
    for j in 0..complex_expr.ncols() {
        print!("{:7.4} ", complex_expr.get(i, j).unwrap());  // Should be ~1.0
    }
    println!();
}

// Statistics operations (manual implementation)
println!("\nStatistics operations:");

// Calculate statistics manually for demonstration
let mut sum_val = 0.0;
let mut min_val = f64::INFINITY;
let mut max_val = f64::NEG_INFINITY;
let mut count = 0;

for i in 0..test_array.nrows() {
    for j in 0..test_array.ncols() {
        let val = test_array.get(i, j).unwrap();
        sum_val += val;
        min_val = min_val.min(val);
        max_val = max_val.max(val);
        count += 1;
    }
}

let mean_val = sum_val / count as f64;

println!("  Sum: {:.2}", sum_val);
println!("  Mean: {:.2}", mean_val);
println!("  Min: {:.2}", min_val);
println!("  Max: {:.2}", max_val);
println!("  Count: {}", count);

// Demonstrate vectorized vs scalar comparison on larger array
let large_size = 1000;
let large_test = ArrayF64::ones(large_size, large_size);

println!("\nPerformance comparison on {}x{} array:", large_size, large_size);

// Vectorized operation using map
let start = Instant::now();
let _vectorized_result = large_test.map(|x| x.sin());
let vectorized_time = start.elapsed();
println!("Vectorized sin operation: {:?}", vectorized_time);

// Manual element-by-element (for comparison)
let start = Instant::now();
let mut manual_results = Vec::with_capacity(large_size * large_size);
for i in 0..large_size {
    for j in 0..large_size {
        manual_results.push(large_test.get(i, j).unwrap().sin());
    }
}
let manual_time = start.elapsed();
println!("Manual element-by-element: {:?}", manual_time);

if manual_time.as_nanos() > 0 && vectorized_time.as_nanos() > 0 {
    let speedup = manual_time.as_nanos() as f64 / vectorized_time.as_nanos() as f64;
    println!("Speedup: {:.1}x faster with vectorized approach", speedup);
}

Vectorized operations demonstration:
Test array (3x4):
   1    2    3    4 
   5    6    7    8 
   9   10   11   12 

Vectorized mathematical functions:
Sin values (timing: 1.63µs):
  0.841   0.909   0.141  -0.757 
 -0.959  -0.279   0.657   0.989 
  0.412  -0.544  -1.000  -0.537 

Exp(x/10) values (timing: 1.223µs):
  1.105   1.221   1.350   1.492 
  1.649   1.822   2.014   2.226 
  2.460   2.718   3.004   3.320 

Complex expression sqrt(sin²(x) + cos²(x)) (timing: 1.222µs):
 1.0000  1.0000  1.0000  1.0000 
 1.0000  1.0000  1.0000  1.0000 
 1.0000  1.0000  1.0000  1.0000 

Statistics operations:
  Sum: 78.00
  Mean: 6.50
  Min: 1.00
  Max: 12.00
  Count: 12

Performance comparison on 1000x1000 array:
Vectorized sin operation: 17.482954ms
Manual element-by-element: 49.041736ms
Speedup: 2.8x faster with vectorized approach


()
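
The manual statistics loop above can also be written as a single fold over the elements. The sketch below works on a flat slice in plain Rust; `summarize` is a hypothetical helper, and no claim is made about which reductions rustlab-math itself provides.

```rust
/// Summary statistics over a flat slice in a single pass.
/// Returns (sum, mean, min, max), or None for an empty slice.
fn summarize(values: &[f64]) -> Option<(f64, f64, f64, f64)> {
    if values.is_empty() {
        return None;
    }
    let (sum, min, max) = values.iter().fold(
        (0.0_f64, f64::INFINITY, f64::NEG_INFINITY),
        |(sum, min, max), &x| (sum + x, min.min(x), max.max(x)),
    );
    let mean = sum / values.len() as f64;
    Some((sum, mean, min, max))
}
```

Returning `None` for an empty input avoids reporting a meaningless mean (0/0) or the sentinel infinities as a minimum and maximum.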

## Advanced Broadcasting Patterns

Three patterns that come up in practice: element-wise versus matrix products, feature standardization, and a numerically stable softmax.

In [11]:
// Advanced Broadcasting Patterns for Real-World Applications
println!("Advanced Broadcasting Patterns:");

// Pattern 1: Matrix-Matrix Operations
println!("\n1. Matrix Operations and Broadcasting");

let a_mat = array64![[1.0, 2.0], [3.0, 4.0]];
let b_mat = array64![[5.0, 6.0], [7.0, 8.0]];

println!("Matrix A:");
for i in 0..a_mat.nrows() {
    for j in 0..a_mat.ncols() {
        print!("{:5.1} ", a_mat.get(i, j).unwrap());
    }
    println!();
}

println!("Matrix B:");
for i in 0..b_mat.nrows() {
    for j in 0..b_mat.ncols() {
        print!("{:5.1} ", b_mat.get(i, j).unwrap());
    }
    println!();
}

// Element-wise multiplication (Hadamard product)
let hadamard = &a_mat * &b_mat;
println!("\nElement-wise multiplication (A ⊙ B):");
for i in 0..hadamard.nrows() {
    for j in 0..hadamard.ncols() {
        print!("{:5.1} ", hadamard.get(i, j).unwrap());
    }
    println!();
}

// Matrix multiplication 
let matmul = &a_mat ^ &b_mat;
println!("\nMatrix multiplication (A × B):");
for i in 0..matmul.nrows() {
    for j in 0..matmul.ncols() {
        print!("{:5.1} ", matmul.get(i, j).unwrap());
    }
    println!();
}

// Pattern 2: Feature Standardization (Z-score normalization)
println!("\n2. Feature Standardization Example");
let features = array64![
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
    [5.0, 50.0, 500.0]
];  // 5 samples, 3 features

println!("Original features (5 samples x 3 features):");
for i in 0..features.nrows() {
    for j in 0..features.ncols() {
        print!("{:7.1} ", features.get(i, j).unwrap());
    }
    println!();
}

// Calculate mean for each feature (column)
let mut col_means = Vec::new();
let mut col_stds = Vec::new();

for j in 0..features.ncols() {
    let mut sum = 0.0;
    for i in 0..features.nrows() {
        sum += features.get(i, j).unwrap();
    }
    let mean = sum / features.nrows() as f64;
    col_means.push(mean);
    
    // Calculate standard deviation
    let mut sum_sq_diff = 0.0;
    for i in 0..features.nrows() {
        let diff = features.get(i, j).unwrap() - mean;
        sum_sq_diff += diff * diff;
    }
    let std_dev = (sum_sq_diff / (features.nrows() - 1) as f64).sqrt();
    col_stds.push(std_dev);
}

println!("\nFeature statistics:");
for j in 0..features.ncols() {
    println!("  Feature {}: mean={:.2}, std={:.2}", j, col_means[j], col_stds[j]);
}

// Standardize: (x - μ) / σ using manual iteration
let mut standardized_data = Vec::new();
for i in 0..features.nrows() {
    for j in 0..features.ncols() {
        let val = features.get(i, j).unwrap();
        let standardized_val = (val - col_means[j]) / col_stds[j];
        standardized_data.push(standardized_val);
    }
}
let standardized = ArrayF64::from_slice(&standardized_data, features.nrows(), features.ncols()).unwrap();

println!("\nStandardized features:");
for i in 0..standardized.nrows() {
    for j in 0..standardized.ncols() {
        print!("{:8.3} ", standardized.get(i, j).unwrap());
    }
    println!();
}

// Pattern 3: Softmax Function
println!("\n3. Softmax Function with Numerical Stability");
let logits = array64![
    [2.0, 1.0, 0.1],
    [1.0, 3.0, 0.2],
    [0.5, 2.0, 1.5]
];

println!("Logits:");
for i in 0..logits.nrows() {
    for j in 0..logits.ncols() {
        print!("{:5.1} ", logits.get(i, j).unwrap());
    }
    println!();
}

// Numerically stable softmax: subtract max for each row
let mut stable_softmax = Vec::new();
for i in 0..logits.nrows() {
    // Find max in row
    let mut row_max = f64::NEG_INFINITY;
    for j in 0..logits.ncols() {
        row_max = row_max.max(logits.get(i, j).unwrap());
    }
    
    // Compute exp(x - max) for numerical stability
    let mut row_exp_vals = Vec::new();
    let mut sum_exp = 0.0;
    for j in 0..logits.ncols() {
        let stable_val = (logits.get(i, j).unwrap() - row_max).exp();
        row_exp_vals.push(stable_val);
        sum_exp += stable_val;
    }
    
    // Normalize to get probabilities
    let row_probs: Vec<f64> = row_exp_vals.iter().map(|&x| x / sum_exp).collect();
    stable_softmax.push(row_probs);
}

println!("\nSoftmax probabilities (each row sums to 1):");
for row in stable_softmax.iter() {
    for &prob in row.iter() {
        print!("{:7.4} ", prob);
    }
    let row_sum: f64 = row.iter().sum();
    println!("  (sum: {:.4})", row_sum);
}

Advanced Broadcasting Patterns:

1. Matrix Operations and Broadcasting
Matrix A:
  1.0   2.0 
  3.0   4.0 
Matrix B:
  5.0   6.0 
  7.0   8.0 

Element-wise multiplication (A ⊙ B):
  5.0  12.0 
 21.0  32.0 

Matrix multiplication (A × B):
 19.0  22.0 
 43.0  50.0 

2. Feature Standardization Example
Original features (5 samples x 3 features):
    1.0    10.0   100.0 
    2.0    20.0   200.0 
    3.0    30.0   300.0 
    4.0    40.0   400.0 
    5.0    50.0   500.0 

Feature statistics:
  Feature 0: mean=3.00, std=1.58
  Feature 1: mean=30.00, std=15.81
  Feature 2: mean=300.00, std=158.11

Standardized features:
  -1.265   -1.265   -1.265 
  -0.632   -0.632   -0.632 
   0.000    0.000    0.000 
   0.632    0.632    0.632 
   1.265    1.265    1.265 

3. Softmax Function with Numerical Stability
Logits:
  2.0   1.0   0.1 
  1.0   3.0   0.2 
  0.5   2.0   1.5 

Softmax probabilities (each row sums to 1):
 0.6590  0.2424  0.0986   (sum: 1.0000)
 0.1131  0.8360  0.0508   (sum: 1.0000)
 0.1

()
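
The softmax loop above can also be packaged as a small reusable function. The sketch below uses only the standard library and plain slices; `softmax_row` is a hypothetical helper introduced for illustration, not a rustlab-math function.

```rust
/// Numerically stable softmax over one row of logits.
/// Hypothetical helper for illustration; not part of rustlab-math.
fn softmax_row(logits: &[f64]) -> Vec<f64> {
    // Subtracting the row maximum leaves the result unchanged but keeps
    // every exponent <= 0, so exp() cannot overflow.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // First row of the logits matrix used above.
    let probs = softmax_row(&[2.0, 1.0, 0.1]);
    println!("{:?}", probs);

    // A naive exp(x) / sum(exp(x)) would overflow on logits this large.
    let big = softmax_row(&[1000.0, 1001.0, 1002.0]);
    println!("{:?}", big);
}
```

The second call shows why the maximum is subtracted: `exp(1000.0)` is infinite in `f64`, whereas the shifted exponents stay finite.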

## Dimension Manipulation and Tensor Operations

Techniques for arranging data in different shapes: row and column layouts, outer products, and vertical or horizontal stacking.

In [12]:
// Dimension Manipulation Examples
println!("Dimension Manipulation:");

let vector_data = vec64![1.0, 2.0, 3.0, 4.0];
println!("Original vector: {:?}", vector_data.to_slice());

// Create different shaped arrays
let row_matrix = array64![[1.0, 2.0, 3.0, 4.0]];      // 1x4
let col_matrix = array64![[1.0], [2.0], [3.0], [4.0]]; // 4x1

println!("\nAs row matrix (1x4):");
for j in 0..row_matrix.ncols() {
    print!("{:4.1} ", row_matrix.get(0, j).unwrap());
}
println!();

println!("\nAs column matrix (4x1):");
for i in 0..col_matrix.nrows() {
    println!("{:4.1}", col_matrix.get(i, 0).unwrap());
}

// Demonstrate the outer product manually: result[i][j] = col[i] * row[j]
println!("\nOuter product concept (4x4 result):");
for i in 0..col_matrix.nrows() {
    for j in 0..row_matrix.ncols() {
        let val = col_matrix.get(i, 0).unwrap() * row_matrix.get(0, j).unwrap();
        print!("{:5.1} ", val);
    }
    println!();
}

// Matrix operations examples
println!("\nMatrix concatenation concepts:");
let A = array64![[1.0, 2.0], [3.0, 4.0]];
let B = array64![[5.0, 6.0], [7.0, 8.0]];

println!("Matrix A (2x2):");
for i in 0..A.nrows() {
    for j in 0..A.ncols() {
        print!("{:5.1} ", A.get(i, j).unwrap());
    }
    println!();
}

println!("Matrix B (2x2):");
for i in 0..B.nrows() {
    for j in 0..B.ncols() {
        print!("{:5.1} ", B.get(i, j).unwrap());
    }
    println!();
}

// Simulate vertical stacking manually
println!("\nVertical stacking concept (A above B):");
for i in 0..A.nrows() {
    for j in 0..A.ncols() {
        print!("{:5.1} ", A.get(i, j).unwrap());
    }
    println!();
}
for i in 0..B.nrows() {
    for j in 0..B.ncols() {
        print!("{:5.1} ", B.get(i, j).unwrap());
    }
    println!();
}

// Simulate horizontal stacking manually
println!("\nHorizontal stacking concept (A beside B):");
for i in 0..A.nrows() {
    for j in 0..A.ncols() {
        print!("{:5.1} ", A.get(i, j).unwrap());
    }
    print!("   ");
    for j in 0..B.ncols() {
        print!("{:5.1} ", B.get(i, j).unwrap());
    }
    println!();
}

// Different array shapes for flexibility demonstration
let shapes_demo = [
    ("1x8 (row vector)", 1, 8),
    ("8x1 (column vector)", 8, 1),
    ("2x4 (wide matrix)", 2, 4),
    ("4x2 (tall matrix)", 4, 2),
    ("3x3 (square matrix)", 3, 3),
];

println!("\nArray shape flexibility:");
for (name, rows, cols) in shapes_demo.iter() {
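    let test_array = ArrayF64::ones(*rows, *cols);
    // Read the shape back from the array itself rather than from the tuple
    println!("  {} -> {}x{} = {} elements", name, test_array.nrows(), test_array.ncols(),
             test_array.nrows() * test_array.ncols());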
}

Dimension Manipulation:
Original vector: [1.0, 2.0, 3.0, 4.0]

As row matrix (1x4):
 1.0  2.0  3.0  4.0 

As column matrix (4x1):
 1.0
 2.0
 3.0
 4.0

Outer product concept (4x4 result):
  1.0   2.0   3.0   4.0 
  2.0   4.0   6.0   8.0 
  3.0   6.0   9.0  12.0 
  4.0   8.0  12.0  16.0 

Matrix concatenation concepts:
Matrix A (2x2):
  1.0   2.0 
  3.0   4.0 
Matrix B (2x2):
  5.0   6.0 
  7.0   8.0 

Vertical stacking concept (A above B):
  1.0   2.0 
  3.0   4.0 
  5.0   6.0 
  7.0   8.0 

Horizontal stacking concept (A beside B):
  1.0   2.0      5.0   6.0 
  3.0   4.0      7.0   8.0 

Array shape flexibility:
  1x8 (row vector) -> 1x8 = 8 elements
  8x1 (column vector) -> 8x1 = 8 elements
  2x4 (wide matrix) -> 2x4 = 8 elements
  4x2 (tall matrix) -> 4x2 = 8 elements
  3x3 (square matrix) -> 3x3 = 9 elements


()
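
The cell above prints the outer product and the stacked layouts without building them. As a minimal sketch, assuming a plain `Vec<Vec<f64>>` representation instead of `ArrayF64`, the same results can be materialized with three small functions. The names `outer`, `vstack`, and `hstack` are hypothetical and are not claimed to exist in rustlab-math.

```rust
// Rows of equal length; an illustrative stand-in for a real matrix type.
type Matrix = Vec<Vec<f64>>;

/// Outer product of column vector `u` and row vector `v`: out[i][j] = u[i] * v[j].
fn outer(u: &[f64], v: &[f64]) -> Matrix {
    u.iter()
        .map(|&ui| v.iter().map(|&vj| ui * vj).collect())
        .collect()
}

/// Vertical stack: rows of `a` followed by rows of `b`.
/// Returns None if the column counts differ.
fn vstack(a: &Matrix, b: &Matrix) -> Option<Matrix> {
    let cols_a = a.first().map_or(0, |r| r.len());
    let cols_b = b.first().map_or(0, |r| r.len());
    if !a.is_empty() && !b.is_empty() && cols_a != cols_b {
        return None;
    }
    Some(a.iter().chain(b.iter()).cloned().collect())
}

/// Horizontal stack: each row of `a` extended by the matching row of `b`.
/// Returns None if the row counts differ.
fn hstack(a: &Matrix, b: &Matrix) -> Option<Matrix> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b.iter())
            .map(|(ra, rb)| ra.iter().chain(rb.iter()).copied().collect())
            .collect(),
    )
}

fn main() {
    let v = [1.0, 2.0, 3.0, 4.0];
    println!("outer:  {:?}", outer(&v, &v));

    let a = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let b = vec![vec![5.0, 6.0], vec![7.0, 8.0]];
    println!("vstack: {:?}", vstack(&a, &b));
    println!("hstack: {:?}", hstack(&a, &b));
}
```

Returning `Option` makes the shape requirement explicit: vertical stacking needs matching column counts, horizontal stacking needs matching row counts.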

## Performance Optimization Patterns

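Best practices for high-performance array operations. The timings below come from single runs without warm-up, measured with `Instant`, so treat them as rough indications rather than benchmarks.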

In [13]:
use std::time::Instant;

println!("Performance Optimization Patterns:");

let size = 100;
let A = ArrayF64::ones(size, size);
let B = ArrayF64::ones(size, size);

// Pattern 1: Multiple temporary allocations vs single operations
println!("\n1. Avoiding Multiple Temporary Allocations");

// Multiple temporaries (less efficient)
let start = Instant::now();
let temp1 = &A + &B;        // First temporary
let temp2 = &temp1 * 2.0;   // Second temporary  
let _result1 = &temp2 + 1.0; // Third temporary
let multiple_temps_time = start.elapsed();
println!("Multiple temporaries: {:?}", multiple_temps_time);

// Single complex operation (more efficient)
// Equivalent to (A + B) * 2 + 1 here only because B is all ones, so A + B == x + 1.0
let start = Instant::now();
let _result2 = A.map(|x| (x + 1.0) * 2.0 + 1.0); // Single pass
let single_pass_time = start.elapsed();
println!("Single pass with map: {:?}", single_pass_time);

// Pattern 2: Memory access patterns
println!("\n2. Memory Access Patterns");

let matrix = ArrayF64::ones(size, size);

// Row-major access (cache-friendly)
let start = Instant::now();
let mut row_sum = 0.0;
for i in 0..size {
    for j in 0..size {
        row_sum += matrix.get(i, j).unwrap();
    }
}
let row_time = start.elapsed();
println!("Row-major access: {:?} (sum = {})", row_time, row_sum);

// Column-major access (less cache-friendly)
let start = Instant::now();
let mut col_sum = 0.0;
for j in 0..size {
    for i in 0..size {
        col_sum += matrix.get(i, j).unwrap();
    }
}
let col_time = start.elapsed();
println!("Column-major access: {:?} (sum = {})", col_time, col_sum);

if col_time.as_nanos() > 0 && row_time.as_nanos() > 0 {
    let cache_benefit = col_time.as_nanos() as f64 / row_time.as_nanos() as f64;
    println!("Cache-friendly access is {:.1}x faster", cache_benefit);
}

// Pattern 3: Scalar vs array operations efficiency
println!("\n3. Operation Type Comparison");

// Scalar operations (very fast)
let start = Instant::now();
let _scalar_result = &A + 5.0;
let scalar_time = start.elapsed();
println!("Scalar addition (A + 5.0): {:?}", scalar_time);

// Array operations (moderate speed)
let start = Instant::now();
let _array_result = &A + &B;
let array_time = start.elapsed();
println!("Array addition (A + B): {:?}", array_time);

// Complex element-wise (slower)
let start = Instant::now();
let _complex_result = A.map(|x| x.sin().exp());
let complex_time = start.elapsed();
println!("Complex map (sin(x).exp()): {:?}", complex_time);

// Pattern 4: Matrix operation ordering
println!("\n4. Operation Order Optimization");

// Create matrices for chain multiplication example
let tall_matrix = ArrayF64::ones(size * 2, 10);  // (200, 10)
let wide_matrix = ArrayF64::ones(10, size);      // (10, 100)
let vector = VectorF64::ones(size);              // (100,) as vector

println!("Matrix chain: ({}x{}) × ({}x{}) × vector({})", 
         tall_matrix.nrows(), tall_matrix.ncols(),
         wide_matrix.nrows(), wide_matrix.ncols(),
         vector.len());

// Order 1: (A × B) × v - creates large intermediate matrix
let start = Instant::now();
let ab = &tall_matrix ^ &wide_matrix;  // (200, 100) - large!
let _result_order1 = &ab ^ &vector;     // (200,)
let order1_time = start.elapsed();
println!("Order (A×B)×v: {:?} (creates {}x{} intermediate)", 
         order1_time, ab.nrows(), ab.ncols());

// Order 2: A × (B × v) - smaller intermediate
let start = Instant::now();
let bv = &wide_matrix ^ &vector;        // (10,) - small!
let _result_order2 = &tall_matrix ^ &bv; // (200,)
let order2_time = start.elapsed();
println!("Order A×(B×v): {:?} (creates vector of length {})", 
         order2_time, bv.len());

if order1_time.as_nanos() > 0 && order2_time.as_nanos() > 0 {
    let ordering_benefit = order1_time.as_nanos() as f64 / order2_time.as_nanos() as f64;
    println!("Better ordering is {:.1}x faster", ordering_benefit);
}

println!("\nPerformance Optimization Summary:");
println!("✓ Use single-pass operations when possible");
println!("✓ Access memory in row-major order for cache efficiency");
println!("✓ Prefer scalar operations when applicable");
println!("✓ Order matrix multiplications to minimize intermediate sizes");
println!("✓ Use map() for complex element-wise transformations");

Performance Optimization Patterns:

1. Avoiding Multiple Temporary Allocations
Multiple temporaries: 204.844µs
Single pass with map: 57.562µs

2. Memory Access Patterns
Row-major access: 27.924µs (sum = 10000)
Column-major access: 27.824µs (sum = 10000)
Cache-friendly access is 1.0x faster

3. Operation Type Comparison
Scalar addition (A + 5.0): 102.423µs
Array addition (A + B): 111.797µs
Complex map (sin(x).exp()): 333.981µs

4. Operation Order Optimization
Matrix chain: (200x10) × (10x100) × vector(100)
Order (A×B)×v: 6.795254ms (creates 200x100 intermediate)
Order A×(B×v): 4.234µs (creates vector of length 10)
Better ordering is 1604.9x faster

Performance Optimization Summary:
✓ Use single-pass operations when possible
✓ Access memory in row-major order for cache efficiency
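
The effect of multiplication order can be estimated before running anything by counting scalar multiplications. The sketch below assumes the textbook triple-loop cost of `m * k * n` for an (m×k) by (k×n) product; `matmul_cost` is a hypothetical helper, and a real library may use a different algorithm.

```rust
/// Scalar multiplications for an (m x k) by (k x n) product,
/// assuming the textbook triple loop: m * k * n.
fn matmul_cost(m: usize, k: usize, n: usize) -> usize {
    m * k * n
}

fn main() {
    // Shapes from the cell above: A is 200x10, B is 10x100, v has length 100.
    let (m, k, n) = (200, 10, 100);

    // (A x B) x v: builds a 200x100 intermediate, then a matrix-vector product.
    let order1 = matmul_cost(m, k, n) + matmul_cost(m, n, 1);
    // A x (B x v): builds a length-10 intermediate, then a matrix-vector product.
    let order2 = matmul_cost(k, n, 1) + matmul_cost(m, k, 1);

    println!("(A×B)×v: {} multiplications", order1); // 220000
    println!("A×(B×v): {} multiplications", order2); // 3000
    println!("ratio: {:.1}x", order1 as f64 / order2 as f64); // 73.3x
}
```

The arithmetic alone predicts roughly a 73x gap. The measured ratio above is far larger, so part of it presumably comes from effects this count ignores, such as allocating the intermediate matrix or one-off overhead in the first timed call.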


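## Real-World Example: Image Processing

Demonstrate element-wise operations with a practical computer vision example: brightness, contrast, gamma, thresholding, and a simple edge detector on a small synthetic image.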

In [14]:
println!("Real-World Example: Image Processing");

// Create a synthetic "image" (grayscale)
let img_size = 8;  // Using smaller size for clear output
let test_image = array64![
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4],
    [0.3, 0.6, 0.9, 1.0, 1.0, 0.9, 0.6, 0.3],
    [0.4, 0.8, 1.0, 1.0, 1.0, 1.0, 0.8, 0.4],
    [0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5],
    [0.6, 0.8, 0.9, 1.0, 1.0, 0.9, 0.8, 0.6],
    [0.7, 0.6, 0.6, 0.8, 0.8, 0.6, 0.6, 0.7],
    [0.8, 0.4, 0.3, 0.4, 0.4, 0.3, 0.4, 0.8]
];

println!("Created {}x{} synthetic image", img_size, img_size);
println!("Original image:");
for i in 0..test_image.nrows() {
    for j in 0..test_image.ncols() {
        print!("{:5.2} ", test_image.get(i, j).unwrap());
    }
    println!();
}

// Calculate basic statistics manually
let mut min_val = f64::MAX;
let mut max_val = f64::MIN;
let mut sum_val = 0.0;
let mut count = 0;

for i in 0..test_image.nrows() {
    for j in 0..test_image.ncols() {
        let val = test_image.get(i, j).unwrap();
        min_val = min_val.min(val);
        max_val = max_val.max(val);
        sum_val += val;
        count += 1;
    }
}
let mean_val = sum_val / count as f64;

println!("\nImage statistics:");
println!("  Min: {:.4}", min_val);
println!("  Max: {:.4}", max_val);
println!("  Mean: {:.4}", mean_val);
println!("  Total pixels: {}", count);

// Image processing transformations using map()
println!("\nImage Processing Transformations:");

// 1. Brightness adjustment
let brightened = test_image.map(|x| (x + 0.2).clamp(0.0, 1.0));
println!("\n1. Brightened image (x + 0.2, clamped to [0,1]):");
for i in 0..brightened.nrows().min(4) {
    for j in 0..brightened.ncols().min(4) {
        print!("{:5.2} ", brightened.get(i, j).unwrap());
    }
    println!("...");
}
println!("...");

// 2. Contrast enhancement
let contrasted = test_image.map(|x| ((x - 0.5) * 1.5 + 0.5).clamp(0.0, 1.0));
println!("\n2. Contrast enhanced ((x-0.5)*1.5+0.5):");
for i in 0..contrasted.nrows().min(4) {
    for j in 0..contrasted.ncols().min(4) {
        print!("{:5.2} ", contrasted.get(i, j).unwrap());
    }
    println!("...");
}
println!("...");

// 3. Gamma correction
let gamma_corrected = test_image.map(|x| x.powf(0.7));
println!("\n3. Gamma corrected (x^0.7):");
for i in 0..gamma_corrected.nrows().min(4) {
    for j in 0..gamma_corrected.ncols().min(4) {
        print!("{:5.2} ", gamma_corrected.get(i, j).unwrap());
    }
    println!("...");
}
println!("...");

// 4. Thresholding (binary conversion)
let threshold = 0.5;
let thresholded = test_image.map(|x| if x > threshold { 1.0 } else { 0.0 });
println!("\n4. Thresholded at {} (binary image):", threshold);
for i in 0..thresholded.nrows() {
    for j in 0..thresholded.ncols() {
        print!("{:3.0} ", thresholded.get(i, j).unwrap());
    }
    println!();
}

// 5. Edge detection simulation (simple difference)
println!("\n5. Simple edge detection (horizontal differences):");
for i in 0..test_image.nrows() {
    for j in 0..(test_image.ncols()-1) {
        let curr = test_image.get(i, j).unwrap();
        let next = test_image.get(i, j+1).unwrap();
        let diff = (next - curr).abs();
        print!("{:5.2} ", diff);
    }
    println!();
}

// Image processing pipeline
let pipeline_result = test_image
    .map(|x| (x + 0.1).clamp(0.0, 1.0))  // Brightness
    .map(|x| x.powf(0.8))                 // Gamma
    .map(|x| ((x - 0.5) * 1.2 + 0.5).clamp(0.0, 1.0)); // Contrast

println!("\n6. Processing pipeline (brightness → gamma → contrast):");
for i in 0..pipeline_result.nrows().min(4) {
    for j in 0..pipeline_result.ncols().min(4) {
        print!("{:5.2} ", pipeline_result.get(i, j).unwrap());
    }
    println!("...");
}
println!("...");

println!("\nImage Processing Summary:");
println!("✓ Brightness adjustment with clamping");
println!("✓ Contrast enhancement with linear scaling");
println!("✓ Gamma correction for tone mapping");
println!("✓ Binary thresholding for segmentation");
println!("✓ Simple edge detection via differences");
println!("✓ Multi-step processing pipelines");

✓ Prefer scalar operations when applicable
✓ Order matrix multiplications to minimize intermediate sizes
✓ Use map() for complex element-wise transformations
Real-World Example: Image Processing
Created 8x8 synthetic image
Original image:
 0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80 
 0.20  0.40  0.60  0.80  1.00  0.80  0.60  0.40 
 0.30  0.60  0.90  1.00  1.00  0.90  0.60  0.30 
 0.40  0.80  1.00  1.00  1.00  1.00  0.80  0.40 
 0.50  1.00  1.00  1.00  1.00  1.00  1.00  0.50 
 0.60  0.80  0.90  1.00  1.00  0.90  0.80  0.60 
 0.70  0.60  0.60  0.80  0.80  0.60  0.60  0.70 
 0.80  0.40  0.30  0.40  0.40  0.30  0.40  0.80 

Image statistics:
  Min: 0.1000
  Max: 1.0000
  Mean: 0.6750
  Total pixels: 64

Image Processing Transformations:

1. Brightened image (x + 0.2, clamped to [0,1]):
 0.30  0.40  0.50  0.60 ...
 0.40  0.60  0.80  1.00 ...
 0.50  0.80  1.00  1.00 ...
 0.60  1.00  1.00  1.00 ...
...

2. Contrast enhanced ((x-0.5)*1.5+0.5):
 0.00  0.05  0.20  0.35 ...
 0.05  0.35  0.65 
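
The pipeline in the cell above chains three `map()` calls. If each call returns a new array, that means three passes and two intermediate buffers. A minimal sketch of the alternative, using plain slices from the standard library instead of `ArrayF64`, combines the three steps into one per-pixel function. The name `pipeline` is hypothetical.

```rust
/// Brightness, gamma, and contrast applied to one pixel, in that order.
/// The constants mirror the pipeline cell above.
fn pipeline(x: f64) -> f64 {
    let bright = (x + 0.1).clamp(0.0, 1.0);
    let gamma = bright.powf(0.8);
    ((gamma - 0.5) * 1.2 + 0.5).clamp(0.0, 1.0)
}

fn main() {
    // First image row from the example, stored as a plain array.
    let row: [f64; 8] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8];

    // Three passes, each producing an intermediate buffer.
    let staged: Vec<f64> = row.iter().map(|&x| (x + 0.1).clamp(0.0, 1.0)).collect();
    let staged: Vec<f64> = staged.iter().map(|&x| x.powf(0.8)).collect();
    let staged: Vec<f64> = staged
        .iter()
        .map(|&x| ((x - 0.5) * 1.2 + 0.5).clamp(0.0, 1.0))
        .collect();

    // One pass, no intermediate buffers.
    let fused: Vec<f64> = row.iter().map(|&x| pipeline(x)).collect();

    // Both routes apply the same operations in the same order.
    assert_eq!(staged, fused);
    println!("{:?}", fused);
}
```

With an array type that offers `map()`, the same idea would be a single `map(pipeline)` call. The staged form remains easier to debug, since each intermediate result can be inspected.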

## Summary

This notebook covered broadcasting and advanced array operations in rustlab-math:

### Key Broadcasting Concepts:
- **Scalar Broadcasting**: Arrays can be combined with scalars (e.g., `array + 10.0`)
- **Same-Shape Operations**: Arrays of identical dimensions work seamlessly
- **Element-wise Transformations**: Use `map()` for complex per-element operations
- **Memory Efficiency**: A single `map()` pass avoids the intermediate arrays that a sequence of separate operations allocates

### Advanced Techniques Learned:
- **Element Access**: Bounds-checked indexing with `get(i, j)`; the `.unwrap()` used throughout panics on an out-of-range index
- **Shape Manipulation**: Transposition and different array layouts
- **Performance Patterns**: Cache-friendly access, operation ordering
- **Vectorized Operations**: Using `map()` for mathematical functions
- **Real-world Applications**: Image processing, feature standardization

### Current Broadcasting Support:
✅ **Scalar operations**: `array + scalar`, `array * scalar`
✅ **Same-shape arrays**: `array1 + array2` (when dimensions match)
✅ **Element-wise functions**: `map()` with mathematical operations
✅ **Matrix operations**: Clear distinction between `*` (element-wise) and `^` (matrix multiplication)

⚠️  **Limited**: True NumPy-style broadcasting between arrays of different shapes is not yet implemented
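
Until shape broadcasting is available, a row vector can be combined with every row of a matrix by hand. The sketch below shows the idea on a flat row-major buffer using only the standard library. `add_row_broadcast` is a hypothetical helper, and the flat layout is an assumption made for illustration.

```rust
/// Adds `row` to every row of a row-major buffer with `cols` columns.
/// Hypothetical helper; returns None on a shape mismatch.
fn add_row_broadcast(data: &[f64], cols: usize, row: &[f64]) -> Option<Vec<f64>> {
    if cols == 0 || row.len() != cols || data.len() % cols != 0 {
        return None;
    }
    Some(
        data.chunks(cols)
            // Pair each matrix row with the broadcast row, element by element.
            .flat_map(|r| r.iter().zip(row).map(|(a, b)| a + b))
            .collect(),
    )
}

fn main() {
    // A 2x3 matrix stored row by row, plus a length-3 row vector.
    let matrix = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let bias = [10.0, 20.0, 30.0];

    let result = add_row_broadcast(&matrix, 3, &bias);
    println!("{:?}", result); // Some([11.0, 22.0, 33.0, 14.0, 25.0, 36.0])
}
```

This is what NumPy does implicitly for shapes `(2, 3)` and `(3,)`. Writing it out makes the shape check explicit.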

### Performance Guidelines:
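1. **Use scalar operations** when possible (slightly faster than array-array addition in the run above)
2. **Access arrays row-major** for cache efficiency (no measurable difference at 100×100 in the run above)
3. **Combine element-wise steps into a single `map()` closure** to avoid temporaries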
4. **Order matrix multiplications** to minimize intermediate sizes
5. **Prefer same-shape operations** over mixed dimensions
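
The row-major guideline follows from how a 2-D index maps to a flat buffer. The sketch below assumes a row-major layout purely for illustration. This notebook does not state how rustlab-math stores its data, and for a column-major library the conclusion would be reversed. `offset` is a hypothetical helper.

```rust
/// Offset of element (i, j) in a flat row-major buffer with `cols` columns.
fn offset(i: usize, j: usize, cols: usize) -> usize {
    i * cols + j
}

fn main() {
    let (rows, cols) = (3, 4);
    let data: Vec<f64> = (0..rows * cols).map(|k| k as f64).collect();

    // Row-by-row traversal visits consecutive offsets.
    let row_order: Vec<usize> = (0..rows)
        .flat_map(|i| (0..cols).map(move |j| offset(i, j, cols)))
        .collect();

    // Column-by-column traversal jumps by `cols` elements at each step.
    let col_order: Vec<usize> = (0..cols)
        .flat_map(|j| (0..rows).map(move |i| offset(i, j, cols)))
        .collect();

    println!("row order: {:?}", row_order);
    println!("col order: {:?}", col_order);
    println!("element (1, 2) = {}", data[offset(1, 2, cols)]);
}
```

For a small matrix the whole buffer fits in cache, which is one plausible reason the two traversal orders timed the same at 100×100 above.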

### Key Methods Available:
- **Creation**: `zeros()`, `ones()`, `array64![]`
- **Element-wise**: `map()`, `map_with_index()`
- **Arithmetic**: `+`, `-`, `*` (scalar and element-wise), `^` (matrix multiplication)
- **Transformations**: `transpose()`, `clamp()`
- **Access**: `get()`, `nrows()`, `ncols()`

### Practical Applications Demonstrated:
- Image processing pipelines (brightness, contrast, gamma correction)
- Feature standardization for machine learning
- Softmax function with numerical stability
- Performance optimization techniques
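
As a compact recap of the standardization example, the sketch below computes column-wise z-scores on a `Vec<Vec<f64>>` using only the standard library. It divides by `n - 1` (the sample standard deviation), which is consistent with the statistics printed earlier (std = 1.58 for the values 1 to 5). `standardize_columns` is a hypothetical name, not a rustlab-math function.

```rust
/// Column-wise z-score standardization using the sample standard deviation.
/// Assumes at least two rows and no constant column (std must be non-zero).
fn standardize_columns(data: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = data.len();
    let cols = data.first().map_or(0, |r| r.len());
    let mut out = data.to_vec();
    for j in 0..cols {
        let mean = data.iter().map(|r| r[j]).sum::<f64>() / n as f64;
        let var = data.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / (n as f64 - 1.0);
        let std = var.sqrt();
        for row in out.iter_mut() {
            row[j] = (row[j] - mean) / std;
        }
    }
    out
}

fn main() {
    // Same 5x3 feature matrix as in the standardization example.
    let features: Vec<Vec<f64>> = (1..=5)
        .map(|k| {
            let x = k as f64;
            vec![x, 10.0 * x, 100.0 * x]
        })
        .collect();

    for row in standardize_columns(&features) {
        println!("{:8.3} {:8.3} {:8.3}", row[0], row[1], row[2]);
    }
}
```

Because the three columns differ only by scale, they standardize to identical values, matching the table printed by the notebook.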

**Next**: Mathematical functions and constants →