---
**These materials are created by Prof. Ramesh Babu exclusively for M.Tech Students of SRM University**

© 2025 Prof. Ramesh Babu. All rights reserved. This material is protected by copyright and may not be reproduced, distributed, or transmitted in any form or by any means without prior written permission.

---

# 📊 T3-Exercise-4: Reduction Operations - Aggregating Intelligence
**Deep Neural Network Architectures (21CSE558T) - Week 2, Day 4**  
**M.Tech Lab Session - Duration: 30-45 minutes**

---

## 🎯 LEARNING OBJECTIVES
By the end of this exercise, you will:
- 🔢 Master the **fundamental aggregators**: Sum, Mean, Max, Min
- 📐 Understand **axis operations** and dimensional thinking
- 📊 Apply **statistical operations**: Variance, Standard Deviation, Moments
- 🧠 Build **real neural network components**: Loss functions, Attention, Normalization
- 🎯 Create **practical applications**: Metrics, Batch Statistics, Feature Selection
- 🔍 Debug **shape-related** reduction problems like a pro

## 🔗 CONNECTION TO NEURAL NETWORKS
Reduction operations are the **aggregation engines** of AI:
- 📉 **Loss Functions** → Reduce prediction errors to single numbers
- 📊 **Batch Normalization** → Aggregate statistics across batches
- 🎯 **Attention Mechanisms** → Weighted aggregation of information
- 📈 **Metrics & Evaluation** → Summarize model performance
- 🔍 **Feature Selection** → Find most important activations

**Mind-blowing insight:** Every prediction a network makes passes through reductions: even a single neuron's weighted sum is a reduction over its inputs! 🤯

## 📚 PREREQUISITES
- ✅ T3-Exercise-1 (Tensor Fundamentals)
- ✅ T3-Exercise-2 (Mathematical Operations) 
- ✅ T3-Exercise-3 (Activation Functions)
- 📐 Understanding of array dimensions and shapes

## ⚙️ SETUP & AGGREGATION TOOLKIT
🧮 Preparing our intelligence aggregation laboratory!

In [None]:
# 🧮 Complete toolkit for reduction operations
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import sys

# Set up beautiful visualizations
plt.style.use('default')
sns.set_palette("viridis")
np.random.seed(42)  # For reproducible examples
tf.random.set_seed(42)

# 🔧 Environment check
print("📊 REDUCTION OPERATIONS LABORATORY")
print("=" * 39)
print(f"🐍 Python: {sys.version.split()[0]}")
print(f"🔥 TensorFlow: {tf.__version__}")
print(f"🔢 NumPy: {np.__version__}")
print(f"📊 Visualization: Ready for data insights!")

# 🎮 Computational readiness
if tf.config.list_physical_devices('GPU'):
    print("🚀 GPU: Ready for massive parallel reductions!")
else:
    print("💻 CPU: Perfect for learning aggregation patterns!")

print("\n🧮 Ready to aggregate intelligence!\n")

# Helper function for beautiful tensor visualization
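def visualize_tensor_reduction(tensor, title, reduction_type="sum", axis=None):
    """Plot a 2D tensor next to its axis=0 and axis=1 reductions (sum, mean, or max)."""
    if reduction_type not in ("sum", "mean", "max"):
        raise ValueError(f"Unsupported reduction_type: {reduction_type!r}")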
    if len(tensor.shape) == 2:
        fig, axes = plt.subplots(1, 3, figsize=(15, 4))
        
        # Original tensor
        im1 = axes[0].imshow(tensor, cmap='viridis', aspect='auto')
        axes[0].set_title(f'Original Tensor\n{tensor.shape}', fontweight='bold')
        axes[0].set_xlabel('Columns')
        axes[0].set_ylabel('Rows')
        plt.colorbar(im1, ax=axes[0])
        
        # Add values to cells
        for i in range(tensor.shape[0]):
            for j in range(tensor.shape[1]):
                axes[0].text(j, i, f'{tensor[i,j]:.1f}', ha='center', va='center', 
                           color='white' if tensor[i,j] < tf.reduce_mean(tensor) else 'black')
        
        if axis != 1:  # Reduce over rows (axis=0): one value per column
            if reduction_type == "sum":
                result = tf.reduce_sum(tensor, axis=0)
            elif reduction_type == "mean":
                result = tf.reduce_mean(tensor, axis=0)
            elif reduction_type == "max":
                result = tf.reduce_max(tensor, axis=0)
            
            axes[1].bar(range(len(result)), result, color='coral')
            axes[1].set_title(f'{reduction_type.capitalize()} along Rows (axis=0)\nResult shape: {result.shape}', fontweight='bold')
            axes[1].set_xlabel('Column Index')
            axes[1].set_ylabel(f'{reduction_type.capitalize()} Value')
            
            for i, v in enumerate(result):
                axes[1].text(i, v + 0.1, f'{v:.1f}', ha='center', fontweight='bold')
        
        if axis != 0:  # Reduce over columns (axis=1): one value per row
            if reduction_type == "sum":
                result = tf.reduce_sum(tensor, axis=1)
            elif reduction_type == "mean":
                result = tf.reduce_mean(tensor, axis=1)
            elif reduction_type == "max":
                result = tf.reduce_max(tensor, axis=1)
            
            axes[2].barh(range(len(result)), result, color='lightgreen')
            axes[2].set_title(f'{reduction_type.capitalize()} along Columns (axis=1)\nResult shape: {result.shape}', fontweight='bold')
            axes[2].set_ylabel('Row Index')
            axes[2].set_xlabel(f'{reduction_type.capitalize()} Value')
            axes[2].invert_yaxis()
            
            for i, v in enumerate(result):
                axes[2].text(v + 0.1, i, f'{v:.1f}', va='center', fontweight='bold')
        
        plt.suptitle(f'📊 {title}', fontsize=16, fontweight='bold')
        plt.tight_layout()
        plt.show()

print("🎨 Visualization toolkit ready!")

## 🧠 CORE CONCEPTS: The Art of Aggregation

### 🎭 **What Are Reduction Operations?**

**🔄 Simple Definition:**
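Reduction operations take a tensor and **collapse** it along one or more dimensions, aggregating the values along those dimensions into fewer numbers. (Unlike `tf.squeeze`, which only drops size-1 dimensions, a reduction actually combines values.)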

**🎯 The Magic Formula:**
```
Many Numbers → Intelligent Aggregation → Fewer, More Meaningful Numbers
```

### 📐 **Understanding Axes (Dimensions):**

**🎪 Think of a tensor as a theater:**
- **Axis 0** (rows): Different audience members
- **Axis 1** (columns): Different time moments
- **Reducing axis 0**: Collapses the audience, leaving one value per time moment ("What was the average reaction at each moment?")
- **Reducing axis 1**: Collapses time, leaving one value per person ("How did each person react overall?")
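
The theater picture can be checked on a tiny array. This is a minimal sketch in NumPy (so it runs without TensorFlow); `reactions` is a made-up example, and `tf.reduce_mean(x, axis=...)` follows the same axis convention.

```python
import numpy as np

# Rows = audience members (axis 0), columns = time moments (axis 1).
reactions = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])   # shape (2, 3): 2 people, 3 moments

per_moment = reactions.mean(axis=0)  # collapses the 2 people  -> shape (3,)
per_person = reactions.mean(axis=1)  # collapses the 3 moments -> shape (2,)

print(per_moment)  # [2.5 3.5 4.5]
print(per_person)  # [2. 5.]
```

The axis you name is the one that disappears from the result shape.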

### 🧮 **The Aggregation Family:**

1. **➕ Sum Family** (tf.reduce_sum)
   - Adds everything up
   - Use case: Total loss, feature importance

2. **📊 Average Family** (tf.reduce_mean)
   - Finds the typical value
   - Use case: Batch statistics, performance metrics

3. **🏆 Extremes Family** (tf.reduce_max, tf.reduce_min)
   - Finds champions and laggards
   - Use case: Max pooling, feature selection

4. **📈 Statistical Family** (tf.math.reduce_std, tf.math.reduce_variance)
   - Measures spread and variability
   - Use case: Normalization, uncertainty quantification
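
To see all four families side by side, here is a small sketch on a 2×2 array. It uses NumPy stand-ins for the TensorFlow calls listed above; the array `x` is an illustrative assumption. Note that `np.var` and `tf.math.reduce_variance` both compute the population variance (dividing by N).

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

summary = {
    "sum": x.sum(),        # 1 + 2 + 3 + 4
    "mean": x.mean(),      # sum divided by the 4 elements
    "max": x.max(),
    "min": x.min(),
    "variance": x.var(),   # population variance (divides by N)
    "std": x.std(),        # square root of the variance
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.4f}")
```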

### 🎯 **Why Neural Networks LOVE Reductions:**
- **Decision Making**: Aggregate evidence to make predictions
- **Efficiency**: Compress many activations into a few summary values
- **Stability**: Average out noise and focus on signal
- **Scalability**: Map variable-sized inputs to fixed-size outputs
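
The scalability point is easiest to see with sequences of different lengths. The sketch below assumes two hypothetical "sentences" of random token vectors; averaging over the token axis (the NumPy counterpart of `tf.reduce_mean(x, axis=0)`) gives the same output size for both.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim = 4

# Two hypothetical "sentences" with different numbers of tokens.
short_seq = rng.normal(size=(3, feature_dim))   # 3 tokens
long_seq = rng.normal(size=(7, feature_dim))    # 7 tokens

# Averaging over the token axis (axis 0) removes the length dimension.
short_vec = short_seq.mean(axis=0)
long_vec = long_seq.mean(axis=0)

print(short_vec.shape, long_vec.shape)  # (4,) (4,)
```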

## ➕ STEP 1: The Sum Family - Adding It All Up
### 🏗️ Building the foundation of aggregation!

In [None]:
# ➕ Basic Sum Operations
print("➕ THE SUM FAMILY: Adding Intelligence Together")
print("=" * 46)

# Create a sample tensor (like a batch of feature activations)
sample_data = tf.constant([[1.0, 2.0, 3.0, 4.0],
                          [5.0, 6.0, 7.0, 8.0],
                          [9.0, 10.0, 11.0, 12.0]], dtype=tf.float32)

print("🎲 Sample Data (imagine: 3 samples, 4 features each):")
print(sample_data)
print(f"📏 Shape: {sample_data.shape}")
print()

# Different sum operations
total_sum = tf.reduce_sum(sample_data)
sum_axis0 = tf.reduce_sum(sample_data, axis=0)  # Sum across samples
sum_axis1 = tf.reduce_sum(sample_data, axis=1)  # Sum across features

print("🔢 Sum Operations Results:")
print(f"   🌍 Total sum (all elements): {total_sum.numpy()}")
print(f"   ⬇️ Sum axis=0 (across samples): {sum_axis0.numpy()}")
print(f"      📊 Shape: {sum_axis0.shape} - One sum per feature")
print(f"   ➡️ Sum axis=1 (across features): {sum_axis1.numpy()}")
print(f"      📊 Shape: {sum_axis1.shape} - One sum per sample")
print()

print("🧠 Neural Network Applications:")
print("   • Total sum: Overall activation magnitude")
print("   • Sum axis=0: Feature importance across batch")
print("   • Sum axis=1: Sample activation strength")
print()

# Visualize the sum operation
visualize_tensor_reduction(sample_data, "Sum Operations Visualization", "sum")

In [None]:
# 🎯 Practical Application: Building a Loss Function
print("🎯 BUILDING A LOSS FUNCTION WITH SUMS")
print("=" * 37)

# Simulate predictions and true values
predictions = tf.constant([[0.8, 0.1, 0.1],   # Sample 1: Confident in class 0
                          [0.3, 0.6, 0.1],   # Sample 2: Confident in class 1  
                          [0.2, 0.2, 0.6]])  # Sample 3: Confident in class 2

true_labels = tf.constant([[1.0, 0.0, 0.0],   # Sample 1: Actually class 0 ✅
                          [0.0, 1.0, 0.0],   # Sample 2: Actually class 1 ✅
                          [0.0, 0.0, 1.0]])  # Sample 3: Actually class 2 ✅

print("🎲 Classification Scenario:")
print(f"   Predictions shape: {predictions.shape} (3 samples, 3 classes)")
print(f"   True labels shape: {true_labels.shape}")
print()

print("📊 Sample-by-sample breakdown:")
class_names = ['🐱 Cat', '🐶 Dog', '🐦 Bird']
for i in range(3):
    pred_class = tf.argmax(predictions[i])
    true_class = tf.argmax(true_labels[i])
    print(f"   Sample {i+1}: Predicted {class_names[pred_class]} | True {class_names[true_class]}")
print()

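# Calculate a squared-error loss using reductions (sum over classes, then mean over samples;
# this equals the element-wise MSE multiplied by the number of classes)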
squared_errors = tf.square(predictions - true_labels)
sample_losses = tf.reduce_sum(squared_errors, axis=1)  # Sum across classes for each sample
total_loss = tf.reduce_mean(sample_losses)  # Average across samples

print("📉 Loss Calculation Step-by-Step:")
print(f"   1️⃣ Squared errors shape: {squared_errors.shape}")
print(f"   2️⃣ Sample losses (sum per sample): {sample_losses.numpy()}")
print(f"   3️⃣ Final loss (mean across samples): {total_loss.numpy():.4f}")
print()

print("✨ What we learned:")
print("   • Sum collapses the class axis: shape (3, 3) → (3,), one loss per sample")
print("   • Mean collapses the sample axis: shape (3,) → (), a single scalar loss")
print("   • This is how neural networks measure their mistakes!")
print()
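
The order and choice of reductions changes the number you get. The sketch below reuses the predictions and one-hot labels from the cell above (rebuilt in NumPy so it runs on its own) and compares the sum-then-mean loss with an element-wise MSE. The cross-entropy line is an added illustration of the same two reductions, not part of the original exercise.

```python
import numpy as np

predictions = np.array([[0.8, 0.1, 0.1],
                        [0.3, 0.6, 0.1],
                        [0.2, 0.2, 0.6]])
true_labels = np.eye(3)  # one-hot labels for classes 0, 1, 2

squared_errors = (predictions - true_labels) ** 2

# Loss from the cell above: sum over classes, then mean over samples.
sum_then_mean = squared_errors.sum(axis=1).mean()

# Element-wise MSE: mean over every entry of the (3, 3) error matrix.
elementwise_mse = squared_errors.mean()

# Categorical cross-entropy built from the same two reductions.
cross_entropy = -(true_labels * np.log(predictions)).sum(axis=1).mean()

print(f"sum-then-mean loss: {sum_then_mean:.4f}")
print(f"element-wise MSE:   {elementwise_mse:.4f}")
print(f"cross-entropy:      {cross_entropy:.4f}")
```

With 3 classes, the sum-then-mean loss is exactly 3 times the element-wise MSE, so the two differ only by a constant factor.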

## 📊 STEP 2: The Average Family - Finding the Typical
### 🎯 The most important aggregation in machine learning!

In [None]:
# 📊 Mean Operations
print("📊 THE MEAN FAMILY: Finding the Typical Value")
print("=" * 44)

# Create batch data (simulate a mini-batch from training)
batch_size = 4
feature_size = 5
batch_data = tf.random.normal([batch_size, feature_size], mean=10, stddev=3)

print(f"🎲 Mini-batch Data ({batch_size} samples, {feature_size} features):")
print(batch_data)
print()

# Different mean operations
global_mean = tf.reduce_mean(batch_data)
feature_means = tf.reduce_mean(batch_data, axis=0)  # Mean per feature
sample_means = tf.reduce_mean(batch_data, axis=1)   # Mean per sample

print("📈 Mean Analysis:")
print(f"   🌍 Global mean: {global_mean.numpy():.3f}")
print(f"   📊 Feature means: {feature_means.numpy()}")
print(f"      📏 Shape: {feature_means.shape} (one mean per feature)")
print(f"   👤 Sample means: {sample_means.numpy()}")
print(f"      📏 Shape: {sample_means.shape} (one mean per sample)")
print()

print("🧠 Neural Network Insights:")
print("   • Feature means: Typical activation level per feature")
print("   • Sample means: Overall activation level per sample")
print("   • Critical for batch normalization!")
print()

# Visualize mean operations
visualize_tensor_reduction(batch_data, "Mean Operations Visualization", "mean")

In [None]:
# 🔧 Practical Application: Batch Normalization
print("🔧 BATCH NORMALIZATION: Using Means for Stability")
print("=" * 50)

# Simulate activations from a layer (before normalization)
raw_activations = tf.random.normal([8, 6], mean=15, stddev=5)  # 8 samples, 6 neurons

print("⚡ Raw Layer Activations (before normalization):")
print(f"   Shape: {raw_activations.shape}")
print(f"   Global mean: {tf.reduce_mean(raw_activations).numpy():.3f}")
print(f"   Global std: {tf.math.reduce_std(raw_activations).numpy():.3f}")
print()

# Batch normalization, core step only (no learnable scale/shift, no running averages)
batch_mean = tf.reduce_mean(raw_activations, axis=0)  # Mean per feature
batch_var = tf.math.reduce_variance(raw_activations, axis=0)  # Variance per feature
batch_std = tf.sqrt(batch_var + 1e-8)  # Add epsilon for numerical stability

# Normalize: (x - mean) / std
normalized_activations = (raw_activations - batch_mean) / batch_std

print("🔧 Batch Normalization Process:")
print(f"   1️⃣ Batch means per feature: {batch_mean.numpy()}")
print(f"   2️⃣ Batch stds per feature: {batch_std.numpy()}")
print()

print("✨ After Normalization:")
print(f"   📊 New global mean: {tf.reduce_mean(normalized_activations).numpy():.6f}")
print(f"   📏 New global std: {tf.math.reduce_std(normalized_activations).numpy():.6f}")
print(f"   📈 Feature means: {tf.reduce_mean(normalized_activations, axis=0).numpy()}")
print(f"   📐 Feature stds: {tf.math.reduce_std(normalized_activations, axis=0).numpy()}")
print()

print("🎯 Batch Normalization Benefits:")
print("   • Each feature now has mean ≈ 0, std ≈ 1")
print("   • Originally motivated as reducing internal covariate shift")
print("   • Often allows higher learning rates")
print("   • Can have a mild regularizing effect")
print()

# Create comparison visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Before normalization
ax1.hist(raw_activations.numpy().flatten(), bins=20, alpha=0.7, color='red', edgecolor='black')
ax1.set_title('🔴 Before Batch Normalization', fontweight='bold')
ax1.set_xlabel('Activation Value')
ax1.set_ylabel('Frequency')
ax1.axvline(tf.reduce_mean(raw_activations), color='darkred', linestyle='--', linewidth=2, label=f'Mean: {tf.reduce_mean(raw_activations).numpy():.2f}')
ax1.legend()

# After normalization
ax2.hist(normalized_activations.numpy().flatten(), bins=20, alpha=0.7, color='green', edgecolor='black')
ax2.set_title('🟢 After Batch Normalization', fontweight='bold')
ax2.set_xlabel('Activation Value')
ax2.set_ylabel('Frequency')
ax2.axvline(tf.reduce_mean(normalized_activations), color='darkgreen', linestyle='--', linewidth=2, label=f'Mean: {tf.reduce_mean(normalized_activations).numpy():.3f}')
ax2.legend()

plt.suptitle('📊 Batch Normalization: Before vs After', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
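
Changing only the reduction axis turns per-feature normalization into per-sample normalization (the idea behind layer normalization). The following is a sketch under stated assumptions: NumPy instead of TensorFlow, random activations, and no learnable parameters. It also shows why `keepdims=True` matters when the reduced axis is the last one.

```python
import numpy as np

rng = np.random.default_rng(42)
activations = rng.normal(loc=15.0, scale=5.0, size=(8, 6))  # 8 samples, 6 neurons
eps = 1e-8

# Batch-norm style: statistics per feature (reduce over the batch axis).
bn_mean = activations.mean(axis=0)                 # shape (6,)
bn_std = np.sqrt(activations.var(axis=0) + eps)    # shape (6,)
batch_normed = (activations - bn_mean) / bn_std    # (8, 6) with (6,) broadcasts

# Layer-norm style: statistics per sample (reduce over the feature axis).
# keepdims=True keeps shape (8, 1) so it broadcasts against (8, 6).
ln_mean = activations.mean(axis=1, keepdims=True)
ln_std = np.sqrt(activations.var(axis=1, keepdims=True) + eps)
layer_normed = (activations - ln_mean) / ln_std

print(batch_normed.mean(axis=0).round(6))  # each feature is centred
print(layer_normed.mean(axis=1).round(6))  # each sample is centred
```

Without `keepdims=True`, the per-sample mean would have shape `(8,)`, which cannot broadcast against `(8, 6)`.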

## 🏆 STEP 3: The Extremes Family - Finding Champions
### 🔍 Max and Min operations for feature selection and pooling!

In [None]:
# 🏆 Max and Min Operations
print("🏆 THE EXTREMES FAMILY: Finding Champions and Laggards")
print("=" * 56)

# Create feature maps (like from a convolutional layer)
feature_maps = tf.random.uniform([3, 4], 0, 10, dtype=tf.float32)
feature_maps = tf.round(feature_maps)  # Round for cleaner display

print("🗺️ Feature Maps (3×4 - like small CNN feature maps):")
print(feature_maps)
print()

# Different max/min operations
global_max = tf.reduce_max(feature_maps)
global_min = tf.reduce_min(feature_maps)
max_per_row = tf.reduce_max(feature_maps, axis=1)  # Max in each row
max_per_col = tf.reduce_max(feature_maps, axis=0)  # Max in each column
min_per_row = tf.reduce_min(feature_maps, axis=1)  # Min in each row
min_per_col = tf.reduce_min(feature_maps, axis=0)  # Min in each column

print("🏅 Extreme Value Analysis:")
print(f"   🥇 Global maximum: {global_max.numpy()}")
print(f"   🥉 Global minimum: {global_min.numpy()}")
print(f"   ➡️ Max per row: {max_per_row.numpy()}")
print(f"   ⬇️ Max per column: {max_per_col.numpy()}")
print(f"   ➡️ Min per row: {min_per_row.numpy()}")
print(f"   ⬇️ Min per column: {min_per_col.numpy()}")
print()

print("🧠 Neural Network Applications:")
print("   • Global max/min: Overall feature activation range")
print("   • Max pooling: Dimensionality reduction in CNNs")
print("   • Feature selection: Identify most/least active features")
print()

# Visualize max operations
visualize_tensor_reduction(feature_maps, "Max Operations Visualization", "max")
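
A max tells you *how strong* the winner is; an argmax tells you *where* it sits. The sketch below uses a fixed, made-up 3×4 array in NumPy; `tf.reduce_max` and `tf.argmax` play the same roles in TensorFlow.

```python
import numpy as np

feature_maps = np.array([[3.0, 9.0, 1.0, 4.0],
                         [7.0, 2.0, 8.0, 5.0],
                         [6.0, 0.0, 2.0, 9.0]])

row_max = feature_maps.max(axis=1)        # the winning value in each row
row_argmax = feature_maps.argmax(axis=1)  # the column index of each winner

print(row_max)     # [9. 8. 9.]
print(row_argmax)  # [1 2 3]

# Indexing with the argmax positions recovers the same maxima.
gathered = feature_maps[np.arange(feature_maps.shape[0]), row_argmax]
print(np.array_equal(gathered, row_max))  # True
```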

In [None]:
# 🏊‍♂️ Practical Application: Max Pooling Implementation
print("🏊‍♂️ MAX POOLING: CNN's Dimension Reduction Hero")
print("=" * 46)

# Simulate a larger feature map
feature_map = tf.random.uniform([6, 6], 0, 10, dtype=tf.float32)
feature_map = tf.round(feature_map)  # Round for clarity

print("🗺️ Original Feature Map (6×6):")
print(feature_map)
print()

# Implement 2×2 max pooling manually for educational purposes
def manual_max_pool_2x2(tensor):
    """Manual 2x2 max pooling for educational visualization"""
    h, w = tensor.shape
    pooled = []
    
    print("🔍 2×2 Max Pooling Process:")
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Extract 2x2 patch
            patch = tensor[i:i+2, j:j+2]
            max_val = tf.reduce_max(patch)
            row.append(max_val)
            
            print(f"   Patch [{i}:{i+2}, {j}:{j+2}]: {patch.numpy().flatten()} → Max: {max_val.numpy()}")
        pooled.append(row)
    
    return tf.stack([tf.stack(row) for row in pooled])

# Apply max pooling
pooled_result = manual_max_pool_2x2(feature_map)

print("\n🏊‍♂️ After 2×2 Max Pooling:")
print(pooled_result)
print(f"📏 Shape reduction: {feature_map.shape} → {pooled_result.shape}")
print()

print("✨ Max Pooling Benefits:")
print("   • Halves each spatial dimension (4× fewer activations)")
print("   • Keeps the strongest activation in each 2×2 window")
print("   • Adds some robustness to small translations")
print("   • Reduces computational load")
print()

# Visualize the pooling operation
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Original feature map
im1 = ax1.imshow(feature_map, cmap='viridis', aspect='equal')
ax1.set_title('🗺️ Original Feature Map (6×6)', fontweight='bold')
ax1.set_xlabel('Width')
ax1.set_ylabel('Height')
plt.colorbar(im1, ax=ax1)

# Add grid lines to show pooling regions
for i in range(0, 7, 2):
    ax1.axhline(i-0.5, color='red', linewidth=2)
    ax1.axvline(i-0.5, color='red', linewidth=2)

# Add values to cells
for i in range(6):
    for j in range(6):
        ax1.text(j, i, f'{feature_map[i,j]:.0f}', ha='center', va='center', 
               color='white', fontweight='bold')

# Pooled result
im2 = ax2.imshow(pooled_result, cmap='viridis', aspect='equal')
ax2.set_title('🏊‍♂️ After Max Pooling (3×3)', fontweight='bold')
ax2.set_xlabel('Width')
ax2.set_ylabel('Height')
plt.colorbar(im2, ax=ax2)

# Add values to pooled cells
for i in range(3):
    for j in range(3):
        ax2.text(j, i, f'{pooled_result[i,j]:.0f}', ha='center', va='center', 
               color='white', fontweight='bold')

plt.suptitle('🏊‍♂️ Max Pooling: Keeping the Champions', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
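
The explicit loops above are good for seeing each patch, but the same pooling can be written as one reshape plus one multi-axis reduction. This is a NumPy sketch (the function name `max_pool_2x2` and the random input are assumptions) that requires even height and width.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.integers(0, 10, size=(6, 6)).astype(np.float32)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; assumes even height and width."""
    h, w = x.shape
    # Split each spatial axis into (blocks, 2): shape (h//2, 2, w//2, 2).
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    # Reduce over the two within-block axes in a single call.
    return blocks.max(axis=(1, 3))

pooled = max_pool_2x2(feature_map)
print(pooled.shape)  # (3, 3)
```

Reducing over a tuple of axes at once is the same pattern as `tf.reduce_max(x, axis=[1, 3])`.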

## 📈 STEP 4: Statistical Operations - Measuring Uncertainty
### 🎲 Variance, Standard Deviation, and Advanced Statistics!

In [None]:
# 📈 Statistical Reduction Operations
print("📈 STATISTICAL FAMILY: Measuring Spread and Uncertainty")
print("=" * 55)

# Create sample data with different distributions
stable_data = tf.random.normal([5, 4], mean=5, stddev=0.5)  # Low variance
volatile_data = tf.random.normal([5, 4], mean=5, stddev=2.0)  # High variance

print("🎲 Two Different Datasets:")
print(f"📊 Stable data (low variance):\n{stable_data}")
print(f"📊 Volatile data (high variance):\n{volatile_data}")
print()

# Calculate various statistics
def analyze_statistics(data, name):
    mean_val = tf.reduce_mean(data)
    var_val = tf.math.reduce_variance(data)
    std_val = tf.math.reduce_std(data)
    max_val = tf.reduce_max(data)
    min_val = tf.reduce_min(data)
    
    print(f"📊 {name} Statistics:")
    print(f"   📈 Mean: {mean_val.numpy():.3f}")
    print(f"   📏 Variance: {var_val.numpy():.3f}")
    print(f"   📐 Standard Deviation: {std_val.numpy():.3f}")
    print(f"   🏆 Max: {max_val.numpy():.3f}")
    print(f"   🥉 Min: {min_val.numpy():.3f}")
    print(f"   📊 Range: {(max_val - min_val).numpy():.3f}")
    print()
    
    return mean_val, var_val, std_val

stable_stats = analyze_statistics(stable_data, "Stable")
volatile_stats = analyze_statistics(volatile_data, "Volatile")

print("🔍 Key Insights:")
print(f"   • Stable data has std ≈ {stable_stats[2].numpy():.3f} (low uncertainty)")
print(f"   • Volatile data has std ≈ {volatile_stats[2].numpy():.3f} (high uncertainty)")
print("   • Standard deviation measures spread: the wider it is, the less reliable a single value!")
print()

# Visualize the distributions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Stable data distribution
ax1.hist(stable_data.numpy().flatten(), bins=15, alpha=0.7, color='blue', edgecolor='black')
ax1.axvline(stable_stats[0], color='red', linestyle='--', linewidth=2, label=f'Mean: {stable_stats[0].numpy():.2f}')
ax1.axvline(stable_stats[0] + stable_stats[2], color='orange', linestyle=':', linewidth=2, label=f'+1σ: {(stable_stats[0] + stable_stats[2]).numpy():.2f}')
ax1.axvline(stable_stats[0] - stable_stats[2], color='orange', linestyle=':', linewidth=2, label=f'-1σ: {(stable_stats[0] - stable_stats[2]).numpy():.2f}')
ax1.set_title(f'📊 Stable Data (σ={stable_stats[2].numpy():.3f})', fontweight='bold')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')
ax1.legend()

# Volatile data distribution
ax2.hist(volatile_data.numpy().flatten(), bins=15, alpha=0.7, color='red', edgecolor='black')
ax2.axvline(volatile_stats[0], color='blue', linestyle='--', linewidth=2, label=f'Mean: {volatile_stats[0].numpy():.2f}')
ax2.axvline(volatile_stats[0] + volatile_stats[2], color='orange', linestyle=':', linewidth=2, label=f'+1σ: {(volatile_stats[0] + volatile_stats[2]).numpy():.2f}')
ax2.axvline(volatile_stats[0] - volatile_stats[2], color='orange', linestyle=':', linewidth=2, label=f'-1σ: {(volatile_stats[0] - volatile_stats[2]).numpy():.2f}')
ax2.set_title(f'📊 Volatile Data (σ={volatile_stats[2].numpy():.3f})', fontweight='bold')
ax2.set_xlabel('Value')
ax2.set_ylabel('Frequency')
ax2.legend()

plt.suptitle('📈 Understanding Variance: Stable vs Volatile Data', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# 🎲 Practical Application: Uncertainty Quantification
print("🎲 UNCERTAINTY QUANTIFICATION: AI Confidence Estimation")
print("=" * 55)

# Simulate multiple model predictions (like in an ensemble)
n_models = 10
n_samples = 5
n_classes = 3

# Each model makes predictions for 5 samples across 3 classes
ensemble_predictions = tf.random.uniform([n_models, n_samples, n_classes], 0, 1)
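# Treat the random values as logits; softmax turns each row into a valid probability distribution
# (logits drawn from [0, 1] cap any probability at e/(e+2) ≈ 0.58, so these toy predictions are never sharply peaked)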
ensemble_predictions = tf.nn.softmax(ensemble_predictions, axis=-1)

print(f"🤖 Ensemble of {n_models} models predicting {n_samples} samples:")
print(f"   Shape: {ensemble_predictions.shape}")
print()

# Calculate ensemble statistics
mean_predictions = tf.reduce_mean(ensemble_predictions, axis=0)  # Average across models
std_predictions = tf.math.reduce_std(ensemble_predictions, axis=0)  # Uncertainty across models
var_predictions = tf.math.reduce_variance(ensemble_predictions, axis=0)

print("📊 Ensemble Analysis:")
print("Sample\tClass\tMean Prob\tStd Dev\tConfidence")
print("-" * 50)

class_names = ['🐱 Cat', '🐶 Dog', '🐦 Bird']
for sample_idx in range(n_samples):
    for class_idx in range(n_classes):
        mean_prob = mean_predictions[sample_idx, class_idx]
        std_dev = std_predictions[sample_idx, class_idx]
        
        # High confidence = high probability + low std
        confidence = "High" if mean_prob > 0.5 and std_dev < 0.1 else "Low" if std_dev > 0.2 else "Med"
        
        print(f"{sample_idx+1}\t{class_names[class_idx]}\t{mean_prob.numpy():.3f}\t\t{std_dev.numpy():.3f}\t{confidence}")
    print()

# Find most confident predictions
predicted_classes = tf.argmax(mean_predictions, axis=1)
max_probs = tf.reduce_max(mean_predictions, axis=1)
prediction_uncertainty = tf.reduce_mean(std_predictions, axis=1)  # Average uncertainty per sample

print("🎯 Final Predictions with Confidence:")
print("Sample\tPrediction\tProb\tUncertainty\tStatus")
print("-" * 50)
for i in range(n_samples):
    pred_class = predicted_classes[i]
    prob = max_probs[i]
    uncertainty = prediction_uncertainty[i]
    
    if prob > 0.7 and uncertainty < 0.1:
        status = "✅ Confident"
    elif uncertainty > 0.2:
        status = "⚠️ Uncertain"
    else:
        status = "🤔 Moderate"
    
    print(f"{i+1}\t{class_names[pred_class]}\t{prob.numpy():.3f}\t{uncertainty.numpy():.3f}\t\t{status}")

print("\n💡 Key Insights:")
print("   • Low standard deviation = High model agreement = High confidence")
print("   • High standard deviation = Model disagreement = Uncertainty")
print("   • This helps AI systems know when they don't know!")
print()
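
Standard deviation across models is one uncertainty signal; another common one is the entropy of the averaged prediction, which is again just a sum reduction over the class axis. The sketch below is an added illustration: the helper name `predictive_entropy` and the three example rows are assumptions chosen to show a peaked, a mixed, and a uniform distribution.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy of each row of class probabilities, in nats."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

# Hypothetical ensemble-averaged predictions for 3 samples, 3 classes.
mean_predictions = np.array([[0.98, 0.01, 0.01],   # sharply peaked
                             [0.50, 0.30, 0.20],   # mixed
                             [1 / 3, 1 / 3, 1 / 3]])  # maximally uncertain

entropy = predictive_entropy(mean_predictions)
print(entropy.round(3))
print(f"max possible for 3 classes: {np.log(3):.3f}")
```

Entropy grows from the peaked row to the uniform row, where it reaches its maximum of ln(3).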

## 🎯 STEP 5: Advanced Applications - Attention & Feature Selection
### 🧠 Where reduction operations become the brain of AI!

In [None]:
# 🎯 Attention Mechanism Implementation
print("🎯 ATTENTION MECHANISM: Weighted Aggregation Intelligence")
print("=" * 58)

# Simulate sequence data (like words in a sentence)
sequence_length = 6
feature_dim = 4

# Create sequence representations (like word embeddings)
sequence = tf.random.normal([sequence_length, feature_dim], mean=0, stddev=1)
words = ['🌟', '🚀', '🧠', '💡', '⚡', '🎯']  # Emoji representations

print(f"📝 Input Sequence ({sequence_length} tokens, {feature_dim} features each):")
for i, word in enumerate(words):
    print(f"   {word} Token {i}: {sequence[i].numpy()}")
print()

# Calculate attention scores (simplified self-attention: no learned projections, no 1/sqrt(d) scaling)
query = sequence[2]  # Let's focus on the 3rd token (🧠)
print(f"🔍 Query Token: {words[2]} {query.numpy()}")
print()

# Calculate attention scores using dot product
attention_scores = tf.reduce_sum(sequence * query, axis=1)  # Dot product with query
attention_weights = tf.nn.softmax(attention_scores)  # Convert to probabilities

print("🎭 Attention Analysis:")
print("Token\tScore\tWeight\tAttention")
print("-" * 35)
for i, (word, score, weight) in enumerate(zip(words, attention_scores, attention_weights)):
    attention_level = "🔥" if weight > 0.25 else "⚡" if weight > 0.15 else "💫"
    print(f"{word}\t{score.numpy():.3f}\t{weight.numpy():.3f}\t{attention_level}")

print()
print(f"✅ Attention weights sum: {tf.reduce_sum(attention_weights).numpy():.6f}")
print()

# Calculate attended representation (weighted sum)
attended_representation = tf.reduce_sum(
    sequence * tf.expand_dims(attention_weights, 1),  # Broadcast weights
    axis=0  # Sum across sequence length
)

print("🧠 Attention Output:")
print(f"   📊 Original query: {query.numpy()}")
print(f"   ✨ Attended repr: {attended_representation.numpy()}")
print()

print("💡 What Just Happened?")
print("   • Each token got an attention score based on similarity to query")
print("   • Scores converted to probabilities (softmax)")
print("   • Final representation = weighted sum of all tokens")
print("   • Transformers use the same idea, adding learned query/key/value projections!")
print()

# Visualize attention weights
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Attention weights visualization
bars = ax1.bar(words, attention_weights.numpy(), color='skyblue', edgecolor='navy')
ax1.set_title(f'🎯 Attention Weights for Query {words[2]}', fontweight='bold')
ax1.set_ylabel('Attention Weight')
ax1.set_xlabel('Tokens')

# Highlight the query token
bars[2].set_color('orange')
bars[2].set_edgecolor('red')

# Add value labels
for i, (bar, weight) in enumerate(zip(bars, attention_weights)):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{weight.numpy():.3f}', ha='center', fontweight='bold')

# Attention heatmap
attention_matrix = tf.expand_dims(attention_weights, 0)
im = ax2.imshow(attention_matrix, cmap='Blues', aspect='auto')
ax2.set_title('🔥 Attention Heatmap', fontweight='bold')
ax2.set_xticks(range(len(words)))
ax2.set_xticklabels(words)
ax2.set_yticks([0])
ax2.set_yticklabels([f'Query {words[2]}'])
plt.colorbar(im, ax=ax2)

plt.suptitle('🎯 Attention Mechanism: How AI Focuses', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
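
The cell above attends from a single query token. Letting every token act as a query gives a full attention matrix, and the weighted sums become one matrix product. This is a minimal NumPy sketch, assuming random embeddings and no learned query/key/value projections; the 1/sqrt(d) factor is the scaling used in scaled dot-product attention.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
sequence = rng.normal(size=(6, 4))           # 6 tokens, 4 features
d = sequence.shape[-1]

scores = sequence @ sequence.T / np.sqrt(d)  # (6, 6): every query vs every key
weights = softmax(scores, axis=-1)           # each row sums to 1
attended = weights @ sequence                # (6, 4): one output per query

print(weights.sum(axis=-1))  # all ones, up to rounding
print(attended.shape)        # (6, 4)
```

Both reductions are still there: the matrix products sum over the feature axis (scores) and over the token axis (outputs), and the softmax uses `keepdims=True` so its row sums broadcast correctly.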

In [None]:
# 🔍 Feature Selection Using Reduction Operations
print("🔍 FEATURE SELECTION: Finding the Most Important Features")
print("=" * 58)

# Simulate high-dimensional data (like gene expression or image features)
n_samples = 100
n_features = 20
feature_data = tf.random.normal([n_samples, n_features], mean=0, stddev=1)

# Add some informative features (simulate signal)
# Make features 3, 7, 12, 18 more informative
informative_indices = [3, 7, 12, 18]
for idx in informative_indices:
    feature_data = tf.tensor_scatter_nd_add(
        feature_data, 
        [[i, idx] for i in range(n_samples)],
        tf.random.normal([n_samples], mean=2, stddev=0.5)
    )

print(f"🎲 Dataset: {n_samples} samples × {n_features} features")
print(f"🎯 Informative features (with signal): {informative_indices}")
print()

# Feature importance analysis using various reduction metrics
feature_means = tf.reduce_mean(tf.abs(feature_data), axis=0)  # Mean absolute values
feature_stds = tf.math.reduce_std(feature_data, axis=0)       # Standard deviations
feature_max = tf.reduce_max(tf.abs(feature_data), axis=0)     # Maximum absolute values
feature_variance = tf.math.reduce_variance(feature_data, axis=0)  # Variances

print("📊 Feature Importance Analysis:")
print("Feature\tMean|x|\tStd Dev\tMax|x|\tVariance\tScore")
print("-" * 55)

# Combined importance score
importance_scores = feature_means * feature_stds  # Simple combination

for i in range(n_features):
    marker = "🎯" if i in informative_indices else "📊"
    print(f"{marker} {i:2d}\t{feature_means[i].numpy():.3f}\t{feature_stds[i].numpy():.3f}\t{feature_max[i].numpy():.3f}\t{feature_variance[i].numpy():.3f}\t{importance_scores[i].numpy():.3f}")

print()

# Rank features by importance
top_k = 5
top_indices = tf.nn.top_k(importance_scores, k=top_k).indices

print(f"🏆 Top {top_k} Most Important Features:")
for rank, idx in enumerate(top_indices.numpy()):
    medal = "🥇" if rank == 0 else "🥈" if rank == 1 else "🥉" if rank == 2 else "🏅"
    signal = "✅ SIGNAL" if idx in informative_indices else "❌ noise"
    print(f"   {medal} Rank {rank+1}: Feature {idx} (score: {importance_scores[idx].numpy():.3f}) - {signal}")

print()

# Calculate detection rate (recall): share of informative features found in the top-k
detected_informative = sum(1 for idx in top_indices.numpy() if idx in informative_indices)
detection_rate = detected_informative / len(informative_indices)

print(f"🎯 Feature Selection Performance:")
print(f"   Detected {detected_informative}/{len(informative_indices)} informative features")
print(f"   Detection rate: {detection_rate:.2%}")
print()

# Visualize feature importance
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))

# Feature importance scores
colors = ['red' if i in informative_indices else 'lightblue' for i in range(n_features)]
bars = ax1.bar(range(n_features), importance_scores.numpy(), color=colors, edgecolor='black')
ax1.set_title('🔍 Feature Importance Scores', fontweight='bold')
ax1.set_xlabel('Feature Index')
ax1.set_ylabel('Importance Score')
ax1.axhline(tf.reduce_mean(importance_scores), color='green', linestyle='--', 
           label=f'Mean: {tf.reduce_mean(importance_scores).numpy():.3f}')
ax1.legend()

# Highlight top features
for idx in top_indices.numpy():
    bars[idx].set_edgecolor('gold')
    bars[idx].set_linewidth(3)

# Feature distribution comparison
informative_data = tf.gather(feature_data, informative_indices, axis=1)
noise_indices = [i for i in range(n_features) if i not in informative_indices]
noise_data = tf.gather(feature_data, noise_indices[:4], axis=1)  # Sample 4 noise features

ax2.hist(informative_data.numpy().flatten(), bins=30, alpha=0.7, label='🎯 Informative Features', 
         color='red', density=True)
ax2.hist(noise_data.numpy().flatten(), bins=30, alpha=0.7, label='📊 Noise Features', 
         color='lightblue', density=True)
ax2.set_title('🎲 Feature Value Distributions', fontweight='bold')
ax2.set_xlabel('Feature Value')
ax2.set_ylabel('Density')
ax2.legend()

plt.tight_layout()
plt.show()

print("💡 Feature Selection Insights:")
print("   • Reduction operations help identify patterns in high-dimensional data")
print("   • Mean and std dev together capture signal strength")
print("   • This is the foundation of feature engineering in ML!")
print()

## ✅ PRACTICAL VALIDATION & DEBUGGING
### 🔧 Master the common challenges and debug like a pro!

In [None]:
# 🔧 Reduction Operations Debugging Challenge
print("🔧 DEBUGGING CHALLENGE: Common Reduction Pitfalls")
print("=" * 50)

# Create test scenarios that often confuse students
test_tensor = tf.random.normal([3, 4, 5])  # 3D tensor

print(f"🎲 Test Tensor Shape: {test_tensor.shape}")
print("   Think: (batch_size=3, height=4, width=5)")
print()

# Test different reduction scenarios
scenarios = [
    ("No axis specified", lambda x: tf.reduce_sum(x)),
    ("Axis=0 (across batches)", lambda x: tf.reduce_sum(x, axis=0)),
    ("Axis=1 (across height)", lambda x: tf.reduce_sum(x, axis=1)),
    ("Axis=2 (across width)", lambda x: tf.reduce_sum(x, axis=2)),
    ("Axis=[0,1] (across batch and height)", lambda x: tf.reduce_sum(x, axis=[0,1])),
    ("Axis=[1,2] (across height and width)", lambda x: tf.reduce_sum(x, axis=[1,2])),
    ("Keep dims (axis=1, keepdims=True)", lambda x: tf.reduce_sum(x, axis=1, keepdims=True))
]

print("🧪 Reduction Scenarios Analysis:")
# Fixed-width columns: the longest description is 36 characters, so tabs and <30 would misalign
print(f"{'Description':<40}{'Result Shape':<14}Dimensions Explanation")
print("-" * 80)

for desc, operation in scenarios:
    try:
        result = operation(test_tensor)
        explanation = ""
        if len(result.shape) == 0:
            explanation = "Scalar (all dimensions reduced)"
        elif len(result.shape) == 1:
            explanation = f"1D vector of length {result.shape[0]}"
        elif len(result.shape) == 2:
            explanation = f"2D matrix {result.shape[0]}×{result.shape[1]}"
        elif len(result.shape) == 3:
            explanation = f"3D tensor {result.shape[0]}×{result.shape[1]}×{result.shape[2]}"

        print(f"{desc:<40}{str(result.shape):<14}{explanation}")
    except Exception as e:
        print(f"{desc:<40}{'ERROR':<14}{str(e)[:50]}...")

print()
print("🎯 Key Debugging Tips:")
print("   • axis=None → Reduces ALL dimensions to scalar")
print("   • axis=0 → Reduces first dimension")
print("   • axis=-1 → Reduces last dimension")
print("   • axis=[0,1] → Reduces multiple dimensions")
print("   • keepdims=True → Keeps dimensions as size 1")
print()

# Common mistake demonstration
print("⚠️ COMMON MISTAKE DEMONSTRATION:")
print("=" * 35)

# Simulating a common batch processing error
batch_predictions = tf.random.uniform([10, 5], 0, 1)  # 10 samples, 5 classes
batch_predictions = tf.nn.softmax(batch_predictions)

print(f"📊 Batch predictions shape: {batch_predictions.shape}")
print()

# Wrong way (reduces wrong axis)
wrong_avg = tf.reduce_mean(batch_predictions, axis=1)  # Average across classes
print(f"❌ WRONG: Mean across classes (axis=1): {wrong_avg.shape}")
print(f"   Result: {wrong_avg.numpy()[:5]}  # First 5 values")
print("   Each softmax row sums to 1, so this is always 1/5 = 0.2 per sample (not useful!)")
print()

# Right way (reduces correct axis)
right_avg = tf.reduce_mean(batch_predictions, axis=0)  # Average across samples
print(f"✅ CORRECT: Mean across samples (axis=0): {right_avg.shape}")
print(f"   Result: {right_avg.numpy()}")
print("   This gives average probability per class (useful!)")
print()

print("💡 Remember: Always think about WHAT you're averaging!")
print("   • Across samples (axis=0): Gets typical behavior per feature")
print("   • Across features (axis=1): Gets summary per sample")
print()

In [None]:
# 🎉 Complete Neural Network Pipeline with Reductions
print("🎉 COMPLETE PIPELINE: Neural Network with All Reduction Types")
print("=" * 63)

# Simulate end-to-end training scenario
batch_size = 8
input_features = 10
hidden_size = 6
output_classes = 3

# Generate sample data
inputs = tf.random.normal([batch_size, input_features])
true_labels = tf.random.uniform([batch_size], 0, output_classes, dtype=tf.int32)
true_labels_onehot = tf.one_hot(true_labels, output_classes)

# Simple neural network
W1 = tf.Variable(tf.random.normal([input_features, hidden_size], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden_size]))
W2 = tf.Variable(tf.random.normal([hidden_size, output_classes], stddev=0.1))
b2 = tf.Variable(tf.zeros([output_classes]))

print(f"🏗️ Network Architecture: {input_features} → {hidden_size} → {output_classes}")
print(f"📊 Batch size: {batch_size}")
print()

# Forward pass with reduction operations
print("🚀 Forward Pass with Reduction Operations:")
print("=" * 42)

# Layer 1
hidden_pre = tf.matmul(inputs, W1) + b1
hidden_post = tf.nn.relu(hidden_pre)

# Batch statistics (like in batch normalization)
hidden_mean = tf.reduce_mean(hidden_post, axis=0)
hidden_std = tf.math.reduce_std(hidden_post, axis=0)
sparsity = tf.reduce_mean(tf.cast(hidden_post == 0, tf.float32))

print(f"🧠 Hidden Layer Analysis:")
print(f"   📊 Mean activations per neuron: {hidden_mean.numpy()}")
print(f"   📏 Std per neuron: {hidden_std.numpy()}")
print(f"   🔥 Sparsity (% zeros): {sparsity.numpy():.2%}")
print()

# Layer 2 + Output
logits = tf.matmul(hidden_post, W2) + b2
predictions = tf.nn.softmax(logits)

# Loss calculation using reductions
cross_entropy = -tf.reduce_sum(true_labels_onehot * tf.math.log(predictions + 1e-8), axis=1)
total_loss = tf.reduce_mean(cross_entropy)

# Accuracy calculation
predicted_classes = tf.argmax(predictions, axis=1)
correct_predictions = tf.cast(tf.equal(predicted_classes, tf.cast(true_labels, tf.int64)), tf.float32)
accuracy = tf.reduce_mean(correct_predictions)

print(f"📉 Loss & Metrics:")
print(f"   💔 Individual losses: {cross_entropy.numpy()}")
print(f"   📉 Total loss (mean): {total_loss.numpy():.4f}")
print(f"   🎯 Accuracy: {accuracy.numpy():.2%}")
print()

# Advanced analytics using reductions
confidence_scores = tf.reduce_max(predictions, axis=1)  # Highest probability per sample
entropy_scores = -tf.reduce_sum(predictions * tf.math.log(predictions + 1e-8), axis=1)  # Uncertainty

print(f"🔍 Prediction Analysis:")
print("Sample\tTrue\tPred\tConfidence\tEntropy\tStatus")
print("-" * 50)
for i in range(batch_size):
    confidence = confidence_scores[i]
    entropy = entropy_scores[i]
    correct = "✅" if correct_predictions[i] == 1 else "❌"
    
    status = "🔥" if confidence > 0.8 else "⚡" if confidence > 0.5 else "💫"
    
    print(f"{i+1}\t{true_labels[i].numpy()}\t{predicted_classes[i].numpy()}\t{confidence.numpy():.3f}\t\t{entropy.numpy():.3f}\t{correct}{status}")

print()
print(f"📊 Batch Statistics:")
print(f"   📈 Mean confidence: {tf.reduce_mean(confidence_scores).numpy():.3f}")
print(f"   🎲 Mean entropy: {tf.reduce_mean(entropy_scores).numpy():.3f}")
print(f"   🏆 Max confidence: {tf.reduce_max(confidence_scores).numpy():.3f}")
print(f"   🤔 Min confidence: {tf.reduce_min(confidence_scores).numpy():.3f}")
print()

print("🎉 REDUCTION OPERATIONS MASTERY ACHIEVED!")
print("You've seen how reductions power:")
print("   ➕ Loss functions (mean across samples)")
print("   📊 Batch normalization (mean/std per feature)")
print("   🏆 Max pooling (max across spatial dimensions)")
print("   🎯 Attention (weighted sum across sequence)")
print("   🔍 Feature selection (importance scores)")
print("   📈 Metrics and analytics (mean, max, min)")
print()

## 🔍 KEY TAKEAWAYS

### 📊 **Reduction Operations Mastery:**

1. **➕ Sum Family** - Building blocks of aggregation
   - `tf.reduce_sum()` - Adds everything up
   - Perfect for loss functions and feature importance
   - Axis parameter controls which dimensions to reduce

2. **📊 Mean Family** - Finding the typical
   - `tf.reduce_mean()` - The most common aggregation in ML
   - Essential for batch normalization and metrics
   - Averages out per-sample noise and keeps the result independent of batch size

3. **🏆 Extremes Family** - Champions and selection
   - `tf.reduce_max()`, `tf.reduce_min()` - Find extremes
   - Power max pooling in CNNs
   - Enable feature selection and sparsity

4. **📈 Statistical Family** - Measuring uncertainty
   - `tf.math.reduce_std()`, `tf.math.reduce_variance()` - Quantify spread
   - Critical for normalization and uncertainty estimation
   - Help AI systems know when they don't know
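
The four families can be compared side by side on a tiny tensor. This is a minimal sketch, assuming a `[samples, features]` layout; the tensor `x` and its values are made up for illustration. Note that `tf.math.reduce_variance` computes the population variance (it divides by N, not N-1).

```python
import tensorflow as tf

# Hypothetical 2x3 tensor: 2 samples, 3 features
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

total = tf.reduce_sum(x)                       # Sum family: all elements -> scalar
col_mean = tf.reduce_mean(x, axis=0)           # Mean family: per-feature mean
row_max = tf.reduce_max(x, axis=1)             # Extremes family: per-sample max
row_min = tf.reduce_min(x, axis=1)             # Extremes family: per-sample min
col_var = tf.math.reduce_variance(x, axis=0)   # Statistical family: population variance
col_std = tf.math.reduce_std(x, axis=0)        # Statistical family: square root of the variance

print("sum      :", total.numpy())      # 21.0
print("mean(0)  :", col_mean.numpy())   # [2.5 3.5 4.5]
print("max(1)   :", row_max.numpy())    # [3. 6.]
print("min(1)   :", row_min.numpy())    # [1. 4.]
print("var(0)   :", col_var.numpy())    # [2.25 2.25 2.25]
print("std(0)   :", col_std.numpy())    # [1.5 1.5 1.5]
```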

### 🧠 **Neural Network Applications:**
- **Loss Functions**: Aggregate errors across samples and features
- **Batch Normalization**: Stabilize training with mean/std statistics
- **Attention Mechanisms**: Weighted aggregation of information
- **Feature Selection**: Identify most important patterns
- **Max Pooling**: Downsample feature maps by keeping the strongest activation in each window
- **Metrics**: Performance evaluation and monitoring
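
Two of these applications reduce to a single `tf.reduce_*` call once the tensor is arranged suitably. The sketch below is illustrative only: the 4×4 feature map, the `values` matrix, and the all-zero `scores` are made-up inputs, and the reshape trick assumes non-overlapping 2×2 windows on a map whose sides are divisible by 2.

```python
import tensorflow as tf

# --- Max pooling as a reduction (hypothetical 4x4 feature map) ---
feature_map = tf.reshape(tf.range(16, dtype=tf.float32), [4, 4])
# Split each spatial axis into (block, position-in-block) so every
# 2x2 window gets its own pair of axes
windows = tf.reshape(feature_map, [2, 2, 2, 2])   # [row_block, row_in_block, col_block, col_in_block]
pooled = tf.reduce_max(windows, axis=[1, 3])      # max inside each window -> shape (2, 2)
print("pooled:\n", pooled.numpy())                # [[ 5.  7.] [13. 15.]]

# --- Attention-style weighted sum across a sequence ---
values = tf.constant([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])                # [seq_len=3, dim=2]
scores = tf.constant([0.0, 0.0, 0.0])             # equal scores -> uniform weights
weights = tf.nn.softmax(scores)                   # non-negative, sums to 1 over the sequence
context = tf.reduce_sum(weights[:, tf.newaxis] * values, axis=0)  # shape (2,)
print("weights:", weights.numpy())
print("context:", context.numpy())                # with uniform weights this equals the mean of values
```

With unequal scores, the same `tf.reduce_sum` line pulls `context` toward the rows of `values` that receive larger weights.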

### 💡 **Axis Understanding (Critical!):**
- **axis=None**: Reduce all dimensions → scalar
- **axis=0**: Reduce across samples (for a `[samples, features]` tensor) → per-feature statistics
- **axis=1**: Reduce across features → per-sample statistics
- **axis=[0,1]**: Reduce several dimensions at once
- **keepdims=True**: Keep each reduced axis with size 1, so the result still broadcasts against the input
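
The axis rules above are easiest to check by printing shapes. This is a small sketch on a made-up `[samples=4, features=3]` tensor; the variable names are illustrative.

```python
import tensorflow as tf

# Hypothetical data: 4 samples, 3 features
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0],
                 [10.0, 11.0, 12.0]])

everything = tf.reduce_mean(x)                              # axis=None -> scalar, shape ()
per_feature = tf.reduce_mean(x, axis=0)                     # shape (3,)
per_sample = tf.reduce_mean(x, axis=1)                      # shape (4,)
per_sample_kept = tf.reduce_mean(x, axis=1, keepdims=True)  # shape (4, 1)
last_axis = tf.reduce_mean(x, axis=-1)                      # same as axis=1 for a 2D tensor

for name, t in [("axis=None", everything), ("axis=0", per_feature),
                ("axis=1", per_sample), ("axis=1, keepdims", per_sample_kept),
                ("axis=-1", last_axis)]:
    print(f"{name:<18} shape={t.shape}")

# (4, 3) - (4, 1) broadcasts, so each row is centered by its own mean
centered_rows = x - per_sample_kept
print(centered_rows.numpy())
```

Subtracting `per_sample` (shape `(4,)`) instead would not broadcast against `(4, 3)`, because the trailing dimensions 3 and 4 differ; that is the typical case where `keepdims=True` is needed.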

### 🔧 **Debugging Wisdom:**
- Always verify shapes before and after reductions
- Think about what you're aggregating (samples vs features)
- Use reductions to convert tensors to meaningful summaries
- Combine multiple reduction types for comprehensive analysis
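
One way to make the first tip a habit is to state the expected shape next to every reduction. The helper below, `checked_reduce`, is hypothetical (it is not part of TensorFlow) and is only a sketch of the idea.

```python
import tensorflow as tf

def checked_reduce(fn, tensor, expected_shape, **kwargs):
    """Hypothetical helper: apply a reduction and verify the result shape."""
    result = fn(tensor, **kwargs)
    actual = result.shape.as_list()
    name = getattr(fn, "__name__", "reduction")
    assert actual == list(expected_shape), (
        f"{name} with {kwargs}: expected {list(expected_shape)}, got {actual}")
    print(f"{name} {kwargs}: {tensor.shape.as_list()} -> {actual}")
    return result

batch = tf.random.normal([10, 5])  # 10 samples, 5 features

# Across samples: one value per feature
class_means = checked_reduce(tf.reduce_mean, batch, [5], axis=0)

# Across features, keeping the reduced axis for later broadcasting
sample_sums = checked_reduce(tf.reduce_sum, batch, [10, 1], axis=1, keepdims=True)
```

If the wrong axis is passed, the assertion fails immediately with both shapes in the message, rather than surfacing later as a confusing broadcast error.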

### 🤔 **Advanced Questions:**
- How do different reduction operations affect gradient flow?
- When should you use keepdims=True vs False?
- How can reductions help with model interpretability?
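
A possible starting point for the first question is to inspect the gradients directly with `tf.GradientTape`. This is a sketch on a made-up three-element vector with a unique maximum; it is not a full answer.

```python
import tensorflow as tf

x = tf.Variable([1.0, 3.0, 2.0])  # hypothetical input; variables are watched automatically

# persistent=True lets us ask for several gradients from the same tape
with tf.GradientTape(persistent=True) as tape:
    s = tf.reduce_sum(x)
    m = tf.reduce_mean(x)
    top = tf.reduce_max(x)

grad_sum = tape.gradient(s, x)    # every element receives gradient 1
grad_mean = tape.gradient(m, x)   # every element receives 1/N
grad_max = tape.gradient(top, x)  # only the maximum element receives gradient
del tape                          # release the persistent tape

print("d(sum)/dx :", grad_sum.numpy())   # [1. 1. 1.]
print("d(mean)/dx:", grad_mean.numpy())  # approximately [0.333 0.333 0.333]
print("d(max)/dx :", grad_max.numpy())   # [0. 1. 0.]
```

Sum and mean pass gradient to every input, while max routes it only to the winning element, which is why max pooling updates only the strongest activation in each window.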

## ➡️ NEXT EXERCISE PREVIEW

### 🚀 T3-Exercise-5: Neural Network Forward Pass - Connecting It All

**Get ready to build:**
- 🏗️ **Complete Neural Networks** - From inputs to predictions
- 🔄 **Multi-layer architectures** - Stacking the building blocks
- 🎯 **Real classification tasks** - Solving actual problems
- 📊 **End-to-end pipelines** - Data → Processing → Decisions
- 🧠 **Architecture design** - Choosing layers, activations, and operations

🌟 **Coming up:** Combine tensors, operations, activations, and reductions into intelligent systems!

---

# 🎉 EXERCISE 4 COMPLETED!
## 📊 **You've mastered the aggregation engines of AI!**
### 🧮 **You understand how neural networks summarize and decide!**
#### 🚀 **Ready to build complete intelligent systems!**