# NumPy Essentials for Machine Learning (Beginner-friendly)

**Learning Objectives:**
- Learn how to create and work with NumPy arrays for ML data representation
- Master array operations, broadcasting, and reshaping for efficient ML computations
- Apply linear algebra and statistical operations used in ML algorithms
- Understand performance considerations for large-scale ML workflows

**Prerequisites:** Python basics, NumPy installed (`pip install numpy`)

**Estimated Time:** ~45 minutes

---

NumPy is the foundational library for numerical computing in Python and forms the backbone of the entire ML ecosystem. This notebook focuses on hands-on examples with practical ML context, aimed at beginners who want to understand how NumPy powers machine learning.

**Why NumPy for ML?** scikit-learn works directly on NumPy arrays, and TensorFlow and PyTorch tensors follow the same array model and convert to and from NumPy arrays. Understanding NumPy is essential for:
- Representing datasets as multi-dimensional arrays
- Implementing ML algorithms efficiently
- Understanding how neural networks process data in batches
- Debugging shape and data type issues in ML pipelines

**Learning Path:** This notebook prepares you for the Pandas notebook (data manipulation) and scikit-learn notebook (ML algorithms), where you'll see these NumPy concepts in action.

**🎯 Success Indicators:** By the end, you should be able to:
- Create and manipulate arrays for ML data representation
- Debug shape errors (the #1 beginner challenge!)
- Understand broadcasting for efficient computations
- Implement a basic neural network forward pass

**💡 Beginner Tips:**
- Don't worry about memorizing everything - focus on understanding concepts
- Run each cell and experiment with the code
- Pay special attention to shape outputs - they're crucial for ML
- When you see errors, read them carefully - NumPy errors are usually about shapes!

In [None]:
import numpy as np

# Set random seed for reproducibility
# PARAMETER EXPLANATION: seed=42
# • What it does: Makes random number generation predictable
# • Why we need it: So everyone gets the same 'random' results when running this notebook
# • Why 42? It's a reference to "The Hitchhiker's Guide to the Galaxy" - the answer to everything!
# • In ML: Essential for reproducible experiments and debugging
# • Alternative: You can use any integer (0, 123, 2024, etc.)
np.random.seed(42)

print(f"NumPy version: {np.__version__}")
print("Random seed set to 42 - now our 'random' numbers will be predictable!")

## 1. Array Creation and Basic Properties

Understanding how to create and inspect arrays is fundamental to ML workflows. In ML, data comes in many forms - images as pixel arrays, text as numerical vectors, tabular data as feature matrices. NumPy arrays provide a unified way to represent all these data types efficiently.

**ML Context:** In practice, most ML datasets end up as arrays with a conventional shape (the exact axis order varies by framework):
- **Images**: (batch_size, height, width, channels) 
- **Tabular data**: (samples, features)
- **Time series**: (samples, timesteps, features)
- **Neural network weights**: (input_size, output_size)

In [None]:
# Different ways to create arrays (common in ML)

# From lists (converting loaded data to NumPy format)
# This is how you might convert data from CSV files or databases
data_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 samples, 3 features each
arr_from_list = np.array(data_list)
print("From list (typical dataset format):")
print(arr_from_list)
print(f"Shape: {arr_from_list.shape} (samples, features), Dtype: {arr_from_list.dtype}\n")

# Zeros (placeholder arrays and bias initialization)
# Weights are rarely left at zero in practice, but zeros are a common starting point for biases
zero_weights = np.zeros((3, 4))  # 3 input neurons, 4 output neurons
print("Zeros (placeholder weight matrix):")
print(zero_weights)
print(f"Shape: {zero_weights.shape} (input_neurons, output_neurons)\n")

# Random arrays (realistic weight initialization and synthetic data)
# Normal distribution is common for weight initialization
random_data = np.random.randn(100, 5)  # 100 samples, 5 features
print("Random data (first 5 rows):")
print(random_data[:5])
print(f"Shape: {random_data.shape} (batch_size, num_features)\n")

# Identity matrix (useful for regularization and initialization)
# Often used in regularization terms and some initialization schemes
identity = np.eye(3)
print("Identity matrix (regularization):")
print(identity)
print("Note: We'll see identity matrices again in scikit-learn, where they appear in regularization terms")

In [None]:
# Array properties essential for ML debugging and optimization
sample_array = np.random.randn(32, 10, 8)  # Batch size 32, sequence length 10, features 8

print("Array Properties (typical ML batch for sequence data):")
print(f"Shape: {sample_array.shape} (batch_size, seq_len, features)")
print(f"Number of dimensions: {sample_array.ndim}")
print(f"Total elements: {sample_array.size}")
print(f"Data type: {sample_array.dtype}")
print(f"Memory usage: {sample_array.nbytes} bytes")
print(f"Memory usage: {sample_array.nbytes / 1024:.2f} KB")

print("\nPARAMETER DEEP DIVE: dtype (data type)")
# PARAMETER EXPLANATION: dtype
# • What it is: Specifies the data type of array elements
# • Common types: int32, int64, float32, float64, bool, complex128
# • Why it matters: Affects memory usage, computation speed, and precision
# • ML impact: float32 vs float64 can halve memory usage in large models
# • GPU compatibility: Most GPUs prefer float32 for faster computation
# • Precision trade-off: float32 has ~7 decimal digits, float64 has ~15

# Demonstrate different dtypes
int_array = np.array([1, 2, 3], dtype=np.int32)
float32_array = np.array([1.0, 2.0, 3.0], dtype=np.float32)
float64_array = np.array([1.0, 2.0, 3.0], dtype=np.float64)

print(f"int32 array: {int_array.dtype}, memory: {int_array.nbytes} bytes")
print(f"float32 array: {float32_array.dtype}, memory: {float32_array.nbytes} bytes")
print(f"float64 array: {float64_array.dtype}, memory: {float64_array.nbytes} bytes")
print(f"Memory difference: float64 uses {float64_array.nbytes / float32_array.nbytes}x more memory than float32")

print("\nWhy these properties matter in ML:")
print("• Shape: Must match model input expectations (common source of errors!)")
print("• Dtype: float32 vs float64 affects memory and computation speed")
print("• Memory: Large models require careful memory management")
print("• We'll use these properties extensively in Pandas for data validation")
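
The precision and memory trade-offs above can be made concrete with a small sketch. The specific values below (2**24, the two pixel values) are illustrative choices rather than anything from the cells above.

```python
import numpy as np

# float32 has a 24-bit significand, so integers above 2**24 are no longer exact.
big = np.float32(2**24)                          # 16777216.0
lost_update = (big + np.float32(1)) == big       # the +1 is rounded away

# float64 still represents the same sum exactly.
kept_update = (np.float64(2**24) + 1) == np.float64(2**24)

# np.finfo reports the machine epsilon (spacing near 1.0) for each float type.
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps

# Fixed-width integer arrays wrap around on overflow (typical for uint8 pixels).
pixels = np.array([250, 255], dtype=np.uint8)
wrapped = pixels + np.uint8(10)                  # arithmetic is modulo 256

# Casting to a wider type first avoids the wrap-around.
safe = pixels.astype(np.int32) + 10

print(f"float32 drops +1 at 2**24: {lost_update}")
print(f"float64 drops +1 at 2**24: {kept_update}")
print(f"eps float32: {eps32:.2e}, eps float64: {eps64:.2e}")
print(f"uint8 wrapped: {wrapped}, int32 safe: {safe}")
```

The takeaway is that float32 is usually fine for model inputs and weights, while counters, indices, and accumulated sums may need a wider type.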

## 2. Array Indexing and Slicing

Critical for data manipulation, batch processing, and feature selection. In ML, you constantly need to:
- Extract specific samples from batches
- Select subsets of features
- Filter data based on conditions
- Prepare train/validation/test splits

**ML Applications:** These indexing patterns appear everywhere in ML - from data preprocessing in Pandas to batch processing in neural networks.
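
Train/validation/test splitting is listed above but easy to get subtly wrong, so here is a minimal sketch using shuffled integer indices. The dataset, the 70/15/15 proportions, and the seed are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # local generator; the seed value is arbitrary

X = rng.normal(size=(100, 4))         # hypothetical dataset: 100 samples, 4 features
y = rng.integers(0, 2, size=100)      # hypothetical binary labels

# Shuffle the sample indices once, then slice the shuffled indices into three groups.
indices = rng.permutation(len(X))
n_train, n_val = 70, 15               # assumed 70/15/15 split
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

# Integer-array indexing selects rows; reusing the same indices keeps X and y aligned.
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)
```

Indexing both `X` and `y` with the same index array is the design choice that matters here: shuffling the two arrays independently would silently break the pairing between samples and labels.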

In [None]:
# Create sample data representing a batch of images (common in computer vision)
# Shape: (batch_size, height, width, channels) - standard format for image data
batch_images = np.random.randint(0, 256, size=(8, 28, 28, 3))

print("Batch of images shape:", batch_images.shape)
print("Format: (batch_size=8, height=28, width=28, channels=3)")
print("\nCommon ML Indexing Patterns:")

# Get first image (single sample from batch)
first_image = batch_images[0]
print(f"First image shape: {first_image.shape} (height, width, channels)")

# Get first 4 images (creating smaller mini-batch)
mini_batch = batch_images[:4]
print(f"Mini-batch shape: {mini_batch.shape} (smaller batch for processing)")

# Get red channel from all images (feature extraction)
red_channel = batch_images[:, :, :, 0]  # Select channel 0 (red) from all images
print(f"Red channel shape: {red_channel.shape} (single channel; not a true grayscale conversion)")

# Get center crop (data augmentation technique)
center_crop = batch_images[:, 7:21, 7:21, :]  # 14x14 center region
print(f"Center crop shape: {center_crop.shape} (cropped for data augmentation)")

print("\nThese patterns are essential for:")
print("• Batch processing in neural networks")
print("• Data augmentation and preprocessing")
print("• Feature extraction and selection")
print("• We'll see similar patterns in Pandas for tabular data")

In [None]:
# Boolean indexing (filtering data - crucial for ML data preprocessing)
scores = np.array([85, 92, 78, 96, 88, 73, 91, 82])
names = np.array(['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'])

print("Student Performance Dataset:")
print("Original scores:", scores)
print("Names:", names)

# Filter high performers (score > 85) - like filtering outliers or top performers
high_performers = scores > 85
print(f"\nHigh performers mask: {high_performers}")
print(f"High performer scores: {scores[high_performers]}")
print(f"High performer names: {names[high_performers]}")

# Multiple conditions (combining filters - common in data cleaning)
good_range = (scores >= 80) & (scores <= 90)
print(f"\nScores in 80-90 range: {scores[good_range]}")

print("\nML Applications of Boolean Indexing:")
print("• Removing outliers from datasets")
print("• Filtering samples based on quality metrics")
print("• Creating train/test splits based on conditions")
print("• Selecting features that meet certain criteria")
print("• Pandas uses these same patterns for DataFrame filtering")
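
Outlier removal is one of the applications listed above. The sketch below shows one possible approach using a boolean mask built from the median absolute deviation. The readings and the 5-MAD threshold are assumptions chosen for illustration, not a recommended default.

```python
import numpy as np

# Hypothetical sensor readings; the 55.0 entry is a deliberately planted outlier.
readings = np.array([10.1, 9.8, 10.3, 9.9, 55.0, 10.0, 10.2, 9.7])

# Median-based distances are less distorted by the outlier than mean-based ones.
median = np.median(readings)
abs_dev = np.abs(readings - median)
mad = np.median(abs_dev)              # median absolute deviation

# Assumed rule of thumb: flag points more than 5 MADs away from the median.
is_outlier = abs_dev > 5 * mad
cleaned = readings[~is_outlier]       # ~ inverts the boolean mask

# np.where turns a mask into positions, which is handy for reporting.
outlier_positions = np.where(is_outlier)[0]

print(f"Outlier mask: {is_outlier}")
print(f"Outlier positions: {outlier_positions}")
print(f"Cleaned readings: {cleaned}")
```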

## 3. Array Operations and Broadcasting

Broadcasting is crucial for efficient ML computations, and other array libraries such as PyTorch and TensorFlow follow the same rules. It allows operations between arrays of different shapes without explicit loops, making code both faster and more readable.

**Why Broadcasting Matters in ML:**
- **Neural Networks**: Adding biases to all samples in a batch
- **Data Preprocessing**: Normalizing features across entire datasets
- **Model Operations**: Applying transformations efficiently
- **Memory Efficiency**: Avoiding unnecessary array copies

Understanding broadcasting is essential for debugging shape errors in ML pipelines!

In [None]:
# Element-wise operations (fundamental to neural networks)
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 2, 2], [3, 3, 3]])

print("Array a:")
print(a)
print("\nArray b:")
print(b)

print("\nElement-wise operations:")
print("Addition (a + b):")
print(a + b)

print("\nMultiplication (a * b):")
print(a * b)

print("\nSquare (a**2):")
print(a**2)

# Activation functions
print("\nCommon activation functions:")
x = np.array([-2, -1, 0, 1, 2])
print(f"Input: {x}")
print(f"ReLU (max(0, x)): {np.maximum(0, x)}")
print(f"Sigmoid: {1 / (1 + np.exp(-x))}")
print(f"Tanh: {np.tanh(x)}")

In [None]:
# Broadcasting examples (very important for ML)
print("Broadcasting Examples:")

# Example 1: Adding bias to all samples
features = np.random.randn(100, 5)  # 100 samples, 5 features
bias = np.array([0.1, -0.2, 0.3, -0.1, 0.2])  # bias for each feature

print(f"Features shape: {features.shape}")
print(f"Bias shape: {bias.shape}")

# Broadcasting adds bias to each sample
features_with_bias = features + bias
print(f"Result shape: {features_with_bias.shape}")
print(f"First sample before: {features[0]}")
print(f"First sample after: {features_with_bias[0]}")

print("\n" + "="*50)

# Example 2: Standardizing features (zero mean, unit variance)
data = np.random.randn(1000, 3) * 10 + 5  # Add some offset and scale
print(f"\nOriginal data shape: {data.shape}")
print(f"Original means: {np.mean(data, axis=0)}")
print(f"Original stds: {np.std(data, axis=0)}")

# Standardize each feature column (broadcasting)
mean = np.mean(data, axis=0)  # Shape: (3,)
std = np.std(data, axis=0)    # Shape: (3,)
normalized_data = (data - mean) / std  # Broadcasting!

print(f"\nNormalized means: {np.mean(normalized_data, axis=0)}")
print(f"Normalized stds: {np.std(normalized_data, axis=0)}")
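
In a real pipeline the statistics used for standardization usually come from the training split only, and a constant feature would produce a division by zero. The following is a minimal sketch of both points. The data, the seed, and the `safe_std` name are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: the last column is constant, so its standard deviation is 0.
X_train = np.column_stack([
    rng.normal(5.0, 10.0, size=200),
    rng.normal(-3.0, 0.5, size=200),
    np.full(200, 7.0),
])
X_test = np.column_stack([
    rng.normal(5.0, 10.0, size=50),
    rng.normal(-3.0, 0.5, size=50),
    np.full(50, 7.0),
])

# Statistics come from the training split and are reused for the test split.
mean = X_train.mean(axis=0)                  # shape (3,)
std = X_train.std(axis=0)                    # shape (3,)
safe_std = np.where(std == 0, 1.0, std)      # avoid dividing by zero for constant columns

# (n, 3) op (3,) broadcasts the per-feature statistics across every row.
X_train_scaled = (X_train - mean) / safe_std
X_test_scaled = (X_test - mean) / safe_std

print(f"Train means after scaling: {X_train_scaled.mean(axis=0)}")
print(f"Train stds after scaling:  {X_train_scaled.std(axis=0)}")
print(f"All test values finite:    {np.isfinite(X_test_scaled).all()}")
```

The test split will not have exactly zero mean and unit variance after this transformation, which is expected: it is scaled with the training statistics so that both splits live in the same coordinate system.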

In [None]:
# Broadcasting rules visualization
print("Broadcasting Rules Examples:")

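# Rule: Shapes are compared from the rightmost dimension. Two dimensions are
# compatible if they are equal or if one of them is 1; missing leading dimensions count as 1.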
examples = [
    ((3, 4), (4,)),      # (3,4) + (4,) -> (3,4)
    ((2, 3, 4), (4,)),   # (2,3,4) + (4,) -> (2,3,4)
    ((2, 3, 4), (3, 4)), # (2,3,4) + (3,4) -> (2,3,4)
    ((2, 1, 4), (3, 4)), # (2,1,4) + (3,4) -> (2,3,4)
]

for shape1, shape2 in examples:
    a = np.ones(shape1)
    b = np.ones(shape2)
    result = a + b
    print(f"\n{shape1} + {shape2} -> {result.shape}")
    print("\nArray A:")
    print(a)
    print("\nArray B:")
    print(b)
    print("\nResult:")
    print(result)
    print("-"*50)
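
The alignment rule used in the examples above can be written out as a few lines of plain Python. `broadcast_shape` is a hypothetical helper written for this sketch (it is not a NumPy function); its result is compared against what NumPy actually produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    # Left-pad the shorter shape with 1s so both shapes have the same length.
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)      # equal sizes, or b is stretched to match a
        elif dim_a == 1:
            result.append(dim_b)      # a is stretched to match b
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(result)

pairs = [((3, 4), (4,)), ((2, 3, 4), (4,)), ((2, 3, 4), (3, 4)), ((2, 1, 4), (3, 4))]
for shape_a, shape_b in pairs:
    predicted = broadcast_shape(shape_a, shape_b)
    actual = (np.ones(shape_a) + np.ones(shape_b)).shape
    print(f"{shape_a} + {shape_b}: predicted {predicted}, NumPy gives {actual}")

# A pair that does not broadcast: trailing dimensions 4 and 3 differ and neither is 1.
try:
    broadcast_shape((3, 4), (3,))
    failed = False
except ValueError:
    failed = True
print(f"(3, 4) + (3,) rejected: {failed}")
```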

## 4. Linear Algebra Operations

Essential for understanding neural network computations, matrix multiplications, and transformations.
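
Before relying on the `@` operator, it can help to see the matrix product spelled out. The sketch below implements it with explicit loops purely for illustration and checks the result against NumPy. `matmul_loops` is a hypothetical name, and this is far slower than `@` on real data.

```python
import numpy as np

def matmul_loops(A, B):
    """Matrix product of 2-D arrays with explicit loops: (m, n) @ (n, p) -> (m, p)."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {n} vs {n_b}")
    C = np.zeros((m, p))
    for i in range(m):            # each row of A
        for j in range(p):        # each column of B
            for k in range(n):    # sum over the shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(2)    # seed chosen arbitrarily
A_demo = rng.normal(size=(3, 4))
B_demo = rng.normal(size=(4, 2))

C_loops = matmul_loops(A_demo, B_demo)
C_numpy = A_demo @ B_demo

print(f"Result shape: {C_loops.shape}")
print(f"Loops match @ operator: {np.allclose(C_loops, C_numpy)}")
```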

In [None]:
# Matrix multiplication (core of neural networks)
print("NEURAL NETWORK FORWARD PASS SIMULATION")
print("="*50)

# Simulate a realistic neural network layer
# Frameworks like TensorFlow/PyTorch perform this same computation on their own tensor types
batch_size, input_features, output_features = 32, 10, 5

# Create realistic training data
np.random.seed(42)  # For reproducible results
X = np.random.randn(batch_size, input_features)  # Input batch (32 samples, 10 features each)
W = np.random.randn(input_features, output_features) * 0.1  # Weights (small random values)
b = np.zeros(output_features)  # Bias (often initialized to zero)

print(f"Input batch X shape: {X.shape} (batch_size, input_features)")
print(f"Weight matrix W shape: {W.shape} (input_features, output_features)")
print(f"Bias vector b shape: {b.shape} (output_features,)")

print("\nStep-by-step forward pass:")
print("1. Matrix multiplication: X @ W")
linear_output = X @ W  # Matrix product; equivalent to np.dot(X, W) for 2-D arrays
print(f"   Result shape: {linear_output.shape}")

print("2. Add bias (broadcasting): (X @ W) + b")
Y = linear_output + b  # Broadcasting adds bias to each sample
print(f"   Final output shape: {Y.shape}")

print("\nSample data (first sample):")
print(f"Input features: {X[0][:5]}... (showing first 5 of 10)")
print(f"Raw output: {Y[0]}")

# Apply activation function (ReLU)
Y_activated = np.maximum(0, Y)
print(f"After ReLU activation: {Y_activated[0]}")

print("\nThis is the fundamental operation in every fully connected (dense) layer!")
print("• Input shape must match weight matrix first dimension")
print("• Output shape is (batch_size, output_features)")
print("• Broadcasting handles bias addition automatically")
print("• Activation functions are applied element-wise")

print("\n" + "🎯" + " TRY IT YOURSELF - Neural Network Layer")
print("="*50)
print("Challenge: Create a neural network layer with different dimensions")
print("\nYour task:")
print("1. Create input data: 16 samples, 20 features")
print("2. Create weights: 20 inputs -> 8 outputs")
print("3. Create bias: 8 values")
print("4. Perform forward pass: X @ W + b")
print("5. Apply ReLU activation")
print("\nExpected output shape: (16, 8)")
print("\nHint: Use np.random.randn() for weights, np.zeros() for bias")
print("Bonus: Try different activation functions (sigmoid, tanh)")
print("\n# Uncomment and complete the code below:")
print("# X_exercise = np.random.randn(?, ?)")
print("# W_exercise = np.random.randn(?, ?) * 0.1")
print("# b_exercise = np.zeros(?)")
print("# output = ?")
print("# activated = ?")
print("# print(f'Output shape: {activated.shape}')")

In [None]:
# Different matrix operations
A = np.random.randn(3, 4)
B = np.random.randn(4, 2)

print("Matrix Operations:")
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")

# Matrix multiplication
C = A @ B  # Same as np.dot(A, B)
print(f"A @ B shape: {C.shape}")

# Transpose (very common in ML)
A_T = A.T
print(f"A transpose shape: {A_T.shape}")

# Element-wise vs matrix multiplication
square_matrix = np.random.randn(3, 3)
print(f"\nSquare matrix shape: {square_matrix.shape}")
print(f"Element-wise square: {(square_matrix * square_matrix).shape}")
print(f"Matrix multiplication: {(square_matrix @ square_matrix).shape}")

In [None]:
# Advanced linear algebra (useful for understanding ML algorithms)
print("Advanced Linear Algebra:")

# Create a symmetric matrix (common in optimization)
A = np.random.randn(4, 4)
symmetric_A = A + A.T

print(f"Symmetric matrix shape: {symmetric_A.shape}")

# Eigenvalues and eigenvectors (PCA, optimization)
# eigh is designed for symmetric matrices: it returns real eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(symmetric_A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors shape: {eigenvectors.shape}")

# Matrix norms (regularization)
print("\nMatrix norms:")
print(f"Frobenius norm: {np.linalg.norm(symmetric_A, 'fro'):.4f}")
print(f"Spectral (matrix 2-) norm: {np.linalg.norm(symmetric_A, 2):.4f}")

# Determinant and inverse
det_A = np.linalg.det(symmetric_A)
print(f"\nDeterminant: {det_A:.4f}")

if abs(det_A) > 1e-10:  # Check if invertible
    inv_A = np.linalg.inv(symmetric_A)
    print(f"Inverse exists, shape: {inv_A.shape}")
    # Verify: A @ A^(-1) should be identity
    identity_check = symmetric_A @ inv_A
    print(f"A @ A^(-1) close to identity: {np.allclose(identity_check, np.eye(4))}")
else:
    print("Matrix is singular (not invertible)")
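
The cell above mentions PCA as a use of eigenvalues and eigenvectors. A minimal sketch of that connection follows, assuming synthetic 2-D data that is stretched along one axis. It is an illustration of the idea, not a replacement for a library implementation.

```python
import numpy as np

rng = np.random.default_rng(3)    # seed chosen arbitrarily

# Hypothetical 2-D data with much more spread along the first axis.
X_pca = rng.normal(size=(500, 2)) * np.array([5.0, 0.5])

X_centered = X_pca - X_pca.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)       # (2, 2) symmetric covariance matrix

# eigh targets symmetric matrices: real eigenvalues, returned in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the direction with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]          # columns are the principal directions

# Project the data onto the first principal component.
projected = X_centered @ components[:, :1]   # shape (500, 1)
explained_ratio = eigenvalues / eigenvalues.sum()

print(f"Projected shape: {projected.shape}")
print(f"Explained variance ratio: {explained_ratio}")
```

The variance of the projected data equals the largest eigenvalue, which is why the eigenvalues can be read as "variance explained" by each direction.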

In [None]:
# MULTI-LAYER NEURAL NETWORK FORWARD PASS
print("BUILDING A NEURAL NETWORK FORWARD PASS WITH NUMPY")
print("="*60)
print("Deep learning frameworks perform these same array operations on their own tensors!")
print("="*60)

# Network architecture: 784 -> 128 -> 64 -> 10 (like MNIST digit classification)
np.random.seed(42)

# Input: flattened 28x28 images
batch_size = 32
X = np.random.randn(batch_size, 784)  # 32 images, 784 pixels each

# Layer 1: 784 -> 128
W1 = np.random.randn(784, 128) * 0.01  # Small weights for stable training
b1 = np.zeros(128)

# Layer 2: 128 -> 64  
W2 = np.random.randn(128, 64) * 0.01
b2 = np.zeros(64)

# Output layer: 64 -> 10
W3 = np.random.randn(64, 10) * 0.01
b3 = np.zeros(10)

print(f"Input shape: {X.shape}")
print(f"Layer 1 weights: {W1.shape}, bias: {b1.shape}")
print(f"Layer 2 weights: {W2.shape}, bias: {b2.shape}")
print(f"Output weights: {W3.shape}, bias: {b3.shape}")

print("\nForward pass through the network:")

# Layer 1: Linear + ReLU
z1 = X @ W1 + b1  # Linear transformation
a1 = np.maximum(0, z1)  # ReLU activation
print(f"After layer 1: {a1.shape} (applied ReLU)")

# Layer 2: Linear + ReLU
z2 = a1 @ W2 + b2
a2 = np.maximum(0, z2)
print(f"After layer 2: {a2.shape} (applied ReLU)")

# Output layer: Linear only (no activation yet)
z3 = a2 @ W3 + b3
print(f"Raw output (logits): {z3.shape}")

# Apply softmax for classification probabilities
# Numerically stable softmax
exp_logits = np.exp(z3 - np.max(z3, axis=1, keepdims=True))
probabilities = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)
print(f"Final probabilities: {probabilities.shape}")

# Show results for first sample
print(f"\nFirst sample results:")
print(f"Raw logits: {z3[0]}")
print(f"Probabilities: {probabilities[0]}")
print(f"Predicted class: {np.argmax(probabilities[0])}")
print(f"Confidence: {np.max(probabilities[0]):.3f}")

print("\nKey insights:")
print("• Each layer transforms data: (batch_size, input_dim) -> (batch_size, output_dim)")
print("• Matrix multiplication handles all samples simultaneously (vectorization!)")
print("• Broadcasting adds biases to entire batches automatically")
print("• Activation functions are applied element-wise")
print("• TensorFlow, PyTorch, etc. perform the same operations on their own tensor types")
print("\nYou'll implement this from scratch in the scikit-learn notebook!")

print("\n" + "="*60)
print("🚨 COMMON BEGINNER ERRORS AND SOLUTIONS")
print("="*60)
print("\n1. SHAPE MISMATCH ERROR:")
print("   Error: 'ValueError: operands could not be broadcast together'")
print("   Solution: Check array shapes with .shape before operations")
print("   Example: Use keepdims=True or reshape arrays to match")

print("\n2. MATRIX MULTIPLICATION ERROR:")
print("   Error: 'ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0 ...'")
print("   Solution: Remember matrix multiplication rules: (m,n) @ (n,p) = (m,p)")
print("   Tip: Use @ operator for matrix multiplication, * for element-wise")

print("\n3. INDEXING ERROR:")
print("   Error: 'IndexError: index 5 is out of bounds for axis 0 with size 3'")
print("   Solution: Check array dimensions with .shape first")
print("   Tip: Remember Python uses 0-based indexing")

print("\n4. DTYPE CONFUSION:")
print("   Problem: Integer arrays truncate floats on assignment and wrap on overflow (e.g. uint8 pixels)")
print("   Solution: Be explicit about dtypes, use .astype() when needed")
print("   Tip: float32 vs float64 affects memory and GPU compatibility")

print("\n💡 DEBUGGING TIPS:")
print("• Always print array shapes when debugging: print(f'Shape: {arr.shape}')")
print("• Use small test arrays to understand operations before scaling up")
print("• Read error messages carefully - they usually tell you exactly what's wrong")
print("• When in doubt, check the NumPy documentation with help(np.function_name)")
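
The errors listed above are easier to recognize after triggering them once on purpose. The sketch below provokes each one on a tiny array and captures the message; the array shapes are arbitrary choices for illustration.

```python
import numpy as np

a = np.ones((3, 4))
messages = {}

# 1. Broadcasting mismatch: trailing dimensions 4 and 3 are incompatible.
try:
    a + np.ones(3)
except ValueError as err:
    messages["broadcast"] = str(err)

# 2. matmul mismatch: inner dimensions 4 and 3 differ.
try:
    a @ np.ones((3, 2))
except ValueError as err:
    messages["matmul"] = str(err)

# 3. Index past the end of axis 0 (valid indices are 0, 1, 2).
try:
    a[3]
except IndexError as err:
    messages["index"] = str(err)

# 4. dtype surprise: storing a float in an integer array truncates without an error.
counts = np.array([1, 2, 3])
counts[0] = 2.9                   # stored as 2

for name, text in messages.items():
    print(f"{name}: {text}")
print(f"counts after assigning 2.9: {counts}")
```

Catching specific exception types (`ValueError`, `IndexError`) rather than using a bare `except` keeps unrelated bugs from being hidden.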

## 5. Statistical Operations and Aggregations

Critical for data analysis, loss computation, and model evaluation.

In [None]:
# Statistical operations along different axes
# Simulate prediction scores for classification
# Shape: (batch_size, num_classes)
predictions = np.random.randn(100, 5)  # 100 samples, 5 classes

print(f"Predictions shape: {predictions.shape}")
print("\nStatistical Operations:")

# Overall statistics
print(f"Overall mean: {np.mean(predictions):.4f}")
print(f"Overall std: {np.std(predictions):.4f}")
print(f"Min value: {np.min(predictions):.4f}")
print(f"Max value: {np.max(predictions):.4f}")

# Statistics along axes - CRITICAL for ML!
print("\nPARAMETER DEEP DIVE: axis parameter")
print("Shape:", predictions.shape, "(100 samples, 5 classes)")
print("\n• axis=0: Operate along ROWS (collapse rows, keep columns)")
print("  Result: One value per CLASS (across all samples)")
mean_per_class = np.mean(predictions, axis=0)
print(f"  Mean per class: {mean_per_class}")
print(f"  Shape: {mean_per_class.shape} (5 classes)")

print("\n• axis=1: Operate along COLUMNS (collapse columns, keep rows)")
print("  Result: One value per SAMPLE (across all classes)")
mean_per_sample = np.mean(predictions, axis=1)
print(f"  Mean per sample shape: {mean_per_sample.shape} (100 samples)")
print(f"  First 5 sample means: {mean_per_sample[:5]}")

# PARAMETER DEEP DIVE: keepdims
print("\nPARAMETER DEEP DIVE: keepdims parameter")
# PARAMETER EXPLANATION: keepdims=True/False
# • What it does: Controls whether to keep the original number of dimensions after reduction
# • keepdims=True: Keeps the reduced axis as size 1 (e.g., (100,5) -> (100,1))
# • keepdims=False: Removes the reduced axis completely (e.g., (100,5) -> (100,))
# • Why it matters: keepdims=True preserves shape for broadcasting operations
# • ML importance: Critical for neural networks where shape consistency is essential
# • Common mistake: Forgetting keepdims=True leads to broadcasting errors
max_keepdims_true = np.max(predictions, axis=1, keepdims=True)
max_keepdims_false = np.max(predictions, axis=1, keepdims=False)

print(f"• keepdims=True:  {max_keepdims_true.shape} (keeps original dimensions)")
print(f"• keepdims=False: {max_keepdims_false.shape} (removes collapsed dimension)")
print("\nWhy keepdims=True matters:")
print("• Preserves shape for broadcasting operations")
print("• Essential for neural network computations")
print("• Prevents shape mismatch errors in ML pipelines")

# Show practical difference
print("\nPractical example - subtracting max for numerical stability:")

# keepdims=True gives shape (100, 1), which broadcasts against (100, 5)
stable_predictions = predictions - max_keepdims_true
print(f"✓ With keepdims=True: Broadcasting works! Shape: {stable_predictions.shape}")

try:
    # keepdims=False gives shape (100,), which does not align with (100, 5)
    predictions - max_keepdims_false
except ValueError as err:
    print(f"✗ With keepdims=False: {err}")

# A manual reshape to (100, 1) restores broadcasting
reshaped_result = predictions - max_keepdims_false.reshape(-1, 1)
print(f"✓ keepdims=False plus reshape(-1, 1): Shape: {reshaped_result.shape}")

In [None]:
# Practical ML examples
print("Practical ML Statistical Operations:")

# 1. Softmax implementation
def softmax(x):
    """Numerically stable softmax"""
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

logits = np.random.randn(5, 3)  # 5 samples, 3 classes
probabilities = softmax(logits)

print(f"Logits shape: {logits.shape}")
print(f"Probabilities shape: {probabilities.shape}")
print(f"Probabilities sum per sample: {np.sum(probabilities, axis=1)}")
print(f"First sample probabilities: {probabilities[0]}")

# 2. Accuracy calculation
true_labels = np.array([0, 1, 2, 1, 0])
predicted_labels = np.argmax(probabilities, axis=1)

accuracy = np.mean(true_labels == predicted_labels)
print(f"\nTrue labels: {true_labels}")
print(f"Predicted labels: {predicted_labels}")
print(f"Accuracy: {accuracy:.2f}")
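
Accuracy alone hides which classes are being confused. One way to go a step further with plain NumPy is a confusion matrix, sketched below. The labels are hypothetical and hand-picked (they are not the random predictions from the cell above), and `np.add.at` is just one of several ways to build the counts.

```python
import numpy as np

# Hypothetical labels for a 3-class problem.
y_true_demo = np.array([0, 1, 2, 1, 0, 2, 2, 1])
y_pred_demo = np.array([0, 2, 2, 1, 0, 1, 2, 1])
n_classes = 3

# Rows index the true class, columns index the predicted class.
confusion = np.zeros((n_classes, n_classes), dtype=np.int64)
# np.add.at accumulates correctly even when the same (row, col) pair repeats.
np.add.at(confusion, (y_true_demo, y_pred_demo), 1)

# Correct predictions sit on the diagonal.
accuracy_demo = np.trace(confusion) / confusion.sum()
per_class_recall = np.diag(confusion) / confusion.sum(axis=1)

print("Confusion matrix:")
print(confusion)
print(f"Accuracy: {accuracy_demo:.2f}")
print(f"Recall per class: {per_class_recall}")
```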

In [None]:
# Loss function implementations
print("Common Loss Functions:")

# Mean Squared Error (regression)
y_true = np.array([1.5, 2.3, 3.1, 4.2, 5.0])
y_pred = np.array([1.2, 2.1, 3.4, 4.0, 5.2])

mse = np.mean((y_true - y_pred)**2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_true - y_pred))

print(f"True values: {y_true}")
print(f"Predicted values: {y_pred}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")

# Cross-entropy loss (classification)
def cross_entropy_loss(y_true_labels, y_pred_probs):
    """Cross-entropy loss for classification"""
    # Convert integer class labels to one-hot rows
    n_classes = y_pred_probs.shape[1]
    y_true_onehot = np.eye(n_classes)[y_true_labels]

    # Clip predictions to avoid log(0)
    y_pred_clipped = np.clip(y_pred_probs, 1e-15, 1 - 1e-15)

    # Cross-entropy per sample, averaged over the batch
    per_sample = -np.sum(y_true_onehot * np.log(y_pred_clipped), axis=1)
    return np.mean(per_sample)

ce_loss = cross_entropy_loss(true_labels, probabilities)
print(f"\nCross-entropy loss: {ce_loss:.4f}")
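
The one-hot formulation above is easy to read, but the same loss can be computed directly from logits by picking out the log-probability of the correct class with integer indexing. The sketch below assumes integer class labels and compares the result with a one-hot reference. `cross_entropy_from_logits` is a hypothetical helper name.

```python
import numpy as np

def cross_entropy_from_logits(logits, labels):
    """Mean cross-entropy from raw logits and integer class labels."""
    # Stable log-softmax: subtract the row maximum before exponentiating.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # Pair each row index with its label to select one log-probability per sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return -np.mean(picked)

rng = np.random.default_rng(4)            # seed chosen arbitrarily
demo_logits = rng.normal(size=(6, 4))     # 6 samples, 4 classes
demo_labels = rng.integers(0, 4, size=6)

# Reference: softmax followed by the one-hot formulation.
exp_shifted = np.exp(demo_logits - demo_logits.max(axis=1, keepdims=True))
demo_probs = exp_shifted / exp_shifted.sum(axis=1, keepdims=True)
one_hot = np.eye(4)[demo_labels]
reference_loss = -np.mean(np.sum(one_hot * np.log(demo_probs), axis=1))

indexed_loss = cross_entropy_from_logits(demo_logits, demo_labels)
print(f"Indexed loss:   {indexed_loss:.6f}")
print(f"One-hot loss:   {reference_loss:.6f}")

# Extreme logits stay finite because of the max subtraction.
extreme_loss = cross_entropy_from_logits(np.array([[1000.0, 0.0, -1000.0]]), np.array([0]))
print(f"Extreme logits: {extreme_loss}")
```

Working in log space avoids both the one-hot matrix and the clipping step, at the cost of taking logits rather than probabilities as input.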

## 6. Array Reshaping and Manipulation

Essential for preparing data for neural networks and handling different tensor shapes.

In [None]:
# Reshaping operations (very common in deep learning)
print("Array Reshaping:")

# Original data: flattened images
flattened_images = np.random.randint(0, 256, size=(100, 784))  # 100 images, 28x28 pixels
print(f"Flattened images shape: {flattened_images.shape}")

# Reshape to image format
images = flattened_images.reshape(100, 28, 28)
print(f"Reshaped to images: {images.shape}")

# Add channel dimension (for CNN)
images_with_channel = images.reshape(100, 28, 28, 1)
print(f"With channel dimension: {images_with_channel.shape}")

# Or using -1 for automatic calculation
auto_reshape = flattened_images.reshape(100, 28, 28, -1)
print(f"Auto reshape (-1): {auto_reshape.shape}")

# Flatten back
flattened_again = images_with_channel.reshape(100, -1)
print(f"Flattened again: {flattened_again.shape}")

In [None]:
# Axis manipulation
print("Axis Manipulation:")

# Sample data: batch of sequences
sequences = np.random.randn(32, 50, 128)  # batch_size, seq_len, features
print(f"Original shape: {sequences.shape}")

# Transpose (swap axes)
transposed = sequences.transpose(1, 0, 2)  # seq_len, batch_size, features
print(f"Transposed: {transposed.shape}")

# Add new axis
with_new_axis = sequences[:, :, :, np.newaxis]
print(f"With new axis: {with_new_axis.shape}")

# Squeeze (remove dimensions of size 1)
squeezed = np.squeeze(with_new_axis)
print(f"Squeezed: {squeezed.shape}")

# Expand dimensions
expanded = np.expand_dims(sequences, axis=0)
print(f"Expanded (axis=0): {expanded.shape}")

expanded_last = np.expand_dims(sequences, axis=-1)
print(f"Expanded (axis=-1): {expanded_last.shape}")
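
A frequent source of silent bugs is using `reshape` where `transpose` is needed, for example when converting images from channels-last to channels-first layout. Both produce the requested shape, but only one keeps each value attached to the right pixel. The tiny array below is a made-up example to make the difference visible.

```python
import numpy as np

# Tiny hypothetical batch in (batch, height, width, channels) layout.
nhwc = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Correct: transpose moves the channel axis and keeps values with their pixels.
nchw = nhwc.transpose(0, 3, 1, 2)         # shape (2, 3, 2, 2)

# Incorrect: reshape gives the same shape but simply re-reads memory in order.
wrong = nhwc.reshape(2, 3, 2, 2)

same_shape = nchw.shape == wrong.shape
same_values = np.array_equal(nchw, wrong)

# Channel 0 of image 0 should contain the first channel value of every pixel.
expected_channel0 = nhwc[0, :, :, 0]

print(f"Same shape: {same_shape}, same values: {same_values}")
print("Expected channel 0 of image 0:")
print(expected_channel0)
print("transpose result:")
print(nchw[0, 0])
print("reshape result:")
print(wrong[0, 0])
```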

In [None]:
# Concatenation and stacking (combining data)
print("Concatenation and Stacking:")

# Create sample batches
batch1 = np.random.randn(16, 10)  # 16 samples, 10 features
batch2 = np.random.randn(16, 10)  # 16 samples, 10 features
batch3 = np.random.randn(16, 10)  # 16 samples, 10 features

print(f"Batch 1 shape: {batch1.shape}")
print(f"Batch 2 shape: {batch2.shape}")
print(f"Batch 3 shape: {batch3.shape}")

# Concatenate along batch dimension
combined_batches = np.concatenate([batch1, batch2, batch3], axis=0)
print(f"Combined batches: {combined_batches.shape}")

# Stack (creates new dimension)
stacked_batches = np.stack([batch1, batch2, batch3], axis=0)
print(f"Stacked batches: {stacked_batches.shape}")

# Horizontal stack (features)
features1 = np.random.randn(100, 5)
features2 = np.random.randn(100, 3)
combined_features = np.hstack([features1, features2])
print(f"\nFeatures 1: {features1.shape}")
print(f"Features 2: {features2.shape}")
print(f"Combined features: {combined_features.shape}")
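
The reverse of concatenation, cutting a dataset into mini-batches, follows the same axis logic. A minimal sketch with `np.split` is shown below, assuming a batch size of 16 and a dataset whose length is not a multiple of it.

```python
import numpy as np

rng = np.random.default_rng(5)            # seed chosen arbitrarily
X_all = rng.normal(size=(50, 10))         # hypothetical dataset: 50 samples, 10 features
batch_size = 16

# Split points every batch_size rows; the final batch holds the remainder.
split_points = np.arange(batch_size, len(X_all), batch_size)
batches = np.split(X_all, split_points, axis=0)
batch_shapes = [batch.shape for batch in batches]

# Concatenating the batches along axis 0 restores the original array.
rebuilt = np.concatenate(batches, axis=0)

print(f"Split points: {split_points}")
print(f"Batch shapes: {batch_shapes}")
print(f"Rebuilt equals original: {np.array_equal(rebuilt, X_all)}")
```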

## 7. Performance and Memory Considerations

Understanding NumPy performance is crucial for efficient ML workflows. In ML, you often work with:
- **Large datasets**: Millions of samples with hundreds of features
- **High-dimensional arrays**: Images, sequences, embeddings
- **Repeated operations**: Training loops, batch processing
- **Memory constraints**: GPU memory, RAM limitations

**Performance Rules for ML:**
1. **Vectorize where possible**: Python-level loops over array elements are usually far slower than the equivalent NumPy operation
2. **Use appropriate dtypes**: float32 vs float64 can halve memory usage
3. **Understand views vs copies**: Avoid unnecessary memory allocation
4. **Batch operations**: Process multiple samples simultaneously

In [None]:
import time

# ML Performance Example: Feature Scaling Comparison
print("ML PERFORMANCE EXAMPLE: Feature Scaling")
print("="*50)
print("Comparing Python loops vs NumPy vectorization for ML preprocessing")

# Simulate a realistic ML dataset
n_samples, n_features = 100000, 50  # 100k samples, 50 features
X = np.random.randn(n_samples, n_features) * 100 + 50  # Random dataset

print(f"Dataset shape: {X.shape}")
print(f"Memory usage: {X.nbytes / 1024 / 1024:.1f} MB")

# Method 1: Python loops (avoid this for array math)
print("\nMethod 1: Python loops (testing on subset for speed)")
n_loop = 1000  # Only loop over 1000 samples to keep the runtime short
col_mean = np.mean(X, axis=0)  # Column statistics computed once, outside the loop
col_std = np.std(X, axis=0)
start_time = time.perf_counter()
X_scaled_loop = np.zeros((n_loop, n_features))
for i in range(n_loop):
    for j in range(n_features):
        X_scaled_loop[i, j] = (X[i, j] - col_mean[j]) / col_std[j]
loop_time = time.perf_counter() - start_time

# Method 2: NumPy vectorization (the right way!)
print("Method 2: NumPy vectorization (entire dataset)")
start_time = time.perf_counter()
X_scaled_vectorized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
vectorized_time = time.perf_counter() - start_time

print(f"\nResults:")
print(f"Loop time (1k samples): {loop_time:.4f} seconds")
print(f"Vectorized time (100k samples): {vectorized_time:.4f} seconds")
# Extrapolate the loop time to the full dataset before comparing
estimated_speedup = (loop_time * n_samples / n_loop) / vectorized_time
print(f"Estimated speedup: {estimated_speedup:.1f}x faster with vectorization!")

# Data type impact on memory
print("\nDATA TYPE IMPACT ON MEMORY:")
X_float64 = X.astype(np.float64)
X_float32 = X.astype(np.float32)
X_float16 = X.astype(np.float16)

print(f"float64 memory: {X_float64.nbytes / 1024 / 1024:.1f} MB")
print(f"float32 memory: {X_float32.nbytes / 1024 / 1024:.1f} MB (50% less!)")
print(f"float16 memory: {X_float16.nbytes / 1024 / 1024:.1f} MB (75% less!)")
print("\nFor most ML tasks, float32 is sufficient and saves memory!")

In [None]:
# Memory layout and views vs copies
print("Memory Layout and Views:")

# Original array
original = np.random.randn(1000, 1000)
print(f"Original array memory: {original.nbytes / 1024 / 1024:.2f} MB")

# View (shares memory)
view = original[::2, ::2]  # Every other row and every other column
print(f"View shares memory: {view.base is original}")
print(f"View shape: {view.shape}")

# Copy (new memory)
copy = original.copy()
print(f"Copy shares memory: {copy.base is original}")
print(f"Copy memory: {copy.nbytes / 1024 / 1024:.2f} MB")

# Demonstrate view behavior
original[0, 0] = 999
print("\nAfter modifying original[0,0] = 999:")
print(f"View[0,0] = {view[0, 0]} (should be 999 if it's a view)")
print(f"Copy[0,0] = {copy[0, 0]} (should be original value)")
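
Which indexing operations return views and which return copies is worth checking directly. The sketch below uses `np.shares_memory` on a small array; the array contents are arbitrary.

```python
import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)

sliced = data[:, 1:3]            # basic slicing -> view
transposed = data.T              # transpose -> view
picked = data[:, [1, 2]]         # integer-array indexing -> copy
masked = data[data > 5]          # boolean indexing -> copy

results = {
    "slice": np.shares_memory(data, sliced),
    "transpose": np.shares_memory(data, transposed),
    "integer-array index": np.shares_memory(data, picked),
    "boolean mask": np.shares_memory(data, masked),
}
for name, shares in results.items():
    print(f"{name:20s} shares memory with data: {shares}")

# Writing through a view changes the original; the copy is unaffected.
sliced[0, 0] = -1.0
print(f"data[0, 1] after writing to the slice: {data[0, 1]}")
print(f"picked[0, 0] (copy made earlier):      {picked[0, 0]}")
```

This matters in preprocessing code: modifying a sliced batch in place also modifies the dataset it came from, whereas a batch selected with an index array or a mask is independent.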

## 8. Working with other ML libraries (brief, framework-neutral)

NumPy arrays are the common numerical representation used across many machine learning tools. The key things to check when preparing NumPy arrays for use with other tools are:

- dtype: many tools prefer float32 for inputs and integer labels for classification targets
- shape: confirm batch and feature dimensions (e.g., (batch_size, num_features))
- no NaNs or infinite values

Example checks and simple conversion:

In [None]:
print("Using NumPy arrays with other tools (framework-neutral):")

# Create sample data in NumPy
numpy_data = np.random.randn(32, 10)
numpy_labels = np.random.randint(0, 3, size=(32,))

print(f"NumPy data shape: {numpy_data.shape}")
print(f"NumPy data dtype before: {numpy_data.dtype}")

# Convert to a common dtype for ML (float32 inputs, int64 labels)
numpy_data = numpy_data.astype(np.float32)
numpy_labels = numpy_labels.astype(np.int64)

print(f"NumPy data dtype after: {numpy_data.dtype}")
print(f"NumPy labels dtype after: {numpy_labels.dtype}")

print("\nChecklist before handing arrays to another tool:")
print(" - Are shapes as expected? (batch, features)")
print(" - Are dtypes appropriate? (e.g. float32 for inputs)")
print(" - Are there NaNs or infinite values?")
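
The printed checklist can also be turned into code. The following is a minimal sketch of a validation helper; `validate_features` and its error messages are hypothetical and would need adapting to whatever the receiving tool expects.

```python
import numpy as np

def validate_features(X, n_features):
    """Return a float32 copy of X after basic sanity checks (hypothetical helper)."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array (batch, features), got ndim={X.ndim}")
    if X.shape[1] != n_features:
        raise ValueError(f"expected {n_features} features, got {X.shape[1]}")
    if not np.issubdtype(X.dtype, np.number):
        raise TypeError(f"expected a numeric dtype, got {X.dtype}")
    if not np.all(np.isfinite(X)):
        raise ValueError("array contains NaN or infinite values")
    return X.astype(np.float32)

# A clean input passes and comes back as float32.
clean = validate_features([[1, 2, 3], [4, 5, 6]], n_features=3)
print(f"Clean input: shape {clean.shape}, dtype {clean.dtype}")

# Inputs that violate the checklist are rejected with a descriptive error.
bad_inputs = {
    "contains NaN": np.array([[1.0, np.nan, 3.0]]),
    "wrong feature count": np.zeros((4, 2)),
    "not 2-D": np.zeros(3),
}
rejected = {}
for name, bad in bad_inputs.items():
    try:
        validate_features(bad, n_features=3)
        rejected[name] = False
    except (ValueError, TypeError) as err:
        rejected[name] = True
        print(f"{name}: {err}")
```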

## Summary and Key Takeaways

**What we've learned:**

1. **Array Creation & Properties**: Understanding shapes, dtypes, and memory usage
2. **Indexing & Slicing**: Essential for data manipulation and batch processing
3. **Broadcasting**: Enables efficient operations without explicit loops
4. **Linear Algebra**: Matrix operations that form the core of many ML models
5. **Statistical Operations**: Computing metrics, losses, and aggregations
6. **Reshaping**: Preparing data for different network architectures
7. **Performance**: Vectorization and memory considerations
8. **Tool Bridge**: How NumPy arrays are the common numerical format used by many ML tools

**Key Patterns for ML:**
- Use vectorized operations instead of loops
- Understand broadcasting for efficient computations
- Master axis-based operations for batch processing
- Know when operations create views vs copies
- Prepare data in NumPy before converting to other tools

**Next Steps:**
- Learn Pandas for structured data manipulation
- Understand how these concepts translate to other numerical libraries
- See how higher-level ML libraries mirror NumPy patterns

The next two notebooks in this series build directly on these fundamentals:

1. **Pandas Notebook**: Learn how to work with structured data using DataFrames. You'll see how Pandas builds on NumPy arrays to handle real-world datasets.

2. **Scikit-learn Notebook**: Apply these NumPy concepts to build actual ML models. You'll see how array operations and linear algebra power ML algorithms.

**Key Connections:** Pandas DataFrames are typically backed by NumPy arrays, and scikit-learn accepts NumPy arrays as input. The patterns you learned here appear everywhere in ML!

NumPy is the foundation that makes the entire Python ML ecosystem possible.
