# 📓 NumPy Array Operations: Broadcasting, Aggregation & Advanced Indexing  

In this notebook, we explore core NumPy array operations that are essential for data science and machine learning:
- **Broadcasting** → handling operations on different shapes
- **Aggregation** → summarizing data (sum, mean, std, etc.)
- **Advanced Indexing** → powerful ways to extract subsets
- **ML Context Example** → feature normalization


## 🏷️ Learning Goals  
By the end of this notebook, you will:  
- ✅ Understand broadcasting rules  
- ✅ Perform aggregation functions (sum, mean, std, etc.)  
- ✅ Use advanced indexing (boolean masking, fancy indexing)  
- ✅ Apply these operations in ML feature preprocessing  


## 🔹 1. Broadcasting  
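Broadcasting allows NumPy to perform arithmetic on arrays of **different but compatible shapes**. The smaller array is treated as if it were repeated along its missing or size-1 dimensions, without NumPy actually copying its data in memory.  
ML code relies on it heavily, for example to normalize features, add bias terms, or scale inputs by a weight vector.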
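The broadcasting rules can be checked directly. This is a minimal sketch that assumes NumPy 1.20 or later, where `np.broadcast_shapes` was added. NumPy compares shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or one of them is 1, and a missing leading dimension is treated as 1. If any pair is incompatible, the operation raises a `ValueError`.

```python
import numpy as np

# Compare shapes from the trailing (rightmost) dimension backwards:
# dimensions match if they are equal or one of them is 1,
# and a missing leading dimension counts as 1.
row_case = np.broadcast_shapes((2, 3), (3,))   # (3,) acts like (1, 3)
col_case = np.broadcast_shapes((3, 1), (3,))   # both size-1 axes stretch
print(row_case)   # (2, 3)
print(col_case)   # (3, 3)

# Incompatible: trailing sizes 3 and 2 differ and neither is 1
try:
    np.ones((2, 3)) + np.ones((2,))
    compatible = True
except ValueError as err:
    compatible = False
    print("Cannot broadcast:", err)
```

The two compatible shape pairs match Examples 2 and 3 in the next cell.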


In [2]:
import numpy as np

# Example 1: scalar + array (the scalar is stretched to shape (3,))
a = np.array([1, 2, 3])
b = 5
print("a + b:", a + b)

# Example 2: 2D + 1D, shapes (2, 3) + (3,) -> (2, 3)
# B is added to every row of A
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([10, 20, 30])
print("A + B:\n", A + B)

# Example 3: column + row, shapes (3, 1) + (3,) -> (3, 3)
# C is stretched across columns, D across rows
C = np.array([[1],
              [2],
              [3]])
D = np.array([10, 20, 30])
print("C + D:\n", C + D)

a + b: [6 7 8]
A + B:
 [[11 22 33]
 [14 25 36]]
C + D:
 [[11 21 31]
 [12 22 32]
 [13 23 33]]
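The sketch below makes the "no copies" claim concrete. It redefines `C` and `D` from Example 3 so it runs on its own. It then compares broadcasting with explicit tiling via `np.tile`, which does allocate the repeated data. Finally it inspects `np.broadcast_to`, which returns a read-only view of the stretched array.

```python
import numpy as np

C = np.array([[1], [2], [3]])   # shape (3, 1)
D = np.array([10, 20, 30])      # shape (3,)

# Conceptually, broadcasting repeats C across columns and D across rows
C_tiled = np.tile(C, (1, 3))    # shape (3, 3), a real copy in memory
D_tiled = np.tile(D, (3, 1))    # shape (3, 3), a real copy in memory
same = np.array_equal(C + D, C_tiled + D_tiled)
print("Same result as explicit tiling:", same)

# broadcast_to exposes the stretched array as a read-only view;
# a stride of 0 along the first axis means every row reuses the same memory
D_view = np.broadcast_to(D, (3, 3))
print("Strides:", D_view.strides)              # first stride is 0
print("Shares memory with D:", np.shares_memory(D_view, D))
```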


## 🔹 2. Aggregation Functions  
Aggregations help summarize data across rows, columns, or the entire array.  
Examples include sum, mean, min, max, and standard deviation.  
These are critical in ML for loss calculation, feature scaling, and pooling layers.
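Here is a small illustration of two of those ML uses, built only from aggregations: a mean-squared-error loss and a 2×2 max-pooling step. The arrays are made-up toy values. The reshape trick assumes a single 2D feature map whose height and width are divisible by the pool size.

```python
import numpy as np

# Mean squared error: square the element-wise errors, then aggregate with mean()
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mse = ((y_true - y_pred) ** 2).mean()
print("MSE:", mse)  # 0.375

# 2x2 max pooling on a 4x4 feature map: split each axis into (blocks, 2),
# then take the max over the two within-block axes
fmap = np.arange(16).reshape(4, 4)
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print("Pooled:\n", pooled)  # values [[5, 7], [13, 15]]
```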


In [3]:
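# 5x4 matrix of random integers in [1, 100); values change on every run,
# so the output below is from one sample run
X = np.random.randint(1, 100, size=(5, 4))
print("X:\n", X)

print("Sum of all elements:", X.sum())
print("Column-wise sum:", X.sum(axis=0))   # axis=0 collapses rows -> one value per column
print("Row-wise mean:", X.mean(axis=1))    # axis=1 collapses columns -> one value per row
print("Standard deviation:", X.std())      # computed over all 20 elements

X:
 [[43 90 90 50]
 [74 80  2 31]
 [ 1  5 77 37]
 [84 35  9 58]
 [45 97 86 48]]
Sum of all elements: 1042
Column-wise sum: [247 307 264 224]
Row-wise mean: [68.25 46.75 30.   46.5  69.  ]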
Standard deviation: 31.0530191768852
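The `axis` argument is the part of aggregation that most often causes confusion. This sketch hard-codes the first three rows of the sample `X` printed above so the results are reproducible. It shows `min`, `max`, and `argmax` along an axis. It also shows `keepdims=True`, which keeps the reduced axis as size 1 so the result broadcasts back against the original array.

```python
import numpy as np

# First three rows of the sample X printed above, hard-coded for reproducibility
X_small = np.array([[43, 90, 90, 50],
                    [74, 80,  2, 31],
                    [ 1,  5, 77, 37]])

col_max = X_small.max(axis=0)        # axis=0 collapses rows -> one value per column
row_min = X_small.min(axis=1)        # axis=1 collapses columns -> one value per row
row_argmax = X_small.argmax(axis=1)  # column index of each row's largest value
print("Column max:", col_max)        # values 74, 90, 90, 50
print("Row min:", row_min)           # values 43, 2, 1
print("Row argmax:", row_argmax)     # indices 1, 1, 2 (first occurrence wins on ties)

# keepdims=True leaves the reduced axis as size 1: shape (3, 1) instead of (3,),
# so the result broadcasts back against X_small row by row
row_means = X_small.mean(axis=1, keepdims=True)
centered = X_small - row_means
print("Row means after centering:", centered.mean(axis=1))  # all (close to) 0
```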


## 🔹 3. Advanced Indexing  
NumPy provides advanced ways of selecting subsets of data.  
This includes:  
- **Boolean masking** → filtering based on conditions  
- **Fancy indexing** → selecting by lists/arrays of indices  
- **Combined conditions** → filtering with multiple rules  


In [None]:
# Boolean masking

arr = np.arange(10)
mask = arr % 2 == 0
print("Original:", arr)
print("Even numbers:", arr[mask])

Original: [0 1 2 3 4 5 6 7 8 9]
Even numbers: [0 2 4 6 8]
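Boolean masks can also be used for writing, not only for reading. This sketch uses toy values to show two related patterns. Assigning through a mask modifies the array in place. `np.where` builds a new array by choosing between two values element-wise, used here as a ReLU-style clip. The sketch also shows that summing a mask counts its `True` entries.

```python
import numpy as np

arr = np.arange(10)

# Assigning through a mask changes the selected elements in place
odd_marked = arr.copy()               # work on a copy so arr stays intact
odd_marked[odd_marked % 2 == 1] = -1  # every odd value becomes -1
print(odd_marked)

# np.where(condition, x, y) returns a new array: x where True, y elsewhere
z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu = np.where(z > 0, z, 0.0)        # negative and zero entries -> 0.0
print(relu)

# True counts as 1, so summing a mask counts the matching elements
even_count = (arr % 2 == 0).sum()
print("Even count:", even_count)      # 5
```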


In [None]:
# Fancy indexing

arr = np.array([10, 20, 30, 40, 50])
indices = [0, 2, 4]

print("Selected elements:", arr[indices])

Selected elements: [10 30 50]
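Fancy indexing also works in 2D. With one index list it selects whole rows. With two index arrays, NumPy pairs them element-wise as (row, column) coordinates. The probabilities and labels below are hypothetical. They mimic picking each sample's true-class probability from a classifier's output, which is the quantity a cross-entropy loss averages over (as a negative log).

```python
import numpy as np

# Hypothetical predicted class probabilities: 3 samples x 4 classes
probs = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.5, 0.1]])
labels = np.array([1, 0, 3])   # hypothetical true class index per sample

# A single index list selects whole rows (here rows 2 and 0, in that order)
print(probs[[2, 0]])

# Two index arrays are paired element-wise: (0, 1), (1, 0), (2, 3)
picked = probs[np.arange(len(labels)), labels]
print("Probability of the true class:", picked)   # values 0.6, 0.7, 0.1

# Mean negative log-likelihood of the true class
nll = -np.log(picked).mean()
print("Mean NLL:", nll)
```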


In [7]:
# Combining Conditions

data = np.random.randn(10)   # 10 draws from a standard normal distribution; changes every run
print("Data:", data)
print("Values > 0:", data[data > 0])
print("Values between -1 and 1:", data[(data > -1) & (data < 1)])

Data: [ 1.97577635 -0.56067754 -1.83743454 -0.45436996 -1.28178779 -0.52369837
  0.88741952 -0.86428356 -0.66198782 -0.17450041]
Values > 0: [1.97577635 0.88741952]
Values between -1 and 1: [-0.56067754 -0.45436996 -0.52369837  0.88741952 -0.86428356 -0.66198782
 -0.17450041]
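Masks can be combined with more than `&`. The operators `|` (or) and `~` (not) also work element-wise. Wrap each comparison in parentheses, because these operators bind more tightly than `<` and `>`. Python's `and`/`or` cannot be used, because they ask for a single truth value. Here is a sketch with fixed toy data:

```python
import numpy as np

data = np.array([1.5, -0.5, -2.0, 0.3, 2.5, -1.2])

# & (and), | (or), ~ (not) combine boolean arrays element-wise;
# each comparison needs its own parentheses
inside = (data > -1) & (data < 1)
outside = (data < -1) | (data > 1)
print("Strictly between -1 and 1:", data[inside])
print("Outside [-1, 1]:", data[outside])
print("Not positive:", data[~(data > 0)])

# The function forms give the same mask and avoid the precedence pitfall
same = np.array_equal(inside, np.logical_and(data > -1, data < 1))
print("logical_and matches &:", same)

# Python's `and` tries to reduce the whole array to one bool and fails
try:
    (data > -1) and (data < 1)
    and_failed = False
except ValueError as err:
    and_failed = True
    print("ValueError:", err)
```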


## 🎯 4. Practical Example — Feature Normalization  
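Many machine learning algorithms, such as gradient-descent-based models and distance-based methods like k-nearest neighbors, work best when features share a similar scale. In the dataset below, the first feature is in the tens while the third is in the thousands.  
Here we standardize each feature by subtracting its mean and dividing by its standard deviation (z-score normalization), which gives every column a mean of 0 and a standard deviation of 1.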
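Written out per feature column $j$ over $n$ samples, the z-score uses the population standard deviation (dividing by $n$). This matches NumPy's default `ddof=0` for `std`:

$$
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
$$

Take the first feature of the dataset in the next cell, (50, 60, 55, 65, 70). Its mean is $\mu = 60$ and its standard deviation is $\sigma = \sqrt{(100 + 0 + 25 + 25 + 100)/5} = \sqrt{50} \approx 7.071$. The sample with value 50 therefore gets $z = -10 / 7.071 \approx -1.414$.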

In [8]:
# Dataset with 5 samples (rows) and 3 features (columns)

features = np.array([[50, 200, 3000],
                     [60, 210, 3200],
                     [55, 190, 3100],
                     [65, 205, 3300],
                     [70, 215, 3400]])

print("Original features: \n", features)

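# Z-score normalization (standardization), computed per feature column

mean = features.mean(axis=0)          # per-column mean, shape (3,)
std = features.std(axis=0)            # per-column population std (ddof=0), shape (3,)
normalized = (features - mean) / std  # (5, 3) broadcast against (3,)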

print("\nMean:\n", mean)
print("\nStandard deviation:\n", std)
print("\nNormalized features: \n", normalized)

Original features: 
 [[  50  200 3000]
 [  60  210 3200]
 [  55  190 3100]
 [  65  205 3300]
 [  70  215 3400]]

Mean:
 [  60.  204. 3200.]

Standard deviation:
 [  7.07106781   8.60232527 141.42135624]

Normalized features: 
 [[-1.41421356 -0.46499055 -1.41421356]
 [ 0.          0.69748583  0.        ]
 [-0.70710678 -1.62746694 -0.70710678]
 [ 0.70710678  0.11624764  0.70710678]
 [ 1.41421356  1.27872403  1.41421356]]
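In practice, the mean and standard deviation are computed on the training data only and then reused for new data. A constant feature (standard deviation 0) also needs a guard against division by zero. The sketch below is similar in spirit to scikit-learn's `StandardScaler`. The names `standardize_fit` and `standardize_apply` are hypothetical helpers, and the train/test split of the dataset above is made up for illustration.

```python
import numpy as np

def standardize_fit(X):
    """Hypothetical helper: per-feature mean and std of X (rows = samples)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Guard against constant features: replace a std of 0 with 1
    sigma = np.where(sigma == 0, 1.0, sigma)
    return mu, sigma

def standardize_apply(X, mu, sigma):
    """Hypothetical helper: z-score X with precomputed stats (broadcasts over rows)."""
    return (X - mu) / sigma

# Hypothetical train/test split of the features above
train = np.array([[50, 200, 3000],
                  [60, 210, 3200],
                  [55, 190, 3100],
                  [65, 205, 3300]], dtype=float)
test = np.array([[70, 215, 3400]], dtype=float)

mu, sigma = standardize_fit(train)
train_z = standardize_apply(train, mu, sigma)
test_z = standardize_apply(test, mu, sigma)   # scaled with *training* statistics

print("Train column means:", train_z.mean(axis=0))  # close to 0
print("Train column stds:", train_z.std(axis=0))    # close to 1
print("Test row:", test_z)

# A constant column becomes 0 instead of NaN
const = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
const_z = standardize_apply(const, *standardize_fit(const))
print(const_z)
```

Only the training data is guaranteed to end up with mean 0 and standard deviation 1. Data scaled with the training statistics generally is not.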


## ✅ Summary  

In this notebook, we covered:  
- 🔹 Broadcasting → operations on arrays of different but compatible shapes  
- 🔹 Aggregations → statistics like sum, mean, std  
- 🔹 Advanced indexing → boolean masking & fancy indexing  
- 🔹 ML Example → feature normalization  