# Daily Blog #56 - Feature Scaling & Normalization
### June 25, 2025

### Why does scaling matter?

Imagine you have a dataset with two features:

* Age in years (0–100)
* Annual income (\$0–1,000,000)

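If you use these raw features in a distance-based algorithm (e.g. k-NN), **income will completely dominate** the distance calculation, because a typical difference in income is thousands of times larger than a typical difference in age.
The model becomes effectively blind to Age, simply because the other feature is on a vastly larger scale.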
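
To make this concrete, here is a minimal sketch with three made-up people (the names and numbers are purely illustrative) and a plain Euclidean distance, which is what k-NN commonly uses:

```python
import numpy as np

# Hypothetical people, each as (age in years, annual income in dollars)
alice = np.array([25.0, 50_000.0])
bob = np.array([60.0, 51_000.0])    # 35 years older, nearly the same income
carol = np.array([26.0, 60_000.0])  # 1 year older, $10,000 more income


def euclidean(p, q):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


d_alice_bob = euclidean(alice, bob)
d_alice_carol = euclidean(alice, carol)

print(f"Alice-Bob:   {d_alice_bob:.1f}")
print(f"Alice-Carol: {d_alice_carol:.1f}")
```

On the raw features, Bob comes out as Alice's nearest neighbour even though he is 35 years older, because the \$1,000 income gap outweighs the age gap in the sum of squares.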


### Scaling Techniques

Here are the most common methods:

#### **1. Min-Max Scaling (Normalization)**

Rescales each feature independently to the range \[0, 1], using that feature's own minimum and maximum:

$$
x_{scaled} = \frac{x - \min(x)}{\max(x)-\min(x)}
$$
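
As a quick sanity check, the formula can be applied by hand with NumPy. This is only a sketch on the three ages used in the Python example further down, and it assumes the feature is not constant (otherwise the denominator is zero):

```python
import numpy as np

# Ages from the dummy data used later in this post
ages = np.array([25.0, 40.0, 60.0])

# Min-max scaling: subtract the minimum, then divide by the range
ages_minmax = (ages - ages.min()) / (ages.max() - ages.min())

print(ages_minmax)
```

The smallest age maps to 0, the largest to 1, and 40 lands at 15/35 of the way between them.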

#### **2. Standardization (Z-score normalization)**

Centers each feature at mean 0 and rescales it to unit variance, where μ is the feature's mean and σ is its standard deviation:

$$
z = \frac{x - \mu}{\sigma}
$$
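
The same kind of by-hand check works here. The sketch below uses the population standard deviation (NumPy's default, `ddof=0`), which to my knowledge is also what scikit-learn's `StandardScaler` uses:

```python
import numpy as np

# Ages from the dummy data used later in this post
ages = np.array([25.0, 40.0, 60.0])

mu = ages.mean()
sigma = ages.std()  # population standard deviation (ddof=0)

# Z-score: subtract the mean, then divide by the standard deviation
ages_z = (ages - mu) / sigma

print(ages_z)
print("mean:", ages_z.mean(), "std:", ages_z.std())
```

After the transform the values have mean 0 and standard deviation 1 (up to floating-point error), but unlike min-max scaling they are not confined to a fixed range.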

#### **3. RobustScaler (using median & IQR)**

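Centers each feature on its median and divides by its interquartile range (IQR, the gap between the 75th and 25th percentiles). Because neither statistic moves much when a few values are extreme, this is less affected by outliers and is often used on messy real-world data:

$$
x_{scaled} = \frac{x - \mathrm{median}(x)}{Q_3(x) - Q_1(x)}
$$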
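
A minimal by-hand sketch of median/IQR scaling, using made-up incomes with one extreme value. The 25th and 75th percentiles are assumed as the quartiles, which I believe matches the default of scikit-learn's `RobustScaler`:

```python
import numpy as np

# Hypothetical incomes; the last one is an extreme outlier
incomes = np.array([40_000.0, 50_000.0, 60_000.0, 70_000.0, 1_000_000.0])

median = np.median(incomes)
q1, q3 = np.percentile(incomes, [25, 75])
iqr = q3 - q1

# Robust scaling: subtract the median, then divide by the IQR
incomes_robust = (incomes - median) / iqr

print("median:", median, "IQR:", iqr)
print(incomes_robust)
```

The four typical incomes end up evenly spaced between -1 and 0.5, while the outlier is still clearly visible as a large value instead of squashing everything else.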

### Example in Python

```python
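import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Dummy data: one row per person, columns are [age, annual income]
X = np.array([[25, 50000],
              [40, 80000],
              [60, 120000]])

# Min-Max scaling: each column is mapped to [0, 1]
minmax = MinMaxScaler()
X_minmax = minmax.fit_transform(X)

# Z-score normalization: each column gets mean 0 and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Robust scaling: each column is centered on its median and divided by its IQR
robust = RobustScaler()
X_robust = robust.fit_transform(X)

print("Min-Max scaled:\n", X_minmax)
print("Standardized:\n", X_scaled)
print("Robust scaled:\n", X_robust)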
```

### When to choose what?
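- **Min-Max Scaling**: use when you need features on a fixed range such as \[0, 1] or \[-1, 1], which is common for neural network inputs. Because it depends on the minimum and maximum, a single outlier can compress every other value into a narrow band.
- **Standardization**: a sensible default when data is approximately normally distributed. Outliers are kept as extreme z-scores rather than clipped, but they still shift the mean and inflate the standard deviation.
- **RobustScaler**: use when the data has extreme outliers, since the median and IQR barely move when a few values are very large or very small.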
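
To see how differently the three scalers behave around an outlier, here is a small sketch on a made-up income column where the last value is extreme. The data and the "spread of typical rows" measure are illustrative choices, not a standard benchmark:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical income column; the last row is an extreme outlier
incomes = np.array([[30_000.0], [45_000.0], [50_000.0], [60_000.0], [1_000_000.0]])

scaled = {
    "min-max": MinMaxScaler().fit_transform(incomes),
    "standard": StandardScaler().fit_transform(incomes),
    "robust": RobustScaler().fit_transform(incomes),
}

# How far apart are the four typical rows after scaling?
spread = {}
for name, values in scaled.items():
    typical = values[:4, 0]
    spread[name] = float(typical.max() - typical.min())
    print(f"{name:>8}: spread of typical rows = {spread[name]:.3f}")
```

With this data, min-max scaling and standardization leave the four typical incomes packed into a very narrow band, while robust scaling keeps them well separated.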
