# Feature Scaling: Normalization and Standardization

**Feature scaling** is a data preprocessing technique that **rescales numerical features** to a similar range or gives them common statistical properties (such as zero mean and unit variance). It is applied before training a machine learning model so that no feature dominates simply because it is measured in larger units.

### ✅ Why is Feature Scaling Important?

1.  **Distance-Based Algorithms**: Many algorithms like k-NN, k-Means, and SVMs rely on distance calculations (e.g., Euclidean distance). If one feature has a much larger range than others, it will dominate the distance calculation, making the model biased towards that feature.
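2.  **Gradient Descent Convergence**: Models trained with gradient descent (such as neural networks and logistic regression) usually converge faster when features are on a similar scale. When scales differ widely, the loss surface is stretched along some directions, so a single learning rate is too large for some weights and too small for others, which produces slow, zig-zagging updates.
3.  **Regularization**: L1 and L2 regularization penalize all weights with the same strength, but the size a weight needs depends on its feature's units. A feature measured in large units needs only a small weight and is barely penalized, while a small-scale feature needs a large weight and is penalized heavily. Scaling puts features on an equal footing so the penalty treats them comparably.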
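To make the distance argument concrete, here is a minimal NumPy sketch on the same toy Age/Salary values used in this notebook. It measures how much of the squared Euclidean distance between two rows comes from Age, before and after applying the Min-Max formula by hand. The helper name `age_share_of_distance` is illustrative only.

```python
import numpy as np

# Toy data (same values as the DataFrame in this notebook): columns are Age, Salary
X = np.array([[30, 20000],
              [40, 80000],
              [50, 50000]], dtype=float)

def age_share_of_distance(a, b):
    """Fraction of the squared Euclidean distance between a and b contributed by Age (column 0)."""
    sq_diff = (a - b) ** 2
    return sq_diff[0] / sq_diff.sum()

# Raw features: the Salary difference swamps the Age difference
raw_share = age_share_of_distance(X[0], X[2])

# Min-Max scale each column to [0, 1] before measuring the distance
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled_share = age_share_of_distance(X_scaled[0], X_scaled[2])

print(f"Age share of squared distance, raw:    {raw_share:.2e}")    # 4.44e-07
print(f"Age share of squared distance, scaled: {scaled_share:.2f}")  # 0.80
```

On the raw values, Age contributes less than a millionth of the distance between rows 0 and 2; after scaling it contributes 80%.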

### Original Data

Let's start with a simple dataset containing 'Age' and 'Salary'. The 'Salary' values are roughly a thousand times larger than the 'Age' values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Create a sample DataFrame
data = pd.DataFrame({
    'Age': [30, 40, 50],
    'Salary': [20000, 80000, 50000]
})

print("Original Data:")
print(data)
```

```
Original Data:
   Age  Salary
0   30   20000
1   40   80000
2   50   50000
```


---

## 1. Min-Max Scaling (Normalization)

This technique rescales each feature to a fixed range, usually **[0, 1]**. It is often called "normalization", although that word is also used for other operations, such as rescaling each sample (row) to unit norm.

The formula is:
$$ X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} $$
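Applied column by column, the formula needs only a few lines of NumPy. This sketch reproduces it by hand and compares the result with `MinMaxScaler`; it assumes every column has `max > min`, since a constant column would make the denominator zero.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = pd.DataFrame({'Age': [30, 40, 50], 'Salary': [20000, 80000, 50000]})

# X_scaled = (X - X_min) / (X_max - X_min), computed per column (axis=0).
# Assumption: no constant columns, otherwise this divides by zero.
X = data.to_numpy(dtype=float)
x_min = X.min(axis=0)
x_max = X.max(axis=0)
manual = (X - x_min) / (x_max - x_min)

# Compare against scikit-learn's implementation
sk = MinMaxScaler().fit_transform(data)
print(np.allclose(manual, sk))  # True
print(manual)
```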

```python
# Initialize the Min-Max Scaler
mm_scaler = MinMaxScaler()

# Fit and transform the data
min_max_scaled_data = mm_scaler.fit_transform(data)

print("Data after Min-Max Scaling:")
print(pd.DataFrame(min_max_scaled_data, columns=data.columns))
```

```
Data after Min-Max Scaling:
   Age  Salary
0  0.0     0.0
1  0.5     1.0
2  1.0     0.5
```


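**When to use it?** A reasonable choice for algorithms that make no assumption about the feature distribution (such as k-NN or neural networks), and whenever feature values must lie in a bounded range. Because the minimum and maximum alone define the scale, a single outlier can compress all the other values into a narrow band.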
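The [0, 1] bound only holds for the data the scaler was fitted on. In a typical workflow the scaler is fitted on training data and then applied to new data, whose values can fall outside the training minimum and maximum. The new rows in this sketch are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({'Age': [30, 40, 50], 'Salary': [20000, 80000, 50000]})
# Hypothetical new rows; the second lies outside the training range
new = pd.DataFrame({'Age': [35, 60], 'Salary': [50000, 110000]})

scaler = MinMaxScaler()        # default feature_range=(0, 1)
scaler.fit(train)              # learns min and max from the training data only
new_scaled = scaler.transform(new)

print(np.round(new_scaled, 2))  # second row maps to 1.5 in both columns
```

If strictly bounded outputs are required, `MinMaxScaler` also accepts `clip=True` (scikit-learn 0.24 and later), which clips transformed values to `feature_range`.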

---

## 2. Standardization (Z-score Scaling)

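This technique rescales each feature so that it has a **mean ($\mu$) of 0 and a standard deviation ($\sigma$) of 1**. It only shifts and rescales the values, so the shape of the distribution is unchanged: the result follows a standard normal distribution only if the original feature was normally distributed.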

The formula is:
$$ X_{\text{scaled}} = \frac{X - \mu}{\sigma} $$
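A hand-rolled version of the formula is a useful check, and it exposes one detail: `StandardScaler` uses the population standard deviation (`ddof=0`), while pandas' `.std()` defaults to the sample standard deviation (`ddof=1`). This sketch compares both against the scaler.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Age': [30, 40, 50], 'Salary': [20000, 80000, 50000]})

X = data.to_numpy(dtype=float)
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=0)   # population std, as StandardScaler uses
manual = (X - mu) / sigma

sk = StandardScaler().fit_transform(data)
print(np.allclose(manual, sk))  # True

# pandas' .std() uses ddof=1 by default, so this z-score differs from StandardScaler
pandas_z = (data - data.mean()) / data.std()
print(np.allclose(pandas_z.to_numpy(), sk))  # False
```

With only three rows the difference is large (Age maps to -1, 0, 1 with `ddof=1` versus about -1.22, 0, 1.22 with `ddof=0`); it shrinks as the number of samples grows.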

```python
# Initialize the Standard Scaler
std_scaler = StandardScaler()

# Fit and transform the data
standardized_data = std_scaler.fit_transform(data)

print("Data after Standardization:")
print(pd.DataFrame(standardized_data, columns=data.columns))
```

```
Data after Standardization:
        Age    Salary
0 -1.224745 -1.224745
1  0.000000  1.224745
2  1.224745  0.000000
```


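**When to use it?** A common default for models trained with gradient descent or with L1/L2 regularization, such as Linear Regression and Logistic Regression, and for SVMs. None of these require Gaussian features; standardization simply puts features on a comparable scale. Because its output is not bounded by the extreme values, it is often described as less sensitive to outliers than Min-Max scaling, but the mean and standard deviation are themselves pulled by outliers, so extreme values still compress the rest of the feature.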
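The outlier caveat can be checked with a hypothetical salary column containing one extreme value. Each method is a per-column linear transform, so both preserve the relative gaps between values; in this sketch the three typical salaries end up squeezed into the same small fraction of the scaled range under either scaler. The helper `inlier_fraction` is illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical salary column: three typical values plus one extreme outlier
salary = np.array([[20000], [50000], [80000], [1000000]], dtype=float)

mm = MinMaxScaler().fit_transform(salary).ravel()
z = StandardScaler().fit_transform(salary).ravel()

def inlier_fraction(scaled):
    """Width of the first three (typical) values' range as a fraction of the full scaled range."""
    inliers = scaled[:3]
    return (inliers.max() - inliers.min()) / (scaled.max() - scaled.min())

print("Min-Max:        ", np.round(mm, 3))
print("Standardization:", np.round(z, 3))
print(inlier_fraction(mm), inlier_fraction(z))  # both about 0.061
```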

---

## Summary: Which One to Choose?

| Method | Formula | Resulting Range | Key Use Case | Sensitivity to Outliers |
| :--- | :--- | :--- | :--- | :--- |
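| **Min-Max Scaling** | `(X - min) / (max - min)` | Typically [0, 1] | Bounded inputs, e.g., for image processing or some neural networks; algorithms with no distributional assumptions such as k-NN | High: one extreme value sets the range |
| **Standardization** | `(X - mean) / std_dev` | Unbounded (mean 0, std 1) | Distance- and margin-based models such as SVMs (especially with an RBF kernel); regularized linear and logistic regression | Moderate: mean and std are still pulled by outliers |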
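As a rough illustration of the "distance-based" row, this sketch compares a k-NN classifier with and without `StandardScaler` on scikit-learn's bundled wine dataset, whose features span very different ranges. The dataset, `n_neighbors=5`, and 5-fold cross-validation are choices made for this example, not part of the original notebook; exact accuracies will vary with these settings.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Same classifier, with and without standardization in front of it
raw_knn = KNeighborsClassifier(n_neighbors=5)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Mean accuracy over 5 stratified folds
raw_acc = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_knn, X, y, cv=5).mean()

print(f"k-NN without scaling:     {raw_acc:.3f}")
print(f"k-NN with StandardScaler: {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the held-out fold's mean and standard deviation never leak into the scaling.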