# 📌 What is Feature Scaling?

**Feature Scaling** is a data preprocessing technique used to bring all input features (variables) to a similar range or scale.

It is mainly used in **Machine Learning** when features have different units or magnitudes.

---

## 🎯 Why is Feature Scaling Needed?

Imagine this dataset:

| Age (years) | Salary (₹) |
|-------------|------------|
| 25          | 250000     |
| 30          | 500000     |
| 35          | 750000     |

- **Age** range → 25 to 35
- **Salary** range → 2,50,000 to 7,50,000

Clearly, salary values are much larger than age.

If we apply algorithms like:

- KNN
- K-Means
- SVM
- Linear Regression (Gradient Descent)
- Neural Networks

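the model may give **more importance to Salary** simply because its values are numerically larger: distance calculations and gradient updates are dominated by the feature with the largest magnitude.

👉 Feature scaling puts all features on a comparable scale, so no feature dominates only because of its units.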

---

# 🔥 Types of Feature Scaling

The **two most common methods** are:

---

# 1️⃣ Min-Max Scaling (Normalization)

### 📌 Formula:

X_scaled = (X - X_min) / (X_max - X_min)

It maps the smallest value to **0** and the largest value to **1**, so every scaled value lies between **0 and 1**.

---

### ✅ Example

Suppose:

Age values = 25, 30, 35  

Here:
- Min = 25  
- Max = 35  

For Age = 30:

(30 - 25) / (35 - 25) = 5 / 10 = 0.5

Scaled values:

| Original Age | Scaled Age |
|--------------|------------|
| 25           | 0.0        |
| 30           | 0.5        |
| 35           | 1.0        |
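
The same calculation can be sketched in NumPy. This is only a minimal illustration of the formula above; the helper name `min_max_scale` is made up for this example, and it assumes the values are not all identical (otherwise the denominator would be zero).

```python
import numpy as np

def min_max_scale(x):
    """Apply (x - min) / (max - min) to a 1-D array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

ages = [25, 30, 35]
scaled_ages = min_max_scale(ages)
print(scaled_ages)  # values 0.0, 0.5, 1.0, as in the table above
```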

---

### 📌 When to Use?

- When the data does not contain strong outliers (a single extreme value sets the min or max and squeezes everything else together)
- When bounded values are needed (e.g. inputs to neural networks)

---

# 2️⃣ Standardization (Z-Score Scaling)

Also called **Standard Scaling**.

### 📌 Formula:

X_scaled = (X - μ) / σ

Where:
- μ = mean
- σ = standard deviation

It transforms data so that:
- Mean = 0
- Standard deviation = 1

---

### ✅ Example

Suppose Age values:

25, 30, 35  

Mean = 30  
Std deviation ≈ 4.08 (population standard deviation, i.e. dividing by n)  

For Age = 35:

(35 - 30) / 4.08 ≈ 1.22

So scaled value ≈ 1.22
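
A short NumPy sketch of the same arithmetic is below. The 4.08 figure corresponds to the population standard deviation (`ddof=0`), which is NumPy's default and also what scikit-learn's `StandardScaler` uses; the sample standard deviation (`ddof=1`, the pandas default) would give a different result, so it is shown for comparison.

```python
import numpy as np

ages = np.array([25, 30, 35], dtype=float)

mean = ages.mean()              # 30.0
std = ages.std()                # population std (ddof=0), about 4.08
z_scores = (ages - mean) / std

print(z_scores)                 # roughly -1.22, 0.0, 1.22

# The sample standard deviation (ddof=1) is 5.0 for this data,
# which would give a z-score of 1.0 for Age = 35 instead.
sample_std = ages.std(ddof=1)
print(sample_std)
```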

---

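### 📌 When to Use?

- When the data has some outliers (standardization is usually less distorted by them than Min-Max scaling, although the mean and standard deviation are still affected)
- When the algorithm expects zero-centered features with comparable variance (standardization recenters and rescales the data; it does not make the distribution normal)
- Commonly used with:
  - Linear Regression
  - Logistic Regression
  - SVM
  - PCA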
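
To make the outlier point concrete, here is a small sketch that applies both methods to the same values. The age of 200 is a made-up outlier added purely for illustration. It suggests that neither method is immune: Min-Max squeezes the ordinary values into a narrow band near 0, while standardization keeps them unbounded but still shifts them because the mean and standard deviation move.

```python
import numpy as np

ages = np.array([25, 30, 35, 200], dtype=float)  # 200 is a made-up outlier

min_max = (ages - ages.min()) / (ages.max() - ages.min())
z_scores = (ages - ages.mean()) / ages.std()

print(np.round(min_max, 3))   # the three ordinary ages all land below 0.06
print(np.round(z_scores, 3))  # the outlier sits well above the other values
```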

---

# ⚠️ When Feature Scaling is NOT Required

You **don't need scaling** for:

- Decision Tree
- Random Forest
- XGBoost
- LightGBM

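This is because tree-based models split each feature at a threshold. Rescaling a feature moves the threshold but does not change which samples fall on each side, and no distances between points are involved.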
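
The threshold argument can be checked with a toy example. The sketch below is a simplified single-split "stump" written for illustration (the function `best_split`, the data, and the labels are all hypothetical, and this is not how any particular library implements its trees). It picks the split with the lowest weighted Gini impurity and shows that the chosen feature and the resulting partition are the same before and after standardization.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an integer label array (0.0 for an empty array)."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / labels.size
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, mask of rows sent left) with the lowest weighted Gini."""
    best_score, best_feature, best_left = np.inf, None, None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])                  # sorted distinct values
        for lo, hi in zip(values[:-1], values[1:]):
            left = X[:, j] <= (lo + hi) / 2          # threshold at the midpoint
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / y.size
            if score < best_score:
                best_score, best_feature, best_left = score, j, left
    return best_feature, best_left

# Hypothetical (age, salary) rows with made-up labels.
X = np.array([[22, 30000], [25, 60000], [47, 25000],
              [52, 110000], [46, 50000], [56, 95000]], dtype=float)
y = np.array([0, 0, 1, 1, 1, 1])

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)      # standardization

feature_raw, left_raw = best_split(X, y)
feature_scaled, left_scaled = best_split(X_scaled, y)

print(feature_raw == feature_scaled)                 # same feature chosen
print(np.array_equal(left_raw, left_scaled))         # same rows on each side
```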

---

# 🧠 Why Scaling Helps (Mathematical Intuition)

Suppose we calculate Euclidean distance in KNN:

Distance = √((x₁ - y₁)² + (x₂ - y₂)²)

If one feature has very large values, it will dominate the distance calculation.

Scaling prevents this dominance.
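
A small numeric sketch of this effect follows. The three points are hypothetical (age, salary) pairs, and the Min-Max ranges are taken from the example table earlier (Age 25 to 35, Salary 250000 to 750000). Before scaling, the point that is 10 years older looks closest; after scaling, the point with the same age does.

```python
import numpy as np

# Hypothetical people as (age, salary); not taken from a real dataset.
a = np.array([25.0, 250_000.0])
b = np.array([35.0, 251_000.0])   # 10 years older, almost the same salary
c = np.array([25.0, 255_000.0])   # same age, slightly higher salary

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

# Raw features: salary differences swamp the age difference.
raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)

# Min-Max scale both features using the ranges from the example table.
lo = np.array([25.0, 250_000.0])
hi = np.array([35.0, 750_000.0])

def scale(p):
    return (p - lo) / (hi - lo)

scaled_ab = euclidean(scale(a), scale(b))
scaled_ac = euclidean(scale(a), scale(c))

print(raw_ab < raw_ac)        # True: b looks closer to a before scaling
print(scaled_ab > scaled_ac)  # True: after scaling, c is the closer one
```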

---

# 💻 Python Example

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

data = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [250000, 500000, 750000]
})

# Min-Max Scaling
minmax = MinMaxScaler()
scaled_minmax = minmax.fit_transform(data)

# Standard Scaling
standard = StandardScaler()
scaled_standard = standard.fit_transform(data)

print("Min-Max Scaled:\n", scaled_minmax)
print("\nStandard Scaled:\n", scaled_standard)
```

---

# 🔎 Comparison

| Method | Output Scale | Affected by Outliers? | Best For |
|--------|--------------|-----------------------|----------|
| Min-Max | 0 to 1 | Yes, strongly | Neural Networks |
| Standardization | Unbounded (Mean=0, Std=1) | Yes, but less | Linear Models, SVM |

---

# 🚀 Final Understanding

Feature Scaling:

✔ Often makes training faster  
✔ Prevents bias toward features with large magnitudes  
✔ Improves convergence of gradient descent  
✔ Essential for distance-based models  
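
One way to see the gradient-descent point: for a least-squares loss, convergence speed depends on how badly conditioned the matrix XᵀX is, and features on very different scales make it badly conditioned. The sketch below compares the condition number before and after standardization, using the five Age and EstimatedSalary rows from the `Social_Network_Ads.csv` preview in this document. It is an illustration of the idea under these assumptions, not a benchmark.

```python
import numpy as np

# Age and EstimatedSalary from the five preview rows of Social_Network_Ads.csv.
X = np.array([
    [19, 19000],
    [35, 20000],
    [26, 43000],
    [27, 57000],
    [19, 76000],
], dtype=float)

# Standardize each column: zero mean, unit (population) standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# A large condition number means gradient descent needs a tiny step size
# and many iterations; a value near 1 means it converges quickly.
cond_raw = np.linalg.cond(X.T @ X)
cond_std = np.linalg.cond(X_std.T @ X_std)

print(f"raw: {cond_raw:.3g}, standardized: {cond_std:.3g}")
```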


In [15]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

In [16]:
df = pd.read_csv('Social_Network_Ads.csv')
df.head()

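|   | User ID  | Gender | Age | EstimatedSalary | Purchased |
|---|----------|--------|-----|-----------------|-----------|
| 0 | 15624510 | Male   | 19  | 19000           | 0         |
| 1 | 15810944 | Male   | 35  | 20000           | 0         |
| 2 | 15668575 | Female | 26  | 43000           | 0         |
| 3 | 15603246 | Female | 27  | 57000           | 0         |
| 4 | 15804002 | Male   | 19  | 76000           | 0         |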


In [17]:
df = df.iloc[:, [2, 3, 4]]  # df.iloc[row_selection, column_selection]: all rows; columns Age, EstimatedSalary, Purchased
df.head()

|   | Age | EstimatedSalary | Purchased |
|---|-----|-----------------|-----------|
| 0 | 19  | 19000           | 0         |
| 1 | 35  | 20000           | 0         |
| 2 | 26  | 43000           | 0         |
| 3 | 27  | 57000           | 0         |
| 4 | 19  | 76000           | 0         |
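
For reference, the positional selection above can also be written with column names, which tends to be easier to read and does not break if the column order changes. The sketch below rebuilds the first two preview rows by hand (so it runs without the CSV file) and checks that both selections give the same result.

```python
import pandas as pd

# First two rows of the preview above, typed in by hand for illustration.
df = pd.DataFrame({
    "User ID": [15624510, 15810944],
    "Gender": ["Male", "Male"],
    "Age": [19, 35],
    "EstimatedSalary": [19000, 20000],
    "Purchased": [0, 0],
})

by_position = df.iloc[:, [2, 3, 4]]                      # all rows, columns 2-4
by_name = df[["Age", "EstimatedSalary", "Purchased"]]    # same columns by label

print(by_position.equals(by_name))
```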
