# What is Feature Scaling?
- Feature scaling means adjusting the values of different features (columns) so that they are on a similar scale, for example the range [0, 1] after min-max scaling, or mean 0 and standard deviation 1 after standardization.

- In simple terms: it keeps one feature from dominating the others just because of its scale, so that all features (inputs) can contribute to the model.

Example (Before Scaling)
| Feature | Description   | Typical Range |
| ------- | ------------- | ------------- |
| Age     | Person’s age  | 0 – 100       |
| Income  | Yearly income | 0 – 100,000   |


If you feed both features directly into a model:

- The model may give more weight to “Income” just because its numbers are larger (up to 100,000 versus up to 100), even if “Age” is equally important.

Scaling the features puts them on a comparable footing.
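
To make the imbalance concrete, here is a minimal sketch that measures how much of a Euclidean distance comes from each feature. The two people and their values are made up for illustration; they are not from a real dataset.

```python
import numpy as np

# Hypothetical people: (age in years, yearly income)
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])  # very different age, similar income

squared_diff = (a - b) ** 2
distance = np.sqrt(squared_diff.sum())

# Fraction of the squared distance contributed by Income alone
income_share = squared_diff[1] / squared_diff.sum()

print(f"distance = {distance:.2f}")
print(f"share from Income = {income_share:.4f}")
```

Even though the two ages differ by 35 years, more than 99% of the squared distance comes from the 1,000 difference in income.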

### Why Do We Use Feature Scaling?
| Reason                               | Explanation                                                                                                                                                                              |
| ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **1️⃣ Gradient Descent Convergence** | Models such as Linear Regression, Logistic Regression, and Neural Networks are often trained with gradient descent. If features have very different scales, the algorithm converges **slowly** and may appear to **get stuck**, because a learning rate small enough for the large-scale feature is too small for the others. |
| **2️⃣ Fair Feature Contribution**    | Keeps a feature from having more influence just because its numbers are larger.                                                                                                          |
| **3️⃣ Distance-Based Models**        | For KNN, SVM, and K-Means, distances are sensitive to scale. Scaling prevents “large range” features from dominating.                                                                    |
| **4️⃣ Improves Performance**         | Scaling often speeds up training and can improve accuracy for scale-sensitive models.                                                                                                    |
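
The gradient descent row can be illustrated with a small experiment. This is only a sketch under stated assumptions: `gradient_descent` is a hypothetical helper written for this example, the data is synthetic, and the learning rate (0.1) and step count (500) are arbitrary choices. With the same settings, the run on raw features blows up while the run on standardized features converges.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=500):
    """Plain batch gradient descent on mean squared error; returns the final loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - y
        w -= lr * (2 / len(y)) * (X.T @ residual)
    return np.mean((X @ w - y) ** 2)

# Synthetic data (assumption): independent Age and Income columns
rng = np.random.default_rng(0)
age = rng.uniform(18, 60, size=200)
income = rng.uniform(20_000, 200_000, size=200)
X_raw = np.column_stack([age, income])

# Standardize each column: mean 0, standard deviation 1
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Noise-free, zero-mean target so that no intercept term is needed
y = X_scaled @ np.array([2.0, 3.0])

# The raw run overflows, so silence the floating-point warnings it triggers
with np.errstate(over="ignore", invalid="ignore"):
    loss_raw = gradient_descent(X_raw, y)
loss_scaled = gradient_descent(X_scaled, y)

print("final loss, raw features:   ", loss_raw)
print("final loss, scaled features:", loss_scaled)
```

On raw features the only stable learning rates are tiny (because of the Income column), and such rates make progress on the other weights very slow. That is the practical meaning of slow convergence here.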


### How We Do Feature Scaling
There are two main techniques:
1. Normalization (Min-Max Scaling)
2. Standardization (Z-score Scaling)

### 1. Normalization (Min-Max Scaling)
Scales each feature to a fixed range, by default [0, 1].

$$
X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
$$

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# One feature (column) with five samples
data = np.array([[10], [20], [30], [40], [50]])

# Learn the column's min and max, then rescale to [0, 1]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

print(scaled_data)
```

```
[[0.  ]
 [0.25]
 [0.5 ]
 [0.75]
 [1.  ]]
```


Best when:
- You need bounded data (e.g., neural networks with sigmoid activation).
- You want to preserve the relative spacing between values: min-max scaling is a linear rescaling, so it keeps the shape of the distribution.
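
The min-max formula above can also be applied by hand. The sketch below uses a hypothetical helper, `min_max_scale`, written only for this illustration. Its handling of constant columns (mapping them to 0) is an assumption of this example, not something taken from the text above.

```python
import numpy as np

def min_max_scale(x):
    """Column-wise min-max scaling to [0, 1] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    value_range = x.max(axis=0) - x_min
    # Assumption: a constant column would divide by zero, so map it to 0 instead
    value_range = np.where(value_range == 0, 1.0, value_range)
    return (x - x_min) / value_range

data = np.array([[10], [20], [30], [40], [50]])
manual = min_max_scale(data)

print(manual.ravel())
```

For this data the result is 0, 0.25, 0.5, 0.75, 1, the same values that `MinMaxScaler` produces.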

### 2. Standardization (Z-score Scaling)
- Rescales each feature to have mean 0 and standard deviation 1.

$$
X_{\text{scaled}} = \frac{X - \mu}{\sigma}
$$

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# One feature (column) with five samples
data = np.array([[10], [20], [30], [40], [50]])

# Learn the column's mean and standard deviation, then compute z-scores
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

print(scaled_data)
```

```
[[-1.41421356]
 [-0.70710678]
 [ 0.        ]
 [ 0.70710678]
 [ 1.41421356]]
```


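Best when:
- You use models that are sensitive to feature scale or variance (e.g., gradient-trained Linear or Logistic Regression, PCA, SVM). These models do not require the features to be normally distributed.
- Features have moderate outliers: unlike min-max scaling, a single extreme value does not squeeze the rest of the data into a narrow part of a fixed range, although the mean and standard deviation are still affected by outliers.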
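
The z-score formula can be reproduced with plain NumPy as a quick check. One detail worth knowing: `StandardScaler` uses the population standard deviation (`ddof=0`), so a hand computation with the sample standard deviation (`ddof=1`) would give slightly different numbers.

```python
import numpy as np

data = np.array([[10], [20], [30], [40], [50]], dtype=float)

mu = data.mean(axis=0)
sigma = data.std(axis=0)  # population std (ddof=0), the convention StandardScaler uses

z = (data - mu) / sigma

print(z.ravel())
print("mean:", z.mean(), "std:", z.std())
```

The values match the `StandardScaler` output above (about ±1.414 and ±0.707), and the scaled column has mean 0 and standard deviation 1.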

### Where We Use Feature Scaling
| Algorithm Type                     | Scaling Required? | Reason                              |
| ---------------------------------- | ----------------- | ----------------------------------- |
| **Linear Regression**              | ✅ Yes             | Gradient descent converges faster (an unregularized closed-form fit does not need it) |
| **Logistic Regression**            | ✅ Yes             | Improves convergence                |
| **SVM (Support Vector Machines)**  | ✅ Yes             | Margins and kernels depend on distances |
| **KNN (K-Nearest Neighbors)**      | ✅ Yes             | Uses Euclidean distance             |
| **K-Means Clustering**             | ✅ Yes             | Uses distance metric                |
| **Neural Networks**                | ✅ Yes             | Improves gradient flow              |
| **Decision Trees / Random Forest** | ❌ No              | Tree splits are not scale-sensitive |
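
The last row of the table can be checked with a toy example. A tree split only asks whether a value falls below a threshold, so it depends on the ordering of values, and standardization preserves that ordering. In the sketch below, `best_split_mask` is a hypothetical helper that finds the best single split by weighted Gini impurity; the labels are made up for illustration.

```python
import numpy as np

def best_split_mask(x, y):
    """Boolean 'left' partition of the best single-threshold split on x,
    chosen by weighted Gini impurity for binary labels (hypothetical helper)."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = np.mean(labels)
        return 2 * p * (1 - p)

    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2  # midpoints between sorted values
    best_mask, best_score = None, np.inf
    for t in thresholds:
        left = x <= t
        score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
        if score < best_score:
            best_mask, best_score = left, score
    return best_mask

income = np.array([20_000, 35_000, 60_000, 120_000, 200_000], dtype=float)
label = np.array([0, 0, 0, 1, 1])  # made-up binary target

income_scaled = (income - income.mean()) / income.std()

same_partition = np.array_equal(
    best_split_mask(income, label),
    best_split_mask(income_scaled, label),
)
print(same_partition)  # True: the same samples end up on each side of the split
```

The threshold value itself changes after scaling, but the samples on each side of it do not, which is why tree-based models are usually left unscaled.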


### Example

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample dataset
data = pd.DataFrame({
    'Age': [18, 25, 32, 47, 54],
    'Income': [20000, 35000, 60000, 120000, 200000]
})

print("Before Scaling:\n", data)

# Apply StandardScaler to every column
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# fit_transform returns a NumPy array, so restore the column names
scaled_df = pd.DataFrame(scaled_data, columns=data.columns)
print("\nAfter Scaling:\n", scaled_df)
```

```
Before Scaling:
    Age  Income
0   18   20000
1   25   35000
2   32   60000
3   47  120000
4   54  200000

After Scaling:
         Age    Income
0 -1.280023 -1.015152
1 -0.759083 -0.787879
2 -0.238144 -0.409091
3  0.878155  0.500000
4  1.399095  1.712121
```
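
As a sanity check on the example above, the sketch below inspects what the fitted scaler learned and confirms that the transformation can be undone. It uses the same numbers as a plain NumPy array. `mean_`, `scale_`, and `inverse_transform` are part of scikit-learn's `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same Age / Income values as the example, as a plain array
data = np.array([
    [18, 20000],
    [25, 35000],
    [32, 60000],
    [47, 120000],
    [54, 200000],
], dtype=float)

scaler = StandardScaler()
scaled = scaler.fit_transform(data)

print("learned means:   ", scaler.mean_)
print("learned std devs:", scaler.scale_)
print("column means after scaling:", scaled.mean(axis=0).round(6))
print("column stds after scaling: ", scaled.std(axis=0).round(6))

# Scaling is reversible: map the z-scores back to the original units
restored = scaler.inverse_transform(scaled)
print("round trip matches:", np.allclose(restored, data))
```

After scaling, both columns have mean 0 and standard deviation 1, so Age and Income are on the same footing even though their original ranges differ by three orders of magnitude.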


### Summary
| Concept   | Meaning                                              |
| --------- | ---------------------------------------------------- |
| **What**  | Rescaling features so they have similar ranges       |
| **Why**   | To ensure fair contribution and faster learning      |
| **How**   | Min-Max (Normalization) or Z-score (Standardization) |
| **Where** | Gradient-based and distance-based ML models          |
