<a href="https://colab.research.google.com/github/KhushnurAnjum26/Data-Analysis/blob/main/Feature_Scaling(Standardization).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **0. What is Feature Scaling?**

Feature Scaling is a technique to normalize or standardize the range of independent variables (features) of data.

For many machine learning algorithms, features on different scales cause problems because the models rely on distances or assume the data is centered.


Example:
Suppose you have the following data:

| Height (cm) | Weight (kg) |
|---|---|
| 170 | 70 |
| 180 | 90 |


Without scaling, "Weight" has smaller numerical values than "Height", but that doesn’t mean it’s less important. Distance-based algorithms such as KNN and SVM, and models trained with gradient descent, can be biased toward the feature with the larger values.


# **1. Why do we need Feature Scaling?**

Algorithms like KNN, K-Means, SVM, PCA, and Logistic Regression are sensitive to the distance between points or the magnitude of feature values.


Features with larger ranges dominate those with smaller ranges.



Gradient Descent may take longer to converge, or oscillate, when features are on very different scales.

Example:
In KNN, if height ranges from 150–200 and weight from 1–10, distances are dominated by height, so the model effectively cares more about height than weight unless you scale them.
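The effect on KNN-style distances can be sketched with a few lines of plain Python. The three points below are hypothetical values chosen inside the ranges quoted above, and the rescaling uses each feature's stated range (Min-Max scaling, covered in the next section); this is an illustration, not a full KNN implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max(value, lo, hi):
    """Rescale a single value to [0, 1] given the feature's range."""
    return (value - lo) / (hi - lo)

def scale(point):
    """Rescale a (height, weight) point using the ranges from the example."""
    height, weight = point
    return (min_max(height, 150.0, 200.0), min_max(weight, 1.0, 10.0))

# Hypothetical (height, weight) points, for illustration only.
query = (160.0, 2.0)
far_in_height = (190.0, 2.0)   # same weight, height differs by 30
far_in_weight = (160.0, 10.0)  # same height, weight at the opposite extreme

# Raw distances: the height gap (30) dwarfs the weight gap (8).
d_height = euclidean(query, far_in_height)
d_weight = euclidean(query, far_in_weight)
print(d_height, d_weight)  # 30.0 8.0

# After scaling, the weight gap covers most of its range and counts for more.
d_height_scaled = euclidean(scale(query), scale(far_in_height))
d_weight_scaled = euclidean(scale(query), scale(far_in_weight))
print(round(d_height_scaled, 3), round(d_weight_scaled, 3))  # 0.6 0.889
```

Before scaling, the point that differs across almost the whole weight range still looks like the nearer neighbor; after scaling, the ordering flips.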



# **2. Types of Feature Scaling**

There are three common types:

## **Min-Max Scaling (Normalization): Scales values between 0 and 1.**

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
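A minimal sketch of Min-Max scaling in plain Python follows. The function name `min_max_scale`, the sample heights, and the choice to map a constant feature to 0.0 are assumptions made for this illustration.

```python
def min_max_scale(values):
    """Rescale numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature has no spread, so map it to 0.0
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

heights = [150, 160, 175, 200]  # illustrative values, not real data
scaled_heights = min_max_scale(heights)
print(scaled_heights)  # [0.0, 0.2, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, so the result depends entirely on the two extremes of the data.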

## **Standardization (Z-score Scaling):**

$$x_{\text{scaled}} = \frac{x - \mu}{\sigma}$$

Here, $\mu$ is the mean, and $\sigma$ is the standard deviation.
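The formula can be sketched with Python's standard `statistics` module. The function name `standardize` and the sample scores are assumptions for illustration, and the sketch uses the population standard deviation (dividing by $N$); using the sample standard deviation instead would give slightly smaller magnitudes.

```python
import statistics

def standardize(values):
    """Return z-scores (x - mu) / sigma using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

scores = [10, 20, 30, 40, 50]  # illustrative values
z_scores = standardize(scores)
print([round(z, 3) for z in z_scores])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After the transformation the values have mean 0 and standard deviation 1, while their ordering and relative spacing are unchanged.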






📌 Formula for Standard Deviation (SD)
Standard deviation ($\sigma$) measures how much values deviate from the mean.

For a population:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

For a sample:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Where:

$x_i$ = each data point

$\mu$ or $\bar{x}$ = mean (population or sample)

$N$ or $n$ = number of data points
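The two definitions differ only in the divisor, which a short sketch makes concrete. The function names and the data are assumptions for illustration; the results are cross-checked against `statistics.pstdev` and `statistics.stdev` from the Python standard library.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation: divide the squared deviations by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_std(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(population_std(data))        # 2.0
print(round(sample_std(data), 3))  # 2.138

# The standard library implements the same two definitions.
print(statistics.pstdev(data), round(statistics.stdev(data), 3))
```

The sample version is always slightly larger, and the gap shrinks as the number of data points grows.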



## **Robust Scaling: Uses median and IQR (Interquartile Range), useful with outliers.**

$$x_{\text{scaled}} = \frac{x - \text{median}(x)}{\text{IQR}(x)}$$
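Robust scaling subtracts the median and divides by the IQR. The sketch below is one way to do that with the standard library; the function name `robust_scale`, the data, and the choice of the "inclusive" quantile method (linear interpolation between data points) are assumptions, and other quantile conventions give slightly different IQR values.

```python
import statistics

def robust_scale(values):
    """Scale values as (x - median) / IQR."""
    med = statistics.median(values)
    # Quartiles via linear interpolation over the data ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    return [(x - med) / iqr for x in values]

data = [10, 12, 14, 16, 18, 1000]  # illustrative values with one outlier
robust = robust_scale(data)
print([round(v, 2) for v in robust])  # [-1.0, -0.6, -0.2, 0.2, 0.6, 197.0]
```

The typical values keep a usable spread around 0 because the median and IQR barely react to the single extreme value, which itself stays clearly separated.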


# **3. Standardization - Intuition**


Standardization transforms data to have zero mean and unit variance. It keeps the shape of the original distribution but shifts and scales it.

It is most useful when the data is roughly normally distributed.

It doesn’t squash values into a fixed range like [0, 1], but it puts features on a comparable scale.

Example:
Original: [50, 60, 70], Mean = 60, Std = 10 (sample standard deviation)
Standardized: [-1, 0, 1]
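This small example can be checked directly. The sketch below assumes the sample standard deviation (dividing by $n - 1$), which is what produces Std = 10 here; with the population standard deviation (dividing by $N$) the same data gives roughly ±1.2247 instead of ±1.

```python
import statistics

original = [50, 60, 70]
mean = statistics.mean(original)

# Sample standard deviation (divides by n - 1) gives exactly 10.
sample_sd = statistics.stdev(original)
standardized = [(x - mean) / sample_sd for x in original]
print(standardized)  # [-1.0, 0.0, 1.0]

# Population standard deviation (divides by N) is smaller, about 8.165.
population_sd = statistics.pstdev(original)
standardized_pop = [(x - mean) / population_sd for x in original]
print([round(z, 4) for z in standardized_pop])  # [-1.2247, 0.0, 1.2247]
```

Either way the result is centered on 0 and the equal spacing of the original values is preserved; only the scale factor differs.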





# **4. Example**




Let’s scale using Standardization:

Original Data:

| Age | Salary |
|---|---|
| 25 | 30,000 |
| 35 | 50,000 |
| 45 | 70,000 |

Step 1: Compute Mean and Std Dev (sample standard deviation, dividing by $n - 1$)

Mean Age = 35, Std = 10

Mean Salary = 50,000, Std = 20,000


Step 2: Apply Formula

$$\text{Age}(25) = \frac{25 - 35}{10} = -1$$

$$\text{Salary}(30000) = \frac{30000 - 50000}{20000} = -1$$

Resulting scaled row: Age = -1, Salary = -1
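The same two steps can be applied to every row of the table at once. This is a minimal sketch in plain Python; the helper name `standardize_columns` is an assumption, and it uses the sample standard deviation so that the numbers match the worked example.

```python
import statistics

# Rows of (age, salary) from the table above.
rows = [(25, 30_000), (35, 50_000), (45, 70_000)]

def standardize_columns(rows):
    """Standardize each column independently using its own mean and sample std."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

scaled_rows = standardize_columns(rows)
for row in scaled_rows:
    print(row)
# (-1.0, -1.0)
# (0.0, 0.0)
# (1.0, 1.0)
```

Although salary is about a thousand times larger than age in raw units, both columns end up on the same scale.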


# **5. Impact of Outliers**


Outliers can distort the range and mean:

Min-Max Scaling is sensitive to outliers.

Standardization is less sensitive but still affected.

Robust scaling (median and IQR, for example scikit-learn's `RobustScaler`) is usually the better choice when outliers are present.

Example:
Data: [10, 12, 14, 1000]

Min-Max would squash all small values to near 0 because of 1000.

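Standardization is also distorted: the outlier pulls the mean up to 259 and inflates the standard deviation, so the three small values end up almost indistinguishable.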
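Both effects can be seen by running the two scalers on this data. The sketch below uses only the standard library and the population standard deviation; the variable names are illustrative.

```python
import statistics

data = [10, 12, 14, 1000]

# Min-Max scaling: the outlier defines the top of the range.
lo, hi = min(data), max(data)
min_max = [(x - lo) / (hi - lo) for x in data]
print([round(v, 3) for v in min_max])  # [0.0, 0.002, 0.004, 1.0]

# Standardization: the outlier shifts the mean and inflates the spread.
mu = statistics.mean(data)       # 259
sigma = statistics.pstdev(data)  # roughly 427.8
z = [(x - mu) / sigma for x in data]
print([round(v, 3) for v in z])  # [-0.582, -0.577, -0.573, 1.732]
```

In both outputs the three ordinary values differ only in the third decimal place, so most of the information they carried is hard for a model to use.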



# **6. When to Use Standardization?**


✅ Use Standardization when:

## **1. Your Data is Normally Distributed**


Explanation:

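Standardization works best when the data roughly follows a normal (Gaussian) distribution, meaning most values are around the mean and gradually decrease toward the extremes. It does not strictly require normality, but the mean and standard deviation describe such data well.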

Why?
Standardization transforms the data so that:

Mean becomes 0

Standard deviation becomes 1

Data shape (distribution) remains the same.

This is especially helpful when algorithms expect or perform better with normally distributed input.

Example:

If your dataset has features like human height or test scores, which typically follow a bell curve, standardization helps scale them properly.



## **2. You Are Using Distance-Based Algorithms**


Examples of distance-based algorithms:

K-Nearest Neighbors (KNN)

Support Vector Machines (SVM)

K-Means Clustering

Explanation:

These algorithms calculate distances (like Euclidean distance).
If features are on different scales, the larger-scale feature dominates the distance regardless of how informative it is.

Standardization puts all features on a comparable scale, so each contributes comparably to the distance.





## **3. You Want to Retain the Effect of Outliers (to Some Extent)**


Explanation:

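Min-Max scaling forces all data into a [0, 1] range: an outlier becomes the boundary of that range and the remaining values are compressed together.

Standardization does not bound the values. An outlier stays far from the rest, expressed as a number of standard deviations from the mean.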


Why?


Sometimes, outliers carry important information (e.g., fraud detection, rare diseases).

So, if you want to preserve outliers but still scale the data, standardization is a better choice than normalization.



📌 Formula for Standardization
The formula to standardize a value $x$:

$$z = \frac{x - \mu}{\sigma}$$

Where:

$x$ = original value

$\mu$ = mean of the feature

$\sigma$ = standard deviation of the feature

