## What is Standardization?
Standardization is one type of feature scaling where the features are rescaled so that they have:
- Mean (μ) = 0
- Standard deviation (σ) = 1

The formula for standardization is:  $z = \frac{x - \mu}{\sigma}$

Where:  
- x is the original value,
- μ is the mean of the feature,
- σ is the standard deviation of the feature,
- z is the standardized value.
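
As a minimal sketch of the formula, the snippet below standardizes one feature by hand with NumPy. The values in `x` are hypothetical and chosen only so the arithmetic is easy to follow. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses; pandas' `Series.std()` defaults to `ddof=1` and would give slightly different numbers.

```python
import numpy as np

# Hypothetical feature values, chosen only for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()          # mean of the feature
sigma = x.std()        # population standard deviation (ddof=0)
z = (x - mu) / sigma   # standardized values

print(mu, sigma)       # 5.0 2.0
print(z)               # values: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
print(z.mean(), z.std())  # approximately 0 and 1
```
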
## Why is Standardization Important?
1. Equal Treatment of Features: Features with larger scales can dominate the model if not scaled properly.
2. Faster Convergence: For gradient-based algorithms (like logistic regression, neural networks), it helps the model converge faster.
3. Improves Model Performance: Algorithms like SVMs and KNN rely on distance metrics, and unscaled features can skew distance calculations.
4. Regularization Works Better: Techniques like Lasso and Ridge regression penalize coefficient magnitudes, which depend on the scale of each feature, so unscaled features are penalized unevenly.
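
To make the points about scale dominance and distance metrics concrete, here is a small sketch with made-up `[income, age]` rows (the rows and the `dist` helper are assumptions for illustration, not part of the notebook's data). On the raw values, income differences swamp age differences; after standardization, both features contribute on a comparable scale and the nearest neighbour of the first row changes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [income, age]
X = np.array([
    [50000.0, 25.0],
    [51000.0, 60.0],
    [54000.0, 26.0],
    [90000.0, 45.0],
])

def dist(a, b):
    """Euclidean distance between two rows (illustrative helper)."""
    return float(np.linalg.norm(a - b))

# Raw distances from row 0: income dominates, so row 1 looks closest
raw_01 = dist(X[0], X[1])
raw_02 = dist(X[0], X[2])

# Distances after standardization: the 35-year age gap now matters
Z = StandardScaler().fit_transform(X)
scaled_01 = dist(Z[0], Z[1])
scaled_02 = dist(Z[0], Z[2])

print(raw_01 < raw_02)        # True: row 1 is nearer on raw values
print(scaled_01 > scaled_02)  # True: row 2 is nearer after scaling
```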

## When to Use Standardization?
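- When your data is roughly Gaussian (normally distributed). Standardization does not require normality, but the mean and standard deviation describe the data best when the distribution is roughly symmetric and free of extreme outliers.
- When using algorithms sensitive to feature scale, such as: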

1. Logistic Regression
2. SVM
3. K-Means
4. KNN
5. PCA (Principal Component Analysis)
6. Neural Networks
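
One common way to combine standardization with one of these algorithms is a scikit-learn pipeline, so the scaler is fitted on the training data and the same mean and standard deviation are reused at prediction time. The sketch below is only an illustration: the `[income, loan_amount]` rows and the binary labels are made up, and logistic regression is used simply because it is the first algorithm in the list.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: [income, loan_amount] with a made-up label
X_train = np.array([
    [20000.0, 30000.0],
    [30000.0, 25000.0],
    [40000.0, 28000.0],
    [60000.0, 12000.0],
    [70000.0, 10000.0],
    [80000.0, 15000.0],
])
y_train = np.array([0, 0, 0, 1, 1, 1])

# The pipeline standardizes the features, then fits the classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# New rows are scaled with the training mean and std before prediction
X_new = np.array([[25000.0, 27000.0], [75000.0, 11000.0]])
pred = model.predict(X_new)
print(pred)

# The fitted scaler is available as a named step
print(model.named_steps["standardscaler"].mean_)
```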

In [5]:
from sklearn.preprocessing import StandardScaler
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Example data
data = {
    'income': [40000, 50000, 60000, 80000, 10000],
    'loan_amount': [10000, 15000, 20000, 25000, 30000]
}

In [3]:
df = pd.DataFrame(data)
df

|     | income | loan_amount |
| --- | ------ | ----------- |
| 0   | 40000  | 10000       |
| 1   | 50000  | 15000       |
| 2   | 60000  | 20000       |
| 3   | 80000  | 25000       |
| 4   | 10000  | 30000       |


In [8]:
# Create StandardScaler object
scaler = StandardScaler()

In [9]:
# Fit and transform the data
standardized_data = scaler.fit_transform(df)

In [10]:
# Create DataFrame with standardized values
standardized_df = pd.DataFrame(standardized_data, columns=df.columns)

In [11]:
standardized_df

|     | income    | loan_amount |
| --- | --------- | ----------- |
| 0   | -0.345547 | -1.414214   |
| 1   | 0.086387  | -0.707107   |
| 2   | 0.518321  | 0.0         |
| 3   | 1.382189  | 0.707107    |
| 4   | -1.64135  | 1.414214    |
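
As a quick sanity check on the output above, the following sketch rebuilds the same DataFrame and inspects what the scaler learned. `mean_`, `scale_`, and `inverse_transform` are existing `StandardScaler` members; the variable names `standardized` and `restored` are just illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'income': [40000, 50000, 60000, 80000, 10000],
    'loan_amount': [10000, 15000, 20000, 25000, 30000],
})

scaler = StandardScaler()
standardized = scaler.fit_transform(df)

# Statistics learned from each column during fit
print(scaler.mean_)   # per-column means (48000 and 20000)
print(scaler.scale_)  # per-column population standard deviations

# Each standardized column has mean ~0 and standard deviation ~1
print(standardized.mean(axis=0))
print(standardized.std(axis=0))

# The transformation is reversible
restored = scaler.inverse_transform(standardized)
print(np.allclose(restored, df.to_numpy()))  # True
```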


### 🔹 What is **Min-Max Scaling**?

**Min-Max Scaling** (also called normalization) is a technique to rescale your features to a **fixed range**, usually **\[0, 1]**.

---

### ✅ **Formula:**

$$
x' = \frac{x - \text{min}(x)}{\text{max}(x) - \text{min}(x)}
$$

Where:

* $x$ = original value
* $x'$ = scaled value
* $\text{min}(x)$, $\text{max}(x)$ = min and max of the feature
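
The formula can be checked by hand. The sketch below applies it to a small hypothetical feature and compares the result with `MinMaxScaler`. It also shows the scaler's `feature_range` parameter for a target range other than \[0, 1]; the range `(-1, 1)` is picked here purely as an example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature values, chosen only for illustration
x = np.array([10.0, 20.0, 25.0, 50.0])

# Manual min-max scaling to [0, 1]
manual = (x - x.min()) / (x.max() - x.min())
print(manual)  # values: 0, 0.25, 0.375, 1

# Same result from scikit-learn (expects a 2D array: rows x features)
sk_01 = MinMaxScaler().fit_transform(x.reshape(-1, 1)).ravel()
print(np.allclose(manual, sk_01))  # True

# Rescaling to another range, here [-1, 1]
sk_pm1 = MinMaxScaler(feature_range=(-1, 1)).fit_transform(x.reshape(-1, 1)).ravel()
print(sk_pm1)  # values: -1, -0.5, -0.25, 1
```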

### ⚖️ **When to Use Min-Max Scaling:**

* When you want values strictly between **0 and 1**
* When using algorithms that rely on distances or gradient-based optimization, such as:

  * **KNN**
  * **K-Means**
  * **Neural Networks**

### ❗ Caution:

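* **Sensitive to outliers**: the minimum and maximum are set by the most extreme values, so a single outlier can compress all other values into a narrow band.
* If the data contains outliers, prefer **RobustScaler**. **Standardization** (Z-score) is less distorted than Min-Max Scaling, but it is still affected because the mean and standard deviation shift with extreme values.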

In [14]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

In [15]:

# Sample data
data = {
    'income': [20000, 30000, 50000, 80000],
    'loan_amount': [10000, 12000, 25000, 30000]
}

In [16]:
df = pd.DataFrame(data)

In [17]:
# Create scaler object
scaler = MinMaxScaler()

In [18]:
# Fit and transform the data
normalized_data = scaler.fit_transform(df)

In [19]:
# Create a DataFrame with normalized data
normalized_df = pd.DataFrame(normalized_data, columns=df.columns)

In [21]:
normalized_df

|     | income   | loan_amount |
| --- | -------- | ----------- |
| 0   | 0.0      | 0.0         |
| 1   | 0.166667 | 0.1         |
| 2   | 0.5      | 0.75        |
| 3   | 1.0      | 1.0         |


### 🔹 What is **MaxAbsScaler**?

**MaxAbsScaler** scales each feature by **its maximum absolute value**, transforming the data into the range **\[-1, 1]**. Unlike MinMaxScaler, it **doesn’t shift or center** the data — only scales.

### ✅ **Formula:**

$$
x' = \frac{x}{\max(|x|)}
$$

Where:

* $x$ = original value
* $x'$ = scaled value
* $\max(|x|)$ = maximum absolute value of the feature

### ⚖️ **When to Use MaxAbsScaler:**

* Data contains **both positive and negative values**
* You want to **preserve sparsity** (works well with sparse data)
* Models sensitive to scale but not to centering (e.g., SVMs, linear models)
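
The sparsity point can be illustrated with a SciPy sparse matrix. Because MaxAbsScaler only divides each column by a constant, zeros stay zeros and the result remains sparse. The matrix below is hypothetical and exists only for this sketch.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical sparse data: most entries are zero
X = sparse.csr_matrix(np.array([
    [0.0, 4.0, 0.0],
    [-2.0, 0.0, 0.0],
    [0.0, -8.0, 5.0],
]))

scaled = MaxAbsScaler().fit_transform(X)

print(sparse.issparse(scaled))  # True: the output is still sparse
print(scaled.nnz == X.nnz)      # True: no zero entry became non-zero
print(scaled.toarray())         # each column divided by its max absolute value
```

Centering would break this property, since subtracting a non-zero mean turns every zero into a non-zero value.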

### ❗ Not suitable when:

* Data needs centering (mean = 0)
* There are extreme outliers (use **RobustScaler** instead)


In [24]:
from sklearn.preprocessing import MaxAbsScaler
import pandas as pd

# Sample data (can include negative values)
data = {
    'income': [20000, -30000, 50000, -80000],
    'loan_amount': [10000, -12000, 25000, -30000]
}

df = pd.DataFrame(data)

# Create scaler
scaler = MaxAbsScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(df)

# Create DataFrame with scaled values
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df

|     | income | loan_amount |
| --- | ------ | ----------- |
| 0   | 0.25   | 0.333333    |
| 1   | -0.375 | -0.4        |
| 2   | 0.625  | 0.833333    |
| 3   | -1.0   | -1.0        |


### 🔹 What is **RobustScaler**?

**RobustScaler** is a scaling technique that **reduces the influence of outliers** by using the **median** and **interquartile range (IQR)** instead of mean and standard deviation.

### ✅ **Formula:**

$$
x' = \frac{x - \text{median}(x)}{\text{IQR}(x)}
$$

Where:

* $x$ = original value
* $x'$ = scaled value
* $\text{median}(x)$ = 50th percentile
* $\text{IQR}(x) = Q_3 - Q_1$ = 75th percentile − 25th percentile
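
A minimal sketch of the same computation by hand, assuming NumPy's default (linear-interpolation) percentiles, which match `RobustScaler`'s defaults. The feature values are hypothetical, with `100.0` playing the role of an outlier.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical feature with one outlier (100.0)
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

median = np.median(x)                # 3.0
q1, q3 = np.percentile(x, [25, 75])  # 2.0 and 4.0
iqr = q3 - q1                        # 2.0

manual = (x - median) / iqr
print(manual)  # values: -1, -0.5, 0, 0.5, 48.5

# Same result from scikit-learn; center_ and scale_ hold the median and IQR
scaler = RobustScaler()
sk = scaler.fit_transform(x.reshape(-1, 1)).ravel()
print(np.allclose(manual, sk))         # True
print(scaler.center_, scaler.scale_)
```

The outlier still maps to a large value (48.5), but it no longer influences how the other four values are scaled.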

### ⚖️ **When to Use RobustScaler:**

* Your data contains **outliers**
* You want a scale that is **resistant to extreme values**
* Works well with algorithms sensitive to scale (e.g., linear models, SVM, KNN)

### 🧠 Comparison:

| Scaler         | Handles Outliers? | Range                     | Centers Data   |
| -------------- | ----------------- | ------------------------- | -------------- |
| StandardScaler | ❌ No              | Unbounded (mean 0, std 1) | Yes (mean=0)   |
| MinMaxScaler   | ❌ No              | \[0, 1]                   | No             |
| MaxAbsScaler   | ❌ No              | \[-1, 1]                  | No             |
| RobustScaler   | ✅ Yes             | Unbounded (scaled by IQR) | Yes (median=0) |


In [26]:
from sklearn.preprocessing import RobustScaler
import pandas as pd

# Sample data with outliers
data = {
    'income': [20000, 30000, 50000, 800000],  # Outlier: 800000
    'loan_amount': [10000, 12000, 25000, 100000]  # Outlier: 100000
}

df = pd.DataFrame(data)

# Create RobustScaler
scaler = RobustScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(df)

# Create DataFrame with scaled values
scaled_dfr = pd.DataFrame(scaled_data, columns=df.columns)

scaled_dfr

|     | income    | loan_amount |
| --- | --------- | ----------- |
| 0   | -0.095238 | -0.263566   |
| 1   | -0.047619 | -0.20155    |
| 2   | 0.047619  | 0.20155     |
| 3   | 3.619048  | 2.527132    |
