## Robust Scaling

**Robust Scaling uses the median and interquartile range (IQR) instead of the mean and standard deviation, which makes the transformation robust to outliers and skewed distributions. It is well suited to datasets that contain extreme values or noise.**

- Reduces the influence of outliers by centering on the median
- Scales by the IQR, which captures the spread of the middle 50% of the data
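As a formula, each value x of a feature is shifted by that feature's median and divided by its IQR. Here Q1 and Q3 are the 25th and 75th percentiles, which is the default `quantile_range=(25.0, 75.0)` of scikit-learn's `RobustScaler`:

$$
x_{\text{scaled}} = \frac{x - \operatorname{median}(x)}{\mathrm{IQR}(x)}, \qquad \mathrm{IQR}(x) = Q_3(x) - Q_1(x)
$$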

## Code Example: Performing Robust Scaling

- Uses the median and interquartile range (IQR) for scaling instead of the mean and standard deviation.
- Robust to outliers and skewed data distributions.
- Centers the data on the median and scales it by the spread of the central 50% of values.
- `print(scaled_df.head())` displays the first five rows of the robustly scaled data.
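To see what the scaler computes, here is a minimal from-scratch sketch in NumPy. It is not scikit-learn's actual implementation. The helper names `robust_scale` and `standard_scale` and the toy data are hypothetical. Two further assumptions are meant to mirror `RobustScaler`'s defaults: percentiles use linear interpolation, and columns with an IQR of zero are left unscaled. The sketch compares robust scaling with ordinary z-score standardization on a column containing one extreme outlier.

```python
import numpy as np

def robust_scale(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: center each column on its median and divide by its IQR."""
    median = np.median(x, axis=0)
    q1, q3 = np.percentile(x, [25, 75], axis=0)  # linear interpolation by default
    iqr = q3 - q1
    # Assumption: constant columns (IQR == 0) are left unscaled to avoid division by zero
    iqr = np.where(iqr == 0, 1.0, iqr)
    return (x - median) / iqr

def standard_scale(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: z-score (subtract the mean, divide by the population std)."""
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (x - x.mean(axis=0)) / std

# Toy column: nine ordinary values plus one extreme outlier
data = np.array([[10.0], [11.0], [12.0], [13.0], [14.0],
                 [15.0], [16.0], [17.0], [18.0], [1000.0]])

robust = robust_scale(data)
standard = standard_scale(data)
print("robust  :", np.round(robust.ravel(), 2))
print("standard:", np.round(standard.ravel(), 2))
```

With robust scaling, the nine ordinary values still span -1.0 to about 0.78, and the outlier maps to 219.0. Z-scoring squeezes those nine values into a narrow band near -0.33, because the outlier inflates both the mean and the standard deviation.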

```python
import pandas as pd
import numpy as np

# Load the dataset and keep only the numeric columns
df = pd.read_csv('SampleFile.csv')
df = df.select_dtypes(include=np.number)
df.head()
```

|   | LotArea | MSSubClass |
|---|--------:|-----------:|
| 0 | 8450 | 60 |
| 1 | 9600 | 20 |
| 2 | 11250 | 60 |
| 3 | 9550 | 70 |
| 4 | 14260 | 60 |


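```python
from sklearn.preprocessing import RobustScaler

# Fit the scaler (learns each column's median and IQR), then transform the data
scaler = RobustScaler()
scaled_data = scaler.fit_transform(df)

# fit_transform returns a NumPy array; wrap it back into a DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=df.columns, index=df.index)
print(scaled_df.head())
```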

```
    LotArea  MSSubClass
0 -0.254076         0.2
1  0.030015        -0.6
2  0.437624         0.2
3  0.017663         0.4
4  1.181201         0.2
```
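`fit_transform` learns each column's median and IQR from whatever data it receives. In a modelling workflow, the scaler is therefore usually fit on the training split only and then reused on new data. The sketch below uses a small hypothetical array, not `SampleFile.csv`. It shows four things: the learned `center_` (medians) and `scale_` (IQRs) attributes, `transform` applied to held-out rows, and `inverse_transform` mapping the scaled values back to the original units.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[8000.0, 20.0],
                    [9000.0, 50.0],
                    [10000.0, 60.0],
                    [11000.0, 70.0],
                    [50000.0, 190.0]])  # last row is an outlier
X_test = np.array([[9500.0, 60.0]])

# Defaults: with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0)
scaler = RobustScaler()
scaler.fit(X_train)  # learn medians and IQRs from the training data only

print("medians (center_):", scaler.center_)
print("IQRs    (scale_): ", scaler.scale_)

X_test_scaled = scaler.transform(X_test)                   # reuse the training statistics
X_test_restored = scaler.inverse_transform(X_test_scaled)  # map back to original units
print(X_test_scaled)
print(X_test_restored)
```

Here the learned medians are 10000 and 60 and the IQRs are 2000 and 20. The extreme row (50000, 190) counts only as one more value above the median, so it does not inflate the IQRs the way it would inflate a standard deviation.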


## Advantages

- Improves Model Performance: Putting features on comparable scales can improve the accuracy of scale-sensitive models.
- Speeds Up Convergence: Helps gradient-based algorithms train faster and more reliably.
- Prevents Feature Bias: Stops features with large numeric ranges from dominating, so every feature can contribute.
- Increases Numerical Stability: Reduces the risk of overflow and underflow in computations.
- Facilitates Algorithm Compatibility: Makes data suitable for distance- and gradient-based models such as SVMs, KNN and neural networks.
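The feature-bias point can be made concrete for a distance-based model such as KNN. The sketch below takes the first two rows of the sample data, before and after robust scaling, with values copied from the outputs above. It then measures how much each feature contributes to the squared Euclidean distance between the two rows. The helper `squared_share` is hypothetical and exists only for illustration.

```python
import numpy as np

# First two rows of the sample data before scaling: [LotArea, MSSubClass]
raw_a = np.array([8450.0, 60.0])
raw_b = np.array([9600.0, 20.0])

# The same two rows after RobustScaler (values copied from the printed output)
scaled_a = np.array([-0.254076, 0.2])
scaled_b = np.array([0.030015, -0.6])

def squared_share(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fraction of the squared Euclidean distance per feature."""
    sq = (a - b) ** 2
    return sq / sq.sum()

print("raw distance:   ", np.linalg.norm(raw_a - raw_b))
print("raw shares:     ", np.round(squared_share(raw_a, raw_b), 4))
print("scaled distance:", np.linalg.norm(scaled_a - scaled_b))
print("scaled shares:  ", np.round(squared_share(scaled_a, scaled_b), 4))
```

Before scaling, LotArea accounts for about 99.9% of the squared distance, purely because its values are in the thousands. After robust scaling, the MSSubClass difference contributes about 89%. Each feature's share now reflects how large the difference is relative to that feature's IQR, not its raw units.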
