Imagine this:

You and your friend are running a race.

You measure distance in meters.

Your friend measures time in seconds.

Now, suppose we want to compare “distance” and “time” in the same formula.

👉 Problem: meters and seconds are on totally different scales.
Meters might be 0–1000, while seconds might be 0–10.
If we just throw both into a formula, the big numbers (meters) will dominate. The small numbers (seconds) will hardly matter.

That’s unfair. Both should matter equally.

🔹 What is Scaling then?

Scaling means:
👉 "Let’s put all features on the same measuring stick so no one feature dominates unfairly."

Example: Convert meters and seconds into a scale from 0 to 1.

1000m → 1.0

0m → 0.0

10s → 1.0

0s → 0.0

Now both are comparable.
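As a rough sketch of the idea above, the snippet below rescales two columns to the 0 to 1 range by hand. The race measurements and the helper name `to_unit_range` are made up for illustration; they are not part of the original example.

```python
import numpy as np

# Hypothetical race measurements (illustrative values only)
distance_m = np.array([0.0, 250.0, 500.0, 1000.0])  # meters
time_s = np.array([0.0, 2.5, 5.0, 10.0])            # seconds


def to_unit_range(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    return (values - values.min()) / (values.max() - values.min())


distance_scaled = to_unit_range(distance_m)
time_scaled = to_unit_range(time_s)

# Both features now live on the same 0-1 scale
print(distance_scaled)
print(time_scaled)
```

After rescaling, 1000 m and 10 s both map to 1.0, so neither feature outweighs the other just because of its unit.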


Standard Scaling:
Standard Scaling (a.k.a. Z-score normalization) is a method to rescale features by subtracting the mean and dividing by the standard deviation of that feature.

📌 Formula:

x′ = (x − μ) / σ

x = original value
μ = mean of the feature
σ = standard deviation of the feature

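If the feature is roughly normally distributed, most scaled values lie between -3 and +3 (about 99.7% of a normal distribution falls within three standard deviations of the mean). Standard scaling does not enforce that range, so outliers can land well outside it.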
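To make the formula concrete, here is a minimal sketch that computes the z-scores by hand with NumPy and compares them with `StandardScaler`. The variable names are illustrative. One detail worth noting: `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also NumPy's default for `.std()`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ages = np.array([25, 30, 35, 40, 45], dtype=float)

mu = ages.mean()      # mean of the feature
sigma = ages.std()    # population standard deviation (ddof=0)

# Apply x' = (x - mu) / sigma to every value
z_manual = (ages - mu) / sigma

# StandardScaler expects a 2-D array: one column per feature
z_sklearn = StandardScaler().fit_transform(ages.reshape(-1, 1)).ravel()

print(z_manual)
print(np.allclose(z_manual, z_sklearn))
```

If you compute the standard deviation with pandas instead, keep in mind that `Series.std()` defaults to `ddof=1`, so the hand-computed values would differ slightly from the scaler's output.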

In [4]:
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

In [5]:
#example data for scaling
data = {'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data)
df

Unnamed: 0,Age,Salary
0,25,50000
1,30,60000
2,35,70000
3,40,80000
4,45,90000


In [None]:
# Standard Scaling
scaler = StandardScaler()

#fit the scaler on data and transform it
scaled_data = scaler.fit_transform(df)

#convert the scaled data back to a DataFrame for better readability
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

#Check the scaled data
scaled_df


#After scaling, each feature has a mean of 0 and a standard deviation of 1. There is no fixed range: the values are z-scores, here between about -1.41 and +1.41.

Unnamed: 0,Age,Salary
0,-1.414214,-1.414214
1,-0.707107,-0.707107
2,0.0,0.0
3,0.707107,0.707107
4,1.414214,1.414214


Min-Max Scaler

Definition: Scales features to a fixed range, usually [0, 1].

Formula: (x - min) / (max - min)

Range: [0, 1] (or another range you set).

When to Use: Useful when you want all values to stay within a fixed boundary.
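The sketch below applies the min-max formula by hand and then shows the "another range you set" case through the `feature_range` parameter of `MinMaxScaler`. The target range (-1, 1) is just an example choice.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

salary = np.array([50000, 60000, 70000, 80000, 90000], dtype=float).reshape(-1, 1)

# Manual version of (x - min) / (max - min)
manual = (salary - salary.min()) / (salary.max() - salary.min())

# Default range is [0, 1]
default_scaled = MinMaxScaler().fit_transform(salary)

# A custom range, here [-1, 1]
custom_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(salary)

print(manual.ravel())
print(default_scaled.ravel())
print(custom_scaled.ravel())
```

Because the minimum and maximum define the whole mapping, a single extreme value can squeeze all the other values into a narrow band.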

In [8]:
from sklearn.preprocessing import MinMaxScaler

In [9]:
#data for the scaling
data2 = {'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data2)
df


Unnamed: 0,Age,Salary
0,25,50000
1,30,60000
2,35,70000
3,40,80000
4,45,90000


In [12]:
#lets scale the data using Min-Max Scaler
scaler = MinMaxScaler()

#fit the scaler on data and transform it
scaled_data = scaler.fit_transform(df)

#convert the scaled data back to a DataFrame for better readability
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
scaled_df

#After the scaling, the features are scaled to a fixed range of 0 to 1.


Unnamed: 0,Age,Salary
0,0.0,0.0
1,0.25,0.25
2,0.5,0.5
3,0.75,0.75
4,1.0,1.0


Max-Abs Scaler

Definition: Scales each feature by dividing by its maximum absolute value.

Formula: x / max(abs(x))

Range: [-1, 1]

When to Use: Good for sparse data (many zeros), because it does not shift/center the data.
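To illustrate the sparse-data point, the following sketch scales a small SciPy sparse matrix. The matrix values are invented for the example. Since the scaler only divides each column by a constant and never subtracts anything, zeros stay zeros and the result remains sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# A small, mostly-zero matrix with both positive and negative entries
X = sparse.csr_matrix(np.array([
    [0.0, -4.0],
    [2.0,  0.0],
    [0.0,  8.0],
    [-5.0, 0.0],
]))

X_scaled = MaxAbsScaler().fit_transform(X)

print(sparse.issparse(X_scaled))   # the output is still a sparse matrix
print(X_scaled.nnz == X.nnz)       # no zero entry became non-zero
print(X_scaled.toarray())          # every value lies within [-1, 1]
```

Each column is divided by its largest absolute value (5 and 8 here), so the signs are preserved and the entry with the largest magnitude in each column becomes -1 or +1.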

In [13]:
from sklearn.preprocessing import MaxAbsScaler

In [14]:
#data for the scaling
data3 = {'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data3)
df

Unnamed: 0,Age,Salary
0,25,50000
1,30,60000
2,35,70000
3,40,80000
4,45,90000


In [16]:
#lets scale the data using Max-Abs Scaler
scaler = MaxAbsScaler()

#fit the scaler on data and transform it
scaled_data = scaler.fit_transform(df)

#convert the scaled data back to a DataFrame for better readability
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
scaled_df

#After the scaling, each feature is divided by its maximum absolute value, so all values lie within -1 to 1. This data has no negative values, so the results fall between 0 and 1.

Unnamed: 0,Age,Salary
0,0.555556,0.555556
1,0.666667,0.666667
2,0.777778,0.777778
3,0.888889,0.888889
4,1.0,1.0


Robust Scaler

Definition: Scales features using statistics that are robust to outliers (median and interquartile range).

Formula: (x - median) / (IQR)

where IQR = Q3 - Q1 (interquartile range)

Range: No fixed range (depends on the data), but less affected by outliers compared to StandardScaler.

When to Use: Best when your data has outliers that would distort scaling with Standard or MinMax.
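The effect of an outlier is easier to see with a concrete case. In the sketch below, one salary is replaced by an extreme value (an invented number, not from the dataset used in this notebook), and the same column is scaled with both `RobustScaler` and `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Four ordinary salaries plus one extreme outlier (illustrative values)
salary = np.array([50000, 60000, 70000, 80000, 1_000_000], dtype=float).reshape(-1, 1)

# Manual version of (x - median) / IQR
median = np.median(salary)                 # 70000
q1, q3 = np.percentile(salary, [25, 75])   # 60000 and 80000
manual = (salary - median) / (q3 - q1)

robust = RobustScaler().fit_transform(salary)
standard = StandardScaler().fit_transform(salary)

print(np.allclose(manual, robust))
print(robust.ravel())
print(standard.ravel())
```

With `RobustScaler`, the four ordinary salaries keep the same spacing they would have without the outlier, because the median and IQR barely react to one extreme value. With `StandardScaler`, the outlier inflates the mean and standard deviation, so the ordinary salaries end up squeezed close together.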

In [17]:
from sklearn.preprocessing import RobustScaler

In [18]:
#data for the scaling
data4 = {'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data4)
df

Unnamed: 0,Age,Salary
0,25,50000
1,30,60000
2,35,70000
3,40,80000
4,45,90000


In [20]:
#lets scale the data using Robust Scaler
scaler = RobustScaler()

#fit the scaler on data and transform it
scaled_data = scaler.fit_transform(df)

#convert the scaled data back to a DataFrame for better readability
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df

#After the scaling, the features are scaled using statistics that are robust to outliers, such as the median and the interquartile range (IQR). This makes it particularly useful for datasets with outliers.

Unnamed: 0,Age,Salary
0,-1.0,-1.0
1,-0.5,-0.5
2,0.0,0.0
3,0.5,0.5
4,1.0,1.0


The choice of scaler depends on what the model requires.

If your model can handle negative values, you can use the standard scaler.

If your model needs non-negative values or a fixed range, you can use the min-max scaler, which maps every feature to [0, 1] by default.

Note: There are many other scaling techniques and algorithms; we have only discussed the most popular and important ones.