<h1 style="text-align:center">FEATURE SCALING</h1>

Feature scaling means bringing all input features to a similar range so that no feature dominates others just because of its scale.

### ❌ Without scaling:

Slow convergence of gradient-based training

Distance calculations dominated by the features with the largest values

Model biased toward large-value features

### ✅ With scaling:

Faster training

Stable gradients

Fair contribution of all features
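
To make the effect of raw scales concrete, here is a minimal sketch using two made-up (Age, Salary) points. The values and the names `p`, `q`, and `share` are assumptions chosen for illustration only. It measures how much of the squared Euclidean distance comes from each feature.

```python
import numpy as np

# Two illustrative samples as (Age, Salary); values are assumed for this sketch.
p = np.array([20.0, 20000.0])
q = np.array([35.0, 25000.0])

diff = p - q
dist = np.sqrt(np.sum(diff ** 2))      # Euclidean distance on the raw features
share = diff ** 2 / np.sum(diff ** 2)  # fraction of squared distance per feature

print("distance:", dist)
print("share of squared distance (Age, Salary):", share)
```

With these values, nearly all of the distance comes from Salary, so a distance-based model would effectively ignore the 15-year difference in Age.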

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = {
'Age': [20, 22, 25, 30, 35],
'Salary': [20000, 25000, 30000, 50000, 70000]
}


df = pd.DataFrame(data)
df

|   | Age | Salary |
|---|-----|--------|
| 0 | 20 | 20000 |
| 1 | 22 | 25000 |
| 2 | 25 | 30000 |
| 3 | 30 | 50000 |
| 4 | 35 | 70000 |


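### fit()
Computes the mean (μ) and standard deviation (σ) of each column.

### transform()
Applies the standardization formula to each column, using the μ and σ learned by fit():
Z = (X − μ) / σ

Centers each feature at a mean of 0

Rescales each feature to a standard deviation of 1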
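
The formula can be checked by hand. The sketch below recomputes the z-scores with NumPy and compares them with `StandardScaler`. The array `X` simply restates the Age and Salary values from the example, and the names `mu`, `sigma`, `Z_manual`, and `Z_sklearn` are illustrative. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and Salary values from the example, as a float array.
X = np.array([
    [20, 20000],
    [22, 25000],
    [25, 30000],
    [30, 50000],
    [35, 70000],
], dtype=float)

mu = X.mean(axis=0)      # per-column mean
sigma = X.std(axis=0)    # per-column population std (ddof=0)

Z_manual = (X - mu) / sigma                    # Z = (X - mu) / sigma
Z_sklearn = StandardScaler().fit_transform(X)  # same computation via scikit-learn

print(np.allclose(Z_manual, Z_sklearn))
```

Note that pandas' `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), so it would give slightly different values unless `ddof=0` is passed.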

In [6]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_scaled = sc.fit_transform(df)  # fit() and transform() in one step; returns a NumPy array

In [7]:
X_scaled.mean(axis=0)  # approximately 0 per column (not displayed: only the last expression is shown)
X_scaled.std(axis=0)   # 1 per column

array([1., 1.])

### Min-max scaling
X_scaled = (X − X_min) / (X_max − X_min)

Rescales each feature to the range [0, 1]

Preserves the relative spacing between values within each feature
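
As a quick check of the formula, the sketch below applies it column by column with NumPy alone. The array `X` restates the Age and Salary values from the example, and the names `X_min`, `X_max`, and `X_manual` are illustrative.

```python
import numpy as np

# Age and Salary values from the example, as a float array.
X = np.array([
    [20, 20000],
    [22, 25000],
    [25, 30000],
    [30, 50000],
    [35, 70000],
], dtype=float)

X_min = X.min(axis=0)  # per-column minimum
X_max = X.max(axis=0)  # per-column maximum

# X_scaled = (X - X_min) / (X_max - X_min), applied to each column
X_manual = (X - X_min) / (X_max - X_min)

print(X_manual)
```

Because the minimum and maximum define the range, a single extreme value squeezes all the other values of that feature into a narrow band.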

In [8]:
from sklearn.preprocessing import MinMaxScaler


mm = MinMaxScaler()
X_minmax = mm.fit_transform(df)


pd.DataFrame(X_minmax, columns=df.columns)

|   | Age | Salary |
|---|-----|--------|
| 0 | 0.0 | 0.0 |
| 1 | 0.133333 | 0.1 |
| 2 | 0.333333 | 0.2 |
| 3 | 0.666667 | 0.6 |
| 4 | 1.0 | 1.0 |


In [13]:
X = df[['Age', 'Salary']] # features
y = [0, 1, 0, 1, 1] # dummy target (example)

In [14]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out 20% of the rows (1 of 5 here) as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

sc = StandardScaler()
sc.fit(X_train)  # learn mean and std from the training data only
X_train_scaled = sc.transform(X_train)
X_test_scaled = sc.transform(X_test)  # reuse the training statistics; do not refit on test data

In [15]:
print(X_train_scaled)


[[ 1.34164079  1.43207802]
 [-0.4472136  -0.65094455]
 [-1.34164079 -1.1717002 ]
 [ 0.4472136   0.39056673]]


In [16]:
import pandas as pd

X_train_scaled_df = pd.DataFrame(
    X_train_scaled,
    columns=X_train.columns
)

X_train_scaled_df


|   | Age | Salary |
|---|-----|--------|
| 0 | 1.341641 | 1.432078 |
| 1 | -0.447214 | -0.650945 |
| 2 | -1.341641 | -1.1717 |
| 3 | 0.447214 | 0.390567 |
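
To show what "learn only from training data" means in practice, the sketch below inspects the statistics stored by a fitted `StandardScaler` (its `mean_` and `scale_` attributes) and applies them by hand to an unseen row. The train/test arrays are written out by hand and are an assumption for this sketch, not the output of `train_test_split`; the name `manual` is illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hand-written split, assumed for illustration: four training rows, one test row.
X_train = np.array([
    [35, 70000],
    [25, 30000],
    [20, 20000],
    [30, 50000],
], dtype=float)
X_test = np.array([[22, 25000]], dtype=float)

sc = StandardScaler().fit(X_train)  # statistics come from the training rows only

print("training mean:", sc.mean_)   # per-column mean learned by fit()
print("training std: ", sc.scale_)  # per-column std learned by fit()

# transform() on the test row reuses those training statistics.
manual = (X_test - sc.mean_) / sc.scale_
print(np.allclose(manual, sc.transform(X_test)))
```

Scaled test values are not guaranteed to have mean 0 or standard deviation 1, because the test rows played no part in computing μ and σ.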
