Normalization:

Definition:
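Normalization means rescaling a vector so that its length (norm) becomes 1.
It preserves the vector's direction but discards its magnitude.
It is often used with distance-based methods such as KNN and SVM, with cosine similarity, and with text data.
In scikit-learn, Normalizer applies this to each row (sample) independently, not to each column.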

Types of Norms:

1. L1 Normalization (Manhattan norm)
Formula: x' = x / (Σ |x|)
Property: The sum of absolute values = 1
When to use: Sparse data, text features (bag-of-words, TF-IDF).

2. L2 Normalization (Euclidean norm)
Formula: x' = x / √(Σ x²)
Property: The Euclidean length = 1
When to use: Regression, SVM, embeddings, when direction matters more than magnitude.

3. Max Norm (Infinity norm)
Formula: x' = x / max(|x|)
Property: The largest absolute value = 1
When to use: When each vector's largest absolute value should be scaled to 1, so that every entry falls in [-1, 1].
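
The three formulas above can be checked by hand with NumPy. This is a minimal sketch: the vector x = (3, -4) is a hypothetical example chosen only because its norms are easy to verify (L1 = 7, L2 = 5, max = 4).

```python
import numpy as np

# Hypothetical example vector, chosen only for illustration.
x = np.array([3.0, -4.0])

# L1: divide by the sum of absolute values (here 7).
l1_scaled = x / np.sum(np.abs(x))

# L2: divide by the Euclidean length (here 5).
l2_scaled = x / np.sqrt(np.sum(x ** 2))

# Max: divide by the largest absolute value (here 4).
max_scaled = x / np.max(np.abs(x))

print(l1_scaled)   # approximately [0.4286, -0.5714]
print(l2_scaled)   # approximately [0.6, -0.8]
print(max_scaled)  # approximately [0.75, -1.0]
```

In all three cases the signs and the ratio between the entries are unchanged; only the overall scale differs.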


In [3]:
from sklearn.preprocessing import Normalizer
import pandas as pd
import numpy as np

In [4]:
# Sample data for the normalization examples
data5 = {'Feature1': [1, 2, 3, 4, 5],
         'Feature2': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data5)
df

,Feature1,Feature2
0,1,10
1,2,20
2,3,30
3,4,40
4,5,50


In [None]:
# Apply L2 normalization: each row is divided by its own Euclidean length
scaler = Normalizer(norm='l2')  # norm options: 'l1', 'l2', 'max'
normalized_data = scaler.fit_transform(df)
normalized_df = pd.DataFrame(normalized_data, columns=df.columns)
normalized_df

,Feature1,Feature2
0,0.099504,0.995037
1,0.099504,0.995037
2,0.099504,0.995037
3,0.099504,0.995037
4,0.099504,0.995037
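
Every row in this output is identical because Normalizer rescales each row on its own, and each row of the sample data is a multiple of (1, 10), so all rows point in the same direction. The following sketch rebuilds the same data so that it runs standalone and checks both facts; the names row_lengths and expected_row are introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

# Same sample data as in the cells above.
df = pd.DataFrame({'Feature1': [1, 2, 3, 4, 5],
                   'Feature2': [10, 20, 30, 40, 50]})

normalized = Normalizer(norm='l2').fit_transform(df)

# Each row should have Euclidean length 1 after L2 normalization.
row_lengths = np.linalg.norm(normalized, axis=1)
print(row_lengths)

# Every row should equal (1, 10) / sqrt(101), the shared direction.
expected_row = np.array([1, 10]) / np.sqrt(101)
print(np.allclose(normalized, expected_row))
```

Data whose rows differ in direction, not just in scale, would give rows that differ after normalization.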


In [None]:
# Apply L1 normalization: each row is divided by the sum of its absolute values
scaler = Normalizer(norm='l1')  # norm options: 'l1', 'l2', 'max'
normalized_data = scaler.fit_transform(df)
normalized_df = pd.DataFrame(normalized_data, columns=df.columns)
normalized_df

,Feature1,Feature2
0,0.090909,0.909091
1,0.090909,0.909091
2,0.090909,0.909091
3,0.090909,0.909091
4,0.090909,0.909091
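
The third norm from the list, the max norm, is available through the same class with norm='max'. The sketch below rebuilds the sample data so that it runs on its own; max_scaler and max_normalized_df are names introduced here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import Normalizer

# Same sample data as in the cells above.
df = pd.DataFrame({'Feature1': [1, 2, 3, 4, 5],
                   'Feature2': [10, 20, 30, 40, 50]})

# Max normalization: each row is divided by its largest absolute value.
max_scaler = Normalizer(norm='max')
max_normalized_df = pd.DataFrame(max_scaler.fit_transform(df),
                                 columns=df.columns)
print(max_normalized_df)
```

Since Feature2 is the largest entry in every row and is ten times Feature1, each row should come out as roughly (0.1, 1.0).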
