# Feature Transformation Techniques in Machine Learning
This notebook demonstrates encoding categorical variables and discretization/binning techniques with examples.

## 🔹 2. Encoding Categorical Variables
Most machine learning models expect numeric inputs, so categorical columns must first be converted into a numerical format.

### Label Encoding
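Assigns a unique integer to each category. scikit-learn's `LabelEncoder` numbers categories in sorted (alphabetical) order, so Blue = 0, Green = 1 and Red = 2. It is intended for encoding target labels: the integers imply an order that a nominal feature like `Color` does not have.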

In [1]:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})
le = LabelEncoder()
data['Color_Label'] = le.fit_transform(data['Color'])
print(data)


   Color  Color_Label
0    Red            2
1   Blue            0
2  Green            1
3   Blue            0
4    Red            2
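To see which integer stands for which category, the fitted `LabelEncoder` keeps the sorted categories in its `classes_` attribute (the position in that array is the code), and `inverse_transform` maps codes back to labels. A minimal sketch on the same colors:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Red", "Blue", "Green", "Blue", "Red"])

print(le.classes_.tolist())                     # ['Blue', 'Green', 'Red'] (index = code)
print(codes.tolist())                           # [2, 0, 1, 0, 2]
print(le.inverse_transform([2, 0]).tolist())    # ['Red', 'Blue']
```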


### One-Hot Encoding
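Creates one binary column per category; in each row, only the column for that row's category is set. Recent pandas versions (2.0 and later) return booleans, as shown below; pass `dtype=int` to `pd.get_dummies` to get 0/1 integers instead.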

In [2]:

data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})
data_encoded = pd.get_dummies(data, columns=['Color'])
print(data_encoded)


   Color_Blue  Color_Green  Color_Red
0       False        False       True
1        True        False      False
2       False         True      False
3        True        False      False
4       False        False       True
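`pd.get_dummies` only creates columns for the categories present in the data it is given, so a category that first appears at prediction time gets no column of its own. A minimal sketch with scikit-learn's `OneHotEncoder`, which learns the categories once during `fit`; with `handle_unknown="ignore"`, an unseen category becomes an all-zero row (`Purple` is a made-up example value):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([["Red"], ["Blue"], ["Green"]])
test = np.array([["Green"], ["Purple"]])  # "Purple" never appeared during fit

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train)

print(enc.categories_[0].tolist())         # ['Blue', 'Green', 'Red']
# transform() returns a sparse matrix by default; densify it for printing
print(enc.transform(test).toarray().tolist())
# [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```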


### Ordinal Encoding
Assigns integers according to an explicit category order that you supply, so the codes preserve the ranking Small < Medium < Large.

In [3]:

from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']})
encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
data['Size_Ordinal'] = encoder.fit_transform(data[['Size']])
print(data)


     Size  Size_Ordinal
0   Small           0.0
1  Medium           1.0
2   Large           2.0
3  Medium           1.0
4   Small           0.0
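The same ordered mapping can be written in plain pandas with an explicit dictionary. A minimal sketch (the `size_order` dictionary is our own illustration): unlike `OrdinalEncoder`, `Series.map` returns integers rather than floats here, and any size missing from the dictionary becomes `NaN` instead of raising an error.

```python
import pandas as pd

data = pd.DataFrame({"Size": ["Small", "Medium", "Large", "Medium", "Small"]})

# Explicit ranking, lowest first; the order of the codes carries meaning
size_order = {"Small": 0, "Medium": 1, "Large": 2}
data["Size_Ordinal"] = data["Size"].map(size_order)

print(data["Size_Ordinal"].tolist())  # [0, 1, 2, 1, 0]
```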


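The next encoders come from the `category_encoders` library and are demonstrated on a small City/Salary dataset.

In [11]: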
import pandas as pd

data = pd.DataFrame({
    'City': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Delhi', 'Mumbai', 'Bangalore', 'Chennai'],
    'Salary': [50000, 55000, 48000, 52000, 53000, 51000, 47000, 50000]
})
print(data)


        City  Salary
0     Mumbai   50000
1      Delhi   55000
2  Bangalore   48000
3    Chennai   52000
4      Delhi   53000
5     Mumbai   51000
6  Bangalore   47000
7    Chennai   50000


In [12]:
%pip install category_encoders


Successfully installed category_encoders-2.8.1


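### Binary Encoding
Converts each category to an integer code and spreads that code's binary digits across columns, so k categories need only about log2(k) columns instead of k (here, 3 columns for 4 cities).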

In [13]:
import category_encoders as ce

binary_encoder = ce.BinaryEncoder(cols=['City'])
binary_encoded = binary_encoder.fit_transform(data['City'])
data_binary = pd.concat([data, binary_encoded], axis=1)
print(data_binary)


        City  Salary  City_0  City_1  City_2
0     Mumbai   50000       0       0       1
1      Delhi   55000       0       1       0
2  Bangalore   48000       0       1       1
3    Chennai   52000       1       0       0
4      Delhi   53000       0       1       0
5     Mumbai   51000       0       0       1
6  Bangalore   47000       0       1       1
7    Chennai   50000       1       0       0
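The columns above can be reproduced by hand: number the cities 1, 2, 3, 4 in order of first appearance (Mumbai = 1 = `001`, Chennai = 4 = `100`) and write each number in binary. A minimal sketch of that idea (the `binary_encode` helper is hypothetical, not part of `category_encoders`); it assumes no missing values, and the library's exact digit count and handling of unknown or missing categories may differ from this sketch.

```python
import pandas as pd

def binary_encode(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: 1-based ordinal codes spread across binary-digit columns."""
    codes, _ = pd.factorize(series)   # 0-based codes in order of first appearance
    codes = codes + 1                 # start at 1, matching the output above
    n_bits = int(codes.max()).bit_length()
    columns = {
        # most significant bit goes in the first column
        f"{series.name}_{i}": (codes >> (n_bits - 1 - i)) & 1
        for i in range(n_bits)
    }
    return pd.DataFrame(columns, index=series.index)

cities = pd.Series(
    ["Mumbai", "Delhi", "Bangalore", "Chennai", "Delhi", "Mumbai", "Bangalore", "Chennai"],
    name="City",
)
encoded = binary_encode(cities)
print(encoded)
```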


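### Hash Encoding
Hashes each category into a fixed number of columns (`n_components`), no matter how many categories exist. Distinct categories can collide: in the output below, Delhi and Bangalore land in the same column.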

In [17]:
hash_encoder = ce.HashingEncoder(cols=['City'], n_components=3)  # hash into 3 output columns
hash_encoded = hash_encoder.fit_transform(data['City'])
data_hash = pd.concat([data, hash_encoded], axis=1)
print(data_hash)


        City  Salary  col_0  col_1  col_2
0     Mumbai   50000      0      0      1
1      Delhi   55000      0      1      0
2  Bangalore   48000      0      1      0
3    Chennai   52000      1      0      0
4      Delhi   53000      0      1      0
5     Mumbai   51000      0      0      1
6  Bangalore   47000      0      1      0
7    Chennai   50000      1      0      0
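The hashing trick itself is small: hash the category string, take the result modulo the number of columns, and set that column to 1. A minimal sketch (the `hash_encode` helper is hypothetical); it uses MD5 because Python's built-in `hash()` of a string is randomized per process, so buckets would change between runs. Because the exact hash input and digest handling may differ from `ce.HashingEncoder`, the columns chosen here will not necessarily match the output above.

```python
import hashlib
import pandas as pd

def hash_encode(series: pd.Series, n_components: int = 3) -> pd.DataFrame:
    """Hypothetical helper: map each category to one of n_components columns."""
    rows = []
    for value in series.astype(str):
        # MD5 gives the same bucket on every run, unlike the salted built-in hash()
        digest = hashlib.md5(value.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_components
        row = [0] * n_components
        row[bucket] = 1
        rows.append(row)
    return pd.DataFrame(rows, index=series.index,
                        columns=[f"col_{i}" for i in range(n_components)])

cities = pd.Series(["Mumbai", "Delhi", "Bangalore", "Chennai", "Delhi", "Mumbai"], name="City")
encoded = hash_encode(cities)
print(encoded)
```

Memory stays fixed at `n_components` columns even for thousands of categories; the price is that colliding categories become indistinguishable to the model.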


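### Target Encoding
Replaces each category with a smoothed mean of the target (`Salary`). With only two rows per city, the smoothing pulls every value strongly toward the overall mean salary of 50,750.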

In [15]:
target_encoder = ce.TargetEncoder(cols=['City'])
target_encoded = target_encoder.fit_transform(data['City'], data['Salary'])
data_target = pd.concat([data, target_encoded.rename(columns={'City': 'City_TargetEncoded'})], axis=1)
print(data_target)


        City  Salary  City_TargetEncoded
0     Mumbai   50000        50714.537234
1      Delhi   55000        51211.015961
2  Bangalore   48000        50288.984039
3    Chennai   52000        50785.462766
4      Delhi   53000        51211.015961
5     Mumbai   51000        50714.537234
6  Bangalore   47000        50288.984039
7    Chennai   50000        50785.462766
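The smoothing can be sketched as a blend of each city's mean salary with the overall mean, where the weight on the city mean is a sigmoid of the city's row count. A minimal sketch (the `target_encode` helper is hypothetical); the parameter values `min_samples_leaf=20` and `smoothing=10` are an assumption about `ce.TargetEncoder`'s defaults, but with them this formula reproduces the `City_TargetEncoded` column above. For example, Mumbai has 2 rows, so its weight is 1 / (1 + e^1.8) ≈ 0.142 and its encoding is 50750 - 0.142 × 250 ≈ 50714.54.

```python
import numpy as np
import pandas as pd

def target_encode(category: pd.Series, target: pd.Series,
                  min_samples_leaf: int = 20, smoothing: float = 10.0) -> pd.Series:
    """Hypothetical helper: blend each category's target mean with the global mean."""
    prior = target.mean()                                   # overall mean (50,750 here)
    stats = target.groupby(category).agg(["mean", "count"])
    # Sigmoid weight: categories with few rows stay close to the prior
    weight = 1.0 / (1.0 + np.exp(-(stats["count"] - min_samples_leaf) / smoothing))
    encoding = prior * (1.0 - weight) + stats["mean"] * weight
    return category.map(encoding)

data = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Bangalore", "Chennai", "Delhi", "Mumbai", "Bangalore", "Chennai"],
    "Salary": [50000, 55000, 48000, 52000, 53000, 51000, 47000, 50000],
})
data["City_TargetEncoded"] = target_encode(data["City"], data["Salary"])
print(data)
```

Because each row's own salary contributes to its encoding, fitting and transforming on the same rows leaks the target into the feature; in practice the encoder is fit on training rows only and then applied to validation and test data.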
