# Encoding Types in Machine Learning
---
This notebook walks through common ways to encode categorical and text features, each with a **small worked example and its output table**.

## Covered Encoding Types
1. One-Hot Encoding
2. Label Encoding
3. Ordinal Encoding
4. Binary Encoding
5. Frequency / Count Encoding
6. Target / Mean Encoding
7. Hash Encoding
8. Learned Embeddings
9. Text Encodings (BoW, TF-IDF)


## 1) One-Hot Encoding
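**Nominal categories → independent binary columns.** Each distinct value gets its own 0/1 column, so no order is implied, but the column count grows with the number of categories.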

**Example Data**:

| id | color  |
|---:|:-------|
| 1  | Red    |
| 2  | Blue   |
| 3  | Green  |
| 4  | Red    |


In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"id": [1, 2, 3, 4], "color": ["Red", "Blue", "Green", "Red"]})
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(df[["color"]])
pd.DataFrame(encoded, columns=encoder.get_feature_names_out(["color"]))

|   | color_Blue | color_Green | color_Red |
|--:|-----------:|------------:|----------:|
| 0 | 0.0        | 0.0         | 1.0       |
| 1 | 1.0        | 0.0         | 0.0       |
| 2 | 0.0        | 1.0         | 0.0       |
| 3 | 0.0        | 0.0         | 1.0       |
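
The mechanics behind the table can be sketched in plain Python, including what happens when a category that was never seen during fitting shows up later. This is a minimal illustration, not scikit-learn's implementation; the helper names `fit_categories` and `one_hot` are hypothetical.

```python
def fit_categories(values):
    """Return the sorted list of distinct categories seen during fitting."""
    return sorted(set(values))


def one_hot(value, categories):
    """Return a 0/1 row with a single 1 at the category's position.

    A value that is not in `categories` produces an all-zero row.
    """
    return [1 if value == cat else 0 for cat in categories]


train_colors = ["Red", "Blue", "Green", "Red"]
categories = fit_categories(train_colors)

# "Purple" was not in the training data, so its row is all zeros.
rows = [one_hot(c, categories) for c in ["Red", "Purple"]]

print(categories)  # ['Blue', 'Green', 'Red']
print(rows)        # [[0, 0, 1], [0, 0, 0]]
```

`OneHotEncoder` raises an error on unseen categories by default; passing `handle_unknown="ignore"` gives the all-zero behaviour sketched here.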


## 2) Label Encoding
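**Each category → integer value.** The integers are arbitrary codes, but many models read them as ordered magnitudes, so use this with care on feature columns. scikit-learn's `LabelEncoder` is intended for the target `y`; for input features, `OrdinalEncoder` is the usual choice.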

| id | fruit  |
|---:|:------|
| 1  | Apple  |
| 2  | Banana |
| 3  | Cherry |
| 4  | Apple  |

In [2]:
from sklearn.preprocessing import LabelEncoder

df2 = pd.DataFrame({"id": [1,2,3,4], "fruit": ["Apple","Banana","Cherry","Apple"]})
le = LabelEncoder()
df2["fruit_label"] = le.fit_transform(df2["fruit"])
df2

|   | id | fruit  | fruit_label |
|--:|---:|:-------|------------:|
| 0 | 1  | Apple  | 0           |
| 1 | 2  | Banana | 1           |
| 2 | 3  | Cherry | 2           |
| 3 | 4  | Apple  | 0           |
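
The codes in the table follow the sorted order of the distinct labels. A minimal plain-Python sketch of that assignment, with the inverse mapping, makes it clear that the numbers carry no meaning beyond identity. The variable names are illustrative only.

```python
fruits = ["Apple", "Banana", "Cherry", "Apple"]

# Sorted distinct values play the role of LabelEncoder's `classes_`.
classes = sorted(set(fruits))
to_code = {label: code for code, label in enumerate(classes)}

codes = [to_code[f] for f in fruits]
decoded = [classes[c] for c in codes]  # inverse mapping back to labels

print(to_code)  # {'Apple': 0, 'Banana': 1, 'Cherry': 2}
print(codes)    # [0, 1, 2, 0]
print(decoded)  # ['Apple', 'Banana', 'Cherry', 'Apple']
```

Banana receiving 1 does not make it "between" Apple and Cherry; the order is purely alphabetical.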


## 3) Ordinal Encoding
**Ordered categories → integer scale.**

| id | size   |
|---:|:-------|
| 1  | Small  |
| 2  | Medium |
| 3  | Large  |
| 4  | Small  |

In [3]:
from sklearn.preprocessing import OrdinalEncoder

df3 = pd.DataFrame({"id":[1,2,3,4], "size":["Small","Medium","Large","Small"]})
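# Pass the category order explicitly; otherwise the encoder sorts the values alphabetically.
encoder = OrdinalEncoder(categories=[["Small","Medium","Large"]])
# fit_transform returns a 2-D array of shape (n_rows, 1); flatten it for a single column.
df3["size_ord"] = encoder.fit_transform(df3[["size"]]).ravel()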
df3

|   | id | size   | size_ord |
|--:|---:|:-------|---------:|
| 0 | 1  | Small  | 0.0      |
| 1 | 2  | Medium | 1.0      |
| 2 | 3  | Large  | 2.0      |
| 3 | 4  | Small  | 0.0      |
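
Why the explicit `categories` list matters can be shown with a small plain-Python comparison. `OrdinalEncoder` with its default `categories="auto"` sorts the values it finds, which for these strings means alphabetical order. The `encode` helper below is hypothetical and only illustrates the ranking step.

```python
sizes = ["Small", "Medium", "Large", "Small"]

explicit_order = ["Small", "Medium", "Large"]  # the intended ranking
alphabetical_order = sorted(set(sizes))        # ['Large', 'Medium', 'Small']


def encode(values, order):
    """Map each value to its position in `order`."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]


print(encode(sizes, explicit_order))      # [0, 1, 2, 0]
print(encode(sizes, alphabetical_order))  # [2, 1, 0, 2]
```

With alphabetical order the scale is reversed: Small gets the largest code and Large the smallest.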


## 4) Binary Encoding
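**Category → integer → binary digits.** Each category gets an integer code, and each bit of that code becomes a column, so the number of columns grows roughly with log2 of the number of categories rather than linearly.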

In [4]:
!pip install category_encoders -q
import category_encoders as ce

df4 = pd.DataFrame({"id":[1,2,3,4], "city":["Delhi","Mumbai","Pune","Delhi"]})
encoder = ce.BinaryEncoder(cols=["city"])
df4_enc = encoder.fit_transform(df4)
df4_enc



|   | id | city_0 | city_1 |
|--:|---:|-------:|-------:|
| 0 | 1  | 0      | 1      |
| 1 | 2  | 1      | 0      |
| 2 | 3  | 1      | 1      |
| 3 | 4  | 0      | 1      |
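
The category → integer → bits pipeline can be sketched in a few lines of plain Python. Two assumptions are made here so that the result lines up with the table above: codes are assigned in order of first appearance, and they start at 1. This is an illustration, not a description of the `category_encoders` internals.

```python
def binary_encode(values):
    """Encode each value as the binary digits of its integer code."""
    # Assumption: codes follow first appearance and start at 1.
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)

    # Number of bits needed to write the largest code.
    n_bits = max(codes.values()).bit_length()

    rows = []
    for v in values:
        bits = format(codes[v], f"0{n_bits}b")  # zero-padded binary string
        rows.append([int(b) for b in bits])
    return rows


cities = ["Delhi", "Mumbai", "Pune", "Delhi"]
print(binary_encode(cities))  # [[0, 1], [1, 0], [1, 1], [0, 1]]
```

Three categories fit in two columns here, where one-hot encoding would need three.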


## 5) Frequency / Count Encoding
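Replace each category with how often it appears in the data, either as a raw count or as a relative frequency. Categories that occur equally often become indistinguishable after encoding.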

In [5]:
df5 = pd.DataFrame({"id":[1,2,3,4,5], "device":["mobile","desktop","mobile","tablet","mobile"]})
freq_map = df5["device"].value_counts().to_dict()
df5["device_count"] = df5["device"].map(freq_map)
df5

|   | id | device  | device_count |
|--:|---:|:--------|-------------:|
| 0 | 1  | mobile  | 3            |
| 1 | 2  | desktop | 1            |
| 2 | 3  | mobile  | 3            |
| 3 | 4  | tablet  | 1            |
| 4 | 5  | mobile  | 3            |
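
In practice the counts are learned on training data and then applied to new rows, which may contain categories that were never counted. A minimal standard-library sketch of that split follows; the `new_devices` list and the choice of 0 as the fallback are assumptions for illustration.

```python
from collections import Counter

train_devices = ["mobile", "desktop", "mobile", "tablet", "mobile"]
new_devices = ["mobile", "tv"]  # hypothetical new rows; "tv" was never seen

counts = Counter(train_devices)
total = sum(counts.values())

# Unseen categories fall back to 0 (an assumption, other defaults are possible).
count_encoded = [counts.get(d, 0) for d in new_devices]
freq_encoded = [counts.get(d, 0) / total for d in new_devices]

print(count_encoded)  # [3, 0]
print(freq_encoded)   # [0.6, 0.0]
```

With the pandas approach above, `.map(freq_map)` would yield `NaN` for an unseen category, so a `fillna` step would play the same role as the fallback here.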


## 6) Target / Mean Encoding
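Replace each category with the mean of the target variable for that category. Computing the means on the same rows the model trains on leaks the target into the feature, so in practice the mapping is fitted on training folds only and smoothed for rare categories.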

In [6]:
df6 = pd.DataFrame({"id":[1,2,3,4,5], "plan":["basic","pro","basic","pro","basic"], "churn":[1,0,1,0,0]})
mean_map = df6.groupby("plan")["churn"].mean().to_dict()
df6["plan_mean"] = df6["plan"].map(mean_map)
df6

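|   | id | plan  | churn | plan_mean |
|--:|---:|:------|------:|----------:|
| 0 | 1  | basic | 1     | 0.666667  |
| 1 | 2  | pro   | 0     | 0.0       |
| 2 | 3  | basic | 1     | 0.666667  |
| 3 | 4  | pro   | 0     | 0.0       |
| 4 | 5  | basic | 0     | 0.666667  |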
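
With only two or three rows per category, the raw means above are noisy. One common remedy is additive smoothing, which pulls each category mean toward the global mean. The sketch below uses the same data; the smoothing strength `m = 2.0` is an arbitrary assumption, and smoothing alone does not remove the leakage that out-of-fold fitting is meant to address.

```python
from collections import defaultdict

plans = ["basic", "pro", "basic", "pro", "basic"]
churn = [1, 0, 1, 0, 0]
m = 2.0  # smoothing strength (hypothetical value; larger pulls harder to the global mean)

global_mean = sum(churn) / len(churn)  # 0.4

sums = defaultdict(float)
counts = defaultdict(int)
for plan, y in zip(plans, churn):
    sums[plan] += y
    counts[plan] += 1

# smoothed mean = (sum of targets + m * global mean) / (count + m)
smoothed = {p: (sums[p] + m * global_mean) / (counts[p] + m) for p in counts}

for plan, value in smoothed.items():
    print(plan, round(value, 3))  # basic is about 0.56, pro about 0.2
```

Compared with the raw means (0.667 and 0.0), both values move toward the global churn rate of 0.4.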


## 7) Hash Encoding
Map categories into a fixed number of buckets using a hash function. No fitted vocabulary is needed, but distinct categories can collide in the same bucket.

In [7]:
encoder = ce.HashingEncoder(cols=["plan"], n_components=3)
df6_hash = encoder.fit_transform(df6)
df6_hash.head()

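|   | col_0 | col_1 | col_2 | id | churn | plan_mean |
|--:|------:|------:|------:|---:|------:|----------:|
| 0 | 1     | 0     | 0     | 1  | 1     | 0.666667  |
| 1 | 0     | 1     | 0     | 2  | 0     | 0.0       |
| 2 | 1     | 0     | 0     | 3  | 1     | 0.666667  |
| 3 | 0     | 1     | 0     | 4  | 0     | 0.0       |
| 4 | 1     | 0     | 0     | 5  | 0     | 0.666667  |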
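
The bucketing step itself is short enough to sketch with the standard library. MD5 is used here only because it gives the same result on every run (Python's built-in `hash()` is salted per process for strings). The helpers `hash_bucket` and `hash_encode` are hypothetical, and the bucket assignments are not guaranteed to match the library output above.

```python
import hashlib


def hash_bucket(value, n_buckets):
    """Map a string to a bucket index in [0, n_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(value, n_buckets):
    """Return a row with a 1 in the value's bucket and 0 elsewhere."""
    row = [0] * n_buckets
    row[hash_bucket(value, n_buckets)] = 1
    return row


# "enterprise" needs no refitting: any string maps to some bucket.
for plan in ["basic", "pro", "enterprise"]:
    print(plan, hash_encode(plan, 3))
```

Because the output width is fixed up front, new categories never add columns; the price is that two categories may share a bucket.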


## 8) Learned Embeddings (demo idea)
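Each category is mapped to a dense, trainable vector. This is typically done in neural networks via an embedding layer whose weights are learned jointly with the rest of the model, so a full demo needs a training loop.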
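
The lookup step alone, without any training, can still be sketched: an embedding is a table with one row of floats per category, indexed by the category's integer code. In the sketch below the vocabulary, the dimension of 4, and the random initial values are all assumptions; in a real model a framework's embedding layer would hold this table and update it by gradient descent.

```python
import random

vocab = ["basic", "pro", "enterprise"]  # hypothetical category vocabulary
index = {cat: i for i, cat in enumerate(vocab)}
embedding_dim = 4                       # hypothetical vector size

# Randomly initialised table: one row per category. Training would adjust these values.
rng = random.Random(0)
table = [[rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)] for _ in vocab]


def embed(category):
    """Look up the dense vector for a category."""
    return table[index[category]]


batch = [embed(c) for c in ["pro", "basic", "pro"]]
print(len(batch), len(batch[0]))  # 3 4
print(batch[0] == batch[2])       # True: the same category always maps to the same vector
```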

## 9) Text Encoding — Bag of Words
Convert each document into a vector of word counts over a shared vocabulary; word order is discarded.

In [8]:
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["apple banana", "banana carrot"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())

|   | apple | banana | carrot |
|--:|------:|-------:|-------:|
| 0 | 1     | 1      | 0      |
| 1 | 0     | 1      | 1      |
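
TF-IDF, listed at the top of this notebook, reweights these counts so that words appearing in every document contribute less. The sketch below uses the textbook form, term frequency times `log(N / df)`, on the same corpus. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalises each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter

corpus = ["apple banana", "banana carrot"]
docs = [doc.split() for doc in corpus]
vocab = sorted({word for doc in docs for word in doc})

# Document frequency: in how many documents each word appears.
doc_freq = {w: sum(w in doc for doc in docs) for w in vocab}
n_docs = len(docs)

# Textbook IDF (unsmoothed): a word found in every document gets weight 0.
idf = {w: math.log(n_docs / doc_freq[w]) for w in vocab}


def tfidf_row(doc):
    """Term frequency (count / document length) times IDF, per vocabulary word."""
    counts = Counter(doc)
    return [counts[w] / len(doc) * idf[w] for w in vocab]


matrix = [tfidf_row(doc) for doc in docs]

print(vocab)  # ['apple', 'banana', 'carrot']
for row in matrix:
    print([round(x, 3) for x in row])
```

"banana" occurs in both documents, so its weight drops to zero, while "apple" and "carrot" keep a positive weight in the document that contains them.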
