# **Encoding in Data Preprocessing**

Encoding means **converting categorical data into numeric form**, because most machine learning models accept only numeric input.

---

## **1. Label Encoding**
Each category is replaced with a **unique integer**.  
Best for **ordinal data** (where order matters), provided the integers follow the category order.

**Before Encoding:**

| Size  |
|--------|
| Small  |
| Medium |
| Large  |

**After Label Encoding:**

| Size  | Encoded |
|--------|----------|
| Small  | 0        |
| Medium | 1        |
| Large  | 2        |

**Use When:** The categories have a **natural order** (like size or rating).  
**Avoid When:** There is **no order** (like colors), because the model may treat 2 > 1 > 0 as a meaningful ranking.
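
The mapping in the table above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library API: the helper name `label_encode` is hypothetical, and it assumes integers are assigned in order of first appearance.

```python
# Minimal label-encoding sketch (pure Python; `label_encode` is a hypothetical helper).
def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer
    return [mapping[v] for v in values], mapping


sizes = ["Small", "Medium", "Large", "Medium"]
encoded, mapping = label_encode(sizes)

print(encoded)  # [0, 1, 2, 1]
print(mapping)  # {'Small': 0, 'Medium': 1, 'Large': 2}
```

The result matches the table only because the data lists the sizes from smallest to largest. Some library encoders, such as scikit-learn's `LabelEncoder`, sort categories alphabetically instead, which would not preserve the size order.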

---

## **2. One-Hot Encoding**
Creates **separate binary columns (0 or 1)** for each category.  
Best for **nominal data** (no order).

**Before Encoding:**

| Color |
|--------|
| Red    |
| Blue   |
| Green  |

**After One-Hot Encoding:**

| Color_Red | Color_Blue | Color_Green |
|------------|-------------|--------------|
| 1          | 0           | 0            |
| 0          | 1           | 0            |
| 0          | 0           | 1            |

- **Red → 1 0 0**  
- **Blue → 0 1 0**  
- **Green → 0 0 1**

    
**Use When:** Categories have **no order**.  
**Avoid When:** The feature has many unique categories, since each one adds a column (hundreds of categories means hundreds of columns).
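
As a rough sketch of how the columns above are built, the following plain-Python helper creates one binary column per distinct category. The name `one_hot_encode` and the `prefix` parameter are assumptions made for illustration.

```python
# Minimal one-hot-encoding sketch (pure Python; `one_hot_encode` is a hypothetical helper).
def one_hot_encode(values, prefix):
    """Return column names and one binary row per input value."""
    # dict.fromkeys keeps distinct categories in first-appearance order.
    categories = list(dict.fromkeys(values))
    columns = [f"{prefix}_{category}" for category in categories]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return columns, rows


colors = ["Red", "Blue", "Green"]
columns, rows = one_hot_encode(colors, prefix="Color")

print(columns)  # ['Color_Red', 'Color_Blue', 'Color_Green']
for row in rows:
    print(row)  # [1, 0, 0] then [0, 1, 0] then [0, 0, 1]
```

The number of columns equals the number of distinct categories, which is why high-cardinality features become expensive with this encoding.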

---

## **3. Ordinal Encoding**
Assigns numbers based on **predefined order or rank**.  
Similar to Label Encoding, but you **decide the order**.

**Before Encoding:**

| Rating  |
|----------|
| Poor     |
| Average  |
| Good     |
| Excellent|

**After Ordinal Encoding:**

| Rating   | Encoded |
|-----------|----------|
| Poor      | 1        |
| Average   | 2        |
| Good      | 3        |
| Excellent | 4        |

**Use When:** Categories have a **clear ranking**.  
**Avoid When:** No natural order exists, since the numbers would imply a ranking that is not there.
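
The difference from label encoding is that the order is written down explicitly. A minimal sketch, assuming the order is passed as a list from lowest to highest (the names `RATING_ORDER` and `ordinal_encode` are hypothetical):

```python
# Minimal ordinal-encoding sketch (pure Python; names are hypothetical).
RATING_ORDER = ["Poor", "Average", "Good", "Excellent"]  # lowest to highest


def ordinal_encode(values, order, start=1):
    """Replace each value with its rank in the user-defined order."""
    rank = {category: i for i, category in enumerate(order, start=start)}
    # An unknown category raises KeyError rather than getting a silent default.
    return [rank[value] for value in values]


ratings = ["Good", "Poor", "Excellent", "Average"]
encoded = ordinal_encode(ratings, RATING_ORDER)

print(encoded)  # [3, 1, 4, 2]
```

Because the ranks come from `RATING_ORDER` rather than from the data, the encoding stays the same regardless of the order in which rows appear.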

---

## **4. Target Encoding (Mean Encoding)**
Replaces each category with the **mean of the target values** observed for that category.  
Useful when there are **many unique categories**, because it adds only one column.

**Before Encoding:**

| City | Sales |
|------|--------|
| A    | 200    |
| B    | 350    |
| A    | 220    |
| B    | 370    |

**After Target Encoding (mean Sales per city: A = (200 + 220) / 2 = 210, B = (350 + 370) / 2 = 360):**

| City | Sales | Encoded |
|------|--------|----------|
| A    | 200    | 210      |
| B    | 350    | 360      |
| A    | 220    | 210      |
| B    | 370    | 360      |

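**Use When:** You have **high-cardinality features** (like many cities or users).  
**Be careful:** Can cause **data leakage**, because the encoding is computed from the target. Split into train and test sets first, then compute the category means on the training set only and reuse them for the test set.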
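
The fit-on-train, apply-to-test pattern can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names are hypothetical, and falling back to the global training mean for categories never seen in training is one possible choice among several.

```python
# Minimal target-encoding sketch (pure Python; helper names are hypothetical).
from collections import defaultdict


def fit_target_means(categories, targets):
    """Compute the mean target per category, plus the overall mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    global_mean = sum(targets) / len(targets)
    return means, global_mean


def apply_target_means(categories, means, fallback):
    """Encode categories with the fitted means; unseen ones get the fallback."""
    return [means.get(category, fallback) for category in categories]


# Fit on training rows only, so test targets never influence the encoding.
train_city = ["A", "B", "A", "B"]
train_sales = [200, 350, 220, 370]
means, global_mean = fit_target_means(train_city, train_sales)

train_encoded = apply_target_means(train_city, means, global_mean)
# City "C" is unseen in training; assumption: use the global training mean.
test_encoded = apply_target_means(["B", "C"], means, global_mean)

print(means)          # {'A': 210.0, 'B': 360.0}
print(train_encoded)  # [210.0, 360.0, 210.0, 360.0]
print(test_encoded)   # [360.0, 285.0]
```

The test rows are encoded with the means learned from the training rows, so no test-set sales values are used at any point.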

---

## **Summary Table**

| Encoding Type     | Best For       | Creates Extra Columns | Keeps Order | When to Avoid |
|--------------------|----------------|------------------------|--------------|----------------|
| Label Encoding     | Ordinal Data   | ❌                     | ✅           | Nominal data   |
| One-Hot Encoding   | Nominal Data   | ✅                     | ❌           | Too many categories |
| Ordinal Encoding   | Ordered Data   | ❌                     | ✅           | No natural order |
| Target Encoding    | High-cardinality data | ❌ | ⚠️ (orders by target mean, not by category) | Small datasets, or when means are not computed on training data only |
