# Variable Encoding in Machine Learning 🚀

## Introduction 🧠
Variable encoding is the process of converting different types of data into numerical values so that they can be used in machine learning models. Since most models expect numerical inputs, encoding is essential for categorical data.

## Continuous Variables 📏
A **continuous variable** is something that can be measured and represented on a number line. Example: **Weight in kilograms**.

### Encoding Continuous Variables 🔢
Continuous variables are **already numerical**, so no encoding is necessary. Just leave them as they are! ✅

---

## Discrete / Categorical Variables 📊

![Screenshot (9845).png](attachment:f66f1b7f-4c9d-460d-888e-7d090515fe2e.png)

Categorical variables are divided into:
1. **Nominal Variables** – No inherent order (e.g., Cat, Dog, Zebra)
2. **Ordinal Variables** – Have a meaningful order (e.g., Small, Medium, Large)

### Nominal Categorical Variables 🐶🐱🦓
Example: An animal variable with categories **Cat, Dog, Zebra**.
Since these are **not numerical**, we need to encode them.

#### Nominal Categorical Variable Encoding 🎯
**1️⃣ One-Hot Encoding** 🔥
- Creates one binary (0/1) column per category.
- Example:
  
| Animal | Is Cat? | Is Dog? | Is Zebra? |
|--------|--------|--------|--------|
| Cat    | 1      | 0      | 0      |
| Dog    | 0      | 1      | 0      |
| Zebra  | 0      | 0      | 1      |
| Dog    | 0      | 1      | 0      |
  
Each row has exactly one **hot (1) value**, in the column that matches its category, hence the name **One-Hot Encoding**.
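As a minimal sketch, the table above can be reproduced with pandas. The `df` and `one_hot` names are illustrative, and `pd.get_dummies` is just one of several tools that could be used here.

```python
import pandas as pd

# Same four rows as the table above.
df = pd.DataFrame({"Animal": ["Cat", "Dog", "Zebra", "Dog"]})

# One binary column per category; dtype=int gives 0/1 instead of True/False.
one_hot = pd.get_dummies(df["Animal"], dtype=int)

print(one_hot)
```

The resulting columns are named after the categories (`Cat`, `Dog`, `Zebra`), and every row sums to 1.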

**2️⃣ Dummy Encoding** 🎭
- Similar to one-hot encoding, but drops one column, because that column is redundant: its value can be inferred from the others.
- Example:
  
| Animal | Is Cat? | Is Dog? |
|--------|--------|--------|
| Cat    | 1      | 0      |
| Dog    | 0      | 1      |
| Zebra  | 0      | 0      |
| Dog    | 0      | 1      |
  
If a row has **0 in all columns**, it belongs to the dropped category (Zebra in this case), so no information is lost.
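A possible pandas sketch of dummy encoding, assuming we drop the `Zebra` column to mirror the table above. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Animal": ["Cat", "Dog", "Zebra", "Dog"]})

# Start from the full one-hot encoding, then drop the reference category.
# Dropping "Zebra" mirrors the table above; any single category would do.
dummy = pd.get_dummies(df["Animal"], dtype=int).drop(columns="Zebra")

# Rows that are 0 in every remaining column belong to the dropped category.
is_zebra = dummy.sum(axis=1) == 0

print(dummy)
print(is_zebra)
```

`pd.get_dummies(df["Animal"], drop_first=True)` is a shortcut, but it drops the first category in sorted order (`Cat` here) rather than `Zebra`.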

---

## Ordinal Categorical Variables 📏➡️📐
Example: **Small, Medium, Large** (Coffee sizes ☕)

![Screenshot (9852).png](attachment:cdce0792-5b9e-4a8e-946b-71628a579dd7.png)

### Ordinal Categorical Variable Encoding 🎚️
One-hot encoding could work, but it loses the **order** of the values:

![Screenshot (9853).png](attachment:925b0c1b-1a27-42a1-98c8-66796c428306.png)

Instead, we use **Ordinal Encoding**, which maps each category to an integer that reflects its rank:

![Screenshot (9854).png](attachment:675664ec-fa06-470c-92d6-8be0cbf81829.png)

| Coffee Size | Encoding |
|------------|---------|
| Small      | 0       |
| Medium     | 1       |
| Large      | 2       |

This encoding **preserves the order** in the data. ✅
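One way to sketch this mapping is with an explicit dictionary, so the order is stated by hand rather than inferred. The sample `sizes` data and the `size_order` name are made up for illustration.

```python
import pandas as pd

# Illustrative sample data (not from a real dataset).
sizes = pd.Series(["Small", "Large", "Medium", "Small"])

# Explicit mapping that states the order: Small < Medium < Large.
size_order = {"Small": 0, "Medium": 1, "Large": 2}

# Replace each label with its rank.
encoded = sizes.map(size_order)

print(encoded.tolist())
```

Because the codes follow the order of the sizes, comparisons such as "Large > Medium" still hold after encoding. Note that any label missing from `size_order` would come out as `NaN` with this approach.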

---

## Binary Categorical Variables ⚫⚪
A **binary variable** has only **two categories**, such as **Cat vs. Dog**, **Fraud vs. Not Fraud**, or **Yes vs. No**.

### Binary Categorical Variable Encoding 🔁
- Assign **one class as 1 (positive)** and the other as **0 (negative)**.
- Example:

![Screenshot (9859).png](attachment:4a01f725-aa1d-4462-8088-f3f6b5829b51.png)

| Animal | Encoding |
|--------|---------|
| Cat    | 0       |
| Dog    | 1       |
| Dog    | 1       |
| Cat    | 0       |

Which class gets 1 is arbitrary: you could swap the 0 and 1 labels without any issue, as long as the same mapping is used consistently.
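A short sketch of binary encoding, assuming `Dog` is chosen as the positive class as in the table above. The variable names are illustrative.

```python
import pandas as pd

# Same four rows as the table above.
animals = pd.Series(["Cat", "Dog", "Dog", "Cat"])

# Treat "Dog" as the positive class (1) and "Cat" as the negative class (0).
encoded = (animals == "Dog").astype(int)

# Swapping the labels is just as valid: Cat becomes 1 and Dog becomes 0.
flipped = 1 - encoded

print(encoded.tolist())
print(flipped.tolist())
```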

---

## Summary 📝
| Variable Type | Encoding Method |
|--------------|----------------|
| **Continuous** | No encoding needed ✅ |
| **Nominal** | One-Hot or Dummy Encoding 🔥🎭 |
| **Ordinal** | Ordinal Encoding (integer mapping that preserves order) 🎚️ |
| **Binary** | Assign 0 & 1 for two categories ⚫⚪ |

### Key Takeaways 📌
- **Continuous variables** need no encoding.
- **Nominal variables** use **One-Hot or Dummy encoding**.
- **Ordinal variables** use **ordered numerical mapping**.
- **Binary variables** are a special case using **0/1 encoding**.

By using the right encoding techniques, we can ensure that categorical variables are effectively represented for machine learning models. 🚀

