## **Text Representation**

### 🔹 **Subtopic: One-Hot Encoding (OHE)**


### **What is One-Hot Encoding (OHE)?**

One-Hot Encoding is a way to **convert text categories or words into numerical vectors** using only 0s and 1s.

* Each unique word or category becomes its own position (feature) in the vector.
* The vector contains a `1` at the position of the word/category being encoded and `0` everywhere else.

---

### **Why Use One-Hot Encoding?**

* Most machine learning models operate on **numbers, not raw text**, so text has to be encoded first.
* OHE is a **simple, straightforward method** to represent categorical data.
* Useful when the vocabulary or categories are **small and fixed**.

---

### **How Does One-Hot Encoding Work?**

If you have a list of categories or words like:

```python
["apple", "banana", "orange"]
```

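OHE gives each word its own column and creates one vector per word. Each row below is the vector for the word in the first column:

| Word   | apple | banana | orange |
| ------ | ----- | ------ | ------ |
| apple  | 1     | 0      | 0      |
| banana | 0     | 1      | 0      |
| orange | 0     | 0      | 1      |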
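
The mapping above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the helper name `one_hot_encode` is hypothetical, and it assumes the order of the vocabulary list decides which position each word occupies.

```python
def one_hot_encode(word, vocabulary):
    """Return a one-hot vector (a list of 0s and 1s) for `word`.

    `vocabulary` is an ordered list of unique words; its order fixes
    the position each word occupies in the vector.
    """
    if word not in vocabulary:
        raise ValueError(f"{word!r} is not in the vocabulary")
    # 1 at the position of the matching word, 0 everywhere else
    return [1 if entry == word else 0 for entry in vocabulary]


vocabulary = ["apple", "banana", "orange"]
vectors = {word: one_hot_encode(word, vocabulary) for word in vocabulary}

for word, vector in vectors.items():
    print(word, vector)
# apple [1, 0, 0]
# banana [0, 1, 0]
# orange [0, 0, 1]
```

Raising an error for unknown words is one possible design choice for this sketch. Another option would be to return an all-zero vector or reserve a dedicated "unknown" position.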

---

### **When to Use OHE?**

* For **categorical labels** (e.g., sentiment: positive, neutral, negative)
* For **small vocabularies** when representing words
* When you want a **simple, interpretable** numeric format

---

### **Limitations**

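* The vector length equals the vocabulary size, so with a very large vocabulary OHE vectors become **very large and sparse** (mostly zeros).
* It does not capture **meaning or similarity** between words: every pair of distinct words looks equally unrelated (e.g., "cat" and "kitten" are treated as completely different).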
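
Both limitations can be seen in a small sketch. The five-word vocabulary and the helper names `one_hot` and `dot` are made up for illustration, and the dot product stands in for a similarity measure only as a simplifying assumption.

```python
def one_hot(word, vocabulary):
    # One position per vocabulary entry; 1 only at the word's own position
    return [1 if entry == word else 0 for entry in vocabulary]


def dot(u, v):
    # Dot product, a common building block of similarity measures
    return sum(a * b for a, b in zip(u, v))


vocabulary = ["cat", "kitten", "car", "tree", "river"]
cat = one_hot("cat", vocabulary)
kitten = one_hot("kitten", vocabulary)
car = one_hot("car", vocabulary)

# Sparsity: the vector length equals the vocabulary size, with a single 1
print(len(cat), sum(cat))  # 5 1

# No notion of similarity: any two different words have dot product 0
print(dot(cat, kitten))  # 0
print(dot(cat, car))     # 0
print(dot(cat, cat))     # 1
```

With a vocabulary of 50,000 words, each vector would have 50,000 positions and still contain only one `1`. "cat" would be exactly as far from "kitten" as from "car".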

---

### **Summary**

| Aspect    | Details                                                        |
| --------- | -------------------------------------------------------------- |
| Purpose   | Convert categories/words to numeric vectors using 0s and 1s    |
| Pros      | Simple, easy to understand                                     |
| Cons      | Large sparse vectors for big vocabularies; no semantic meaning |
| Use cases | Small vocabularies, categorical labels                         |

