## 🔢 One-Hot Encoding

After applying text preprocessing techniques like **tokenization**, **stop word removal**, **stemming**, and **lemmatization**, the next step is to **convert the text into a numerical format**, because machine learning models can't work with raw text directly.

One of the simplest ways to do this is through **One-Hot Encoding**.

---

### 🔢 What is One-Hot Encoding?

One-hot encoding is a technique to represent **categorical data** — such as words — as **binary vectors**. It’s often the first step in transforming text into a machine-readable format.

Here’s how it works:
- Each unique word in your vocabulary is assigned a distinct index.
- A word is represented as a vector of all 0s, **except** for a 1 at the index of that word.
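The two rules above can be sketched in a few lines of plain Python. The helper name `one_hot` and the three-word toy vocabulary are assumptions made for illustration; they do not come from any particular library.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index must be in the range [0, size)")
    vector = [0] * size
    vector[index] = 1
    return vector


# Hypothetical toy vocabulary: each unique word gets a distinct index.
word_to_index = {"cat": 0, "dog": 1, "bird": 2}

vector = one_hot(word_to_index["dog"], len(word_to_index))
print(vector)  # [0, 1, 0]
```

The vector length always equals the vocabulary size, so adding a word to the vocabulary makes every vector one element longer.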

---

## 🔢 One-Hot Encoding Example 

### 📘 Sentences:
1. `"I love NLP"`
2. `"NLP is fun"`
3. `"I love fun"`

---

### 🧾 Step 1: Build Vocabulary

First, we tokenize all the sentences and collect the unique words in order of first appearance, keeping their original casing:

**Vocabulary:** `["I", "love", "NLP", "is", "fun"]`
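One possible way to build this vocabulary in code is sketched below. It assumes naive whitespace tokenization and a hypothetical helper named `build_vocabulary`; a real pipeline would more likely reuse the tokenizer from the preprocessing steps.

```python
sentences = ["I love NLP", "NLP is fun", "I love fun"]


def build_vocabulary(sentences):
    """Collect unique tokens in order of first appearance."""
    vocabulary = []
    seen = set()
    for sentence in sentences:
        for token in sentence.split():  # naive whitespace tokenization
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary


vocabulary = build_vocabulary(sentences)
print(vocabulary)  # ['I', 'love', 'NLP', 'is', 'fun']
```

This sketch keeps the original casing of each token. Lowercasing every token before the membership check would make variants such as "NLP" and "nlp" share a single entry.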

### 🔢 Step 2: Assign One-Hot Vectors

The vocabulary has five words, so each word is represented as a 5-dimensional binary vector:

| Word   | One-Hot Vector     |
|--------|---------------------|
| I      | `[1, 0, 0, 0, 0]`   |
| love   | `[0, 1, 0, 0, 0]`   |
| NLP    | `[0, 0, 1, 0, 0]`   |
| is     | `[0, 0, 0, 1, 0]`   |
| fun    | `[0, 0, 0, 0, 1]`   |
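The table above can be generated from the vocabulary list. The following is a minimal sketch, assuming the vocabulary order from Step 1; the name `one_hot_vectors` is chosen here for illustration only.

```python
vocabulary = ["I", "love", "NLP", "is", "fun"]

# Map each word to a vector that is all zeros except for a 1
# at the word's own position in the vocabulary.
one_hot_vectors = {
    word: [1 if column == index else 0 for column in range(len(vocabulary))]
    for index, word in enumerate(vocabulary)
}

for word, vector in one_hot_vectors.items():
    print(word, vector)

print(one_hot_vectors["NLP"])  # [0, 0, 1, 0, 0]
```

Stacked together, these vectors form a 5×5 identity matrix, which is why no two words ever share a vector.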

---

### 🧩 Step 3: Represent Each Sentence as a Sequence of Vectors

Now we represent each sentence as a **sequence of one-hot vectors**, preserving both the words and their order:

#### 🔹 "I love NLP":
    [
      [1, 0, 0, 0, 0],  # I
      [0, 1, 0, 0, 0],  # love
      [0, 0, 1, 0, 0]   # NLP
    ]


#### 🔹 "NLP is fun":
    [
      [0, 0, 1, 0, 0],  # NLP
      [0, 0, 0, 1, 0],  # is
      [0, 0, 0, 0, 1]   # fun
    ]


#### 🔹 "I love fun":
    [
      [1, 0, 0, 0, 0],  # I
      [0, 1, 0, 0, 0],  # love
      [0, 0, 0, 0, 1]   # fun
    ]
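The sentence-level encoding can be sketched as follows. The function names `encode_sentence` and `decode_sentence` are hypothetical, and the sketch assumes whitespace tokenization and that every word is already in the vocabulary.

```python
vocabulary = ["I", "love", "NLP", "is", "fun"]
word_to_index = {word: index for index, word in enumerate(vocabulary)}


def encode_sentence(sentence):
    """Turn a sentence into a list of one-hot vectors, one per token."""
    encoded = []
    for token in sentence.split():
        vector = [0] * len(vocabulary)
        # Raises KeyError for a word that is not in the vocabulary.
        vector[word_to_index[token]] = 1
        encoded.append(vector)
    return encoded


def decode_sentence(encoded):
    """Recover the words by locating the single 1 in each vector."""
    return " ".join(vocabulary[vector.index(1)] for vector in encoded)


encoded = encode_sentence("NLP is fun")
print(encoded)  # [[0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
print(decode_sentence(encoded))  # NLP is fun
```

Because the vectors stay in token order, the round trip through `decode_sentence` returns the original sentence, which shows that word order is preserved by this representation.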


---

Each word now has a numeric representation. These vectors can be fed directly to basic models, or serve as a stepping stone toward more advanced embeddings.