## 📌 What Are N-Grams?

**N-Grams** are contiguous sequences of *n* words (more generally, tokens) taken from a text. They help preserve **word order** and **local context**, which are crucial for understanding meaning in Natural Language Processing (NLP).

N-Grams are commonly categorized as:
- **Unigram (n = 1):** Single words  
- **Bigram (n = 2):** Pairs of consecutive words  
- **Trigram (n = 3):** Triplets of consecutive words
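
The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the helper name `ngrams` is hypothetical, and it assumes lowercasing plus whitespace splitting is an adequate tokenizer for the example sentence.

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences in `text` as a list of strings."""
    # Assumption: lowercase + whitespace split is enough tokenization here.
    tokens = text.lower().split()
    # Slide a window of width n across the tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "Food is not good"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(unigrams)  # ['food', 'is', 'not', 'good']
print(bigrams)   # ['food is', 'is not', 'not good']
print(trigrams)  # ['food is not', 'is not good']
```

A text with fewer than *n* words simply yields an empty list.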

They are widely used in tasks such as **language modeling**, **text classification**, and **sentiment analysis**, especially where word sequence impacts meaning.

---

### 🔍 Why Not Just Use Bag of Words?

While **Bag of Words (BoW)** captures word frequency, it completely ignores the **order** of words. As a result, sentences with opposite meanings can end up with almost the same representation.

#### Consider:

- **Sentence 1**: `"Food is good"`  
- **Sentence 2**: `"Food is not good"`

#### Their BoW vectors might look like this:

| Sentence   | food | is | good | not |
|------------|------|----|------|-----|
| Sentence 1 |  1   | 1  |  1   |  0  |
| Sentence 2 |  1   | 1  |  1   |  1  |

Despite their opposite sentiment, the two vectors differ in only one position (`not`), so BoW treats the sentences as nearly identical.
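
To put a number on "nearly identical", the sketch below rebuilds the two count vectors by hand and compares them with cosine similarity. Cosine similarity is only one possible measure and is an assumption of this example, as are the helper names `bow_vector` and `cosine_similarity`.

```python
import math
from collections import Counter

sentences = ["Food is good", "Food is not good"]
vocabulary = ["food", "is", "good", "not"]  # same column order as the table


def bow_vector(text):
    """Count how often each vocabulary word occurs in `text`."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vec1, vec2 = (bow_vector(s) for s in sentences)
similarity = cosine_similarity(vec1, vec2)

print(vec1)                  # [1, 1, 1, 0]
print(vec2)                  # [1, 1, 1, 1]
print(round(similarity, 3))  # 0.866
```

A similarity this close to 1 for two sentences with opposite sentiment is the weakness that word order is meant to address.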

---

### 🧠 N-Grams in Action: "Food is good" vs "Food is not good"

Let’s see how N-Grams help capture **meaning through word combinations**, which Bag of Words misses.

---

### ✅ Input Sentences:

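1. `"Food is good"`  
2. `"Food is not good"`

Processing steps:

- Remove stop words (like `"is"`), using a stop list that keeps `"not"`. Note that scikit-learn's built-in `stop_words="english"` list includes `"not"` and would discard the negation.
- Extract both **unigrams and bigrams** using `ngram_range=(1, 2)`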

---

### ✂️ After Stop Word Removal:

- **Sentence 1:** `"Food good"`  
  → Unigrams: `["food", "good"]`  
  → Bigrams: `["food good"]`

- **Sentence 2:** `"Food not good"`  
  → Unigrams: `["food", "not", "good"]`  
  → Bigrams: `["food not", "not good"]`
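
The two manual steps (drop stop words, then form n-grams from what remains) might look like this in plain Python. The one-word `STOP_WORDS` set and both helper names are assumptions made for illustration; real stop lists are much longer.

```python
# Assumption: a tiny custom stop list that removes "is" but keeps "not".
STOP_WORDS = {"is"}


def tokens_without_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Join each window of n consecutive tokens into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


for sentence in ["Food is good", "Food is not good"]:
    tokens = tokens_without_stop_words(sentence)
    print(sentence, "->", ngrams(tokens, 1), ngrams(tokens, 2))
```

Because the bigrams are built after filtering, `"food"` and `"good"` become neighbours in Sentence 1 even though `"is"` originally sat between them.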

---

### 🔠 Combined Feature Set:

| Sentence   | food | good | not | food good | food not | not good |
|------------|------|------|-----|-----------|----------|----------|
| Sentence 1 |  1   |  1   |  0  |     1     |    0     |    0     |
| Sentence 2 |  1   |  1   |  1  |     0     |    1     |    1     |

---

### ✅ Insight:
- Both sentences share the unigrams `"food"` and `"good"`, so unigram counts alone barely separate them; **the bigrams add clarity**.
- Sentence 1 contains `"food good"`, which pairs the subject directly with a positive word.
- Sentence 2 contains `"not good"`, which captures the **negation** and signals **negative sentiment**.

✅ N-Grams provide context that helps models detect **tone and meaning** more accurately.

---

## 📊 `ngram_range` Examples 

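For the sentence `"Food is not good"` with no stop-word removal, `CountVectorizer` extracts the following features under different `ngram_range` settings. The lists are shown in sentence order for readability; `CountVectorizer` itself stores feature names alphabetically.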

- **`ngram_range=(1, 1)`** → Unigrams  
  → `["food", "is", "not", "good"]`

- **`ngram_range=(1, 2)`** → Unigrams + Bigrams  
  → `["food", "is", "not", "good", "food is", "is not", "not good"]`

- **`ngram_range=(1, 3)`** → Unigrams + Bigrams + Trigrams  
  → Adds: `"food is not"`, `"is not good"`

- **`ngram_range=(2, 3)`** → Bigrams + Trigrams only  
  → `["food is", "is not", "not good", "food is not", "is not good"]`