### One-Hot Encoding

One-Hot Encoding is a technique that represents each word as a unique binary vector where only one specific index is 1 and all others are 0, with the vector length equaling the total vocabulary size.

Example: In a vocabulary of {“cat”, “dog”, “fish”}, the word “cat” is $[1, 0, 0]$, “dog” is $[0, 1, 0]$, and “fish” is $[0, 0, 1]$.
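A minimal pure-Python sketch of the example above. It assumes that the order of the vocabulary list fixes each word's index. `build_index`, `one_hot`, and `dot` are illustrative helper names, not a library API. The last line also previews the drawback below: any two distinct one-hot vectors have a dot product of 0.

```python
def build_index(vocabulary):
    """Assign each word a fixed position, in the order given."""
    return {word: i for i, word in enumerate(vocabulary)}


def one_hot(word, index):
    """Return the one-hot vector for `word` (raises KeyError for unseen words)."""
    vector = [0] * len(index)
    vector[index[word]] = 1
    return vector


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


vocabulary = ["cat", "dog", "fish"]
index = build_index(vocabulary)

for word in vocabulary:
    print(word, one_hot(word, index))
# cat [1, 0, 0]
# dog [0, 1, 0]
# fish [0, 0, 1]

# Distinct one-hot vectors are always orthogonal, so no similarity is encoded.
print(dot(one_hot("cat", index), one_hot("dog", index)))  # 0
```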

**Disadvantages**

The main drawback of One-Hot Encoding is that it creates very large, "sparse" vectors (one dimension per vocabulary word). These vectors capture no relationship or similarity between words, because every pair of distinct words is equally far apart. For example, "king" and "queen" are treated as being just as different as "king" and "apple".

### Bag of Words

**Bag of Words (BoW)** is a text representation technique that converts a document into a numerical vector by counting the frequency of each word while completely ignoring its grammar and order.

Example: For the sentences "I love cat" and "I love dog", the BoW representation for the first sentence would be {"I": 1, "love": 1, "cat": 1, "dog": 0}.
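The counting can be sketched in plain Python as follows. The sketch assumes simple whitespace tokenization with no lowercasing or stopword removal. `build_vocabulary` and `bag_of_words` are hypothetical helpers. Words outside the vocabulary are silently dropped, which is one common way to handle them.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect unique tokens in order of first appearance (whitespace split)."""
    vocabulary = []
    for doc in documents:
        for token in doc.split():
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def bag_of_words(doc, vocabulary):
    """Count each vocabulary word in `doc`; tokens not in the vocabulary are ignored."""
    counts = Counter(doc.split())  # Counter returns 0 for missing keys
    return {word: counts[word] for word in vocabulary}


documents = ["I love cat", "I love dog"]
vocabulary = build_vocabulary(documents)
print(vocabulary)  # ['I', 'love', 'cat', 'dog']
print(bag_of_words(documents[0], vocabulary))
# {'I': 1, 'love': 1, 'cat': 1, 'dog': 0}

# Order is lost: with a shared vocabulary, these sentences get identical counts.
words = ["dog", "bites", "man"]
print(bag_of_words("dog bites man", words) == bag_of_words("man bites dog", words))  # True
```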

**Advantages**

- Simple to Implement: Very easy to understand and code for basic text tasks.
- Efficient for Small Datasets: Works well for simple document classification (like Spam vs. Not Spam).
- Fixed-Length Output: Transforms text of any length into a vector of consistent size for machine learning.

**Disadvantages**

- Loses Context: By discarding word order, it cannot distinguish between "Dog bites man" and "Man bites dog."
- High Dimensionality: In large datasets, the vectors become massive and mostly filled with zeros (sparse).
- Frequency Bias: Common words (stopwords) appear most often but usually carry the least meaning.
- Out-of-Vocabulary (OOV) Words: Words that were not in the vocabulary when it was built have no position in the vector, so they are simply ignored.

### N-Grams

**N-grams** are contiguous sequences of $n$ items (words or characters) from a given sample of text. They capture local context by looking at groups of words instead of single terms.

Example: In the sentence "The cat sat", the unigrams ($n=1$) are ["The", "cat", "sat"], the bigrams ($n=2$) are ["The cat", "cat sat"], and the trigram ($n=3$) is ["The cat sat"].
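A short sketch of word-level N-gram extraction (the `ngrams` helper is illustrative). Comparing bigram sets also shows how N-grams keep the local word order that Bag of Words throws away.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    # Yields an empty list when n is larger than the number of tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "The cat sat".split()
print(ngrams(tokens, 1))  # ['The', 'cat', 'sat']
print(ngrams(tokens, 2))  # ['The cat', 'cat sat']
print(ngrams(tokens, 3))  # ['The cat sat']

# Unlike Bag of Words, bigrams tell these two sentences apart.
print(set(ngrams("dog bites man".split(), 2)) == set(ngrams("man bites dog".split(), 2)))  # False
```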

#### Advantages 
- Captures Context: Unlike Bag of Words, it preserves local word order (e.g., "not good" is kept as a single unit).
- Handles Phrases: It can identify common phrases and idioms (e.g., "New York" or "thank you") as distinct entities.
- Predictiveness: Highly useful for next-word prediction (autocomplete), because N-gram counts show which words typically follow others.

#### Disadvantages
- Dimensionality Explosion: As $n$ increases, the number of possible combinations grows exponentially, requiring massive memory.
- Data Sparsity: Many specific N-grams may only appear once in a dataset, making it hard for the model to generalize.
- Limited Long-Range Dependency: It only "sees" a window of $n$ consecutive words. It cannot connect a word at the start of a paragraph to one at the end.

### Term Frequency–Inverse Document Frequency (TF-IDF)

**TF-IDF (Term Frequency–Inverse Document Frequency)** measures how important a word is to a document relative to a collection of documents. Words that appear in many documents (such as stopwords) get lower weight. Words that are frequent in a document but rare across the collection get higher weight.

**Example:** In spam detection, “free” gets a high TF-IDF in spam emails but low in normal emails.
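One common formulation is $\text{tfidf}(t, d) = \text{tf}(t, d) \cdot \log \frac{N}{\text{df}(t)}$. Here $\text{tf}(t, d)$ is the count of term $t$ in document $d$ divided by the length of $d$, $N$ is the number of documents, and $\text{df}(t)$ is the number of documents containing $t$.

The sketch below implements that unsmoothed variant in plain Python. Libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ. The three short emails are made-up inputs for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    (unsmoothed; real libraries usually smooth and normalize).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


emails = [
    "claim the free prize now",
    "the meeting moved to friday",
    "the report is due friday",
]
scores = tf_idf(emails)

print(scores[0]["the"])               # 0.0   ("the" is in every email)
print(round(scores[0]["free"], 3))    # 0.22  (only in the spam-like email)
print(round(scores[1]["friday"], 3))  # 0.081 (in two of the three emails)
```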

**Advantages:** Simple, fast, interpretable; reduces the impact of common stopwords.

**Disadvantages:** Ignores word order and meaning; can’t capture context or synonyms.

### Word2Vec

Word2Vec is a shallow neural-network model that learns dense vector representations of words so that words with similar meanings have similar vectors.

Example: vector(“king”) − vector(“man”) + vector(“woman”) ≈ vector(“queen”), and “car” ends up close to “vehicle”.
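The sketch below illustrates only the vector arithmetic behind the analogy. It uses hand-made three-dimensional toy vectors. These are NOT output from a trained Word2Vec model, whose vectors are learned from a large corpus and typically have on the order of 100 to 300 dimensions. The hypothetical `analogy` helper computes vec(king) − vec(man) + vec(woman) and returns the nearest remaining word by cosine similarity.

```python
import math

# Hand-made toy vectors, NOT trained Word2Vec output. The dimensions loosely mean
# "royalty", "maleness", "is a vehicle", and were chosen so the analogy works.
toy_vectors = {
    "king":    [0.9, 0.9, 0.0],
    "queen":   [0.9, 0.1, 0.0],
    "man":     [0.1, 0.9, 0.0],
    "woman":   [0.1, 0.1, 0.0],
    "car":     [0.0, 0.5, 0.9],
    "vehicle": [0.0, 0.4, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Solve a : b :: c : ? as vec(b) - vec(a) + vec(c), then pick the nearest word."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})  # exclude the query words
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "king", "woman", toy_vectors))                  # queen
print(round(cosine(toy_vectors["car"], toy_vectors["vehicle"]), 2))  # 0.99
```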

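**Advantages:** Captures semantic meaning and relationships. Dense, low-dimensional vectors often improve downstream ML performance compared with sparse counts.

**Disadvantages:** Context-independent (“bank” gets the same vector whether it means a riverbank or a financial institution); needs lots of data and training time.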

**CBOW (Continuous Bag of Words)** predicts a target word from its surrounding context words.
Example: Given the context “I love ___ food”, CBOW predicts “Indian”.

**Skip-gram** predicts surrounding context words from a given target word.
Example: Given the word “Indian”, Skip-gram predicts context words such as “love” and “food”.

**Key difference:**

- CBOW = context ➝ word (faster, good for frequent words)
- Skip-gram = word ➝ context (slower, better for rare words)
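The sketch below shows how each model turns a sentence into training examples with a sliding window. The window size and the helper names are assumptions for illustration. It covers only example generation. The neural network that actually learns the vectors, and training tricks such as negative sampling, are left out.

```python
def context_indices(i, n_tokens, window):
    """Positions within `window` of position i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(n_tokens, i + window + 1)) if j != i]


def cbow_examples(tokens, window):
    """CBOW: one (context words, target word) example per position."""
    return [([tokens[j] for j in context_indices(i, len(tokens), window)], target)
            for i, target in enumerate(tokens)]


def skipgram_pairs(tokens, window):
    """Skip-gram: one (target word, context word) pair per neighbor."""
    return [(target, tokens[j])
            for i, target in enumerate(tokens)
            for j in context_indices(i, len(tokens), window)]


tokens = "I love Indian food".split()
print(cbow_examples(tokens, window=2)[2])  # (['I', 'love', 'food'], 'Indian')
print([p for p in skipgram_pairs(tokens, window=1) if p[0] == "Indian"])
# [('Indian', 'love'), ('Indian', 'food')]
```

Skip-gram produces one example per (word, neighbor) pair, while CBOW produces one per position. This is consistent with Skip-gram being the slower of the two.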

**Average Word2Vec** represents a sentence or document by averaging the Word2Vec vectors of all its words.

**Example:**
Sentence: “I love Indian food” → vector = average(vector(“I”), vector(“love”), vector(“Indian”), vector(“food”))

**Why use it:** Simple, fast way to get a fixed-length text embedding.

**Limitation:** Loses word order and nuance (treats all words equally).
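A minimal sketch of Average Word2Vec, assuming the word vectors are already available as a dict. The 2-d vectors here are toy values, not trained embeddings. Skipping out-of-vocabulary words is one common design choice. Returning `None` when no word is known is another assumption of this sketch.

```python
def average_vector(tokens, vectors):
    """Mean of the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]  # skip OOV words
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


# Toy 2-d vectors for illustration only (not trained embeddings).
toy_vectors = {
    "I": [0.0, 1.0],
    "love": [1.0, 1.0],
    "Indian": [2.0, 0.0],
    "food": [1.0, 2.0],
}
print(average_vector("I love Indian food".split(), toy_vectors))  # [1.0, 1.0]
```

Because averaging ignores order, “food Indian love I” gets exactly the same vector, which is the limitation noted above.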