# 🧠 Word2Vec – CBOW Model 

---

## 📌 What is Word2Vec?

**Word2Vec** is a popular technique in Natural Language Processing (NLP) for learning **vector representations of words**, known as **word embeddings**.

These embeddings capture semantic meaning: words that appear in similar contexts end up with similar vector representations. For example, the vectors for `"king"` and `"queen"` typically lie close together in the embedding space.
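
"Close" is usually measured with cosine similarity. Here is a minimal sketch of that idea; the 3-dimensional vectors in `toy_embeddings` are invented for illustration only and do not come from any trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked for this example (not real Word2Vec output).
toy_embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.75, 0.20],
    "apple": [0.10, 0.20, 0.90],
}

sim_king_queen = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
sim_king_apple = cosine_similarity(toy_embeddings["king"], toy_embeddings["apple"])

print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs apple: {sim_king_apple:.3f}")
```

With these made-up numbers, `king` scores much higher against `queen` than against `apple`. Real embeddings have far more dimensions (often 100 to 300), but the comparison works the same way.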

---

## ⚙️ Word2Vec Architectures

Word2Vec has two main architectures:

1. **CBOW (Continuous Bag of Words)**  
   - Predicts the **target (center) word** from a given **context (surrounding words)**  
   - Faster to train than Skip-Gram  
   - Tends to give slightly better representations for frequent words

2. **Skip-Gram**  
   - Predicts **context words** given a **target word**  
   - Represents rare words better and tends to work well with smaller datasets  
   - Slower to train, since each target word produces one prediction task per context word

---

## 📦 Word2Vec: Pretrained vs. Train From Scratch

Word2Vec can be used in two primary ways:

1. **Pretrained Word2Vec Models**  
   - Trained on massive corpora such as the Google News dataset (about 100 billion words)  
   - Ready-to-use embeddings for general NLP tasks  
   - Example: Google’s Word2Vec (`GoogleNews-vectors-negative300.bin`), which provides 300-dimensional vectors

2. **Training From Scratch**  
   - Ideal for domain-specific tasks or custom vocabulary  
   - Gives you full control over the corpus and training process  
   - Libraries like Gensim make this easy

---

## 🔍 How Does CBOW Work? – Step-by-Step Explanation

The **Continuous Bag of Words (CBOW)** model predicts the **target (center) word** using its **context (surrounding words)**.

---

### 🧠 Let’s Take a Simple Example:

**Sentence:**  
`The quick brown fox jumps over the lazy dog`

Let’s say we set `window size = 2`. This means we look at **2 words before and 2 words after** the center word.

Now, suppose our **target (center) word** is:  
➡️ `"brown"`

Then the **context words** (after lowercasing) are:  
➡️ `["the", "quick", "fox", "jumps"]`
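
To make the windowing concrete, here is a small sketch that builds every (context, target) pair for the example sentence. The helper name `cbow_pairs` is hypothetical, and lowercasing plus whitespace splitting is an assumed, deliberately simple tokenization.

```python
def cbow_pairs(tokens, window_size=2):
    """Yield (context_words, target_word) for every position in tokens."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window_size):i]      # up to window_size words before
        right = tokens[i + 1:i + 1 + window_size]     # up to window_size words after
        yield left + right, target

sentence = "The quick brown fox jumps over the lazy dog"
tokens = sentence.lower().split()  # assumption: lowercase + whitespace tokenization

pairs = list(cbow_pairs(tokens, window_size=2))
for context, target in pairs:
    print(f"{target!r:>8} <- {context}")
```

For the target `"brown"` this produces the four context words listed above. Words near the start or end of the sentence simply get a shorter context.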

---

### 🛠️ How CBOW Processes This:

1. **Input Layer**:  
   - Each context word is represented as a one-hot vector, which selects that word's row (its embedding) from the input weight matrix
   - So we have 4 input vectors: one for each context word

2. **Hidden Layer**:  
   - The context word embeddings are averaged (or summed) together
   - This gives a single vector representing the combined context; word order inside the window is ignored, hence "bag of words"

3. **Output Layer**:  
   - The context vector is multiplied by a weight matrix and passed through a softmax function
   - This outputs a probability distribution over all words in the vocabulary

4. **Prediction**:  
   - The model tries to **maximize the probability of the actual center word** ("brown" in our case)
   - During training, the weights are updated by gradient descent so that this probability increases
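
The four steps can be sketched as a single forward pass in NumPy. This is an illustrative sketch, not a reference implementation: the names `W_in` and `W_out`, the embedding size `D = 5`, and the random initialization are all assumptions made for the example.

```python
import numpy as np

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))                      # 8 unique words
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 5                             # D = embedding size (arbitrary choice)

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))        # input embeddings, one row per word
W_out = rng.normal(scale=0.1, size=(D, V))       # output weights, one column per word

context = ["the", "quick", "fox", "jumps"]
target = "brown"

# 1. Input layer: one one-hot row per context word.
one_hot = np.zeros((len(context), V))
for row, word in enumerate(context):
    one_hot[row, word_to_id[word]] = 1.0

# Multiplying a one-hot vector by W_in just picks out that word's row.
context_vectors = one_hot @ W_in                 # shape (4, D)

# 2. Hidden layer: average the context embeddings into one vector.
hidden = context_vectors.mean(axis=0)            # shape (D,)

# 3. Output layer: one score per vocabulary word, then softmax.
scores = hidden @ W_out                          # shape (V,)
exp_scores = np.exp(scores - scores.max())       # subtract max for numerical stability
probs = exp_scores / exp_scores.sum()

# 4. Prediction: probability assigned to the true center word, and its loss.
p_target = probs[word_to_id[target]]
loss = -np.log(p_target)

print(f"P({target!r} | context) = {p_target:.4f}, loss = {loss:.4f}")
```

Because the weights are still random and small, the distribution is close to uniform (about 1/8 per word here). Training is what pushes the probability of `"brown"` up for this context.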

---

### 🧪 This Process Repeats Across the Corpus

CBOW slides a window across the text and learns embeddings by predicting the center word each time. Over many iterations, the model **learns word meanings based on their context**.
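
Below is a minimal training-loop sketch for that sliding-window process, using a full softmax and plain stochastic gradient descent on the example sentence. The hyperparameters (`D`, `learning_rate`, `epochs`) are arbitrary assumptions, and real implementations typically replace the full softmax with negative sampling or hierarchical softmax to scale to large vocabularies.

```python
import numpy as np

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = [word_to_id[w] for w in tokens]

V, D = len(vocab), 10                            # D is an arbitrary embedding size
window_size, learning_rate, epochs = 2, 0.2, 300

rng = np.random.default_rng(42)
W_in = rng.normal(scale=0.1, size=(V, D))        # input embeddings
W_out = rng.normal(scale=0.1, size=(D, V))       # output weights

epoch_losses = []
for epoch in range(epochs):
    total_loss = 0.0
    for i, target_id in enumerate(ids):
        # Slide the window: context = neighbours of position i.
        ctx_ids = ids[max(0, i - window_size):i] + ids[i + 1:i + 1 + window_size]

        # Forward pass: average context embeddings, score, softmax.
        hidden = W_in[ctx_ids].mean(axis=0)
        scores = hidden @ W_out
        exp_scores = np.exp(scores - scores.max())
        probs = exp_scores / exp_scores.sum()
        total_loss += -np.log(probs[target_id])

        # Backward pass: gradient of cross-entropy w.r.t. the scores.
        d_scores = probs.copy()
        d_scores[target_id] -= 1.0
        grad_W_out = np.outer(hidden, d_scores)
        grad_hidden = W_out @ d_scores           # computed before W_out changes

        # SGD update; each context row receives an equal share of the gradient.
        W_out -= learning_rate * grad_W_out
        np.subtract.at(W_in, ctx_ids, learning_rate * grad_hidden / len(ctx_ids))

    epoch_losses.append(total_loss / len(ids))

print(f"average loss, first epoch: {epoch_losses[0]:.3f}")
print(f"average loss, last epoch:  {epoch_losses[-1]:.3f}")
```

After training, the rows of `W_in` are the learned word embeddings. On a single sentence the model mostly memorizes, so this only demonstrates the mechanics; useful embeddings need a much larger corpus.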

---

✅ **Goal of CBOW**:  
Words appearing in **similar contexts** should have **similar embeddings** — this is how semantic relationships are learned!
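
One common way to state this goal formally is as maximizing the average log-probability of each center word given its context. The notation below is an assumption introduced for this note: $T$ is the number of words in the corpus, $c$ the window size, $v_w$ the input embedding of word $w$, and $u_w$ the corresponding output vector.

$$
J = \frac{1}{T} \sum_{t=1}^{T} \log p\left(w_t \mid w_{t-c}, \dots, w_{t-1}, w_{t+1}, \dots, w_{t+c}\right)
$$

$$
p\left(w_t \mid \text{context}\right) = \frac{\exp\left(u_{w_t}^{\top} \bar{v}_t\right)}{\sum_{w \in V} \exp\left(u_w^{\top} \bar{v}_t\right)},
\qquad
\bar{v}_t = \frac{1}{2c} \sum_{\substack{-c \le j \le c \\ j \ne 0}} v_{w_{t+j}}
$$

Two words that occur with similar contexts are pushed toward predicting the same center words, which is what pulls their embeddings together. Near the edges of a sentence the window holds fewer than $2c$ words, so the average is taken over the words that are actually present.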
