## 🤖 Word2Vec Intuition

**Word2Vec** is a word embedding algorithm developed by **Google in 2013** that represents words as **dense, continuous vectors** based on their context in large text corpora.

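Rather than relying on word frequency (like BoW or TF-IDF), Word2Vec learns **semantic meaning** by training a shallow neural network on a prediction task: either predicting a word from its surrounding words (CBOW) or predicting the surrounding words from a word (skip-gram). The weights learned for this task become the word vectors.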
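
To make "surrounding words" concrete, the sketch below shows one way (center, context) training pairs could be extracted with a sliding window, in the style of Word2Vec's skip-gram variant. The function name `skipgram_pairs` and the window size are illustrative assumptions; real implementations add refinements such as subsampling of frequent words and negative sampling.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # The context spans up to `window` tokens on each side of the center word.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the king sat on the throne".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:3])  # [('the', 'king'), ('king', 'the'), ('king', 'sat')]
```

Words that keep appearing with the same neighbours (here `"king"` next to `"sat"` and `"the"`) produce similar training pairs, which is what pushes their vectors together.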

---

### 🔍 Key Concepts

- Each word is mapped to a **vector of real numbers** (e.g., 100 or 300 dimensions).
- Words that occur in **similar contexts** are placed **closer together** in the vector space.
- Word2Vec captures **semantic similarity** (e.g., `"France"` and `"Italy"` are close) and **linguistic relationships** (e.g., `"king - man + woman ≈ queen"`).

---

## 🧩 Vocabulary and Feature Space in Word2Vec

In Word2Vec, every **unique word** in the corpus becomes part of the **vocabulary** (in practice, implementations usually drop words that occur fewer than a minimum number of times). Each word is associated with a **trainable vector** of fixed length, typically 100 to 300 dimensions.

Unlike sparse representations (like One-Hot Encoding), Word2Vec learns **dense vectors** that capture **semantic meaning and relationships** between words.
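
As a rough sketch of this setup, the snippet below builds a vocabulary from a tiny made-up corpus and gives each word a small random vector that training would later adjust. The corpus, the 4-dimensional size, and the initialization range are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "the king sat on the throne",
    "the queen sat on the throne",
]

# Vocabulary: every unique token, mapped to an integer index.
vocab = sorted({token for sentence in corpus for token in sentence.split()})
word_to_index = {word: i for i, word in enumerate(vocab)}

# One dense vector per word; training would update these values.
embedding_dim = 4  # illustrative; real models typically use 100 to 300
rng = random.Random(0)
embeddings = [
    [rng.uniform(-0.5, 0.5) for _ in range(embedding_dim)]
    for _ in vocab
]

print(vocab)  # ['king', 'on', 'queen', 'sat', 'the', 'throne']
print(len(embeddings), len(embeddings[0]))  # 6 4
```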

---

### 📌 Vocabulary → Feature Matrix

Let’s say we have the following 5 words in our vocabulary:

**Vocabulary:** `["king", "queen", "man", "woman", "throne"]`  
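For simplicity, assume each word is mapped to a **4-dimensional vector** whose dimensions we label by hand (real Word2Vec dimensions are learned and are not individually interpretable like this):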

- **Power**  
- **Royalty**  
- **Gender (Masculine/Feminine)**  
- **Status**

---

### 📊 Word2Vec Embedding Matrix (Example)

| Word     | Power | Royalty | Gender | Status |
|----------|--------|---------|--------|--------|
| king     | 0.80   | 0.95    | 0.20   | 0.90   |
| queen    | 0.78   | 0.96    | 0.80   | 0.88   |
| man      | 0.75   | 0.10    | 0.15   | 0.60   |
| woman    | 0.72   | 0.12    | 0.85   | 0.62   |
| throne   | 0.85   | 0.99    | 0.50   | 0.95   |
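
The table above can be read as a lookup matrix with one row per vocabulary word. The sketch below, using the same illustrative values, shows that multiplying a one-hot vector by this matrix simply selects that word's row; the helper names are hypothetical.

```python
vocab = ["king", "queen", "man", "woman", "throne"]

# Rows follow the vocabulary order; columns are Power, Royalty, Gender, Status.
embedding_matrix = [
    [0.80, 0.95, 0.20, 0.90],  # king
    [0.78, 0.96, 0.80, 0.88],  # queen
    [0.75, 0.10, 0.15, 0.60],  # man
    [0.72, 0.12, 0.85, 0.62],  # woman
    [0.85, 0.99, 0.50, 0.95],  # throne
]


def one_hot(word):
    """Sparse representation: all zeros except a 1 at the word's index."""
    return [1.0 if w == word else 0.0 for w in vocab]


def lookup_via_one_hot(word):
    """Multiply the one-hot row vector by the matrix, which selects one row."""
    hot = one_hot(word)
    n_dims = len(embedding_matrix[0])
    return [
        sum(hot[i] * embedding_matrix[i][d] for i in range(len(vocab)))
        for d in range(n_dims)
    ]


def lookup_direct(word):
    """Equivalent shortcut: index the row directly."""
    return embedding_matrix[vocab.index(word)]


print(one_hot("queen"))             # [0.0, 1.0, 0.0, 0.0, 0.0]
print(lookup_via_one_hot("queen"))  # [0.78, 0.96, 0.8, 0.88]
```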

---

### ✅ Interpretation

- `"king"` and `"queen"` are both **high in power and royalty**, but differ in **gender**.
- `"man"` and `"woman"` score **low on royalty** and sit at **opposite ends of the gender dimension** (0.15 vs. 0.85).
- `"throne"` scores high in **royalty** and **status**, connecting it to both `"king"` and `"queen"`.

> Word2Vec learns these embeddings from **context**, not from predefined labels — but similar relationships often emerge naturally during training.



### 🔁 Vector Relationships (Analogies)

One of the most powerful aspects of Word2Vec is its ability to capture **word analogies** using simple vector arithmetic.

For example:
``king - man + woman ≈ queen``

How this works:

- `"king"` and `"man"` share similar masculine traits, so subtracting `"man"` removes the **male component**.
- Adding `"woman"` brings in the **female context**.
- The result vector is close to `"queen"` — which shares royalty and status with `"king"`, but is more feminine.

✅ This analogy shows how **semantic relationships** are preserved in the embedding space.
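
The arithmetic can be checked with the illustrative 4-dimensional vectors from the example table. This is a minimal sketch, not real trained embeddings: it computes `king - man + woman` and picks the nearest remaining word by cosine similarity. Excluding the three input words from the candidates is a common convention and an assumption here.

```python
import math

# Illustrative vectors from the example table (Power, Royalty, Gender, Status).
vectors = {
    "king":   [0.80, 0.95, 0.20, 0.90],
    "queen":  [0.78, 0.96, 0.80, 0.88],
    "man":    [0.75, 0.10, 0.15, 0.60],
    "woman":  [0.72, 0.12, 0.85, 0.62],
    "throne": [0.85, 0.99, 0.50, 0.95],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))  # queen
```

With these values the target vector is roughly `[0.77, 0.97, 0.90, 0.92]`: the royalty and status of `"king"` survive, while the gender value moves to the feminine end.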

---

### 🔗 How Word Similarity is Measured in Word2Vec

Once we have word vectors from Word2Vec, we can measure how similar two words are using **Cosine Similarity**.

### 📐 What is Cosine Similarity?

Cosine similarity is the **cosine of the angle between two vectors** in the embedding space; it ignores their magnitude.  
It ranges from **-1 to 1**:
- `1` → perfectly similar (same direction)
- `0` → no similarity (orthogonal)
- `-1` → opposite direction (completely dissimilar)

---

### 🧮 Cosine Similarity Formula

For two word vectors **A** and **B**:

$$
\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \times \|B\|}
$$

Where:
- \( A \cdot B \) is the **dot product** of the vectors
- \( \|A\| \) and \( \|B\| \) are the **magnitudes** (Euclidean norms) of each vector
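
The formula translates almost directly into code. The function below is a minimal pure-Python sketch (the name `cosine_similarity` and the zero-vector check are choices made here, not part of Word2Vec itself), followed by the three boundary cases of the -1 to 1 range.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [2, 0]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 3]))   # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-4, 0]))  # -1.0 (opposite direction)
```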

---

### Example

Let’s say we have these simplified 2D word vectors:

- **king** = [0.8, 0.5]  
- **queen** = [0.75, 0.6]  
- **apple** = [-0.2, 0.9]  

We can visualize them in 2D space like this:

                  ↑
    1.0           |
        apple *   |
                  |
                  |
                  |              * queen
    0.5           |               * king
                  |
                  |
                  |
                  |
    0.0 ----------+--------------------→
        -0.5      0        0.5       1.0



- **king** and **queen** have vectors pointing in similar directions → **high cosine similarity**
- **apple** points in a different direction → **low similarity** with **king** or **queen**
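
As a worked check, plugging the example vectors into the cosine similarity formula gives (values rounded to three decimals):

$$
\cos(\text{king}, \text{queen}) = \frac{0.8 \times 0.75 + 0.5 \times 0.6}{\sqrt{0.8^2 + 0.5^2} \times \sqrt{0.75^2 + 0.6^2}} = \frac{0.90}{\sqrt{0.89} \times \sqrt{0.9225}} \approx 0.993
$$

$$
\cos(\text{king}, \text{apple}) = \frac{0.8 \times (-0.2) + 0.5 \times 0.9}{\sqrt{0.8^2 + 0.5^2} \times \sqrt{(-0.2)^2 + 0.9^2}} = \frac{0.29}{\sqrt{0.89} \times \sqrt{0.85}} \approx 0.333
$$

The same calculation for **queen** and **apple** gives approximately 0.440.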

### Summary

Cosine similarity depends only on the **direction** of vectors, not their length. Two word vectors can therefore be far apart in Euclidean distance (for example, because one is much longer than the other) and still count as similar, as long as they point in nearly the same direction.
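
A small sketch of this point: scaling a vector changes its Euclidean distance to the original but leaves the cosine similarity at (numerically almost exactly) 1. The scaled copy of **king** is a made-up vector used only for this comparison.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between the two vector endpoints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


king = [0.8, 0.5]
king_scaled = [10 * x for x in king]  # same direction, ten times longer

print(round(euclidean_distance(king, king_scaled), 2))  # 8.49
print(round(cosine_similarity(king, king_scaled), 6))   # 1.0
```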
