# Understanding Word2Vec – A Deep Dive

Welcome back! Today, we continue our journey in **Natural Language Processing (NLP)** with an in-depth discussion of **Word2Vec**, a widely used word embedding technique built on a shallow neural network.

---

## What is Word2Vec?

- Word2Vec is a **neural network-based word embedding** technique (the network is shallow, not deep).
- It converts words into **dense vectors** while maintaining their **semantic meaning**.
- Words with similar meanings get vectors **close to each other**.
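- Unrelated words tend to have vectors that are **far apart**. Opposites differ along specific features, but because they often appear in similar contexts they can still sit fairly close overall.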
- Word2Vec helps detect **synonyms**, suggests additional words for partial sentences, and captures meaningful word relationships.

---

## Background and Importance

- Word2Vec was introduced in **2013** by Tomas Mikolov and colleagues at **Google**.
- It uses a **neural network** to learn word associations from large text corpora.
- Unlike sparse representations like One-Hot Encoding or TF-IDF, Word2Vec creates **dense vector representations**.
- These vectors encode linguistic features that let models capture nuances in how words relate to each other.
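
To make the sparse versus dense contrast concrete, here is a minimal sketch using NumPy. The three-word vocabulary and the dense values are made up for illustration; they are not learned by Word2Vec.

```python
import numpy as np

vocab = ["king", "queen", "apple"]

# One-hot encoding: one dimension per vocabulary word, a single 1 per row.
one_hot = np.eye(len(vocab))

# Dense embedding: a small, fixed number of real-valued dimensions.
# These values are invented for illustration only.
dense = np.array([
    [0.95, 0.75],  # king
    [0.96, 0.68],  # queen
    [0.00, 0.20],  # apple
])

# Any two distinct one-hot vectors are orthogonal, so their dot product is 0.
one_hot_dot = one_hot[0] @ one_hot[1]

# Dense vectors can overlap, so related words can score higher than unrelated ones.
dense_king_queen = dense[0] @ dense[1]
dense_king_apple = dense[0] @ dense[2]

print(one_hot_dot, dense_king_queen, dense_king_apple)
```

The one-hot width grows with the vocabulary, while the dense width stays fixed no matter how many words are added.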

---

## How Does Word2Vec Represent Words?

### Vocabulary and Feature Representation

- Vocabulary: The set of **unique words** in a text corpus.
- Each word in the vocabulary is mapped to a **feature vector**.
- The vector dimensions (e.g., 100, 300) represent **abstract latent features** learned during training.
- These features could relate to semantic concepts like **gender, royalty, age, or food categories**. In a trained model, though, the dimensions are not labeled, so any such meaning has to be inferred.

### Example of Feature Representation (Intuitive)

| Word  | Gender | Royalty | Age  | Food  | ... | Dimension 300 |
|--------|--------|---------|------|-------|-----|----------------|
| Boy    | -1     | 0.01    | 0.03 | 0.00  | ... | Vector of 300 dims |
| Girl   | +1     | 0.02    | 0.01 | 0.00  | ... | Vector of 300 dims |
| King   | -0.92  | 0.95    | 0.75 | 0.00  | ... | Vector of 300 dims |
| Queen  | +0.93  | 0.96    | 0.68 | 0.00  | ... | Vector of 300 dims |
| Apple  | 0.01   | 0.00    | 0.20 | 0.91  | ... | Vector of 300 dims |

- Values close to zero indicate little or no relationship between the word and that feature.
- Opposite words have opposing signs on certain features (e.g., Boy vs. Girl in gender).
- Similar words have vectors close in the feature space (e.g., King and Queen in royalty).
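
The observations above can be checked against the first four columns of the table. This is only a sketch: the numbers are the hand-picked values from the table, and the `GENDER` and `ROYALTY` index names are introduced here for readability.

```python
import numpy as np

# First four columns of the table: gender, royalty, age, food.
features = {
    "boy":   np.array([-1.00, 0.01, 0.03, 0.00]),
    "girl":  np.array([ 1.00, 0.02, 0.01, 0.00]),
    "king":  np.array([-0.92, 0.95, 0.75, 0.00]),
    "queen": np.array([ 0.93, 0.96, 0.68, 0.00]),
    "apple": np.array([ 0.01, 0.00, 0.20, 0.91]),
}
GENDER, ROYALTY = 0, 1  # column positions, named for this example only

# Opposite words: the gender values have opposite signs.
opposite_gender = features["boy"][GENDER] * features["girl"][GENDER] < 0

# Similar words: King and Queen are much closer on royalty than King and Apple.
royalty_gap = abs(features["king"][ROYALTY] - features["queen"][ROYALTY])
king_apple_gap = abs(features["king"][ROYALTY] - features["apple"][ROYALTY])

print(opposite_gender, royalty_gap, king_apple_gap)
```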

---

## Semantic Vector Arithmetic with Word2Vec

A famous example from Google’s Word2Vec research shows how vectors encode meaning:

king - man + woman ≈ queen


- Subtracting the "man" vector from "king" removes the male attribute.
- Adding the "woman" vector adds the female attribute.
- The resulting vector points close to "queen".

Similarly:

queen - king + boy ≈ girl


This demonstrates the **power of vector arithmetic** in semantic space.
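
A small sketch of this kind of analogy lookup, using the illustrative feature table from earlier rather than trained embeddings. That table has no "man" or "woman" rows, so "boy" and "girl" stand in for them here. The `analogy` helper and its choice to skip the input words are assumptions of this example.

```python
import numpy as np

# Hand-picked vectors from the feature table: gender, royalty, age, food.
# These are not embeddings learned by Word2Vec.
vectors = {
    "boy":   np.array([-1.00, 0.01, 0.03, 0.00]),
    "girl":  np.array([ 1.00, 0.02, 0.01, 0.00]),
    "king":  np.array([-0.92, 0.95, 0.75, 0.00]),
    "queen": np.array([ 0.93, 0.96, 0.68, 0.00]),
    "apple": np.array([ 0.01, 0.00, 0.20, 0.91]),
}

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Return the word closest to a - b + c, skipping the three input words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "boy", "girl", vectors)
print(result)
```

With these toy values the nearest remaining word is "queen". On real embeddings the same lookup runs over the whole vocabulary, and the match is approximate rather than exact.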

---

## Dimensionality and Vector Space

- Common Word2Vec vectors have **100 to 300 dimensions**.
- These dimensions capture complex linguistic and semantic relationships.
- Example with simplified 2D vectors for visualization:

| Word  | Dim 1 | Dim 2 |
|--------|-------|-------|
| King   | 0.95  | 0.96  |
| Queen  | 0.96  | 0.97  |
| Man    | 0.95  | 0.98  |
| Human  | 0.94  | 0.96  |

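Using vector arithmetic, King - Man + Queen = (0.96, 0.95), which lands near Human (0.94, 0.96) in this 2D space. The values are illustrative only: all four points sit very close together, so the table shows the mechanics and not a meaningful ranking.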
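
As a quick check of the arithmetic on the 2D table, the following sketch computes the result and its Euclidean distance to each word. Euclidean distance is an assumption made here to keep the example simple.

```python
import numpy as np

# 2D values copied from the table above.
points = {
    "king":  np.array([0.95, 0.96]),
    "queen": np.array([0.96, 0.97]),
    "man":   np.array([0.95, 0.98]),
    "human": np.array([0.94, 0.96]),
}

result = points["king"] - points["man"] + points["queen"]

# Straight-line distance from the result to every word in the table.
distances = {w: float(np.linalg.norm(result - v)) for w, v in points.items()}

print(result)
for word, dist in distances.items():
    print(f"{word}: {dist:.4f}")
```

Every word ends up within about 0.04 of the result, which is why such tightly clustered toy values cannot separate the candidates.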

---

## Measuring Similarity: Cosine Similarity

- To find how similar two word vectors are, **cosine similarity** is used.
- Cosine similarity measures the **angle between two vectors**:

\[
\text{cosine similarity} = \cos(\theta) = \frac{A \cdot B}{||A|| \times ||B||}
\]

- Similar vectors have angles close to 0° and cosine similarity close to 1.
- Unrelated vectors have angles close to 90° and cosine similarity close to 0.
- Vectors pointing in opposite directions (180°) have cosine similarity of -1.
- **Cosine distance = 1 - cosine similarity**.

### Example:

- Angle between vectors = 45°
- cos(45°) ≈ 0.707
- Distance = 1 - 0.707 = 0.293 → fairly similar words.

If vectors are orthogonal (90°):

- cos(90°) = 0
- Distance = 1 → unrelated words.
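
The formula and both worked cases can be reproduced with a few lines of NumPy. The function names and the three example vectors are chosen for this sketch; the vectors are picked so that the angles are exactly 45° and 90°.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b):
    """Distance as defined above: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

x_axis = [1.0, 0.0]
diagonal = [1.0, 1.0]  # 45 degrees from x_axis
y_axis = [0.0, 1.0]    # 90 degrees from x_axis

sim_45 = cosine_similarity(x_axis, diagonal)
sim_90 = cosine_similarity(x_axis, y_axis)

print(round(sim_45, 3), round(cosine_distance(x_axis, diagonal), 3))
print(sim_90, cosine_distance(x_axis, y_axis))
```

Because the norms are divided out, only the direction of each vector matters, not its length.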

---

## Real-World Implications

- Word2Vec vectors enable **recommendation systems** and **semantic search**.
- For example, in a movie recommender system:
  - "Avengers" vector is close to "Iron Man" vector because of shared features like genre (action, comic).
- This analogy helps generalize Word2Vec beyond just words to any domain where features and similarity matter.
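
A toy version of the movie example might look like the sketch below. The genre features, their values, the third title, and the `recommend` helper are all hypothetical. They only illustrate ranking items by cosine similarity.

```python
import numpy as np

# Hypothetical genre features: [action, comic-book, romance]. Values are invented.
movies = {
    "Avengers": np.array([0.9, 0.9, 0.1]),
    "Iron Man": np.array([0.8, 0.9, 0.2]),
    "Titanic":  np.array([0.1, 0.0, 0.9]),
}

def recommend(title, catalog):
    """Rank every other item by cosine similarity to the given title."""
    query = catalog[title]
    scores = {}
    for other, vec in catalog.items():
        if other == title:
            continue
        scores[other] = float(query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec)))
    return sorted(scores, key=scores.get, reverse=True)

ranking = recommend("Avengers", movies)
print(ranking)
```

With these invented values "Iron Man" ranks first because it shares the action and comic-book features.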

---

## What’s Next?

- Upcoming videos will explain:
  - How Word2Vec is **trained from scratch** using neural networks.
  - The **architectures behind Word2Vec**: CBOW and Skip-Gram.
  - How **loss functions and optimizers** are used.
  - Practical implementations with popular datasets and pre-trained models.

---

## Summary

- Word2Vec is an **influential NLP technique** that converts words into dense vectors encoding semantic relationships.
- Each word is represented as a **feature vector** in a high-dimensional space.
- Vector operations reveal meaningful relationships between words.
- Cosine similarity measures closeness of word meanings.
- Understanding Word2Vec lays the foundation for many modern NLP applications.

---

# Keep exploring! 🚀
