# 🧠 Word Embeddings in NLP – Introduction & Overview

In this notebook, we dive into **Word Embeddings**, a core concept in Natural Language Processing (NLP) that enables machines to understand the semantic meaning of words by representing them as dense vectors. This lesson marks a transition from traditional word vectorization methods to more advanced, deep learning–based techniques like **Word2Vec**.

---

## ❓ Why Do We Need Word Embeddings?

Traditional word vectorization methods such as **One-Hot Encoding**, **Bag of Words**, and **TF-IDF** are simple and effective but come with significant limitations:

| Method           | Vector Type       | Limitations                                                  |
|------------------|-------------------|---------------------------------------------------------------|
| One-Hot Encoding | Sparse, Binary    | No semantic meaning, very high dimensionality                |
| Bag of Words     | Sparse Frequency  | Ignores word order, no context awareness, large vocab size   |
| TF-IDF           | Weighted Sparse   | No semantic relationships, sparsity remains                  |

### 🔍 Key Issues with These Methods
- Each word is treated as an independent symbol, with no notion of what it means
- Vectors are sparse (mostly zeros), leading to inefficient storage and processing
- No relationship captured between similar or related words (e.g., “happy” and “excited”)
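
The issues above can be seen on a tiny example. The sketch below is a minimal, pure-Python illustration (the vocabulary, sentences, and helper names `one_hot`, `bag_of_words`, and `cosine` are hypothetical, chosen only for this demo). It shows that one-hot vectors are mostly zeros, that two different words always have zero similarity, and that Bag of Words cannot tell apart sentences that use the same words in a different order.

```python
from math import sqrt

def one_hot(word, vocab):
    """Return a one-hot vector: 1 at the word's index, 0 everywhere else."""
    return [1 if w == word else 0 for w in vocab]

def bag_of_words(tokens, vocab):
    """Return a count vector over the vocabulary; word order is discarded."""
    return [tokens.count(w) for w in vocab]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy vocabulary, for illustration only.
vocab = ["angry", "excited", "happy", "i", "am", "today", "very", "not"]

happy = one_hot("happy", vocab)
excited = one_hot("excited", vocab)

print(happy)                   # a single 1; every other entry is 0
print(cosine(happy, excited))  # 0.0: distinct one-hot vectors never overlap

# Same words, different order and meaning, identical Bag of Words vector.
bow_a = bag_of_words("i am happy not angry".split(), vocab)
bow_b = bag_of_words("i am angry not happy".split(), vocab)
print(bow_a == bow_b)          # True
```

With a realistic vocabulary of tens of thousands of words, each one-hot vector would have that many entries and still contain only a single non-zero value.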

### ✅ What Word Embeddings Solve
- Convert words into **dense, real-valued vectors**
- Preserve **semantic similarity** (e.g., similar words have similar vector representations)
- Give downstream machine learning models compact, information-rich inputs, which often makes them **more accurate and efficient**
- Eliminate sparsity issues common with traditional vectorization

---

## 📚 What You Will Learn

- What **word embeddings** are and why they’re needed in NLP
- How word meanings are preserved in vector space
- Overview of two types of embedding techniques:
  1. **Frequency-based techniques**
     - One-Hot Encoding
     - Bag of Words
     - TF-IDF
  2. **Deep learning–based embeddings**
     - **Word2Vec**: a more powerful approach that learns each word's vector from the contexts in which the word appears
- The two architectures within Word2Vec:
  - **CBOW (Continuous Bag of Words)**: Predicts a word from its context
  - **Skip-Gram**: Predicts context words from a single word
- A brief look at **pre-trained embeddings** (such as Google’s roughly 1.5 GB Word2Vec model)

---

## 🧩 Conceptual Analogy: Semantic Proximity Without a Graph

Think of the following relationships between words:

- **"happy"** and **"excited"** → have similar meanings → should have **similar vectors**
- **"happy"** and **"angry"** → opposite meanings → should have **dissimilar vectors**

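In a well-trained word embedding model, these semantic relationships are encoded so that **the vector distance reflects meaning**: similar words get vectors that are close together, while unrelated words get vectors that are far apart. This is the ideal picture; in practice, opposites such as "happy" and "angry" can still end up fairly close, because they tend to appear in similar contexts.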
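
Closeness between vectors is commonly measured with cosine similarity. The sketch below uses hand-picked 3-dimensional vectors (entirely hypothetical, not taken from any trained model) to show the idealized picture described above: the "happy"/"excited" pair scores much higher than the "happy"/"angry" pair.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical toy embeddings, chosen by hand for illustration only.
# Real Word2Vec vectors typically have 100 to 300 dimensions.
toy_embeddings = {
    "happy":   [0.90, 0.80, 0.10],
    "excited": [0.85, 0.75, 0.20],
    "angry":   [-0.70, 0.60, 0.10],
}

sim_happy_excited = cosine_similarity(toy_embeddings["happy"], toy_embeddings["excited"])
sim_happy_angry = cosine_similarity(toy_embeddings["happy"], toy_embeddings["angry"])

print(f"happy vs excited: {sim_happy_excited:.3f}")
print(f"happy vs angry:   {sim_happy_angry:.3f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, which is one reason it is a popular choice for comparing embeddings.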

---

## 🔍 Word2Vec: A Glimpse

| Architecture  | Predicts                  | Works Best For                  | Training Style                                      |
|---------------|---------------------------|---------------------------------|-----------------------------------------------------|
| **CBOW**      | Target word from context  | Frequent words; faster training | Averages the surrounding word vectors               |
| **Skip-Gram** | Context words from target | Rare words; slower training     | Uses one target word to predict each context word   |
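
The difference between the two architectures shows up in how training examples are built from a sentence. The sketch below is a simplified illustration (the helper names `cbow_examples` and `skipgram_examples`, the sentence, and the window size are assumptions for this demo). It only generates the training pairs; it does not train a network and leaves out details of real implementations such as subsampling and negative sampling.

```python
def cbow_examples(tokens, window=2):
    """Yield (context_words, target_word): the context predicts the centre word."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            yield context, target

def skipgram_examples(tokens, window=2):
    """Yield (target_word, context_word): the centre word predicts each neighbour."""
    for context, target in cbow_examples(tokens, window):
        for neighbour in context:
            yield target, neighbour

tokens = "the quick brown fox jumps".split()

cbow = list(cbow_examples(tokens, window=1))
skip = list(skipgram_examples(tokens, window=1))

print(cbow[1])   # (['the', 'brown'], 'quick')
print(skip[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```

CBOW produces one example per position, while Skip-Gram produces one example per (target, neighbour) pair, so Skip-Gram sees more training examples for the same text.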

Word2Vec embeddings are **learned representations** trained on large corpora. Simple vector arithmetic on them often recovers meaningful relationships such as:
- **king - man + woman ≈ queen**
- **Paris - France + Italy ≈ Rome**
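
The arithmetic behind these analogies can be sketched with toy vectors. In the example below, the 2-dimensional vectors are hand-built (hypothetical, not from a trained model) so that one axis loosely stands for "royalty" and the other for "gender". The `analogy` helper is an illustrative name; it computes `a - b + c` and returns the closest remaining word by cosine similarity, skipping the three input words.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical toy vectors: [royalty, gender]. For illustration only.
toy = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vectors[a] - vectors[b] + vectors[c], excluding inputs."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

print(analogy("king", "man", "woman", toy))  # queen
```

In a real model the result is only approximately equal to the target vector, which is why the relationships above are written with ≈ rather than =.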

---

## ✅ Conclusion

Word embeddings revolutionized NLP by transforming words into **dense vectors learned from context** that preserve both **syntactic** and **semantic** relationships. This overcomes the key drawbacks of traditional vectorization methods such as Bag of Words and TF-IDF: sparsity and the lack of any notion of meaning.

In the next lessons, we will:
- Explore how **Word2Vec** works in detail
- Understand how it solves sparsity and context issues
- Load and use **pre-trained Word2Vec models** to apply these concepts

> 💡 *In the broad sense used in this course, every vectorization technique you've learned so far is a way of embedding words as vectors. What sets Word2Vec apart is its ability to embed **meaning** into those vectors, not just presence or frequency.*

---
