# 📚 Table of Contents

- [🧾 Introduction to Word Embeddings](#introduction-to-word-embeddings)
  - [❓ What are word embeddings, and why are they used?](#what-are-word-embeddings-and-why-are-they-used)
  - [🧠 How do word embeddings capture semantic meaning in text?](#how-do-word-embeddings-capture-semantic-meaning-in-text)
- [🔤 Word2Vec](#word2vec)
  - [📚 Overview of Word2Vec: CBOW vs. Skip-Gram](#overview-of-word2vec-cbow-vs-skip-gram)
  - [🛠️ Training Word2Vec on text data using Gensim](#training-word2vec-on-text-data-using-gensim)
  - [🧭 Example: Visualizing word embeddings with t-SNE](#example-visualizing-word-embeddings-with-t-sne)
- [🌐 GloVe (Global Vectors for Word Representation)](#glove-global-vectors-for-word-representation)
  - [🔍 Difference between Word2Vec and GloVe](#difference-between-word2vec-and-glove)
  - [🧮 GloVe’s matrix factorization approach](#gloves-matrix-factorization-approach)
  - [📦 Using pre-trained GloVe embeddings in NLP tasks](#using-pre-trained-glove-embeddings-in-nlp-tasks)
- [⚡ FastText](#fasttext)
  - [🔡 FastText’s approach to representing words as subword units](#fasttexts-approach-to-representing-words-as-subword-units)
  - [🧳 Handling out-of-vocabulary words with FastText](#handling-out-of-vocabulary-words-with-fasttext)
  - [🧪 Example: Training and using FastText for word vector representation](#example-training-and-using-fasttext-for-word-vector-representation)

---

## ✅ **1. Word2Vec Implementation (CBOW vs. Skip-Gram)**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart LR
    subgraph Word2Vec_Models["Word2Vec Implementation"]
        direction TB
        CBOW[[CBOW Architecture<br/>Predict center word from context]]:::blue
        SkipGram[[Skip-Gram<br/>Predict context from center]]:::orange
        CBOW --> Visualize1[t-SNE Projection]
        SkipGram --> Visualize2[t-SNE Projection]
    end

    RawText[Raw Text Corpus] --> Preprocess[Tokenization & Cleaning]
    Preprocess --> CBOW
    Preprocess --> SkipGram

    classDef blue fill:#e6f3ff,stroke:#0066cc
    classDef orange fill:#ffe6cc,stroke:#ff6600
```

<details>
<summary>🧪 Python Code (CBOW / Skip-Gram)</summary>

```python
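from gensim.models import Word2Vec
# Toy corpus: a list of tokenized sentences (replace with your own data)
sentences = [["the", "cat", "sat", "on", "the", "mat"], ["the", "dog", "sat", "on", "the", "rug"]]
model_cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)  # sg=0 → CBOW
model_sg   = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1 → Skip-Gram
vec = model_sg.wv["cat"]  # 100-dimensional vector for "cat"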
```

</details>
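The difference between the two architectures is easiest to see in the training examples each one builds from a sliding window. The plain-Python sketch below shows only that pair-generation step, assuming a fixed symmetric window; `cbow_pairs` and `skipgram_pairs` are hypothetical helper names. Real Word2Vec training additionally shrinks the window at random per position and learns the vectors with negative sampling or hierarchical softmax, none of which is modeled here.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding words (input) predict the center word (target)."""
    pairs = []
    for i, center in enumerate(tokens):
        # Up to `window` words on each side, clipped at the sentence edges.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, center))
    return pairs


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-Gram: the center word (input) predicts each context word (target)."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(cbow_pairs(sentence, window=1)[:2])
# [(['cat'], 'the'), (['the', 'sat'], 'cat')]
print(skipgram_pairs(sentence, window=1)[:3])
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

CBOW produces one example per position with the context averaged into a single input, while Skip-Gram produces one example per (center, context) pair, which is why Skip-Gram sees more updates per sentence and tends to do better on rare words at the cost of slower training.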

---

## ✅ **2. Comparison Matrix (Word2Vec, GloVe, FastText, BERT)**

| Model    | Handles OOV | Subwords | Context-Aware | Speed     |
|----------|:-----------:|:--------:|:-------------:|-----------|
| Word2Vec | ❌          | ❌       | ❌            | 🏎️ Fast   |
| GloVe    | ❌          | ❌       | ❌            | 🏎️ Fast   |
| FastText | ✅          | ✅       | ❌            | 🚗 Medium |
| BERT     | ✅          | ✅       | ✅            | 🐢 Slow   |

---

## ✅ **3. FastText OOV Handling (with Linguistic Examples)**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TB
    A[FastText<br/>OOV Recovery in Morphologically Rich Languages]:::green
    A --> B["🇹🇷 Turkish: 'evlerinizden' → ev + ler + iniz + den"]
    A --> C["🇫🇮 Finnish: 'taloissani' → talo + i + ssa + ni"]
    A --> D["🇰🇷 Korean: '가방에' → 가방 + 에"]
    A --> E["Character n-gram vectors (3 to 6 chars by default) averaged → Word vector"]

    classDef green fill:#e6ffe6,stroke:#009900
    class B,C,D,E green
```

<details>
<summary>🧪 Python Code (FastText)</summary>

```python
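from gensim.models import FastText
sentences = [["the", "cat", "sat", "on", "the", "mat"], ["the", "dog", "sat", "on", "the", "rug"]]
model = FastText(sentences, vector_size=100, window=5, min_count=1)  # subword n-grams of length 3-6 by default
oov_vec = model.wv["cats"]  # OOV word: vector built from the character n-grams of "cats"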
```

</details>
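FastText treats each word as a bag of character n-grams, with `<` and `>` marking the word boundaries, and builds a vector for an unseen word by averaging the vectors of its n-grams. The sketch below illustrates that idea in plain Python under simplifying assumptions: `char_ngrams` and `oov_vector` are hypothetical helpers, and the n-gram vectors come from a toy dictionary, whereas the real library hashes n-grams into a fixed number of buckets and learns their vectors during training.

```python
from typing import Dict, List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """All character n-grams of `word` wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def oov_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the word's n-grams found in a (toy) n-gram table."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim  # no overlapping n-grams: nothing to compose from
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

The boundary markers let the model tell a prefix or suffix (`<un`, `ing>`) apart from the same letters in the middle of a word, which is what allows inflected forms such as the Turkish and Finnish examples above to share vectors with their stems.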

---

## ✅ **4. GloVe Mechanics (Co-occurrence Matrix + Objective Function)**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TB
    A[Corpus] --> B[Build Co-occurrence Matrix]
    B --> C[Factorize Matrix]
    C --> D[Word Vectors]
    D --> E[Pretrained Vectors] --> F[Fine-tune for Downstream Task]
    C -.->|Loss Function| M[["J = Σ f(X_ij)(wᵢᵀw̃ⱼ + bᵢ + b̃ⱼ − log X_ij)²"]]:::math

    classDef math fill:#f0e6ff,stroke:#6600cc
    class M math
```

<details>
<summary>🧪 Load Pretrained GloVe (Gensim)</summary>

```python
from gensim.models import KeyedVectors

# Gensim >= 4.0 reads raw GloVe text files directly (they lack the word2vec header line),
# so the deprecated glove2word2vec conversion step is no longer needed.
model = KeyedVectors.load_word2vec_format("glove.txt", binary=False, no_header=True)
```

</details>
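The two stages in the GloVe diagram, building the co-occurrence matrix X and evaluating the weighted least-squares objective J, can be sketched in plain Python. This is an illustration under stated assumptions, not a training loop: symmetric context windows in which a pair d words apart contributes 1/d, the weighting f(x) = (x / x_max)^α for x < x_max and 1 otherwise with x_max = 100 and α = 0.75 (the values reported in the GloVe paper), and small hypothetical dict-based vectors. `cooccurrence`, `weight`, and `glove_loss` are hypothetical helper names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vectors = Dict[str, List[float]]


def cooccurrence(tokens: List[str], window: int) -> Dict[Tuple[str, str], float]:
    """Symmetric co-occurrence counts X_ij; a pair d positions apart adds 1/d."""
    X: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            context = tokens[i + d]
            X[(word, context)] += 1.0 / d
            X[(context, word)] += 1.0 / d
    return dict(X)


def weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """f(x): grows as (x / x_max)^alpha for rare pairs, capped at 1 for frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(X: Dict[Tuple[str, str], float], w: Vectors, w_ctx: Vectors,
               b: Dict[str, float], b_ctx: Dict[str, float]) -> float:
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    J = 0.0
    for (i, j), x_ij in X.items():  # zero entries never appear: f(0) = 0 and log 0 is undefined
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        J += weight(x_ij) * (dot + b[i] + b_ctx[j] - math.log(x_ij)) ** 2
    return J


tokens = ["the", "cat", "sat", "on", "the", "mat"]
X = cooccurrence(tokens, window=2)
vocab = sorted(set(tokens))
zeros = {word: [0.0, 0.0] for word in vocab}  # hypothetical 2-d vectors, all zero
no_bias = {word: 0.0 for word in vocab}
print(round(glove_loss(X, zeros, zeros, no_bias, no_bias), 4))
```

Training would adjust `w`, `w_ctx`, and both bias tables (typically with AdaGrad) to drive J down; the published GloVe vectors use the sum of the word and context vectors as the final embedding.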

---

## ✅ **5. BERT Contextual Embeddings**

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart LR
    A[Input Sentence] --> B[WordPiece Tokenization]
    B --> C[Transformer Layers]
    C --> D[Contextual Embeddings]
    D --> E[Fine-tune: NER / QA / Sentiment]

    classDef purple fill:#f0e6ff,stroke:#6600cc
    class B,C,D,E purple
```

<details>
<summary>🧪 Python Code (Transformers – BERT Embeddings)</summary>

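```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')  # loaded in eval mode by default

inputs = tokenizer("Paris is the capital of France", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# One contextual vector per WordPiece token, including [CLS] and [SEP]:
# shape (batch_size, sequence_length, 768) for bert-base
token_embeddings = outputs.last_hidden_state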
```

</details>
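The WordPiece step in the diagram splits each word greedily: it takes the longest prefix found in the vocabulary, marks word-internal continuation pieces with `##`, and maps a word it cannot cover to `[UNK]`. The sketch below reproduces that greedy longest-match-first loop for a single, already-split word over a tiny hypothetical vocabulary (the real bert-base-uncased vocabulary has roughly 30k entries), so its splits are illustrative and not necessarily what the pretrained tokenizer produces. `wordpiece` and `toy_vocab` are hypothetical names.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first subword split of one word."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation of the same word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # some span cannot be covered: whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {"em", "##bed", "##ding", "##s", "play", "##ing"}
print(wordpiece("embeddings", toy_vocab))  # ['em', '##bed', '##ding', '##s']
```

Because any word can be decomposed into known pieces (or, at worst, `[UNK]`), BERT never hits a hard out-of-vocabulary lookup failure the way a static Word2Vec or GloVe table does.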

---


# <a id="introduction-to-word-embeddings"></a>🧾 Introduction to Word Embeddings

## <a id="what-are-word-embeddings-and-why-are-they-used"></a>❓ What are word embeddings, and why are they used?

## <a id="how-do-word-embeddings-capture-semantic-meaning-in-text"></a>🧠 How do word embeddings capture semantic meaning in text?

---

# <a id="word2vec"></a>🔤 Word2Vec

## <a id="overview-of-word2vec-cbow-vs-skip-gram"></a>📚 Overview of Word2Vec: CBOW vs. Skip-Gram

## <a id="training-word2vec-on-text-data-using-gensim"></a>🛠️ Training Word2Vec on text data using Gensim

## <a id="example-visualizing-word-embeddings-with-t-sne"></a>🧭 Example: Visualizing word embeddings with t-SNE

---

# <a id="glove-global-vectors-for-word-representation"></a>🌐 GloVe (Global Vectors for Word Representation)

## <a id="difference-between-word2vec-and-glove"></a>🔍 Difference between Word2Vec and GloVe

## <a id="gloves-matrix-factorization-approach"></a>🧮 GloVe’s matrix factorization approach

## <a id="using-pre-trained-glove-embeddings-in-nlp-tasks"></a>📦 Using pre-trained GloVe embeddings in NLP tasks

---

# <a id="fasttext"></a>⚡ FastText

## <a id="fasttexts-approach-to-representing-words-as-subword-units"></a>🔡 FastText’s approach to representing words as subword units

## <a id="handling-out-of-vocabulary-words-with-fasttext"></a>🧳 Handling out-of-vocabulary words with FastText

## <a id="example-training-and-using-fasttext-for-word-vector-representation"></a>🧪 Example: Training and using FastText for word vector representation

---
