## 🔹 1. Word Embedding 
Definition:
A word embedding is a dense vector representation of words in a continuous vector space, capturing semantic and syntactic similarities. Unlike one-hot encoding, word embeddings allow us to compute similarity and relationships between words.
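
To make the contrast with one-hot encoding concrete, here is a minimal sketch that compares cosine similarity under both representations. The dense vectors are hypothetical values picked by hand for illustration, not the output of a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot vectors over a 3-word vocabulary: every pair of distinct words is orthogonal.
one_hot = {
    "king":  [1, 0, 0],
    "queen": [0, 1, 0],
    "apple": [0, 0, 1],
}

# Hypothetical dense embeddings, hand-picked for illustration (not trained).
dense = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

print(cosine(one_hot["king"], one_hot["queen"]))  # 0.0: one-hot carries no similarity signal
print(cosine(dense["king"], dense["queen"]))      # close to 1
print(cosine(dense["king"], dense["apple"]))      # noticeably lower
```

With one-hot vectors every pair of different words scores 0, so no notion of "closer" or "farther" exists; dense vectors make that comparison possible.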

Popular techniques:

- Word2Vec (CBOW, Skip-Gram)
- GloVe (Global Vectors)
- FastText
- BERT embeddings (contextual)



## 🔹 2. Word2Vec
Developed at Google in 2013, Word2Vec learns word vectors with a shallow neural network. Depending on the architecture, the training objective is either to predict a word from its context or to predict the context from a word.

It has two architectures:

1. CBOW (Continuous Bag of Words)
2. Skip-Gram

### 📘 CBOW (Continuous Bag of Words) - Theoretical Overview
🔹 Definition

CBOW is one of the two neural architectures in the Word2Vec framework. Its objective is to predict a target (center) word from its context words, that is, the surrounding words within a predefined window. The context is treated as a bag of words with word order ignored, hence the name.
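
The definition above can be sketched in plain Python. This is an illustrative toy, not the reference Word2Vec implementation: the helper names `cbow_examples` and `cbow_forward`, the five-word sentence, the embedding size, and the random weights are all assumptions, and a full softmax is used instead of the optimizations real trainers rely on.

```python
import math
import random

def cbow_examples(tokens, window=2):
    """Build (context_words, target_word) pairs, one per token position."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            examples.append((context, target))
    return examples

def cbow_forward(context_ids, w_in, w_out):
    """Average the context embeddings, then apply a softmax over the vocabulary."""
    dim = len(w_in[0])
    # Averaging discards order: any permutation of context_ids gives the same hidden vector.
    hidden = [sum(w_in[c][d] for c in context_ids) / len(context_ids) for d in range(dim)]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in w_out]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = "the quick brown fox jumps".split()
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}

rng = random.Random(0)
dim = 4  # hypothetical embedding size
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

examples = cbow_examples(tokens, window=2)
context, target = examples[2]  # (['the', 'quick', 'fox', 'jumps'], 'brown')
probs = cbow_forward([vocab[w] for w in context], w_in, w_out)
print(context, "->", target, "p(target) =", probs[vocab[target]])
```

Training would adjust `w_in` and `w_out` so that the probability assigned to the true target word increases; the rows of `w_in` are then used as the word embeddings.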

### ✅ Advantages of CBOW
| Advantage                          | Explanation                                                           |
| ---------------------------------- | --------------------------------------------------------------------- |
| **Efficiency**                     | Faster training: one prediction per context window instead of one per context word |
| **Good for Frequent Words**        | Averaging over many contexts tends to give slightly better accuracy on frequent words |
| **Deterministic Context**          | Predicting a single target from known context simplifies optimization |
| **Less Computationally Intensive** | Compared to Skip-Gram, CBOW uses less compute power                   |


### ⚠️ Disadvantages of CBOW
| Disadvantage                     | Explanation                                                                                  |
| -------------------------------- | -------------------------------------------------------------------------------------------- |
| **Ignores Word Order**           | The model uses a bag-of-words approach, so any meaning carried by word order is lost         |
| **Not Ideal for Rare Words**     | CBOW averages context, which can dilute signals from infrequent or informative context words |
| **Context Window Sensitivity**   | Performance heavily depends on the chosen window size                                        |
| **Cannot Capture Polysemy Well** | Assigns a single vector to each word, ignoring multiple meanings (e.g., “bank”)              |


### 📘 Skip-Gram – Theoretical Overview

🔹 Definition

Skip-Gram is one of the two primary architectures in the Word2Vec framework (alongside CBOW). Unlike CBOW, which predicts a target word from its surrounding context, Skip-Gram predicts the surrounding context words given a center (target) word.
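
As a rough illustration of the definition above, the sketch below lists the (target, context) training pairs Skip-Gram would see for a short sentence. The function name `skipgram_pairs` and the example sentence are assumptions made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs: one pair per context word in the window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps".split()
pairs = skipgram_pairs(tokens, window=1)
for target, context_word in pairs:
    print(target, "->", context_word)
```

Each pair is a separate prediction task, which is why a single occurrence of a word produces several training updates for its vector.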

### ✅ Advantages of Skip-Gram
| Advantage                           | Description                                                                                            |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------ |
| **Effective for Rare Words**        | Performs well for infrequent words due to independent context predictions                              |
| **Better Semantic Representation**  | Often captures semantic relationships more accurately than CBOW, especially for less frequent words    |
| **Flexible Context Modeling**       | Predicts multiple context words, enhancing richness of training signals                                |
| **Scalable with Negative Sampling** | Efficiently trains on large corpora using optimizations like negative sampling or hierarchical softmax |
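
To show why negative sampling scales, here is a minimal sketch of the loss for a single (target, context) pair. It is a simplification under stated assumptions: the vectors are random rather than trained, the word ids are arbitrary, and the negatives are drawn uniformly, whereas Word2Vec samples them from a smoothed unigram distribution.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(target_vec, context_vec, negative_vecs):
    """Loss for one (target, context) pair plus k sampled negative words."""
    # Pull the true context word towards the target...
    loss = -math.log(sigmoid(dot(context_vec, target_vec)))
    # ...and push each sampled negative word away from it.
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(neg, target_vec)))
    return loss

rng = random.Random(0)
dim, vocab_size, k = 8, 1000, 5  # hypothetical sizes
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

target_id, context_id = 3, 7  # arbitrary word ids for illustration
# Uniform sampling is a simplification; it may even pick the true context word.
negative_ids = [rng.randrange(vocab_size) for _ in range(k)]

loss = negative_sampling_loss(
    w_in[target_id],
    w_out[context_id],
    [w_out[n] for n in negative_ids],
)
print(loss)
```

Each update touches only `k + 1` output vectors instead of all `vocab_size` rows that a full softmax would need, which is where the savings on large vocabularies come from.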


### ⚠️ Disadvantages of Skip-Gram
| Disadvantage                  | Description                                                                 |
| ----------------------------- | --------------------------------------------------------------------------- |
| **Computationally Intensive** | Predicts multiple context words → more forward and backward passes per word |
| **Longer Training Time**      | Especially with large vocabularies and wide context windows                 |
| **Requires Large Corpus**     | Performs best with large-scale textual data (e.g., Wikipedia, Google News)  |
| **Ignores Word Order**        | Like CBOW, it treats the context as an unordered bag, ignoring each word's position |

### 🔁 CBOW vs Skip-Gram Summary
| Feature               | CBOW           | Skip-Gram     |
| --------------------- | -------------- | ------------- |
| Input                 | Context words  | Target word   |
| Output                | Target word    | Context words |
| Speed                 | Faster         | Slower        |
| Accuracy (rare words) | Lower          | Higher        |
| Best For              | Frequent words, fast training | Rare words, finer semantic detail |
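
The speed row in the table can be illustrated by counting predictions per sentence. This is a back-of-the-envelope sketch with a hypothetical helper, `count_predictions`; it ignores implementation details such as subsampling of frequent words.

```python
def count_predictions(num_tokens, window):
    """Return (cbow_predictions, skipgram_predictions) for one sentence."""
    cbow = 0
    skipgram = 0
    for i in range(num_tokens):
        # Number of context words available on the left and right of position i.
        context_size = min(i, window) + min(num_tokens - 1 - i, window)
        if context_size:
            cbow += 1                 # one prediction from the averaged context
            skipgram += context_size  # one prediction per context word
    return cbow, skipgram

print(count_predictions(num_tokens=10, window=2))  # (10, 34)
```

For the same sentence and window, Skip-Gram performs several times as many predictions as CBOW, and the gap grows with the window size.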
