## 🔢 Getting Started with the Embedding Layer

Before feeding text into machine learning models, we must first **convert words into numerical representations**. One of the simplest approaches is **One-Hot Encoding**.

---

### 🟩 One-Hot Encoding

In One-Hot Encoding, each word is assigned a unique index and represented by a binary vector of length equal to the vocabulary size. Only one position is `1`, and all others are `0`.

Example:  
If `"movie"` is index `345` in a vocabulary of size `10,000`, its one-hot vector looks like:  
`[0, 0, ..., 1 (at 345), ..., 0]`
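Here is a minimal NumPy sketch that makes this concrete. The `one_hot` helper and the `word_to_index` dictionary are hypothetical and written only for illustration. The `"movie"` → `345` mapping and the vocabulary size of `10,000` come from the example above.

```python
import numpy as np

def one_hot(index: int, vocab_size: int) -> np.ndarray:
    """Return a length-vocab_size vector with a single 1 at `index` (hypothetical helper)."""
    if not 0 <= index < vocab_size:
        raise ValueError(f"index {index} out of range for vocab_size {vocab_size}")
    vec = np.zeros(vocab_size, dtype=np.float32)
    vec[index] = 1.0
    return vec

# Hypothetical toy vocabulary; only "movie" -> 345 comes from the text.
word_to_index = {"movie": 345}
movie_vec = one_hot(word_to_index["movie"], vocab_size=10_000)

print(movie_vec.shape)          # (10000,)
print(int(movie_vec.argmax()))  # 345
print(int(movie_vec.sum()))     # 1
```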

---

### ⚠️ The Sparsity Problem

One-hot vectors come with key limitations:

1. **High Dimensionality**: A 10,000-word vocabulary leads to 10,000-dimensional vectors.
2. **Sparsity**: All but one value is zero, so storing the vectors densely wastes memory and computation.
3. **No Semantics**: Every pair of distinct one-hot vectors is orthogonal (their dot product is 0), so `"great"` and `"terrible"` are exactly as dissimilar as `"great"` and `"banana"`. There is no notion of meaning or similarity.
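The sketch below checks limitations 2 and 3 numerically. The word indices are hypothetical because any distinct indices behave the same way. `cosine` is a small illustrative helper, not a library function.

```python
import numpy as np

vocab_size = 10_000
# Hypothetical indices; any three distinct indices give the same result.
indices = {"great": 12, "terrible": 873, "banana": 4051}

def one_hot(index, vocab_size):
    vec = np.zeros(vocab_size, dtype=np.float32)
    vec[index] = 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

great, terrible, banana = (one_hot(indices[w], vocab_size) for w in ("great", "terrible", "banana"))

# No semantics: every pair of distinct one-hot vectors is orthogonal.
print(cosine(great, terrible))  # 0.0
print(cosine(great, banana))    # 0.0

# Sparsity: only 1 of 10,000 entries is non-zero, yet a dense float32 array stores all of them.
print(np.count_nonzero(great))  # 1
print(great.nbytes)             # 40000 (bytes)
```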

---

### 🔍 Feature Representation with Embeddings

To overcome these limitations, we use **Word Embeddings**, which are compact, dense vectors that capture **semantic similarity**. Their dimensions act as learned **features** of words and loosely reflect properties such as sentiment, gender, or domain usage. Individual dimensions are rarely this cleanly interpretable, however. Unlike one-hot vectors, learned embeddings place semantically related words close together in vector space.

Example (illustrative values, not taken from a trained model):
- `"great"` → `[0.12, -0.45, 0.67, ..., 0.01]`
- `"good"` → `[0.14, -0.40, 0.65, ..., 0.03]`
- `"bad"` → `[-0.55, 0.72, -0.23, ..., -0.01]`
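Similarity between embeddings is usually measured with cosine similarity. The sketch below applies it only to the first three components of the illustrative vectors above, because the elided `...` values are unknown. Treat the result as a toy demonstration, not real embedding output.

```python
import numpy as np

# Only the first three components shown above. The elided "..." values are left out,
# so these are truncated toy vectors, not real embeddings.
emb = {
    "great": np.array([0.12, -0.45, 0.67]),
    "good":  np.array([0.14, -0.40, 0.65]),
    "bad":   np.array([-0.55, 0.72, -0.23]),
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, 0 orthogonal, -1 opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_good = cosine(emb["great"], emb["good"])
sim_bad = cosine(emb["great"], emb["bad"])
print(round(sim_good, 3), round(sim_bad, 3))
```

With these truncated vectors, `"great"` and `"good"` are almost parallel (cosine close to 1). `"great"` and `"bad"` have a negative cosine.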

These embeddings form the **feature representation** of words and serve as input to sequence models such as RNNs and LSTMs. During training, the model learns to associate **dimensions of the embedding** (or combinations of them) with task-relevant characteristics such as positivity, royalty, or action.
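The code below is a rough sketch of how embeddings feed a sequence model. It looks up one embedding per word and runs a plain (Elman) RNN over the result. Several assumptions apply. The sizes and word indices are hypothetical, and the weights are random rather than trained. A simple RNN also stands in for an LSTM, whose gated cell is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embedding_dim, hidden_dim = 10_000, 8, 4  # hypothetical sizes
# Embedding matrix: random stand-in for values that would be learned in training.
E = rng.normal(scale=0.1, size=(vocab_size, embedding_dim))

# A hypothetical sentence already converted to word indices.
sentence = np.array([12, 345, 873, 4051])
x = E[sentence]  # shape (seq_len, embedding_dim): one embedding row per word

# Minimal Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
W_x = rng.normal(scale=0.1, size=(hidden_dim, embedding_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in x:  # consume the embeddings one word at a time
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(x.shape)  # (4, 8)
print(h.shape)  # (4,)
```

The final hidden state `h` summarizes the whole sentence. A classifier head (for example, for sentiment) would take it as input.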

---

### 🧠 Word2Vec: Learning Embeddings Based on Context

**Word2Vec** is a widely used algorithm that learns word embeddings from large corpora using two main architectures:

- **CBOW (Continuous Bag of Words)**: Predicts a word from surrounding context words.
- **Skip-Gram**: Predicts surrounding words from a target word.
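The difference between the two architectures is easiest to see in the training examples each one builds from a context window. This sketch covers only that data-preparation step, using a hypothetical window size. It does not show the actual Word2Vec model, which is a shallow network trained with techniques such as negative sampling or hierarchical softmax.

```python
def skipgram_pairs(tokens, window=2):
    """(target, context) pairs: Skip-Gram predicts each context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context_words, target) examples: CBOW predicts the target from its context."""
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, target))
    return examples

tokens = "the movie was really great".split()
print(skipgram_pairs(tokens, window=1)[:4])
# [('the', 'movie'), ('movie', 'the'), ('movie', 'was'), ('was', 'movie')]
print(cbow_examples(tokens, window=1)[1])
# (['the', 'was'], 'movie')
```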

Word2Vec embeddings capture semantic relationships and even analogies:  
`vector("king") - vector("man") + vector("woman") ≈ vector("queen")`
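Analogies like this are usually solved in two steps. First, compute the vector on the left-hand side. Then return the nearest remaining word by cosine similarity, excluding the three query words. The 3-dimensional vectors below are hand-crafted for illustration (roughly: royalty, maleness, fruit-ness) and are not real Word2Vec output.

```python
import numpy as np

# Hand-crafted toy vectors (hypothetical), dims ~ [royalty, maleness, fruit-ness].
vecs = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.1]),
    "woman": np.array([0.1, -0.8, 0.1]),
    "apple": np.array([0.0,  0.0, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Solve vector(a) - vector(b) + vector(c) ≈ ? by cosine nearest neighbour."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = (w for w in vectors if w not in {a, b, c})  # skip the query words
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman", vecs))  # queen
```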

---

### 🧩 Embedding Layer in Deep Learning

Deep learning frameworks (e.g., TensorFlow, Keras) provide an **Embedding layer** to learn word representations as part of the training process. It functions like a **trainable lookup table** that maps each word index to a dense vector.
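Multiplying a one-hot vector by a `(vocab_size, embedding_dim)` weight matrix just selects one row of the matrix. An embedding layer can therefore skip the multiplication and index that row directly, which is why it is described as a lookup table. This NumPy sketch demonstrates the equivalence with hypothetical sizes and random values standing in for trained weights. It illustrates the idea, not any framework's internals.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, embedding_dim = 10_000, 16  # hypothetical sizes

# The "lookup table": one row per word index (random here, learned in practice).
W = rng.normal(size=(vocab_size, embedding_dim)).astype(np.float32)

idx = 345  # "movie" in the earlier one-hot example
one_hot = np.zeros(vocab_size, dtype=np.float32)
one_hot[idx] = 1.0

via_matmul = one_hot @ W  # one-hot vector times weight matrix: vocab_size * dim work
via_lookup = W[idx]       # direct row lookup: conceptually what an embedding layer computes

print(np.allclose(via_matmul, via_lookup))  # True

# A batch of index sequences maps to a (batch, seq_len, embedding_dim) array.
batch = np.array([[12, 345, 0], [873, 4051, 7]])
print(W[batch].shape)  # (2, 3, 16)
```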

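```python
from tensorflow.keras.layers import Embedding

vocab_size = 10_000  # distinct word indices, as in the one-hot example
embedding_dim = 64   # length of each dense word vector (example value)

# Trainable (vocab_size, embedding_dim) lookup table. Keras 3 deprecates `input_length`,
# so it is omitted here (older tf.keras also accepted input_length=max_len).
embedding_layer = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
```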