# Word Embeddings in NLP

---

## What Are Word Embeddings?

**Word embeddings** are dense vector representations of words in a continuous space where similar words are close together.

Instead of a sparse one-hot encoding like this:

`king = [0, 0, 0, 1, 0, 0, ...]`

word embeddings give a dense vector:

`king = [0.26, -1.4, 0.78, ..., 0.11]`

Taken together, these numbers capture **semantic meaning** (like gender, royalty, etc.), although a single dimension is rarely interpretable on its own.
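The contrast between one-hot and dense vectors can be sketched with cosine similarity. The vocabulary and the 3-dimensional dense vectors below are made up by hand for illustration; they are not taken from any trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot vectors over a hypothetical 4-word vocabulary.
vocab = ["king", "queen", "apple", "banana"]
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hand-picked dense vectors (an assumption, for illustration only).
dense = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.0, 0.9],
}

# Distinct one-hot vectors are orthogonal, so every pair looks unrelated.
one_hot_sim = cosine(one_hot["king"], one_hot["queen"])

# Dense vectors can place related words closer than unrelated ones.
dense_related = cosine(dense["king"], dense["queen"])
dense_unrelated = cosine(dense["king"], dense["apple"])

print(one_hot_sim, dense_related, dense_unrelated)
```

With one-hot vectors, "king" is exactly as far from "queen" as from "apple"; the dense vectors are what make a similarity score meaningful.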

---

## Why Not One-Hot or TF-IDF?

| Feature        | One-Hot / TF-IDF              | Word Embeddings              |
|----------------|-------------------------------|------------------------------|
| Sparse         | ✅ High-dimensional            | ❌ Dense (Compact)           |
| Context-aware  | ❌ Same vector in every context | ✅ (in contextual embeddings)|
| Semantic Info  | ❌ No relation between words   | ✅ Similar words → similar vectors |
| Memory usage   | High                          | Low                          |

---

## Intuition Behind Word Embeddings

Words that appear in similar contexts tend to have similar meanings.

If "king" and "queen" both appear near "crown", "royal", and "throne", their vectors become similar.
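A rough way to see this intuition is to count, for each word, which words occur near it, and then compare those count vectors. The tiny corpus and the window size below are assumptions chosen for the demonstration; real embeddings are trained on far larger text.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus.
corpus = [
    "the king wears the crown",
    "the queen wears the crown",
    "the king sits on the throne",
    "the queen sits on the throne",
    "i eat a ripe banana",
    "i eat a ripe apple",
]
WINDOW = 2  # number of neighbours considered on each side

def context_counts(sentences, window):
    """Map each word to a Counter of the words seen within `window` positions."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

counts = context_counts(corpus, WINDOW)
sim_king_queen = cosine(counts["king"], counts["queen"])
sim_king_banana = cosine(counts["king"], counts["banana"])
print(sim_king_queen, sim_king_banana)
```

In this corpus "king" and "queen" share all of their neighbours, while "king" and "banana" share none, so the first similarity is high and the second is zero.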

---

## How Word Embeddings Work

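In Word2Vec-style models, word embeddings are **learned** by training a model on one of two prediction tasks:

- Predict the target word from its surrounding context words (CBOW)
- Predict the surrounding context words from the target word (Skip-Gram)

After training, the learned weights of the network (typically the input-to-hidden weight matrix, one row per word) are used as the word vectors.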
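The two prediction tasks differ mainly in how training examples are built from a sentence. The sketch below only generates those examples; the helper names `skipgram_pairs` and `cbow_examples` are hypothetical, and the neural network that would consume the examples is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-Gram: one (center, context) pair per neighbour of each word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (context words, target) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

tokens = "the king wears the crown".split()
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)

print(sg[:3])   # first few (center, context) pairs
print(cb[1])    # context words used to predict "king"
```

Skip-Gram produces many small examples per word, while CBOW produces one example per position with the whole context as input.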

---

## Types of Word Embeddings

| Type         | Examples                  | Context-Aware? | Trainable?       |
|--------------|---------------------------|----------------|------------------|
| Static       | Word2Vec, GloVe, FastText | ❌             | ✅ (unsupervised)|
| Contextual   | BERT, GPT, ELMo           | ✅             | ✅ (self-supervised)|




## What is Word2Vec?

**Word2Vec** is a popular algorithm to create **word embeddings** — numerical vector representations of words — that capture their meaning and relationships.

> Developed at Google by Tomas Mikolov and colleagues (2013), Word2Vec uses a shallow neural network to learn these vectors from a large corpus of text.

---

## Why Use Word2Vec?

- Traditional methods like One-Hot or TF-IDF don't capture **semantic meaning**.
- Word2Vec vectors place similar words **closer together** in vector space.
- Example:  


![image.png](attachment:image.png)

# NLP Essentials: Part of Speech (POS) Tagging & Named Entity Recognition (NER)

---

## What is Part of Speech (POS) Tagging?

**POS Tagging** is the process of labeling each word in a sentence with its corresponding part of speech — such as noun, verb, adjective, etc.

### Why POS Tagging?

POS tagging is useful for:
- Understanding **sentence structure**
- Improving **syntactic parsing**
- Enhancing **NER**, **question answering**, and **machine translation**
- Assisting in **grammar correction**

---

## Common POS Tags

| Tag | Meaning                                  | Example            |
|-----|------------------------------------------|--------------------|
| NN  | Noun (singular)                          | dog, book          |
| VB  | Verb (base form)                         | run, be            |
| JJ  | Adjective                                | beautiful, blue    |
| RB  | Adverb                                   | quickly, very      |
| PRP | Personal pronoun                         | he, she, it        |
| DT  | Determiner                               | a, the, this       |
| IN  | Preposition / subordinating conjunction  | in, on, at         |
| CC  | Coordinating conjunction                 | and, or, but       |

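🔍 **Note:** The tags above follow the Penn Treebank tagset. Tagsets differ across libraries: NLTK's default tagger returns Penn Treebank tags, while spaCy exposes coarse Universal POS tags (`token.pos_`) alongside fine-grained tags (`token.tag_`).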
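To make the input and output of a tagger concrete, here is a deliberately naive lookup tagger. The `LEXICON` dictionary and the `tag` function are hypothetical; real taggers (in NLTK, spaCy, and similar libraries) are trained on annotated corpora and use the surrounding context.

```python
# Hypothetical mini-lexicon using tags from the table above.
LEXICON = {
    "the": "DT", "a": "DT", "this": "DT",
    "dog": "NN", "book": "NN",
    "run": "VB",
    "beautiful": "JJ", "blue": "JJ",
    "quickly": "RB", "very": "RB",
    "he": "PRP", "she": "PRP", "it": "PRP",
    "in": "IN", "on": "IN", "at": "IN",
    "and": "CC", "or": "CC", "but": "CC",
}

def tag(sentence, default="NN"):
    """Tag each whitespace-separated word by dictionary lookup.

    Unknown words fall back to `default` (a crude assumption: unknown
    words are treated as nouns).
    """
    return [(word, LEXICON.get(word.lower(), default)) for word in sentence.split()]

tagged = tag("The dog and the blue book")
print(tagged)
```

A pure lookup cannot tell that "book" is a verb in "book a flight", which is why practical taggers rely on context.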




## What is Named Entity Recognition (NER)?

**Named Entity Recognition (NER)** is a subtask of Natural Language Processing (NLP) that locates and classifies **named entities** in text into predefined categories such as:

- 👤 **Person** – names of people
- 🏢 **Organization** – companies, agencies, institutions
- 🌍 **Location** – countries, cities, landmarks
- 📅 **Date / Time** – specific dates and times
- 💲 **Money** – monetary values
- 🔢 **Quantities / Percentages / Ordinals** – numerical expressions

---

## Why is NER Important?

NER helps:
- Extract **structured information** from unstructured text
- Power **chatbots**, **search engines**, and **QA systems**
- Build **knowledge graphs**
- Enable **semantic search** and **information retrieval**
- Assist in **document classification** and **summarization**

---

## Examples of NER

Sentence:  
`"Apple was founded by Steve Jobs in California in 1976."`

| Token        | Named Entity |
|--------------|--------------|
| Apple        | ORG          |
| Steve Jobs   | PERSON       |
| California   | GPE          |
| 1976         | DATE         |

---

## Common Entity Types (spaCy, OntoNotes-style labels)

| Label     | Description                        | Example               |
|-----------|------------------------------------|------------------------|
| PERSON    | People, fictional characters       | "Elon Musk"           |
| ORG       | Companies, institutions            | "Microsoft"           |
| GPE       | Geopolitical entities              | "France", "Kathmandu" |
| LOC       | Non-GPE locations                  | "Amazon River"        |
| DATE      | Absolute or relative dates         | "Jan 1, 2020"          |
| TIME      | Times                               | "2:30 PM"              |
| MONEY     | Monetary values                    | "$100", "Rs. 1000"     |
| QUANTITY  | Measurements                       | "70 kg", "10 km"       |
| PRODUCT   | Objects, products                  | "iPhone", "Tesla"      |
| EVENT     | Named events                       | "World War II"         |
| LANGUAGE  | Named languages                    | "English", "Hindi"     |

---

## How Does NER Work?

NER can be performed using:
1. **Rule-Based Systems**:
   - Regex patterns, dictionaries
   - High precision, low recall

2. **Statistical Models**:
   - Hidden Markov Models (HMM)
   - Conditional Random Fields (CRF)

3. **Deep Learning Approaches**:
   - BiLSTM + CRF
   - Transformers (BERT, RoBERTa, etc.)
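The rule-based approach in item 1 can be sketched with a small dictionary (gazetteer) plus a couple of regular expressions. The gazetteer entries, the patterns, and the function name `rule_based_ner` are all assumptions made for this illustration; it is not how any particular library implements NER.

```python
import re

# Hypothetical gazetteer: known names mapped to entity labels.
GAZETTEER = {
    "Apple": "ORG",
    "Steve Jobs": "PERSON",
    "California": "GPE",
}

# Hypothetical regex rules for entities that follow a surface pattern.
PATTERNS = [
    ("DATE", re.compile(r"\b(?:19|20)\d{2}\b")),   # four-digit years
    ("MONEY", re.compile(r"\$\d+(?:\.\d+)?")),     # amounts such as $100
]

def rule_based_ner(text):
    """Return (span, label) pairs in order of appearance in the text."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order by character offset
    return [(span, label) for _, span, label in found]

entities = rule_based_ner("Apple was founded by Steve Jobs in California in 1976.")
print(entities)
```

The sketch also shows the precision/recall trade-off noted above: everything it finds is listed in its rules, but any name missing from the gazetteer is silently skipped.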


# Recurrent Neural Networks (RNN) 

---

## What is an RNN?

A **Recurrent Neural Network (RNN)** is a type of neural network designed for **sequential data** — where the output depends not only on current input but also on previous inputs. It is widely used in **Natural Language Processing (NLP)**, **Time Series Prediction**, and **Speech Recognition**.

---

## Key Idea

RNNs **"remember"** past information using **recurrent connections**. They process input sequences **one element at a time**, maintaining a hidden state that captures context from previous steps.
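A minimal sketch of this idea is the update `h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)`, applied once per input element. The weight values below are arbitrary hand-picked numbers (an assumption for illustration, not trained parameters), and the names `rnn_step` and `rnn_forward` are hypothetical.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        a = b_h[i]
        a += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        a += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(a))
    return h_t

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Process a sequence one element at a time, carrying the hidden state."""
    h = [0.0] * len(b_h)          # initial hidden state
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Arbitrary illustrative weights: 2 inputs, 2 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.4], [-0.6, 0.3]]
b_h = [0.0, 0.0]

seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [1.0, 0.0]]   # same elements, reversed order

final_a = rnn_forward(seq_a, W_xh, W_hh, b_h)[-1]
final_b = rnn_forward(seq_b, W_xh, W_hh, b_h)[-1]
print(final_a, final_b)
```

The two sequences contain the same elements in a different order, yet end in different hidden states, which is the sense in which the hidden state carries context. The same weights also work for a sequence of any length.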



# Why RNNs Were Needed: What ANN Couldn't Do

---

## What is ANN?

An **Artificial Neural Network (ANN)**, used here to mean a plain feedforward network, passes data in one direction, from input to output, with no recurrent connections. It works well for fixed-size input/output tasks such as image classification or prediction on tabular data.

---

## Limitations of ANN

| Problem | Why ANN Fails |
|--------|----------------|
| **Sequential Data** | A feedforward ANN has no built-in notion of order: each input is processed independently of the ones before it, so sequences like text, audio, or time series are hard to model. |
| **Contextual Understanding** | ANN lacks memory: it cannot "remember" past inputs to influence the current output. |
| **Variable Length Input** | ANN expects fixed-size inputs and outputs, so sequences must be truncated or padded. |
| **Temporal Relationships** | ANN cannot learn patterns across time steps, because what happened before has no influence on the current output. |

---

## How RNN Solved These Problems

| RNN Advantage | Explanation |
|---------------|-------------|
| **Handles Sequences** | RNNs process input **step-by-step**, maintaining a hidden state that carries forward information. |
| **Maintains Memory** | RNNs use **hidden states** to "remember" previous inputs, enabling context-aware predictions. |
| **Works with Variable-Length Input** | RNNs can process sequences of varying lengths (e.g., sentences of different lengths). |
| **Captures Time Dependency** | RNNs are ideal for time series forecasting, where future depends on the past. |
| **Natural Language Understanding** | RNNs are better suited than feedforward ANNs to sequence tasks like translation, text generation, and speech recognition. |

---

## ANN vs RNN Summary

| Feature | ANN | RNN |
|--------|-----|-----|
| Sequential Input | ❌ | ✅ |
| Memory | ❌ | ✅ |
| Variable Length Support | ❌ | ✅ |
| Context Awareness | ❌ | ✅ |
| Time-Series/NLP Tasks | ❌ Poor | ✅ Well suited |

---

## Conclusion

> **ANNs** are powerful for **non-sequential**, independent data.
>
> But when the **order, context, and memory** matter — as in text, speech, or time-series — **RNNs** are the right tool for the job.

---


### RNN Architecture
![image.png](attachment:image.png)

### Types of RNN
![image.png](attachment:image.png)

## ❌ Major Problems with RNNs

---

### 1. 🔻 Vanishing Gradient Problem

- When training RNNs using backpropagation through time (BPTT), gradients become **very small** as they are propagated backwards through many time steps.
- As a result, the weights receive **almost no update signal** from early time steps, and the model **fails to learn long-term dependencies**.

#### 🔬 Why it happens:
- Backpropagating through T time steps multiplies T per-step derivative factors (chain rule).
- When those factors have magnitude below 1, their product shrinks exponentially with T.

#### 🔎 Effect:
- Poor performance on long sequences.
- Fails to capture distant context.
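The shrinking product can be reproduced with a scalar RNN `h_t = tanh(w * h_{t-1} + x)`, whose per-step derivative is `w * (1 - h_t^2)`. The weight `w = 0.5`, the starting state, and the function name `gradient_through_time` are assumptions chosen for this sketch.

```python
import math

def gradient_through_time(w, steps, h0=0.5, x=0.0):
    """Track d h_t / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h, grad = h0, 1.0
    history = []
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: multiply by d h_t / d h_{t-1} = w * (1 - h_t^2).
        grad *= w * (1.0 - h * h)
        history.append(grad)
    return history

grads = gradient_through_time(w=0.5, steps=50)
print(grads[0], grads[9], grads[49])
```

Every factor here has magnitude at most 0.5, so after 50 steps the gradient with respect to the initial state is far too small to drive a useful weight update.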

---

### 2. 💥 Exploding Gradient Problem

- Opposite of vanishing gradients.
- Gradients become **too large** during backpropagation.
- This causes **unstable training** and large weight updates.

#### 🔬 Why it happens:
- Repeatedly multiplying per-step derivative factors with magnitude above 1 makes the gradient grow exponentially with sequence length.

#### 🔎 Effect:
- The loss diverges, or weights and outputs overflow to NaN during training.
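A simplified picture of the blow-up uses a linear recurrence `h_t = w * h_{t-1}`, for which the gradient across T steps is `w ** T`. Dropping the nonlinearity is a simplifying assumption, and the weight `1.5` and the step counts are arbitrary choices for illustration.

```python
import math

def linear_gradient(w, steps):
    """Gradient d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w   # one chain-rule factor per time step
    return grad

moderate = linear_gradient(1.5, 50)     # already on the order of 1e8
overflow = linear_gradient(1.5, 2000)   # exceeds the float range and becomes inf
nan_value = overflow - overflow         # inf - inf yields nan

print(moderate, overflow, nan_value)
```

Once an infinite value enters an update, ordinary arithmetic on it can produce NaN, which then propagates through every later computation.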

---

### 3. 🧳 Short-Term Memory

- RNNs are biased toward **recent inputs**.
- They struggle to remember or learn from **long-term dependencies** (i.e., what happened 50+ steps ago).

---

### 4. 🐢 Sequential Computation is Slow

- RNNs process input **one timestep at a time**: step t cannot start until the hidden state from step t-1 is available.
- This makes them **slow to train and hard to parallelize** across time steps.

---

## ✅ Solutions to RNN Problems

---

### 1. 💡 Use LSTM (Long Short-Term Memory)