# 📌 **NLP & Generative AI - Topics & Subtopics**  

## 🔹 **1. Introduction to NLP & GenAI**  
- What is NLP? Applications & Use Cases  
- Traditional NLP vs. Deep Learning-based NLP  
- Overview of GenAI in NLP (Chatbots, Summarization, Content Generation)  

---

## 🔹 **2. Text Preprocessing Techniques**  
### **👉 Basic Preprocessing**
- Tokenization (Word, Sentence)  
- Stopword Removal  
- Stemming & Lemmatization  
- Text Normalization (Lowercasing, Punctuation Removal)  

### **👉 Feature Extraction**
- **TF-IDF (Term Frequency - Inverse Document Frequency)**  
- **Bag of Words (BoW)**  
- **Word Embeddings (Word2Vec, GloVe, FastText)**  

---

## 🔹 **3. Deep Learning for NLP**  
### **👉 Recurrent Neural Networks (RNNs) & LSTMs**
- Why RNNs for NLP?  
- Long Short-Term Memory (LSTM) & Gated Recurrent Units (GRUs)  

### **👉 Attention Mechanism & Transformers**
- Problems with RNNs (Long-Term Dependencies)  
- Self-Attention & Multi-Head Attention  

---



## 🔹 **4. Transformer Models (BERT, GPT, T5, etc.)**  
### **👉 BERT (Bidirectional Encoder Representations from Transformers)**
- Understanding BERT’s architecture  
- Pretraining & Fine-tuning BERT for NLP Tasks (Classification, QA, Summarization)  

### **👉 GPT (Generative Pretrained Transformer)**
- GPT-2, GPT-3, GPT-4 Overview  
- How GPT generates text  
- Fine-Tuning GPT for Specific Applications  

### **👉 Other Transformer Models**
- T5 (Text-to-Text Transfer Transformer)  
- DistilBERT (Efficient BERT)  
- LLaMA, Falcon, and Open Source Models  

---



## 🔹 **5. Advanced NLP & GenAI Applications**  
### **👉 NLP in Action**
- Named Entity Recognition (NER)  
- Sentiment Analysis  
- Question Answering  
- Summarization  

### **👉 GenAI for Text Generation**
- Chatbots & Conversational AI  
- AI-powered Content Creation  
- Code Generation (Codex, StarCoder)  

---

## 🔹 **6. Fine-Tuning & Deployment**  
- Fine-Tuning Transformers with Hugging Face  
- Model Deployment (Streamlit, FastAPI, LangChain)  
- Optimizing LLMs for Efficiency  

---


# 📌 **Introduction to NLP & Generative AI**  

## 🔹 **What is NLP?**  
**Natural Language Processing (NLP)** is a field of artificial intelligence (AI) that enables machines to understand, interpret, and generate human language.  

### **💡 Key Tasks in NLP**  
✔ Text Processing (Tokenization, Lemmatization)  
✔ Text Classification (Spam Detection, Sentiment Analysis)  
✔ Machine Translation (Google Translate)  
✔ Speech Recognition (Siri, Google Assistant)  
✔ Named Entity Recognition (NER)  

### **🔍 Example: Text Processing in Python**


In [11]:
import nltk
nltk.data.path.append('/Users/deepanshu/nltk_data')  

In [15]:

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

#nltk.download('punkt')
#nltk.download('stopwords')

text = "Natural Language Processing enables machines to understand human language!"

# Tokenization
tokens = word_tokenize(text)

# Removing Stopwords (build the set once instead of reloading the list for every token)
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]

print("Original Tokens:", tokens)
print("Filtered Tokens:", filtered_tokens)


Original Tokens: ['Natural', 'Language', 'Processing', 'enables', 'machines', 'to', 'understand', 'human', 'language', '!']
Filtered Tokens: ['Natural', 'Language', 'Processing', 'enables', 'machines', 'understand', 'human', 'language', '!']



---

## 🔹 **Applications & Use Cases of NLP**  
### **📌 Text Classification**  
- Sentiment Analysis (Positive/Negative Reviews)  
- Spam Detection (Email Filters)  

**Example: Sentiment Analysis using TextBlob**  


In [17]:
from textblob import TextBlob

text = "I love using ChatGPT! It's amazing."
sentiment = TextBlob(text).sentiment.polarity  # -1 (Negative) to +1 (Positive)

print("Sentiment Score:", sentiment)


Sentiment Score: 0.6125




### **📌 Machine Translation**
- Google Translate  
- DeepL Translator  

**Example: Translate English to French using the `deep_translator` library (Google Translate backend)**  


In [19]:

from deep_translator import GoogleTranslator

translator = GoogleTranslator(source="en", target="fr")
translated_text = translator.translate("Hello, how are you?")
print("Translated Text:", translated_text)


Translated Text: Bonjour comment allez-vous?




### **📌 Named Entity Recognition (NER)**
- Identifying proper names (Person, Location, Organization)  
- Used in search engines, chatbots, and legal document analysis  

**Example: Named Entity Recognition using SpaCy**  


In [24]:
import spacy
from spacy.cli import download

# Manually download and load the model
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    print("Downloading model...")
    download("en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")


Downloading model...
Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [25]:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")

for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")


Apple -> ORG
U.K. -> GPE
$1 billion -> MONEY




---

## 🔹 **Traditional NLP vs. Deep Learning-based NLP**  

| **Aspect**          | **Traditional NLP**               | **Deep Learning-based NLP** |
|---------------------|---------------------------------|-----------------------------|
| **Approach**       | Rule-based, Statistical Models | Neural Networks (LSTMs, Transformers) |
| **Feature Engineering** | Manual (TF-IDF, BoW) | Automatic (Embeddings) |
| **Performance**     | Decent for structured text | Superior for complex tasks |
| **Examples**       | NLTK, SpaCy | BERT, GPT, T5 |

### **💡 Example: Traditional vs. Deep Learning for Text Classification**  
**1️⃣ Traditional (TF-IDF + Naive Bayes)**


In [26]:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

corpus = ["This is a great movie", "This movie is bad", "I love this film"]
labels = [1, 0, 1]  # 1: Positive, 0: Negative

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

model = MultinomialNB()
model.fit(X, labels)

test_text = ["I hate this movie"]
X_test = vectorizer.transform(test_text)
prediction = model.predict(X_test)

print("Prediction:", prediction)  # Prints [1] here: "hate" is not in the training vocabulary, so this toy model cannot recognize it as negative


Prediction: [1]




**2️⃣ Deep Learning (BERT Fine-Tuning - Overview, No Code Here)**
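- Starts from a **pre-trained model with contextual representations** (the whole BERT encoder, not just static word embeddings)  
- Is **pre-trained on a large unlabeled corpus**, then fine-tuned on a smaller labeled dataset for the target task  
- Requires **more computational power** than TF-IDF + Naive Bayes  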



---
## 🔹 **Overview of GenAI in NLP**  
**Generative AI (GenAI)** refers to models that can **generate human-like text** based on learned patterns.  

### **🔍 Key GenAI Applications in NLP**  
✔ **Chatbots & Virtual Assistants** (ChatGPT, Google Bard)  
✔ **Text Summarization** (TLDR, AI-powered news apps)  
✔ **Content Generation** (Blog Writing, Code Generation)  

### **💡 Example: Generating Text with GPT-3.5**


In [None]:
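from openai import OpenAI  # requires openai>=1.0 (the older openai.ChatCompletion interface was removed)

client = OpenAI(api_key="your-api-key")  # or set the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem about AI"}]
)

print(response.choices[0].message.content)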



---


# 🚀 **Text Preprocessing Techniques in NLP**  

Text preprocessing is the first step in Natural Language Processing (NLP), transforming raw text into a structured format that can be effectively used by machine learning models.  

---

## 🔹 **1. Basic Preprocessing Techniques**  

### **👉 1.1 Tokenization (Word, Sentence)**
Tokenization is the process of splitting a text into individual words (word tokenization) or sentences (sentence tokenization).  

📌 **Why is it important?**  
✔ Helps in analyzing individual words/sentences.  
✔ Essential for NLP tasks like text classification and sentiment analysis.  
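
🔹 **Sketch: Regex-Based Tokenization (Standard Library Only)**

To make the idea concrete without any downloads, here is a minimal sketch built on Python's `re` module. It is a rough approximation and not the algorithm NLTK uses; the function names `simple_word_tokenize` and `simple_sent_tokenize` are illustrative, not part of any library.

```python
import re

def simple_word_tokenize(text):
    # Match runs of word characters, or any single punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    # Split after ., ! or ? when followed by whitespace
    # (naive: this also splits after abbreviations such as "U.K.")
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "NLP is fun. It powers chatbots!"
print(simple_sent_tokenize(text))
print(simple_word_tokenize(text))
```

A rule this simple breaks on abbreviations, URLs, and contractions, which is why trained tokenizers are usually preferred in practice.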

🔹 **Example: Word & Sentence Tokenization in Python**


In [28]:

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

#nltk.download('punkt')

text = "Natural Language Processing enables machines to understand human language. It is widely used in AI applications."

# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentence Tokenization:", sentences)

# Word Tokenization
words = word_tokenize(text)
print("Word Tokenization:", words)


Sentence Tokenization: ['Natural Language Processing enables machines to understand human language.', 'It is widely used in AI applications.']
Word Tokenization: ['Natural', 'Language', 'Processing', 'enables', 'machines', 'to', 'understand', 'human', 'language', '.', 'It', 'is', 'widely', 'used', 'in', 'AI', 'applications', '.']






---

### **👉 1.2 Stopword Removal**
Stopwords are common words (like *is, the, a, an*) that do not add significant meaning to the text.  

📌 **Why remove stopwords?**  
✔ Reduces the dimensionality of the dataset.  
✔ Improves model efficiency by focusing on meaningful words.  

🔹 **Example: Removing Stopwords in Python**


In [30]:

from nltk.corpus import stopwords

#nltk.download('stopwords')

words = ["this", "is", "a", "sample", "text"]
stop_words = set(stopwords.words('english'))  # build the set once for fast lookups
filtered_words = [word for word in words if word.lower() not in stop_words]

print("Filtered Words:", filtered_words)


Filtered Words: ['sample', 'text']




---

### **👉 1.3 Stemming & Lemmatization**
Both techniques reduce words to their root form.  

| **Technique** | **Example** |
|--------------|------------|
| **Stemming** | "running" → "run", "studies" → "studi" |
| **Lemmatization** | "running" → "run", "studies" → "study" |

📌 **Difference:**  
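✔ **Stemming** is faster but cruder: it strips suffixes with fixed rules, so the result may not be a real word (e.g., "studi").  
✔ **Lemmatization** looks each word up in a dictionary (ideally with its part of speech) and returns a valid base form.  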

🔹 **Example: Stemming in Python**


In [31]:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "flies", "studies", "better"]

stemmed_words = [stemmer.stem(word) for word in words]
print("Stemmed Words:", stemmed_words)


Stemmed Words: ['run', 'fli', 'studi', 'better']



🔹 **Example: Lemmatization in Python**


In [33]:

from nltk.stem import WordNetLemmatizer

#nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
words = ["running", "flies", "studies", "better"]

# pos="v" treats every word as a verb; with the default (noun), "running" would stay unchanged
lemmatized_words = [lemmatizer.lemmatize(word, pos="v") for word in words]
print("Lemmatized Words:", lemmatized_words)


Lemmatized Words: ['run', 'fly', 'study', 'better']




---

### **👉 1.4 Text Normalization**
📌 **What is it?**  
✔ Converts text into a consistent format.  
✔ Removes unnecessary variations in text.  

📌 **Common Techniques:**  
- **Lowercasing** → Converts text to lowercase.  
- **Removing Punctuation** → Removes symbols like “.”, “?”, “!”.  

🔹 **Example: Text Normalization in Python**


In [34]:

import re

text = "Hello WORLD! NLP is AMAZING!!!"
normalized_text = re.sub(r'[^\w\s]', '', text.lower())  # Remove punctuation and lowercase
print("Normalized Text:", normalized_text)


Normalized Text: hello world nlp is amazing




---

## 🔹 **2. Feature Extraction Techniques**
After preprocessing, text needs to be converted into numerical features for machine learning models.

---

### **👉 2.1 TF-IDF (Term Frequency - Inverse Document Frequency)**
📌 **What is TF-IDF?**  
✔ **TF (Term Frequency)** → Measures how often a word appears in a document.  
✔ **IDF (Inverse Document Frequency)** → Gives higher importance to rare words.  

📌 **Why use TF-IDF?**  
✔ Captures word importance in a document.  
✔ Down-weights words that appear in most documents, which raw word counts (Bag of Words) cannot do.  

🔹 **Example: TF-IDF in Python**


In [35]:

from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["NLP is amazing", "I love NLP", "Machine learning is powerful"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

print("TF-IDF Feature Names:", vectorizer.get_feature_names_out())
print("TF-IDF Vectors:\n", X.toarray())


TF-IDF Feature Names: ['amazing' 'is' 'learning' 'love' 'machine' 'nlp' 'powerful']
TF-IDF Vectors:
 [[0.68091856 0.51785612 0.         0.         0.         0.51785612
  0.        ]
 [0.         0.         0.         0.79596054 0.         0.60534851
  0.        ]
 [0.         0.40204024 0.52863461 0.         0.52863461 0.
  0.52863461]]
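
🔹 **Sketch: TF-IDF by Hand (Standard Library Only)**

To show where such numbers come from, the sketch below recomputes the first document's vector using scikit-learn's default settings: smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization. The whitespace tokenizer is a simplification that happens to match `TfidfVectorizer` on these three sentences; it is an assumption, not the library's actual tokenizer.

```python
import math

documents = ["NLP is amazing", "I love NLP", "Machine learning is powerful"]
# Mimic the default tokenizer: lowercase, keep tokens of 2+ characters
tokenized = [[w for w in doc.lower().split() if len(w) > 1] for doc in documents]

n_docs = len(tokenized)
vocab = sorted({w for doc in tokenized for w in doc})
# Document frequency: number of documents containing each term
df = {w: sum(w in doc for doc in tokenized) for w in vocab}
# Smoothed IDF (scikit-learn default): ln((1 + n) / (1 + df)) + 1
idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}

def tfidf_vector(doc):
    raw = [doc.count(w) * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in raw))  # L2 normalization
    return [v / norm for v in raw]

first = tfidf_vector(tokenized[0])
print(dict(zip(vocab, (round(v, 4) for v in first))))
```

In "NLP is amazing", the word "amazing" gets the highest weight because it occurs in only one document, while "is" and "nlp" each occur in two.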




---

### **👉 2.2 Bag of Words (BoW)**
📌 **What is BoW?**  
✔ Converts text into a vector of word occurrences.  
✔ Represents text numerically but ignores word meaning.  

🔹 **Example: BoW in Python**


In [37]:

from sklearn.feature_extraction.text import CountVectorizer

documents = ["I love NLP", "NLP is fun", "AI is the future"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

print("BoW Feature Names:", vectorizer.get_feature_names_out())
print("BoW Vectors:\n", X.toarray())


BoW Feature Names: ['ai' 'fun' 'future' 'is' 'love' 'nlp' 'the']
BoW Vectors:
 [[0 0 0 0 1 1 0]
 [0 1 0 1 0 1 0]
 [1 0 1 1 0 0 1]]
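
🔹 **Sketch: Bag of Words by Hand (Standard Library Only)**

The same vectors can be built with `collections.Counter`, which makes the counting step explicit. The whitespace tokenizer and the "drop one-character tokens" rule are simplifications chosen to match `CountVectorizer`'s default behaviour on these sentences.

```python
from collections import Counter

documents = ["I love NLP", "NLP is fun", "AI is the future"]
# Lowercase and drop one-character tokens, as CountVectorizer does by default
tokenized = [[w for w in doc.lower().split() if len(w) > 1] for doc in documents]

# The vocabulary is the sorted set of all tokens
vocab = sorted({w for doc in tokenized for w in doc})

vectors = []
for doc in tokenized:
    counts = Counter(doc)
    vectors.append([counts[w] for w in vocab])  # Counter returns 0 for absent words

print(vocab)
for row in vectors:
    print(row)
```

Each row has one slot per vocabulary word, so word order within a sentence is lost; that is the "bag" in Bag of Words.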




---

### **👉 2.3 Word Embeddings (Word2Vec, GloVe, FastText)**
📌 **What are Word Embeddings?**  
✔ Represents words as dense, real-valued vectors (typically 50 to 300 dimensions).  
✔ Captures word meaning & relationships.  
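
🔹 **Sketch: Measuring Word Similarity with Cosine Similarity**

"Capturing relationships" usually means that related words end up with vectors pointing in similar directions, which is commonly measured with cosine similarity. The vectors below are hand-made toy values, not real embeddings, and exist only to illustrate the computation.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (NOT real embeddings), purely for illustration
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print("cat vs dog:", round(cosine_similarity(toy_vectors["cat"], toy_vectors["dog"]), 3))
print("cat vs car:", round(cosine_similarity(toy_vectors["cat"], toy_vectors["car"]), 3))
```

With trained embeddings the same function can be applied to the vectors returned by a model such as Word2Vec.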

#### **1️⃣ Word2Vec (Google)**
- Uses **CBOW (Continuous Bag of Words)** or **Skip-gram** to learn word representations.  
- Example: **"King - Man + Woman = Queen"**  
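
🔹 **Sketch: Generating Skip-gram Training Pairs**

Skip-gram learns by predicting the context words around each center word (CBOW does the reverse and predicts the center word from its context). The sketch below only shows how the (center, context) training pairs are formed; the helper name `skipgram_pairs` and the `window` default are illustrative choices.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

tokens = ["i", "love", "natural", "language", "processing"]
for pair in skipgram_pairs(tokens, window=1):
    print(pair)
```

A larger `window` produces more pairs per word and tends to capture broader, topic-level similarity.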

🔹 **Example: Train Word2Vec using `gensim`**


In [1]:

from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

sentences = [
    word_tokenize("I love natural language processing"),
    word_tokenize("Machine learning is powerful"),
    word_tokenize("Deep learning enables AI breakthroughs")
]

model = Word2Vec(sentences, vector_size=50, min_count=1, workers=4)

print("Word Vector for 'learning':", model.wv['learning'])


Word Vector for 'learning': [-1.0724545e-03  4.7286271e-04  1.0206699e-02  1.8018546e-02
 -1.8605899e-02 -1.4233618e-02  1.2917745e-02  1.7945977e-02
 -1.0030856e-02 -7.5267432e-03  1.4761009e-02 -3.0669428e-03
 -9.0732267e-03  1.3108104e-02 -9.7203208e-03 -3.6320353e-03
  5.7531595e-03  1.9837476e-03 -1.6570430e-02 -1.8897636e-02
  1.4623532e-02  1.0140524e-02  1.3515387e-02  1.5257311e-03
  1.2701781e-02 -6.8107317e-03 -1.8928028e-03  1.1537147e-02
 -1.5043275e-02 -7.8722071e-03 -1.5023164e-02 -1.8600845e-03
  1.9076237e-02 -1.4638334e-02 -4.6675373e-03 -3.8754821e-03
  1.6154874e-02 -1.1861792e-02  9.0324880e-05 -9.5074680e-03
 -1.9207101e-02  1.0014586e-02 -1.7519170e-02 -8.7836506e-03
 -7.0199967e-05 -5.9236289e-04 -1.5322480e-02  1.9229487e-02
  9.9641159e-03  1.8466286e-02]




---

#### **2️⃣ GloVe (Global Vectors - Stanford)**
- Learns word embeddings based on word co-occurrence.  
- Pre-trained embeddings available in several sizes (e.g., 50D, 100D, 200D, 300D).  

🔹 **Example: Load Pre-trained GloVe Embeddings**


In [4]:
import gensim.downloader as api
import os

gensim_data_path = os.path.expanduser("~/gensim-data")
os.makedirs(gensim_data_path, exist_ok=True)
api.BASE_DIR = gensim_data_path


In [5]:

import gensim.downloader as api

glove_model = api.load("glove-wiki-gigaword-50")  # Load 50D GloVe embeddings
print("Vector for 'apple':", glove_model['apple'])





---

#### **3️⃣ FastText (Facebook)**
- Like Word2Vec but captures subword information.  
- Better for handling rare words or misspellings.  

🔹 **Example: Load Pre-trained FastText Embeddings**


In [None]:

import gensim.downloader as api

fasttext_model = api.load("fasttext-wiki-news-subwords-300")  # large download on first use
print("Vector for 'machine':", fasttext_model['machine'])




---

# ✅ **Summary**
| **Technique** | **Purpose** | **Example Output** |
|--------------|------------|---------------------|
| **Tokenization** | Splits text into words/sentences | ["NLP", "is", "fun"] |
| **Stopword Removal** | Removes common words | ["NLP", "fun"] |
| **Stemming** | Converts words to root form | ["run", "studi"] |
| **Lemmatization** | Converts words to base form | ["run", "study"] |
| **TF-IDF** | Assigns importance to words | `[0.5, 0.8, 0.3]` |
| **BoW** | Word frequency count | `[1, 2, 0]` |
| **Word Embeddings** | Dense vectors that capture word meaning | `[0.25, -0.08, 0.67]` |

---


# 🚀 **Deep Learning for NLP**  

Deep Learning has revolutionized NLP by allowing models to capture complex patterns in text data. Before transformers, **Recurrent Neural Networks (RNNs)** and their variants like **Long Short-Term Memory (LSTM)** and **Gated Recurrent Units (GRU)** were the backbone of deep NLP models. However, these architectures struggled with long-range dependencies, which led to the development of the **Attention Mechanism** and ultimately **Transformers (BERT, GPT, etc.)**.

---

## **🔹 1. Recurrent Neural Networks (RNNs) & LSTMs**

### **1.1 Why RNNs for NLP?**
- Traditional feedforward neural networks **do not handle sequential data well** because they process inputs independently.  
- **Recurrent Neural Networks (RNNs)** were designed to handle sequential data, making them ideal for NLP tasks like text classification, machine translation, and speech recognition.  

### **1.2 Understanding RNNs**
An **RNN** processes a sequence of words step-by-step while **maintaining a hidden state** that captures past information.

#### **Mathematical Formulation:**
At each time step $ t $:
$$
h_t = \tanh(W_h h_{t-1} + W_x x_t + b)
$$
where:  
- $ h_t $ = hidden state at time step $ t $  
- $ x_t $ = input at time step $ t $ (word embedding)  
- $ W_h, W_x, b $ = learnable weights  
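
🔹 **Sketch: One RNN Step in Plain Python**

The update rule can be traced by hand with small lists. The weights below are arbitrary illustrative values (hidden size 2, input size 3), and `rnn_step` is a hypothetical helper name, not a library function.

```python
import math

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One step of h_t = tanh(W_h h_{t-1} + W_x x_t + b) using plain lists."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        total += sum(W_x[i][k] * x_t[k] for k in range(len(x_t)))
        h_t.append(math.tanh(total))
    return h_t

# Arbitrary illustrative parameters: hidden size 2, input size 3
W_h = [[0.5, -0.1], [0.2, 0.3]]
W_x = [[0.1, 0.0, -0.2], [0.4, 0.1, 0.0]]
b = [0.0, 0.1]

h = [0.0, 0.0]  # initial hidden state
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for t, x_t in enumerate(sequence, start=1):
    h = rnn_step(h, x_t, W_h, W_x, b)  # the same weights are reused at every step
    print(f"h_{t} =", [round(v, 4) for v in h])
```

The hidden state from one step feeds the next, which is how earlier inputs influence later outputs.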

However, RNNs struggle with **long-term dependencies** because of the **vanishing gradient problem**, meaning they forget past information as sequences get longer.

---



### **1.3 Long Short-Term Memory (LSTM)**
LSTMs mitigate the **vanishing gradient problem** by introducing a **cell state** that selectively remembers and forgets information.  

#### **LSTM Architecture**
An LSTM consists of three gates plus a cell state update:  
1. **Forget Gate** $ f_t $ – Decides what to forget  
   $$
   f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
   $$
2. **Input Gate** $ i_t $ – Decides what new information to store  
   $$
   i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
   $$
3. **Output Gate** $ o_t $ – Decides what to output  
   $$
   o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
   $$
4. **Cell State Update** – Combines the retained memory with new candidate information  
   $$
   \tilde{C_t} = \tanh(W_C [h_{t-1}, x_t] + b_C)
   $$
   $$
   C_t = f_t * C_{t-1} + i_t * \tilde{C_t}
   $$
   where $ \tilde{C_t} $ is the candidate cell state. The new hidden state is then
   $$
   h_t = o_t * \tanh(C_t)
   $$
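
🔹 **Sketch: One LSTM Step with Scalar State**

To see how the gates interact, here is a minimal sketch with a hidden size of 1, so every quantity is a single number. It assumes the standard LSTM formulation for the two pieces the gate equations rely on: the candidate state is a `tanh` of the same inputs, and the hidden state is `o_t * tanh(C_t)`. The parameter values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x_t, p):
    """One LSTM step with scalar state; p maps gate name -> (w_h, w_x, bias)."""
    def pre(name):
        w_h, w_x, bias = p[name]
        return w_h * h_prev + w_x * x_t + bias

    f_t = sigmoid(pre("f"))              # forget gate
    i_t = sigmoid(pre("i"))              # input gate
    o_t = sigmoid(pre("o"))              # output gate
    c_tilde = math.tanh(pre("c"))        # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # cell state update
    h_t = o_t * math.tanh(c_t)           # new hidden state
    return h_t, c_t

# Arbitrary illustrative parameters
params = {"f": (0.5, 0.5, 0.0), "i": (0.5, 0.5, 0.0),
          "o": (0.5, 0.5, 0.0), "c": (0.5, 0.5, 0.0)}

h, c = 0.0, 0.0
for x_t in [1.0, -1.0, 0.5]:
    h, c = lstm_step(h, c, x_t, params)
    print("h =", round(h, 4), "c =", round(c, 4))
```

Because the cell state is updated additively (`f_t * c_prev + ...`), information can persist across many steps when the forget gate stays close to 1.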

#### **🔹 Python Implementation of LSTM for NLP**


In [None]:

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out[:, -1, :])  # Get the last time step output
        return out

# Example: LSTM for text classification
lstm_model = LSTMModel(input_size=50, hidden_size=128, output_size=2)  # Binary classification
print(lstm_model)



---

### **1.4 Gated Recurrent Units (GRUs)**
- GRUs are a simplified version of LSTMs with fewer parameters.  
- They **combine the forget and input gates** into a single **update gate**.  
- GRUs often perform comparably to LSTMs while being cheaper to compute.  

#### **GRU Update Equations**
$$
z_t = \sigma(W_z [h_{t-1}, x_t])
$$
$$
r_t = \sigma(W_r [h_{t-1}, x_t])
$$
$$
\tilde{h_t} = \tanh(W_h [r_t * h_{t-1}, x_t])
$$
$$
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h_t}
$$
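
🔹 **Sketch: One GRU Step with Scalar State**

The four update equations translate almost line by line into code. This sketch uses a hidden size of 1 and arbitrary weights; `gru_step` and its argument layout are illustrative choices, and biases are omitted to match the equations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(h_prev, x_t, w_z, w_r, w_h):
    """One GRU step with scalar state; each w_* is a (w_hidden, w_input) pair."""
    z_t = sigmoid(w_z[0] * h_prev + w_z[1] * x_t)                # update gate
    r_t = sigmoid(w_r[0] * h_prev + w_r[1] * x_t)                # reset gate
    h_tilde = math.tanh(w_h[0] * (r_t * h_prev) + w_h[1] * x_t)  # candidate state
    return (1 - z_t) * h_prev + z_t * h_tilde                    # blend old and new

h = 0.0
for x_t in [1.0, -1.0, 0.5]:
    h = gru_step(h, x_t, w_z=(0.5, 0.5), w_r=(0.5, 0.5), w_h=(0.5, 0.5))
    print("h =", round(h, 4))
```

When `z_t` is near 0 the old state is copied forward almost unchanged; when it is near 1 the state is replaced by the candidate.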

---

## **🔹 2. Attention Mechanism & Transformers**
Although LSTMs and GRUs improved memory retention, they still **process words sequentially**, which limits parallelization. **Attention Mechanisms** solve this issue.

### **2.1 Problems with RNNs**
- **Long-Term Dependencies**: Even LSTMs struggle with extremely long sequences.  
- **Sequential Processing**: Cannot process words in parallel, slowing training.  

### **2.2 Introduction to Attention**
The **attention mechanism** allows models to focus on the most relevant words in a sentence rather than treating all words equally.

#### **🔹 Self-Attention Mechanism**
Self-Attention computes relationships between words in a sentence.  

Given an input sequence of words, we define:
1. **Query (Q)** – What are we looking for?  
2. **Key (K)** – What do we have?  
3. **Value (V)** – What information do we use?  

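$$
\text{Attention}(Q, K, V) = \text{softmax} \left( \frac{QK^T}{\sqrt{d_k}} \right) V
$$
where $ d_k $ is the dimension of the key vectors; dividing by $ \sqrt{d_k} $ keeps the dot products from growing too large before the softmax.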
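
🔹 **Sketch: Scaled Dot-Product Attention in Plain Python**

The formula can be followed step by step on a tiny example. The sketch below uses plain lists and arbitrary 2-dimensional toy vectors; in a real model, Q, K, and V come from learned linear projections of the input, which are skipped here for brevity.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights

# Three tokens with 2-dimensional toy vectors (arbitrary values)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = attention(X, X, X)  # self-attention without projections: Q = K = V = X
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and tells how much that token draws from every token in the sequence.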

### **2.3 Multi-Head Attention**
Instead of computing a single attention score, we compute multiple **attention heads** in parallel, capturing different types of relationships.

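$$
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \dots, \text{head}_h) W^O
$$
where $ \text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V) $ and $ W^O $ is a learned output projection.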

### **2.4 Transformer Architecture**
Transformers (BERT, GPT) replace RNNs with attention layers.

#### **🔹 Transformer Encoder (BERT)**
- **Multi-Head Attention** → Focuses on relevant words  
- **Feedforward Network** → Learns representations  
- **Layer Normalization** → Stabilizes learning  

#### **🔹 Transformer Decoder (GPT)**
- Uses **causal attention** (cannot see future words)  
- Generates text token by token  
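
🔹 **Sketch: Causal (Masked) Attention Weights**

"Cannot see future words" means that token *i* only receives attention weight from positions up to *i*. The sketch below applies that rule to a small matrix of arbitrary raw scores; `causal_softmax` is an illustrative helper name.

```python
import math

def causal_softmax(scores):
    """Row-wise softmax where position i may only attend to positions <= i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                      # drop future positions
        m = max(visible)
        exps = [math.exp(v - m) for v in visible]
        total = sum(exps)
        # Future positions receive exactly zero weight
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

# Arbitrary raw attention scores for a 3-token sequence
scores = [[0.2, 1.0, 0.5],
          [0.1, 0.3, 2.0],
          [1.0, 1.0, 1.0]]
for row in causal_softmax(scores):
    print([round(w, 3) for w in row])
```

The result is lower-triangular: the first token can only attend to itself, while the last token sees the whole sequence.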

---

### **🔹 Python Implementation of Self-Attention**


In [None]:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super(SelfAttention, self).__init__()
        self.embed_size = embed_size
        self.heads = heads
        self.head_dim = embed_size // heads

        self.values = nn.Linear(embed_size, embed_size, bias=False)
        self.keys = nn.Linear(embed_size, embed_size, bias=False)
        self.queries = nn.Linear(embed_size, embed_size, bias=False)
        self.fc_out = nn.Linear(embed_size, embed_size)

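    def forward(self, x):
        N, seq_length, _ = x.shape

        # Split the embedding into heads: (N, heads, seq_length, head_dim)
        def split_heads(t):
            return t.view(N, seq_length, self.heads, self.head_dim).transpose(1, 2)

        values = split_heads(self.values(x))
        keys = split_heads(self.keys(x))
        queries = split_heads(self.queries(x))

        # Scaled dot-product attention, computed independently for each head
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / self.head_dim ** 0.5
        attention = torch.softmax(scores, dim=-1)
        out = torch.matmul(attention, values)  # (N, heads, seq_length, head_dim)

        # Merge the heads back into (N, seq_length, embed_size)
        out = out.transpose(1, 2).reshape(N, seq_length, self.embed_size)
        out = self.fc_out(out)

        return out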

# Example usage
embed_size = 128  # Size of word embeddings
heads = 8  # Multi-head attention
self_attention = SelfAttention(embed_size, heads)
print(self_attention)




---

## **✅ Summary of What We Covered**
✔ **RNNs, LSTMs, GRUs** → Sequential models for NLP  
✔ **Attention Mechanism** → Allows models to focus on relevant words  
✔ **Self-Attention & Multi-Head Attention** → Core of transformers  
✔ **Transformers (BERT, GPT) Overview** → Modern NLP models  

---


# 🚀 **Deep Dive into Transformer Models**  

Transformers have revolutionized NLP by enabling powerful models like **BERT, GPT, T5, and others**. Unlike traditional sequence models (RNNs, LSTMs), transformers leverage **self-attention** to process input **in parallel**, leading to **faster and more accurate text understanding and generation**.

---

# **🔹 1. Introduction to Transformers**  

### **1.1 Why Transformers?**
Traditional models (RNNs, LSTMs) process text sequentially, leading to:
- **Long-term dependency issues** (forgetting earlier words in long texts).  
- **Slow training** due to sequential computations.  

**Transformers solve these problems** by using:  
✅ **Self-Attention** (allows the model to focus on important words)  
✅ **Parallel Processing** (faster than RNNs)  
✅ **Positional Encoding** (captures word order)  
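
🔹 **Sketch: Sinusoidal Positional Encoding**

Because self-attention on its own ignores word order, a position-dependent vector is added to each token embedding. The sketch below implements the sinusoidal scheme from the original Transformer design; it is one option among several, and models such as BERT learn their position embeddings instead.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=6)
for pos, row in enumerate(pe):
    print(pos, [round(v, 3) for v in row])
```

Every position gets a distinct pattern, so the model can tell "dog bites man" from "man bites dog" even though attention treats the input as a set.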

### **1.2 Transformer Architecture**
A **transformer** consists of an **encoder** and a **decoder**:  
- **Encoder (Used in BERT)** → Processes input text into representations.  
- **Decoder (Used in GPT)** → Generates text token by token.  

Each block in a transformer contains:  
✔ **Multi-Head Self-Attention** – Helps the model focus on important words.  
✔ **Feedforward Network** – Adds non-linearity for better learning.  
✔ **Layer Normalization & Residual Connections** – Stabilizes training.  

---



# **🔹 2. BERT (Bidirectional Encoder Representations from Transformers)**  

BERT is **an encoder-only transformer model** that **understands text bidirectionally**. It was introduced by Google in 2018 and became the foundation of many NLP applications.

### **2.1 Understanding BERT’s Architecture**
- **BERT uses only the encoder** part of the transformer.  
- It **attends to the left and right context of every token at the same time**, rather than reading in a single direction.  
- Uses **self-attention** to capture relationships between words.  

**Example**:  
➡️ In the sentence:  
  _"The **bank** of the river was beautiful."_  
➡️ A **bidirectional model** understands that "bank" means "riverbank" and not "financial institution."

### **2.2 Pretraining BERT**
BERT is **pretrained** on a large corpus using two tasks:  
✔ **Masked Language Modeling (MLM)** – Randomly masks words and predicts them.  
✔ **Next Sentence Prediction (NSP)** – Predicts if sentence B follows sentence A.
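
🔹 **Sketch: Creating Masked Inputs for MLM**

The masking step itself is simple to illustrate. The sketch below is a simplified version: BERT's actual recipe selects about 15% of tokens and replaces most, but not all, of them with `[MASK]`, whereas this helper always uses `[MASK]`. The name `mask_tokens` and its parameters are illustrative.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK]; return inputs and targets."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(tok)    # the model must recover this token
        else:
            masked.append(tok)
            targets.append(None)   # this position is not scored
    return masked, targets

tokens = "the capital of france is paris".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(targets)
```

During pretraining the loss is computed only at the masked positions, which forces the model to use context from both sides.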

#### **🔹 Masked-Word Prediction with Pre-trained BERT in Python**


In [None]:

from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load pre-trained BERT tokenizer & model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Example sentence with masked word
sentence = "The capital of France is [MASK]."
tokens = tokenizer(sentence, return_tensors="pt")

# Predict the masked word
with torch.no_grad():
    outputs = model(**tokens)
    predictions = outputs.logits

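# Locate the [MASK] position instead of hard-coding an index, then decode the prediction
mask_index = (tokens["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
predicted_token = torch.argmax(predictions[0, mask_index]).item()
predicted_word = tokenizer.decode([predicted_token])
print(f"Predicted word: {predicted_word}")  # Likely output: paris (lowercase, since the model is uncased)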




### **2.3 Fine-Tuning BERT for NLP Tasks**
BERT can be fine-tuned for:
✅ **Text Classification (Sentiment Analysis, Spam Detection)**  
✅ **Question Answering (QA Models like SQuAD)**  
✅ **Named Entity Recognition (NER)**  
✅ **Summarization, Translation**  

#### **🔹 Loading BERT for Sentiment Classification (Starting Point for Fine-Tuning)**


In [None]:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the tokenizer and a pre-trained BERT encoder with a new classification head
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

# Example sentences
texts = ["I love this movie!", "This product is terrible."]

# Tokenize and convert to tensor
tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Run a forward pass (no training happens in this cell)
with torch.no_grad():
    outputs = model(**tokens)
    predictions = torch.argmax(outputs.logits, dim=1)

# The classification head is randomly initialized, so these labels are arbitrary
# until the model is fine-tuned on labeled data (1 = positive, 0 = negative)
print(predictions)



---

# **🔹 3. GPT (Generative Pretrained Transformer)**  

### **3.1 Overview of GPT-2, GPT-3, and GPT-4**
- **GPT is a decoder-only transformer model** designed for text generation.  
- It **reads text left to right** (unidirectional).  
- **Larger models (GPT-3, GPT-4)** improve text coherence and reasoning.

| Model | Parameters | Key Feature |
|--------|-----------|-------------|
| **GPT-2** | 1.5B | Coherent text generation |
| **GPT-3** | 175B | Few-shot learning |
| **GPT-4** | Not publicly disclosed | Multimodal, reasoning |

### **3.2 How GPT Generates Text**
GPT generates text **token by token**: at each step it predicts a probability distribution over the next token, picks (or samples) one, and appends it to the input.  
Example:  
✅ Input: "Once upon a time"  
✅ GPT predicts: "there was a king who ruled a vast kingdom."
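
🔹 **Sketch: Greedy Token-by-Token Generation**

The generation loop can be illustrated with a toy lookup table in place of a neural network. The probabilities below are invented for illustration and do not come from GPT; a real model also conditions on the entire prefix rather than only the last token.

```python
# Hypothetical next-token probabilities (invented for illustration, not from GPT)
next_token_probs = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.8, "hill": 0.2},
    "time": {"<eos>": 0.6, "there": 0.4},
}

def greedy_generate(prompt, max_new_tokens=5):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = next_token_probs.get(tokens[-1])
        if not candidates:
            break  # no known continuation
        next_token = max(candidates, key=candidates.get)  # most probable token
        if next_token == "<eos>":
            break  # end-of-sequence token stops generation
        tokens.append(next_token)
    return " ".join(tokens)

print(greedy_generate("once"))
```

Always taking the most probable token is called greedy decoding; sampling with a temperature trades some of that determinism for variety.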

#### **🔹 GPT-2 Text Generation in Python**


In [None]:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load GPT-2 model & tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Input text
input_text = "Once upon a time"
input_tokens = tokenizer.encode(input_text, return_tensors="pt")

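# Generate text (temperature only takes effect when sampling is enabled)
output = model.generate(
    input_tokens,
    max_length=50,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated padding token
)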
print(tokenizer.decode(output[0], skip_special_tokens=True))



### **3.3 Fine-Tuning GPT for Specific Tasks**
GPT can be fine-tuned for:
✅ **Chatbots (Customer Support, AI Assistants)**  
✅ **Creative Writing (Story Generation, Code Generation)**  
✅ **Financial Reports, Medical Summarization**  

---

# **🔹 4. Other Transformer Models**

### **4.1 T5 (Text-to-Text Transfer Transformer)**
- Unlike BERT/GPT, **T5 treats every NLP task as text generation**.
- **Example Tasks:**
  - Summarization: **"summarize: The book is about..."**  
  - Translation: **"translate English to French: Hello!"**  

#### **🔹 T5 Summarization Example**


In [None]:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "summarize: This article explains transformers in deep learning."  # T5 task prefixes are lowercase
tokens = tokenizer.encode(text, return_tensors="pt")
summary = model.generate(tokens)
print(tokenizer.decode(summary[0], skip_special_tokens=True))




---

### **4.2 DistilBERT (Efficient BERT)**
- A **lighter version of BERT** that retains about 97% of BERT's language-understanding performance while being **40% smaller** and roughly **60% faster**.
- Used in mobile and real-time applications.

---

### **4.3 Open-Source Transformer Models**
✔ **LLaMA (Meta AI’s model)** – Openly released model weights, widely used as a GPT alternative.  
✔ **Falcon (Technology Innovation Institute)** – Open model distributed via Hugging Face, optimized for efficient inference.  
✔ **Mistral, BLOOM** – Used for research and enterprise applications.  

---

# **✅ Summary of What We Covered**
✔ **Transformer Basics** → Self-Attention, Multi-Head Attention  
✔ **BERT** → Bidirectional, Used for Understanding Text (NER, QA, Classification)  
✔ **GPT** → Unidirectional, Used for Text Generation (Chatbots, Summarization)  
✔ **T5** → Text-to-Text Model for Translation, Summarization  
✔ **DistilBERT, LLaMA, Falcon** → Optimized transformer models  

---


# 🚀 **Advanced NLP & GenAI Applications**  

After covering the foundations of NLP and transformers, let's now **explore real-world applications** like **NER, Sentiment Analysis, Question Answering, Summarization, Chatbots, AI Content Generation, and Deployment**.  

---

# **🔹 1. NLP in Action (Real-World Applications)**  

## **👉 1.1. Named Entity Recognition (NER)**  
**NER extracts entities like names, dates, locations, and organizations from text.**  
It is widely used in:
- **Finance** (Extracting company names from news)
- **Healthcare** (Identifying medical terms in documents)
- **Legal Industry** (Extracting case details)

### **📝 Example: Extracting Named Entities Using SpaCy**


In [6]:

import spacy

# Load pre-trained NLP model
nlp = spacy.load("en_core_web_sm")

text = "Elon Musk, the CEO of Tesla, visited Berlin on March 5, 2023."

# Process the text
doc = nlp(text)

# Print Named Entities
for ent in doc.ents:
    print(f"{ent.text} - {ent.label_}")

# Expected Output:
# Elon Musk - PERSON
# Tesla - ORG
# Berlin - GPE
# March 5, 2023 - DATE


Elon Musk - PERSON
Tesla - ORG
Berlin - GPE
March 5, 2023 - DATE
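
Many token-classification NER models do not emit entities directly. They assign a tag to every token (commonly in the BIO scheme: `B-` begins an entity, `I-` continues it, `O` is outside), and a small decoding step merges the tags into spans. The sketch below shows that decoding step in plain Python. The function name `decode_bio` and the example tags are assumptions for illustration; this is not a description of how spaCy works internally.

```python
def decode_bio(tokens, tags):
    """Merge token-level BIO tags into (entity_text, label) spans."""
    entities = []
    current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # An "O" tag (or an inconsistent I- tag) ends the current entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


tokens = ["Elon", "Musk", "visited", "Berlin", "."]
tags = ["B-PERSON", "I-PERSON", "O", "B-GPE", "O"]
print(decode_bio(tokens, tags))
```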




---

## **👉 1.2 Sentiment Analysis**  
Sentiment Analysis determines whether a text is **positive, negative, or neutral**.  
It is widely used in:
- **Social Media Monitoring** (Twitter sentiment)
- **Customer Reviews Analysis**
- **Stock Market Predictions** (Based on news sentiment)

### **📝 Sentiment Analysis Using Transformers**


In [7]:

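# Requires the Hugging Face library and a backend: pip install transformers torch
from transformers import pipeline

# Load the sentiment analysis pipeline (downloads a default English model on first use)
sentiment_pipeline = pipeline("sentiment-analysis")

text = "I love the new Tesla model, it's amazing!"
result = sentiment_pipeline(text)
print(result)

# Expected output (score is approximate): [{'label': 'POSITIVE', 'score': 0.99}]
# Note: the default model predicts only POSITIVE or NEGATIVE; it has no neutral class.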
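
The `label` and `score` fields in a classifier's result typically come from applying softmax to the model's raw class scores (logits) and reporting the most probable class. The sketch below reproduces that last step in plain Python and adds an optional neutral band for a two-class model. The function name `classify`, the `neutral_margin` parameter, and the logits are assumptions for illustration, not part of the `transformers` API.

```python
import math


def classify(logits, labels=("NEGATIVE", "POSITIVE"), neutral_margin=0.0):
    """Turn raw class logits into a label and a confidence score via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    best = max(range(len(probs)), key=probs.__getitem__)
    # Optional: report NEUTRAL when the top probability is below 0.5 + neutral_margin.
    if probs[best] < 0.5 + neutral_margin:
        return {"label": "NEUTRAL", "score": probs[best]}
    return {"label": labels[best], "score": probs[best]}


print(classify([-3.0, 4.0]))                     # clearly positive logits
print(classify([0.1, 0.2], neutral_margin=0.1))  # nearly tied logits
```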





---

## **👉 1.3 Question Answering (QA)**  
Extractive QA models locate the span of text in a given passage that answers a question.  
Use Cases:
- **Customer Support Chatbots**
- **Document Search**
- **AI Assistants like ChatGPT, Siri, Alexa**

### **📝 Example: Question Answering with a BERT-Family Model**


In [None]:

from transformers import pipeline

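# Load the QA pipeline. With no model specified, the library picks a default
# checkpoint (a DistilBERT model fine-tuned on SQuAD at the time of writing).
qa_pipeline = pipeline("question-answering")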

context = "Elon Musk is the CEO of Tesla and SpaceX."
question = "Who is the CEO of Tesla?"

# Get the answer
answer = qa_pipeline(question=question, context=context)
print(answer["answer"])

# Expected Output: "Elon Musk"
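
Extractive QA models of this kind usually score every token as a possible answer start and as a possible answer end, then return the span with the highest combined score. The sketch below shows that span-selection step in plain Python. The function name `best_span`, the `max_answer_len` limit, and the logits are made up for illustration; a real model produces the logits from the question and context.

```python
def best_span(start_logits, end_logits, max_answer_len=10):
    """Return (start, end) token indices maximizing start_logit + end_logit."""
    best_score, best = float("-inf"), (0, 0)
    for start, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = s_logit + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    return best


tokens = ["Elon", "Musk", "is", "the", "CEO", "of", "Tesla", "and", "SpaceX", "."]
start_logits = [5.0, 1.0, 0.1, 0.0, 0.5, 0.0, 0.3, 0.0, 0.2, 0.0]  # made-up scores
end_logits = [1.0, 6.0, 0.0, 0.0, 0.4, 0.0, 0.8, 0.0, 0.3, 0.0]    # made-up scores

start, end = best_span(start_logits, end_logits)
print(" ".join(tokens[start:end + 1]))
```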




---

## **👉 1.4 Text Summarization**  
Summarization reduces long text into concise summaries.  
Use Cases:
- **News Summarization**
- **Legal & Financial Document Summaries**
- **YouTube Video Transcripts Summarization**

### **📝 Example: Summarization with T5**


In [None]:

from transformers import pipeline

# Load a T5 summarization model explicitly (t5-small is a small public checkpoint);
# without the model argument the pipeline falls back to a non-T5 default.
summarizer = pipeline("summarization", model="t5-small")

text = "Transformers have revolutionized NLP by enabling state-of-the-art performance on various tasks. They use self-attention mechanisms to process text efficiently."

# Summarize the text (deterministic decoding, 10 to 50 tokens)
summary = summarizer(text, max_length=50, min_length=10, do_sample=False)
print(summary[0]["summary_text"])

# Prints a shortened version of the input; the exact wording depends on the model and library version.




---

# **🔹 2. GenAI for Text Generation**  

## **👉 2.1 Chatbots & Conversational AI**  
AI-powered chatbots use **LLMs (Large Language Models)** like **GPT-4** to interact with users.  
Use Cases:
- **Customer Support Bots**
- **Healthcare Assistants**
- **Personal AI Assistants**

### **📝 Example: Chatbot using OpenAI API**


In [None]:

from openai import OpenAI

# Create a client (replace 'your-api-key' with your actual OpenAI API key,
# or omit the argument and set the OPENAI_API_KEY environment variable)
client = OpenAI(api_key="your-api-key")

def chatbot(prompt):
    # openai>=1.0 replaced openai.ChatCompletion.create with the client interface below
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Example Chatbot Conversation
print(chatbot("How does a transformer model work?"))
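
A single-prompt helper has no memory: each call sends only the latest message. Conversational behavior comes from resending the earlier turns with every request. The sketch below shows one way to keep that history in plain Python. The class name `Conversation`, the `max_messages` limit, and the `echo_model` stand-in are assumptions for illustration; in practice `reply_fn` would wrap a real API call.

```python
class Conversation:
    """Keeps the running message list that a chat-style API expects."""

    def __init__(self, system_prompt, max_messages=4):
        self.system = {"role": "system", "content": system_prompt}
        self.turns = []  # user and assistant messages in order
        self.max_messages = max_messages

    def messages(self):
        """Messages to send: the system prompt plus the most recent messages."""
        return [self.system] + self.turns[-self.max_messages:]

    def ask(self, user_text, reply_fn):
        """Record the user message, call the model, then record and return its reply."""
        self.turns.append({"role": "user", "content": user_text})
        reply = reply_fn(self.messages())
        self.turns.append({"role": "assistant", "content": reply})
        return reply


def echo_model(messages):
    """Stand-in for a real model call; replace with an API request."""
    return f"You said: {messages[-1]['content']}"


chat = Conversation("You are a helpful assistant.", max_messages=4)
chat.ask("What is a transformer?", echo_model)
print(chat.ask("And what is self-attention?", echo_model))
```

Trimming to the most recent messages keeps requests within the model's context window, at the cost of forgetting older turns.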




---

## **👉 2.2 AI-Powered Content Generation**  
Generative AI can write **blog posts, marketing content, and social media captions**.  
Use Cases:
- **Automated Content Writing**
- **Email Drafting**
- **Creative Writing (Stories, Poetry)**

### **📝 Example: AI-generated Blog Post Using GPT**


In [None]:

# Reuses the chatbot() helper defined in Section 2.1
prompt = "Write a short blog post about the future of AI in finance."

print(chatbot(prompt))



---

## **👉 2.3 Code Generation (Codex, StarCoder, Code Llama)**  
AI can generate **code snippets, fix bugs, and suggest optimizations**.  
Use Cases:
- **Automated Code Completion** (GitHub Copilot)
- **AI-assisted Debugging**
- **Code Translation (Python → Java, etc.)**

### **📝 Example: Code Generation Using GPT-4**


In [None]:

# Reuses the chatbot() helper from Section 2.1, which calls GPT-4 rather than a dedicated Codex model
prompt = "Write a Python function to check if a number is prime."

print(chatbot(prompt))
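
Generated code should be checked before it is used. The sketch below takes a string that stands in for a model response, confirms that it parses, and runs it against a few known cases. The `generated_code` string, the helper name `load_function`, and the test values are assumptions for illustration; the string is hand-written here, not real model output.

```python
import ast

# Hypothetical model output; in practice this string would come from the model.
generated_code = '''
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True
'''


def load_function(source, name):
    """Check that the source parses, execute it in its own namespace, and return `name`."""
    ast.parse(source)        # raises SyntaxError if the code is malformed
    namespace = {}
    exec(source, namespace)  # not a sandbox: only run code you trust or have isolated
    return namespace[name]


is_prime = load_function(generated_code, "is_prime")
known = {0: False, 1: False, 2: True, 3: True, 4: False, 17: True, 21: False}
failures = [n for n, expected in known.items() if is_prime(n) != expected]
print("all checks passed" if not failures else f"failed for {failures}")
```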




---

# **🔹 3. Fine-Tuning & Deployment**  

## **👉 3.1 Fine-Tuning Transformers with Hugging Face**  
Fine-tuning allows us to train transformer models on **custom datasets** for specific use cases.

### **📝 Fine-Tuning BERT for Custom Text Classification**


In [None]:

from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# Load pre-trained BERT model
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8
)

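# Define trainer
# train_dataset and eval_dataset must be created beforehand: tokenized datasets
# whose items contain input_ids, attention_mask, and labels.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # Custom dataset
    eval_dataset=eval_dataset     # Validation dataset
)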

# Train the model
trainer.train()
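
The training and validation datasets passed to the trainer have to be prepared first. One early step is splitting the labeled examples into a training part and a held-out evaluation part. The sketch below shows only that split in plain Python; tokenizing the texts and wrapping them in dataset objects still has to follow. The function name `train_eval_split`, the 80/20 ratio, and the example sentences are assumptions for illustration.

```python
import random


def train_eval_split(examples, eval_fraction=0.2, seed=42):
    """Shuffle labeled examples and split them into train and evaluation lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical labeled data: (text, label) with 1 = positive, 0 = negative.
examples = [
    ("Great product, works perfectly", 1),
    ("Terrible support, very slow", 0),
    ("Absolutely love it", 1),
    ("Would not recommend", 0),
    ("Exceeded my expectations", 1),
]

train_examples, eval_examples = train_eval_split(examples)
print(len(train_examples), len(eval_examples))
```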




---

## **👉 3.2 Deploying AI Models (Streamlit, FastAPI, LangChain)**  

After fine-tuning a model, we need to deploy it for real-world use.  
**Deployment Options:**  
✅ **Streamlit** – Build an interactive UI for NLP applications.  
✅ **FastAPI** – Deploy models as REST APIs.  
✅ **LangChain** – Compose LLM calls, prompts, and tools into applications (an orchestration framework rather than a hosting option).

### **📝 Example: Deploying a Chatbot Using Streamlit**


In [None]:

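# Save as app.py and start with: streamlit run app.py
import streamlit as st
from openai import OpenAI

# Create the API client (replace 'your-api-key', or set OPENAI_API_KEY instead)
client = OpenAI(api_key="your-api-key")

st.title("AI Chatbot")

user_input = st.text_input("Ask a question:")

# Only call the API when the button is pressed and the input is non-empty
if st.button("Send") and user_input:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )
    st.write(response.choices[0].message.content)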
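
Serving a model as a REST API means exposing an endpoint that accepts JSON, runs the model, and returns JSON. The sketch below illustrates that request and response cycle using Python's standard-library `http.server` instead of FastAPI, so it runs without extra packages; a FastAPI version would follow the same shape with less boilerplate. The `/predict` route, the port, and the keyword-based `predict` placeholder are assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text):
    """Placeholder model: replace with a real pipeline or fine-tuned model call."""
    label = "POSITIVE" if "love" in text.lower() else "NEGATIVE"
    return {"text": text, "label": label}


class PredictHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body such as {"text": "..."}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(predict(payload["text"])).encode("utf-8")
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            self.send_error(400, "Expected a JSON object with a 'text' string")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a client can then send a POST request to `http://127.0.0.1:8000/predict` with a body like `{"text": "I love it"}`.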




---

## **👉 3.3 Optimizing LLMs for Efficiency**  
LLMs can be **memory-intensive** and **expensive to run**.  
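✅ **Quantization** (Stores weights at lower precision, e.g., 8-bit integers, to reduce model size)  
✅ **Distillation** (Trains a smaller student model to mimic a larger one, as in DistilBERT)  
✅ **Model Caching** (Keeps the loaded model and reusable intermediate results in memory to speed up inference)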
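
To make the quantization idea concrete, the sketch below maps a handful of floating-point weights to integers in the int8 range with a single shared scale, then maps them back to see how much precision is lost. This is a minimal symmetric scheme in plain Python, with made-up weights and hypothetical function names; production libraries use more elaborate variants.

```python
def quantize(weights):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric quantization)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in q_weights]


weights = [0.52, -1.27, 0.03, 0.90, -0.44]  # hypothetical layer weights
q_weights, scale = quantize(weights)
restored = dequantize(q_weights, scale)

print(q_weights)
print(max(abs(w - r) for w, r in zip(weights, restored)))  # worst-case rounding error
```

Each weight now needs one byte instead of four (for float32), and the rounding error per weight is at most half the scale.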

---

# **✅ Summary of What We Covered**
✔ **NER, Sentiment Analysis, QA, Summarization**  
✔ **Chatbots, Content Creation, Code Generation**  
✔ **Fine-Tuning Transformers for Custom Tasks**  
✔ **Model Deployment with Streamlit, FastAPI, LangChain**  
✔ **Optimizing Large Language Models (LLMs)**  

---

This section presents a structured **flowchart of NLP & GenAI evolution**, starting from traditional text processing and ending with modern Large Language Models (LLMs).  

---

## **🌍 Evolution of NLP & GenAI: From Basics to Cutting-Edge AI**  

### **1️⃣ Traditional NLP (Rule-Based & Classical ML Approaches)**
📌 **Early NLP focused on rule-based methods and statistical techniques.**  

- **Text Preprocessing** 🛠️  
  - Tokenization, Stopword Removal, Lemmatization  
  - TF-IDF, Bag of Words (BoW), N-grams  

- **Machine Learning for NLP** 🤖  
  - Logistic Regression, Naive Bayes for Text Classification  
  - Hidden Markov Models (HMM) & Conditional Random Fields (CRF) for NER  
  - Latent Dirichlet Allocation (LDA) for Topic Modeling  

📢 **Limitations**  
❌ Feature engineering required  
❌ Poor handling of complex grammar & context  

---



### **2️⃣ Deep Learning for NLP (2010s - Rise of Neural Networks)**
📌 **DL-based NLP improved text understanding using neural networks.**  

- **Word Embeddings 🌍** (Understanding word meanings)  
  - Word2Vec, GloVe, FastText  

- **Recurrent Neural Networks (RNNs) & Variants 🔁**  
  - Vanilla RNNs (Context-dependent text processing)  
  - Long Short-Term Memory (LSTM) & Gated Recurrent Units (GRU) (Long-term memory)  

- **CNNs for Text Processing 📜**  
  - Text classification, Sentence embeddings  

📢 **Limitations**  
❌ RNNs struggle with long-range dependencies  
❌ Sequential processing is slow  

---



### **3️⃣ Attention Mechanism & Transformers (Breakthrough in NLP)**
📌 **Transformers introduced parallel processing and self-attention.**  

- **Self-Attention Mechanism** (Key innovation 🏆)  
  - Helps model **relationships between all words** in a sentence  
  - Improves context understanding  

- **Transformer Model (Vaswani et al., 2017) ⚡**  
  - Fully parallelizable → Faster training  
  - Captures long-range dependencies directly, although attention cost grows quadratically with sequence length  

📢 **Transformers have largely replaced RNNs & LSTMs in NLP** 🎉  
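
The way self-attention relates every word to every other word can be sketched with scaled dot-product attention, softmax(QKᵀ / √d_k)·V, from Vaswani et al. (2017). The plain-Python version below uses three made-up 2-dimensional token vectors and sets Q = K = V for simplicity; a real Transformer first applies learned projections and uses multiple heads. The function names are assumptions for illustration.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors (made up); Q = K = V for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to all three tokens. Every row is computed independently of the others, which is what makes the operation parallelizable.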

---



### **4️⃣ Transformer-Based Models (2018 Onward - Rise of LLMs)**
📌 **State-of-the-art NLP powered by massive LLMs.**  

- **BERT (Bidirectional Encoder Representations from Transformers) 🔄**  
  - Context-aware embeddings (understands both left & right context)  
  - Used for classification, sentiment analysis, question answering  

- **GPT (Generative Pretrained Transformer) 🔥**  
  - **GPT-2, GPT-3, GPT-4** → Autoregressive text generation  
  - Chatbots, Storytelling, Code Generation  

- **T5 (Text-to-Text Transfer Transformer) 📖**  
  - Converts all NLP tasks into text generation problems  

- **DistilBERT, ALBERT (Optimized Transformers) 🚀**  
  - Smaller, faster versions of BERT for deployment  

📢 **Transformers dominate modern NLP applications**  



---

### **5️⃣ Generative AI & LLM Applications (Today & Beyond)**
📌 **LLMs revolutionized AI applications beyond NLP.**  

- **Conversational AI & Chatbots 🤖**  
  - OpenAI's ChatGPT, Google's Gemini, Meta’s LLaMA  
  - AI-powered personal assistants, enterprise chatbots  

- **AI-Powered Content Creation 📝**  
  - Blog writing, Social media content, AI-assisted coding (Codex, StarCoder)  

- **Multimodal Models (Text, Image, Video, Audio) 🎥**  
  - GPT-4o, Gemini 1.5, Llama 3.2 Vision → Models that accept images (and, in some cases, audio or video) alongside text  

📢 **Next frontier**: **AGI (Artificial General Intelligence)**? 🚀  

---

## **🗺️ Complete Flowchart: Evolution of NLP & GenAI**


```plaintext
1️⃣ Traditional NLP (Rule-Based & ML)  
   ├── Text Preprocessing (Tokenization, Stopwords, TF-IDF)  
   ├── Machine Learning for NLP (Naive Bayes, SVM, CRF)  
   └── Topic Modeling (LDA)  

2️⃣ Deep Learning for NLP  
   ├── Word Embeddings (Word2Vec, GloVe, FastText)  
   ├── RNNs & LSTMs (Context Learning)  
   ├── CNNs for NLP (Text Classification)  
   └── Seq2Seq Models (Early Translation & Chatbots)  

3️⃣ Attention & Transformers  
   ├── Self-Attention Mechanism  
   ├── Transformer Architecture (Vaswani et al.)  
   └── Faster & More Scalable NLP  

4️⃣ Transformer-Based Models  
   ├── BERT (Contextual Representation)  
   ├── GPT (Autoregressive Text Generation)  
   ├── T5 (Text-to-Text Learning)  
   ├── DistilBERT & ALBERT (Efficient Transformers)  
   └── Multilingual NLP Models  

5️⃣ Generative AI (LLMs)  
   ├── Conversational AI (ChatGPT, Gemini, LLaMA)  
   ├── AI Content Generation (Blogs, Code, Marketing)  
   ├── Multimodal AI (Text + Image + Video)  
   └── Future Trends (AGI & Beyond)  
```



---

## **🌟 Summary**
✅ **Traditional NLP** → Rule-based & Statistical ML  
✅ **Deep Learning for NLP** → RNNs, LSTMs, CNNs  
✅ **Transformers** → Self-Attention, BERT, GPT  
✅ **Generative AI** → Chatbots, LLMs, Multimodal AI  

🚀 **Future Focus**: AGI, Multimodal Models, Optimized LLMs for Edge Devices  

---