# Transformers and Large Language Models (LLMs)

Transformers are deep learning models designed for sequence processing tasks like:

Machine Translation (e.g., Google Translate)

Text Generation (e.g., ChatGPT)

Question Answering (e.g., BERT)

Summarization and Sentiment Analysis

Transformers have largely replaced RNNs and LSTMs for these tasks because self-attention lets them process an entire sequence in parallel instead of one token at a time.

## Key Concepts in Transformers
1. Self-Attention Mechanism

Lets each token weigh every other token in the sequence by relevance, regardless of how far apart they are.
Example: "The cat sat on the mat." → The model can attend strongly to "cat" when predicting "sat."
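
The weighting described above can be sketched in plain Python as scaled dot-product attention. This is a minimal illustration, not a production implementation: the 2-dimensional embeddings are made-up numbers, and the queries, keys, and values are assumed to be the raw embeddings, whereas a real transformer first applies learned linear projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a list of token vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this token's query with every token's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy embeddings (hypothetical values, not taken from a trained model)
tokens = ["the", "cat", "sat"]
embeddings = [[0.1, 0.0], [1.0, 0.5], [0.9, 0.6]]

# Assumption: queries = keys = values = raw embeddings (no learned projections)
outputs, weights = self_attention(embeddings, embeddings, embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. With these toy numbers, "sat" puts more weight on "cat" than on "the" because their vectors point in similar directions.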

2. Positional Encoding

Adds position information to each token embedding, since transformers process all tokens in parallel and have no built-in notion of word order.
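
One common way to build these position vectors is the fixed sinusoidal scheme from the original Transformer paper; other models, BERT among them, learn their position embeddings instead. The sketch below assumes the sinusoidal variant, and the function name `positional_encoding` and the placeholder embeddings are illustrative only.

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one wavelength: 10000^(2k / d_model)
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(seq_len=4, d_model=6)

# Position vectors are added element-wise to the token embeddings
embeddings = [[0.5] * 6 for _ in range(4)]  # placeholder embeddings (hypothetical)
encoded = [[e + p for e, p in zip(emb, pos_vec)]
           for emb, pos_vec in zip(embeddings, pe)]
```

Because every position gets a distinct vector, two identical tokens at different positions end up with different inputs to the attention layers.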

3. Multi-Head Attention

Runs several attention operations (heads) in parallel, each of which can capture a different kind of relationship between words, and then combines their outputs.
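
The idea of several heads working side by side can be sketched as follows. This is a simplified illustration under a stated assumption: each head just attends over its own slice of the token vectors, while a real transformer also applies learned projection matrices before splitting and after concatenating. The input values are hypothetical.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention; returns one output vector per query."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

def multi_head_attention(x, num_heads):
    """Split each vector into num_heads slices, attend per slice, concatenate."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        # Each head sees only its own slice of every token vector
        part = [vec[h * d_head:(h + 1) * d_head] for vec in x]
        head_outputs.append(attention(part, part, part))
    # Concatenate the heads back into d_model-sized vectors, token by token
    return [sum((head[t] for head in head_outputs), []) for t in range(len(x))]

# Three tokens with 4-dimensional vectors (made-up numbers)
x = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(x, num_heads=2)
```

Since each head computes its own attention weights, the two halves of every output vector can mix information from different tokens.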

4. Encoder-Decoder Architecture

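Encoder: Processes the input sequence (e.g., a sentence in English) into contextual representations.
Decoder: Generates the output sequence (e.g., a sentence in French) one token at a time while attending to the encoder's output.
Not every model uses both halves: BERT is encoder-only, while GPT is decoder-only.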


In [None]:
%pip install transformers datasets torch

## Popular LLMs (Large Language Models)

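GPT (Generative Pre-trained Transformer): Decoder-only models from OpenAI, used for text generation and chatbots.

BERT (Bidirectional Encoder Representations from Transformers): Encoder-only model from Google, used for question answering and classification.

T5 (Text-to-Text Transfer Transformer): Encoder-decoder model from Google that frames translation, summarization, and other tasks as text-to-text problems.

Llama (Meta): Family of open-weight models for large-scale text generation.

Claude (Anthropic): Conversational models developed with a focus on AI safety.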

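The following cell fine-tunes `bert-base-uncased` on the IMDB movie review dataset for binary sentiment classification.

In [None]: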
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, TensorDataset
from datasets import load_dataset

# Pretrained tokenizer and BERT encoder with a fresh 2-class classification head
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

dataset = load_dataset("imdb")  # IMDB movie reviews (0 = negative, 1 = positive)
# list(...) guarantees plain Python lists whatever the installed datasets version returns
train_texts = list(dataset['train']['text'])
train_labels = list(dataset['train']['label'])

# Tokenize input; truncation=True cuts each review to the model's 512-token maximum
inputs = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor(train_labels)

train_dataset = TensorDataset(inputs['input_ids'], inputs['attention_mask'], labels)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# No separate loss function is needed: the model returns cross-entropy loss when labels are passed
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for input_ids, attention_mask, batch_labels in train_loader:
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=batch_labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
    # Loss of the final batch only, not an epoch average
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")



## Evaluate the Model

In [None]:
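# Evaluate on the held-out test split; accuracy measured on the training data would be over-optimistic
test_split = load_dataset("imdb")['test']
test_inputs = tokenizer(list(test_split['text']), padding=True, truncation=True, return_tensors="pt")
test_labels = torch.tensor(list(test_split['label']))
test_loader = DataLoader(
    torch.utils.data.TensorDataset(test_inputs['input_ids'], test_inputs['attention_mask'], test_labels),
    batch_size=16,
)

model.eval()  # disable dropout for deterministic predictions
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for batch in test_loader:
        input_ids, attention_mask, labels = batch
        outputs = model(input_ids, attention_mask=attention_mask)
        predictions = torch.argmax(outputs.logits, dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

print(f"Test accuracy: {100 * correct / total:.2f}%")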
