# Introduction to Transformers

The Transformer is a deep learning architecture introduced in the paper *"Attention Is All You Need"* (Vaswani et al., 2017). It is the foundation of modern NLP models such as **BERT, GPT, and T5**.

Key concepts:
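- No recurrence (unlike RNNs/LSTMs).
- Relies on **self-attention** to relate tokens to each other, instead of recurrence or convolution.
- Highly parallelizable → all positions in a sequence are processed at once, so training is faster than with sequential RNNs.
- Handles long-range dependencies better than RNNs, because any two tokens can interact directly in a single attention step.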

In this notebook, we’ll cover:
1. What is attention?
2. Transformer architecture.
3. Using the Hugging Face `transformers` library for text classification.

## 1. What is Attention?

Attention allows a model to focus on **relevant parts of the input sequence** when making predictions.

Formula:
\[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \]

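- **Q (Query)** → what we are looking for.
- **K (Key)** → identifiers for values.
- **V (Value)** → actual information.
- **d_k** → dimension of the key vectors; dividing by \(\sqrt{d_k}\) keeps the dot products from growing too large before the softmax.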

Self-attention applies this to a single sequence: Q, K, and V are all derived from the same tokens, so each word can attend directly to every other word, regardless of their distance.
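To make the formula concrete, here is a minimal NumPy sketch of scaled dot-product attention. The function names, the toy input `X`, and its sizes are illustrative assumptions, and the learned projection matrices that a real model applies to produce Q, K, and V are left out for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ V, weights


# Toy self-attention: 3 tokens with 4-dimensional embeddings (made-up values).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))

# Q, K, and V all come from the same sequence here; a real model would first
# multiply X by three learned weight matrices.
output, weights = scaled_dot_product_attention(X, X, X)

print(weights.shape)         # (3, 3): one weight per (query token, key token) pair
print(weights.sum(axis=-1))  # every row sums to 1
print(output.shape)          # (3, 4): one output vector per token
```

Each row of `weights` says how much one token draws from every other token, which is why distance in the sequence plays no role in the computation itself.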

## 2. Transformer Architecture

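The original Transformer consists of two stacks of blocks:
- **Encoder**: Processes the input sequence into contextual representations.
- **Decoder**: Generates the output sequence one token at a time, attending to the encoder's output.

Each encoder block contains:
- Multi-Head Self-Attention
- Position-wise Feedforward Neural Network
- Residual Connections + Layer Normalization around each of the two sublayers

Decoder blocks have the same sublayers, with a causal mask on the self-attention and an extra attention sublayer over the encoder's output.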

![Transformer Architecture](https://jalammar.github.io/images/t/transformer_architecture.jpg)
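The components of an encoder block can be sketched in NumPy as follows. This is a simplified illustration, not a reference implementation: the sizes, the random weights, and all function and parameter names are assumptions, layer normalization is shown without its learnable scale and shift, and dropout is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and (near) unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    heads = softmax(scores) @ V                          # (heads, seq, d_head)
    # Concatenate the heads back into (seq_len, d_model) and mix them.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o


def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feedforward network with a ReLU in between."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2


def encoder_block(x, p, num_heads):
    # Sublayer 1: self-attention, then residual connection + layer norm.
    attn = multi_head_self_attention(
        x, p["W_q"], p["W_k"], p["W_v"], p["W_o"], num_heads
    )
    x = layer_norm(x + attn)
    # Sublayer 2: feedforward, then residual connection + layer norm.
    ff = feed_forward(x, p["W1"], p["b1"], p["W2"], p["b2"])
    return layer_norm(x + ff)


# Illustrative sizes and random weights (a trained model learns these).
seq_len, d_model, d_ff, num_heads = 5, 8, 32, 2
rng = np.random.default_rng(0)
params = {
    "W_q": rng.normal(scale=0.1, size=(d_model, d_model)),
    "W_k": rng.normal(scale=0.1, size=(d_model, d_model)),
    "W_v": rng.normal(scale=0.1, size=(d_model, d_model)),
    "W_o": rng.normal(scale=0.1, size=(d_model, d_model)),
    "W1": rng.normal(scale=0.1, size=(d_model, d_ff)),
    "b1": np.zeros(d_ff),
    "W2": rng.normal(scale=0.1, size=(d_ff, d_model)),
    "b2": np.zeros(d_model),
}

x = rng.normal(size=(seq_len, d_model))
out = encoder_block(x, params, num_heads)
print(out.shape)  # (5, 8): same shape as the input, so blocks can be stacked
```

Because the output has the same shape as the input, several such blocks can be applied one after another, which is how the encoder stack is built.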

## 3. Using Hugging Face Transformers

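We’ll use `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT checkpoint fine-tuned on the SST-2 dataset for **sentiment analysis**. At the time of writing, this is also the checkpoint that `pipeline("sentiment-analysis")` loads when no model is specified. The base `distilbert-base-uncased` model on its own has no trained classification head, so it cannot predict sentiment without fine-tuning.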

In [1]:
!pip install transformers torch --quiet

In [2]:
from transformers import pipeline

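# Load the sentiment analysis pipeline, naming the checkpoint explicitly
# so the result does not depend on the library's default model
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Classify one sentence; the pipeline returns a list with one dict per input
result = classifier("Transformers are revolutionizing NLP!")[0]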
print(result)

### Example Output
```python
{'label': 'POSITIVE', 'score': 0.9998}
```
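The model classifies the text as **positive**, and the `score` of about 0.9998 is the probability it assigns to that label. The exact value may differ slightly between library and model versions.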
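To see roughly where `label` and `score` come from, the sketch below turns a pair of raw model outputs (logits) into the same kind of dictionary by applying a softmax and picking the most probable class. The logit values and the label order are made up for illustration and are not taken from the real model.

```python
import numpy as np

# Hypothetical logits for one input sentence (not real model output).
logits = np.array([-4.0, 4.5])
labels = ["NEGATIVE", "POSITIVE"]  # assumed label order

# Softmax turns the logits into probabilities that sum to 1.
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

# Report the most probable label together with its probability.
best = int(np.argmax(probs))
result = {"label": labels[best], "score": float(probs[best])}
print(result)
```

A large gap between the two logits gives a score close to 1, while two similar logits would give a score near 0.5, which signals an uncertain prediction.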

## Summary
- Transformers use **self-attention** instead of recurrence.
- They power state-of-the-art NLP models (BERT, GPT, T5).
- Hugging Face `transformers` makes it easy to use pretrained models.

Next steps:
- Explore **fine-tuning Transformers** for your own dataset.
- Learn about **BERT embeddings** and **GPT text generation**.