# 🔄 Encoders vs Decoders in the Transformer Architecture

In the original Transformer model (from *“Attention Is All You Need”*), the architecture is split into two major parts:

- **Encoder**: Processes and understands the input sequence.
- **Decoder**: Generates the output sequence, one token at a time.

Understanding their roles is crucial before diving into models like BERT, GPT, and T5.
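
To make "one token at a time" concrete, here is a minimal sketch of a greedy generation loop. The `toy_next_token` function and its lookup table are hypothetical stand-ins for a trained decoder, used only to show the control flow: each new token is appended to the sequence and fed back in as input for the next step.

```python
from typing import Callable, List


def greedy_decode(next_token: Callable[[List[str]], str],
                  start: str = "<s>",
                  stop: str = "</s>",
                  max_len: int = 10) -> List[str]:
    """Generate tokens one at a time, feeding each output back in as input."""
    tokens = [start]
    while len(tokens) < max_len:
        tok = next_token(tokens)  # the model sees only the tokens produced so far
        tokens.append(tok)
        if tok == stop:
            break
    return tokens


# Hypothetical stand-in for a trained decoder: a fixed lookup on the last token.
TOY_TABLE = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}


def toy_next_token(prefix: List[str]) -> str:
    return TOY_TABLE.get(prefix[-1], "</s>")


print(greedy_decode(toy_next_token))
# ['<s>', 'the', 'cat', 'sat', '</s>']
```

A real decoder would replace the lookup with a forward pass that scores every vocabulary entry and picks (or samples) the next token from those scores.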

---

## 🧠 What is an Encoder?

The **Encoder** takes the input tokens and converts them into contextualized embeddings. These embeddings carry rich information about each token and its relationship with others.

Each encoder layer consists of:
- Multi-head self-attention
- Feed-forward neural network
- Add & LayerNorm (residual connection + normalization), applied after each of the two sublayers above
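
The components listed above can be sketched as a single encoder layer in NumPy. This is an illustrative sketch, not a reference implementation: the weights are random, the dimensions (`d_model=8`, `d_ff=32`, `n_heads=2`) are arbitrary choices, and the layer norm omits the learnable scale and shift that real implementations usually include. Normalization is applied after each residual connection, as in the original paper.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Simplified: no learnable gain/bias parameters.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores)             # every token attends to every token
    heads = weights @ v                   # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo


def encoder_layer(x, p, n_heads):
    # Sublayer 1: self-attention, then Add & LayerNorm.
    attn = multi_head_self_attention(x, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads)
    x = layer_norm(x + attn)
    # Sublayer 2: position-wise feed-forward network (ReLU), then Add & LayerNorm.
    ffn = np.maximum(0, x @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]
    return layer_norm(x + ffn)


rng = np.random.default_rng(0)
seq_len, d_model, d_ff, n_heads = 5, 8, 32, 2
params = {
    "Wq": rng.normal(size=(d_model, d_model)) * 0.1,
    "Wk": rng.normal(size=(d_model, d_model)) * 0.1,
    "Wv": rng.normal(size=(d_model, d_model)) * 0.1,
    "Wo": rng.normal(size=(d_model, d_model)) * 0.1,
    "W1": rng.normal(size=(d_model, d_ff)) * 0.1,
    "b1": np.zeros(d_ff),
    "W2": rng.normal(size=(d_ff, d_model)) * 0.1,
    "b2": np.zeros(d_model),
}

x = rng.normal(size=(seq_len, d_model))  # stand-in for 5 embedded input tokens
out = encoder_layer(x, params, n_heads)
print(out.shape)  # (5, 8)
```

The output has the same shape as the input, which is what lets encoder layers be stacked: each layer refines the contextualized embeddings produced by the one before it.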

✅ **Used in**: BERT, RoBERTa, DistilBERT (Encoder-only models)

---

## ✍️ What is a Decoder?

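The **Decoder** generates the output tokens one at a time (like translated text or the next word in a sentence). In an encoder-decoder model it conditions on the encoder’s output; in a decoder-only model such as GPT there is no encoder, so it conditions only on the tokens that came before.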

Each decoder layer includes:
- **Masked** multi-head self-attention (only sees past tokens)
- Cross-attention (attends to encoder output; present only in encoder-decoder models, omitted in decoder-only models like GPT)
- Feed-forward network
- Add & LayerNorm
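
The two attention steps in the list above can be sketched with a single attention head in NumPy. This is a simplified illustration under stated assumptions: the inputs are random, the learned query/key/value projections, residual connections, and layer norm are left out, and the sizes (6 source tokens, 3 generated tokens) are arbitrary. The causal mask is a lower-triangular matrix, so position `i` can only attend to positions `0..i`.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    """Scaled dot-product attention; mask is True where attending is allowed."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # blocked positions get zero weight
    weights = softmax(scores)
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 4
enc_out = rng.normal(size=(6, d))  # stand-in for encoder output (6 source tokens)
dec_in = rng.normal(size=(3, d))   # stand-in for 3 tokens generated so far

# Masked self-attention: each decoder position sees itself and earlier positions.
causal_mask = np.tril(np.ones((3, 3), dtype=bool))
self_out, self_w = attention(dec_in, dec_in, dec_in, mask=causal_mask)

# Cross-attention: queries come from the decoder, keys/values from the encoder.
cross_out, cross_w = attention(self_out, enc_out, enc_out)

print(np.round(self_w, 2))  # upper triangle is all zeros
print(cross_w.shape)        # (3, 6): every decoder position sees all source tokens
```

A decoder-only model would stop after the masked self-attention step, since there is no encoder output to attend to.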

✅ **Used in**: GPT, GPT-2, GPT-3 (Decoder-only models), T5 (Encoder-Decoder model)

---

## 🔍 Key Differences

| Feature | Encoder | Decoder |
|--------|---------|---------|
| Input | Full input sequence | Previously generated tokens |
| Attention | Self-attention | Masked self-attention (+ cross-attention in encoder-decoder models) |
| Use Case | Understanding | Generation |
| Sees Full Input? | Yes | Only past tokens of its own sequence; with cross-attention, also the full encoder output |
| Common Models | BERT, RoBERTa | GPT, LLaMA |


---

## 🔧 Real-World Examples

- **BERT** (Encoder-only): Great for tasks like classification, sentence similarity, Q&A.
- **GPT** (Decoder-only): Great for tasks like text generation, story writing, coding.
- **T5 / BART** (Encoder-Decoder): Ideal for translation, summarization, question generation.

---

## 🧭 Summary

- Encoders = Understand input
- Decoders = Generate output
- Encoder-decoder combo = Flexible and powerful for seq-to-seq tasks

---


