# 📖 Breakthrough Papers in Machine Translation

---

## 🟤 Statistical Era (Before Neural MT)

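1. **Brown et al. (1993)** – *The Mathematics of Statistical Machine Translation: Parameter Estimation*  
   - IBM Models 1–5 → introduced **probabilistic word alignments** learned from parallel text.  
   - These alignments later became the basis for phrase extraction in **phrase-based SMT**.  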

2. **Koehn, Och & Marcu (2003)** – *Statistical Phrase-Based Translation*  
   - Established **phrase-based SMT** as the standard approach (later implemented in the open-source Moses toolkit, released in 2007).  
   - Major improvement over word-based IBM models.  
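
The word-alignment idea behind the IBM models can be illustrated with a minimal sketch of IBM Model 1 trained by expectation-maximization. This is a simplified illustration, not the formulation from the paper: it omits the NULL source word and the later models' distortion and fertility terms, and the function name `train_ibm_model1` is a hypothetical helper.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(target_word | source_word).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict keyed by (target_word, source_word).
    Assumption: no NULL source token, to keep the sketch short.
    """
    tgt_vocab = {tw for _, tgt in corpus for tw in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) link counts
        total = defaultdict(float)  # expected link counts per source word

        # E-step: distribute each target word over the source words
        # in the same sentence, in proportion to the current t values.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: renormalize expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]

    return t


toy_corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm_model1(toy_corpus)
print(max(["the", "house", "book"], key=lambda w: t[(w, "das")]))
```

On this toy corpus, co-occurrence alone is ambiguous, but EM gradually concentrates probability on consistent pairs such as "das" and "the".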

---

## 🔵 Early Neural MT

3. **Kalchbrenner & Blunsom (2013)** – *Recurrent Continuous Translation Models*  
   - One of the first **fully neural translation models**.  
   - CNN encoder + RNN decoder, **continuous sentence representations**.  

4. **Sutskever, Vinyals & Le (2014)** – *Sequence to Sequence Learning with Neural Networks*  
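   - Popularized **encoder–decoder LSTMs** for translation (alongside the RNN encoder–decoder of Cho et al., 2014).  
   - Showed that an **end-to-end NMT** model could beat a phrase-based SMT baseline on WMT'14 English–French.  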

5. **Bahdanau, Cho & Bengio (2015)** – *Neural Machine Translation by Jointly Learning to Align and Translate*  
   - Introduced the **attention mechanism**.  
   - Removed the **fixed-length vector bottleneck** of earlier encoder–decoders, making translation far more robust on long sentences.  
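
A minimal sketch of additive attention, in the spirit of Bahdanau et al., may help make the mechanism concrete. It uses plain Python lists instead of a tensor library, and the parameter names (`W_s`, `W_h`, `v`) and helper functions are assumptions chosen for illustration rather than the paper's exact notation.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def additive_attention(dec_state, enc_states, W_s, W_h, v):
    """Score each encoder state against the decoder state with a small MLP.

    score_j = v . tanh(W_s @ dec_state + W_h @ enc_states[j])
    Returns (attention weights, context vector).
    """
    proj_s = matvec(W_s, dec_state)
    scores = []
    for h in enc_states:
        proj_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(proj_s, proj_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    weights = softmax(scores)
    dim = len(enc_states[0])
    # The context is a weighted average of encoder states, recomputed
    # at every decoding step instead of one fixed sentence vector.
    context = [sum(w * h[d] for w, h in zip(weights, enc_states))
               for d in range(dim)]
    return weights, context


identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(
    dec_state=[1.0, 0.0],
    enc_states=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    W_s=identity, W_h=identity, v=[1.0, 1.0],
)
print(weights, context)
```

Because the context vector is rebuilt for each output word, the decoder no longer depends on a single fixed-size summary of the source sentence.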

---

## 🟢 Scaling NMT

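6. **Luong, Pham & Manning (2015)** – *Effective Approaches to Attention-based Neural Machine Translation*  
   - Refined attention (**global vs local**) and compared simpler scoring functions (dot, general, concat).  
   - Its simpler, cheaper attention variants were **widely adopted in practice**.  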

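7. **Jean et al. (2015)** – *On Using Very Large Target Vocabulary for Neural Machine Translation*  
   - Used **importance sampling** to approximate the softmax during training → very large target vocabularies without a proportional increase in training cost.  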

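8. **Wu et al. (2016)** – *Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (GNMT)*  
   - Large-scale NMT deployment at **Google Translate**.  
   - 8-layer LSTM encoder and decoder with **residual connections**, attention, and wordpiece vocabularies; beam search uses length normalization and a **coverage penalty**.  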
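
GNMT's beam search rescoring can be sketched as follows: the log-probability of a hypothesis is divided by a length penalty, and a coverage penalty is added that punishes source words receiving little total attention. The function names and the default values of `alpha` and `beta` are illustrative assumptions (both are tunable hyperparameters), and the small epsilon guard is an addition to avoid `log(0)`.

```python
import math


def length_penalty(length, alpha=0.2):
    """lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha; equals 1 for length 1."""
    return ((5 + length) ** alpha) / ((5 + 1) ** alpha)


def coverage_penalty(attention, beta=0.2, eps=1e-12):
    """cp = beta * sum_i log(min(sum_j attention[j][i], 1.0)).

    attention[j][i] is the weight target position j puts on source word i.
    eps is an added guard (not part of the formula) against log(0).
    """
    n_src = len(attention[0])
    total = 0.0
    for i in range(n_src):
        covered = sum(row[i] for row in attention)
        total += math.log(min(max(covered, eps), 1.0))
    return beta * total


def gnmt_score(log_prob, attention, alpha=0.2, beta=0.2):
    """Rescore a finished hypothesis with one attention row per target token."""
    return (log_prob / length_penalty(len(attention), alpha)
            + coverage_penalty(attention, beta))


full = [[1.0, 0.0], [0.0, 1.0]]     # both source words attended
skewed = [[0.9, 0.1], [0.9, 0.1]]   # second source word mostly ignored
print(gnmt_score(-2.0, full), gnmt_score(-2.0, skewed))
```

With equal model log-probabilities, the hypothesis that leaves a source word largely unattended receives the lower score.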

---

## 🟣 Transformer Revolution

9. **Vaswani et al. (2017)** – *Attention Is All You Need*  
   - Introduced the **Transformer**.  
   - Fully attention-based, **no recurrence**.  
   - Became the **new backbone** for MT and beyond.  

10. **Ott et al. (2018)** – *Scaling Neural Machine Translation*  
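    - Showed that **large-batch, reduced-precision training** makes Transformers much faster to train and reached state-of-the-art BLEU on WMT'14 English–German and English–French at the time.  
    - Code released as part of the **fairseq** toolkit (the toolkit itself is described in a separate 2019 paper).  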
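
The core operation of the Transformer, scaled dot-product attention, can be sketched in a few lines. This is a single-head version without masking, multi-head projections, or batching, written with plain Python lists; the function name is a hypothetical helper.

```python
import math


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Q: queries, K: keys, V: values (K and V have the same number of rows).
    Returns one output vector per query.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, V))
                        for d in range(d_v)])
    return outputs


out = scaled_dot_product_attention(
    Q=[[10.0, 0.0]],
    K=[[1.0, 0.0], [0.0, 1.0]],
    V=[[1.0, 2.0], [3.0, 4.0]],
)
print(out)
```

Every query attends to every key in one step, which is what lets the model drop recurrence and process all positions in parallel.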

---

## 🟡 Multilingual & Pretrained MT

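11. **Johnson et al. (2017, Google)** – *Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation*  
    - Single model for **many languages**, steered by an artificial token that names the target language.  
    - Enabled **zero-shot translation** between language pairs never seen together in training.  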

12. **Edunov et al. (2018)** – *Understanding Back-Translation at Scale*  
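    - Showed that **back-translation** of monolingual target-side data scales to hundreds of millions of sentences, and that sampled or noised synthetic sources work better than pure beam search output.  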

13. **Lample & Conneau (2019)** – *Cross-lingual Language Model Pretraining (XLM)*  
    - Used cross-lingual **pretraining** (masked and translation language modeling) to initialize **NMT** → large BLEU gains, especially for unsupervised MT.  

14. **Liu et al. (2020)** – *Multilingual Denoising Pre-training for Neural Machine Translation (mBART)*  
    - **Sequence-to-sequence pretraining** with denoising.  
    - Foundation for many **multilingual MT systems**.  
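
The multilingual system of Johnson et al. (entry 11) needs no architectural change: the target language is signalled by an artificial token prepended to the source sentence. A minimal sketch of that preprocessing step is shown here; the helper name `add_target_token` is hypothetical, and the `<2xx>` token format follows the style of the example in the paper.

```python
def add_target_token(source_tokens, target_lang):
    """Prepend an artificial token telling the model which language to produce."""
    return [f"<2{target_lang}>"] + list(source_tokens)


# One shared model sees training pairs from several directions.
training_inputs = [
    add_target_token(["Hello", "world"], "es"),  # English -> Spanish
    add_target_token(["Hola", "mundo"], "en"),   # Spanish -> English
    add_target_token(["Hello", "world"], "ja"),  # English -> Japanese
]

# At inference time, a direction never seen in training (for example
# Spanish -> Japanese) is requested the same way: this is the zero-shot case.
zero_shot_input = add_target_token(["Hola", "mundo"], "ja")
print(zero_shot_input)
```

Because the source language is not marked, the same token scheme works for any input language the shared vocabulary covers.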

---

## 🔴 Latest Breakthroughs (Large-Scale & Generative)

15. **NLLB Team, Meta AI (2022)** – *No Language Left Behind: Scaling Human-Centered Machine Translation*  
    - Trained a single multilingual MT system (NLLB-200) covering **200 languages**, many of them low-resource.  

16. **OpenAI GPT & ChatGPT (2022–2023)**  
    - Not MT-specific, but showed that **LLMs can be competitive with supervised MT** on high-resource language pairs via **emergent translation ability**, while generally lagging on low-resource languages.  

17. **SeamlessM4T (Meta, 2023)** – *SeamlessM4T: Massively Multilingual & Multimodal Machine Translation*  
    - Unified model for **speech + text translation** across up to roughly **100 languages** (coverage varies by task and modality).  

---

## ✅ In Short
- **1990s–2000s:** Statistical MT, from word-based to phrase-based models.  
- **2013–2015:** Neural encoder–decoder + attention.  
- **2017–2020:** Transformers + scaling + multilingual pretraining.  
- **2022–2023:** LLMs + massively multilingual, multimodal models.  