# 📜 Self-Supervised Learning (SSL) Breakthroughs

---

## 🔹 NLP Embeddings (Prediction from Context)

**Word2Vec – Mikolov et al. (2013, Google)**  
*"Efficient Estimation of Word Representations in Vector Space."*  
- Learned **dense word embeddings** from raw text by predicting words from their surrounding context window.  
- Models: **Skip-gram** (predict context from target) and **CBOW** (predict target from context).  
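
The two objectives differ mainly in how training examples are cut from a sentence. Here is a minimal sketch, assuming whitespace tokenization and a fixed symmetric window. The neural model and training tricks such as hierarchical softmax or negative sampling are not shown, and the function names are hypothetical:

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: each (target, context) pair asks the model to predict a context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding words (combined inside the model) are used to predict the target word."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


print(skipgram_pairs(["the", "quick", "brown"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```

Skip-gram produces one example per (target, context) pair. CBOW produces one example per position, with all context words bundled together.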

**FastText – Bojanowski et al. (2016, Facebook AI)**  
*"Enriching Word Vectors with Subword Information."*  
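- Represents each word as the sum of vectors for its **subword character n-grams** (plus the word itself).  
- Handles **rare, morphologically rich, and compound words** better than Word2Vec, and can build vectors for out-of-vocabulary words from their n-grams.  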
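
The subword features can be sketched as follows. This version assumes the `<`/`>` boundary markers and the 3-to-6 n-gram range described in the FastText paper. Hashing every feature into one fixed table with `zlib.crc32` is an assumption made for determinism: the reference implementation uses an FNV-1a hash and keeps whole words in a separate vocabulary. `char_ngrams` and `word_vector` are hypothetical names:

```python
import zlib
from typing import List

import numpy as np


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """FastText-style character n-grams; the bracketed whole word is kept as an extra feature."""
    padded = f"<{word}>"  # boundary markers distinguish prefixes/suffixes from inner n-grams
    grams = [padded[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(padded) - n + 1)]
    if padded not in grams:
        grams.append(padded)
    return grams


def word_vector(word: str, table: np.ndarray) -> np.ndarray:
    """Sum the (hashed) n-gram embeddings; works even for words never seen in training."""
    buckets = table.shape[0]
    rows = [zlib.crc32(g.encode("utf-8")) % buckets for g in char_ngrams(word)]
    return table[rows].sum(axis=0)


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

Because an unseen word such as "wherever" shares n-grams with "where", `word_vector` can still return a meaningful vector for it.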

---

## 🔹 Masked Modeling (Language & Vision)

**BERT – Devlin et al. (2018, Google AI)**  
*"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding."*  
- **Masked Language Modeling (MLM):** randomly masks ~15% of input tokens and predicts them from both left and right context.  
- Made **pretraining + fine-tuning** the standard paradigm in NLP.  
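
The MLM corruption step can be sketched as shown below. It follows the BERT paper's replacement rule: of the selected positions, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. Several details are assumptions of this sketch: per-token Bernoulli selection, the `[MASK]` id, the `-100` "ignore" label (a PyTorch cross-entropy convention), and the omission of special-token handling (`[CLS]`, `[SEP]`):

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 103          # assumed [MASK] id (bert-base-uncased vocabulary); use your tokenizer's value
IGNORE_INDEX = -100    # label for positions that contribute no loss


def mask_tokens(
    token_ids: List[int],
    vocab_size: int,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels): labels hold the original id at selected positions, IGNORE_INDEX elsewhere."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected: input unchanged, no prediction here
        labels[i] = tok  # the model must recover the original token at this position
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                     # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


inputs, labels = mask_tokens(list(range(1000, 1010)), vocab_size=30522, rng=random.Random(0))
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only `[MASK]` positions need attention. This matters because `[MASK]` never appears during fine-tuning.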

**MAE – He et al. (2021, Meta AI)**  
*"Masked Autoencoders Are Scalable Vision Learners."*  
- Extends **masked modeling** to vision.  
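- Randomly masks a large fraction of image patches (75% by default) and trains a lightweight decoder to reconstruct the missing pixels.  
- The encoder processes only the visible patches, which makes large-scale pretraining of Vision Transformers efficient.  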
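
The patch masking can be sketched in NumPy using the argsort-of-random-noise trick, which gives each image its own random subset of visible patches. Several parts are assumptions here: images stored as `(B, H, W, C)` arrays, the patch size, and the function names. The paper's optional per-patch target normalization is not shown. As in MAE, the reconstruction loss is averaged over masked patches only:

```python
from typing import Optional, Tuple

import numpy as np


def patchify(images: np.ndarray, patch: int) -> np.ndarray:
    """(B, H, W, C) -> (B, N, patch*patch*C), with patches in row-major order."""
    b, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0
    x = images.reshape(b, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, (h // patch) * (w // patch), patch * patch * c)


def random_masking(
    patches: np.ndarray, mask_ratio: float = 0.75, rng: Optional[np.random.Generator] = None
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Keep a random (1 - mask_ratio) subset per image; mask is True where a patch is hidden."""
    rng = rng or np.random.default_rng()
    b, n, _ = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    ids_keep = np.argsort(rng.random((b, n)), axis=1)[:, :n_keep]  # random subset per sample
    rows = np.arange(b)[:, None]
    visible = patches[rows, ids_keep]          # (B, n_keep, D): the only input to the encoder
    mask = np.ones((b, n), dtype=bool)
    mask[rows, ids_keep] = False
    return visible, mask, ids_keep


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())
```

With a 224×224 image and 16×16 patches, the encoder sees 49 of 196 patches. That is where most of the compute savings come from.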

---

## 🔹 Contrastive Learning

**SimCLR – Chen et al. (2020, Google Brain)**  
*"A Simple Framework for Contrastive Learning of Visual Representations."* (ICML 2020)  
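- Maximizes **agreement between two augmented views** of the same image (e.g., random crop, color distortion, blur) in a learned projection space.  
- Uses the **NT-Xent** contrastive loss (an InfoNCE variant). Other images in the batch act as negatives, so results improve with large batches.  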
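
The loss can be sketched in NumPy as shown below. It assumes `z1[i]` and `z2[i]` are projections of two augmentations of image `i`. The temperature value is illustrative, and the encoder, projection head, and augmentations are omitted:

```python
import numpy as np


def nt_xent(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent over 2N views: each view's positive is its partner; the other 2N - 2 views are negatives."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # cosine similarity via unit vectors
    sim = z @ z.T / temperature                            # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                         # a view is never its own candidate
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # index of each row's positive
    row_max = sim.max(axis=1, keepdims=True)               # for a numerically stable log-sum-exp
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_num = sim[np.arange(2 * n), pos]
    return float(np.mean(log_denom - log_num))             # mean of -log softmax at the positive
```

Well-aligned views (e.g., `z2 = z1 + small noise`) should give a clearly lower loss than unrelated `z2`.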

**MoCo – He et al. (2020, Facebook AI)**  
*"Momentum Contrast for Unsupervised Visual Representation Learning."* (CVPR 2020)  
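- Introduces a **momentum-updated key encoder** and a **queue of encoded keys** that serves as a large, consistent dictionary of negatives.  
- Decouples the number of negatives from the batch size, so contrastive learning scales without very large batches.  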
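
Both mechanisms can be sketched in NumPy. The key encoder's parameters track the query encoder as an exponential moving average, and a fixed-size queue of past keys supplies the negatives for the InfoNCE logits. The defaults `m=0.999` and `temperature=0.07` follow the MoCo paper. Plain parameter dicts stand in for real encoders, and all names are hypothetical. The circular-buffer indexing is a simplification: the reference code instead assumes the queue size is a multiple of the batch size:

```python
from typing import Dict, Optional

import numpy as np


def momentum_update(key_params: Dict[str, np.ndarray], query_params: Dict[str, np.ndarray],
                    m: float = 0.999) -> Dict[str, np.ndarray]:
    """theta_k <- m * theta_k + (1 - m) * theta_q, applied to every parameter tensor."""
    return {name: m * key_params[name] + (1 - m) * query_params[name] for name in key_params}


class KeyQueue:
    """Fixed-size FIFO of past key embeddings used as negatives."""

    def __init__(self, size: int, dim: int, rng: Optional[np.random.Generator] = None):
        rng = rng or np.random.default_rng()
        q = rng.normal(size=(size, dim))
        self.keys = q / np.linalg.norm(q, axis=1, keepdims=True)  # random unit vectors at start
        self.ptr = 0

    def enqueue_dequeue(self, new_keys: np.ndarray) -> None:
        """Overwrite the oldest entries with the newest batch of keys (circular buffer)."""
        idx = (self.ptr + np.arange(new_keys.shape[0])) % len(self.keys)
        self.keys[idx] = new_keys
        self.ptr = int((self.ptr + new_keys.shape[0]) % len(self.keys))


def moco_logits(q: np.ndarray, k: np.ndarray, queue_keys: np.ndarray,
                temperature: float = 0.07) -> np.ndarray:
    """Column 0 holds the positive (matching key); columns 1..K hold queue negatives."""
    l_pos = np.sum(q * k, axis=1, keepdims=True)   # (N, 1)
    l_neg = q @ queue_keys.T                       # (N, K)
    return np.concatenate([l_pos, l_neg], axis=1) / temperature


def infonce_loss(logits: np.ndarray) -> float:
    """Cross-entropy with the positive always at index 0."""
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_z - logits[:, 0]))
```

A large `m` makes the key encoder change slowly. As a result, keys enqueued several batches ago remain comparable to the current ones.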

---

## ✅ Summary Families

- **Embeddings:** Word2Vec (2013), FastText (2016).  
- **Masked modeling:** BERT (2018), MAE (2021).  
- **Contrastive learning:** SimCLR (2020), MoCo (2020).  

---

## 👉 Why It Matters

- **NLP:** Self-supervised learning (BERT, GPT-style pretraining) dominates modern language models.  
- **Computer Vision:** Masked autoencoders and contrastive methods now rival or surpass supervised ImageNet pretraining on many downstream tasks.  
- **Speech/Audio:** SSL extended to acoustic data (e.g., **Wav2Vec 2.0, 2020**), enabling low-resource ASR.  

🔑 **Takeaway:** SSL shifted AI from **label-hungry supervised training** to **scalable pretraining on unlabeled data**, powering today’s **foundation models**.  



---

## 📚 Key Milestones

| **Era** | **Model / Concept** | **Year** | **Authors / Org** | **Key Contributions** |
|---------|----------------------|----------|-------------------|-----------------------|
| **NLP Embeddings (Prediction from Context)** | **Word2Vec** | 2013 | Mikolov et al., Google | Skip-gram & CBOW objectives; learned embeddings from raw text. |
| | **FastText** | 2016 | Bojanowski et al., Facebook AI | Word representations as bags of n-grams; improved rare/compound word handling. |
| **Masked Language / Representation Modeling** | **BERT** | 2018 | Devlin et al., Google AI | Masked tokens → deep bidirectional context learning. |
| | **XLNet** | 2019 | Yang et al., CMU & Google Brain | Permutation-based autoregression; alternative to masking. |
| | **MAE (Masked Autoencoders)** | 2021 | He et al., Meta AI | Extended masked modeling to vision; scalable pretraining for images. |
| **Contrastive Learning (Positive vs Negative Pairs)** | **SimCLR** | 2020 | Chen et al., Google Brain | NT-Xent loss over augmented image views; state-of-the-art ImageNet linear evaluation at release. |
| | **MoCo (Momentum Contrast)** | 2020 | He et al., Facebook AI | Momentum-updated key encoder + queue of negatives; decouples dictionary size from batch size. |
| | **BYOL (Bootstrap Your Own Latent)** | 2020 | Grill et al., DeepMind | Non-contrastive: an online network predicts a slowly updated (EMA) target network's projection; no negative pairs needed. |
| **Speech, Multimodal & Self-Distillation** | **Wav2Vec 2.0** | 2020 | Baevski et al., Facebook AI | Masks latent speech features and solves a contrastive task over quantized targets; strong ASR with very little labeled data. |
| | **CLIP** | 2021 | Radford et al., OpenAI | Contrastive pretraining on 400M image–text pairs; aligned image and text embeddings enable zero-shot transfer. |
| | **DINO** | 2021 | Caron et al., Facebook AI | Self-distillation with a momentum teacher for Vision Transformers; no labels required. |
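
BYOL's "no negatives" objective can be sketched in NumPy. The online network's prediction `p` regresses onto the target network's projection `z` through a normalized MSE, which equals `2 - 2·cos(p, z)`. The loss is symmetrized over the two views, and the target weights follow an EMA of the online weights. The value `tau=0.996` is the paper's base rate; its increasing schedule is not shown. Networks are replaced by plain arrays and dicts, and in a real framework `z` would be treated as a stop-gradient. Names are hypothetical:

```python
from typing import Dict

import numpy as np


def byol_regression_loss(p: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Normalized MSE between prediction p and target projection z, i.e. 2 - 2 * cos(p, z)."""
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)


def symmetric_loss(p1: np.ndarray, z2: np.ndarray, p2: np.ndarray, z1: np.ndarray) -> float:
    """View 1's prediction targets view 2's projection, and vice versa."""
    return float(np.mean(byol_regression_loss(p1, z2) + byol_regression_loss(p2, z1)))


def ema_update(target: Dict[str, np.ndarray], online: Dict[str, np.ndarray],
               tau: float = 0.996) -> Dict[str, np.ndarray]:
    """Target weights slowly follow the online weights; no gradient flows into the target."""
    return {k: tau * target[k] + (1.0 - tau) * online[k] for k in target}
```

The loss depends only on direction, not scale, and is minimized when the prediction and target point the same way. No other image in the batch enters the objective.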

---

## ✅ Summary Families
- **Embeddings:** Word2Vec (2013), FastText (2016).  
- **Masked modeling:** BERT (2018), XLNet (2019), MAE (2021).  
- **Contrastive learning:** SimCLR (2020), MoCo (2020), and the negative-free BYOL (2020).  
- **Speech, multimodal & self-distillation:** Wav2Vec 2.0 (2020), CLIP (2021), DINO (2021).  
