# Comprehensive List of **Default Hugging Face Models** Across All Specializations

This document groups the **default / canonical Hugging Face models** by task and domain. Most sections list widely used model families; Section 12 lists the checkpoint each `pipeline()` task loads when no model is specified.

---

# **1. Foundation Language Models (LLMs) — Decoder-Only (GPT-Like)**

These are the decoder-only **causal language model** families most commonly used for text generation with Hugging Face `transformers`.

- GPT-2 (baseline default)
- GPT-Neo / GPT-NeoX  
- GPT-J  
- OPT  
- BLOOM  
- Falcon  
- MPT  
- LLaMA / LLaMA-2 / LLaMA-3  
- Qwen / Qwen2  
- Mixtral / Mistral  
- Phi-2 / Phi-3  
- StarCoder / StarCoder2  
- CodeGen  
- DeepSeek-Coder  
- Yi / Yi-Coder  
- Gemma  
- XGen  

**Default pipeline task:**  
`text-generation` (models are loaded through `AutoModelForCausalLM`).
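As a minimal sketch of how a decoder-only checkpoint is typically used, the helper below wraps the `text-generation` pipeline. The function name `generate_text` and its `generator` parameter are illustrative assumptions, not part of the `transformers` API; `gpt2` is used only because it is the smallest model in the list above.

```python
from typing import Callable, Optional


def generate_text(
    prompt: str,
    model_id: str = "gpt2",
    max_new_tokens: int = 20,
    generator: Optional[Callable] = None,
) -> str:
    """Return the prompt plus a greedy continuation from a causal LM."""
    if generator is None:
        # Imported lazily so this module loads even without transformers installed.
        from transformers import pipeline

        # Downloads the checkpoint from the Hub on first use.
        generator = pipeline("text-generation", model=model_id)
    # do_sample=False selects greedy decoding, so repeated calls agree.
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return outputs[0]["generated_text"]
```

Calling `generate_text("Hello, my name is")` would download `gpt2` and return the prompt followed by the generated continuation. Passing a `generator` lets the same code run against an already constructed pipeline or a stub.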

---

# **2. Encoder-Only Models (BERT-Like)**

These models are defaults for embeddings and token-level tasks.

- BERT (base, large)  
- RoBERTa  
- DistilBERT  
- ALBERT  
- XLNet (pretrained with a permutation-based autoregressive objective, but typically used like an encoder)  
- ELECTRA  
- CamemBERT  
- mBERT (BERT-base-multilingual-cased)  
- XLM-R  

**Default pipeline tasks:**  
`feature-extraction`, `fill-mask`, `text-classification`, `token-classification`.
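The `feature-extraction` task returns one vector per token, so a pooling step is needed to obtain a single sentence vector. The sketch below shows masked mean pooling in plain Python on toy numbers. The name `mean_pool` is hypothetical, and real code would normally apply the same arithmetic to the model's hidden-state tensor.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average token vectors, ignoring positions where the mask is 0 (padding)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 2-dimensional "token embeddings"; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)  # [2.0, 3.0]
```

Ignoring padded positions matters because a batch pads short sentences to a common length, and averaging over padding would skew the result.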

---

# **3. Encoder–Decoder Models (Seq2Seq, T5-Family)**

Used for summarization, translation, and general text-to-text tasks.

- T5  
- FLAN-T5  
- mT5  
- BART  
- MarianMT (the `Helsinki-NLP/opus-mt-*` translation family)  
- Pegasus (pretrained specifically for summarization)  
- LongT5  
- LED (Longformer Encoder-Decoder)

**Default pipeline tasks:**  
`summarization`, `translation`, `text2text-generation`.

---

# **4. Default Sentence Embedding Models**

- `sentence-transformers/all-MiniLM-L6-v2`  
- `sentence-transformers/all-mpnet-base-v2`  
- gte-base / gte-large  
- e5-small / e5-large / e5-mistral  
- Instructor-XL / Instructor-Large  

**Default pipeline task:**  
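`feature-extraction` (commonly used to produce retrieval embeddings for RAG). These checkpoints are often loaded through the `sentence-transformers` library instead of a `transformers` pipeline.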
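To illustrate how such embeddings are used for retrieval, the sketch below ranks documents by cosine similarity to a query vector. The helpers `cosine`, `rank`, and `embed` are hypothetical names. `embed` assumes the `sentence-transformers` package is installed and is only needed for real text; the ranking itself is plain Python.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def embed(
    texts: List[str],
    model_id: str = "sentence-transformers/all-MiniLM-L6-v2",
) -> List[List[float]]:
    """Encode texts with a sentence-embedding model (downloads on first use)."""
    from sentence_transformers import SentenceTransformer  # lazy import

    return SentenceTransformer(model_id).encode(texts).tolist()


# Toy vectors standing in for real embeddings.
ranking = rank([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
best_index = ranking[0][0]  # index 2 points in the same direction as the query
```

With real text, `rank(embed([query])[0], embed(documents))` would produce the same kind of ordering over document indices.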

---

# **5. Computer Vision — Default HF Models**

## **Image Classification**
- ResNet (18, 34, 50)  
- ViT (Vision Transformer; `google/vit-base-patch16-224` is the default `image-classification` pipeline checkpoint)  
- ConvNeXt  
- EfficientNet  
- MobileNet V2 / V3  
- Swin Transformer  
- BEiT  
- DeiT  

## **Object Detection**
- DETR  
- DINO (the DETR-based detector; DINOv2 is a separate self-supervised backbone, not a detector)  
- YOLOS  
- RT-DETR  
- GroundingDINO  

## **Segmentation**
- SegFormer  
- MaskFormer  
- UPerNet  
- SAM / MobileSAM / FastSAM  

## **Image-to-Image**
- ControlNet  
- Stable Diffusion Img2Img  

## **Vision-Language**
- CLIP  
- BLIP / BLIP-2  
- OWL-ViT  

**Default pipeline tasks:**  
`image-classification`, `object-detection`, `image-segmentation`, `zero-shot-image-classification`.
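As a rough sketch of the `image-classification` task, the helper below runs a classifier and returns `(label, score)` pairs. The name `classify_image` and its `classifier` parameter are assumptions made for illustration. The image argument can be a local path, a URL, or a PIL image, which the pipeline accepts directly.

```python
from typing import Any, Callable, List, Optional, Tuple


def classify_image(
    image: Any,
    model_id: str = "google/vit-base-patch16-224",
    top_k: int = 3,
    classifier: Optional[Callable] = None,
) -> List[Tuple[str, float]]:
    """Return up to top_k (label, score) pairs, highest score first."""
    if classifier is None:
        # Imported lazily so this module loads even without transformers installed.
        from transformers import pipeline

        classifier = pipeline("image-classification", model=model_id)
    predictions = classifier(image, top_k=top_k)
    # Sort explicitly so the result does not depend on the classifier's ordering.
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["label"], p["score"]) for p in ranked[:top_k]]
```

The other vision tasks follow the same pattern with a different task string, although their outputs differ (for example, `object-detection` results also carry bounding boxes).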

---

# **6. Audio & Speech Models (Hugging Face Defaults)**

## **ASR (Speech-to-Text)**
- Whisper (tiny → large-v3)  
- Wav2Vec2  
- HuBERT  
- MMS  

## **Speaker Diarization**
- `pyannote/speaker-diarization-3.1` (hosted on the Hub and loaded through `pyannote.audio`, not a `transformers` pipeline)

## **Text-to-Speech**
- SpeechT5  
- FastSpeech2  
- VITS / YourTTS  

## **Audio Classification**
- AST  
- Wav2Vec2 (with a sequence-classification head; the CTC head is for ASR)  

---

# **7. Multimodal & Vision-Language Models**

Default HF multimodal models:

- CLIP  
- BLIP / BLIP-2  
- Flamingo / OpenFlamingo  
- LLaVA / LLaVA-1.5 / LLaVA-NeXT  
- MiniGPT-4  
- Qwen-VL / Qwen2-VL  
- Kosmos-2  
- Pix2Struct  
- Donut (OCR-free document understanding)  
- LayoutLM / LayoutLMv3  
- Pixtral (Mistral AI's vision-language model), Yi-VL  

**Default pipeline tasks:**  
`image-to-text`, `document-question-answering`, `visual-question-answering`.

---

# **8. Diffusion Models (Text-to-Image Standards)**

## **Stable Diffusion Family**
- SD 1.4 / 1.5  
- SD 2.0 / 2.1  
- SDXL  
- SD-Turbo / SDXL-Turbo  
- SDXL-Lightning  
- DreamBooth fine-tuned variants  

## **Latent Diffusion**
- LDM text2img  
- LDM img2img  

## **Other Generative Families**
- Consistency Models  
- ControlNet  
- PixArt-α / PixArt-Sigma  
- DeepFloyd IF  
- Kandinsky 2.2  

---

# **9. Reinforcement Learning Models**

- Decision Transformer  
- Trajectory Transformer  
- GTrXL  

Plus training methods from the Hugging Face TRL library:
- PPO  
- DPO  
- REINFORCE-style methods (e.g. RLOO)  
- Reward modeling  

---

# **10. Time Series Forecasting Models**

- N-BEATS  
- TFT  
- TCN  
- Amazon Chronos  
- TimesFM  
- PatchTST  
- Informer / Autoformer  
- ETSformer  

---

# **11. Hugging Face Tokenizers (Standard)**

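- BPE (byte-pair encoding)  
- Byte-level BPE (GPT-2 style)  
- WordPiece (BERT style)  
- Unigram LM  
- SentencePiece (a library that implements BPE and Unigram, not a separate algorithm)  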
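To make the first entry concrete, here is a toy sketch of how BPE learns its merge rules: it repeatedly counts adjacent symbol pairs and merges the most frequent one. This is a simplified illustration in plain Python, not the implementation used by the `tokenizers` library. The function names, the toy corpus, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Word = Tuple[str, ...]


def most_frequent_pair(words: Dict[Word, int]) -> Tuple[str, str]:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    # Ties are broken by the pair itself so the result is deterministic.
    return max(pairs, key=lambda p: (pairs[p], p))


def merge_pair(words: Dict[Word, int], pair: Tuple[str, str]) -> Dict[Word, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: Dict[Word, int] = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to num_merges merge rules from a word-frequency table."""
    words = {tuple(word): freq for word, freq in corpus.items()}
    merges = []
    for _ in range(num_merges):
        if all(len(symbols) < 2 for symbols in words):
            break  # nothing left to merge
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges


merges = train_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
```

Each learned merge becomes one new vocabulary entry, so the number of merges controls the vocabulary size. Byte-level BPE applies the same procedure to bytes instead of characters.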

---

# **12. Default HF Pipeline Models (Canonical Table)**

| Task | Default Model |
|------|---------------|
| text-generation | gpt2 |
| text-classification | distilbert-base-uncased-finetuned-sst-2-english |
| zero-shot-classification | facebook/bart-large-mnli |
| fill-mask | distilroberta-base |
| translation | t5-base (for `translation_en_to_fr`, `translation_en_to_de`, `translation_en_to_ro`) |
| summarization | sshleifer/distilbart-cnn-12-6 |
| question-answering | distilbert-base-cased-distilled-squad |
| token-classification (alias: ner) | dbmdz/bert-large-cased-finetuned-conll03-english |
| image-classification | google/vit-base-patch16-224 |
| object-detection | facebook/detr-resnet-50 |
| automatic-speech-recognition | facebook/wav2vec2-base-960h |
| audio-classification | superb/wav2vec2-base-superb-ks |
| image-to-text | ydshieh/vit-gpt2-coco-en |
| document-question-answering | impira/layoutlm-document-qa |
| image-segmentation | facebook/detr-resnet-50-panoptic |

These are the defaults in `transformers` 4.x releases. Defaults and available tasks can change between versions, so pass `model=` explicitly when reproducibility matters.
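Because `pipeline()` warns when no model is supplied, one option is to pin the checkpoint in code. The sketch below copies a few rows from the table into a lookup and resolves the `ner` and `sentiment-analysis` task aliases. `DEFAULT_CHECKPOINTS`, `resolve_checkpoint`, and `build_pipeline` are hypothetical names introduced for this example, not library features.

```python
from typing import Dict, Optional

# A subset of the table above; checkpoint names are copied from it.
DEFAULT_CHECKPOINTS: Dict[str, str] = {
    "text-generation": "gpt2",
    "text-classification": "distilbert-base-uncased-finetuned-sst-2-english",
    "zero-shot-classification": "facebook/bart-large-mnli",
    "question-answering": "distilbert-base-cased-distilled-squad",
    "token-classification": "dbmdz/bert-large-cased-finetuned-conll03-english",
    "image-classification": "google/vit-base-patch16-224",
    "object-detection": "facebook/detr-resnet-50",
}

# Task aliases that transformers also accepts.
TASK_ALIASES: Dict[str, str] = {
    "ner": "token-classification",
    "sentiment-analysis": "text-classification",
}


def resolve_checkpoint(task: str, override: Optional[str] = None) -> str:
    """Return the checkpoint to load for a task, preferring an explicit override."""
    if override:
        return override
    canonical = TASK_ALIASES.get(task, task)
    if canonical not in DEFAULT_CHECKPOINTS:
        raise KeyError(f"no pinned checkpoint for task {task!r}")
    return DEFAULT_CHECKPOINTS[canonical]


def build_pipeline(task: str, override: Optional[str] = None):
    """Create a pipeline with an explicit model instead of the implicit default."""
    from transformers import pipeline  # lazy import; downloads on first use

    return pipeline(task, model=resolve_checkpoint(task, override))
```

For example, `build_pipeline("ner")` would load the CoNLL-03 BERT checkpoint, while `build_pipeline("text-generation", override="distilgpt2")` swaps in a different model without touching the lookup.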

---

# **13. Final Condensed “Canonical” HF Models by Field**

## **NLP**
- BERT  
- RoBERTa  
- DistilBERT  
- ALBERT  
- XLNet  
- ELECTRA  
- GPT-2  
- T5  
- BART  
- MarianMT  
- Pegasus  
- XLM-R  
- mT5  
- FLAN-T5  

## **LLMs**
- LLaMA  
- Falcon  
- Mistral  
- Qwen  
- Gemma  
- Mixtral  
- StarCoder  
- Phi  

## **Vision**
- ViT  
- ResNet  
- ConvNeXt  
- Swin  
- BEiT  
- DETR  
- SegFormer  
- SAM  
- CLIP  

## **Audio**
- Whisper  
- Wav2Vec2  
- HuBERT  
- MMS  

## **Multimodal**
- CLIP  
- BLIP / BLIP-2  
- Flamingo  
- LLaVA  
- Pix2Struct  
- Donut  
- LayoutLM  

## **Diffusion**
- Stable Diffusion (all versions)  
- SDXL  
- DeepFloyd IF  
- Kandinsky  
- PixArt  

---



