## AutoModel Classes with Different Heads

| Task                        | Class Name                                          | Description                                          |
| --------------------------- | --------------------------------------------------- | ---------------------------------------------------- |
| Base Model               | `AutoModel`                                         | No task head; returns hidden states for features     |
| Masked LM                | `AutoModelForMaskedLM`                              | For BERT-style `[MASK]` prediction                   |
| Sequence Classification | `AutoModelForSequenceClassification`                | For sentence-level tasks like sentiment, spam        |
| Token Classification     | `AutoModelForTokenClassification`                   | For NER, POS tagging (token-level labels)            |
| Multiple Choice          | `AutoModelForMultipleChoice`                        | For MCQ tasks (e.g., SWAG dataset)                   |
| Next Sentence Prediction | `AutoModelForNextSentencePrediction`                | For NSP (mainly BERT pretraining)                    |
| Causal LM (Text Gen)     | `AutoModelForCausalLM`                              | For GPT-style models (left-to-right text generation) |
| Seq2Seq LM               | `AutoModelForSeq2SeqLM`                             | For translation, summarization (T5, BART)            |
| Question Answering       | `AutoModelForQuestionAnswering`                     | For extractive QA (SQuAD-style)                      |
| Vision Tasks            | `AutoModelForImageClassification`, etc.             | For image classification (e.g., ViT)                 |
| Audio Tasks              | `AutoModelForAudioClassification`, etc.             | For Wav2Vec, Whisper, etc.                           |
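| Conditional Generation   | `AutoModelForSeq2SeqLM`                             | T5/BART-style seq2seq tasks; there is no `AutoModelForConditionalGeneration` (only model-specific classes such as `T5ForConditionalGeneration`) |
| Zero-shot Tasks          | `AutoModelForSequenceClassification` (via `pipeline("zero-shot-classification")`) | For inference without fine-tuning, using an NLI checkpoint; image variants use `AutoModelForZeroShotImageClassification` |
| Contrastive Learning     | `AutoModel`                                         | For embedding comparison tasks (e.g., CLIP); there is no dedicated contrastive auto class |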
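
To make the difference between heads concrete, the sketch below builds one tiny, randomly initialised BERT configuration under three auto classes and compares the output shapes. The configuration sizes and the `head_output_shapes` helper are illustrative assumptions, not values from a real checkpoint; with a real model you would typically call `from_pretrained` rather than `from_config`.

```python
def head_output_shapes():
    """Return the main output shape produced by three different heads."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import torch
    from transformers import (
        AutoModel,
        AutoModelForSequenceClassification,
        AutoModelForTokenClassification,
        BertConfig,
    )

    # Hypothetical tiny config: random weights, nothing is downloaded.
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
        num_labels=3,
    )
    input_ids = torch.tensor([[1, 5, 9, 7, 2]])  # batch of 1, sequence length 5

    shapes = {}
    for auto_class in (
        AutoModel,
        AutoModelForSequenceClassification,
        AutoModelForTokenClassification,
    ):
        model = auto_class.from_config(config).eval()
        with torch.no_grad():
            out = model(input_ids=input_ids)
        # The base model exposes hidden states; task heads expose logits.
        if hasattr(out, "last_hidden_state"):
            tensor = out.last_hidden_state
        else:
            tensor = out.logits
        shapes[auto_class.__name__] = tuple(tensor.shape)
    return shapes
```

Under these assumptions the base model should return one 32-dimensional vector per token, the sequence-classification head one row of 3 logits per sentence, and the token-classification head 3 logits per token.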


## Built-in `pipeline()` Function Parameters

| Parameter           | Type                        | Description / Use Case                                                               |
| ------------------- | --------------------------- | ------------------------------------------------------------------------------------ |
| `task`              | `str`                       | Task name like `"text-classification"`, `"text-generation"`, `"summarization"`, etc. |
| `model`             | `str` or `PreTrainedModel`  | Model name (e.g., `"bert-base-uncased"`) or loaded model object                      |
| `tokenizer`         | `str` or Tokenizer Object   | Optional – auto-loaded from model if not given                                       |
| `config`            | `str` or `PretrainedConfig` | Optional – custom configuration (e.g., num labels, dropout, etc.)                    |
| `framework`         | `str` (`"pt"` or `"tf"`)    | Force PyTorch or TensorFlow (auto-detected if not set)                               |
| `device`            | `int` or `str`              | `0` = GPU, `-1` = CPU, `"cuda"` or `"cpu"`                                           |
| `revision`          | `str`                       | Git branch/tag/commit from Hugging Face Hub (e.g., `"main"`, `"v1.0"`)               |
| `use_fast`          | `bool`                      | Whether to use fast tokenizer (defaults to `True`)                                   |
| `token`             | `str` or `bool`             | Hugging Face token (needed for private models or rate limits)                        |
| `device_map`        | `Any`                       | For multi-GPU / model parallelism                                                    |
| `torch_dtype`       | `torch.dtype`               | Force precision: e.g., `torch.float16` for faster inference                          |
| `trust_remote_code` | `bool`                      | Allow loading custom `model.py` logic from HF repo (for advanced/custom models)      |
| `model_kwargs`      | `dict`                      | Extra arguments forwarded to the model's `from_pretrained(...)` call; generation settings such as `temperature` or `top_p` are passed when calling the pipeline |
| `feature_extractor` | `str` or Object             | Used for audio models and older vision models (for vision, replaced by `image_processor`) |
| `image_processor`   | `BaseImageProcessor`        | For image tasks (used in ViT, SAM, etc.)                                             |
| `processor`         | `ProcessorMixin`            | For multi-modal models (e.g., Whisper, CLIP, LayoutLMv3)                             |
| `pipeline_class`    | `Any`                       | Custom pipeline class if you’re extending or modifying the base pipeline logic       |
| `**kwargs`          | `Any`                       | Additional pipeline-specific keyword arguments (like `max_length`, `top_k`, etc.)    |
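
As a rough illustration of how these parameters fit together, the sketch below separates pipeline construction from the call that generates text. The helper names (`build_text_generator`, `generate`), the `"gpt2"` checkpoint, and the specific generation settings are assumptions chosen for the example; constructing the pipeline downloads the checkpoint on first use.

```python
# Generation settings passed when the pipeline is called (illustrative values).
GENERATION_ARGS = {
    "max_new_tokens": 20,
    "do_sample": True,
    "top_p": 0.9,
    "temperature": 0.7,
}


def build_text_generator(model_name: str = "gpt2", device: int = -1):
    """Create a text-generation pipeline; -1 selects CPU, 0 the first GPU."""
    from transformers import pipeline  # lazy import: heavy dependency

    return pipeline(task="text-generation", model=model_name, device=device)


def generate(generator, prompt: str) -> str:
    """Run the pipeline on one prompt and return the generated string."""
    outputs = generator(prompt, **GENERATION_ARGS)
    # Text-generation pipelines return a list of dicts per prompt.
    return outputs[0]["generated_text"]
```

A typical call would be `generate(build_text_generator(), "Once upon a time")`. Keeping `generate` separate from construction also makes it easy to swap in a stub generator when testing.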


## BLEU vs ROUGE: Evaluation Metrics Comparison

| Feature                 | **BLEU (Bilingual Evaluation Understudy)**          | **ROUGE (Recall-Oriented Understudy for Gisting Evaluation)** |
| ----------------------- | --------------------------------------------------- | ------------------------------------------------------------- |
| **Primary Focus**       | Precision (How much predicted is correct)           | Recall (How much reference is captured)                       |
| **N-gram type**         | Clipped 1-gram to 4-gram precision                  | N-gram recall (commonly ROUGE-1, ROUGE-2) + LCS (ROUGE-L)     |
| **Formula Style**       | BLEU = BP × exp(mean of log n-gram precisions)      | ROUGE-N = matched n-grams / n-grams in reference              |
| **Brevity Penalty**     | Yes; penalizes output shorter than the reference    | No explicit penalty; short outputs simply score lower recall  |
| **Best For**            | Machine Translation, factual generation             | Summarization, content preservation                           |
| **Fails When**          | Synonyms used, multiple valid outputs               | Extra irrelevant content is present                           |
| **Overlap type**        | Exact n-gram match (no flexibility)                 | ROUGE-N: exact n-gram match; ROUGE-L: longest common subsequence (gaps allowed) |
| **Multiple References** | Supported                                           | Supported                                                     |
| **Interpretation**      | Higher BLEU = more precise and fluent output        | Higher ROUGE = more informative and complete output           |
| **Used In**             | Machine Translation, chatbot QA, factual generation | Summarization, headline generation                            |
| **Metric Values**       | 0 to 1 (or 0 to 100%)                               | 0 to 1 (or 0 to 100%)                                         |
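
The precision/recall contrast in the table can be sketched in a few lines of plain Python. This is a simplified, single-reference illustration, not a replacement for a maintained metric library: it assumes whitespace tokenisation, no smoothing, and uniform n-gram weights, and the function names are made up for the example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU-style precision: matched n-grams / n-grams in the candidate."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    total = sum(cand.values())
    return sum((cand & ref).values()) / total if total else 0.0


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: matched n-grams / n-grams in the reference."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


def brevity_penalty(candidate_len, reference_len):
    """1.0 for long-enough output, exp(1 - r/c) when the candidate is short."""
    if candidate_len == 0:
        return 0.0
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)


def bleu(candidate, reference, max_n=4):
    """BP times the geometric mean of the 1..max_n clipped precisions."""
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    mean_log = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(len(candidate), len(reference)) * math.exp(mean_log)


reference = "the cat sat on the mat".split()
candidate = "the cat sat".split()

unigram_precision = clipped_precision(candidate, reference, 1)  # every word matches
unigram_recall = rouge_n_recall(candidate, reference, 1)        # half the reference
bleu_2 = bleu(candidate, reference, max_n=2)                    # shrunk by the penalty
```

Here the short candidate has perfect n-gram precision, so only the brevity penalty lowers its BLEU score, while ROUGE-1 recall drops because half of the reference is missing. `max_n=2` is used only because the candidate is too short to contain any 4-grams.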


## Decision Tree: Which AutoModel Should I Pick?

```
START
 |
 |–– Do you only need embeddings / hidden states?
 |        |
 |        |–– YES → AutoModel
 |        |
 |        |–– NO
 |
 |–– Are you generating text?
 |        |
 |        |–– YES
 |        |     |
 |        |     |–– Left-to-right generation (GPT-style)?
 |        |     |        → AutoModelForCausalLM
 |        |     |
 |        |     |–– Encoder–decoder generation (translate, summarize)?
 |        |              → AutoModelForSeq2SeqLM
 |        |
 |        |–– NO
 |
 |–– Are you predicting labels?
 |        |
 |        |–– Sentence-level label?
 |        |        → AutoModelForSequenceClassification
 |        |
 |        |–– Token-level label?
 |        |        → AutoModelForTokenClassification
 |        |
 |        |–– Start–end span in text?
 |        |        → AutoModelForQuestionAnswering
 |        |
 |        |–– Multiple options per example?
 |        |        → AutoModelForMultipleChoice
 |
 |–– Are you filling in [MASK] tokens?
 |        |
 |        |–– YES → AutoModelForMaskedLM
 |
 |–– Is this a vision task?
 |        |
 |        |–– Image classification?
 |        |        → AutoModelForImageClassification
 |        |
 |        |–– Object detection?
 |        |        → AutoModelForObjectDetection
 |        |
 |        |–– Image → text?
 |        |        → AutoModelForVision2Seq
 |
 |–– Is this an audio task?
 |        |
 |        |–– Speech → text?
 |        |        → AutoModelForSpeechSeq2Seq
 |        |
 |        |–– Audio classification?
 |        |        → AutoModelForAudioClassification
 |
 |–– Is it multimodal (text + image/audio)?
 |        |
 |        |–– Generation involved?
 |        |        → AutoModelForVision2Seq / AutoModelForSeq2SeqLM
 |
 |–– Are you doing similarity / retrieval?
 |        |
 |        |–– Bi-encoder / embeddings?
 |        |        → AutoModel
 |        |
 |        |–– Cross-encoder reranking?
 |        |        → AutoModelForSequenceClassification
 |
 END
```
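
The tree above can also be written as a simple lookup table. The task labels used as keys are hypothetical names invented for this sketch; only the class names come from the tree.

```python
# Hypothetical task labels mapped to the auto classes named in the tree.
AUTO_CLASS_BY_TASK = {
    "embeddings": "AutoModel",
    "causal-generation": "AutoModelForCausalLM",
    "seq2seq-generation": "AutoModelForSeq2SeqLM",
    "sentence-label": "AutoModelForSequenceClassification",
    "token-label": "AutoModelForTokenClassification",
    "span-extraction": "AutoModelForQuestionAnswering",
    "multiple-choice": "AutoModelForMultipleChoice",
    "fill-mask": "AutoModelForMaskedLM",
    "image-classification": "AutoModelForImageClassification",
    "object-detection": "AutoModelForObjectDetection",
    "image-to-text": "AutoModelForVision2Seq",
    "speech-to-text": "AutoModelForSpeechSeq2Seq",
    "audio-classification": "AutoModelForAudioClassification",
    "bi-encoder-retrieval": "AutoModel",
    "cross-encoder-reranking": "AutoModelForSequenceClassification",
}


def pick_auto_class(task: str) -> str:
    """Return the auto class name for a task label, or raise ValueError."""
    try:
        return AUTO_CLASS_BY_TASK[task]
    except KeyError:
        known = ", ".join(sorted(AUTO_CLASS_BY_TASK))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
```

With `transformers` installed, the returned name can be resolved to the class itself via `getattr(transformers, pick_auto_class("sentence-label"))`.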
