# **AutoModelForCausalLM**

### **Dig Deeper using Auto Class**
Using the Auto classes directly gives more transparency and visibility into the whole pipeline.

**Reference - [Auto Class Documentation](https://huggingface.co/docs/transformers/autoclass_tutorial)**

1. **AutoTokenizer -** Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format that can be processed by the model. Load a tokenizer with `AutoTokenizer.from_pretrained()`.
2. **AutoModel -** The `AutoModelFor` classes let you load a pretrained model for a given task. For example, load a model for sequence classification with `AutoModelForSequenceClassification.from_pretrained()`. **[Click here](https://huggingface.co/docs/transformers/model_doc/auto)** for the complete list of tasks available under the AutoModel classes.
3. **AutoImageProcessor -** For vision tasks, an image processor processes the image into the correct input format. Use `AutoImageProcessor.from_pretrained()`.
4. **AutoFeatureExtractor -** For audio tasks, a feature extractor processes the audio signal into the correct input format. Load a feature extractor with `AutoFeatureExtractor.from_pretrained()`.
5. **AutoProcessor -** Multimodal tasks require a processor that combines two types of preprocessing tools. For example, the LayoutLMV2 model requires an image processor to handle images and a tokenizer to handle text; a processor combines both of them. Load a processor with `AutoProcessor.from_pretrained()`.
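
The list above amounts to a mapping from input modality to a preprocessing class. A minimal sketch of that mapping follows. The dictionary `PREPROCESSOR_FOR_MODALITY` and the helper `load_preprocessor` are hypothetical names introduced here for illustration and are not part of `transformers`. The import is deferred so the mapping can be inspected without loading anything.

```python
# Hypothetical helper (not part of transformers): maps an input modality
# to the name of the Auto class described in the list above.
PREPROCESSOR_FOR_MODALITY = {
    "text": "AutoTokenizer",
    "image": "AutoImageProcessor",
    "audio": "AutoFeatureExtractor",
    "multimodal": "AutoProcessor",
}


def load_preprocessor(modality: str, checkpoint: str):
    """Load the preprocessing object for `checkpoint` via the matching Auto class."""
    if modality not in PREPROCESSOR_FOR_MODALITY:
        raise ValueError(f"unknown modality: {modality!r}")
    import transformers  # deferred: only needed when something is actually loaded

    auto_class = getattr(transformers, PREPROCESSOR_FOR_MODALITY[modality])
    return auto_class.from_pretrained(checkpoint)
```

For example, `load_preprocessor("text", "microsoft/Phi-3-mini-4k-instruct")` should return the same kind of object as calling `AutoTokenizer.from_pretrained()` on that checkpoint.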

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [2]:
tokenizer

LlamaTokenizerFast(name_or_path='microsoft/Phi-3-mini-4k-instruct', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '<|endoftext|>', 'unk_token': '<unk>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=False),
	32000: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	32001: AddedToken("<|assistant|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=True),
	32002: AddedToken("<|placeholder1|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=

In [3]:
model

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=3206
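
The layer shapes printed above are enough to estimate the model's size by hand. The sketch below multiplies out the `Embedding` and `Linear` shapes shown in the output. One thing is an assumption rather than a fact from the printout: that each `Phi3RMSNorm` holds a single weight vector of length 3072, since its repr shows no shape. The `lm_head` line is truncated in the output, so it is left out of the total.

```python
# Shapes copied from the printed Phi3ForCausalLM module tree.
HIDDEN = 3072
VOCAB_ROWS = 32064      # Embedding(32064, 3072)
NUM_LAYERS = 32         # (0-31): 32 x Phi3DecoderLayer


def linear_params(in_features: int, out_features: int) -> int:
    """Weight count of a bias-free Linear layer (all layers print bias=False)."""
    return in_features * out_features


# qkv_proj + o_proj
attention = linear_params(HIDDEN, 9216) + linear_params(HIDDEN, HIDDEN)
# gate_up_proj + down_proj
mlp = linear_params(HIDDEN, 16384) + linear_params(8192, HIDDEN)
# Assumption: each RMSNorm has one weight vector of length HIDDEN.
norms_per_layer = 2 * HIDDEN

per_layer = attention + mlp + norms_per_layer
embedding = VOCAB_ROWS * HIDDEN
final_norm = HIDDEN

total_without_lm_head = embedding + NUM_LAYERS * per_layer + final_norm

print(f"per decoder layer: {per_layer:,}")
print(f"total (lm_head excluded): {total_without_lm_head:,}")
```

The fused projections are visible in the numbers: `qkv_proj` outputs 9216 = 3 × 3072 features (query, key and value stacked), and `gate_up_proj` outputs 16384 = 2 × 8192 (gate and up projections stacked), which is why `down_proj` takes 8192 inputs.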

In [11]:
# bitsandbytes provides 8-bit and 4-bit quantization
# !pip install bitsandbytes

In [12]:
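from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import os

# Read the Hugging Face access token; strip the trailing newline and close the file.
with open("keys/.hf_read_token.txt") as f:
    os.environ['HF_TOKEN'] = f.read().strip()

# 8-bit quantization goes through bitsandbytes and needs a GPU.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)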

RuntimeError: No GPU found. A GPU is needed for quantization.
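
The error above arises because 8-bit loading was requested on a machine without a GPU. One way to avoid it, sketched here under stated assumptions, is to check for a GPU first and pass a quantization config only when one is present. The helper names `quantization_kwargs` and `load_causal_lm` are hypothetical; `torch.cuda.is_available()` and `BitsAndBytesConfig` are existing PyTorch and transformers calls. Falling back to an unquantized load is an assumption about what is wanted, and a 7B model loaded that way needs considerably more memory.

```python
def quantization_kwargs(has_gpu: bool) -> dict:
    """Extra from_pretrained() arguments: 8-bit config only when a GPU exists."""
    if not has_gpu:
        return {}  # no GPU: skip quantization instead of hitting the RuntimeError
    from transformers import BitsAndBytesConfig  # deferred import

    return {"quantization_config": BitsAndBytesConfig(load_in_8bit=True)}


def load_causal_lm(checkpoint: str):
    """Hypothetical wrapper: load `checkpoint`, quantized only if a GPU is available."""
    import torch
    from transformers import AutoModelForCausalLM

    has_gpu = torch.cuda.is_available()
    return AutoModelForCausalLM.from_pretrained(
        checkpoint, **quantization_kwargs(has_gpu)
    )
```

With this in place, `model = load_causal_lm("mistralai/Mistral-7B-v0.1")` would request 8-bit weights on a GPU machine and a regular load elsewhere.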