# **AutoModelForCausalLM**

### **Dig Deeper using Auto Class**
Working with the Auto classes directly gives more transparency and visibility into each step of the pipeline.

**Reference - [Auto Class Documentation](https://huggingface.co/docs/transformers/autoclass_tutorial)**

1. **AutoTokenizer -** Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format that can be processed by the model. Load a tokenizer with `AutoTokenizer.from_pretrained()`.
2. **AutoModel -** The `AutoModelFor` classes let you load a pretrained model for a given task. For example, load a model for sequence classification with `AutoModelForSequenceClassification.from_pretrained()`. **[Click here](https://huggingface.co/docs/transformers/model_doc/auto)** for the complete list of tasks available under the AutoModel classes.
3. **AutoImageProcessor -** For vision tasks, an image processor processes the image into the correct input format. Use `AutoImageProcessor.from_pretrained()`.
4. **AutoFeatureExtractor -** For audio tasks, a feature extractor processes the audio signal into the correct input format. Load a feature extractor with `AutoFeatureExtractor.from_pretrained()`.
5. **AutoProcessor -** Multimodal tasks require a processor that combines two types of preprocessing tools. For example, the LayoutLMv2 model requires an image processor to handle images and a tokenizer to handle text; a processor combines both. Load a processor with `AutoProcessor.from_pretrained()`.
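
To make the idea behind the Auto classes more concrete, here is a toy sketch in plain Python of how a loader can pick a concrete class from a `model_type` field in a checkpoint's configuration. It is only an illustration of the dispatch idea, not the `transformers` implementation: every class name, the registry, and the `from_config` helper below are hypothetical.

```python
# Toy illustration only: NOT the transformers implementation.
# All names here (ToyPhi3ForCausalLM, TOY_REGISTRY, ...) are hypothetical.

class ToyPhi3ForCausalLM:
    def __init__(self, config):
        self.config = config


class ToyGPT2ForCausalLM:
    def __init__(self, config):
        self.config = config


# Maps a model_type string to the class that knows how to build that model.
TOY_REGISTRY = {
    "phi3": ToyPhi3ForCausalLM,
    "gpt2": ToyGPT2ForCausalLM,
}


class ToyAutoModelForCausalLM:
    @classmethod
    def from_config(cls, config):
        """Look up the concrete class for config['model_type'] and build it."""
        model_type = config["model_type"]
        if model_type not in TOY_REGISTRY:
            raise ValueError(f"Unsupported model_type: {model_type!r}")
        return TOY_REGISTRY[model_type](config)


toy_model = ToyAutoModelForCausalLM.from_config(
    {"model_type": "phi3", "hidden_size": 3072}
)
print(type(toy_model).__name__)
```

The caller never names the concrete class; the configuration decides which one is built. That is why the same `from_pretrained()` call can work across many checkpoints.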

In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



In [12]:
tokenizer

LlamaTokenizerFast(name_or_path='microsoft/Phi-3-mini-4k-instruct', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '<|endoftext|>', 'unk_token': '<unk>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=False),
	32000: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	32001: AddedToken("<|assistant|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=True),
	32002: AddedToken("<|placeholder1|>", rstrip=True, lstrip=False, single_word=False, normalized=False, special=

In [13]:
model

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=3206
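
The module tree printed above lists the shape of every linear layer, so the weight count of the decoder stack can be tallied by hand. The sketch below does that arithmetic in plain Python. It counts only the bias-free `Linear` weights of the 32 decoder layers shown above and leaves out the embeddings, the norm layers, and the `lm_head` (whose shape is cut off in the output). The shapes are also consistent with fused projections: 9216 = 3 × 3072 for `qkv_proj` and 16384 = 2 × 8192 for `gate_up_proj`.

```python
# (in_features, out_features) copied from the printed Phi3DecoderLayer.
linear_shapes = {
    "o_proj": (3072, 3072),
    "qkv_proj": (3072, 9216),
    "gate_up_proj": (3072, 16384),
    "down_proj": (8192, 3072),
}
num_layers = 32  # "(0-31): 32 x Phi3DecoderLayer"

# A bias-free Linear layer holds in_features * out_features weights.
per_layer = sum(n_in * n_out for n_in, n_out in linear_shapes.values())
decoder_linear_total = per_layer * num_layers

print(f"Linear weights per decoder layer: {per_layer:,}")
print(f"Linear weights in all {num_layers} layers: {decoder_linear_total:,}")
```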

In [5]:
tweet = "A very bad experience at the airport."

tokens = tokenizer.tokenize(tweet)

print("Tokens:\n", tokens)

Tokens:
 ['A', 'Ġvery', 'Ġbad', 'Ġexperience', 'Ġat', 'Ġthe', 'Ġairport', '.']
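
The `Ġ` prefix in the tokens above is how byte-level BPE tokenizers (GPT-2 style) mark a token that was preceded by a space. As a minimal sketch, assuming spaces are the only special bytes involved, joining the tokens and turning each `Ġ` back into a space recovers the original text. The helper name `detokenize` is hypothetical; a real tokenizer handles every byte, not just spaces.

```python
# Tokens copied from the output above.
tokens = ['A', 'Ġvery', 'Ġbad', 'Ġexperience', 'Ġat', 'Ġthe', 'Ġairport', '.']


def detokenize(tokens):
    """Hypothetical helper: concatenate tokens and restore 'Ġ' to a space."""
    return "".join(tokens).replace("Ġ", " ")


restored = detokenize(tokens)
print(restored)
```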


In [6]:
tweet = "A very bad experience at the airport."

token_ids = tokenizer(tweet)

print("Token Ids:\n", token_ids)

Token Ids:
 {'input_ids': [32, 845, 2089, 1998, 379, 262, 9003, 13], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
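
Conceptually, `input_ids` is the result of looking each token up in the tokenizer's vocabulary, and `attention_mask` marks which positions hold real tokens (1) rather than padding (0). The sketch below rebuilds the dictionary printed above from a toy vocabulary. The vocabulary is an assumption: it is made by pairing the tokens and IDs printed earlier position by position, and `toy_encode` is a hypothetical helper, not a library function.

```python
# Tokens and IDs copied from the two outputs above, paired by position.
tokens = ['A', 'Ġvery', 'Ġbad', 'Ġexperience', 'Ġat', 'Ġthe', 'Ġairport', '.']
ids = [32, 845, 2089, 1998, 379, 262, 9003, 13]
toy_vocab = dict(zip(tokens, ids))


def toy_encode(tokens, vocab):
    """Hypothetical helper: map tokens to IDs and mark every position as real."""
    return {
        "input_ids": [vocab[token] for token in tokens],
        "attention_mask": [1] * len(tokens),  # no padding in a single sequence
    }


encoded = toy_encode(tokens, toy_vocab)
print(encoded)
```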


In [7]:
## Decode method allows us to check how the final output of the 
## tokenizer translates back to text
print("Decoded Text Output:\n", tokenizer.decode(token_ids["input_ids"]))

Decoded Text Output:
 A very bad experience at the airport.


In [10]:
# A tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:
tweets = ["A very bad experience at the airport.", 
          "Amazing job done by the government with their initiatives.", 
          "This is just a random string."]

token_ids = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

print(token_ids)

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
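
The error above means the tokenizer has no token to fill the shorter sequences with; the message itself suggests `tokenizer.pad_token = tokenizer.eos_token` as one fix. To show what padding does once a pad token exists, here is a minimal sketch in plain Python. The `pad_batch` helper and the pad ID `0` are hypothetical and only illustrate the idea; they are not the library's implementation.

```python
def pad_batch(sequences, pad_id, padding_side="right"):
    """Pad lists of token IDs to the same length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (max_len - len(seq))
        mask_pad = [0] * len(padding)  # 0 marks padded positions
        if padding_side == "right":
            input_ids.append(list(seq) + padding)
            attention_mask.append([1] * len(seq) + mask_pad)
        else:  # left padding
            input_ids.append(padding + list(seq))
            attention_mask.append(mask_pad + [1] * len(seq))
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token IDs of different lengths; pad_id=0 is an assumption.
batch = pad_batch([[32, 845, 2089], [32, 845], [32]], pad_id=0)
print(batch)
```

The attention mask lets the model ignore the padded positions, which is why the tokenizer returns it alongside `input_ids`.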

In [47]:
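# Now pass the preprocessed batch of inputs directly to the model.
# The ** operator unpacks the dictionary into keyword arguments (input_ids, attention_mask).
# Note: the SequenceClassifierOutput printed below comes from a sequence-classification
# checkpoint (one loaded with AutoModelForSequenceClassification), not from the
# Phi3ForCausalLM loaded above; a causal LM returns per-token logits over the vocabulary instead.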
outputs = model(**token_ids)

print(outputs)

SequenceClassifierOutput(loss=None, logits=tensor([[ 2.5498, -0.3649, -2.5233],
        [-2.2652, -1.1891,  3.2718],
        [ 0.5962,  1.1877, -1.9644]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [49]:
from torch import nn

predictions = nn.functional.softmax(outputs.logits, dim=-1)

print(predictions)

tensor([[0.9430, 0.0511, 0.0059],
        [0.0039, 0.0114, 0.9847],
        [0.3468, 0.6265, 0.0268]], grad_fn=<SoftmaxBackward0>)
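
Softmax turns each row of logits into probabilities that sum to 1 by exponentiating every value and dividing by the row total. As a check on the numbers above, this plain-Python sketch recomputes them from the printed logits without PyTorch. The `softmax` helper is written here for illustration, and subtracting the row maximum is a standard numerical-stability step that does not change the result.

```python
import math

# Logits copied from the SequenceClassifierOutput printed above.
logits = [
    [2.5498, -0.3649, -2.5233],
    [-2.2652, -1.1891, 3.2718],
    [0.5962, 1.1877, -1.9644],
]


def softmax(row):
    """Softmax over one row of logits, shifted by the max for stability."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = [softmax(row) for row in logits]
# Index of the largest probability in each row (the predicted class).
predicted_indices = [row.index(max(row)) for row in probabilities]

for row in probabilities:
    print([round(p, 4) for p in row])
print(predicted_indices)
```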


In [52]:
# Get the index of the maximum value (the predicted class)
predicted_class_idx = predictions.argmax(dim=-1)

# Define the mapping from indices to class labels
class_labels = ['negative', 'neutral', 'positive']

# Convert the predicted indices to the corresponding labels
predicted_labels = [class_labels[idx] for idx in predicted_class_idx.tolist()]

# Print each tweet alongside its predicted label
for text, label in zip(tweets, predicted_labels):
    print(f"Tweet: {text}")
    print(f"Prediction: {label}")
    print()

Tweet: A very bad experience at the airport.
Prediction: negative

Tweet: Amazing job done by the government with their initiatives.
Prediction: positive

Tweet: This is just a random string.
Prediction: neutral

