<a href="https://colab.research.google.com/github/dhruvin6122/HuggingFace_Hub/blob/main/pretrained_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [36]:
# 1.AutoModel (Embeddings only)

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "I love Hugging Face!"
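inputs = tokenizer(text, return_tensors="pt")
# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)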


torch.Size([1, 7, 768])


Output → hidden-state vectors only (one 768-dimensional vector per token), no classification labels

Use case → feature extraction, embeddings for clustering, similarity search
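
To make the similarity-search use case concrete, here is a minimal sketch of collapsing per-token hidden vectors into one sentence vector and comparing two sentences. Mean pooling over non-padding tokens is an assumption (the notebook does not say how to pool), the small hand-written lists stand in for `outputs.last_hidden_state` and `inputs["attention_mask"]`, and `mean_pool` and `cosine_similarity` are illustrative helper names, not library functions.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [value / count for value in total]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for two encoded sentences (hypothetical 3-dimensional vectors).
# The last token of sentence_a is padding and must not affect the average.
sentence_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
mask_a = [1, 1, 0]
sentence_b = [[2.0, 0.0, 1.0], [2.0, 0.0, 1.0]]
mask_b = [1, 1]

emb_a = mean_pool(sentence_a, mask_a)
emb_b = mean_pool(sentence_b, mask_b)
similarity = cosine_similarity(emb_a, emb_b)
print(emb_a, emb_b, similarity)
```

With real model outputs the same two steps apply to 768-dimensional vectors; a similarity close to 1 suggests the two sentences are encoded similarly.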

In [37]:
# 2.AutoModelForSequenceClassification (Classification only)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

text = "I love Hugging Face!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
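# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits to probabilities, then map the top class index to its label
logits = outputs.logits
probs = F.softmax(logits, dim=-1)
predicted_label = model.config.id2label[torch.argmax(probs, dim=-1).item()]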
print(predicted_label)


POSITIVE


Output → predicted label, read directly from `model.config.id2label`

Industry use → Sentiment, Spam, Toxicity
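
The logits-to-label step in the cell above can be sketched without a model. This is a minimal illustration, assuming a two-class head; the logit values are made up, and the `id2label` mapping shown here is only a stand-in for `model.config.id2label`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for one sentence and a stand-in label mapping
logits = [-4.0, 4.0]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(logits)
# Index of the highest probability, the pure-Python analogue of torch.argmax
best_index = max(range(len(probs)), key=probs.__getitem__)
predicted_label = id2label[best_index]
confidence = probs[best_index]
print(predicted_label, confidence)
```

Keeping the probability alongside the label is useful when a downstream system (a spam filter, for example) needs a threshold rather than a hard decision.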

In [38]:
# 3.AutoModelForQuestionAnswering (Question Answering only)

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

question = "Where is Hugging Face located?"
context = "Hugging Face is based in New York City."
inputs = tokenizer(question, context, return_tensors="pt")
outputs = model(**inputs)

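# Pick the most likely start and end token positions, then decode the tokens between them
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits)
answer_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][start_idx:end_idx+1])
answer = tokenizer.convert_tokens_to_string(answer_tokens)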
print(answer)



New York City


Output → Extracted answer span

Use case → Chatbots, FAQ systems
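
Taking the argmax of the start and end logits independently, as the cell above does, can occasionally yield an end position before the start position. One common remedy, sketched here as an assumption rather than as the notebook's method, is to score every valid (start, end) pair and keep the best one. The logits are made up, and `best_span` and `max_answer_len` are hypothetical names.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) with the highest start+end logit sum,
    restricted to start <= end and at most max_answer_len tokens."""
    best = None
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best

# Hypothetical logits for a 6-token input
start_logits = [0.1, 0.2, 3.0, 0.3, 0.1, 0.0]
end_logits = [0.0, 2.5, 0.1, 0.2, 2.0, 0.1]

# Independent argmax: the end index lands before the start index
naive_start = max(range(len(start_logits)), key=start_logits.__getitem__)
naive_end = max(range(len(end_logits)), key=end_logits.__getitem__)

# Joint search only considers spans where start <= end
start, end, score = best_span(start_logits, end_logits)
print((naive_start, naive_end), (start, end))
```

The token slice `[start:end + 1]` from the joint search can then be decoded exactly as in the cell above.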

In [42]:
# 4.AutoModelForSeq2SeqLM (Summarization and translation)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

text = """
Hugging Face is a company that has become a leader in the field of natural language processing.
They provide open-source libraries and tools that help developers implement state-of-the-art machine learning models.

"""

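# Note: T5 checkpoints are typically prompted with a task prefix such as "summarize: ";
# this cell passes the raw text, and generate() runs with its default settings.
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)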


Hugging Face is a company that has become a leader in the field of natural language processing


Output → Generated text

Use case → Summarization, Translation, Paraphrasing
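
T5 is a text-to-text model that is usually told which task to perform through a prefix on the input string. The sketch below shows one way to build such prompts; the prefixes are the ones commonly documented for the original T5 checkpoints, while `T5_PREFIXES` and `build_t5_prompt` are hypothetical names introduced only for illustration.

```python
# Task prefixes commonly documented for the original T5 checkpoints
# (assumed here to apply to "t5-small" as well)
T5_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
    "translate_en_fr": "translate English to French: ",
}

def build_t5_prompt(task, text):
    """Prepend the task prefix and collapse stray whitespace and newlines."""
    if task not in T5_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    cleaned = " ".join(text.split())
    return T5_PREFIXES[task] + cleaned

text = """
Hugging Face is a company that has become a leader in the field of natural language processing.
They provide open-source libraries and tools that help developers implement state-of-the-art machine learning models.

"""

prompt = build_t5_prompt("summarize", text)
print(prompt)
```

The resulting string would be passed to the tokenizer in place of the raw text. Output length can be adjusted with generation arguments such as `max_new_tokens` on `model.generate`.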