[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hFqIFluX5AgomZYdb5PVn4kMXJiZ0t3d?usp=sharing)

# Pretrained models in HuggingFace - Overview Notebook

This notebook is a self-contained way to start using transformers.

**Learning goals:** The goal of this tutorial is to learn how to:

1. Use pre-trained pipelines
2. Get embeddings
3. Build multimodal models

**How to use this notebook:**

1. Make a copy of this notebook so you can keep your changes.



In [None]:
%pip install --quiet transformers datasets sentence-transformers

## Pre-Trained Models with Pipelines -> ✨ Easy Mode ✨

The [pipeline()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline) supports more than 20 common tasks out of the box, for example:

**Image**:
* Image classification: classify an image.
* Image segmentation: classify every pixel in an image.
* Object detection: detect objects within an image.

### Image

In [None]:
from IPython.display import Image
Image('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg')

In [None]:
from transformers import pipeline

vision_classifier = pipeline(task="image-classification")
imagepic="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
result = vision_classifier(
    images=imagepic
)
print("\n".join([f"Class {d['label']} with score {round(d['score'], 4)}" for d in result]))
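The pipeline returns a list of dictionaries, each with a `label` and a `score` key. A small helper can format that list the same way as the print above. The `sample_result` below is hypothetical, with made-up values rather than real model output, and `format_predictions` is an illustrative helper, not part of `transformers`.

```python
def format_predictions(result, digits=4):
    """Turn a list of {'label': str, 'score': float} dicts into one line per class."""
    # Sort by score, highest first, so the output order does not depend on the input order
    ranked = sorted(result, key=lambda d: d["score"], reverse=True)
    return "\n".join(f"Class {d['label']} with score {round(d['score'], digits)}" for d in ranked)


# Hypothetical pipeline output, for illustration only
sample_result = [
    {"label": "tabby cat", "score": 0.123456},
    {"label": "lynx", "score": 0.654321},
]
print(format_predictions(sample_result))
```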

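#### Load a specific model

Passing `model=` pins a specific checkpoint from the Hub instead of relying on the task's default model.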

In [None]:
vision_classifier = pipeline(task="image-classification", model='microsoft/resnet-50')
imagepic="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
result = vision_classifier(
    images=imagepic
)
print("\n".join([f"Class {d['label']} with score {round(d['score'], 4)}" for d in result]))

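### Load as image processor + model

Using the processor and model directly takes a few extra steps: fetch the image, preprocess it, run the model, and post-process the logits into class probabilities.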
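Part of what the image processor does is turn raw pixels into normalized values: each 8-bit channel is rescaled from [0, 255] to [0, 1] and then standardized with a per-channel mean and standard deviation. The sketch below shows only that arithmetic for a single RGB pixel, assuming the common ImageNet statistics; the actual values (and the resize/crop settings, omitted here) come from the checkpoint's preprocessor configuration, so check it before relying on these numbers. `normalize_pixel` is a hypothetical helper.

```python
# Assumed ImageNet statistics (RGB order); verify against the model's preprocessor config
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD, rescale=1 / 255):
    """Rescale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value * rescale - m) / s for value, m, s in zip(rgb, mean, std))


print(normalize_pixel((255, 128, 0)))
```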

In [None]:
from PIL import Image
import requests
from io import BytesIO

# get image in PIL format
response = requests.get(imagepic)
img = Image.open(BytesIO(response.content))

In [None]:
from transformers import AutoImageProcessor, ResNetForImageClassification
import torch

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

inputs = processor(img, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Apply softmax to the logits to get probabilities
probabilities = torch.softmax(logits, dim=-1)

# Get the top 5 predicted labels and their corresponding probabilities
top5_probabilities, top5_indices = torch.topk(probabilities, 5, dim=-1)

# Get the class labels corresponding to the top 5 indices
class_labels = [model.config.id2label[index.item()] for index in top5_indices[0]]

# Print the top 5 predicted classes and their probabilities
for i, (label, probability) in enumerate(zip(class_labels, top5_probabilities[0])):
    print(f"Top {i + 1}: Class '{label}' with Probability: {probability.item()}")
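`torch.softmax` and `torch.topk` do the heavy lifting above. For intuition, here is the same computation in plain Python on a short, made-up list of logits: softmax exponentiates each logit and divides by the sum, and top-k keeps the k largest probabilities together with their class indices. Subtracting the maximum logit before exponentiating is a standard trick for numerical stability and does not change the result. The helper names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k):
    """Return (probability, index) pairs for the k most likely classes, highest first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(p, i) for i, p in ranked[:k]]


# Made-up logits for four hypothetical classes
logits = [2.0, 1.0, 0.1, 3.0]
probs = softmax(logits)
for rank, (p, idx) in enumerate(top_k(probs, 2), start=1):
    print(f"Top {rank}: class index {idx} with probability {p:.4f}")
```

In the notebook, the index would then be mapped to a human-readable name through `model.config.id2label`.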

### Image embeddings using [Distilled data-efficient Image Transformer (DeiT)](https://huggingface.co/facebook/deit-base-distilled-patch16-224)

In [None]:
from PIL import Image
import requests
im = Image.open(requests.get(imagepic, stream=True).raw)

In [None]:
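from transformers import AutoImageProcessor, AutoModel
import torch
processor = AutoImageProcessor.from_pretrained('facebook/deit-base-distilled-patch16-224')
model = AutoModel.from_pretrained('facebook/deit-base-distilled-patch16-224')
inputs = processor(images=im, return_tensors="pt")  # preprocessed pixel_values only, not embeddings
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state[:, 0]  # [CLS] token embedding, shape (1, hidden_size)
embeddings.shape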
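A common use of image embeddings is comparing images: two images whose embedding vectors point in similar directions are likely to be visually similar. The sketch below computes cosine similarity in plain Python on short, hypothetical vectors. With real embeddings from a model you would convert the tensors to lists first (for example with `.tolist()`) or use `torch.nn.functional.cosine_similarity` directly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings (DeiT-base embeddings have 768 dimensions)
cat_a = [0.9, 0.1, 0.3, 0.0]
cat_b = [0.8, 0.2, 0.4, 0.1]
car = [0.0, 0.9, -0.2, 0.7]
print(round(cosine_similarity(cat_a, cat_b), 3))
print(round(cosine_similarity(cat_a, car), 3))
```

The two "cat" vectors score close to 1, while the unrelated "car" vector scores close to 0.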