# Vision Transformer (ViT) - Image Classification

In this code-along, we use a pretrained Vision Transformer (ViT) model from Hugging Face to classify an image.  
We’ll follow these steps:

1. Install the required libraries  
2. Import the libraries and load a sample image  
3. Load the pretrained ViT model and its image processor  
4. Preprocess the image  
5. Run the prediction and print the result



In [None]:
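# Install Required Libraries

# torch, pillow, and requests are listed explicitly because the cells below import them
!pip install transformers torch torchvision pillow requests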



In [None]:
# Import the required Python libraries
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests
import torch

In [None]:
# Load a Sample Image

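# Download a test image and fail early if the request does not succeed
url = "https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg"
response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()
image = Image.open(response.raw).convert("RGB")  # make sure the image has 3 color channels
display(image)  # display() is provided by Jupyter/IPython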

In [None]:
# Load a Pretrained ViT Model

# Load the model and its matching image processor from the same checkpoint
model_name = "google/vit-base-patch16-224"
model = ViTForImageClassification.from_pretrained(model_name)
processor = ViTImageProcessor.from_pretrained(model_name)
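
The checkpoint name encodes the input geometry: `patch16` means the image is cut into 16x16-pixel patches, and `224` is the input resolution in pixels. As a rough sketch of the sequence length that results, the helper below does the arithmetic. The name `vit_sequence_length` is made up for this illustration, and the extra classification token is assumed from the standard ViT design rather than read from the model.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Return the number of tokens a ViT encoder sees for one square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + 1                        # assumed: one extra [CLS] token

print(vit_sequence_length())  # 197
```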


In [None]:
# Preprocess the Image

# Resize and normalize the image; return a batch of PyTorch tensors ("pt")
inputs = processor(images=image, return_tensors="pt")
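
To make this step less opaque: the processor resizes the image to the model's input resolution, rescales pixel values from 0-255 to 0-1, and normalizes each channel. Below is a minimal sketch of the per-pixel arithmetic. It assumes a mean and standard deviation of 0.5 per channel, which is my understanding of this checkpoint's configuration; `processor.image_mean` and `processor.image_std` show the values actually in use. The function name `normalize_pixel` is hypothetical.

```python
def normalize_pixel(value: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Map an 8-bit pixel value (0-255) to the normalized range fed to the model."""
    rescaled = value / 255.0          # rescale to [0, 1]
    return (rescaled - mean) / std    # with mean = std = 0.5 this yields [-1, 1]

for v in (0, 128, 255):
    print(v, round(normalize_pixel(v), 4))
```

In the notebook, `inputs["pixel_values"].shape` should come out as `torch.Size([1, 3, 224, 224])`: one image, three channels, 224x224 pixels.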


In [None]:
# Make Prediction

# Extract the pixel values from the dictionary
pixel_values = inputs["pixel_values"]

# Run the model using the pixel values
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

# Get the predicted label
logits = outputs.logits                             # These are the raw scores for each class
predicted_class_idx = logits.argmax(-1).item()      # Find the index with the highest score
label = model.config.id2label[predicted_class_idx]  # Convert index to human-readable label

print(f"Predicted label: {label}")
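
The logits are unnormalized scores, so `argmax` tells us which class won but not by how much. A softmax turns them into probabilities, and sorting those gives the top few candidates. The sketch below uses plain Python so that it runs without the model; `top_k_predictions` and the toy labels are hypothetical names made up for illustration.

```python
import math

def top_k_predictions(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    max_logit = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]                  # softmax probabilities, summing to 1
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy example with three made-up classes
toy_logits = [2.0, 0.5, -1.0]
toy_labels = {0: "tiger", 1: "cat", 2: "dog"}
for label, prob in top_k_predictions(toy_logits, toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, `logits[0].tolist()` and `model.config.id2label` can be passed in directly. On tensors, `torch.softmax(logits, dim=-1)` followed by `torch.topk` does the same job.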