# Zero-Shot Classification

Traditionally, computer vision models are trained to predict a fixed set of categories. For image classification, for instance, many standard models are trained on the ImageNet dataset, which contains 1,000 categories. All images must be assigned to one of these 1,000 categories, and the model is trained to predict the correct category for each image.

For object detection, many popular models like YOLOv5, YOLOv8, and YOLO-NAS are trained on the MS COCO dataset, which contains 80 categories. In other words, the model is trained to detect objects in these categories and to ignore everything else.

Thanks to recent advances in multimodal models, it is now possible to perform zero-shot learning, which lets a model predict categories it was not explicitly trained on. This can be especially useful when:

- We want to roughly pre-label images with a new set of categories.
- Obtaining labeled data for all categories is impractical or impossible.
- The categories are changing over time, and we want to predict new categories without retraining the model.
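
To make the idea concrete, here is a minimal sketch of how a CLIP-style model can score arbitrary categories: the image and a text prompt for each class are embedded into a shared space, and the class whose text embedding is most similar to the image embedding wins. The three-dimensional vectors, the helper names, and the temperature value below are illustrative assumptions, not output from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.01):
    """Return the best-matching label and its probability.

    `text_embeddings` maps each class name to the embedding of its prompt.
    The temperature is an assumed value chosen for illustration.
    """
    labels = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[label]) for label in labels]
    probs = softmax(sims, temperature)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical toy embeddings; a real model would produce these vectors
toy_text_embeddings = {
    "zebra": [0.9, 0.1, 0.0],
    "guitar": [0.1, 0.9, 0.1],
    "teapot": [0.0, 0.2, 0.9],
}
toy_image_embedding = [0.8, 0.2, 0.1]

label, confidence = zero_shot_classify(toy_image_embedding, toy_text_embeddings)
print(label)  # zebra
```

Because the class list is only an input to the scoring step, changing the categories means changing the prompts, not retraining the model.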

In this recipe, we will show how you can quickly add zero-shot predictions to your dataset. For a more in-depth walkthrough, see the [zero-shot image classification tutorial](https://docs.voxel51.com/tutorials/zero_shot_classification.html#Evaluating-Zero-Shot-Image-Classification-Predictions-with-FiftyOne).

## Setup

If you haven't already, install FiftyOne:

In [None]:
!pip install fiftyone

We will also need the packages that the zero-shot models depend on:

In [None]:
!pip install -U torch torchvision fiftyone transformers timm open_clip_torch

Now let’s import the relevant modules and load the dataset!

For this walkthrough, we will use the [Caltech-256 dataset](https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html#caltech-256), which contains 30,607 images across 257 categories. We will use 1,000 randomly selected images from the dataset for demonstration purposes. The zero-shot models were not explicitly trained on the Caltech-256 dataset, so it serves as a test of the models’ zero-shot capabilities. Of course, you can use any dataset you like!

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "caltech256",
    max_samples=1000,
    shuffle=True
)

dataset.name = "Zero-Shot Classification"

session = fo.launch_app(dataset)

![zero-shot](../assets/zero-shot.png)

First, we need the list of classes that our zero-shot model should choose from. In our case, we grab the distinct `ground_truth` labels of Caltech-256 with the code below:

In [14]:
classes = dataset.distinct("ground_truth.label")

## Zero-Shot Image Classification with OpenAI CLIP

We can start with the natively supported OpenAI CLIP model, which we can load and apply to our dataset as follows:

In [15]:
clip = foz.load_zoo_model(
    "clip-vit-base32-torch",
    classes=classes,
)

dataset.apply_model(clip, label_field="clip")

 100% |███████████████| 1000/1000 [4.5s elapsed, 0s remaining, 305.2 samples/s]      


We can take a look at our new results right away!

In [None]:
session.show()

![OpenAI CLIP](../assets/openai-clip.png)
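
Beyond browsing the results in the App, a rough numeric check is top-1 accuracy against the ground truth. The sketch below uses hypothetical toy label lists so that it runs on its own; on the real dataset, parallel lists could be pulled with `dataset.values("ground_truth.label")` and `dataset.values("clip.label")`. The helper names are illustrative and not part of FiftyOne.

```python
from collections import Counter


def top1_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def most_common_confusions(true_labels, predicted_labels, k=3):
    """Most frequent (true, predicted) pairs among the mistakes."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common(k)


# Hypothetical toy labels standing in for real ground truth and predictions
toy_truth = ["zebra", "guitar", "teapot", "zebra"]
toy_preds = ["zebra", "teapot", "teapot", "horse"]

accuracy = top1_accuracy(toy_truth, toy_preds)
confusions = most_common_confusions(toy_truth, toy_preds)
print(accuracy)  # 0.5
```

The confusion pairs are often more informative than the single accuracy number, since they point at classes whose names or prompts the model tends to mix up.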

Want to try a different model? There are plenty to choose from, including any zero-shot model from [Hugging Face](https://docs.voxel51.com/tutorials/zero_shot_classification.html#Zero-Shot-Image-Classification-with-Hugging-Face-Transformers) as well as [OpenCLIP](https://docs.voxel51.com/tutorials/zero_shot_classification.html#Zero-Shot-Image-Classification-with-OpenCLIP)! Here we load ALIGN (`kakaobrain/align-base`) from Hugging Face:

In [17]:
# Load a Hugging Face zero-shot classifier through the FiftyOne model zoo
model = foz.load_zoo_model(
    "zero-shot-classification-transformer-torch",
    name_or_path="kakaobrain/align-base",
    classes=classes,
)

# Store the predictions in a field named after the model that produced them
dataset.apply_model(model, label_field="align")




 100% |███████████████| 1000/1000 [13.5m elapsed, 0s remaining, 1.2 samples/s]      


In [17]:
session.show()