# NVDINO NIM

The notebook contains steps for handling image uploads to an AWS S3 bucket, using the asset ID from an API response, followed by invoking NVDINO and NVCLIP for inference with a Gradio UI. Here's a brief outline of the key sections:

API Setup and Image Upload:

Request an asset ID and upload URL from an API.
Convert an image to JPEG and upload it to the AWS S3 link.
Inference using NVDINO:

Use the uploaded image asset for an inference request via the NVDINO model.
SKLearn Model Comparisons:

Compare performance with SKLearn models using a custom KNeighborsClassifierMilvus classifier.
Interactive UI with Gradio:

Create a Gradio UI to test few-shot classification with NVDINO or NVCLIP models.




This workshop has four parts:

**1.** Set up API Interaction and Upload Image to AWS S3  
**2.** NVDINO Requests   
**3.** Prepare Dataset  
**4.** Few Shot Classification

# 1: Set up API Interaction and Upload Image to AWS S3

We begin by obtaining an upload URL and asset ID from the NIM API, which will allow us to upload an image for inference. Then, we proceed to upload the image to the specified S3 bucket using the provided asset URL.


In [None]:
api_key = "nvapi-***" #FIX ME 

In [None]:
#Install dependecies
import sys 
python_exe = sys.executable
!{python_exe} -m pip install -r requirements.txt

In [None]:
from datasets import load_dataset
from pymilvus import MilvusClient
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt 
from PIL import Image 
import requests 
import io 

from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool, CustomJS
from bokeh.layouts import column
from bokeh.io import output_notebook
from sklearn.manifold import TSNE
import numpy as np 
from bokeh.palettes import Dark2_5 as palette
import itertools

import requests
import io
from PIL import Image
import logging

# Set up logging for better traceability
logging.basicConfig(level=logging.INFO)

# Step 1: Get Asset ID and Upload URL from NIM API
def get_asset_upload_url(api_url, api_key, asset_name="input_image"):
    """
    Sends a request to the API to retrieve an asset upload URL and ID for an image.
    
    Args:
        api_url (str): The API endpoint to request the asset upload URL.
        api_key (str): Authorization key for the API.
        asset_name (str): Name for the asset (default: 'input_image').
    
    Returns:
        asset_url (str): The URL for uploading the image.
        asset_id (str): The asset ID for referencing in further API requests.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {"name": asset_name}
    
    try:
        response = requests.post(api_url, json=payload, headers=headers, timeout=30)
        response.raise_for_status()
        data = response.json()
        return data["uploadUrl"], data["assetId"]
    except requests.exceptions.RequestException as e:
        logging.error(f"Error fetching asset upload URL: {e}")
        raise

# Step 2: Upload image to AWS S3
def upload_image_to_s3(asset_url, image_path):
    """
    Uploads an image to AWS S3 using the given asset URL.
    
    Args:
        asset_url (str): The URL to upload the image to.
        image_path (str): Local path to the image file.
    """
    s3_headers = {
        "x-amz-meta-nvcf-asset-description": "input image",
        "content-type": "image/jpeg",
    }
    
    try:
        # Load and convert image to JPEG
        image = Image.open(image_path).convert("RGB")
        buf = io.BytesIO()
        image.save(buf, format="JPEG")
        
        # Upload the image
        response = requests.put(asset_url, data=buf.getvalue(), headers=s3_headers, timeout=300)
        response.raise_for_status()
        logging.info("Image successfully uploaded to S3.")
    except requests.exceptions.RequestException as e:
        logging.error(f"Error uploading image to S3: {e}")
        raise
    except Exception as e:
        logging.error(f"Unexpected error: {e}")
        raise

# Example usage:
api_url = "https://api.example.com/get-upload-url"  # Replace with actual API URL
api_key = "your_api_key"  # Replace with actual API key
image_path = "path_to_image.jpg"  # Replace with the path to your image

try:
    asset_url, asset_id = get_asset_upload_url(api_url, api_key)
    upload_image_to_s3(asset_url, image_path)
    logging.info(f"Asset ID: {asset_id}")
except Exception as e:
    logging.error(f"Operation failed: {e}")


Ensure that no errors occured during the installation and import in the two cells above before continuing. 

# 2: Perform Inference using NVDINO

In [None]:
# Step: Perform inference with NVDINOv2
def perform_inference_nvdino(api_url, api_key, asset_id):
    """
    Sends a request to the NVDINOv2 model for image classification.
    
    Args:
        api_url (str): The API endpoint for inference.
        api_key (str): Authorization key for the API.
        asset_id (str): The asset ID of the uploaded image.
    
    Returns:
        dict: The inference results.
    """
    headers = {
        "Content-Type": "application/json",
        "NVCF-INPUT-ASSET-REFERENCES": asset_id,
        "Authorization": f"Bearer {api_key}"
    }
    payload = {"messages": []}  # Empty payload as the image is referenced in headers

    try:
        response = requests.post(api_url, json=payload, headers=headers, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logging.error(f"Inference error: {e}")
        raise

# Example usage:
nvdino_inference_url = "https://api.example.com/inference"  # Replace with actual inference API
inference_result = perform_inference_nvdino(nvdino_inference_url, api_key, asset_id)
logging.info(f"Inference result: {inference_result}")


To generate an image embedding with NVDINOv2, the first step is to upload the image through the NVCF large asset API, then make the NVDINOv2 NIM API call. 

In the header, the API key should be presented as a Bearer token and the request body is JSON format. 

In [None]:
#URLs and key 
assets_url = "https://api.nvcf.nvidia.com/v2/nvcf/assets" #large asset upload
nvdinov2_url = "https://ai.api.nvidia.com/v1/stg/cv/nvidia/nv-dinov2" #nvdinov2 endpoint 
header_auth = f"Bearer {api_key}" #authentication to include in headers 

The first step is to create an asset ID and get an upload link for the image. 

In [None]:
#Step 1) Send request to get an upload link 
assets_url = "https://api.nvcf.nvidia.com/v2/nvcf/assets"
headers = {
    "Authorization": header_auth,
    "Content-Type": "application/json",
    "accept": "application/json",
}
payload = {"contentType": f"image/jpeg", "description": "input image"}
response = requests.post(assets_url, headers=headers, json=payload, timeout=30)
response.raise_for_status()

#asset_url is the upload link and asset_id is a unique identifier to reference the image
asset_url = response.json()["uploadUrl"]
asset_id = response.json()["assetId"]

We now have an asset ID and an AWS S3 link to upload the image. 

In [None]:
#Step 2) Upload image to asset_url 
s3_headers = {
    "x-amz-meta-nvcf-asset-description": "input image",
    "content-type": f"image/jpeg",
}
# Convert image to jpeg before uploading
image = Image.open("readme_assets/few_shot_arch_diagram.png").convert("RGB")
buf = io.BytesIO()  # temporary buffer to save image
image.save(buf, format="JPEG") #convert image to jpeg to get smaller upload size 

# upload image
response = requests.put(
    asset_url,
    data=buf.getvalue(),
    headers=s3_headers,
    timeout=300,
)
response.raise_for_status()

The image has been uploaded and it can now be referenced using the asset ID in any NIM API requests. 

In [None]:
#Step 3) Send NVDINOv2 Request and reference uploaded image
payload = {"messages": []} #payload can be just an empty "messages" field. Since the image is referenced in the header, no other informnation is needed in the payload. 
asset_list = f"{asset_id}"

#Asset ID needs to be included in the header
headers = {
    "Content-Type": "application/json",
    "NVCF-INPUT-ASSET-REFERENCES": asset_id,
    "NVCF-FUNCTION-ASSET-IDS": asset_id,
    "Authorization": header_auth,
}

#Send NVDINOv2 request to generate the embedding
response = requests.post(nvdinov2_url, headers=headers, json=payload)
response = response.json()
embedding = response["metadata"][0]["embedding"] #get the embedding 

In [None]:
print(len(embedding))
print(type(embedding[0]))
#print(embedding) #uncomment to print entire embedding vector 

In the response, we can get the embedding of our image. This is a 1536d vector that represents our image and can be used for downstream tasks such as classification.

## 3: Prepare Dataset

To show how to use NVDINOv2 embeddings for few shot classification, we can use a [car classification dataset from HuggingFace](https://huggingface.co/datasets/tanganke/stanford_cars). The following cell will download the 6GB dataset. For each dataset an user elects to use, the user is responsible for checking if the dataset license is fit for the intended purpose.

In [None]:
dataset = load_dataset("tanganke/stanford_cars") #6GB
train_set = dataset["train"]
test_set = dataset["test"]

It takes 1 NIM credit to embed 1 image so we will generate a much smaller subset of the data to show how to build a few shot classification model. The subset will have three classes. Each class will have 10 test images and 5 train images. You can adjust the cell below to control the data in the subset. 

In [None]:
#3 classes with 5 train images and 10 test images. 
num_classes = 3
test_images_per_class = 10
train_images_per_class = 5

In [None]:
def make_subset(dataset, classes, images_per_class):
    """Make subset with given list of classes and specified images per class"""
    subset = []
    label_counter = {}
    for i, sample in enumerate(dataset):
        label = sample["label"]
        if label not in classes:
            continue 
            
        label_count = label_counter.get(label, 0)
    
        if label_count < images_per_class:
            subset.append(sample)
            label_counter[label] = label_counter.get(label, 0) + 1
    return subset

In [None]:
#make subsets to train and test on
train_subset = make_subset(dataset["train"], range(num_classes), train_images_per_class)
test_subset = make_subset(dataset["train"], range(num_classes), test_images_per_class)
print(len(train_subset))
print(len(test_subset))

To make it easier to generate the embeddings, a NVDINOv2 wrapper classes has been implemented in the nvdinov2.py script in the same directory as this notebook. This will handle the image upload and embedding calls. It can be passed a list of image paths or PIL images. It will return a list of embeddings for each image. 

In [None]:
from nvdinov2 import NVDINOv2
def add_embeddings(api_key, dataset):
    nvdinov2 = NVDINOv2(api_key)
    pil_images = [x["image"] for x in dataset] #get PIL images from dataset
    embeddings = nvdinov2(pil_images) #pass list of images to nvdinov2
    for i in range(len(dataset)):
        dataset[i]["embedding"] = embeddings[i]
    return dataset

NVCLIP is another embedding model availalbe as a NIM that can also be used for few shot classification. If you want to see how it compares to NVDINOv2, then uncomment the cell below to replace the embeddings from NVDINOv2 with embeddings from NVCLIP and run the rest of the notebook. If you want to use NVCLIP, then you will also need change the embedding dimension in section 3.2 from 1536 to 1024.

In [None]:
# from nvclip import NVCLIP
# def add_embeddings(api_key, dataset):
#     nvclip = NVCLIP(api_key)
#     pil_images = [x["image"] for x in dataset]
#     print(len(pil_images))
#     embeddings = nvclip(pil_images)
#     print(embeddings)
#     print(len(embeddings))
#     for i in range(len(dataset)):
#         dataset[i]["embedding"] = embeddings[i]
#     return dataset

In [None]:
#Add the embeddings to the test and train dataset 
train_subset = add_embeddings(api_key, train_subset)
test_subset = add_embeddings(api_key, test_subset)

The following cell will plot the test data in 2 dimensions so it can be visually inspected. 

In [None]:
# Enable Bokeh output in Jupyter Notebooks
output_notebook()
vectors = np.array([x["embedding"] for x in test_subset])
class_labels =  np.array([x["label"] for x in test_subset])


#Use TSNE to project the embeddings to 2D
tsne = TSNE(
    n_components=2,
    perplexity=5,
    learning_rate=200,
    early_exaggeration=5,
    n_iter=2000,
    random_state=42,
    metric="cosine",
)
embedding_2d = tsne.fit_transform(vectors)

p = figure(
    title="Embedding Visualization",
    tools="pan,wheel_zoom,box_zoom,reset,hover,save",
)
colors = itertools.cycle(palette)

#Plot each class 
for n in np.unique(class_labels):
    indices = np.where(class_labels == n)
    n_vectors = embedding_2d[indices]
    x = n_vectors[:, 0]
    y = n_vectors[:, 1]

    source = ColumnDataSource(dict(x=x, y=y))

    p.scatter("x", "y", source=source, size=8, color=next(colors), legend_label=str(n))

# Layout
layout = column(p)
# Show plot in notebook
show(layout)


Now that the image embeddings have been generated, few shot classifcation can be implemented using models from SKLearn or with a KNN algorithm and a Milvus vector database. Both methods will be explored in the following sections. 

# 4: Few Shot Classification

At this stage, each image in our test and train datasets has an associated image embedding generated by NVDINOv2 or NVCLIP. These image embeddings are compressed versions of the image that contain the most important information needed to understand the contents of the image. A property of these embeddings, is that images that are similar to each other will be close together in the embedding space. In the plot from Part 2, images of the same class should appear near each other and form clusters. Because these embeddings (also known as feature vectors) contain enough information to differentiate images of different classes, they can be used as input to simple classification models such as Logisitic Regression or a KNN algorithm. 

## 4.1 SKLearn 

SKLearn provides several classification models that can be used with the image embeddings. Each classification model requires a set of features and labels to train. In this case the embeddings are the features and the class ID is the label. 

Now we can combine the embeddings with a light weight classification head from SKLearn such as [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) and [K Nearest Neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html). A benefit of using a powerful embedding model is it lets us achieve high accuracy with very few images. This drastically reduces the amount of computation needed to produce the classification model and can be done without a GPU. 

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

#Split into labels and features 
x_test = [x["embedding"] for x in test_subset]
y_test = [x["label"] for x in test_subset]

x_train = [x["embedding"] for x in train_subset]
y_train = [x["label"] for x in train_subset]

The following two cells with train and test a logistic regression and KNN classifaction models on the dataset.

In [None]:
#Fit and test logistic regression head 
model = LogisticRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)
report = classification_report(y_test, y_predict)
print(report)

In [None]:
#Fit and test knn classification head 
model = KNeighborsClassifier(n_neighbors=3, weights="distance")
model.fit(x_train, y_train)
y_predict = model.predict(x_test)
report = classification_report(y_test, y_predict)
print(report)

From the classification report, you can see that the models can get 90% accuracy with only 5 training images per class and it required very little compute resources to generate the few shot classifiation models. 

## 4.2 Milvus Vector Database

One advantage of using a K-Nearest Neighbors (KNN) classification algorithm is that it can be efficiently scaled and implemented with a vector database. A vector database enables the quick insertion of new image embeddings and allows for fast similarity searches. We've already discussed that images belonging to the same class tend to be close together in the embedding space. By storing the image embeddings from our training set in the vector database, we can classify a new image by searching for the most similar embeddings (nearest neighbors) and their associated labels. The most common label among these nearest neighbors is then predicted as the label for the new image. This is the basic principle behind how the [K-Nearest Neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm works.

Using Milvus, a KNN Classifier is implemented with 'add' and 'predict' methods. This allows new examples and classes to be inserted in the database as needed and predictions to always take into account the latest samples in the database. 

To learn more about Milvus visit their [documentation page](https://milvus.io/docs). 

In [None]:
from collections import Counter
from pathlib import Path
class KNeighborsClassifierMilvus:
    def __init__(self, database="milvus_demo.db", collection="knn", rm=True, embedding_d=1536):
        
        self.database = database
        self.collection = collection
        self.id_tracker = 0 #track IDs to insert new data

        #Delete local database if it exists
        if rm:
            Path.unlink(self.database, missing_ok=True)

        #connect to database
        self.client = MilvusClient(self.database)
       
        #setup collection 
        if not self.client.has_collection(collection_name=self.collection):
            #create collection in database. This will associate a vector with the metadata
            self.client.create_collection(
                collection_name=self.collection,
                dimension=embedding_d, #1536 for NVDINOv2, 1024 for NVCLIP 
                metric_type="L2"
                )
        
    def add(self, x, y):
        """Add labelled embeddings to classifier"""
        milvus_samples = []
        for i, vector in enumerate(x):
            sample = {"id":self.id_tracker, "vector":vector, "label":y[i]}
            self.id_tracker += 1
            milvus_samples.append(sample)
        self.client.insert(collection_name=self.collection, data=milvus_samples)
    def predict(self, x, n_neighbors=1):
        """pass in 2d list of vectors"""
        labels = []
        results = self.client.search(collection_name=self.collection, data=x, limit=n_neighbors, output_fields=["label"])
        for result in results:
            neighbor_labels = [x["entity"]["label"] for x in result]
            label_counter = Counter(neighbor_labels)
            label = label_counter.most_common()[0][0] #get most common label from neighbors 
            labels.append(label)
        return labels 
        

Now we can run this and compare the results with the SKLearn models. The accuracy should be similar. 

In [None]:
model = KNeighborsClassifierMilvus(embedding_d=1536) #pass embedding_d=1024 if using NVCLIP embeddings
model.add(x_train, y_train)
y_predict = model.predict(x_test, n_neighbors=2)
report = classification_report(y_test, y_predict)
print(report)

Summary of Optimizations and Added Functionality
Error Handling: Incorporated robust error handling and logging for all major operations (API calls, image uploads, inferences).
Modularization: Refactored the code into clear and reusable functions for better readability and maintainability.
Gradio UI: Implemented a dynamic Gradio interface for real-time image classification, allowing users to upload images and receive predictions.
Comparison: Introduced a baseline comparison using KNeighborsClassifier from SKLearn to evaluate the performance of NVDINOv2.
You can extend this notebook further by integrating additional models, adding support for batch processing, or expanding the Gradio UI to include more interactive options like model selection.