SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.  
SPDX-License-Identifier: Apache-2.0

# NV-CLIP NIM Multimodal Search Workshop

NVIDIA Inference Microservices (NIMs) are a collection of easy-to-use, API-driven microservices for interacting with AI models.

This workshop focuses on the NV-CLIP NIM, a commercially viable embedding model for both images and text. Because images and text share the same embedding space, it is easy to measure how similar a piece of text is to any given image.

The ability to associate images with text has led to a new class of open-vocabulary models that can detect or classify anything from a text description, and to powerful natural-language search applications. This notebook explores how the embeddings produced by NV-CLIP can be used to build a semantic search application over a traffic camera dataset.

To learn more about NIMs, visit <a href="https://build.nvidia.com/explore/discover">ai.nvidia.com</a>.

![semantic search architecture diagram](readme_assets/semantic_search_diagram.png)

This workshop has four parts:

**Part 0**: Setup Environment  
**Part 1**: NV-CLIP Requests  
**Part 2**: Embeddings & Similarity   
**Part 3**: Traffic Vehicle Search  

# Part 0: Prepare the Workspace

## Part 0.1: Setup Environment

First, we set up the environment. This includes installing the required libraries.

Note that this notebook will download a public traffic camera dataset that is 155MB and will be used throughout the notebook.


In [3]:
import subprocess
import platform
import os

# Check/create virtual environment based on notebook directory name

# Get the directory name of the current notebook
notebook_dir = os.path.basename(os.path.abspath('.'))
venv_name = notebook_dir + '_venv'

print(f"Notebook directory name: {notebook_dir}")


# Check if a virtual environment is active
active_venv = os.environ.get('VIRTUAL_ENV')
if active_venv:
    active_venv_name = os.path.basename(active_venv)
    print(f"Currently active virtual environment: {active_venv_name}")
    
    # If the active venv doesn't match the directory name
    if active_venv_name != venv_name:
        print(f"Warning: Active environment doesn't match notebook directory name.")


# Check if a virtual environment with the directory name exists
venv_path = os.path.join(os.path.abspath('.'), venv_name)
print(f"Virtual environment path: {venv_path}")
bin_dir = 'Scripts' if platform.system() == 'Windows' else 'bin'
activate_script = os.path.join(venv_path, bin_dir, 'activate')

if os.path.exists(activate_script):
    print(f"Virtual environment '{venv_name}' exists.")
    
    if not active_venv or active_venv_name != venv_name:
        print(f"To activate this environment, restart the kernel and run:")
        if platform.system() == 'Windows':
            print(f"    {os.path.join(venv_path, 'Scripts', 'activate')}")
        else:
            print(f"    source {os.path.join(venv_path, 'bin', 'activate')}")
    
else:
    print(f"Creating virtual environment '{venv_name}'...")
    try:
        subprocess.run(['python', '-m', 'venv', venv_name], check=True)
        print(f"Virtual environment '{venv_name}' created successfully.")
        if platform.system() == 'Windows':
            print(f"    {os.path.join(venv_path, 'Scripts', 'activate')}")
        else:
            print(f"    source {os.path.join(venv_path, 'bin', 'activate')}")
    except subprocess.CalledProcessError as e:
        print(f"Error creating virtual environment: {e}")

Notebook directory name: nvclip_multimodal_search
Virtual environment path: /home/luke/Documents/GitHub/Camera-Based-Tracking/metropolis-providencecv/nim_workflows/nvclip_multimodal_search/nvclip_multimodal_search_venv
Virtual environment 'nvclip_multimodal_search_venv' exists.
To activate this environment, restart the kernel and run:
    source /home/luke/Documents/GitHub/Camera-Based-Tracking/metropolis-providencecv/nim_workflows/nvclip_multimodal_search/nvclip_multimodal_search_venv/bin/activate


## Part 0.2: Activate the environment and install dependencies

***Run the command printed in the previous output to activate the environment.***

```bash
source /path/to/your/env/bin/activate
```

The next cell installs the required dependencies into the virtual environment by calling that environment's Python interpreter directly.

In [4]:
# Note: a virtual environment cannot be activated or deactivated for this kernel from a
# subprocess, because `source .../activate` only affects the short-lived shell it runs in.
# Instead, call the environment's own Python interpreter directly to install into it.
if active_venv and active_venv_name != venv_name:
    print(f"Ignoring the currently active environment '{active_venv_name}'.")

print(f"Activating virtual environment '{venv_name}'...")
venv_python = os.path.join(venv_path, bin_dir, "python")

try:
    subprocess.run([venv_python, '-m', 'pip', 'install', '-r', 'requirements.txt'], check=True)
    print("Requirements installed successfully.")
except subprocess.CalledProcessError as e:
    print(f"Error installing required packages: {e}")

Activating virtual environment 'nvclip_multimodal_search_venv'...
Requirements installed successfully.




## Part 0.3: Restart the kernel and update Python version

At the top of the notebook, you will see a button that says "Restart". Press it to restart the kernel.

Once that is done, change the kernel (shown as the Python version in the top right) to the one matching the name of the environment you just created. For example, if you created an environment called `nim`, select the kernel that has `nim` in its name.

In [1]:
#Ensure all imports work 
import requests 
import json
import os
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity
from PIL import Image 
from random import randint 
from pathlib import Path 
from tqdm import tqdm 
from random import sample
from dotenv import load_dotenv, find_dotenv

## Part 0.4: Retrieve API Key

An API key is required to access the NIMs. Store it in a .env file in either this directory or a parent directory.
The .env file should contain the following line:

```
NIM_API_KEY=nvapi-<the_rest_of_your_api_key>
```

You can get your API key from the NVIDIA NIMs portal: https://build.nvidia.com/ 
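
As a hedged sketch of how the key could be checked before any request is sent, the helper below reads the variable from a mapping and masks it for logging. The function names `read_api_key` and `mask_key` are hypothetical, and the `nvapi-` prefix check assumes keys follow the pattern shown in the .env example above.

```python
import os

def read_api_key(env=os.environ, name="NIM_API_KEY"):
    """Return the API key from `env`, or raise a readable error if it looks wrong."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; check that your .env file was loaded.")
    if not key.startswith("nvapi-"):
        # Assumption: keys follow the nvapi- pattern from the .env example.
        raise RuntimeError(f"{name} does not start with 'nvapi-'.")
    return key

def mask_key(key, visible=6):
    """Hide all but the last `visible` characters so the key can be printed safely."""
    return "*" * (len(key) - visible) + key[-visible:]
```

Passing the environment as an argument keeps the check easy to try with a plain dictionary instead of real credentials.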

In [6]:
# Locate the nearest .env file (this directory first, then its parents) and load it.
# If the wrong key is loaded, check for clashing .env files near this notebook.
load_dotenv(find_dotenv())

api_key = os.getenv("NIM_API_KEY")
if not api_key:
    raise RuntimeError("NIM_API_KEY not found. Check your .env file.")
print(f'API key ending in {api_key[-6:]} loaded successfully.')

API key ending in 3cRTN1 loaded successfully.


If there are any errors at this point, ensure the dependencies were installed properly and can be imported before continuing.
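
One way to narrow down an import problem is to ask Python which modules it can locate without actually importing them. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the import names used in this notebook.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]

# Import names used by this notebook. Some differ from the pip package name,
# e.g. scikit-learn -> sklearn, Pillow -> PIL, python-dotenv -> dotenv.
required = ["requests", "numpy", "seaborn", "matplotlib", "sklearn", "PIL", "tqdm", "dotenv"]
print("Missing modules:", missing_modules(required) or "none")
```

If anything is reported as missing, the most likely cause is that the notebook kernel is not the virtual environment the requirements were installed into.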

# Part 1: NV-CLIP Requests

This section shows how to call the NV-CLIP NIM API with a POST request to get an embedding for text input. 

Sending a request to the NV-CLIP NIM API requires a header that includes your API key for authorization and a payload with the content to be embedded. 

In the header, the API key is presented as a Bearer token, and the request body is sent as JSON.

In [None]:
base_url = "https://integrate.api.nvidia.com/v1/embeddings"
headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}

## Part 1.1: Text Embeddings

We can then form the payload to generate an embedding for text input. 

In [None]:
payload = {
    "input": "An Apple",
    "model": "nvidia/nvclip"
} 

The payload required for NV-CLIP is an "input" field with a value that is either a string or a list of strings and a "model" field that specifies using the "nvidia/nvclip" embedding model.  
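
Since the "input" field accepts either a single string or a list of strings, a small helper can normalize both cases into one shape. This is only a sketch; `build_payload` is a hypothetical name and is not part of the NIM API or of this notebook's helper code.

```python
def build_payload(inputs, model="nvidia/nvclip"):
    """Build the request body from a single string or a list of strings."""
    if isinstance(inputs, str):
        inputs = [inputs]  # wrap a lone string so the body always carries a list
    if not inputs or not all(isinstance(item, str) for item in inputs):
        raise ValueError("input must be a non-empty string or list of strings")
    return {"input": list(inputs), "model": model}
```

Always sending a list means the response can be parsed the same way regardless of how many items were embedded.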

In [None]:
response = requests.post(base_url, headers=headers, json=payload)
response = response.json()
print(json.dumps(response, indent=2))

The Python requests library is used to send a POST request to the NV-CLIP API URL. The response is in JSON format and can be parsed to get the embedding vector for our input phrase "An Apple".

In [None]:
embedding_vector = response["data"][0]["embedding"]
print(len(embedding_vector))
print(type(embedding_vector[0]))
print(embedding_vector)

The vector that gets returned is a list of 1024 floating point values that represent our text input in the embedding space.
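
The parsing step can be made a little more defensive. The sketch below pulls every embedding out of a response dictionary and checks its length. It assumes the response shape used above (`response["data"][i]["embedding"]`); the optional "index" field and the helper name `extract_embeddings` are assumptions for illustration.

```python
def extract_embeddings(response_json, expected_dim=1024):
    """Return the embeddings as a list of float lists, in input order."""
    if "data" not in response_json:
        # An error body has no embeddings; surface it instead of raising a KeyError.
        raise ValueError(f"No embeddings in response: {response_json}")
    # Assumption: items may carry an "index" field; fall back to list order if not.
    items = sorted(
        enumerate(response_json["data"]),
        key=lambda pair: pair[1].get("index", pair[0]),
    )
    vectors = [item["embedding"] for _, item in items]
    for vec in vectors:
        if len(vec) != expected_dim:
            raise ValueError(f"Expected {expected_dim} values, got {len(vec)}")
    return vectors
```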

## Part 1.2: Image Embeddings

A distinctive property of the CLIP family of models is that they can also embed images. To send an image to the NV-CLIP NIM in the POST request, it must first be converted to a base64 string.

NV-CLIP processes images at 336x336 resolution so we can first downsize our input image before converting it to a base 64 string. Resizing the image is not necessary but we can do it to reduce our payload size.

The following code cells will generate the embedding of this image of an apple. 

![image](test_image.jpeg)

In [None]:
import io
import base64 
def process_image(image):
    """ Resize image, encode as jpeg to shrink size then convert to b64 for upload """
    if isinstance(image, str):
        image = Image.open(image).convert("RGB")
    elif isinstance(image, Image.Image):
        image = image.convert("RGB")
        
    image = image.resize((336,336)) #Resize or center crop and padding to be square are common approaches 
    buf = io.BytesIO() #temporary buffer to save processed image 
    image.save(buf, format="JPEG") #save as jpeg to reduce size
    image = buf.getvalue()
    image_b64 = base64.b64encode(image).decode() #convert to b64 string
    assert len(image_b64) < 180_000, "Image too large to upload." #ensure image is small enough
    return image_b64

In [None]:
image_file = "test_image.jpeg"
image_string = f"data:image/jpeg;base64,{process_image(image_file)}"

We can now add the image string the same way we added our text input in the payload of the request. When the NVCLIP NIM receives a string with this format, it will automatically load it as an image. 

```data:image/jpeg;base64,{b64_string}```
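
To make the string format concrete, here is a minimal sketch that builds such a data URI from raw bytes and tells image inputs apart from plain text. The names `to_data_uri` and `is_image_input` are hypothetical, and the size limit simply reuses the value from the assert in `process_image`.

```python
import base64

MAX_B64_CHARS = 180_000  # same limit as the assert in process_image

def to_data_uri(raw_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URI of the form data:<mime>;base64,<payload>."""
    b64 = base64.b64encode(raw_bytes).decode("ascii")
    if len(b64) >= MAX_B64_CHARS:
        raise ValueError("Image too large to upload.")
    return f"data:{mime};base64,{b64}"

def is_image_input(item):
    """Heuristic: a string with an image data-URI prefix is treated as an image."""
    return item.startswith("data:image/") and ";base64," in item
```

Base64 encodes every 3 bytes as 4 characters, so the limit above corresponds to roughly 135,000 bytes of JPEG data.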

In [None]:
payload = {"input": [image_string], "model":"nvidia/nvclip"}

In [None]:
response = requests.post(base_url, headers=headers, json=payload)
response = response.json()
print(json.dumps(response, indent=2))

In the request response, we get the vector representation (embedding) of the input image. The next section will show how we can compare these vectors to determine similarity.

<p align="center">
  <img src="readme_assets/nvclip_diagram.png" />
</p>

# Part 2: Embeddings & Similarity

Part 1 showed how to use the NV-CLIP API to convert images and text into embeddings. Each embedding is a 1024-dimensional vector of floating point numbers. One of the main benefits of converting images and text into vectors with an embedding model is that it gives an easy way to measure how similar they are to each other: calculate the cosine similarity between the vectors. NV-CLIP was trained on millions of image-text pairs, allowing it to learn the association between text and images. This knowledge is captured in the embeddings that NV-CLIP generates, which lets us use it as a tool to determine similarity between text and images.

The ability to directly compare images and text in this manner enables powerful capabilities such as zero-shot classification and semantic search, and is critical for Visual Language Models (VLMs) such as LLaVA. The rest of this notebook will focus on similarity and how to build a semantic search application with the NV-CLIP embeddings.

In [None]:
payload = {
    "input": ["apple", "banana", image_string],
    "model": "nvidia/nvclip"
}
response = requests.post(base_url, headers=headers, json=payload)
response = response.json()
print(json.dumps(response, indent=2))

The NV-CLIP API allows more than one input item at a time. In our payload, we can put a list of items as input. This list can be a mix of both text and images. Let's generate embeddings for the text "apple", the text "banana" and an image of an apple.

In [None]:
apple_vec = response["data"][0]["embedding"]
banana_vec = response["data"][1]["embedding"]
apple_image_vec = response["data"][2]["embedding"]

The vectors that represent each of our inputs can now be compared to get a measure of how similar they are to each other. This is done by calculating the cosine similarity between two of the vectors.

Cosine similarity values lie in the range [-1, 1]. A value close to 1 means the vectors point in nearly the same direction, so the inputs are very similar. We can plot the similarity scores for our three vectors.
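
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it with the standard library only, so each step can be checked by hand. It is named `cosine_sim` to avoid shadowing the scikit-learn function imported earlier, which is what the notebook actually uses.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_sim([1, 0], [0, 1]))  # perpendicular vectors -> 0.0
```

Because the result depends only on direction and not on length, scaling either vector leaves the score unchanged.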

In [None]:
embedding_vectors = np.array([apple_vec, banana_vec, apple_image_vec]) #convert vectors to np arrays 
labels = ["'apple'", "'banana'", "apple_image"]
cosine_sim_matrix = cosine_similarity(embedding_vectors)
plt.figure(figsize=(8, 6))
sns.heatmap(cosine_sim_matrix[:, 2:3], annot=True, cmap='coolwarm', xticklabels=[labels[2]], yticklabels=labels)
plt.title('Cosine Similarity Heatmap')
plt.show()

From the heatmap, we can see that the similarity score between the image of an apple and the text "apple" is higher (closer to 1) than the score with the text "banana". Using NV-CLIP and the similarity score between the embeddings, we were essentially able to classify the image as an apple. By adding more text labels and comparing them against image embeddings, NV-CLIP can be used to build a zero-shot classifier that assigns images to arbitrary classes without any training!
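
The zero-shot idea can be sketched as "pick the label whose text embedding is most similar to the image embedding". The example below uses toy 3-dimensional vectors as stand-ins for real 1024-dimensional NV-CLIP embeddings; the function name `zero_shot_classify` and the vectors themselves are made up for illustration.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_classify(image_vec, label_vecs):
    """Return (best_label, scores) where scores maps each label to its similarity."""
    scores = {label: cosine_sim(image_vec, vec) for label, vec in label_vecs.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores

# Toy vectors standing in for the embeddings of the label texts and the image.
label_vecs = {"apple": [0.9, 0.1, 0.0], "banana": [0.1, 0.9, 0.0]}
best_label, scores = zero_shot_classify([0.8, 0.2, 0.1], label_vecs)
print(best_label, scores)
```

With real embeddings, `label_vecs` would hold one text embedding per candidate class, and adding a class only requires embedding one more phrase.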

# Part 3: Traffic Vehicle Search

Let's apply NV-CLIP to a practical application: vehicle search over a large number of traffic cameras.

A common challenge is that more footage is collected than can be reviewed manually. We can use NV-CLIP to search through a large dataset of objects to find what we want. In this example, we will search over images of vehicles that have been cropped out of traffic camera frames.

## Part 3.1: Prepare Dataset

The rest of the notebook will use the [STREETS dataset hosted by the University of Illinois](https://databank.illinois.edu/datasets/IDB-3671567) which is a collection of traffic camera images.

Run the following cells to download, view and prepare the traffic camera dataset for semantic search. This will download about 155 MB of data.

For each dataset a user elects to use, the user is responsible for checking whether the dataset license is fit for the intended purpose.

**Note**: If you are on Windows or the following cell fails to download and extract the dataset, then follow these steps to manually prepare the data:

1) Download only the "vehicleannotations.zip" file from [this page](https://databank.illinois.edu/datasets/IDB-3671567)
2) Place the "vehicleannotations.zip" in the same directory as this notebook
3) Make a new folder in the same directory as this notebook called "traffic_data"
4) In your file explorer, extract the "vehicleannotations.zip"
5) Open the extracted directory and copy the "vehicleannotations" folder and place it into the "traffic_data" folder
6) Ensure that the directory containing your notebook now has the following structure:

``` 
nvclip_semantic_search/  
├── nvclip_workshop.ipynb  
└── traffic_data/  
    └── vehicleannotations/  
        ├── images  
        └── annotations
```     
7) Once you verify the directory structure is correct, skip the following cell and continue with the notebook

In [None]:
#download dataset if not downloaded. SKIP if on Windows and follow the above instructions.
!wget -N https://databank.illinois.edu/datafiles/ht2io/download #155MB download
!unzip -q -o download -d traffic_data

In [None]:
dataset_folder = Path("traffic_data/vehicleannotations")
image_folder = dataset_folder/"images"
annotation_folder = dataset_folder/"annotations"
crop_image_folder = dataset_folder/"cropped_vehicles"

In [None]:
import matplotlib.pyplot as plt 
import math 
def plot_images(image_folder, num_images):
    image_files = os.listdir(image_folder)
    sample_image_paths = sample(image_files, num_images)
    image_paths = [image_folder/x for x in sample_image_paths]
    grid_size = (math.ceil(num_images/3), 3)
    fig, axes = plt.subplots(grid_size[0], grid_size[1], figsize=(15,15))
    fig.subplots_adjust()

    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i >= num_images:
            break 
        img = Image.open(image_paths[i])
        ax.imshow(img)
        
    plt.show()

In [None]:
plot_images(image_folder, 9)

This dataset includes annotations for each image with bounding boxes of each car present in the image. Using these annotations we can crop out each car from the image. In this use case we will take advantage of this detection data being provided, however this could be combined with a detection model to first find the objects of interest and generate the bounding boxes. 
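
In the annotation file each vehicle is stored as a polygon (lists of x and y points), so a bounding box has to be derived from it before cropping. The sketch below shows that conversion together with the minimum-area filter applied later in this notebook. The helper names are hypothetical, and the 10,000 pixel threshold is the value used in the cropping cell.

```python
MIN_AREA = 10_000  # crops smaller than this many pixels are skipped

def polygon_to_bbox(xs, ys):
    """Smallest axis-aligned box (left, top, right, bottom) containing the polygon."""
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_area(bbox):
    """Area of a (left, top, right, bottom) box in pixels."""
    left, top, right, bottom = bbox
    return (right - left) * (bottom - top)

def keep_crop(bbox, min_area=MIN_AREA):
    """True if the crop is large enough to be worth embedding."""
    return bbox_area(bbox) >= min_area

bbox = polygon_to_bbox([10, 120, 60], [5, 40, 130])
print(bbox, bbox_area(bbox), keep_crop(bbox))
```

The (left, top, right, bottom) order matches what `Image.crop` in Pillow expects.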

The following cells will crop out the vehicles and display them. After cropping there should be around 1600 images. Each NV-CLIP request can accept up to 64 items, so it will take 25 requests (25 credits) to embed all the images.
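
The request count above is just a ceiling division. The following sketch illustrates the arithmetic and how a list could be split into batches of at most 64 items. It is not the batching code of the helper class used later in this notebook; `chunked` and `request_count` are hypothetical names.

```python
import math

BATCH_SIZE = 64  # maximum number of items per NV-CLIP request, as stated above

def request_count(num_items, batch_size=BATCH_SIZE):
    """Number of requests needed to embed num_items items."""
    return math.ceil(num_items / batch_size)

def chunked(items, batch_size=BATCH_SIZE):
    """Yield consecutive slices of items, each with at most batch_size entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

print(request_count(1600))  # 1600 / 64 = 25 requests
```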

In [None]:
#crop out cars
annotation_file = annotation_folder/"vehicle-annotations.json"
with open(annotation_file, "r") as file:
    annotations = json.load(file)

In [None]:
def save_crop(image_path, bbox, output_path):
    image = Image.open(image_path)
    image = image.crop(bbox)
    image.save(output_path)

In [None]:
os.makedirs(crop_image_folder, exist_ok=True)

#all_bboxes = []
for key, value in tqdm(annotations.items()):
    # For each annotated vehicle, crop and save it
    file_name = value["filename"]

    for x, region in enumerate(value["regions"].values()): #each region is a vehicle 
        #convert polygon annotations to a bounding box 
        x_points = region["shape_attributes"]["all_points_x"] 
        y_points = region["shape_attributes"]["all_points_y"]
        bbox = [min(x_points), min(y_points), max(x_points), max(y_points)]
        area = (bbox[2]-bbox[0]) * (bbox[3] - bbox[1])
        if area < 10000: #skip crops that are too small 
            continue 
        #all_bboxes.append(bbox)
        file_path = image_folder/file_name
        save_crop(file_path, bbox, crop_image_folder/f"{str(x).zfill(3)}_{file_name}") #save cropped out car

print(f"Cropped Vehicle Images: {len(os.listdir(crop_image_folder))}")

In [None]:
plot_images(crop_image_folder, 9)

Now that we have images that contain individual vehicles, we can use NVCLIP to generate embeddings for each one to start building our semantic search application. 

Instead of calling NV-CLIP directly through POST requests, we can wrap it in an easy-to-use Python class with multi-threading support to speed up the requests. To see the full code, open the nvclip.py file in the same directory as this notebook.

With this NV-CLIP class, we can instantiate a new object and then pass it a list of file paths to where the cropped vehicles have been stored. The class will handle the image processing and requests behind the scenes. 

In [None]:
from nvclip import NVCLIP 
nvclip = NVCLIP(api_key)
cropped_image_files = [str(crop_image_folder/x) for x in os.listdir(crop_image_folder)] #list of all image paths to give to nvclip
text_prompts = ["A School Bus"] #list of text prompts to embed and display with the images. 

Now we can call NV-CLIP on the image paths, parse the response and store it in a dictionary format to make it easier to work with.

In [None]:
#embed images
resp = nvclip(cropped_image_files, resize=False)
image_embedding_data = []
for i, data in enumerate(resp["data"]):
    data = {"id":i, "vector":data["embedding"], "file_name":cropped_image_files[i]}
    image_embedding_data.append(data)
    
image_embeddings = np.array([x["vector"] for x in image_embedding_data])
image_file_names = [x["file_name"] for x in image_embedding_data]

In [None]:
#embed text
resp = nvclip(text_prompts)
text_embedding_data = []
for i, data in enumerate(resp["data"]):
    data = {"id":i, "vector":data["embedding"], "text_prompt":text_prompts[i]}
    text_embedding_data.append(data)

text_embeddings = np.array([x["vector"] for x in text_embedding_data])

## Part 3.2: Plot Embeddings

Now that the image embeddings have been generated for all of the cropped vehicles, the embeddings can be projected onto a 2D plot to understand more about the embedding space and explore the clusters. 

The clusters in the plot should represent images and text that are similar to each other. For example cars of the same type, color and shape should be close to each other once plotted. 

We will also add a red dot representing the embedding for the phrase "A School Bus". This will allow us to identify the school bus cluster as it should appear near the images of school buses. 

Run the cell below to generate the plot. It may take a few minutes to appear after running the cell.

Once it appears, hover your cursor around the plot to see how NVCLIP clusters the images. 

In [None]:
from sklearn.manifold import TSNE
import numpy as np
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.layouts import column
from bokeh.io import output_notebook
from pathlib import Path 
import base64

#helper to show images in the plot 
def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Enable Bokeh output in Jupyter Notebooks
output_notebook()

hosted_image_paths = [f"data:image/jpg;base64,{encode_image_to_base64(x)}" for x in image_file_names]
combined_embeddings = np.vstack([text_embeddings, image_embeddings])
contents = text_prompts.copy()
contents.extend(hosted_image_paths)
content_names = text_prompts.copy()
content_names.extend([str(Path(x).name) for x in image_file_names])

content_types = ["text"]
content_types.extend(["image"] * len(image_file_names))

# Apply t-SNE
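# Note: recent scikit-learn releases rename n_iter to max_iter; switch the argument name if this line raises a warning or TypeError.
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200, early_exaggeration=30, n_iter=2000, random_state=42, metric="cosine")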
embedding_2d = tsne.fit_transform(combined_embeddings)

split_index = len(text_prompts)
# Prepare text data for Bokeh
text_source = ColumnDataSource(data=dict(
    x=embedding_2d[0:split_index, 0],
    y=embedding_2d[0:split_index, 1],
    content=contents[0:split_index],
    content_name=content_names[0:split_index],
    content_type = content_types[0:split_index]
))

# Prepare image data for Bokeh
image_source = ColumnDataSource(data=dict(
    x=embedding_2d[split_index:, 0],
    y=embedding_2d[split_index:, 1],
    content=contents[split_index:],
    content_name=content_names[split_index:],
    content_type = content_types[split_index:]
))

# Create plot
p = figure(title="NVCLIP Embedding Visualization", tools="pan,wheel_zoom,box_zoom,reset,hover,save")

image_renderer= p.scatter('x', 'y', size=8, source=image_source, color='blue', legend_label="Image Embedding")

# Highlight the text embedding with a different color
text_renderer = p.scatter('x', 'y', size=10, source=text_source, color='red', legend_label='Text Embedding')

#tooltip to display text
text_hover = HoverTool(renderers=[text_renderer], tooltips="""

    <h2>@content</h2>
    
""")
p.add_tools(text_hover)

# tooltip to display images 
image_hover = HoverTool(renderers=[image_renderer], tooltips="""
    <div>
        <div>
            <span style="font-size: 15px;">@content_name</span>
        </div>
        <div>
            <img src="@content" height="100" alt="@content_name" style="float: left; margin: 0px 15px 15px 0px;" />
        </div>
    </div>
""")
p.add_tools(image_hover)

# Layout
layout = column(p)

# Show plot in notebook
show(layout)


To build a semantic search application, we need something to help automate the process of storing and searching our vectors. We can use a vector database to do this. The next section will show how to use the Milvus Vector Database. 

## Part 3.3: Vector Database

Milvus is a vector database that can be used for quick experimentation with no setup other than installing its Python library. The following cell creates a database and sets up a collection to store our NV-CLIP embeddings. We then use the insert function to add all the embeddings that were displayed on the plot in the previous section. Storing these embeddings in Milvus makes the vectors easier to search.

To learn more about how to use Milvus, visit their [quickstart guide](https://milvus.io/docs/quickstart.md). 

In [None]:
from pymilvus import MilvusClient

#create database 
client = MilvusClient("milvus_demo.db")

if not client.has_collection(collection_name="demo_collection"):
    #create collection in database. This will associate a vector with the metadata
    client.create_collection(
        collection_name="demo_collection",
        dimension=1024 #NVCLIP output dimension
        )
    res = client.insert(collection_name="demo_collection", data=image_embedding_data)
    print(res)
client.close()

The database now has our embeddings loaded, and we can provide a new vector to search for the most similar embeddings in the database. The following cell takes the string assigned to the `text_search` variable and uses Milvus to search for the 5 most similar images in our vector database.
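
Conceptually, a similarity search ranks every stored vector against the query and keeps the best few. The sketch below does this by brute force with the standard library. It is only an illustration of the idea, not how Milvus is implemented, and it assumes records shaped like the dictionaries in `image_embedding_data` ("vector" and "file_name" keys). The toy vectors and file names are made up.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, records, k=5):
    """Return the k (score, file_name) pairs most similar to query_vec, best first."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(r["vector"]))), r["file_name"])
        for r in records
    )
    return heapq.nlargest(k, scored)

# Toy 2-dimensional records standing in for real 1024-dimensional embeddings.
records = [
    {"vector": [1.0, 0.0], "file_name": "bus.jpg"},
    {"vector": [0.0, 1.0], "file_name": "car.jpg"},
    {"vector": [0.9, 0.1], "file_name": "van.jpg"},
]
print(top_k([1.0, 0.0], records, k=2))
```

A brute-force scan is fine for a couple of thousand vectors; a vector database becomes useful when the collection grows or needs to persist between sessions.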

In [None]:
#Embed text search query 
text_search = "A School Bus"
resp = nvclip([text_search]) #generate embedding 
text_embedding = resp["data"][0]["embedding"]

#Search vector DB for closest images 
client = MilvusClient("milvus_demo.db")
results = client.search(collection_name="demo_collection", data=[text_embedding], limit=5, output_fields=["file_name"]) #search for 5 most similar vectors 
#print results 
for x in results:
    print(x)
client.close()

We can wrap this logic in a while loop and plot the most similar images to make a simple semantic search app. Run the following cell to search the database of images through text prompts. A text box will appear at the bottom of the cell where you can type your search terms. The closest matching images will then be displayed. The loop runs forever, so you must stop the cell manually.
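
If stopping the cell manually is inconvenient, one option is a loop that ends when a blank line or a stop word is entered. This is a sketch of that pattern only; `search_loop` and its parameters are hypothetical, and `search_fn` stands for whatever function embeds the term, queries the database and plots the results.

```python
def search_loop(search_fn, read_input=input, stop_words=("", "q", "quit", "exit")):
    """Call search_fn(term) for each entered term until a stop word is typed."""
    history = []
    while True:
        term = read_input("Enter Search Term (blank to stop): ").strip()
        if term.lower() in stop_words:
            break  # leave the loop cleanly instead of interrupting the kernel
        search_fn(term)
        history.append(term)
    return history
```

Passing the input function as a parameter makes the loop easy to try with scripted inputs before wiring it to the real search.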

In [None]:
while True:
    client = MilvusClient("milvus_demo.db")
    text_prompt = input("Enter Search Term:")
    resp = nvclip([text_prompt])
    query_vector = resp["data"][0]["embedding"]
    results = client.search(collection_name="demo_collection", data=[query_vector], limit=9, output_fields=["file_name"])
    client.close()
    image_paths = [x["entity"]["file_name"] for x in results[0]]
    
    num_images = 9
    grid_size = (3,3)
    
    fig, axes = plt.subplots(grid_size[0], grid_size[1], figsize=(15,15))
    fig.subplots_adjust()
    
    for i, ax in enumerate(axes.flat):
        img = Image.open(image_paths[i])
        ax.imshow(img)
        ax.axis("off")
    plt.show()
    plt.close(fig)

## Part 3.4: Interactive Gradio UI for Semantic Search

Now we can put all this together to build a cohesive Gradio UI that allows us to easily search any provided folder of images. The code for this application can be found in the same folder as this notebook under "main.py". 

When the script is launched it will automatically generate embeddings for the passed in folder of images and store them in Milvus. The Gradio UI then allows you to actively search the database through text and image prompts.

Run the cell below to launch the Gradio UI. It will start the application with the cropped vehicle images generated earlier in this notebook. After running the cell, you can access the Gradio UI at http://localhost:7860

In [None]:
import sys  # sys.executable is the Python interpreter running this kernel
!{sys.executable} main.py traffic_data/vehicleannotations/cropped_vehicles {api_key}