 # CLIP Model Adapter Tutorial



 This notebook provides a step-by-step tutorial for using the CLIP model adapter to fine-tune a CLIP model on image-caption data and to embed a dataset with the fine-tuned model.

 There are two main steps in fine-tuning:

 1. Preparing the dataset with descriptions

 2. Running the model adapter for training and embedding



 Let's get started!

 ## Import Required Libraries

In [1]:
import time
import random
import string
import dtlpy as dl
import pandas as pd

from pathlib import Path
from concurrent.futures import ThreadPoolExecutor


 ## Set Up Dataloop Environment



 First, we need to set up our Dataloop environment and create a project. Replace the project name with your own value (and the dataset name, if you upload your own data).

> **_NOTE:_**  This tutorial assumes you are working in a new project which does NOT have the CLIP model previously installed. If it's an existing project and you already have CLIP installed, you will need to get the appropriate app and base CLIP model entity for the rest of the code to work correctly.

In [2]:
if dl.token_expired():
    dl.login()

PROJECT_NAME = "<your project name here>"
project = dl.projects.create(project_name=PROJECT_NAME)

 ## Prepare Dataset with Descriptions


This stage has two steps: first, create an image dataset with a description uploaded for each image; then, convert each image and its description into a prompt item, with the image as the prompt and the description as the response.

For this tutorial we will install the Mars Surface Images datasets from the Dataloop Marketplace.

In [3]:
dpk = dl.dpks.get(dpk_name="mars-surface-images")
app = project.apps.install(dpk=dpk)
print(f"Mars Surface Datasets installed: {app.name}")

Command Progress: 100%|██████████| 100/100 [00:02<00:00, 45.37it/s]
Mars Surface Datasets installed: Mars Surface Images


Let's get the captioned dataset and split its items into train, validation, and test subsets. You may need to wait a few minutes after installing the app for the dataset to finish loading.

In [4]:
dataset = project.datasets.get(dataset_name="Mars Surface Images with Captions")

SUBSET_PERCENTAGES = {'train': 80, 'validation': 10, 'test': 10}
dataset.split_ml_subsets(
        items_query=dl.Filters(field='type', values='file'),
        percentages=SUBSET_PERCENTAGES
    )

Command Progress: 100%|██████████| 100/100 [00:06<00:00, 15.94it/s]


True
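To get a feel for what a percentage split means for a dataset of a given size, the sketch below computes how many items each subset would receive. `split_counts` is a hypothetical local helper, not part of dtlpy, and it uses largest-remainder rounding as an assumption; the platform's own rounding in `split_ml_subsets` may differ, so treat the counts as approximate.

```python
def split_counts(n_items, percentages):
    """Hypothetical helper: turn subset percentages into per-subset item counts."""
    total = sum(percentages.values())
    if total != 100:
        raise ValueError(f"Percentages must sum to 100, got {total}")
    # Exact (fractional) share of items for each subset
    exact = {name: n_items * pct / 100 for name, pct in percentages.items()}
    counts = {name: int(value) for name, value in exact.items()}
    # Hand the items lost to rounding down to the subsets with the largest remainders
    leftover = n_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts
```

For example, `split_counts(dataset.items_count, SUBSET_PERCENTAGES)` gives a rough preview of the split above before you train.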

Alternatively, if you'd like to upload your own dataset, you can use the function below.

In [5]:
from tqdm import tqdm


def create_new_dataset(dataset_name, pairs_df, subset_percentages=None):
    """
    Creates a new dataset from a DataFrame of image paths and descriptions

    Args:
        dataset_name (str): Name of the dataset to create
        pairs_df (pd.DataFrame): DataFrame containing 'filepath' and 'description' columns
        subset_percentages (dict): Percentages for each ML subset.
            Defaults to 60% train, 20% validation, 20% test;
            any other split works as long as the values sum to 100.
    """
    if subset_percentages is None:
        subset_percentages = {'train': 60, 'validation': 20, 'test': 20}

    try:
        dataset = project.datasets.create(dataset_name=dataset_name)
    except dl.exceptions.BadRequest:
        # Name already taken: append 5 random alphanumeric characters
        suffix = ''.join(random.choices(string.ascii_letters + string.digits, k=5))
        dataset = project.datasets.create(dataset_name=f"{dataset_name}_{suffix}")

    def upload_item(row):
        file_path = row["filepath"]
        # Assumes an export-style layout: <root>/items/.../<name>.<ext>
        # has its annotations in <root>/json/.../<name>.json
        annots_path = str(Path(file_path.replace("items", "json")).with_suffix(".json"))

        # Upload item with annotations
        item = dataset.items.upload(
            local_path=file_path,
            local_annotations_path=annots_path,
            item_metadata=dl.ExportMetadata.FROM_JSON,
            overwrite=True,
        )

        # Set description and update
        item.set_description(text=row["description"])
        item.update()

    # Upload items in parallel with a progress bar; list() waits for every
    # upload and re-raises any exception raised in a worker thread
    with ThreadPoolExecutor() as executor:
        list(tqdm(
            executor.map(upload_item, [row for _, row in pairs_df.iterrows()]),
            total=len(pairs_df),
            desc="Uploading items",
            unit="item",
        ))

    # Since model training requires labels, we create a dummy label for the recipe
    dataset.add_labels(label_list=['free-text'])

    # Split the uploaded items into ML subsets
    dataset.split_ml_subsets(
        items_query=dl.Filters(field='type', values='file'),
        percentages=subset_percentages,
    )

    return dataset
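`create_new_dataset` reads a `filepath` and a `description` value from each DataFrame row. If your pairs live in a CSV file, a small validation step before uploading can catch missing images and blank captions. `load_pairs` below is a hypothetical helper that uses only the standard library; it assumes a CSV with a header row containing exactly those two column names.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("filepath", "description")


def load_pairs(csv_path):
    """Hypothetical helper: read image/caption pairs from a CSV and drop unusable rows."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"CSV is missing required columns: {missing}")
        rows = []
        for row in reader:
            caption = (row["description"] or "").strip()
            # Skip rows whose image file does not exist or whose caption is blank
            if not Path(row["filepath"]).is_file() or not caption:
                continue
            rows.append({"filepath": row["filepath"], "description": caption})
    return rows
```

The result can then be wrapped in a DataFrame, e.g. `create_new_dataset("<your dataset name>", pd.DataFrame(load_pairs("<your pairs csv>")))`.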

This next section is based on the [CLIP model adapter repo](https://github.com/dataloop-ai-apps/clip-model-adapter/blob/main/utils/prepare_dataset.py), under `utils/prepare_dataset.py`. The `ClipPrepare` class contains functions that convert an existing image dataset into a prompt item dataset, which is the format used here to train CLIP.

In [6]:
class ClipPrepare:
    @staticmethod
    def convert_dataset(dataset, keep_subsets=None):
        dataset_to = ClipPrepare.convert_to_prompt_dataset(dataset_from=dataset, keep_subsets=keep_subsets)
        return dataset_to

    @staticmethod
    def convert_to_prompt_dataset(dataset_from: dl.Dataset, keep_subsets):
        items = dataset_from.items.list()
        try:
            dataset_to = dataset_from.project.datasets.get(dataset_name=f"{dataset_from.name} prompt items")
            if dataset_to.items_count > 0:
                suffix = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(5))
                dataset_to = dataset_from.project.datasets.create(
                    dataset_name=f"{dataset_from.name} prompt items-{suffix}")
        except Exception as e:
            print("Prompt item dataset does not exist or already contains items. Creating new prompt item dataset.")
            suffix = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(5))
            dataset_to = dataset_from.project.datasets.create(dataset_name=f"{dataset_from.name} prompt items-{suffix}")

        # use a thread pool to convert items to prompt items in parallel;
        # list() waits for all conversions and re-raises any worker exception
        all_items = items.all()
        with ThreadPoolExecutor() as executor:
            list(executor.map(
                lambda item: ClipPrepare._convert_item(item_id=item.id, dataset=dataset_to, existing_subsets=keep_subsets),
                all_items,
            ))

        new_recipe = dataset_from.get_recipe_ids()[0]
        dataset_to.switch_recipe(new_recipe)
        return dataset_to

    # add captions for the item either from description or from directory name
    @staticmethod
    def _convert_item(item_id, dataset: dl.Dataset, existing_subsets=True):
        item = dl.items.get(item_id=item_id)
        if item.description is not None:
            caption = item.description
        else:
            print(f"Item {item.id} has no description. Trying directory name.")
            item_dir = item.dir.split('/')[-1]
            if item_dir != '':
                print(f"Using directory name: {item_dir}")
                caption = "this is a photo of a " + item_dir
            else:
                print(f"Item {item.id} has no directory name. Using empty string.")
                caption = ''
        new_name = Path(item.name).stem + '.json'

        prompt_item = dl.PromptItem(name=new_name)
        prompt_item.add(message={"content": [{"mimetype": dl.PromptType.IMAGE,  # role default is user
                                              "value": item.stream}]})
        new_metadata = item.metadata
        if existing_subsets is True:
            new_metadata["system"] = new_metadata.get("system", {})
            new_metadata["system"]["subsets"] = item.metadata.get("system", {}).get(
                "subsets", {}
            )

        new_item = dataset.items.upload(
            prompt_item,
            remote_name=new_name,
            remote_path=item.dir,
            overwrite=True,
            item_metadata=new_metadata,
        )
        prompt_item._item = new_item
        prompt_item.add(message={"role": "assistant",
                                 "content": [{"mimetype": dl.PromptType.TEXT,
                                              "value": caption}]})

        return new_item
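The caption that `_convert_item` attaches to each prompt item follows a three-step fallback: the item description, then a sentence built from the item's directory name, then an empty string. The pure function below restates that logic outside of dtlpy so you can predict what caption an item will get; `caption_for` is a hypothetical name, and `item_dir` stands in for `item.dir` (for example `/craters`).

```python
def caption_for(description, item_dir):
    """Hypothetical mirror of the caption fallback in ClipPrepare._convert_item."""
    if description is not None:
        return description
    # item.dir looks like "/folder/subfolder"; use the last path component
    dir_name = item_dir.split('/')[-1]
    if dir_name != '':
        return "this is a photo of a " + dir_name
    return ''
```

Because the check is `is not None`, an item whose description is an empty string keeps that empty caption instead of falling back to its directory name.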

Now that all the functions are ready, convert the captioned dataset into a prompt item dataset. (If you are using your own data, create and upload it with `create_new_dataset` first.) The images are uploaded as regular items first because Dataloop prompt items cannot store images on their own: each image must exist as its own item before we create the corresponding prompt item, with the image as the prompt and the description as the response annotation.

In [7]:
prompt_dataset = ClipPrepare.convert_dataset(dataset=dataset, keep_subsets=True)

Prompt item dataset does not exist or already contains items. Creating new prompt item dataset.
Iterate Entity: 0it [00:00, ?it/s]


Now you should have two datasets: one with the original images and descriptions, and one with the prompt items.


 ## Run Model Adapter



 Now that we have our dataset prepared, we can use the CLIP model adapter for training and embedding.

 First, let's install the CLIP DPK.

In [8]:
dpk = dl.dpks.get(dpk_name='clip-model-pretrained')
app = project.apps.install(dpk=dpk)
print(f"CLIP App installed: {app.name}")

Command Progress: 100%|██████████| 100/100 [00:02<00:00, 45.53it/s]
CLIP App installed: OpenAI CLIP


Next, get the pretrained CLIP model entity and configure its training subsets and hyperparameters before cloning it for fine-tuning.

In [9]:
base_model = project.models.get(model_name="openai-clip")

# Configure model metadata and subsets
base_model.metadata["system"] = {}
base_model.metadata["system"]["subsets"] = {}

train_filters = dl.Filters(field="metadata.system.tags.train", values=True)
val_filters = dl.Filters(field="metadata.system.tags.validation", values=True)

base_model.metadata["system"]["subsets"]["train"] = train_filters.prepare()
base_model.metadata["system"]["subsets"]["validation"] = val_filters.prepare()

# Set model configuration (optional)
base_model.configuration = {
    "model_name": "ViT-B/32",
    "embeddings_size": 512,
    "num_epochs": 50,
    "batch_size": 200,
    "learning_rate": 5e-5,
    "early_stopping": True,
    "early_stopping_epochs": 5,
}
base_model.output_type = "text"
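Before training, it can help to sanity-check the configuration dictionary. The sketch below is a hypothetical pre-flight check, not part of the adapter. It assumes the adapter's `model_name` values are OpenAI CLIP backbone names and that `embeddings_size` should equal that backbone's output width; the widths listed are those of OpenAI's released CLIP models.

```python
# Output embedding widths of OpenAI's released CLIP backbones
CLIP_EMBED_DIMS = {"RN50": 1024, "ViT-B/32": 512, "ViT-B/16": 512, "ViT-L/14": 768}


def check_clip_config(config):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    expected = CLIP_EMBED_DIMS.get(config.get("model_name"))
    if expected is None:
        problems.append(f"unknown model_name: {config.get('model_name')!r}")
    elif config.get("embeddings_size") != expected:
        problems.append(f"embeddings_size should be {expected} for {config['model_name']}")
    for key in ("num_epochs", "batch_size"):
        if not isinstance(config.get(key), int) or config[key] <= 0:
            problems.append(f"{key} must be a positive integer")
    if config.get("early_stopping") and config.get("early_stopping_epochs", 0) >= config.get("num_epochs", 0):
        problems.append("early_stopping_epochs should be smaller than num_epochs")
    return problems
```

Calling `check_clip_config(base_model.configuration)` on the configuration above should return an empty list.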


Now we can clone the pretrained CLIP model entity, set our dataset on the new model, and train. 

> **NOTE**: The training process might take some time depending on your dataset size and model configuration.
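With `early_stopping` enabled, training may finish well before `num_epochs`. The sketch below shows the common patience rule: stop once the validation loss has not improved for `early_stopping_epochs` consecutive epochs. This is an assumption about how the adapter interprets those two settings, and `stopping_epoch` is a hypothetical helper that replays a list of per-epoch validation losses.

```python
def stopping_epoch(val_losses, num_epochs, patience):
    """Return the 1-based epoch after which patience-based early stopping halts training."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:num_epochs], start=1):
        if loss < best:
            # New best validation loss: reset the patience counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never ran out of patience: training runs to the last available epoch
    return min(len(val_losses), num_epochs)
```

For example, with `patience=5`, a loss that stops improving at epoch 3 ends training at epoch 8, even if `num_epochs` is 50.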

In [None]:
finetuned_model_name = base_model.name + "-finetuned"
finetuned_model = base_model.clone(model_name=finetuned_model_name, dataset=prompt_dataset)
execution = finetuned_model.train()

In [None]:
# Wait for training to complete
print("Waiting for training to complete...")

while execution.in_progress():
    print("Training in progress... checking again in 5 minutes")
    time.sleep(300)  # Sleep for 5 minutes
    execution = dl.executions.get(execution_id=execution.id)

status = execution.get_latest_status()['status']
if status == "success":
    print("Training completed!")
else:
    print(f"Training ended with status: {status}")

Waiting for training to complete...
Training in progress... checking again in 5 minutes
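The loop above polls until the execution leaves the in-progress state, with no upper bound on the wait. If you prefer to give up after a maximum time, a generic helper such as the hypothetical `wait_for` below works. It only needs a zero-argument `is_done` callable, for example `lambda: not dl.executions.get(execution_id=execution.id).in_progress()`; `sleep` and `clock` are injectable so the logic can be tested without waiting.

```python
import time


def wait_for(is_done, interval_s=300, timeout_s=6 * 3600, sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() every interval_s seconds; return True if it succeeds before timeout_s."""
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Returning a boolean rather than raising lets the caller decide whether a timeout should stop the notebook or just print a warning.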


Once the model has completed training, you can deploy your model and embed your images for better semantic search.

In [24]:
finetuned_model.deploy()

Service(created_at='2025-06-10T14:26:38.877Z', creator='yaya.t@dataloop.ai', version='1.0.0', package_id='68359a1d8101e8ec39d63dfd', package_revision='1.0.2', bot='bot.441a1408-133b-4338-abe7-585e3de058b0@bot.dataloop.ai', init_input={'model_entity': '68483ee8ca62bef2c6b2caf6'}, module_name='clip-module', name='predict-68483ee8ca62bef2c6b2ca-2r11', url='https://gate.dataloop.ai/api/v1/services/6848409e3c8bd8deabde2e04', id='6848409e3c8bd8deabde2e04', active=True, queue_length_limit=None, run_execution_as_process=False, execution_timeout=3600, drain_time=600, on_reset='failed', _type=None, project_id='5f1545c9-57b4-4821-82d6-adbb644d3d3b', org_id='778686bd-7941-4903-b2d5-ba638d8fabc8', is_global=False, max_attempts=3, metadata={'ml': {'modelId': '68483ee8ca62bef2c6b2caf6', 'modelOperation': 'deploy'}}, updated_by=None, app={'id': '68483eb7e544ef904680d4f3', 'dpkName': 'clip-model-pretrained', 'dpkVersion': '1.0.2', 'dpkId': '67a4bb4d4a998a3ee9dfcdd7'}, integrations=[])

It may take a minute for the model to successfully deploy. Once the model is deployed and the service is up, you can embed your dataset. You'll need to get the updated model entity from the platform before embedding. 

In [26]:
finetuned_model = project.models.get(model_id=finetuned_model.id)
finetuned_model.embed_datasets(dataset_ids=[dataset.id])

Command Progress: 100%|██████████| 100/100 [00:02<00:00, 44.99it/s]


Command(id='684840c04e7979cf89bee10a', status='success', created_at='2025-06-10T14:27:12.754Z', updated_at='2025-06-10T14:27:12.979Z', type='EmbedDatasetsCommandSettings', progress=100, spec={'datasetIds': ['68483ead292a9f3bf0d844c3'], 'config': {'serviceId': '68484081f79dac7c437b9763'}, 'modelId': '68483ee8ca62bef2c6b2caf6', 'attachTrigger': False}, error=None)
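Once the items have embeddings, semantic search comes down to embedding a text query with the same model and ranking items by similarity to it. The sketch below shows that ranking with cosine similarity on toy 3-dimensional vectors; real ViT-B/32 embeddings have 512 dimensions, the file names are made up, and this does not show how the Dataloop platform implements its own search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_items(query_vec, item_vecs, top_k=3):
    """Return the top_k (item_name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors for illustration only (not real CLIP embeddings)
items = {
    "crater_01.jpg": [0.9, 0.1, 0.0],
    "dunes_07.jpg": [0.1, 0.9, 0.2],
    "rover_track_3.jpg": [0.2, 0.3, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of a text query
print(rank_items(query, items, top_k=2))
```

Fine-tuning on your own captions aims to move matching images and descriptions closer together in this space, which is what makes the ranking more useful for your data.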