
# Task (h): Image Clustering with ImageBind Embeddings

In this task, we move beyond simple color and texture analysis and use ImageBind as a feature extractor for images. Clustering on raw pixels or low-level features often struggles with variations in lighting, viewing angle, or background. ImageBind's Vision Transformer (ViT) backbone instead maps each image to a high-level semantic embedding, so a clustering algorithm can group images by the objects and concepts they contain (e.g., 'a vehicle' vs. 'an animal') rather than by similar pixel distributions.
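
To make the idea of grouping by embedding similarity concrete, here is a minimal sketch that compares vectors with cosine similarity. The three 3-D vectors and their names are made up for illustration and are not ImageBind outputs; `cosine_similarity_matrix` is a hypothetical helper, not part of ImageBind or scikit-learn.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy vectors standing in for real embeddings (values are invented).
toy_names = ["animal_a", "animal_b", "vehicle"]
toy_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
])

similarity = cosine_similarity_matrix(toy_embeddings)
for name, row in zip(toy_names, similarity):
    print(name.ljust(10), np.round(row, 2))
```

In this toy setup the two "animal" rows point in nearly the same direction, so their similarity is much higher than either one's similarity to the "vehicle" row. A clustering algorithm exploits that kind of structure in the real embeddings.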

# --- 1. SYSTEM-LEVEL INJECTION (fixes the ModuleNotFoundError) ---
# Extra packages needed in the Colab runtime
!pip install torchcodec --index-url https://download.pytorch.org/whl/cu126
!pip install ftfy

import sys
import torchvision.transforms.functional as F
from types import ModuleType

# Recent torchvision releases no longer ship
# `torchvision.transforms.functional_tensor`, which some dependencies
# (e.g. pytorchvideo) still import. Register a stand-in module that
# re-exports `torchvision.transforms.functional` before those imports run.
mock_module = ModuleType("torchvision.transforms.functional_tensor")
mock_module.__dict__.update(F.__dict__)
sys.modules["torchvision.transforms.functional_tensor"] = mock_module

# --- 2. ENVIRONMENT SETUP ---
import os
%cd /content
!rm -rf ImageBind
!git clone https://github.com/facebookresearch/ImageBind.git
%cd ImageBind
!pip install pytorchvideo timm fvcore -q
!pip install . --no-deps -q

# --- 3. IMPORTS ---
import torch
import numpy as np
from sklearn.cluster import KMeans
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType
from PIL import Image

# --- 4. LOAD MODEL ---
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# --- 5. DATA PREPARATION ---
os.makedirs("task_h_images", exist_ok=True)
urls = {
    "Dog": "https://raw.githubusercontent.com/facebookresearch/ImageBind/main/.assets/dog_image.jpg",
    "Bird": "https://raw.githubusercontent.com/facebookresearch/ImageBind/main/.assets/bird_image.jpg",
    "Car": "https://raw.githubusercontent.com/facebookresearch/ImageBind/main/.assets/car_image.jpg"
}

valid_paths = []
processed_names = []

for name, url in urls.items():
    path = f"task_h_images/{name}.jpg"
    !wget -O {path} {url} -q
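    try:
        # verify() raises if the file is truncated or not a valid image
        with Image.open(path) as img:
            img.verify()
    except Exception:
        print(f"Skipping {name}: download failed or file is not a valid image.")
    else:
        valid_paths.append(path)
        processed_names.append(name)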

# --- 6. EXECUTION ---
with torch.no_grad():
    inputs = {ModalityType.VISION: data.load_and_transform_vision_data(valid_paths, device)}
    embeddings = model(inputs)
    vis_emb = embeddings[ModalityType.VISION].cpu().numpy()

# Cluster the image embeddings into 2 groups (fixed seed for reproducibility)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(vis_emb)

# --- 7. OUTPUT ---
print("\n" + "="*40)
print("IMAGE CLUSTERING SUCCESSFUL")
print("="*40)
for i, name in enumerate(processed_names):
    print(f"{name.ljust(20)} -> Cluster {labels[i]}")

Looking in indexes: https://download.pytorch.org/whl/cu126
/content
Cloning into 'ImageBind'...
remote: Enumerating objects: 187, done.
remote: Counting objects: 100% (120/120), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 187 (delta 84), reused 54 (delta 53), pack-reused 67 (from 3)
Receiving objects: 100% (187/187), 2.65 MiB | 5.81 MiB/s, done.
Resolving deltas: 100% (92/92), done.
/content/ImageBind
  Preparing metadata (setup.py) ... done
  Building wheel for imagebind (setup.py) ... done
Downloading imagebind weights to .checkpoints/imagebind_huge.pth ...
100%|██████████| 4.47G/4.47G [08:14<00:00, 9.71MB/s]

IMAGE CLUSTERING SUCCESSFUL
Dog                  -> Cluster 0
Bird                 -> Cluster 0
Car                  -> Cluster 1
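
KMeans groups points by Euclidean distance, while embedding similarity is usually described in terms of cosine similarity. One common way to reconcile the two is to scale every embedding to unit length before clustering, because for unit vectors the squared Euclidean distance equals `2 - 2 * cosine`. The sketch below checks that identity on random stand-in vectors. `l2_normalize` is a hypothetical helper, the random data is not ImageBind output, and the sketch makes no assumption about whether the model's embeddings are already normalized.

```python
import numpy as np

def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of `embeddings` to unit Euclidean length."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)

# Random stand-in data: 3 "images" with 8-dimensional "embeddings".
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(3, 8))
unit = l2_normalize(fake_embeddings)

# Pairwise cosine similarity of the normalized rows.
cosine = unit @ unit.T

# Pairwise squared Euclidean distance of the normalized rows.
diff = unit[:, None, :] - unit[None, :, :]
squared_euclidean = (diff ** 2).sum(axis=-1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_holds = np.allclose(squared_euclidean, 2.0 - 2.0 * cosine)
print("identity holds:", identity_holds)
```

In the notebook above, this would amount to passing `l2_normalize(vis_emb)` to `kmeans.fit_predict` instead of `vis_emb`. With only three images the assignment may not change, so treat this as an optional preprocessing step.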
