<a href="https://colab.research.google.com/github/Snehil-Shah/MultiModal-Vector-Semantic-Search-Engine/blob/main/images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image to Semantic Embeddings

**Aim**: Encode roughly 40k (41,620) JPEG images into vector embeddings using a vision transformer model, and upsert them into a vector database for clustering and querying.

In [None]:
!pip install jupyter pandas qdrant_client pyarrow datasets sentence-transformers

# Load Dataset
This is the Open Images Dataset, hosted by the Common Visual Data Foundation (CVDF), which contains over 9 million images. We will be working with a smaller subset: the validation split.

The subset is provided as a TSV file. The first column is the URL of a hosted JPEG image; the second and third columns hold the file size in bytes and the Base64-encoded MD5 hash of the file.

In [98]:
import pandas as pd
data = pd.read_csv('open-images-dataset-validation.tsv', sep='\t', header=None).reset_index()
print(data.shape, data.head(), sep="\n")

(41620, 4)
   index                                                  0        1  \
0      0  https://c2.staticflickr.com/6/5606/15611395595...  2038323   
1      1  https://c6.staticflickr.com/3/2808/10351094034...  1762125   
2      2  https://c2.staticflickr.com/9/8089/8416776003_...  9059623   
3      3  https://farm3.staticflickr.com/568/21452126474...  2306438   
4      4  https://farm4.staticflickr.com/1244/677743874_...  6571968   

                          2  
0  I4V4qq54NBEFDwBqPYCkDA==  
1  38x6O2LAS75H1vUGVzIilg==  
2  4ksF8TuGWGcKul6Z/6pq8g==  
3  R+6Cs525mCUT6RovHPWREg==  
4  JnkYas7iDJu+pb81tfqVow==  


## Download the images
We need local copies of the images to feed them to the model.

In [99]:
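import os
import urllib.error
import urllib.request

# Make sure the target directory exists before saving anything into it.
os.makedirs("./images", exist_ok=True)

def download_file(url):
    """Download `url` into ./images (skipping files already present).

    Returns the local path, or None if the download failed.
    """
    basename = os.path.basename(url)
    target_path = f"./images/{basename}"
    if not os.path.exists(target_path):
        try:
            urllib.request.urlretrieve(url, target_path)
        except urllib.error.URLError:
            # Covers HTTPError (e.g. 404 for removed photos) and connection failures.
            return None
    return target_path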
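Assuming the third column (`data[2]`) is the Base64-encoded MD5 digest of each image, as the TSV layout above suggests, a downloaded file can be checked for truncation or corruption before it is encoded. A minimal sketch with `hashlib` and `base64`; `md5_matches` is a hypothetical helper, not part of the original notebook.

```python
import base64
import hashlib
import os
import tempfile


def md5_matches(path, expected_b64):
    """Return True if the file's MD5 digest equals the Base64-encoded expected value."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large images are never fully loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii") == expected_b64


# Self-check on a temporary file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
expected = base64.b64encode(hashlib.md5(b"hello").digest()).decode("ascii")
result = md5_matches(tmp.name, expected)
print(result)  # True
os.remove(tmp.name)
```

In the notebook this could be called with the local path returned by `download_file` and the hash from `data[2]` for the same row, deleting and skipping the file when it returns False.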

# The Model
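We will use a pre-trained model: OpenAI's Contrastive Language-Image Pre-training (CLIP). CLIP is a multi-modal model that pairs an image encoder with a text encoder in a shared embedding space. The `clip-ViT-B-32` checkpoint uses a Vision Transformer (ViT-B/32) image encoder that turns each image into a 512-dimensional embedding of its visual features.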

We will store these embeddings in a vector database, where semantically similar images sit close together and can be retrieved with nearest-neighbour queries.

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("clip-ViT-B-32")

# The Vector Database

Qdrant is an open-source vector database that stores vector embeddings and returns the nearest neighbours of a query embedding, which is the core of a recommendation or semantic search engine.

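We start by initializing the Qdrant client, connecting to a cluster hosted on Qdrant Cloud, and creating an `images` collection whose vector size (512) matches the CLIP embeddings and which compares vectors by cosine distance.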

In [None]:
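from qdrant_client import QdrantClient
from qdrant_client.http import models as rest
from google.colab import userdata

# Cluster URL and API key are read from Colab's secrets manager.
qdrant_client = QdrantClient(
    url=userdata.get('QDRANT_CLUSTER_URL'),
    api_key=userdata.get('QDRANT_CLUSTER_API_KEY'),
)

# Drop any existing "images" collection and create an empty one sized for
# 512-dimensional CLIP ViT-B/32 embeddings, compared by cosine similarity.
# (Newer qdrant-client releases deprecate recreate_collection in favour of
# collection_exists + create_collection.)
qdrant_client.recreate_collection(
    collection_name="images",
    vectors_config=rest.VectorParams(size=512, distance=rest.Distance.COSINE),
)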
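With `distance=rest.Distance.COSINE`, Qdrant ranks stored points by cosine similarity to the query vector. The brute-force, pure-Python sketch below shows what a nearest-neighbour query computes, using toy 3-dimensional vectors as stand-ins for the 512-dimensional CLIP embeddings. Qdrant itself builds an approximate HNSW index rather than scanning every point, so this is an illustration of the scoring, not of how Qdrant searches.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, points, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(points, key=lambda pid: cosine_similarity(query, points[pid]), reverse=True)
    return ranked[:k]


# Toy vectors keyed by point id; real CLIP embeddings have 512 dimensions.
points = {
    0: [1.0, 0.0, 0.0],
    1: [0.9, 0.1, 0.0],
    2: [0.0, 1.0, 0.0],
    3: [0.0, 0.0, 1.0],
}
print(nearest([1.0, 0.05, 0.0], points))  # [0, 1]
```

The same reasoning is why `size=512` in `VectorParams` has to match the output dimension of `clip-ViT-B-32`: every stored vector and every query vector must live in the same space.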

A helper function to upsert a single embedding, with its payload, into the collection:

In [76]:
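def upsert_to_db(id, vector, payload):
    """Insert (or overwrite) one point with the given id, embedding and payload."""
    qdrant_client.upsert(
        collection_name="images",
        points=[
            rest.PointStruct(
                id=id,
                # Convert the NumPy array from model.encode into a plain list of floats.
                vector=vector.tolist(),
                payload=payload,
            )
        ],
    )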
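`upsert_to_db` sends one point per request, so roughly 40k images means roughly 40k round trips to the cluster. Because `qdrant_client.upsert` accepts a list in `points`, the points can be grouped into batches first. A minimal sketch of a hypothetical `batched` helper (Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists):

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    # islice pulls at most `size` items; an empty list means the input is exhausted.
    while batch := list(islice(it, size)):
        yield batch


print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each batch of `(id, embedding, payload)` tuples could then go out in a single call, e.g. `qdrant_client.upsert(collection_name="images", points=[rest.PointStruct(id=i, vector=v.tolist(), payload=p) for i, v, p in batch])`. The batch size is an assumption to tune; larger batches mean fewer requests but bigger request bodies.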

In [None]:
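from PIL import Image

# Download, embed and upsert each image; failed downloads are skipped.
for i, url in data[0].items():
    img_path = download_file(url)
    if img_path:
        # CLIP in sentence-transformers embeds PIL images with its image encoder;
        # passing the path as a string would embed the path *text* instead.
        with Image.open(img_path) as img:
            embedding = model.encode(img)
        upsert_to_db(i, embedding, {"link": url})
        print(f"upserted {i}")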
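The loop above downloads, encodes, and upserts one image at a time, so much of its runtime is likely spent waiting on the network. One option, sketched here under the assumption that the downloads are I/O-bound, is to fetch files in a thread pool and encode them as they arrive. `prefetch` is a hypothetical helper, and `fake_download` stands in for `download_file` so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch(urls, fetch, max_workers=16):
    """Run `fetch` on every URL in a thread pool and yield (index, result) in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even though the fetches overlap in time.
        for i, result in enumerate(pool.map(fetch, urls)):
            yield i, result


def fake_download(url):
    # Stand-in for download_file: returns None for a "missing" image, else a local path.
    return None if url.endswith("missing.jpg") else f"./images/{url.rsplit('/', 1)[-1]}"


urls = ["https://example.com/a.jpg", "https://example.com/missing.jpg", "https://example.com/b.jpg"]
for i, path in prefetch(urls, fake_download):
    if path:
        print(i, path)
```

In the notebook this might look like `for i, img_path in prefetch(data[0].tolist(), download_file): ...`, with the encode and upsert steps unchanged; the indices line up with the DataFrame's default `RangeIndex`. Note that `Executor.map` submits every URL up front, which is fine for ~40k short strings but worth knowing for much larger lists.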