Note: the GPU steps in this notebook only work on a supported NVIDIA GPU.
They require the GPU build of Faiss (conda install faiss-gpu).

Let’s try to build our own (small scale) Google Image Search! For that, we need to:
Convert the images into embeddings (vectors) with CLIP
Index the image vectors with Faiss
Build the image search using the data from the previous steps

In [3]:
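from sentence_transformers import SentenceTransformer
from PIL import Image

# CLIP model that maps images into the embedding space used for the search
img_model = SentenceTransformer('clip-ViT-B-32')
images = [Image.open('/Users/oliverzimmermann/Desktop/VisKomm3_A5/Berthold.Hannes.VK3.WiSe23.A5/Berthold.Hannes.VK3.WiSe23.A5_8.jpg')]
# One row per image; clip-ViT-B-32 produces 512-dimensional vectors
embeddings = img_model.encode(images)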


Encoding a single image takes ~20 ms on a single NVIDIA V100 GPU, and encoding 1 million images takes ~90 minutes. With a large number of images, it’s good to encode them in larger batches to minimize the overhead of sending the data to the GPU.
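The batching idea can be sketched in plain Python. This is only an illustration: batched, encode_in_batches and fake_encode are hypothetical helper names, the file names are made up, and the stand-in encoder replaces the real model so the snippet runs without a GPU.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(items: Sequence,
                      encode_fn: Callable[[Sequence], List],
                      batch_size: int = 128) -> List:
    """Call `encode_fn` once per batch and concatenate the per-item results."""
    vectors: List = []
    for batch in batched(items, batch_size):
        vectors.extend(encode_fn(batch))
    return vectors


def fake_encode(batch):
    """Stand-in encoder (assumption): one 1-dimensional 'vector' per item."""
    return [[float(len(str(item)))] for item in batch]


paths = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
vectors = encode_in_batches(paths, fake_encode, batch_size=4)
```

With sentence-transformers, a manual loop like this is often unnecessary because encode accepts a batch_size argument itself. The loop is still useful when the images do not all fit in memory at once.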
To efficiently look up the most similar images for a given text query, we need to index them, and that is where Faiss comes in. Faiss is a library from Facebook for efficient similarity search and clustering of dense vectors. It offers many different functionalities, such as:
Basic vector similarity search without any clustering or compression
Partitioned index with Voronoi cells to do an approximate search (to speed up the search)
Vector compression using product quantization (to reduce the memory footprint)
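To make the first item concrete, here is a minimal, dependency-free sketch of a basic (exhaustive) similarity search. It is not how Faiss is implemented, the function names are hypothetical, and the toy vectors are invented. It only shows what such a search computes: the similarity of the query to every stored vector, followed by a sort.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def brute_force_search(query, vectors, k):
    """Return the k (index, cosine similarity) pairs most similar to `query`."""
    q = l2_normalize(query)
    scored = []
    for idx, vec in enumerate(vectors):
        v = l2_normalize(vec)
        scored.append((idx, sum(a * b for a, b in zip(q, v))))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # invented 2-d "embeddings"
top = brute_force_search([2.0, 0.1], toy_vectors, k=2)
```

The cost grows linearly with the number of stored vectors, which is why the partitioned and compressed index types exist.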
Building the index
I chose the IndexIVFFlat index type, which creates a partitioned index to allow faster lookup. The vectors are grouped into clusters (Voronoi cells), and the search only checks the vectors in the best-matching cluster(s). This makes searches faster, but they might not always return the most accurate results. You can balance speed against accuracy by choosing the number of clusters and how many clusters to visit when searching (the nprobe parameter in Faiss).
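The speed/accuracy trade-off can be seen in a toy version of a partitioned search. This is a simplified sketch in plain Python, not Faiss's implementation: the centroids are fixed by hand instead of being trained, all names are hypothetical, and the data is invented so that the true nearest neighbour sits just across a cell boundary.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_cells(vectors, centroids):
    """Assign each vector index to the cell of its nearest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(vec, centroids[c]))
        cells[nearest].append(idx)
    return cells


def ivf_search(query, vectors, centroids, cells, k, nprobe):
    """Search only the `nprobe` cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in cells[c]]
    candidates.sort(key=lambda idx: sq_dist(query, vectors[idx]))
    return candidates[:k]


# Two hand-picked centroids stand in for trained k-means centroids.
centroids = [[0.0, 0.0], [10.0, 0.0]]
vectors = [[0.0, 1.0], [4.99, 0.0], [5.4, 0.0], [10.0, 1.0]]
cells = build_cells(vectors, centroids)

query = [5.1, 0.0]
fast = ivf_search(query, vectors, centroids, cells, k=1, nprobe=1)
exact = ivf_search(query, vectors, centroids, cells, k=1, nprobe=2)
```

With nprobe=1 only the query's own cell is scanned, so the search returns vector 2 and misses the closer vector 1 in the neighbouring cell. Visiting both cells finds it, at the price of scanning every vector.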

In [4]:
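import math
import faiss
from faiss import index_factory

COUNT = embeddings.shape[0]
DIMENSIONS = embeddings.shape[1]
storage = 'Flat'
# About sqrt(COUNT) cells, at least 39 vectors per cell, and never fewer than one cell
cells = max(1, min(round(math.sqrt(COUNT)), COUNT // 39))
params = f"IVF{cells},{storage}"
index = index_factory(DIMENSIONS, params)
# CPU-only builds (faiss-cpu) fail here with
# "AttributeError: module 'faiss' has no attribute 'StandardGpuResources'",
# so move the index to GPU 0 only when the GPU build is installed
if hasattr(faiss, 'StandardGpuResources'):
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)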

The index_factory function makes it easy to build such composite indexes: we need one index to find the best cluster and then another index for the vectors inside each cluster.
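As a worked example of the factory string, the sketch below repeats the cell-count rule from the cell above for a few dataset sizes. The helper name ivf_factory_string and its min_per_cell parameter are hypothetical, and the floor of one cell is an assumption added here so that very small datasets do not produce "IVF0".

```python
import math


def ivf_factory_string(count, storage="Flat", min_per_cell=39):
    """Build an index_factory description such as 'IVF1000,Flat'."""
    # About sqrt(count) cells, but keep at least `min_per_cell` vectors per cell
    cells = min(round(math.sqrt(count)), count // min_per_cell)
    cells = max(1, cells)  # assumption: never ask for zero cells
    return f"IVF{cells},{storage}"


small = ivf_factory_string(1_000)       # limited by 1000 // 39
large = ivf_factory_string(1_000_000)   # limited by sqrt(1_000_000)
single = ivf_factory_string(1)          # falls back to a single cell
```

For 1,000 vectors the per-cell limit wins (25 cells), while for 1 million vectors the square-root rule wins (1,000 cells).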
Training only finds the best cluster centroids, so you don’t necessarily need to train on all the vectors. I’m also adding the vectors to the index with IDs so that it is easier to look up the actual image files. Each ID is just a unique random number, which is also used as the filename of the image on GCS.
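The two ideas in this paragraph, training on a subset and using unique random IDs as filenames, might look like this in plain Python. All helper names are hypothetical, the ID range below 2**62 is an assumption chosen so the values fit in int64, and the toy vectors are invented. The gs://<bucket> placeholder is kept as in the notebook.

```python
import random


def make_ids(count, seed=None, upper=2**62):
    """Draw `count` distinct random integers in [0, upper); they fit in int64."""
    rng = random.Random(seed)
    return rng.sample(range(upper), count)


def training_sample(vectors, max_train, seed=None):
    """Pick at most `max_train` vectors at random to train the centroids on."""
    if len(vectors) <= max_train:
        return list(vectors)
    return random.Random(seed).sample(list(vectors), max_train)


toy_vectors = [[float(i), 0.0] for i in range(5)]  # invented embeddings
ids = make_ids(len(toy_vectors), seed=0)
# Map each vector ID to the file it was computed from
id_to_filename = {i: f"gs://<bucket>/images/{i}.jpg" for i in ids}
train_subset = training_sample(toy_vectors, max_train=3, seed=0)
```

The subset would go to index.train, while all vectors and their IDs would still go to index.add_with_ids.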


In [None]:
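import numpy as np

# Faiss expects the IDs as an int64 NumPy array, one ID per vector
ids = np.array([12345], dtype='int64')
filenames = [f"gs://<bucket>/images/{id}.jpg" for id in ids]
index.train(embeddings)
index.add_with_ids(embeddings, ids)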


The image search
Finding the most similar images for a given text is just a vector similarity search:
Convert the text into a query vector
Find the most similar vectors from the index for the query vector
Look up the image files from GCS using the (image) vector ID
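The last step can be sketched as a small lookup function. lookup_files and the id_to_filename dictionary are hypothetical names, and the sample IDs and scores are invented. The one Faiss-specific detail is that a search pads its ID output with -1 when it finds fewer than k results, so those entries are skipped.

```python
def lookup_files(result_ids, scores, id_to_filename):
    """Pair each returned vector ID with its image file, skipping empty slots."""
    hits = []
    for image_id, score in zip(result_ids, scores):
        if image_id == -1:  # padding for "no result" in the search output
            continue
        hits.append((id_to_filename[int(image_id)], float(score)))
    return hits


# Invented example data: one real hit and one padded slot
id_to_filename = {12345: "gs://<bucket>/images/12345.jpg"}
hits = lookup_files([12345, -1], [0.25, 3.4e38], id_to_filename)
```

Here result_ids and scores correspond to the first row of the two arrays returned by index.search.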

In [None]:
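import numpy as np

# Multilingual text encoder aligned with the clip-ViT-B-32 image embeddings
text_model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
query = 'men playing soccer'
embedding = text_model.encode([query])

def normalize_L2(embedding):
    # Scale every row of an (n, d) array to unit L2 length.
    # For cosine ranking, apply it to the indexed vectors and to the query.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    return embedding / norms

# The index uses Faiss's default L2 metric, so the returned scores are
# squared L2 distances (smaller means more similar), not probabilities
probabilities, ids = index.search(embedding, COUNT)
ids = ids[0]
probabilities = probabilities[0]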
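A short derivation shows why L2 normalization is relevant here. It assumes that both the indexed image vectors and the query vector have been scaled to unit length, which the cells above do not do yet. For two unit vectors a and b with angle θ between them:

```latex
\[
\lVert a - b \rVert^2
  = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b
  = 2 - 2\cos\theta
  \qquad \text{when } \lVert a \rVert = \lVert b \rVert = 1 .
\]
```

Under that assumption, sorting by the squared L2 distance returned by the index in ascending order gives the same ranking as sorting by cosine similarity in descending order.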
