# Similarity search with Milvus vector index

In this tutorial, we demonstrate the integration of the Milvus vector store with EvaDB by conducting an image-level similarity search on a collection of Reddit images. We employ the classic `SIFT` feature to identify images with a strikingly similar appearance (image-level pipeline).

EvaDB supports multiple vector stores, including but not limited to `FAISS` and `QDRANT`. Integrating `MILVUS` adds another option for building indexes, so the vector store can be chosen to match the requirements of a specific application and the functionality each store offers.

Note: This tutorial is a modified version of the first part of https://github.com/georgia-tech-db/evadb/blob/staging/tutorials/11-similarity-search-for-motif-mining.ipynb, which was created for conducting image-level similarity search using `FAISS`. We have adapted the content to demonstrate the same search functionality using `MILVUS`. The core principles of similarity search remain the same; this tutorial focuses on implementing them with `MILVUS` as an alternative way to achieve similar results.

### Connect to EvaDB


In [1]:
import evadb
cursor = evadb.connect().cursor()
import warnings
warnings.filterwarnings("ignore")

### Download the Reddit dataset

In [2]:
!wget -nc https://www.dropbox.com/scl/fo/fcj6ojmii0gw92zg3jb2s/h\?dl\=1\&rlkey\=j3kj1ox4yn5fhonw06v0pn7r9 -O reddit-images.zip
!unzip -o reddit-images.zip -d reddit-images

File ‘reddit-images.zip’ already there; not retrieving.
Archive:  reddit-images.zip
mapname:  conversion of  failed
 extracting: reddit-images/g348_d7jgzgf.jpg  
 extracting: reddit-images/g348_d7jphyc.jpg  
 extracting: reddit-images/g348_d7ju7dq.jpg  
 extracting: reddit-images/g348_d7jhhs3.jpg  
 extracting: reddit-images/g1074_d4n1lmn.jpg  
 extracting: reddit-images/g1074_d4mxztt.jpg  
 extracting: reddit-images/g1074_d4n60oy.jpg  
 extracting: reddit-images/g1074_d4n6fgs.jpg  
 extracting: reddit-images/g1190_cln9xzr.jpg  
 extracting: reddit-images/g1190_cln97xm.jpg  
 extracting: reddit-images/g1190_clna260.jpg  
 extracting: reddit-images/g1190_clna2x2.jpg  
 extracting: reddit-images/g1190_clna91w.jpg  
 extracting: reddit-images/g1190_clnad42.jpg  
 extracting: reddit-images/g1190_clnajd7.jpg  
 extracting: reddit-images/g1190_clnapoy.jpg  
 extracting: reddit-images/g1190_clnarjl.jpg  
 extracting: reddit-images/g1190_clnavnu.jpg  
 extracting: reddit-images/g1190_clnbalu.j

### Load all images into EvaDB

In [3]:
cursor.query("DROP TABLE IF EXISTS reddit_dataset;").df()
cursor.query("LOAD IMAGE 'reddit-images/*.jpg' INTO reddit_dataset;").df()

Unnamed: 0,0
0,Number of loaded IMAGE: 34
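
As a quick sanity check, the number of loaded images can be compared against the files on disk. The sketch below is not part of the original notebook: the helper name `count_matching_images` is hypothetical, and it assumes the archive was extracted to `reddit-images/` as in the previous cell.

```python
import glob


def count_matching_images(pattern: str) -> int:
    """Return the number of files matching a glob pattern."""
    return len(glob.glob(pattern))


# Same pattern as in the LOAD IMAGE statement above. If every file loaded,
# this should agree with the count that EvaDB reports.
print(count_matching_images("reddit-images/*.jpg"))
```

If the two numbers differ, some files were probably skipped during loading or extraction.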


### Register a SIFT FeatureExtractor
It uses the `kornia` library to extract SIFT features for each image.

In [4]:
cursor.query("DROP FUNCTION IF EXISTS SiftFeatureExtractor;").df()
cursor.query("""
    CREATE FUNCTION SiftFeatureExtractor
    IMPL '../evadb/functions/sift_feature_extractor.py'
""").df()

Unnamed: 0,0
0,Function SiftFeatureExtractor added to the dat...


In [5]:
# Keep track of which image gets the most votes
from collections import Counter
vote = Counter()

## Image-level similarity search pipeline
This pipeline creates one vector per image. The steps below build an index over these vectors and then use the index to search for similar vectors.
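
To make the idea of "one vector per image" concrete, here is a minimal brute-force sketch of nearest-neighbour search in plain NumPy. It is only an illustration under stated assumptions: the feature matrix is random toy data rather than real SIFT output, the vector dimension is arbitrary, L2 distance is assumed as the metric, and `top_k_similar` is a hypothetical helper. A vector index such as the one built below serves the same kind of query without scanning every vector.

```python
import numpy as np


def top_k_similar(query: np.ndarray, features: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `features` closest to `query` (L2 distance)."""
    distances = np.linalg.norm(features - query, axis=1)
    return np.argsort(distances)[:k]


# Toy stand-in for per-image feature vectors: one row per image.
rng = np.random.default_rng(0)
features = rng.random((34, 128))

# Use one of the indexed images as the query, as the tutorial does.
query = features[3]

# The query image itself ranks first because its distance to itself is 0.
print(top_k_similar(query, features, k=5))
```

This also explains why the query image appears in its own result set in the search step that follows.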

In [6]:
#1. Create index for the entire image
cursor.query("""DROP INDEX IF EXISTS reddit_sift_image_index""").df()
cursor.query("""
    CREATE INDEX reddit_sift_image_index 
    ON reddit_dataset (SiftFeatureExtractor(data)) 
    USING MILVUS
""").df()



Unnamed: 0,0
0,Index reddit_sift_image_index successfully add...


In [7]:
#2. Search similar vectors
response = cursor.query("""
    SELECT name FROM reddit_dataset ORDER BY
    Similarity(
      SiftFeatureExtractor(Open('reddit-images/g1074_d4mxztt.jpg')),
      SiftFeatureExtractor(data)
    )
    LIMIT 5
""").df()

In [8]:
#3. Update votes
for name in response["name"]:
    vote[name] += 1
print(vote)

Counter({'reddit-images/g1074_d4mxztt.jpg': 1, 'reddit-images/g348_d7ju7dq.jpg': 1, 'reddit-images/g1209_ct6bf1n.jpg': 1, 'reddit-images/g1190_cln9xzr.jpg': 1, 'reddit-images/g1190_clna2x2.jpg': 1})
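
With a single query every returned image has exactly one vote, so the counter only becomes informative after several queries. The sketch below shows how the votes could be accumulated and the top image read off with `Counter.most_common`. The file names and result lists are hypothetical placeholders, not output from the dataset.

```python
from collections import Counter

vote = Counter()

# Hypothetical top-3 results from three separate queries.
results_per_query = [
    ["a.jpg", "b.jpg", "c.jpg"],
    ["b.jpg", "d.jpg", "a.jpg"],
    ["b.jpg", "e.jpg", "c.jpg"],
]

# Each appearance in a result list counts as one vote.
for names in results_per_query:
    vote.update(names)

# The image retrieved most often across all queries.
print(vote.most_common(1))  # [('b.jpg', 3)]
```

In the notebook, the same pattern would apply if steps 2 and 3 were repeated with different query images.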
