# Demo - from embedding to visualization

You only need to prepare some image data to observe how approximate nearest neighbor search (ANNS) runs on that dataset.

- Embedding
    - towhee
- Index Building
    - faiss - ivf_flat
    - hnswlib - hnsw
- Visualization for ANNS
    - [federpy](https://github.com/zilliztech/feder)
    - [more cases](https://colab.research.google.com/drive/12L_oJPR-yFDlORpPondsqGNTPVsSsUwi#scrollTo=N3qqBAYxYcbt)
    
![image](fig/pipeline.png)

## Embedding - Towhee
- dataset: https://github.com/towhee-io/data/raw/main/image/reverse_image_search.zip
- operator: resnet50

In [None]:
from towhee import pipeline
import numpy as np
from pathlib import Path

embedding_pipeline = pipeline('towhee/image-embedding-resnet50')
dataset_path = "./imageNet_subset/train"

images = []
vectors = []


for img_class_path in Path(dataset_path).glob('*'):
    for img_path in Path(img_class_path).glob('*'):
        vec = embedding_pipeline(str(img_path))
        norm_vec = vec / np.linalg.norm(vec)
        vectors.append(norm_vec.tolist())
        images.append(str(img_path))

vectors_float32 = np.array(vectors, dtype="float32")
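
The loop above L2-normalizes every embedding before storing it. As a minimal sketch of why that matters (plain Python lists and made-up toy vectors, not real ResNet-50 outputs): for unit vectors, the squared Euclidean distance equals `2 - 2 * cosine`, so an L2 index ranks neighbors in the same order as cosine similarity would.

```python
import math

def l2_normalize(vec):
    """Scale vec to unit length (the same operation as vec / np.linalg.norm(vec))."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (hypothetical values chosen for illustration only).
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([1.0, 2.0, 2.0])

lhs = squared_l2(a, b)          # squared L2 distance between unit vectors
rhs = 2.0 - 2.0 * cosine(a, b)  # the same quantity expressed through cosine
print(lhs, rhs)
```

The two printed values agree up to floating-point error.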

In [6]:
# save the vectors and paths
import pandas as pd

d = pd.DataFrame(vectors)
d['image_path'] = images  # one path per row, in the same order as the vectors
d.to_csv('imageNet_subset_names_vectors.csv')

## Index Building - Faiss

In [35]:
import faiss

dim = vectors_float32.shape[1]
nlist = 128
faiss_index = faiss.index_factory(dim, 'IVF%s,Flat' % nlist)
faiss_index.train(vectors_float32)
faiss_index.add(vectors_float32)

# save the index
faiss_index_file_name = 'faiss_image_net.index'
faiss.write_index(faiss_index, faiss_index_file_name)
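
To make the `IVF...,Flat` index less of a black box, here is a rough pure-Python sketch of the idea behind it: a coarse quantizer splits the vectors into `nlist` cells, and a search scans only the `nprobe` cells closest to the query. This is an illustration under simplifying assumptions (toy k-means, 2-d random points, hypothetical helper names), not how Faiss is implemented.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))

def train_centroids(data, nlist, iters=10, seed=0):
    """Toy k-means (Lloyd's algorithm) standing in for the coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, nlist)]
    for _ in range(iters):
        cells = [[] for _ in range(nlist)]
        for v in data:
            cells[nearest(centroids, v)].append(v)
        for c, members in enumerate(cells):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(data, centroids):
    """Inverted lists: cell id -> ids of the vectors assigned to that cell."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(data):
        lists[nearest(centroids, v)].append(i)
    return lists

def ivf_search(query, data, centroids, lists, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(data[i], query))[:k]

rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(200)]
centroids = train_centroids(data, nlist=8)
lists = build_ivf(data, centroids)

query = data[17]
approx = ivf_search(query, data, centroids, lists, k=5, nprobe=2)
exact = ivf_search(query, data, centroids, lists, k=5, nprobe=8)  # every cell: brute force
print(approx, exact)
```

Raising `nprobe` trades speed for recall: with `nprobe` equal to `nlist` the search degenerates to an exhaustive scan.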



## Index Building - Hnswlib

In [34]:
import hnswlib

dim = vectors_float32.shape[1]
max_elements = vectors_float32.shape[0]
hnsw_index = hnswlib.Index(space='l2', dim=dim)
hnsw_index.init_index(max_elements=max_elements, ef_construction=30, M=6)
hnsw_index.add_items(vectors_float32)

# save the index
hnsw_index_file_name = 'hnswlib_image_net.index'
hnsw_index.save_index(hnsw_index_file_name)
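
For intuition about what the HNSW index does at query time, the sketch below runs a best-first search over a single-layer neighbor graph, keeping at most `ef` of the closest nodes seen so far. It is a deliberate simplification: real HNSW builds the graph incrementally, uses several layers, and selects neighbors heuristically, whereas this toy version builds a brute-force k-NN graph. All helper names and the 2-d random data are assumptions made for illustration.

```python
import heapq
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(data, M):
    """Link every point to its M nearest neighbors, in both directions."""
    graph = {i: set() for i in range(len(data))}
    for i, v in enumerate(data):
        others = sorted((j for j in range(len(data)) if j != i),
                        key=lambda j: sq_dist(data[j], v))
        for j in others[:M]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def graph_search(query, data, graph, entry, k, ef):
    """Best-first search that tracks the ef closest nodes found so far."""
    d0 = sq_dist(data[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: current best ef nodes
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # the closest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(data[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest kept node
    best = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in best[:k]]

rng = random.Random(7)
data = [[rng.random(), rng.random()] for _ in range(150)]
graph = build_knn_graph(data, M=6)

query = [0.5, 0.5]
found = graph_search(query, data, graph, entry=0, k=4, ef=6)
print(found)
```

A larger `ef` lets the search keep more candidates alive, which generally improves recall at the cost of more distance computations.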

## Vis for ivf_flat - FederPy

GitHub's notebook viewer cannot display the HTML output. You can open the rendered result at https://alwayslove2013.github.io/feder_case/feder_ivf_flat_image_net.html

In [1]:
from federpy.federpy import FederPy

viewParams = {
    "width": 950,
    "height": 600,
    "mediaType": "img",
    "mediaUrls": images,
    "fineSearchWithProjection": 1,
    "projectMethod": "umap"
}
federPy = FederPy(faiss_index_file_name, 'faiss', **viewParams)
# federPy.overview()
federPy.setSearchParams({"k": 5, "nprobe": 6})
# federPy.searchRandTestVec()
federPy.searchById(833)

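Note: this cell uses `images` from the embedding step. Running it in a fresh kernel raises `NameError: name 'images' is not defined`, so run the earlier cells first.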

In [39]:
feder_ivf_flat_file_name = 'feder_ivf_flat_image_net.html'
useIPythonDisplay = False
with open(feder_ivf_flat_file_name, 'w') as f:
    f.write(federPy.searchById(833, useIPythonDisplay))

## Vis for hnsw - FederPy

GitHub's notebook viewer cannot display the HTML output. You can open the rendered result at https://alwayslove2013.github.io/feder_case/feder_hnsw_image_net.html

In [41]:
from federpy.federpy import FederPy

viewParams = {
    "width": 950,
    "height": 600,
    "mediaType": "img",
    "mediaUrls": images,
}
federPy = FederPy(hnsw_index_file_name, 'hnswlib', **viewParams)
# federPy.overview()
federPy.setSearchParams({"k": 4, "ef_search": 6})
# federPy.searchRandTestVec()
federPy.searchById(833)

In [38]:
feder_hnsw_file_name = 'feder_hnsw_image_net.html'
useIPythonDisplay = False
with open(feder_hnsw_file_name, 'w') as f:
    f.write(federPy.searchById(833, useIPythonDisplay))