# Similarity Search Example

[How to Use FAISS to Build Your First Similarity Search](https://medium.com/loopio-tech/how-to-use-faiss-to-build-your-first-similarity-search-bf0f708aa772)

## Dependencies

In [2]:
!pip install faiss-cpu
!pip install sentence-transformers
!pip install pandas

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple

## Create the Text Data

In [1]:
import pandas as pd

# Each row is a sample text and its category label
data = [['Where are your headquarters located?', 'location'],
        ['Throw my cellphone in the water', 'random'],
        ['Network Access Control?', 'networking'],
        ['Address', 'location']]
df = pd.DataFrame(data, columns=['text', 'category'])

df

|   | text | category |
|---|------|----------|
| 0 | Where are your headquarters located? | location |
| 1 | Throw my cellphone in the water | random |
| 2 | Network Access Control? | networking |
| 3 | Address | location |


## Create the Vectors

In [2]:
from sentence_transformers import SentenceTransformer
text = df['text']
encoder = SentenceTransformer("/models/bge-large-zh-v1.5")
vectors = encoder.encode(text)

print('end')

end


## Build a Faiss Index from the Vectors

In [3]:
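import faiss

# Exact (brute-force) index that ranks by squared L2 distance
vector_dimension = vectors.shape[1]
index = faiss.IndexFlatL2(vector_dimension)
# Normalize the vectors in place to unit length before adding them
faiss.normalize_L2(vectors)
index.add(vectors)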
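
Normalizing before adding to an L2 index makes the ranking equivalent to ranking by cosine similarity: for unit vectors, the squared L2 distance equals `2 - 2 * cosine`. The sketch below checks that identity with NumPy only. The `l2_normalize` helper and the random test vectors are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def l2_normalize(x):
    """Return a row-wise L2-normalized float32 copy of x (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows
    return x / np.maximum(norms, 1e-12)

# Two random vectors standing in for sentence embeddings
rng = np.random.default_rng(0)
a = l2_normalize(rng.normal(size=(1, 8)))
b = l2_normalize(rng.normal(size=(1, 8)))

squared_l2 = float(np.sum((a - b) ** 2))
cosine = float(np.sum(a * b))

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(squared_l2, 2.0 - 2.0 * cosine)
```

A smaller squared distance therefore means a higher cosine similarity, so the nearest neighbors are the same under either measure.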

## Create the Search Vector

In [4]:
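import numpy as np

search_text = 'where is your office?'
search_vector = encoder.encode(search_text)
# Faiss expects a 2-D array of shape (n_queries, dimension)
_vector = np.array([search_vector])
# Normalize the query in place, as was done for the indexed vectors
faiss.normalize_L2(_vector)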

## Search for All Nearest Neighbors

In [5]:
# Retrieve every indexed vector, ranked from nearest to farthest
k = index.ntotal
distances, ann = index.search(_vector, k=k)
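
To make clear what a flat L2 search computes, here is a minimal NumPy sketch of the same idea: compute the squared L2 distance from each query to every stored vector and return the `k` smallest. The function name `flat_l2_search` and the toy 2-D vectors are assumptions for illustration. This is not Faiss's implementation, only an exhaustive search with the same output shape.

```python
import numpy as np

def flat_l2_search(database, queries, k):
    """Exhaustive search: k smallest squared L2 distances per query."""
    database = np.asarray(database, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # Pairwise differences, shape (n_queries, n_database, dimension)
    diff = queries[:, None, :] - database[None, :, :]
    dists = np.sum(diff ** 2, axis=2)
    # Indices of the k nearest database vectors, nearest first
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, order, axis=1), order

# Toy 2-D database and a single query (illustrative values)
toy_db = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]])
toy_query = np.array([[0.8, 0.6]])

toy_distances, toy_ids = flat_l2_search(toy_db, toy_query, k=len(toy_db))
print(toy_ids[0])        # row ids, nearest first
print(toy_distances[0])  # matching squared distances
```

As in the notebook cell, both returned arrays have one row per query and `k` columns.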

## Sort the Results

In [6]:
results = pd.DataFrame({'distances': distances[0], 'ann': ann[0]})

results

|   | distances | ann |
|---|-----------|-----|
| 0 | 0.949018 | 0 |
| 1 | 0.966977 | 3 |
| 2 | 1.356527 | 2 |
| 3 | 1.492573 | 1 |
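
Because both the indexed vectors and the query were L2-normalized, each squared distance `d` in this output can be read as a cosine similarity through `1 - d / 2`. The sketch below applies that conversion to the distances shown above. It assumes the values are squared L2 distances between unit vectors, which is what a flat L2 index over normalized input reports.

```python
# Distances copied from the results table above
squared_distances = [0.949018, 0.966977, 1.356527, 1.492573]

# Valid only when both query and indexed vectors have unit length
cosine_similarities = [1.0 - d / 2.0 for d in squared_distances]

for d, c in zip(squared_distances, cosine_similarities):
    print(f"distance={d:.6f}  cosine={c:.6f}")
```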


## Get the Category of the Search Text

In [7]:
merge = pd.merge(results, df, left_on='ann', right_index=True)

merge

|   | distances | ann | text | category |
|---|-----------|-----|------|----------|
| 0 | 0.949018 | 0 | Where are your headquarters located? | location |
| 1 | 0.966977 | 3 | Address | location |
| 2 | 1.356527 | 2 | Network Access Control? | networking |
| 3 | 1.492573 | 1 | Throw my cellphone in the water | random |
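
The merged table lists candidates but does not yet pick a single category for the search text. One simple option, sketched here as an assumption rather than as the notebook's method, is a majority vote over the `top_k` nearest rows. The `predict_category` helper is hypothetical, and the rows are copied from the output above.

```python
from collections import Counter

# (distance, category) pairs from the merged output, nearest first
ranked = [
    (0.949018, "location"),
    (0.966977, "location"),
    (1.356527, "networking"),
    (1.492573, "random"),
]

def predict_category(ranked_rows, top_k=1):
    """Majority vote over the top_k nearest rows (hypothetical helper)."""
    top = ranked_rows[:top_k]
    votes = Counter(category for _, category in top)
    # On a tie, Counter keeps first-seen order, so the nearer row wins
    return votes.most_common(1)[0][0]

print(predict_category(ranked, top_k=1))
print(predict_category(ranked, top_k=3))
```

With `top_k=1` this reduces to taking the category of the single nearest neighbor.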


## Sequence Diagram

![](SequenceDiagram.png)