In [1]:
!pip install faiss-cpu



In [2]:
# 需要 CUDA 7.5+ 版本支持的 GPU
#!pip install faiss-gpu 

# Faiss 向量数据库

In [9]:
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader

In [4]:
# 实例化文档加载器
loader = TextLoader("real_estate_sales_data.txt")
# 加载文档
documents = loader.load()

In [5]:
# 实例化文本分割器
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# 分割文本
docs = text_splitter.split_documents(documents)

In [7]:
docs

[Document(page_content='1.  \n[客户问题] 这个小区交通便利吗？\n[销售回答] 当然了，这个小区距离地铁站只有几分钟的步行距离，而且附近有多条公交线路，非常方便。\n\n2.  \n[客户问题] 我担心楼下太吵。\n[销售回答] 这个小区特别注重居住体验，我们有良好的隔音设计，并且小区内部规划了绿化区域，可以有效降低噪音。\n\n3.  \n[客户问题] 我看房价还在涨，这个投资回报怎么样？\n[销售回答] 这个区域未来有大量的商业和基础设施建设，所以从长期来看，投资回报非常有保证。\n\n4.  \n[客户问题] 有没有学校？\n[销售回答] 附近有多所优质的学校，非常适合有孩子的家庭。\n\n5.  \n[客户问题] 物业管理怎么样？\n[销售回答] 我们的物业管理得到了业主一致好评，服务非常到位。\n\n6.  \n[客户问题] 我想要南向的房子。\n[销售回答] 很好，我们确实有一些朝南的单位，它们的采光特别好。\n\n7.  \n[客户问题] 这个小区安全吗？\n[销售回答] 当然，我们24小时安保巡逻，还有先进的监控系统。\n\n8.  \n[客户问题] 预计什么时候交房？\n[销售回答] 根据目前的进度，我们预计将在明年底交房。\n\n9.  \n[客户问题] 我不想要一楼的房子。\n[销售回答] 我理解您的顾虑，我们还有多个楼层的房源可以选择。\n\n10.  \n[客户问题] 有优惠吗？\n[销售回答] 当然，如果您现在下订，我们可以给您一些优惠。\n\n11.  \n[客户问题] 你们是否提供按揭服务？\n[销售回答] 是的，我们与多家银行合作，可以帮助您快速办理按揭。\n\n12.  \n[客户问题] 税费怎么算？\n[销售回答] 我们可以提供详细的税费咨询服务，确保您清楚所有费用。\n\n13.  \n[客户问题] 附近有医院吗？\n[销售回答] 是的，附近有多家大型医院，医疗资源非常丰富。\n\n14.  \n[客户问题] 我担心小区会很拥挤。\n[销售回答] 这个小区总体规划非常合理，保证了每个单元之间有足够的空间。\n\n15.  \n[客户问题] 这个小区有游泳池和健身房吗？\n[销售回答] 当然，我们提供全方位的生活设施，包括游泳池和健身房。\n\n16.  \n[客户问题] 我需要两个停车位，怎么办

In [10]:
# OpenAI Embedding 模型
embeddings = OpenAIEmbeddings()

In [11]:
# FAISS 向量数据库，使用 docs 的向量作为初始化存储
db = FAISS.from_documents(docs, embeddings)

In [12]:
# 构造提问 Query
query = "What did the president say about Ketanji Brown Jackson"

## 相似度搜索

In [13]:
# 在 Faiss 中进行相似度搜索，找出与 query 最相似结果
docs = db.similarity_search(query)

In [14]:
# 输出 Faiss 中最相似结果
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


## 持久化存储 Faiss DB

In [12]:
db.save_local("real_estates_sales_data")

## 加载 Faiss DB

In [16]:
new_db = FAISS.load_local("faiss_index", embeddings)

In [17]:
docs = new_db.similarity_search(query)

In [18]:
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
