In [1]:
!pip install faiss-cpu

In [2]:
# Requires a GPU with CUDA 7.5+ support
#!pip install faiss-gpu 

# Faiss Vector Database

In [3]:
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader

In [7]:
# Instantiate the document loader
loader = TextLoader("../jupyter/tests/state_of_the_union.txt")
# Load the document
documents = loader.load()

In [8]:
# Instantiate the text splitter (chunks of up to 1000 characters, no overlap)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# Split the document into chunks
docs = text_splitter.split_documents(documents)
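
To make the effect of `chunk_size=1000, chunk_overlap=0` more concrete, here is a minimal sketch of a greedy character splitter. It is a simplified illustration, not LangChain's actual implementation: the function name `split_text` and the sample text are hypothetical, and it assumes the text is broken on a blank-line separator (`"\n\n"`) before the pieces are packed into chunks.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedy splitter: break on `separator`, then pack pieces into chunks."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it is a single oversized piece
            # that cannot be split further on this separator.
            current = candidate
        else:
            # The chunk is full: close it and start a new one with this piece.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample: six short paragraphs separated by blank lines.
sample = "\n\n".join(f"Paragraph {i}. " + "x" * 40 for i in range(6))
chunks = split_text(sample, chunk_size=120)
print(len(chunks), [len(c) for c in chunks])
```

Because there is no overlap, joining the chunks with the separator reproduces the original text. A single paragraph longer than `chunk_size` is kept whole in this sketch, so chunks can occasionally exceed the limit.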

In [9]:
docs

[Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source'

In [10]:
# OpenAI embedding model
embeddings = OpenAIEmbeddings()

In [11]:
# FAISS vector store, initialized with the embedding vectors of docs
db = FAISS.from_documents(docs, embeddings)

In [12]:
# Build the query
query = "What did the president say about Ketanji Brown Jackson"

## Similarity Search

In [13]:
# Run a similarity search in Faiss to find the chunks most similar to the query
docs = db.similarity_search(query)

In [14]:
# Print the most similar result returned by Faiss
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
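
The search above can be pictured as a nearest-neighbor lookup over embedding vectors. The sketch below is a toy, brute-force version under stated assumptions: the helper names, the 3-dimensional vectors, and the labels are all hypothetical, and it ranks by Euclidean (L2) distance, which to my knowledge is the default metric of LangChain's FAISS wrapper. It does not call Faiss itself.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(query_vec, store, k=4):
    """Return the k texts whose vectors are closest to query_vec."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real embedding models use far more dimensions.
store = [
    ("voting rights", [0.9, 0.1, 0.0]),
    ("supreme court nomination", [0.1, 0.9, 0.1]),
    ("infrastructure", [0.0, 0.2, 0.9]),
]
query_vec = [0.2, 0.8, 0.0]
results = similarity_search(query_vec, store, k=2)
print(results)
```

The result list is ordered from closest to farthest, which is why `docs[0]` in the notebook is the best match.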


## Persisting the Faiss DB

In [15]:
db.save_local("faiss_index")

## Loading the Faiss DB

In [16]:
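# Note: newer LangChain releases also require allow_dangerous_deserialization=True here,
# because part of the saved index is stored with pickle; only load files you trust.
new_db = FAISS.load_local("faiss_index", embeddings)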

In [17]:
docs = new_db.similarity_search(query)

In [18]:
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
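
One way to see why `load_local` takes `embeddings` again: the saved index holds the document vectors and texts, but each new query still has to be embedded at search time with the same model. The sketch below is a toy round trip under stated assumptions. It uses a JSON file instead of Faiss's own on-disk format, and `toy_embed`, `save_store`, `load_store`, and `search` are hypothetical helpers, not LangChain or Faiss APIs.

```python
import json
import os
import tempfile


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts of a, e, o."""
    lowered = text.lower()
    return [lowered.count(ch) for ch in "aeo"]


def save_store(path, texts):
    # Persist both the vectors and the texts they were computed from.
    records = [{"text": t, "vector": toy_embed(t)} for t in texts]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_store(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def search(records, query, embed):
    # The query is embedded at search time, so the same model is needed again.
    q = embed(query)

    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(q, record["vector"]))

    return min(records, key=squared_distance)["text"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_index.json")
    save_store(path, ["banana bread", "green tea", "moon rock"])
    reloaded = load_store(path)
    best = search(reloaded, "a sweet banana", toy_embed)

print(best)
```

If a different embedding function were passed to `search` after reloading, the query vector would no longer be comparable to the stored vectors, which is the same reason the notebook reuses the original `embeddings` object.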
