# Vector stores
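One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors. At query time, you embed the unstructured query and retrieve the stored vectors that are "most similar" to the embedded query. A vector store takes care of storing the embedded data and performing the vector search for you.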
<img src="https://python.langchain.com/assets/images/vector_stores-125d1675d58cfb46ce9054c9019fea72.jpg" alt="">
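
To make the embed, store, and retrieve loop concrete, here is a minimal sketch in plain Python. It is an illustration only: `toy_embed`, `ToyVectorStore`, and the `VOCAB` list are hypothetical names invented here, the "embedding" is a simple word count rather than a real embedding model, and the search is brute-force cosine similarity rather than the index a real vector store would use.

```python
import math

# Hypothetical fixed vocabulary; a real embedding model learns dense vectors instead.
VOCAB = ["court", "justice", "vote", "election", "economy", "jobs"]


def toy_embed(text: str) -> list[float]:
    """Map text to a vector of word counts over VOCAB (illustration only)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Stores (text, vector) pairs and searches them by brute force."""

    def __init__(self) -> None:
        self._texts: list[str] = []
        self._vectors: list[list[float]] = []

    def add_texts(self, texts: list[str]) -> None:
        # Embed each text once at insert time and keep the vector next to it.
        for text in texts:
            self._texts.append(text)
            self._vectors.append(toy_embed(text))

    def similarity_search_by_vector(self, vector: list[float], k: int = 1) -> list[str]:
        # Score every stored vector against the query vector, highest first.
        scored = [
            (cosine_similarity(vector, stored), text)
            for text, stored in zip(self._texts, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        # Searching by string is just: embed the query, then search by vector.
        return self.similarity_search_by_vector(toy_embed(query), k=k)


store = ToyVectorStore()
store.add_texts([
    "the justice joined the court",
    "every vote counts in an election",
    "the economy added jobs",
])
top_hit = store.similarity_search("who sits on the court", k=1)
print(top_hit)
```

The query shares only the word "court" with the first text and nothing with the other two, so the first text is the one returned.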

In [2]:
from dotenv import load_dotenv, find_dotenv
from langchain.globals import set_debug
import os
load_dotenv(find_dotenv())
set_debug(False)

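This walkthrough showcases basic functionality related to vector stores. A key part of working with vector stores is creating the vectors to put in them, which is usually done via embeddings. It is therefore recommended that you familiarize yourself with the text embedding model interfaces before diving in.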

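There are many great vector store options. Some are free, open source, and run entirely on your local machine, such as Chroma, which is used in the next cell. For hosted offerings, see the full list of LangChain integrations.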

In [7]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('data/state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

## Similarity search


In [8]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


## Similarity search by vector
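You can also use `similarity_search_by_vector` to search for documents similar to a given embedding vector. This method takes an embedding vector as its argument instead of a string.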



In [9]:
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


### Asynchronous operations
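Vector stores usually run as a separate service that requires some I/O operations, so they may be called asynchronously. This improves performance because you do not waste time waiting for responses from the external service. It can also matter if you work with an asynchronous framework such as FastAPI.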

LangChain supports async operation on vector stores. All the methods can be called through their async counterparts, which carry the prefix `a`, meaning async.
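
The benefit of the async variants is easiest to see with several requests in flight at once. The sketch below is a toy under stated assumptions: `ToySlowStore` is a hypothetical class invented for illustration, and its "network latency" is simulated with sleeps. It only mirrors the naming convention (a sync method and its `a`-prefixed counterpart), not any real vector store client.

```python
import asyncio
import time


class ToySlowStore:
    """Hypothetical store whose searches wait on a simulated network call."""

    def __init__(self, latency: float = 0.1) -> None:
        self.latency = latency

    def similarity_search(self, query: str) -> str:
        time.sleep(self.latency)  # blocks the whole thread while "waiting"
        return f"result for {query!r}"

    async def asimilarity_search(self, query: str) -> str:
        await asyncio.sleep(self.latency)  # yields control while "waiting"
        return f"result for {query!r}"


async def main() -> tuple[list[str], float]:
    store = ToySlowStore()
    queries = ["courts", "voting", "economy"]
    start = time.perf_counter()
    # The three simulated requests overlap instead of running back to back.
    results = await asyncio.gather(*(store.asimilarity_search(q) for q in queries))
    elapsed = time.perf_counter() - start
    return list(results), elapsed


results, elapsed = asyncio.run(main())
print(results)
print(f"3 concurrent searches took about {elapsed:.2f}s")
```

Run as a script, the three searches should finish in roughly the time of one, because the waits overlap. Inside a notebook, which already runs an event loop, call `await main()` instead of `asyncio.run(main())`.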

Qdrant is a vector store that supports all the async operations, so it is used in this walkthrough.

In [14]:
from langchain_community.vectorstores import Qdrant
embeddings = OpenAIEmbeddings()
db = await Qdrant.afrom_documents(documents, embeddings)
# db = await Qdrant.afrom_documents(documents, embeddings, "http://localhost:6333")

**Similarity search**


In [15]:
query = "What did the president say about Ketanji Brown Jackson"
docs = await db.asimilarity_search(query)
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


**Similarity search by vector**

In [None]:
embedding_vector = embeddings.embed_query(query)
docs = await db.asimilarity_search_by_vector(embedding_vector)

### Maximum marginal relevance search (MMR)
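Maximal marginal relevance optimizes for similarity to the query and for diversity among the selected documents. It is also supported in the async API.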

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
found_docs = await db.amax_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
    print(f"{i + 1}.", doc.page_content, "\n")
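
To show how MMR trades relevance against diversity, here is a small self-contained sketch of the greedy selection rule on toy 2-D vectors. It is not LangChain's implementation: `mmr_select` and the example vectors are hypothetical, and the `lambda_mult` weight (1.0 means pure relevance, 0.0 means pure diversity) is named after the LangChain parameter only for familiarity. In the call above, `fetch_k` is the number of candidates fetched by plain similarity before a reranking step of this kind picks `k` of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_index, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine_similarity(query_vec, candidate_vecs[i])
            # Penalize similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine_similarity(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


query_vec = [1.0, 0.0]
candidate_vecs = [
    [1.0, 0.1],    # very relevant
    [1.0, 0.11],   # near-duplicate of the first
    [0.6, -0.8],   # less relevant, but points in a different direction
]
picked = mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5)
plain_top2 = sorted(
    range(len(candidate_vecs)),
    key=lambda i: cosine_similarity(query_vec, candidate_vecs[i]),
    reverse=True,
)[:2]
print("MMR picks:", picked)
print("Plain similarity picks:", plain_top2)
```

Plain similarity ranking returns the two near-duplicates (indices 0 and 1), while MMR keeps the most relevant vector and then prefers the dissimilar one (indices 0 and 2).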