<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/LanceDBIndexDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# LanceDB Vector Store
In this notebook, we show how to use [LanceDB](https://www.lancedb.com) to perform vector searches in LlamaIndex.


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.


In [None]:
%pip install llama-index-vector-stores-lancedb

In [None]:
!pip install llama-index

In [None]:
import logging
import sys

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import SimpleDirectoryReader, Document, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.lancedb import LanceDBVectorStore
import textwrap

### Setup OpenAI
The first step is to configure the OpenAI key. It will be used to create embeddings for the documents loaded into the index.


In [None]:
import openai

openai.api_key = ""
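
Hardcoding a key in a notebook makes it easy to leak by accident. Below is a minimal sketch of one alternative, assuming the key has been exported in an environment variable named `OPENAI_API_KEY`. The helper `resolve_api_key` is hypothetical and exists only for illustration; it is not part of the OpenAI or LlamaIndex APIs.

```python
import os


def resolve_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored under `var_name`, or raise if it is missing.

    Hypothetical helper for illustration only. `env` defaults to the real
    process environment; a plain dict can be passed instead for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Demonstration with a stand-in mapping instead of the real environment.
demo_key = resolve_api_key({"OPENAI_API_KEY": "sk-placeholder"})
```

In the notebook, `openai.api_key = resolve_api_key()` could then take the place of the empty string above.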

### Download Data


In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

### Loading documents
Load the documents stored in `data/paul_graham/` using the SimpleDirectoryReader.


In [None]:
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].hash)

Document ID: 855fe1d1-1c1a-4fbe-82ba-6bea663a5920 Document Hash: 4c702b4df575421e1d1af4b1fd50511b226e0c9863dbfffeccb8b689b8448f35
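
The 64-character hexadecimal hash printed above has the same shape as a SHA-256 digest. As a rough sketch of why a content hash is useful, the snippet below assumes the hash is derived from the document text alone; how LlamaIndex actually computes it is not shown in this notebook, and `content_hash` is a hypothetical helper.

```python
import hashlib


def content_hash(text):
    """SHA-256 hex digest of the text (assumed scheme, for illustration)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = content_hash("The sky is purple in Portland, Maine")
unchanged = content_hash("The sky is purple in Portland, Maine")
edited = content_hash("The sky is blue in Portland, Maine")

print(len(original))          # 64 hexadecimal characters
print(original == unchanged)  # True: same content, same hash
print(original == edited)     # False: any edit changes the hash
```

Comparing hashes like this is a cheap way to tell whether a document changed since it was last indexed.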


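### Create the index
Here we create an index backed by LanceDB using the documents loaded previously. LanceDBVectorStore takes a few arguments:
- uri (str, required): Location where LanceDB will store its files.
- table_name (str, optional): Name of the table where the embeddings will be stored. Defaults to "vectors".
- nprobes (int, optional): The number of probes used. A higher number makes the search more accurate but also slower. Defaults to 20.
- refine_factor (int, optional): Refine the results by reading extra elements and re-ranking them in memory. Defaults to None.

- More details can be found in the [LanceDB docs](https://lancedb.github.io/lancedb/ann_indexes).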
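
To build some intuition for `nprobes` and `refine_factor`, here is a toy, standard-library sketch of partition-based approximate search. It is not LanceDB's implementation: the fixed centroids, the rounding-based `quantize` function, and every helper name are assumptions made for illustration.

```python
import math
import random


def build_partitions(vectors, centroids):
    """Assign every vector index to the partition of its nearest centroid."""
    partitions = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        partitions[nearest].append(i)
    return partitions


def quantize(v, step=0.5):
    """Crude stand-in for vector compression: snap coordinates to a grid."""
    return tuple(round(x / step) * step for x in v)


def search(query, vectors, centroids, partitions, k, nprobes, refine_factor=None):
    # 1. Probe only the `nprobes` partitions whose centroids are closest.
    by_distance = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in by_distance[:nprobes] for i in partitions[c]]
    # 2. Rank the candidates by approximate distance on quantized vectors.
    candidates.sort(key=lambda i: math.dist(query, quantize(vectors[i])))
    if refine_factor is None:
        return candidates[:k]
    # 3. Refine: keep k * refine_factor candidates and re-rank them
    #    using exact distances on the original vectors.
    shortlist = candidates[: k * refine_factor]
    shortlist.sort(key=lambda i: math.dist(query, vectors[i]))
    return shortlist[:k]


random.seed(0)
vectors = [(random.uniform(0, 4), random.uniform(0, 4)) for _ in range(200)]
centroids = [(1, 1), (1, 3), (3, 1), (3, 3)]
partitions = build_partitions(vectors, centroids)
query, k = (2.05, 1.9), 3

# Brute-force baseline: exact nearest neighbours over all vectors.
exact = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))[:k]
# Fast but approximate: one partition probed, no refinement.
fast = search(query, vectors, centroids, partitions, k, nprobes=1)
# Thorough: every partition probed and every candidate re-ranked exactly.
thorough = search(
    query, vectors, centroids, partitions, k, nprobes=4, refine_factor=len(vectors)
)

print("exact:   ", exact)
print("fast:    ", fast)
print("thorough:", thorough)
```

Probing all partitions and refining all candidates reproduces the brute-force result at brute-force cost. Smaller values trade some accuracy for speed, which is the same trade-off the parameters above describe.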


In [None]:
vector_store = LanceDBVectorStore(uri="/tmp/lancedb")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

### Query the index
We can now ask questions using our index.


In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("How much did Viaweb charge per month?")

In [None]:
print(textwrap.fill(str(response), 100))

Viaweb charged $100 per month for a small store and $300 per month for a big one.


In [None]:
response = query_engine.query("What did the author do growing up?")

In [None]:
print(textwrap.fill(str(response), 100))

The author worked on writing and programming outside of school before college. They wrote short
stories and tried writing programs on the IBM 1401 computer. They also mentioned getting a
microcomputer, a TRS-80, and started programming on it.


### Appending data
You can also add data to an existing index.


In [None]:
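del index

# Build a fresh index backed by a separate LanceDB dataset.
vector_store = LanceDBVectorStore(uri="/tmp/new_dataset")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="The sky is purple in Portland, Maine")],
    storage_context=storage_context,
)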

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("Where is the sky purple?")
print(textwrap.fill(str(response), 100))

The sky is purple in Portland, Maine.


In [None]:
for doc in documents:
    index.insert(doc)  # append to the existing index instead of rebuilding it

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What companies did the author start?")
print(textwrap.fill(str(response), 100))

The author started two companies: Viaweb and Y Combinator.
