<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/existing_data/weaviate_existing_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Guide: Using a Vector Store Index with an Existing Weaviate Vector Store


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.


In [None]:
%pip install llama-index-vector-stores-weaviate
%pip install llama-index-embeddings-openai

In [None]:
%pip install llama-index

In [None]:
import weaviate

In [None]:
# Connect using the v3-style client API (weaviate.Client); replace the URL with your own cluster
client = weaviate.Client("https://test-cluster-bbn8vqsn.weaviate.network")

## Prepare a sample "existing" Weaviate vector store


### Define the schema
We create a schema for a "Book" class with four properties: title (str), author (str), content (str), and year (int).
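To make the property types concrete, here is a small stdlib-only sketch that checks a book dict against the "Book" property list before it is inserted. The `WEAVIATE_TO_PYTHON` mapping and the `validate_book` helper are hypothetical: they cover only the two Weaviate data types used in this notebook and are not part of the Weaviate client.

```python
# Hypothetical mapping from the two Weaviate data types used in this schema
# to the Python types we expect in each book dict.
WEAVIATE_TO_PYTHON = {"text": str, "int": int}

# Mirrors the properties of the "Book" class in this notebook.
BOOK_PROPERTIES = [
    {"name": "title", "dataType": ["text"]},
    {"name": "author", "dataType": ["text"]},
    {"name": "content", "dataType": ["text"]},
    {"name": "year", "dataType": ["int"]},
]


def validate_book(book: dict) -> list[str]:
    """Return a list of problems; an empty list means the book matches the schema."""
    problems = []
    for prop in BOOK_PROPERTIES:
        name = prop["name"]
        expected = WEAVIATE_TO_PYTHON[prop["dataType"][0]]
        if name not in book:
            problems.append(f"missing property: {name}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif not isinstance(book[name], expected) or isinstance(book[name], bool):
            problems.append(f"{name} should be {expected.__name__}")
    unknown = set(book) - {p["name"] for p in BOOK_PROPERTIES}
    problems.extend(f"unknown property: {name}" for name in sorted(unknown))
    return problems


print(validate_book({"title": "1984", "author": "George Orwell",
                     "content": "...", "year": 1949}))  # []
print(validate_book({"title": "1984", "year": "1949"}))
# ['missing property: author', 'missing property: content', 'year should be int']
```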


In [None]:
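# Drop any existing "Book" class so the notebook can be re-run from scratch
try:
    client.schema.delete_class("Book")
except Exception:
    # Ignore errors, e.g. when the class does not exist yet
    pass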

In [None]:
schema = {
    "classes": [
        {
            "class": "Book",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "author", "dataType": ["text"]},
                {"name": "content", "dataType": ["text"]},
                {"name": "year", "dataType": ["int"]},
            ],
        },
    ]
}

# Create the class only if this schema is not already present
if not client.schema.contains(schema):
    client.schema.create(schema)

### Define sample data
We create four sample books.


In [None]:
books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": (
            "To Kill a Mockingbird is a novel by Harper Lee published in"
            " 1960..."
        ),
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": (
            "1984 is a dystopian novel by George Orwell published in 1949..."
        ),
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": (
            "The Great Gatsby is a novel by F. Scott Fitzgerald published in"
            " 1925..."
        ),
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": (
            "Pride and Prejudice is a novel by Jane Austen published in"
            " 1813..."
        ),
        "year": 1813,
    },
]

### Add data
We add the sample books to our Weaviate "Book" class, storing an embedding of each book's `content` field as its vector.
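Ingestion pairs each book object with a vector computed from its `content` field only. If you want to exercise that pairing without an OpenAI API key, a deterministic toy embedding can stand in for `embed_model.get_text_embedding`. Both `toy_embed` and `prepare_batch` are hypothetical helpers for this sketch; the toy vectors are derived from a hash and carry no semantic meaning, so they are only useful for testing the plumbing, not for retrieval quality.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for embed_model.get_text_embedding: a deterministic,
    unit-length vector derived from a SHA-256 digest (no semantic meaning)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [b - 127.5 for b in digest[:dim]]  # never zero, so the norm is > 0
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def prepare_batch(books: list[dict], embed=toy_embed) -> list[tuple[dict, list[float]]]:
    """Pair each book object with the embedding of its "content" field,
    i.e. the (data_object, vector) arguments passed to batch.add_data_object."""
    return [(book, embed(book["content"])) for book in books]


sample = [{"title": "1984", "author": "George Orwell",
           "content": "1984 is a dystopian novel...", "year": 1949}]
for obj, vec in prepare_batch(sample):
    print(obj["title"], len(vec))  # 1984 8
```

Passing `embed=embed_model.get_text_embedding` instead would produce the same pairs the batch loop sends to Weaviate.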


In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding

# Reads the API key from the OPENAI_API_KEY environment variable
embed_model = OpenAIEmbedding()

In [None]:
with client.batch as batch:
    for book in books:
        # Embed only the content field; the whole book dict is stored as properties
        vector = embed_model.get_text_embedding(book["content"])
        batch.add_data_object(
            data_object=book, class_name="Book", vector=vector
        )

## Query the existing Weaviate vector store


In [None]:
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node

You must set `index_name` to the name of the Weaviate class you want to query, and choose one of that class's properties as the text field via `text_key`.
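Before constructing the store, you can sanity-check both values against the schema dict used to create the class. `check_store_config` is a hypothetical helper for this sketch, not part of the LlamaIndex or Weaviate APIs; the `schema` dict is repeated from earlier in the notebook so the snippet runs on its own.

```python
def check_store_config(schema: dict, index_name: str, text_key: str) -> None:
    """Raise ValueError if index_name is not a class in `schema`, or if text_key
    is not a "text" property of that class (hypothetical helper)."""
    classes = {c["class"]: c for c in schema["classes"]}
    if index_name not in classes:
        raise ValueError(f"no class named {index_name!r}; found {sorted(classes)}")
    props = {p["name"]: p["dataType"] for p in classes[index_name]["properties"]}
    if props.get(text_key) != ["text"]:
        raise ValueError(f"{text_key!r} is not a text property of {index_name!r}")


# Same schema as the one used to create the "Book" class
schema = {
    "classes": [
        {
            "class": "Book",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "author", "dataType": ["text"]},
                {"name": "content", "dataType": ["text"]},
                {"name": "year", "dataType": ["int"]},
            ],
        },
    ]
}

check_store_config(schema, index_name="Book", text_key="content")  # passes silently
```

With `text_key="year"` the check raises, because `year` is an `int` property rather than text.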


In [None]:
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="Book", text_key="content"
)

In [None]:
retriever = VectorStoreIndex.from_vector_store(vector_store).as_retriever(
    similarity_top_k=1
)

In [None]:
nodes = retriever.retrieve("What is that book about a bird again?")

Let's inspect the retrieved node. The book data is loaded as a LlamaIndex `Node` object, with the `content` property as its main text.
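The printed `Text:` line starts with `author: ... title: ... year: ...` before the book content, which suggests the metadata is rendered as `key: value` lines ahead of the main text when the node content is displayed. `render_node_text` is a hypothetical sketch of that layout, inferred from the printed output rather than taken from LlamaIndex's source.

```python
def render_node_text(content: str, metadata: dict) -> str:
    """Hypothetical sketch: one "key: value" line per metadata entry,
    a blank line, then the node's main text."""
    metadata_str = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return f"{metadata_str}\n\n{content}"


text = render_node_text(
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
    {"author": "Harper Lee", "title": "To Kill a Mockingbird", "year": 1960},
)
# Collapsing newlines to spaces mirrors the flattened text in the printed output
print(text.replace("\n", " "))
# author: Harper Lee title: To Kill a Mockingbird year: 1960  To Kill a Mockingbird is a novel by Harper Lee published in 1960...
```

The double space before "To Kill" comes from the blank line separating metadata and content.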


In [None]:
pprint_source_node(nodes[0])

Document ID: cf927ce7-0672-4696-8aae-7e77b33b9659
Similarity: None
Text: author: Harper Lee title: To Kill a Mockingbird year: 1960  To
Kill a Mockingbird is a novel by Harper Lee published in 1960......


The remaining properties are loaded as metadata (stored in `metadata`).
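Conceptually, each stored Weaviate object is split on `text_key`: that property becomes the node text and every other property goes into `metadata`. `split_object` is a hypothetical sketch of that split, not LlamaIndex's actual conversion code (which may also handle other stored fields); applied to the Harper Lee book, it yields the same metadata as the notebook output.

```python
def split_object(properties: dict, text_key: str) -> tuple[str, dict]:
    """Hypothetical sketch: return (node_text, metadata) for one stored object."""
    if text_key not in properties:
        raise KeyError(f"object has no {text_key!r} property")
    text = properties[text_key]
    # Every property except the text field becomes metadata
    metadata = {k: v for k, v in properties.items() if k != text_key}
    return text, metadata


book = {
    "title": "To Kill a Mockingbird",
    "author": "Harper Lee",
    "content": "To Kill a Mockingbird is a novel by Harper Lee published in 1960...",
    "year": 1960,
}
text, metadata = split_object(book, text_key="content")
print(metadata)  # {'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'year': 1960}
```

The key order here follows the input dict and differs from the notebook output, but the two dicts compare equal.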


In [None]:
nodes[0].node.metadata

{'author': 'Harper Lee', 'title': 'To Kill a Mockingbird', 'year': 1960}