<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


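# Managed Index with Zilliz Cloud Pipelines

[Zilliz Cloud Pipelines](https://docs.zilliz.com/docs/pipelines) is a scalable API service for retrieval. You can use Zilliz Cloud Pipelines as a managed index in `llama-index`. The service transforms documents into vector embeddings and stores them in Zilliz Cloud for effective semantic search.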

## Setup

1. Install the LlamaIndex dependencies


In [None]:
%pip install llama-index-indices-managed-zilliz

In [None]:
%pip install llama-index

2. Configure the credentials of your [Zilliz Cloud](https://cloud.zilliz.com/signup?utm_source=twitter&utm_medium=social%20&utm_campaign=2023-12-22_social_pipeline-llamaindex_twitter) account.


In [None]:
from getpass import getpass

ZILLIZ_PROJECT_ID = getpass("Enter your Zilliz Project ID:")
ZILLIZ_CLUSTER_ID = getpass("Enter your Zilliz Cluster ID:")
ZILLIZ_TOKEN = getpass("Enter your Zilliz API Key:")

> [Find your OpenAI API key](https://beta.openai.com/account/api-keys)
>
> [Find your Zilliz Cloud credentials](https://docs.zilliz.com/docs/on-zilliz-cloud-console)
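
When the notebook runs unattended, interactive prompts are inconvenient. The sketch below shows one possible way to read each credential from an environment variable and fall back to `getpass` only when the variable is missing. The helper name `read_secret` and any environment variable names you pass to it are hypothetical; they are not part of `llama-index` or Zilliz Cloud.

```python
import os
from getpass import getpass


def read_secret(env_var, prompt):
    """Return the value of `env_var` if it is set and non-empty, else prompt for it.

    `env_var` is whatever name your own deployment uses (an assumption of
    this sketch, not something Zilliz Cloud defines).
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)
```

For example, `ZILLIZ_TOKEN = read_secret("ZILLIZ_TOKEN", "Enter your Zilliz API Key:")` would behave like the cell above when the variable is unset.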


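## Indexing documents

> Adding metadata to each document is optional. The metadata can be used to filter document data during retrieval.

### From Signed URL

Zilliz Cloud Pipelines accepts files from AWS S3 and Google Cloud Storage. You can generate a pre-signed URL from the object storage and use `from_document_url()` to ingest the file. It automatically indexes the document and stores the document chunks as vectors on Zilliz Cloud.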
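
Pre-signed URLs expire, so it can be worth checking one before handing it to the ingestion call. The following is a minimal sketch, assuming the URL uses the V4 signing query parameters (`X-Amz-Date` / `X-Amz-Expires` on AWS S3, `X-Goog-Date` / `X-Goog-Expires` on Google Cloud Storage). The helper names are hypothetical, and a URL without these parameters is simply treated as public here, which is an assumption of the sketch.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the UTC expiry time of a V4 signed URL, or None if it is unsigned."""
    query = parse_qs(urlparse(url).query)
    for prefix in ("X-Amz", "X-Goog"):  # AWS S3 and Google Cloud Storage
        date_values = query.get(f"{prefix}-Date")
        expires_values = query.get(f"{prefix}-Expires")
        if date_values and expires_values:
            # The signing timestamp is in ISO 8601 basic format, always UTC.
            signed_at = datetime.strptime(
                date_values[0], "%Y%m%dT%H%M%SZ"
            ).replace(tzinfo=timezone.utc)
            return signed_at + timedelta(seconds=int(expires_values[0]))
    return None


def is_url_usable(url, now=None):
    """Return True if the URL is unsigned or its signature has not expired yet."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return True  # assumption: no signature parameters means a public URL
    now = now or datetime.now(timezone.utc)
    return now < expiry
```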


In [None]:
from llama_index.indices.managed.zilliz import ZillizCloudPipelineIndex

# Create pipelines: skip this step if you already have valid pipelines ready.
pipeline_ids = ZillizCloudPipelineIndex.create_pipelines(
    project_id=ZILLIZ_PROJECT_ID,
    cluster_id=ZILLIZ_CLUSTER_ID,
    api_key=ZILLIZ_TOKEN,
    data_type="doc",
    collection_name="zcp_llamalection_doc",  # change this value to customize the collection name
    metadata_schema={"user_id": "VarChar"},
)
print(pipeline_ids)

{'INGESTION': 'pipe-d639f220f27320e2e381de', 'SEARCH': 'pipe-47bd43fe8fd54502874a08', 'DELETION': 'pipe-bd434c99e064282f1a28e8'}
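
If you skip pipeline creation and reuse existing pipelines, you need a dictionary with the same shape as the output above. The sketch below is one possible sanity check before passing such a dictionary as `pipeline_ids`. The helper `check_pipeline_ids` is hypothetical, and it assumes you want all three pipeline types that `create_pipelines()` returned here.

```python
# The three keys seen in the create_pipelines() output above.
REQUIRED_PIPELINE_TYPES = ("INGESTION", "SEARCH", "DELETION")


def check_pipeline_ids(pipeline_ids):
    """Return a copy of `pipeline_ids`, raising if a pipeline type is missing or empty."""
    missing = [key for key in REQUIRED_PIPELINE_TYPES if not pipeline_ids.get(key)]
    if missing:
        raise ValueError(f"Missing pipeline IDs for: {', '.join(missing)}")
    return dict(pipeline_ids)


# IDs copied from the example output above; replace them with your own.
existing_pipeline_ids = check_pipeline_ids(
    {
        "INGESTION": "pipe-d639f220f27320e2e381de",
        "SEARCH": "pipe-47bd43fe8fd54502874a08",
        "DELETION": "pipe-bd434c99e064282f1a28e8",
    }
)
```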


In [None]:
zcp_doc_index = ZillizCloudPipelineIndex.from_document_url(
    # a public or pre-signed URL of a file stored on AWS S3 or Google Cloud Storage
    url="https://publicdataset.zillizcloud.com/milvus_doc.md",
    pipeline_ids=pipeline_ids,
    api_key=ZILLIZ_TOKEN,
    metadata={
        "user_id": "user-001"
    },  # optional, can be used for filtering
)

# # Delete documents by doc name.
# zcp_doc_index.delete_by_expression(expression="doc_name == 'milvus_doc_22.md'")

### From Document Nodes

Zilliz Cloud Pipelines also supports text as data input. The following example prepares data with a sample document node.


In [None]:
from llama_index.core import Document
from llama_index.indices.managed.zilliz import ZillizCloudPipelineIndex

# Prepare documents.
documents = [Document(text="The number that is being searched for is ten.")]

# Create pipelines: skip this step if you already have valid pipelines ready.
pipeline_ids = ZillizCloudPipelineIndex.create_pipelines(
    project_id=ZILLIZ_PROJECT_ID,
    cluster_id=ZILLIZ_CLUSTER_ID,
    api_key=ZILLIZ_TOKEN,
    data_type="text",
    collection_name="zcp_llamalection_text",  # change this value to customize the collection name
)
print(pipeline_ids)

{'INGESTION': 'pipe-2bbab10f273a57eb987024', 'SEARCH': 'pipe-e1914a072ec5e6f83e446a', 'DELETION': 'pipe-72bbabf273a51af0b0c447'}


In [None]:
zcp_text_index = ZillizCloudPipelineIndex.from_documents(
    # the in-memory documents prepared above, ingested as text
    documents=documents,
    pipeline_ids=pipeline_ids,
    api_key=ZILLIZ_TOKEN,
)

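## Working as Query Engine

To conduct semantic search with `ZillizCloudPipelineIndex`, call `as_query_engine()` and specify a few parameters:
- **search_top_k**: How many text nodes/chunks to retrieve. Optional, defaults to `DEFAULT_SIMILARITY_TOP_K` (2).
- **filters**: Metadata filters. Optional, defaults to None.
- **output_metadata**: Which metadata fields to return with the retrieved text nodes. Optional, defaults to [].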
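
To make the three parameters concrete, here is a toy, in-memory illustration of what each one controls. It is not how Zilliz Cloud Pipelines implements search: the function `toy_retrieve`, the plain-dict filter format, the precomputed scores, and the filter-then-rank order are all assumptions made for this sketch.

```python
def toy_retrieve(scored_chunks, search_top_k=2, filters=None, output_metadata=None):
    """Mimic the effect of the three parameters on a list of (score, text, metadata).

    Higher scores are assumed to mean "more similar" in this toy model.
    """
    filters = filters or {}
    output_metadata = output_metadata or []
    # filters: keep only chunks whose metadata matches every key/value pair.
    matching = [
        chunk
        for chunk in scored_chunks
        if all(chunk[2].get(key) == value for key, value in filters.items())
    ]
    # search_top_k: keep the highest-scoring chunks only.
    best = sorted(matching, key=lambda chunk: chunk[0], reverse=True)[:search_top_k]
    # output_metadata: return only the requested metadata fields with each text.
    return [
        (text, {key: meta[key] for key in output_metadata if key in meta})
        for _, text, meta in best
    ]


chunks = [
    (0.9, "chunk a", {"user_id": "user_002"}),
    (0.8, "chunk b", {"user_id": "user-001"}),
    (0.7, "chunk c", {"user_id": "user_002"}),
    (0.1, "chunk d", {"user_id": "user_002"}),
]
results = toy_retrieve(
    chunks, search_top_k=2, filters={"user_id": "user_002"}, output_metadata=["user_id"]
)
```

With the defaults (no filters, no output metadata), the same call returns the two best chunks regardless of tenant and with empty metadata.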


In [None]:
import os

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")

In [None]:
query_engine = zcp_doc_index.as_query_engine(search_top_k=3)

The query engine is then ready for semantic search or Retrieval-Augmented Generation over the Milvus 2.3 documents:

- **Retrieve** (semantic search powered by Zilliz Cloud Pipelines):


In [None]:
question = "Can users delete entities by filtering non-primary fields?"
retrieved_nodes = query_engine.retrieve(question)
print(retrieved_nodes)

[NodeWithScore(node=TextNode(id_='449755997496672548', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='# Delete Entities\nThis topic describes how to delete entities in Milvus.  \nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.  \nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance.  \nBefore deleting entities by comlpex boolean expressions, make sure the collection has been loaded.\nDeleting entities by complex boolean expressions is not an atomic ope

- **Query** (RAG with Zilliz Cloud Pipelines as the retriever and OpenAI's LLM):


In [None]:
response = query_engine.query(question)
print(response.response)

Users can delete entities by filtering non-primary fields using complex boolean expressions in Milvus.


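## Multi-Tenancy

By tagging documents with a tenant-specific value (e.g. a user ID) as metadata, the managed index can achieve multi-tenancy by applying metadata filters.

When a metadata value is specified at ingestion, each document is tagged with the tenant-specific field.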


In [None]:
zcp_doc_index._insert_doc_url(
    url="https://publicdataset.zillizcloud.com/milvus_doc_22.md",
    metadata={"user_id": "user_002"},
)

{'token_usage': 984, 'doc_name': 'milvus_doc_22.md', 'num_chunks': 3}

The managed index can then build a query engine for each tenant by filtering on the tenant-specific field.


In [None]:
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

query_engine_for_user_002 = zcp_doc_index.as_query_engine(
    search_top_k=3,
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="user_id", value="user_002")]
    ),
    output_metadata=["user_id"],  # optional, displays user_id in the output
)

> Change `filters` to build query engines with different conditions.
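
When there are many tenants, repeating the call above for each one becomes tedious. The sketch below shows one possible pattern: a small factory that builds a query engine per `user_id` on first use and caches it. The names `build_engine_for` and `make_engine_factory` are hypothetical helpers, not part of `llama-index`; the `as_query_engine()` arguments simply mirror the cell above.

```python
from functools import lru_cache


def build_engine_for(index, user_id):
    """Build a query engine restricted to one tenant, mirroring the cell above."""
    # Imported here so the caching helper below has no hard dependency on llama-index.
    from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

    return index.as_query_engine(
        search_top_k=3,
        filters=MetadataFilters(
            filters=[ExactMatchFilter(key="user_id", value=user_id)]
        ),
        output_metadata=["user_id"],
    )


def make_engine_factory(build_engine):
    """Wrap `build_engine(user_id)` so each tenant's engine is built only once."""

    @lru_cache(maxsize=None)
    def engine_for(user_id):
        return build_engine(user_id)

    return engine_for


# Usage with the index from this notebook (commented out because it needs live credentials):
# engine_for = make_engine_factory(lambda uid: build_engine_for(zcp_doc_index, uid))
# response = engine_for("user_002").query("Can users delete entities by filtering non-primary fields?")
```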


In [None]:
question = "Can users delete entities by filtering non-primary fields?"
# search_results = query_engine_for_user_002.retrieve(question)
response = query_engine_for_user_002.query(question)
print(response.response)

Milvus only supports deleting entities by primary key filtered with boolean expressions. Other operators can be used only in query or scalar filtering in vector search.
