<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/multi_tenancy/multi_tenancy_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


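# Building a Multi-Tenancy RAG System with LlamaIndex

In this notebook you will learn how to build a multi-tenancy RAG system with LlamaIndex: several users share one index, and each user's queries are restricted to that user's own documents.

1. Setup
2. Download data
3. Load data
4. Create an index
5. Create an ingestion pipeline
6. Update metadata and insert documents
7. Define a query engine for each user
8. Query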


## Setup

Make sure `llama-index` and `pypdf` are installed.


In [None]:
!pip install llama-index pypdf

## Set up the OpenAI API key


In [None]:
import os

os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"

In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

from IPython.display import HTML, display

## Download data

We will use two papers for the demonstration: `An LLM Compiler for Parallel Function Calling` and `Dense X Retrieval: What Retrieval Granularity Should We Use?`.


In [None]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.04511.pdf" -O "llm_compiler.pdf"
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.06648.pdf" -O "dense_x_retrieval.pdf"

--2024-01-15 14:29:26--  https://arxiv.org/pdf/2312.04511.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 755837 (738K) [application/pdf]
Saving to: ‘llm_compiler.pdf’



2024-01-15 14:29:26 (163 MB/s) - ‘llm_compiler.pdf’ saved [755837/755837]

--2024-01-15 14:29:26--  https://arxiv.org/pdf/2312.06648.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1103758 (1.1M) [application/pdf]
Saving to: ‘dense_x_retrieval.pdf’


2024-01-15 14:29:26 (208 MB/s) - ‘dense_x_retrieval.pdf’ saved [1103758/1103758]



## Load data


In [None]:
reader = SimpleDirectoryReader(input_files=["dense_x_retrieval.pdf"])
documents_jerry = reader.load_data()

reader = SimpleDirectoryReader(input_files=["llm_compiler.pdf"])
documents_ravi = reader.load_data()

## Create an empty index


In [None]:
index = VectorStoreIndex.from_documents(documents=[])

## Create an ingestion pipeline


In [None]:
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),
    ]
)
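
To make the `chunk_size` and `chunk_overlap` parameters above more concrete, here is a minimal sketch of a sliding window with overlap. It is only an illustration: the function name `chunk_with_overlap` and the toy word list are assumptions, and this is not how `SentenceSplitter` is implemented (it counts tokens and tries to respect sentence boundaries).

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=20):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens. Hypothetical helper
    for illustration, not part of LlamaIndex.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy input: 1000 placeholder "tokens".
words = [f"w{i}" for i in range(1000)]
chunks = chunk_with_overlap(words)
print([len(c) for c in chunks])  # [512, 508]
print(chunks[0][-20:] == chunks[1][:20])  # True: the windows share 20 tokens
```

The overlap means that text falling near a chunk boundary appears in both neighboring chunks, so a retrieved chunk is less likely to cut off the context it needs.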

## Update metadata and insert documents


In [None]:
# Tag each of Jerry's documents with its owner
for document in documents_jerry:
    document.metadata["user"] = "Jerry"

# Run the pipeline to get nodes
nodes = pipeline.run(documents=documents_jerry)
# Insert the nodes into the index
index.insert_nodes(nodes)

In [None]:
# Tag each of Ravi's documents with its owner
for document in documents_ravi:
    document.metadata["user"] = "Ravi"

# Run the pipeline to get nodes
nodes = pipeline.run(documents=documents_ravi)
# Insert the nodes into the index
index.insert_nodes(nodes)
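
The two cells above repeat the same tagging loop for each user. With more tenants, a small helper could remove the duplication. The sketch below is an assumption, not part of LlamaIndex: `tag_documents` is a hypothetical name, and it relies only on each document exposing a mutable `metadata` dict, as the loaded documents do above. Stand-in objects are used so the snippet runs on its own.

```python
from types import SimpleNamespace


def tag_documents(documents, user):
    """Set metadata["user"] on every document and return the same list.

    Hypothetical helper; works with any object that has a `metadata` dict.
    """
    for document in documents:
        document.metadata["user"] = user
    return documents


# Stand-in objects for illustration only; in the notebook these would be
# the lists returned by reader.load_data().
sample_docs = [
    SimpleNamespace(text="page 1", metadata={}),
    SimpleNamespace(text="page 2", metadata={"file_name": "a.pdf"}),
]
tag_documents(sample_docs, "Jerry")
print([d.metadata for d in sample_docs])
```

In the notebook, the same helper could be applied as `index.insert_nodes(pipeline.run(documents=tag_documents(documents_jerry, "Jerry")))`. Existing metadata keys are left in place; only the `user` key is written.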

## Define query engines

Define a query engine for each user, with a metadata filter that restricts retrieval to that user's nodes.


In [None]:
# For Jerry
jerry_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Jerry",
            )
        ]
    ),
    similarity_top_k=3,
)

# For Ravi
ravi_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Ravi",
            )
        ]
    ),
    similarity_top_k=3,
)
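
As a mental model for what the exact-match filter combined with `similarity_top_k=3` achieves, the sketch below first drops every node that does not belong to the user and only then ranks the remaining nodes. Everything in it (`ToyNode`, `retrieve`, the scores) is made up for illustration. How the filter is actually applied depends on the vector store behind the index, so treat this as a conceptual sketch rather than LlamaIndex's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    text: str
    score: float  # pretend similarity to the query
    metadata: dict = field(default_factory=dict)


def retrieve(nodes, key, value, top_k=3):
    """Keep nodes whose metadata[key] equals value, then return the top_k by score."""
    allowed = [n for n in nodes if n.metadata.get(key) == value]
    return sorted(allowed, key=lambda n: n.score, reverse=True)[:top_k]


# A shared store holding nodes from two tenants (scores are invented).
store = [
    ToyNode("jerry chunk 1", 0.91, {"user": "Jerry"}),
    ToyNode("ravi chunk 1", 0.97, {"user": "Ravi"}),
    ToyNode("jerry chunk 2", 0.72, {"user": "Jerry"}),
    ToyNode("ravi chunk 2", 0.88, {"user": "Ravi"}),
]

jerry_hits = retrieve(store, "user", "Jerry")
print([n.text for n in jerry_hits])  # ['jerry chunk 1', 'jerry chunk 2']
```

Note that Ravi's highest-scoring node never reaches Jerry's results, even though its score is the best in the store: isolation comes from the filter, not from the ranking.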

## Query


In [None]:
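# Jerry has the Dense X Retrieval paper and should be able to answer the following question.
response = jerry_query_engine.query(
    "What are the propositions mentioned in the paper?"
)
# Display the response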
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [None]:
# Ravi has the LLMCompiler paper
response = ravi_query_engine.query("What are the steps involved in LLMCompiler?")

# Display the response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [None]:
# This should not be answered, because Jerry has no information about LLMCompiler
response = jerry_query_engine.query("What are the steps involved in LLMCompiler?")

# Display the response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
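
Reading the generated answer is one way to judge isolation; a more direct check is to inspect which nodes the response was built from. The sketch below assumes the response object exposes `source_nodes`, and that each entry has a `.node.metadata` dict carrying the `user` key set earlier. The helper name `sources_belong_to` is hypothetical, and stand-in objects are used so the snippet runs on its own.

```python
from types import SimpleNamespace


def sources_belong_to(response, user):
    """Return True if every source node carries metadata["user"] == user.

    Hypothetical helper. An empty source list also returns True, since
    no node from another tenant was used.
    """
    return all(
        source.node.metadata.get("user") == user
        for source in response.source_nodes
    )


# Stand-in response for illustration only; in the notebook this would be
# the object returned by jerry_query_engine.query(...).
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(node=SimpleNamespace(metadata={"user": "Jerry"})),
        SimpleNamespace(node=SimpleNamespace(metadata={"user": "Jerry"})),
    ]
)

print(sources_belong_to(fake_response, "Jerry"))  # True
print(sources_belong_to(fake_response, "Ravi"))   # False
```

In the notebook, `sources_belong_to(response, "Jerry")` could be called on any response from `jerry_query_engine` to confirm that none of Ravi's nodes were retrieved.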