<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/tools/eval_query_engine_tool.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Evaluation Query Engine Tool

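In this section we will show you how to use an `EvalQueryEngineTool` with an agent. Some reasons you may want to use an `EvalQueryEngineTool`:
1. To apply a specific kind of evaluation to a tool, rather than relying only on the agent's reasoning
2. To evaluate tool responses with a different LLM than the one the agent uses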

An `EvalQueryEngineTool` is built on top of the `QueryEngineTool`. Along with wrapping an existing [query engine](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/root.html), it must also be given an existing [evaluator](https://docs.llamaindex.ai/en/stable/examples/evaluation/answer_and_context_relevancy.html) to evaluate the responses of that query engine.
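To make the pattern concrete, here is a simplified, stdlib-only sketch of the evaluate-then-answer idea. It is not LlamaIndex's implementation: the class `EvalToolSketch`, the callable type aliases, and the toy evaluator `mentions_company` are all hypothetical. The only detail borrowed from this notebook is the shape of the failure message, copied from the agent logs further down.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stand-ins: a "query engine" maps a question to an answer, and an
# "evaluator" maps (question, answer) to (passing, feedback).
QueryEngine = Callable[[str], str]
Evaluator = Callable[[str, str], Tuple[bool, str]]


@dataclass
class EvalToolSketch:
    """Simplified sketch of the evaluate-then-answer pattern (not LlamaIndex code)."""

    name: str
    query_engine: QueryEngine
    evaluator: Evaluator

    def __call__(self, query: str) -> str:
        response = self.query_engine(query)
        passing, feedback = self.evaluator(query, response)
        if not passing:
            # Mirror the failure message seen in this notebook's agent logs.
            return (
                f"Could not use tool {self.name} because it failed evaluation.\n"
                f"Reason: {feedback}"
            )
        return response


# Toy evaluator: passes only if the answer mentions the company asked about.
def mentions_company(query: str, response: str) -> Tuple[bool, str]:
    company = "Uber" if "Uber" in query else "Lyft"
    ok = company in response
    return ok, "YES" if ok else "NO"


lyft_tool = EvalToolSketch(
    name="lyft",
    query_engine=lambda q: "Lyft revenue increased in 2021.",
    evaluator=mentions_company,
)
print(lyft_tool("What was Uber's revenue growth in 2021?"))
```

The design choice worth noting is that a failed evaluation does not raise an exception; it is returned as ordinary tool output, so the agent can read the reason and try something else.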


## Install Dependencies

If you're opening this notebook on Colab, you will probably need to install the LlamaIndex integration packages used below: Hugging Face embeddings, the OpenAI LLM, and the OpenAI agent.


In [None]:
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai
%pip install llama-index-agents-openai

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

## Initialize and Set the LLM and Local Embedding Model


In [None]:
from llama_index.core.settings import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
Settings.llm = OpenAI()

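## Download and Index Data
We do this only for the sake of this demo. In a production environment, the data stores and indexes should already exist rather than being created on the fly.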
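The cells in this section follow a load-or-build pattern: try to load persisted indexes, and only build (and persist) them when loading fails. A minimal sketch of that control flow, assuming a plain JSON file as a hypothetical stand-in for a real index (in the notebook itself, `StorageContext` and `load_index_from_storage` play this role). The names `load_or_build` and `build_lyft` are illustrative only.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

Index = Dict[str, str]  # hypothetical stand-in for a real index object


def load_or_build(persist_dir: str, build: Callable[[], Index]) -> Index:
    """Load a persisted index if present; otherwise build it once and persist it."""
    path = os.path.join(persist_dir, "index.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    index = build()
    os.makedirs(persist_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(index, f)
    return index


builds: List[str] = []  # records how many times the expensive build ran


def build_lyft() -> Index:
    builds.append("lyft")
    return {"source": "lyft_2021.pdf"}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(os.path.join(tmp, "lyft"), build_lyft)
    second = load_or_build(os.path.join(tmp, "lyft"), build_lyft)

print(first == second, len(builds))
```

The second call finds the persisted file and skips the build, which is why `builds` records only one entry.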


### Create Storage Contexts


In [None]:
from llama_index.core import (
    StorageContext,
    load_index_from_storage,
)

try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft",
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # No persisted indexes yet (e.g. on the first run): build them below instead.
    index_loaded = False

### Download Data


In [None]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

### Load Data


In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

if not index_loaded:
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()
    uber_docs = SimpleDirectoryReader(
        input_files=["./data/10k/uber_2021.pdf"]
    ).load_data()

    # Build the indexes
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
    uber_index = VectorStoreIndex.from_documents(uber_docs)

    # Persist the indexes so later runs can load them instead of rebuilding
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    uber_index.storage_context.persist(persist_dir="./storage/uber")

## Create Query Engines


In [None]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=5)
uber_engine = uber_index.as_query_engine(similarity_top_k=5)
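`similarity_top_k=5` asks each query engine to retrieve the five chunks whose embeddings are most similar to the question before an answer is synthesized. As a rough illustration of that retrieval step, here is a pure-Python top-k search over toy 3-dimensional vectors. It assumes cosine similarity, and the names `cosine`, `top_k`, and the sample chunks are hypothetical rather than LlamaIndex APIs.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], chunks: List[Tuple[str, Sequence[float]]], k: int
) -> List[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    scored = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]


# Toy "embeddings" for three document chunks.
chunks = [
    ("revenue", [0.9, 0.1, 0.0]),
    ("risk_factors", [0.1, 0.9, 0.0]),
    ("headcount", [0.0, 0.2, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

A larger `k` gives the answer-synthesis step more context to work with, at the cost of a longer prompt.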

## Create Evaluator


In [None]:
from llama_index.core.evaluation import RelevancyEvaluator

evaluator = RelevancyEvaluator()
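`RelevancyEvaluator` uses an LLM as a judge: it asks whether the query engine's response, together with the retrieved context, actually addresses the query. Because no `llm` argument is passed here, it falls back to `Settings.llm`; passing a different `llm` to the constructor is how you would get a separate judge model (reason 2 in the introduction). The `Reason: NO` lines in the logs below suggest the judge's raw verdict is surfaced as feedback. The following hypothetical sketch shows one way a YES/NO verdict could be turned into a pass/fail result; `JudgeResult` and `parse_yes_no` are illustrative names, not library APIs.

```python
from dataclasses import dataclass


@dataclass
class JudgeResult:
    """Hypothetical container loosely modeled on an evaluation result."""

    passing: bool
    feedback: str


def parse_yes_no(raw_judge_reply: str) -> JudgeResult:
    """Map a judge LLM's YES/NO reply to pass/fail, keeping the raw text as feedback."""
    verdict = raw_judge_reply.strip().upper()
    return JudgeResult(
        passing=verdict.startswith("YES"),
        feedback=raw_judge_reply.strip(),
    )


print(parse_yes_no("NO"))
print(parse_yes_no(" yes\n"))
```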

## Create Query Engine Tools


In [None]:
from llama_index.core.tools import ToolMetadata
from llama_index.core.tools.eval_query_engine import EvalQueryEngineTool

query_engine_tools = [
    EvalQueryEngineTool(
        evaluator=evaluator,
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft",
            description=(
                "Provides information about Lyft's financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    EvalQueryEngineTool(
        evaluator=evaluator,
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber",
            description=(
                "Provides information about Uber's financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
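The agent never sees the query engines directly; it sees each tool's `name` and `description`, plus an argument schema. The agent logs below show calls of the form `{"input": "..."}`, so each tool takes a single string argument. A hypothetical helper, `to_function_tool`, sketching roughly how such a tool could be described in OpenAI's function-calling format (the exact schema LlamaIndex generates may differ):

```python
import json
from typing import Any, Dict


def to_function_tool(name: str, description: str) -> Dict[str, Any]:
    """Build an OpenAI-style function-tool spec with a single string `input` argument."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"input": {"type": "string"}},
                "required": ["input"],
            },
        },
    }


spec = to_function_tool(
    "lyft",
    "Provides information about Lyft's financials for year 2021. "
    "Use a detailed plain text question as input to the tool.",
)
print(json.dumps(spec, indent=2))
```

This is why the descriptions matter: without `tool_choice`, the LLM picks a tool based only on text like the above.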

## Set Up OpenAI Agent


In [None]:
from llama_index.agent.openai import OpenAIAgent

agent = OpenAIAgent.from_tools(query_engine_tools, verbose=True)

## Query Engine Fails Evaluation

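For demonstration purposes, we will tell the agent to choose the wrong tool first so that we can observe the effect of the `EvalQueryEngineTool` when evaluation fails. To achieve this, we will set `tool_choice` to `lyft` when calling the agent.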
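In OpenAI's Chat Completions API, `tool_choice` is either a mode string (`"auto"`, `"none"`, or `"required"`) or an object that forces a specific function. The hypothetical `resolve_choice` helper below sketches how a bare tool name such as `lyft` could map onto that object; it assumes the agent performs a similar translation internally, which this notebook does not show. Judging from the log output below, the forced choice only affects the agent's first call.

```python
from typing import Dict, Union

ToolChoice = Union[str, Dict[str, object]]


def resolve_choice(tool_choice: str) -> ToolChoice:
    """Translate a tool name into OpenAI's forced-function `tool_choice` object."""
    if tool_choice in ("auto", "none", "required"):
        # Reserved modes are passed through unchanged.
        return tool_choice
    return {"type": "function", "function": {"name": tool_choice}}


print(resolve_choice("lyft"))
print(resolve_choice("auto"))
```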

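This is what we expect to happen:
1. The agent will first use the `lyft` tool, which contains the wrong financials, because we have instructed it to do so
2. The `EvalQueryEngineTool` will evaluate the query engine's response using its evaluator
3. The query engine's output will fail evaluation because it contains Lyft's financials rather than Uber's
4. The tool will form a response that informs the agent that the tool could not be used, giving a reason
5. The agent will fall back to the second tool, `uber`
6. The second tool's query engine output will pass evaluation because it contains Uber's financials
7. The agent will respond with an answer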
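The fallback in steps 4 and 5 can be approximated with a deterministic loop. This is only a sketch: the real agent relies on the LLM reading the failure message and deciding what to do next, whereas `answer_with_fallback` (a hypothetical helper) simply tries tools in a fixed order and skips any output that starts with the failure prefix. The canned tool outputs are copied from the logs below.

```python
from typing import Callable, Dict, List

FAILURE_PREFIX = "Could not use tool"


def answer_with_fallback(
    question: str, tools: Dict[str, Callable[[str], str]], order: List[str]
) -> str:
    """Try tools in order, skipping any whose output reports a failed evaluation."""
    for name in order:
        output = tools[name](question)
        if not output.startswith(FAILURE_PREFIX):
            return output
    return "No tool produced an answer that passed evaluation."


tools: Dict[str, Callable[[str], str]] = {
    "lyft": lambda q: (
        "Could not use tool lyft because it failed evaluation.\nReason: NO"
    ),
    "uber": lambda q: "Uber's revenue grew by 57% in 2021.",
}

# "lyft" goes first, mirroring tool_choice="lyft"; the loop then falls back to "uber".
result = answer_with_fallback(
    "What was Uber's revenue growth in 2021?", tools, ["lyft", "uber"]
)
print(result)
```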


In [None]:
response = await agent.achat(
    "What was Uber's revenue growth in 2021?", tool_choice="lyft"
)
print(str(response))

Added user message to memory: What was Uber's revenue growth in 2021?
=== Calling Function ===
Calling function: lyft with args: {"input":"What was Uber's revenue growth in 2021?"}
Got output: Could not use tool lyft because it failed evaluation.
Reason: NO

=== Calling Function ===
Calling function: uber with args: {"input":"What was Uber's revenue growth in 2021?"}
Got output: Uber's revenue grew by 57% in 2021.

Uber's revenue grew by 57% in 2021.


## Query Engine Passes Evaluation

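Here we are asking a question about Lyft's financials. This is what we should expect to happen:
1. The agent will first use the `lyft` tool, based solely on its description, because we have **not** set `tool_choice` here
2. The `EvalQueryEngineTool` will evaluate the query engine's response using its evaluator
3. The query engine's output will pass evaluation because it contains Lyft's financials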


In [None]:
response = await agent.achat("What was Lyft's revenue growth in 2021?")
print(str(response))

Added user message to memory: What was Lyft's revenue growth in 2021?
=== Calling Function ===
Calling function: lyft with args: {"input": "What was Lyft's revenue growth in 2021?"}
Got output: Lyft's revenue growth in 2021 was $3,208,323, which increased compared to the revenue in 2020 and 2019.

=== Calling Function ===
Calling function: uber with args: {"input": "What was Lyft's revenue growth in 2021?"}
Got output: Could not use tool uber because it failed evaluation.
Reason: NO

Lyft's revenue grew by $3,208,323 in 2021, which increased compared to the revenue in 2020 and 2019.
