<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/evaluation/UpTrain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# How to use UpTrain with LlamaIndex


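**Overview**: In this example, we will see how to use UpTrain with LlamaIndex. UpTrain ([github](https://github.com/uptrain-ai/uptrain) || [website](https://github.com/uptrain-ai/uptrain/) || [docs](https://docs.uptrain.ai/)) is an open-source platform for evaluating and improving GenAI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives insights on how to resolve them. More details on UpTrain's evaluations can be found [here](https://github.com/uptrain-ai/uptrain?tab=readme-ov-file#pre-built-evaluations-we-offer-).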


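**Problem**: There are two main problems:
1. The data that most large language models are trained on is not representative of the data they see in use. This mismatch between the training and test distributions can lead to poor performance.
2. The output of large language models is not always reliable. Responses may be irrelevant to the prompt, may not match the desired tone or context, or may be offensive.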


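**Solution**: These two problems can be addressed with two different tools, and we will show you how to use them together:
1. LlamaIndex addresses the first problem by letting you perform Retrieval Augmented Generation (RAG) with a retriever fine-tuned on your own data, so that answers are grounded in your own documents rather than only in the model's training data.
2. UpTrain addresses the second problem by letting you run evaluations on the generated responses. This helps ensure that responses are relevant to the prompt, match the desired tone or context, and are not offensive.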
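To make the RAG idea concrete, here is a minimal, library-free sketch of the retrieve-then-generate pattern. It is not how LlamaIndex is implemented: simple keyword overlap stands in for the embedding similarity a vector index uses, and the helper names (`tokenize`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    # Rank passages by keyword overlap with the query (shared token counts).
    # A vector index would compare embeddings instead; this is a simplification.
    query_counts = Counter(tokenize(query))

    def overlap(passage: str) -> int:
        passage_counts = Counter(tokenize(passage))
        return sum(min(n, passage_counts[tok]) for tok, n in query_counts.items())

    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(passages, key=overlap, reverse=True)[:top_k]


def build_prompt(query: str, contexts: list[str]) -> str:
    # Put the retrieved passages in front of the question so the LLM answers from them.
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


passages = [
    "The city has five boroughs.",
    "The population of the city is large and diverse.",
    "The city has a humid subtropical climate.",
]
question = "What is the population of the city?"
print(build_prompt(question, retrieve(question, passages, top_k=1)))
```

The generation step (sending the prompt to an LLM) is omitted; the point is that the model only sees passages chosen from your data.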


## Install UpTrain and LlamaIndex


In [None]:
%pip install -qU uptrain llama-index

Note: you may need to restart the kernel to use updated packages.


## Import the required libraries


In [None]:
import httpx
import os
import openai
import pandas as pd

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from uptrain import Evals, EvalLlamaIndex, Settings as UpTrainSettings



## Create a dataset folder for the query engine

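You can use any documents you have for this step. For this tutorial, we use data about New York City extracted from Wikipedia. We add only one document to the folder, but you can add as many as you like.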


In [None]:
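url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/nyc_text.txt"
dataset_dir = "./nyc_wikipedia"
os.makedirs(dataset_dir, exist_ok=True)  # create the folder if it does not exist yet
dataset_path = os.path.join(dataset_dir, "nyc_text.txt")

# Download the file only once; later runs reuse the local copy
if not os.path.exists(dataset_path):
    r = httpx.get(url)
    r.raise_for_status()  # fail loudly instead of saving an error page
    with open(dataset_path, "wb") as f:
        f.write(r.content)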

## Create a list of queries

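Before generating any responses, we need a list of queries. Since the query engine is built over the New York City Wikipedia data (it is not trained on it), we create a list of questions about New York City.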


In [None]:
data = [
    {"question": "What is the population of New York City?"},
    {"question": "What is the area of New York City?"},
    {"question": "What is the largest borough in New York City?"},
    {"question": "What is the average temperature in New York City?"},
    {"question": "What is the main airport in New York City?"},
    {"question": "What is the famous landmark in New York City?"},
    {"question": "What is the official language of New York City?"},
    {"question": "What is the currency used in New York City?"},
    {"question": "What is the time zone of New York City?"},
    {"question": "What is the famous sports team in New York City?"},
]
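Each entry in `data` is a dictionary with a `question` key, which is the shape the evaluation step passes in via `data=data`. If you keep your questions as plain strings, a small helper like the hypothetical `make_eval_rows` below can build and sanity-check that structure; it is a convenience sketch, not part of UpTrain or LlamaIndex.

```python
from __future__ import annotations


def make_eval_rows(questions: list[str]) -> list[dict[str, str]]:
    # Hypothetical helper: wrap each question string in {"question": ...},
    # trimming whitespace and rejecting blanks or duplicates.
    rows: list[dict[str, str]] = []
    seen: set[str] = set()
    for raw in questions:
        question = raw.strip()
        if not question:
            raise ValueError("questions must be non-empty strings")
        if question in seen:
            raise ValueError(f"duplicate question: {question!r}")
        seen.add(question)
        rows.append({"question": question})
    return rows


rows = make_eval_rows([
    "What is the population of New York City?",
    "What is the area of New York City?",
])
```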

**This notebook uses the OpenAI API to generate text for the prompts and to create the vector store index. Set `openai.api_key` to your OpenAI API key.**


In [None]:
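openai.api_key = "sk-************************"  # your OpenAI API key
os.environ["OPENAI_API_KEY"] = openai.api_key  # also expose it to libraries that read the environment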

## Create a query engine with LlamaIndex

Let's create a vector store index with LlamaIndex and then use it as a query engine that retrieves the relevant parts of the document.


In [None]:
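Settings.chunk_size = 512  # split documents into chunks of up to 512 tokens

# Load every file in the dataset folder as a Document
documents = SimpleDirectoryReader("./nyc_wikipedia/").load_data()

# Embed the chunks and store them in a vector index
vector_index = VectorStoreIndex.from_documents(documents)

# Wrap the index in a query engine that retrieves relevant chunks and answers with the LLM
query_engine = vector_index.as_query_engine()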
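`Settings.chunk_size = 512` controls how large each indexed chunk is. LlamaIndex measures chunk size in tokens and its default splitter tries to keep sentences intact; the hypothetical `chunk_words` below is a deliberately simplified sketch that counts whitespace-separated words instead, just to show how a fixed size and an optional overlap carve a document into pieces.

```python
from __future__ import annotations


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 0) -> list[str]:
    # Split text into windows of `chunk_size` words; consecutive windows share `overlap` words.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4))             # no overlap
print(chunk_words(sample, chunk_size=4, overlap=1))  # 1-word overlap
```

Smaller chunks make retrieval more targeted but give the LLM less surrounding context per hit; overlap reduces the chance that a fact is cut in half at a chunk boundary.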

## Setup

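UpTrain provides you with:
1. Dashboards with advanced drill-down and filtering options
1. Insights and common themes among failure cases
1. Observability and real-time monitoring of production data
1. Regression testing through seamless integration with your CI/CD pipelines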

You can choose between the following two alternatives for evaluating with UpTrain:


# Alternative 1: Evaluate using UpTrain's open-source software (OSS)

You can use the open-source evaluation service to evaluate your model. In this case, you need to provide an OpenAI API key. You can get your own API key [here](https://platform.openai.com/account/api-keys).

To view your evaluations in the UpTrain dashboard, set the dashboard up by running the following commands in your terminal:

```bash
git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh
```

This starts the UpTrain dashboard on your local machine. You can access it at `http://localhost:3000/dashboard`.

**Note:** `project_name` is the name of the project under which the evaluations are displayed in the UpTrain dashboard.


In [None]:
settings = UpTrainSettings(
    openai_api_key=openai.api_key,
)

## Create the EvalLlamaIndex object

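Now that we have created the query engine, we can use it to create an `EvalLlamaIndex` object. This object uses the query engine to generate responses to the queries, which are then evaluated.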


In [None]:
llamaindex_object = EvalLlamaIndex(
    settings=settings, query_engine=query_engine
)
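Judging from the `question`, `response`, and `context` columns in the results shown later, `EvalLlamaIndex` runs each question through the query engine and records the answer together with the retrieved context before scoring. The sketch below imitates only that collection step with a stub engine; `StubResponse`, `StubQueryEngine`, and `collect_rows` are hypothetical and do not reproduce UpTrain's internals or LlamaIndex's response type.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StubResponse:
    # Hypothetical stand-in for a query-engine answer plus its retrieved passages.
    response: str
    contexts: list[str] = field(default_factory=list)


class StubQueryEngine:
    """Hypothetical engine that answers from a fixed lookup table."""

    def __init__(self, answers: dict[str, StubResponse]) -> None:
        self.answers = answers

    def query(self, question: str) -> StubResponse:
        return self.answers.get(question, StubResponse("I don't know."))


def collect_rows(engine: StubQueryEngine, data: list[dict[str, str]]) -> list[dict[str, str]]:
    # Run each question through the engine and record question/response/context,
    # mirroring the question, response and context columns of the results table.
    rows = []
    for item in data:
        answer = engine.query(item["question"])
        rows.append({
            "question": item["question"],
            "response": answer.response,
            "context": "\n\n".join(answer.contexts),  # retrieved passages joined into one string
        })
    return rows


engine = StubQueryEngine({
    "What is the largest borough in New York City?": StubResponse(
        "Queens is the largest borough in New York City.", ["Passage A", "Passage B"]
    ),
})
rows = collect_rows(engine, [{"question": "What is the largest borough in New York City?"}])
```

Each resulting row is what an evaluator then scores, one `score_<check>` and `explanation_<check>` pair per check.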

## Run the evaluations

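Now that we have the list of queries, we can use the `EvalLlamaIndex` object to generate responses to them and then evaluate those responses. You can find an exhaustive list of the evaluations UpTrain offers [here](https://docs.uptrain.ai/key-components/evals). We have chosen the two we found most relevant for this tutorial:

1. **Context Relevance**: This evaluation checks whether the retrieved context is relevant to the query. This matters because the retrieved context is used to generate the response: if the context is not relevant to the query, the response is unlikely to be relevant either.

2. **Response Conciseness**: This evaluation checks whether the response is concise, that is, whether it answers the query without including unnecessary information.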


In [None]:
results = llamaindex_object.evaluate(
    project_name="uptrain-llama-index",
    evaluation_name="nyc_wikipedia",  # adding project and evaluation names lets you track the results in the UpTrain dashboard
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_CONCISENESS],
)

100%|██████████| 10/10 [00:02<00:00,  3.94it/s]
100%|██████████| 10/10 [00:03<00:00,  3.12it/s]


In [None]:
pd.DataFrame(results)

| | question | response | context | score_context_relevance | explanation_context_relevance | score_response_conciseness | explanation_response_conciseness |
|---|---|---|---|---|---|---|---|
| 0 | What is the population of New York City? | The population of New York City is 8,804,190 a... | === Population density ===\n\nIn 2020, the cit... | | | | |
| 1 | What is the area of New York City? | New York City has a total area of 468.484 squa... | Some of the natural relief in topography has b... | | | | |
| 2 | What is the largest borough in New York City? | Queens is the largest borough in New York City. | ==== Brooklyn ====\nBrooklyn (Kings County), o... | | | | |
| 3 | What is the average temperature in New York City? | The average temperature in New York City is 33... | Similarly, readings of 0 °F (−18 °C) are also ... | | | | |
| 4 | What is the main airport in New York City? | John F. Kennedy International Airport | along the Northeast Corridor, and long-distanc... | | | | |
| 5 | What is the famous landmark in New York City? | The famous landmark in New York City is the St... | The settlement was named New Amsterdam (Dutch:... | | | | |
| 6 | What is the official language of New York City? | As many as 800 languages are spoken in New Yor... | === Accent and dialect ===\n\nThe New York are... | | | | |
| 7 | What is the currency used in New York City? | The currency used in New York City is the US D... | === Real estate ===\n\nReal estate is a major ... | | | | |
| 8 | What is the time zone of New York City? | Eastern Standard Time (EST) | According to the New York City Comptroller, wo... | | | | |
| 9 | What is the famous sports team in New York City? | The famous sports team in New York City is the... | ==== Soccer ====\nIn soccer, New York City is ... | | | | |


# Alternative 2: Evaluate using UpTrain's managed service and dashboards

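Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account [here](https://uptrain.ai/) and get free trial credits. If you want more trial credits, [book a call with the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).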

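The benefits of using the managed service are:
1. No need to set up the UpTrain dashboard on your local machine.
1. Access to many LLMs without needing their API keys.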

Once the evaluations are done, you can view them in the UpTrain dashboard at `https://dashboard.uptrain.ai/dashboard`.

**Note:** `project_name` is the name of the project under which the evaluations are displayed in the UpTrain dashboard.


In [None]:
UPTRAIN_API_KEY = "up-**********************"  # your UpTrain API key

# Here we pass the `uptrain_access_token` parameter instead of `openai_api_key` in the settings
settings = UpTrainSettings(
    uptrain_access_token=UPTRAIN_API_KEY,
)
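Rather than pasting keys into the notebook, you may prefer to read them from environment variables and pick the mode automatically. The hypothetical helper below returns keyword arguments for `UpTrainSettings`, using the two parameter names this notebook already passes (`uptrain_access_token` and `openai_api_key`); the environment variable names `UPTRAIN_API_KEY` and `OPENAI_API_KEY` are assumptions of this sketch.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def uptrain_settings_kwargs(env: Mapping[str, str] | None = None) -> dict[str, str]:
    # Hypothetical helper: prefer the managed service when an UpTrain key is set,
    # otherwise fall back to the open-source mode with an OpenAI key.
    env = os.environ if env is None else env
    if env.get("UPTRAIN_API_KEY"):
        return {"uptrain_access_token": env["UPTRAIN_API_KEY"]}
    if env.get("OPENAI_API_KEY"):
        return {"openai_api_key": env["OPENAI_API_KEY"]}
    raise RuntimeError("Set UPTRAIN_API_KEY (managed service) or OPENAI_API_KEY (open source)")


# In the notebook: settings = UpTrainSettings(**uptrain_settings_kwargs())
```

Preferring the managed service when both keys are present is an arbitrary choice here; swap the order if you want local evaluation by default.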

## Create the EvalLlamaIndex object

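As before, wrap the query engine in an `EvalLlamaIndex` object, this time with the managed-service settings. The object uses the query engine to generate responses to the queries, which are then evaluated.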


In [None]:
llamaindex_object = EvalLlamaIndex(
    settings=settings, query_engine=query_engine
)

## Run the evaluations

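Now that we have the list of queries, we can use the `EvalLlamaIndex` object to generate responses to them and then evaluate those responses. You can find an exhaustive list of the evaluations UpTrain offers [here](https://docs.uptrain.ai/key-components/evals). We have chosen the two we found most relevant for this tutorial:

1. **Context Relevance**: This evaluation checks whether the retrieved context is relevant to the query. This matters because the retrieved context is used to generate the response: if the context is not relevant to the query, the response is unlikely to be relevant either.

2. **Response Conciseness**: This evaluation checks whether the response is concise, that is, whether it answers the query without including unnecessary information.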


In [None]:
results = llamaindex_object.evaluate(
    project_name="uptrain-llama-index",
    evaluation_name="nyc_wikipedia",  # adding project and evaluation names lets you track the results in the UpTrain dashboard
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_CONCISENESS],
)

2024-01-23 18:36:57.815 | INFO     | uptrain.framework.remote:log_and_evaluate:507 - Sending evaluation request for rows 0 to <50 to the Uptrain server
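The log line reports a request for rows 0 to <50, which suggests (an inference from this message only) that rows are sent to the UpTrain server in batches of up to 50; with the 10 questions here, a single request covers all of them. The hypothetical `batches` helper below shows that kind of batching with half-open ranges whose upper bound is the batch boundary, matching the `<50` in the log.

```python
from __future__ import annotations

from typing import Any


def batches(rows: list[Any], batch_size: int = 50) -> list[tuple[int, int, list[Any]]]:
    # Split rows into consecutive batches; `end` is the exclusive batch boundary,
    # so the last batch may hold fewer than batch_size rows.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, start + batch_size, rows[start:start + batch_size])
        for start in range(0, len(rows), batch_size)
    ]


questions = [{"question": f"q{i}"} for i in range(10)]
for start, end, batch in batches(questions):
    print(f"rows {start} to <{end}: {len(batch)} row(s)")
```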


In [None]:
pd.DataFrame(results)

| | question | response | context | score_context_relevance | explanation_context_relevance | score_response_conciseness | explanation_response_conciseness |
|---|---|---|---|---|---|---|---|
| 0 | What is the population of New York City? | The population of New York City is 8,804,190 a... | New York, often called New York City or NYC, i... | 1.0 | The question asks for the population of New Yo... | 1.0 | The question asks for the population of New Yo... |
| 1 | What is the area of New York City? | The area of New York City is 468.484 square mi... | New York, often called New York City or NYC, i... | 1.0 | Step 1: The question asks for the area of New ... | 1.0 | The question asks for the area of New York Cit... |
| 2 | What is the largest borough in New York City? | Queens is the largest borough in New York City. | ==== Brooklyn ====\nBrooklyn (Kings County), o... | 0.5 | Step 1: The question is asking for the largest... | 1.0 | The question asks for the largest borough in N... |
| 3 | What is the average temperature in New York City? | The average temperature in New York City is 57... | Similarly, readings of 0 °F (−18 °C) are also ... | 0.5 | The question asks for the average temperature ... | 1.0 | The question asks for the average temperature ... |
| 4 | What is the main airport in New York City? | The main airport in New York City is John F. K... | along the Northeast Corridor, and long-distanc... | 1.0 | The question is "What is the main airport in N... | 1.0 | The question asks for the main airport in New ... |
| 5 | What is the famous landmark in New York City? | The famous landmark in New York City is the Em... | A record 66.6 million tourists visited New Yor... | 1.0 | The question asks for the famous landmark in N... | 1.0 | The question asks for the famous landmark in N... |
| 6 | What is the official language of New York City? | The official language of New York City is not ... | === Accent and dialect ===\n\nThe New York are... | 0.0 | The question is asking for the official langua... | 0.0 | The question asks for the official language of... |
| 7 | What is the currency used in New York City? | The currency used in New York City is the Unit... | === Real estate ===\n\nReal estate is a major ... | 0.0 | The question is "What is the currency used in ... | 1.0 | The question asks specifically for the currenc... |
| 8 | What is the time zone of New York City? | Eastern Standard Time (EST) | According to the New York City Comptroller, wo... | 0.0 | The question is "What is the time zone of New ... | 1.0 | The question asks for the time zone of New Yor... |
| 9 | What is the famous sports team in New York City? | The famous sports team in New York City is the... | ==== Baseball ====\nNew York has been describe... | 1.0 | The question asks for the famous sports team i... | 1.0 | The question asks for the famous sports team i... |
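To summarize a run numerically, you can average each check's scores across rows. The hypothetical `mean_scores` below relies only on the `score_<check>` column naming visible in the results table and assumes `results` is a list of row dictionaries (it is passed straight to `pd.DataFrame`); it skips missing or NaN scores, such as the empty score columns in the Alternative 1 output. `passes_gate` turns the summary into a pass/fail signal of the kind a CI regression test could use; the thresholds are arbitrary examples.

```python
from __future__ import annotations

import math
from typing import Any


def is_score(value: Any) -> bool:
    # True for real numeric scores; False for None, NaN, strings, and booleans.
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and not math.isnan(value)
    )


def mean_scores(rows: list[dict[str, Any]], checks: list[str]) -> dict[str, float | None]:
    # Average score_<check> over rows that have a usable score; None if there are none.
    summary: dict[str, float | None] = {}
    for check in checks:
        key = f"score_{check}"
        values = [row[key] for row in rows if is_score(row.get(key))]
        summary[check] = sum(values) / len(values) if values else None
    return summary


def passes_gate(summary: dict[str, float | None], thresholds: dict[str, float]) -> bool:
    # Fail if any thresholded check is missing or averages below its threshold.
    return all(
        summary.get(check) is not None and summary[check] >= minimum
        for check, minimum in thresholds.items()
    )


# Usage with `results` from the cell above:
# summary = mean_scores(results, ["context_relevance", "response_conciseness"])
# passes_gate(summary, {"context_relevance": 0.7, "response_conciseness": 0.8})
```

For the ten scored rows above, this works out to 0.6 for context relevance and 0.9 for response conciseness, so the example gate would fail on context relevance.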


### Dashboard:
Histogram of scores versus the number of cases with each score

![nyc_dashboard.png](https://uptrain-assets.s3.ap-south-1.amazonaws.com/images/llamaindex/nyc_dashboard.png)
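If you want a quick text version of this kind of view inside the notebook, you can count how many rows received each score. The hypothetical `score_histogram` and `render_histogram` below do this with `collections.Counter`, again assuming `results` is a list of row dictionaries with `score_<check>` keys; they are not part of UpTrain.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Any


def score_histogram(rows: list[dict[str, Any]], check: str) -> dict[float, int]:
    # Count rows per score value, ignoring missing or NaN scores; sorted by score.
    key = f"score_{check}"
    counts = Counter(
        row[key]
        for row in rows
        if isinstance(row.get(key), (int, float)) and not math.isnan(row[key])
    )
    return dict(sorted(counts.items()))


def render_histogram(hist: dict[float, int]) -> str:
    # One line per score: the score, a bar of '#' characters, and the count.
    return "\n".join(f"{score:>4}: {'#' * count} ({count})" for score, count in hist.items())


# e.g. print(render_histogram(score_histogram(results, "context_relevance")))
```

For the context-relevance scores in the table above, this gives three rows at 0.0, two at 0.5, and five at 1.0.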


### Insights:

You can filter the failure cases and generate common themes among them. This helps identify the core issue so you can fix it.
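The filtering half of that workflow is easy to reproduce locally. The hypothetical `failing_rows` below keeps rows whose `score_<check>` falls below a threshold, together with the matching `explanation_<check>` text, so you can read the explanations side by side; generating the common themes is left to UpTrain and is not reproduced here.

```python
from __future__ import annotations

import math
from typing import Any


def failing_rows(
    rows: list[dict[str, Any]], check: str, threshold: float = 1.0
) -> list[dict[str, Any]]:
    # Keep rows scoring strictly below `threshold` on `check`, skipping missing/NaN scores.
    score_key, explanation_key = f"score_{check}", f"explanation_{check}"
    failures = []
    for row in rows:
        score = row.get(score_key)
        if not isinstance(score, (int, float)) or math.isnan(score):
            continue
        if score < threshold:
            failures.append({
                "question": row.get("question", ""),
                "score": score,
                "explanation": row.get(explanation_key, ""),
            })
    # Worst scores first, so the most serious failures are read first.
    return sorted(failures, key=lambda r: r["score"])


# e.g. for f in failing_rows(results, "context_relevance"):
#     print(f["score"], f["question"])
```

With the default threshold of 1.0, the context-relevance scores in the table above would flag five questions: the largest borough and the average temperature (0.5), and the official language, the currency, and the time zone (0.0).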
