<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/evaluation/RetryQuery.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Self-Correcting Query Engines - Evaluation & Retry


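In this notebook, we showcase several advanced, self-correcting query engines. They rely on the ability of recent LLMs to evaluate their own output and then self-correct to produce a better response.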


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.


In [None]:
!pip install llama-index

In [None]:
# Uncomment to add your OpenAI API key
# import os
# os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"

In [None]:
# Uncomment for debug-level logging
# import logging
# import sys
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Setup


First, we ingest the document.


In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader

# Needed for running async functions in a Jupyter notebook
import nest_asyncio

nest_asyncio.apply()

Download Data


In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data


In [None]:
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)
query = "What did the author do growing up?"

Let's see what the default query engine's response looks like.


In [None]:
base_query_engine = index.as_query_engine()
response = base_query_engine.query(query)
print(response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.


## Retry Query Engine


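The retry query engine uses an evaluator to improve the response from a base query engine.

It does the following:
1. Queries the base query engine.
2. Uses the evaluator to decide whether the response passes.
3. If the response passes, returns it.
4. Otherwise, uses the evaluation result (query, response, and feedback) to transform the original query into a new query.
5. Repeats, up to max_retries times.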
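The steps above can be sketched in plain Python. This is a minimal illustration of the control flow only, not LlamaIndex's implementation: `EvalResult`, `retry_query`, and the three toy callables are hypothetical names introduced here, and the choice to return the last response without re-evaluating it once the retry budget is spent is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    """Hypothetical stand-in for an evaluator's verdict."""

    passing: bool
    feedback: str


def retry_query(
    query: str,
    run_query: Callable[[str], str],
    evaluate: Callable[[str, str], EvalResult],
    transform: Callable[[str, str, EvalResult], str],
    max_retries: int = 3,
) -> str:
    """Query, evaluate, and retry with a transformed query until the budget runs out."""
    response = run_query(query)
    for _ in range(max_retries):
        result = evaluate(query, response)
        if result.passing:
            return response
        # Fold the failed attempt and its feedback into a new query, then retry.
        query = transform(query, response, result)
        response = run_query(query)
    # Budget exhausted: return the latest response as-is (assumption of this sketch).
    return response


# Toy stand-ins: the "engine" only gives a detailed answer when asked to be specific.
def toy_engine(q: str) -> str:
    return "detailed answer" if "be specific" in q else "vague answer"


def toy_evaluator(q: str, r: str) -> EvalResult:
    ok = r.startswith("detailed")
    return EvalResult(passing=ok, feedback="" if ok else "too vague")


def toy_transform(q: str, r: str, e: EvalResult) -> str:
    return f"{q} (feedback: {e.feedback}; be specific)"


print(
    retry_query(
        "What did the author do growing up?",
        toy_engine,
        toy_evaluator,
        toy_transform,
    )
)
```

With these toy functions the first answer fails evaluation, the query is rewritten once, and the second answer passes.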


In [None]:
from llama_index.core.query_engine import RetryQueryEngine
from llama_index.core.evaluation import RelevancyEvaluator

query_response_evaluator = RelevancyEvaluator()
retry_query_engine = RetryQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_response = retry_query_engine.query(query)
print(retry_response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer, a TRS-80, and started programming more extensively, including writing simple games and a word processor.


## Retry Source Query Engine


The Source Retry engine modifies the query's source nodes: it filters the existing source nodes based on an LLM evaluation of each node.
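As a rough illustration of the idea, the sketch below keeps only the source nodes that an evaluator accepts and answers again from that subset. It is not the library's code: `filter_sources_and_retry`, `node_passes`, and `synthesize` are hypothetical names, nodes are modeled as plain strings, and raising an error when no node survives is a choice made for this sketch.

```python
from typing import Callable, List


def filter_sources_and_retry(
    query: str,
    source_nodes: List[str],
    node_passes: Callable[[str, str], bool],
    synthesize: Callable[[str, List[str]], str],
) -> str:
    """Keep only source nodes the evaluator accepts, then answer from those."""
    kept = [node for node in source_nodes if node_passes(query, node)]
    if not kept:
        # Design choice of this sketch: fail loudly rather than answer from nothing.
        raise ValueError("No source nodes passed evaluation.")
    return synthesize(query, kept)


# Toy stand-ins for an LLM-based node evaluator and a response synthesizer.
def toy_node_passes(q: str, node: str) -> bool:
    return "author" in node


def toy_synthesize(q: str, nodes: List[str]) -> str:
    return " ".join(nodes)


nodes = [
    "The author wrote short stories and programmed an IBM 1401.",
    "Unrelated text about cooking pasta.",
]
answer = filter_sources_and_retry(
    "What did the author do growing up?", nodes, toy_node_passes, toy_synthesize
)
print(answer)
```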


In [None]:
from llama_index.core.query_engine import RetrySourceQueryEngine

retry_source_query_engine = RetrySourceQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_source_response = retry_source_query_engine.query(query)
print(retry_source_response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.


## Retry Guideline Query Engine


This module uses guidelines to direct the evaluator's behavior. You can customize the guidelines to suit your needs.


In [None]:
from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.evaluation.guideline import DEFAULT_GUIDELINES
from llama_index.core import Response
from llama_index.core.indices.query.query_transform.feedback_transform import (
    FeedbackQueryTransformation,
)
from llama_index.core.query_engine import RetryGuidelineQueryEngine

# Guideline eval
guideline_eval = GuidelineEvaluator(
    guidelines=DEFAULT_GUIDELINES
    + "\nThe response should not be overly long.\n"
    "The response should try to summarize where possible.\n"
)  # just for example

Let's look at what's happening under the hood.


In [None]:
typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
# Named eval_result to avoid shadowing Python's built-in eval()
eval_result = guideline_eval.evaluate_response(query, typed_response)
print(f"Guideline eval evaluation result: {eval_result.feedback}")

feedback_query_transform = FeedbackQueryTransformation(resynthesize_query=True)
transformed_query = feedback_query_transform.run(query, {"evaluation": eval_result})
print(f"Transformed query: {transformed_query.query_str}")

Guideline eval evaluation result: The response partially answers the query but lacks specific statistics or numbers. It provides some details about the author's activities growing up, such as writing short stories and programming on different computers, but it could be more concise and focused. Additionally, the response does not mention any statistics or numbers to support the author's experiences.
Transformed query: Here is a previous bad answer.
The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.
Here is some feedback from the evaluator about the response given.
The response partially answers the query but lacks specific statistics or numbers. It provides some details about the author's a
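The transformed query printed above follows a visible pattern: the rejected answer, then the evaluator's feedback. A minimal sketch of building such a query string might look like the following. `build_feedback_query` is a hypothetical helper, not part of LlamaIndex, and because the printed output is cut off, the closing lines of the string are an assumption rather than a reproduction of the real prompt.

```python
def build_feedback_query(query: str, response: str, feedback: str) -> str:
    """Fold a rejected answer and the evaluator's feedback into a new query string."""
    return (
        "Here is a previous bad answer.\n"
        f"{response}\n"
        "Here is some feedback from the evaluator about the response given.\n"
        f"{feedback}\n"
        # Assumption: the printed output is truncated before this point, so
        # ending with a restatement of the question is a guess for illustration.
        "Now answer the question.\n"
        f"{query}"
    )


new_query = build_feedback_query(
    query="What did the author do growing up?",
    response="The author worked on writing and programming.",
    feedback="The response could be more concise and focused.",
)
print(new_query)
```

The point of the pattern is that the next attempt sees both what went wrong and why, rather than re-running the same query unchanged.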

Now let's run the full query engine.


In [None]:
retry_guideline_query_engine = RetryGuidelineQueryEngine(
    base_query_engine, guideline_eval, resynthesize_query=True
)
retry_guideline_response = retry_guideline_query_engine.query(query)
print(retry_guideline_response)

During their childhood and adolescence, the author worked on writing short stories and programming. They mentioned that their short stories were not very good, lacking plot but focusing on characters with strong feelings. In terms of programming, they tried writing programs on the IBM 1401 computer in 9th grade using an early version of Fortran. However, they mentioned being puzzled by the 1401 and not being able to do much with it due to the limited input options. They also mentioned getting a microcomputer, a TRS-80, and starting to write simple games, a program to predict rocket heights, and a word processor.
