<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/evaluation/HotpotQADistractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# HotpotQADistractor Demo

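This notebook walks through evaluating a query engine against the HotpotQA dataset. In this task, the LLM must answer a question given a pre-configured context. Answers are usually expected to be concise, and accuracy is measured by token overlap (F1) and exact match (EM).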
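
The evaluator's own scoring code is not shown in this notebook. As a rough illustration of what EM and F1 measure, here is a minimal sketch that assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). The function names are illustrative, and the scoring inside `HotpotQAEvaluator` may differ in its details.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a verbose but correct answer scores zero on EM and only partial credit on F1, which is why concise answers do best on this benchmark.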


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.


In [None]:
%pip install llama-index-llms-openai

In [None]:
!pip install llama-index

In [None]:
from llama_index.core.evaluation.benchmarks import HotpotQAEvaluator
from llama_index.core import VectorStoreIndex
from llama_index.core import Document
from llama_index.llms.openai import OpenAI
from llama_index.core.embeddings import resolve_embed_model

llm = OpenAI(model="gpt-3.5-turbo")
embed_model = resolve_embed_model(
    "local:sentence-transformers/all-MiniLM-L6-v2"
)

index = VectorStoreIndex.from_documents(
    [Document.example()], embed_model=embed_model, show_progress=True
)

Parsing documents into nodes: 100%|██████████| 1/1 [00:00<00:00, 129.13it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00, 36.62it/s]


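First, we try a very simple engine. In this particular benchmark, the retriever, and therefore the index, is effectively ignored, because the documents retrieved for each query are supplied by the dataset. HotpotQA calls this the "distractor" setting.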


In [None]:
engine = index.as_query_engine(llm=llm)

HotpotQAEvaluator().run(engine, queries=5, show_result=True)

Dataset: hotpot_dev_distractor downloaded at: /Users/loganmarkewich/Library/Caches/llama_index/datasets/HotpotQA
Evaluating on dataset: hotpot_dev_distractor
-------------------------------------
Loading 5 queries out of 7405 (fraction: 0.00068)
Question:  Were Scott Derrickson and Ed Wood of the same nationality?
Response: No.
Correct answer:  yes
EM: 0 F1: 0
-------------------------------------
Question:  What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?
Response: Unknown
Correct answer:  Chief of Protocol
EM: 0 F1: 0
-------------------------------------
Question:  What science fantasy young adult series, told in first person, has a set of companion books narrating the stories of enslaved worlds and alien species?
Response: Animorphs
Correct answer:  Animorphs
EM: 1 F1: 1.0
-------------------------------------
Question:  Are the Laleli Mosque and Esma Sultan Mansion located in the same neighborhood?
Response: Yes.
Correct answer

Now we try a sentence-transformer reranker, which selects 3 of the 10 nodes proposed by the retriever.
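
To make the "keep 3 of 10" idea concrete, here is a toy sketch of top-n reranking that is independent of LlamaIndex. The `overlap_score` function is a hypothetical stand-in for the relevance score a sentence-transformer model would produce, and `rerank_top_n` is an illustrative name, not a library API. Only the score, sort, and truncate logic is the point.

```python
from typing import Callable, List


def overlap_score(query: str, passage: str) -> float:
    """Hypothetical scorer: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank_top_n(
    query: str,
    passages: List[str],
    top_n: int = 3,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[str]:
    """Score every candidate against the query and keep the top_n best."""
    ranked = sorted(passages, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_n]
```

A real reranker replaces the word-overlap heuristic with a learned model, but the surrounding selection step has the same shape.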


In [None]:
from llama_index.core.postprocessor import SentenceTransformerRerank

rerank = SentenceTransformerRerank(top_n=3)

engine = index.as_query_engine(
    llm=llm,
    node_postprocessors=[rerank],
)

HotpotQAEvaluator().run(engine, queries=5, show_result=True)

Dataset: hotpot_dev_distractor downloaded at: /Users/loganmarkewich/Library/Caches/llama_index/datasets/HotpotQA
Evaluating on dataset: hotpot_dev_distractor
-------------------------------------
Loading 5 queries out of 7405 (fraction: 0.00068)
Question:  Were Scott Derrickson and Ed Wood of the same nationality?
Response: No.
Correct answer:  yes
EM: 0 F1: 0
-------------------------------------
Question:  What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?
Response: No government position.
Correct answer:  Chief of Protocol
EM: 0 F1: 0
-------------------------------------
Question:  What science fantasy young adult series, told in first person, has a set of companion books narrating the stories of enslaved worlds and alien species?
Response: Animorphs
Correct answer:  Animorphs
EM: 1 F1: 1.0
-------------------------------------
Question:  Are the Laleli Mosque and Esma Sultan Mansion located in the same neighborhood?
Response: No.

The F1 and exact-match scores appear to improve slightly.

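Note that the benchmark rewards short, factual answers without explanations, even though chain-of-thought (CoT) prompting is known to sometimes improve output quality.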

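The scores used are also not a perfect measure of correctness, but they offer a quick way to see how changes to the query engine affect its output.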
