<img src="docs/images/DSPy8.png" alt="DSPy7 Image" height="150"/>

## **DSPy**: Programming with Foundation Models

[<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/intro.ipynb)

This notebook introduces the **DSPy** framework for **Programming with Foundation Models**, i.e., language models (LMs) and retrieval models (RMs).

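**DSPy** emphasizes programming over prompting. It unifies techniques for **prompting** and **fine-tuning** LMs, as well as for improving them with **reasoning** and **tool/retrieval augmentation**, all expressed and learned through a minimal set of Pythonic operations.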

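**DSPy** provides **composable and declarative modules** for instructing LMs in a familiar Pythonic syntax. On top of that, **DSPy** introduces an **automatic compiler that teaches LMs** how to conduct the declarative steps in your program. The **DSPy compiler** internally _traces_ your program and then crafts high-quality prompts for large LMs (or trains automatic finetunes for small LMs) to teach them the steps of your task.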

### 0] Setup

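As we'll see below, **DSPy** can routinely teach powerful models like `GPT-3.5` and local models like `T5-base` or `Llama2-13b` to be much more reliable at complex tasks. **DSPy** compiles the same program into different few-shot prompts and/or finetunes for each LM.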

Let's begin by setting things up. The snippet below will also install **DSPy** if it's not already installed.

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os

try: # When on Google Colab, clone the repository so the cache can be downloaded.
    import google.colab
    repo_path = 'dspy'
    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path
except ImportError:
    repo_path = '.'

if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

import pkg_resources # Install the package if it's not installed
if "dspy-ai" not in {pkg.key for pkg in pkg_resources.working_set}:
    !pip install -U pip
    !pip install dspy-ai
    !pip install openai~=0.28.1
    # !pip install -e $repo_path

import dspy

### 1] Getting Started

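We'll start by setting up the language model (LM) and the retrieval model (RM). **DSPy** supports multiple API-based and local models. In this notebook, we'll use GPT-3.5 (`gpt-3.5-turbo`) and the `ColBERTv2` retriever.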

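To make things easy, we've set up a ColBERTv2 server hosting a Wikipedia 2017 "abstracts" search index (i.e., containing the first paragraph of each article from this [2017 dump](https://hotpotqa.github.io/wiki-readme.html)), so you don't need to worry about setting one up! It's also free.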

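**Note:** _If you want to run this notebook without changing the examples, you don't need an API key. All examples are already cached internally, so you can inspect them!_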

In [2]:
# Create an OpenAI LM client for 'gpt-3.5-turbo'
turbo = dspy.OpenAI(model='gpt-3.5-turbo')

# Create a ColBERTv2 retriever client for the hosted Wikipedia 2017 abstracts index
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# Configure DSPy to use turbo as the default LM and colbertv2_wiki17_abstracts as the default retriever
dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

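In the last line above, we configure **DSPy** to use the turbo LM and the ColBERTv2 retriever (over Wikipedia 2017 abstracts) by default. These defaults are easy to override for local parts of a program if needed.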
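To make the idea of "global defaults that can be overridden locally" concrete, here is a minimal plain-Python sketch of the pattern. The `Settings` class, its `context` method, and the strings standing in for the LM and RM objects are hypothetical illustrations, not DSPy's actual settings object.

```python
from contextlib import contextmanager


class Settings:
    """Hypothetical global settings holder; a sketch, not DSPy's implementation."""

    def __init__(self):
        self.config = {}

    def configure(self, **kwargs):
        # Set program-wide defaults, e.g. lm=... and rm=...
        self.config.update(kwargs)

    @contextmanager
    def context(self, **overrides):
        # Temporarily apply overrides; restore the previous defaults on exit.
        previous = dict(self.config)
        self.config.update(overrides)
        try:
            yield self
        finally:
            self.config = previous


settings = Settings()
settings.configure(lm="turbo", rm="colbertv2_wiki17_abstracts")

with settings.context(lm="local-model"):
    lm_inside = settings.config["lm"]  # "local-model" only inside this block

lm_after = settings.config["lm"]  # back to the default, "turbo"
```

Restoring a saved copy in `finally` means the defaults come back even if the code inside the `with` block raises.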

##### A Note on the Workflow

You can build your own **DSPy programs** for all sorts of tasks, e.g., question answering, information extraction, or text-to-SQL.

Whatever the task, the general workflow is:

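1. **Collect some data.** Define examples of the inputs and outputs of your program (e.g., questions and their answers). This could just be a handful of quick examples you wrote down. If large datasets exist, the more the merrier!
1. **Write your program.** Define the modules (i.e., sub-tasks) of your program and the way they should interact to solve your task.
1. **Define some validation logic.** What makes for a good run of your program? Maybe the answers need to have a certain length or stick to a particular format? Specify the logic that checks that.
1. **Compile!** Ask **DSPy** to compile your program using your data. The compiler will use your data and validation logic to optimize your program (e.g., prompts and modules) so it's efficient and effective!
1. **Iterate.** Repeat the process by improving your data, program, or validation, or by using more advanced features of the **DSPy** compiler.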

Let's now see this process in action.

### 2] Task Examples

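**DSPy** can accommodate a wide variety of applications and tasks. **In this intro notebook, we will work on the example task of multi-hop question answering (QA).**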

Other notebooks and tutorials present different tasks. For now, let's load a small sample from the HotPotQA multi-hop dataset.

In [3]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

(20, 50)

We just loaded `trainset` (20 examples) and `devset` (50 examples). Each example in our **training set** contains just a **question** and its (human-annotated) **answer**.

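**DSPy** typically requires very minimal labeling. Whereas your pipeline may involve six or seven complex steps, you only need labels for the initial question and the final answer. **DSPy** will bootstrap any intermediate labels needed to support your pipeline. If you change your pipeline in any way, the bootstrapped data will change accordingly!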

Now, let's look at some data examples.

In [4]:
train_example = trainset[0]  # Take the first example from the training set
print(f"Question: {train_example.question}")  # Print the question
print(f"Answer: {train_example.answer}")  # Print the answer

Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt


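Examples in the **dev set** contain a third field, namely the **titles** of relevant Wikipedia articles. This is not essential, but for the purposes of this intro, it will help us get a sense of how well our programs are doing.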

In [5]:
# Select the 19th example (index 18)
dev_example = devset[18]
# Print the question
print(f"Question: {dev_example.question}")
# Print the answer
print(f"Answer: {dev_example.answer}")
# Print the relevant Wikipedia titles
print(f"Relevant Wikipedia Titles: {dev_example.gold_titles}")

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer: English
Relevant Wikipedia Titles: {'Robert Irvine', 'Restaurant: Impossible'}


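After loading the raw data, we applied `x.with_inputs('question')` to each example to tell **DSPy** that the input field in each example will be just `question`. Any other fields are labels or metadata that are not given to the system.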

In [6]:
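# Print the input keys and label keys of a training example
print(f"For this dataset, training examples have input keys {train_example.inputs().keys()} and label keys {train_example.labels().keys()}")
# Print the input keys and label keys of a dev example
print(f"For this dataset, dev examples have input keys {dev_example.inputs().keys()} and label keys {dev_example.labels().keys()}")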

For this dataset, training examples have input keys ['question'] and label keys ['answer']
For this dataset, dev examples have input keys ['question'] and label keys ['answer', 'gold_titles']


Note that there's nothing special about the HotPotQA dataset: it's just a list of examples.

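You can define your own examples as shown below. Future notebooks will guide you through creating your own data in unusual or data-scarce settings, which is a context where **DSPy** excels.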

```
dspy.Example(field1=value, field2=value2, ...)
```
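To illustrate what `with_inputs` does, the plain-Python sketch below keeps a dict of fields plus a set of input keys. `MiniExample` is a hypothetical stand-in for `dspy.Example`, reduced to the `with_inputs`, `inputs()`, and `labels()` behavior described above; unlike the real class, its `inputs()` and `labels()` simply return dicts.

```python
class MiniExample:
    """Hypothetical, simplified stand-in for dspy.Example (not its real implementation)."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = set()

    def with_inputs(self, *keys):
        # Return a copy in which `keys` are inputs; all other fields count as labels.
        copy = MiniExample(**self._fields)
        copy._input_keys = set(keys)
        return copy

    def inputs(self):
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}

    def __getattr__(self, name):
        # Called only when normal lookup fails: expose fields as attributes (example.question).
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)


ex = MiniExample(
    question="At My Window was released by which American singer-songwriter?",
    answer="John Townes Van Zandt",
).with_inputs("question")

print(list(ex.inputs()))  # ['question']
print(list(ex.labels()))  # ['answer']
```

Returning a copy from `with_inputs` keeps the raw example unchanged, which matches how the notebook builds `trainset` and `devset` from `dataset.train` and `dataset.dev`.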

### 3] Building Blocks

In **DSPy**, we maintain a clean separation between **defining your modules in a declarative way** and **calling them in a pipeline to solve the task**.

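This allows you to focus on the information flow of your pipeline. **DSPy** will then take your program and automatically optimize **how to prompt** (or finetune) LMs **for your particular pipeline** so it works well.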

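If you have experience with PyTorch, you can think of DSPy as the PyTorch of the foundation model space. Before we see it in action, let's first understand some key pieces.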

##### Using the Language Model: **Signatures** & **Predictors**

Every call to the LM in a **DSPy** program needs to have a **Signature**.

A signature consists of three simple elements:

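- A minimal description of the sub-task the LM is supposed to solve.
- A description of one or more input fields (e.g., the input question) that we will give to the LM.
- A description of one or more output fields (e.g., the question's answer) that we expect from the LM.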

Let's define a simple signature for basic question answering.

In [7]:
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    # Input field: the question
    question = dspy.InputField()
    # Output field: the answer, described as often between 1 and 5 words
    answer = dspy.OutputField(desc="often between 1 and 5 words")

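In `BasicQA`, the docstring describes the sub-task (i.e., answering questions). Each `InputField` or `OutputField` can optionally contain a description `desc` too. When it's not given, it's inferred from the field's name (e.g., `question`).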
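One way to picture how these three elements become a prompt is the plain-Python sketch below. `render_prompt` is a hypothetical helper that only approximates the layout DSPy produced for `BasicQA` when its history is inspected later in this notebook: the docstring becomes the instructions, a field with a `desc` shows that description, and a field without one falls back to a `${name}` placeholder. It is not DSPy's actual template code.

```python
def render_prompt(instructions, input_fields, output_fields, values):
    """Hypothetical renderer approximating the BasicQA prompt layout in this notebook.

    input_fields / output_fields: lists of (name, desc) pairs, where desc may be None.
    values: dict mapping each input field name to its concrete value.
    """

    def prefix(name):
        # Field names become capitalized prefixes, e.g. "question" -> "Question".
        return name.replace("_", " ").capitalize()

    def placeholder(name, desc):
        # Use the description when given; otherwise infer a ${name} placeholder.
        return desc if desc else "${" + name + "}"

    template = [f"{prefix(n)}: {placeholder(n, d)}" for n, d in input_fields + output_fields]
    filled = [f"{prefix(n)}: {values[n]}" for n, _ in input_fields]
    # Leave the first output prefix open for the LM to complete.
    filled.append(f"{prefix(output_fields[0][0])}:")

    return "\n\n".join([
        instructions,
        "---",
        "Follow the following format.",
        "\n".join(template),
        "---",
        "\n".join(filled),
    ])


prompt = render_prompt(
    "Answer questions with short factoid answers.",
    input_fields=[("question", None)],
    output_fields=[("answer", "often between 1 and 5 words")],
    values={"question": "What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?"},
)
print(prompt)
```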

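Notice that there isn't anything special about this signature in **DSPy**. We could just as easily define a signature that, for instance, takes a long snippet from a PDF and outputs structured information.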

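Anyway, now that we have a signature, let's define and use a **Predictor**. A predictor is a module that knows how to use the LM to implement a signature. Importantly, predictors can **learn** to fit their behavior to the task!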
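Conceptually, a predictor formats a prompt from its signature (plus any demonstrations it has learned), calls the LM, and parses the output field from the completion. The sketch below shows that loop with a stub LM so it runs offline. `MiniPredict`, `fake_lm`, and the assumption that an LM is just a callable from prompt string to completion string are all hypothetical simplifications; DSPy's `dspy.Predict` and its LM clients are more involved.

```python
class MiniPredict:
    """Hypothetical sketch of a predictor: build a prompt, call the LM, parse the output.

    `lm` is assumed to be any callable mapping a prompt string to a completion string.
    """

    def __init__(self, instructions, input_name, output_name, lm):
        self.instructions = instructions
        self.input_name = input_name
        self.output_name = output_name
        self.lm = lm
        self.demos = []  # demonstrations an optimizer could add; empty means zero-shot

    def __call__(self, **inputs):
        in_p, out_p = self.input_name.capitalize(), self.output_name.capitalize()
        parts = [self.instructions, "---"]
        for demo in self.demos:
            parts.append(f"{in_p}: {demo[self.input_name]}\n{out_p}: {demo[self.output_name]}")
        parts.append(f"{in_p}: {inputs[self.input_name]}\n{out_p}:")
        completion = self.lm("\n\n".join(parts))
        # The completion continues after the output prefix; keep its first line.
        lines = completion.strip().splitlines()
        return {self.output_name: lines[0].strip() if lines else ""}


prompts_seen = []


def fake_lm(prompt):
    # Stub LM for illustration: records the prompt and returns a fixed completion.
    prompts_seen.append(prompt)
    return " American"


predict = MiniPredict("Answer questions with short factoid answers.", "question", "answer", fake_lm)
pred = predict(question="What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?")
print(pred["answer"])  # American
```

The `demos` list is where "learning" would show up in this sketch: filling it changes the prompt without changing the signature or the calling code.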

In [8]:
# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Predicted Answer: American


In the example above, we asked the predictor about the chef featured in "Restaurant: Impossible". The model output an answer ("American").

For visibility, we can inspect how this extremely basic predictor implemented our signature. Let's inspect the history of our LM (**turbo**).

In [9]:
# Inspect the most recent call to the LM
turbo.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Question: ${question}
Answer: often between 1 and 5 words

---

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer: American





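The chef is both British and American, but we have no way of knowing whether the model just guessed "American" because it's a common answer. In general, adding **retrieval** and **learning** will help the LM be more factual, and we'll explore these in a minute!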

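But before we do that, how about we _just_ change the predictor? It would be nice to allow the model to elicit a chain of thought along with its prediction.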

In [10]:
# Define the predictor. Notice we're just changing the class. The signature BasicQA is unchanged.
generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)

# Call the predictor on the same input.
pred = generate_answer_with_chain_of_thought(question=dev_example.question)

# Print the input, the chain of thought, and the prediction.
print(f"Question: {dev_example.question}")
print(f"Thought: {pred.rationale.split('.', 1)[1].strip()}")
print(f"Predicted Answer: {pred.answer}")

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Thought: We know that the chef and restaurateur featured in Restaurant: Impossible is Robert Irvine.
Predicted Answer: British


This is indeed a better answer: the model figures out that the chef in question is **Robert Irvine** and correctly identifies that he's British.

These predictors (`dspy.Predict` and `dspy.ChainOfThought`) can be applied to _any_ signature. As we'll see below, they can also be optimized to learn from your data and validation logic.
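Conceptually, switching to chain of thought adds a reasoning output ahead of the real outputs, and the model's rationale begins by completing the "in order to ..." phrase, which is why the cell above drops everything up to the first period. The sketch below shows both ideas in plain Python. `with_reasoning` and `extract_thought` are hypothetical helpers; the field name `rationale` matches `pred.rationale` above, and the description text follows the "Let's think step by step in order to ..." prefix visible in this notebook's prompts, but this is not DSPy's implementation.

```python
def with_reasoning(output_fields):
    """Hypothetical: return output fields with a reasoning field placed first.

    output_fields is a list of (name, desc) pairs.
    """
    target = output_fields[0][0]
    reasoning = ("rationale", "Let's think step by step in order to ${produce the " + target + "}. We ...")
    return [reasoning] + list(output_fields)


def extract_thought(rationale):
    """Drop the 'produce the answer.' lead-in and keep the model's actual thought."""
    _, sep, tail = rationale.partition(".")
    # Unlike rationale.split('.', 1)[1], this does not raise when there is no period.
    return tail.strip() if sep else rationale.strip()


fields = with_reasoning([("answer", "often between 1 and 5 words")])
print([name for name, _ in fields])  # ['rationale', 'answer']
print(extract_thought("produce the answer. We know that the chef is Robert Irvine."))
# We know that the chef is Robert Irvine.
```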

##### Using the Retrieval Model

Using the retriever is pretty simple. A module `dspy.Retrieve(k)` will search for the top-`k` passages that match a given query.

By default, this will use the retriever we configured at the top of this notebook, namely, ColBERTv2 over a Wikipedia index.
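The contract of a top-`k` retriever (score every passage against the query, return the `k` best) can be sketched with a toy scorer. The function names and the bag-of-words overlap score below are hypothetical and swapped in for illustration only: ColBERTv2 instead scores passages with learned token-level embeddings over a large index. The toy passages are shortened from results shown in this notebook and use the same "Title | text" form.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def retrieve_top_k(query, corpus, k=3):
    """Return the k passages with the highest term overlap with the query.

    A toy bag-of-words scorer standing in for ColBERTv2.
    """
    q = Counter(tokenize(query))

    def score(passage):
        p = Counter(tokenize(passage))
        return sum(min(count, p[t]) for t, count in q.items())

    # sorted() is stable even with reverse=True, so ties keep corpus order.
    return sorted(corpus, key=score, reverse=True)[:k]


corpus = [
    "Restaurant: Impossible | Restaurant: Impossible is an American reality television series, featuring chef and restaurateur Robert Irvine.",
    "Jean Joho | Jean Joho is a French-American chef and restaurateur.",
    "History of the FIFA World Cup | The FIFA World Cup was first held in 1930.",
]

top = retrieve_top_k("When was the first FIFA World Cup held?", corpus, k=1)
print(top[0].split(" | ")[0])  # History of the FIFA World Cup
```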

In [11]:
# Create a Retrieve module that returns the top 3 passages
retrieve = dspy.Retrieve(k=3)

# Retrieve the top-k passages for the question
topK_passages = retrieve(dev_example.question).passages

# Print a header for the top-k passages
print(f"Top {retrieve.k} passages for question: {dev_example.question} \n", '-' * 30, '\n')

# Print each passage with its rank
for idx, passage in enumerate(topK_passages):
    print(f'{idx+1}]', passage, '\n')

Top 3 passages for question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible? 
 ------------------------------ 

1] Restaurant: Impossible | Restaurant: Impossible is an American reality television series, featuring chef and restaurateur Robert Irvine, that aired on Food Network from 2011 to 2016. 

2] Jean Joho | Jean Joho is a French-American chef and restaurateur. He is chef/proprietor of Everest in Chicago (founded in 1986), Paris Club Bistro & Bar and Studio Paris in Chicago, The Eiffel Tower Restaurant in Las Vegas, and Brasserie JO in Boston. 

3] List of Restaurant: Impossible episodes | This is the list of the episodes for the American cooking and reality television series "Restaurant Impossible", produced by Food Network. The premise of the series is that within two days and on a budget of $10,000, celebrity chef Robert Irvine renovates a failing American restaurant with the goal of helping to restore it to profitability and prominence.

Feel free to ask any other questions.

In [12]:
# Retrieve passages for "When was the first FIFA World Cup held?" and show the first one
retrieve("When was the first FIFA World Cup held?").passages[0]

'History of the FIFA World Cup | The FIFA World Cup was first held in 1930, when FIFA president Jules Rimet decided to stage an international football tournament. The inaugural edition, held in 1930, was contested as a final tournament of only thirteen teams invited by the organization. Since then, the World Cup has experienced successive expansions and format remodeling to its current 32-team final tournament preceded by a two-year qualifying process, involving over 200 teams from around the world.'

### 4] Program 1: Basic Retrieval-Augmented Generation ("RAG")

Let's define our first complete program for this task. We'll build a retrieval-augmented pipeline for answer generation.

Given a question, we'll search for the top-3 passages in Wikipedia and then feed them as context for answer generation.

Let's start by defining this signature: `context, question --> answer`.

In [13]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    # Input field: may contain relevant facts
    context = dspy.InputField(desc="may contain relevant facts")
    # Input field: the question
    question = dspy.InputField()
    # Output field: often between 1 and 5 words
    answer = dspy.OutputField(desc="often between 1 and 5 words")

Great. Now let's define the actual program. This is a class that inherits from `dspy.Module`.

It needs two methods:

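- The `__init__` method will simply declare the sub-modules it needs: `dspy.Retrieve` and `dspy.ChainOfThought`. The latter is defined to implement our `GenerateAnswer` signature.
- The `forward` method will describe the control flow of answering the question using the modules we have.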
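The split between declaring sub-modules in `__init__` and wiring them in `forward` can be seen in a plain-Python analogue that runs without DSPy. `MiniRAG`, `toy_retrieve`, and `toy_generate` are hypothetical: the two callables stand in for `dspy.Retrieve` and `dspy.ChainOfThought(GenerateAnswer)`, and the toy passages are made up.

```python
class MiniRAG:
    """Hypothetical plain-Python analogue of the RAG module.

    `retrieve(question, k)` and `generate(context, question)` are caller-supplied callables.
    """

    def __init__(self, retrieve, generate, num_passages=3):
        # __init__ only declares the sub-modules the program needs.
        self.retrieve = retrieve
        self.generate = generate
        self.num_passages = num_passages

    def forward(self, question):
        # forward() describes the control flow: retrieve, then generate.
        context = self.retrieve(question, self.num_passages)
        answer = self.generate(context=context, question=question)
        return {"context": context, "answer": answer}

    def __call__(self, question):
        # Calling the object runs forward(), as with compiled_rag(my_question) later on.
        return self.forward(question)


toy_passages = [
    "Toy passage 1: mentions Kinnairdy Castle.",
    "Toy passage 2: unrelated.",
    "Toy passage 3: also unrelated.",
]


def toy_retrieve(question, k):
    # Stub retriever: ignores the question and returns the first k toy passages.
    return toy_passages[:k]


def toy_generate(context, question):
    # Stub generator: "answers" only if some passage mentions the castle.
    return "Kinnairdy Castle" if any("Kinnairdy" in p for p in context) else "unknown"


rag = MiniRAG(toy_retrieve, toy_generate, num_passages=2)
out = rag("What castle did David Gregory inherit?")
print(out["answer"])  # Kinnairdy Castle
```

Because the sub-modules are injected, the same control flow can be exercised with stubs, much as DSPy lets the compiler swap in better prompts without touching `forward`.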

In [14]:
# Define a RAG program that inherits from dspy.Module
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        # Retrieve module that returns the top num_passages passages
        self.retrieve = dspy.Retrieve(k=num_passages)
        # ChainOfThought module that implements the GenerateAnswer signature
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    # Forward pass: takes a question as input
    def forward(self, question):
        # Retrieve relevant passages and use them as the context
        context = self.retrieve(question).passages
        # Generate an answer from the context and the question
        prediction = self.generate_answer(context=context, question=question)
        # Return a Prediction holding the context and the generated answer
        return dspy.Prediction(context=context, answer=prediction.answer)

##### Compiling the RAG Program

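Having defined this program, let's now **compile** it. Compiling a program updates the parameters stored in each module. In our setting, this primarily means collecting and selecting good demonstrations to include in your prompt(s).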

Compiling depends on three things:

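1. **A training set.** We'll just use our 20 question-answer examples from `trainset` above.
1. **A metric for validation.** We'll define a quick `validate_context_and_answer` that checks that the predicted answer is correct. It'll also check that the retrieved context does actually contain that answer.
1. **A specific teleprompter.** The **DSPy** compiler includes a number of **teleprompters** that can optimize your programs.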
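The two checks in the validation metric can be sketched in plain Python. The normalization below (lowercasing, removing punctuation and the articles a/an/the) is the common SQuAD-style convention and is an assumption here; DSPy's own `dspy.evaluate.answer_exact_match` and `answer_passage_match` helpers may differ in detail. All function names in the sketch are hypothetical, and the passages are made-up toys.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_text(s):
    """SQuAD-style normalization: lowercase, strip punctuation and articles, squeeze spaces."""
    s = "".join(ch for ch in s.lower() if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(gold, predicted):
    return normalize_text(gold) == normalize_text(predicted)


def passage_match(gold, passages):
    # True if the normalized gold answer occurs inside any normalized passage.
    target = normalize_text(gold)
    return any(target in normalize_text(p) for p in passages)


def validate(gold, predicted, passages):
    # Mirrors the idea of validate_context_and_answer: both checks must pass.
    return exact_match(gold, predicted) and passage_match(gold, passages)


print(exact_match("the River Tyne", "River Tyne"))  # True
print(exact_match("National Hockey League", "National Hockey League (NHL)"))  # False
```

Requiring both checks filters out runs where the LM answered correctly from memory while retrieval missed, which is exactly the kind of run you would not want to keep as a demonstration.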

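**Teleprompters:** Teleprompters are powerful optimizers that can take any program and learn to bootstrap and select effective prompts for its modules. Hence the name, which means "prompting at a distance".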

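Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. In this notebook, we will use a simple default, `BootstrapFewShot`.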

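_If you're into analogies, you can think of these as the training data, the loss function, and the optimizer in a standard DNN supervised learning setup. Whereas SGD is a basic optimizer, there are more sophisticated (and more expensive!) ones like Adam or RMSProp._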
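The core idea behind bootstrapping demonstrations can be sketched as: run the program on training examples, keep the runs that pass the metric, and stop once enough are collected. `bootstrap_demos` and the toy program below are hypothetical and heavily simplified; the real `BootstrapFewShot` records full traces of every predictor's inputs and outputs and has more options. The cap of 4 is chosen only to echo the "Bootstrapped 4 full traces" log line of the compile cell.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Hypothetical, heavily simplified bootstrapping loop.

    Runs the program on training examples in order, keeps each run that passes
    the metric as a demonstration, and stops once max_demos have been collected.
    Returns the demos and how many examples were tried.
    """
    demos, tried = [], 0
    for example in trainset:
        if len(demos) >= max_demos:
            break
        tried += 1
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos, tried


# Toy data: the program is right on even-numbered questions only.
toy_trainset = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]


def toy_program(question):
    i = int(question[1:])
    return f"a{i}" if i % 2 == 0 else "wrong"


demos, tried = bootstrap_demos(toy_program, toy_trainset, lambda ex, pred: ex["answer"] == pred)
print(f"Bootstrapped {len(demos)} demos after {tried} examples.")  # Bootstrapped 4 demos after 7 examples.
```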

In [15]:
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

 50%|█████     | 10/20 [00:00<00:00, 121.99it/s]

Bootstrapped 4 full traces after 11 examples in round 0.





Now that we've compiled our RAG program, let's try it out.

In [16]:
# Ask any question you like to this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)

# Print the question, the answer, and the (truncated) retrieved contexts.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")

Question: What castle did David Gregory inherit?
Predicted Answer: Kinnairdy Castle
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']


Excellent. How about we inspect the last prompt for the LM?

In [17]:
# Inspect the most recent call to the LM
turbo.inspect_history(n=1)





Answer questions with short factoid answers.

---

Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt

Question: "Everything Has Changed" is a song from an album released under which record label ?
Answer: Big Machine Records

Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Answer: 1950

Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?
Answer: Aleem Sarwar Dar

Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?
Answer: "Outfield of Dreams"

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?
Answer: Aleksandr Danilovich Aleksandrov

Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits 

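Even though we haven't written any of these detailed demonstrations, we see that **DSPy** was able to bootstrap this 3,000-token prompt for **3-shot retrieval-augmented generation with hard negative passages and chain of thought** from our extremely simple program.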

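This illustrates the power of composition and learning. Of course, this was just generated by a particular teleprompter, which may or may not be perfect in each setting. As you'll see in **DSPy**, you have a large but systematic space of options for optimizing and validating your program's quality and cost.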

If you're so inclined, you can easily inspect the learned objects themselves.

In [18]:
# Iterate over all named predictors in compiled_rag
for name, parameter in compiled_rag.named_predictors():
    # Print the predictor's name
    print(name)
    # Print the first demonstration stored in the predictor
    print(parameter.demos[0])
    # Print an empty line
    print()

generate_answer
Example({'augmented': True, 'context': ['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.', "Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.", 'Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art tra

##### Evaluating the Answers

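We can now evaluate our `compiled_rag` program on the dev set. Of course, this tiny set is _not_ meant to be a reliable benchmark, but it's instructive to use it for illustration.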

For a start, let's evaluate the accuracy (exact match) of the predicted answer.
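An evaluation run boils down to a simple loop: call the program on each dev example, score it with the metric, and report the total and the percentage. The sketch below shows that loop without threading, progress bars, or tables. `evaluate`, `toy_devset`, and `toy_program` are hypothetical; the printed line only imitates the format of the "Average Metric" output shown below, and the 22-of-50 toy setup is chosen to produce a comparable number.

```python
def evaluate(program, devset, metric):
    """Hypothetical evaluation loop: score every dev example and report the
    total and the percentage in the style of the 'Average Metric' line."""
    if not devset:
        raise ValueError("devset must not be empty")
    total = sum(bool(metric(ex, program(ex["question"]))) for ex in devset)
    pct = round(100 * total / len(devset), 1)
    print(f"Average Metric: {total} / {len(devset)}  ({pct}%)")
    return pct


# Toy dev set and program: the program answers the first 22 of 50 questions correctly.
toy_devset = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(50)]


def toy_program(question):
    i = int(question[1:])
    return f"a{i}" if i < 22 else "?"


score = evaluate(toy_program, toy_devset, lambda ex, pred: ex["answer"] == pred)
# prints: Average Metric: 22 / 50  (44.0%)
```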

In [19]:
from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate_on_hotpotqa` function. We'll use this many times below.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compiled_rag, metric=metric)

Average Metric: 22 / 50  (44.0): 100%|██████████| 50/50 [00:00<00:00, 116.45it/s]


Average Metric: 22 / 50  (44.0%)


| # | question | example_answer | gold_titles | context | pred_answer | answer_exact_match |
|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {'Cangzhou', 'Qionghai'} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up ("or metro") area... | No | ✔️ [True] |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? | National Hockey League | {'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'} | ['2017–18 Pittsburgh Penguins season \| The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was... | National Hockey League | ✔️ [True] |
| 2 | The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... | Steve Yzerman | {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} | ['Steve Yzerman \| Stephen Gregory "Steve" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager... | Steve Yzerman | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {'Crichton Castle', 'Crichton Collegiate Church'} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... | River Tyne | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? | King Alfred the Great | {'Æthelweard (son of Alfred)', 'Ealhswith'} | ["Æthelweard of East Anglia \| Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties... | King Alfred the Great | ✔️ [True] |


44.0

##### Evaluating the Retrieval

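It may also be instructive to look at the accuracy of retrieval. There are multiple ways to do this. Often, we can just check whether the retrieved passages contain the answer.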

That said, since our dev set includes the gold titles that should be retrieved, we can just use these here.

In [20]:
def gold_passages_retrieved(example, pred, trace=None):
    # Normalize the example's gold titles and collect them in a set
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    # Each retrieved passage has the form "Title | text"; normalize the title part
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    # Succeed only if every gold title was retrieved
    return gold_titles.issubset(found_titles)

# Evaluate compiled_rag on the HotPotQA dev set using the gold_passages_retrieved metric
compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)

Average Metric: 13 / 50  (26.0): 100%|██████████| 50/50 [00:00<00:00, 671.76it/s]

Average Metric: 13 / 50  (26.0%)





| # | question | example_answer | gold_titles | context | pred_answer | gold_passages_retrieved |
|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {'Cangzhou', 'Qionghai'} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up ("or metro") area... | No | ❌ [False] |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? | National Hockey League | {'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'} | ['2017–18 Pittsburgh Penguins season \| The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was... | National Hockey League | ✔️ [True] |
| 2 | The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... | Steve Yzerman | {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} | ['Steve Yzerman \| Stephen Gregory "Steve" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager... | Steve Yzerman | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {'Crichton Castle', 'Crichton Collegiate Church'} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... | River Tyne | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? | King Alfred the Great | {'Æthelweard (son of Alfred)', 'Ealhswith'} | ["Æthelweard of East Anglia \| Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties... | King Alfred the Great | ❌ [False] |


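Although this simple `compiled_rag` program answers a decent fraction of the questions correctly (44% on this tiny set), the quality of retrieval is much lower (26%).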

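This potentially suggests that the LM is often relying on knowledge it memorized during training to answer questions. To address this weak retrieval, let's explore a second program that involves more advanced search behavior.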

### 5] Program 2: Multi-Hop Search ("Baleen")

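From exploring the harder questions in the training/dev sets, it becomes clear that a single search query is often not enough for this task. For instance, this can be seen when a question asks about, say, the birth city of the writer of "Right Back At It Again". A search query identifies the author correctly as "Jeremy McKinnon", but it wouldn't figure out where he was born.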

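In the retrieval-augmented NLP literature, the standard approach to this challenge is to build multi-hop search systems, like GoldEn (Qi et al., 2019) and Baleen (Khattab et al., 2021). These systems read the retrieved results and then generate additional queries to gather more information when necessary. Using **DSPy**, we can easily simulate such systems in a few lines of code.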

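We'll still use the `GenerateAnswer` signature from the RAG implementation above. All we need now is a **signature** for the "hop" behavior: taking some partial context and a question, generate a search query to find the missing information.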

In [21]:
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

Note: We could have written `context = GenerateAnswer.signature.context` to avoid duplicating the description of the `context` field.

Now, let's define the program itself, `SimplifiedBaleen`. There are many possible ways to implement this, but we'll keep this version down to the key elements.

In [22]:
from dsp.utils import deduplicate

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        # One query generator per hop (separate predictors, so each can hold its own demonstrations)
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        # Retrieve passages_per_hop passages for each generated query
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        # Generate the final answer from the accumulated context
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []

        # For each hop: generate a query, retrieve passages, and merge them into the context without duplicates
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        # Answer the question using everything gathered across hops
        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)


As we can see, the `__init__` method defines a few key sub-modules:

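- **generate_query**: For each hop, we will have one `dspy.ChainOfThought` predictor with the `GenerateSearchQuery` signature.
- **retrieve**: This module will conduct the search using the generated queries.
- **generate_answer**: This `dspy.ChainOfThought` module will be used after all the search steps. It uses the `GenerateAnswer` signature to actually produce an answer.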

The `forward` method uses these sub-modules in a simple control flow:

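1. First, we'll loop `self.max_hops` times.
2. In each iteration, we'll generate a search query using the predictor at `self.generate_query[hop]`.
3. We'll retrieve the top-k passages using that query.
4. We'll add the (deduplicated) passages to our `context` accumulator.
5. After the loop, we'll use `self.generate_answer` to produce an answer.
6. We'll return a prediction with the retrieved `context` and the predicted `answer`.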
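The same control flow can be run offline by injecting stub callables. In the sketch below, `deduplicate` is a plain-Python, order-preserving stand-in for `dsp.utils.deduplicate` (an assumption about its behavior), `multi_hop` mirrors `SimplifiedBaleen.forward` with `generate_query(hop, ...)` playing the role of `self.generate_query[hop]`, and the toy index paraphrases snippets shown elsewhere in this notebook. All of these names are hypothetical.

```python
def deduplicate(items):
    """Order-preserving de-duplication (a plain-Python stand-in for dsp.utils.deduplicate)."""
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(items))


def multi_hop(question, generate_query, retrieve, answer, max_hops=2, k=3):
    """Hypothetical sketch of SimplifiedBaleen.forward with injected callables."""
    context = []
    for hop in range(max_hops):
        # Write a query from what we know so far, search, then merge without duplicates.
        query = generate_query(hop, context, question)
        context = deduplicate(context + retrieve(query, k))
    return {"context": context, "answer": answer(context, question)}


# Toy search results keyed by query (made-up stand-ins for the Wikipedia index).
toy_index = {
    "David Gregory castle inheritance": [
        "P1: David Gregory inherited Kinnairdy Castle.",
        "P2: An unrelated Gregory.",
    ],
    "Kinnairdy Castle storeys": [
        "P3: Kinnairdy Castle has five storeys.",
        "P1: David Gregory inherited Kinnairdy Castle.",
    ],
}


def toy_generate_query(hop, context, question):
    # Stub query writer: the first hop looks for the castle, the second for its storeys.
    return "David Gregory castle inheritance" if hop == 0 else "Kinnairdy Castle storeys"


def toy_retrieve(query, k):
    return toy_index.get(query, [])[:k]


def toy_answer(context, question):
    return "five" if any("five storeys" in p for p in context) else "unknown"


out = multi_hop(
    "How many storeys are in the castle that David Gregory inherited?",
    toy_generate_query, toy_retrieve, toy_answer,
)
print(len(out["context"]), out["answer"])  # 3 five
```

Neither hop alone would answer the question here: the first finds the castle, the second its storeys, and deduplication keeps the passage found twice (P1) only once.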

##### Inspecting the Zero-Shot Version of the Baleen Program

We will also compile this program shortly. But before that, we can try it out in a "zero-shot" setting (i.e., without any compilation).

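Using a program in a zero-shot (uncompiled) setting doesn't mean that quality will be bad. It just means that we're bottlenecked directly by how reliably the underlying LM understands our sub-tasks.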

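This is often just fine when using the most expensive/powerful models (e.g., GPT-4) on the easiest and most standard tasks (e.g., answering simple questions about popular entities).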

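However, a zero-shot approach quickly falls short for more specialized tasks, for novel domains/settings, and for more efficient (or open) models. **DSPy** can help you in all of these settings.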

In [23]:
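# Ask any question you like to this multi-hop program.
my_question = "How many storeys are in the castle that David Gregory inherited?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen()  # uncompiled (i.e., zero-shot) program
pred = uncompiled_baleen(my_question)

# Print the question, the answer, and the (truncated) retrieved contexts.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")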

Question: How many storeys are in the castle that David Gregory inherited?
Predicted Answer: five
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," an...', 'Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daugh...', 'Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret, two miles south of Aberchirder, Aberdeenshire, Scotland. The alternative name is Old Kinnairdy....', 'Kinnaird Head | Kinnaird Head (Scottish Gaelic: "An Ceann

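Let's inspect the last **three** calls to the LM (i.e., generating the first hop's query, generating the second hop's query, and generating the answer).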

In [24]:
# Inspect the last 3 calls to the LM
turbo.inspect_history(n=3)





Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context: N/A

Question: How many storeys are in the castle that David Gregory inherited?

Reasoning: Let's think step by step in order to find the answer to this question. First, we need to find information about David Gregory and the castle he inherited. Then, we can search for details about the castle's architecture or any historical records that mention the number of storeys.

Query: "David Gregory castle inheritance"







Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context:
[1] «David Gre

##### Compiling the Baleen Program

Now it's time to compile our multi-hop (`SimplifiedBaleen`) program.

We will first define our validation logic, which will simply require that:

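- The predicted answer matches the gold answer.
- The retrieved context contains the gold answer.
- None of the generated queries is rambling (i.e., none exceeds 100 characters in length).
- None of the generated queries is roughly repeated (i.e., none has an F1 score of 0.8 or higher against an earlier query).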
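The two query checks can be sketched in plain Python, with the "roughly repeated" test written as token-level F1. `token_f1` and `hops_ok` are hypothetical names, and the whitespace tokenization is a simplifying assumption; the cell below delegates the comparison to `dspy.evaluate.answer_exact_match_str(..., frac=0.8)`, whose internals may differ. As in that cell, the hops list starts with the question and comparisons begin at index 2, so the first query is never compared against the question.

```python
from collections import Counter


def token_f1(a, b):
    """Token-level F1 overlap between two strings (lowercased, whitespace-tokenized)."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum((Counter(ta) & Counter(tb)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(ta), common / len(tb)
    return 2 * precision * recall / (precision + recall)


def hops_ok(question, queries, max_len=100, max_f1=0.8):
    """Hypothetical version of the query checks: no rambling, no near-repeats."""
    hops = [question] + list(queries)
    # Rambling check: every hop (question included) must fit in max_len characters.
    if max(len(h) for h in hops) > max_len:
        return False
    # Repetition check: each query after the first is compared against all earlier hops.
    for idx in range(2, len(hops)):
        if any(token_f1(hops[idx], earlier) >= max_f1 for earlier in hops[:idx]):
            return False
    return True


q = "How many storeys are in the castle that David Gregory inherited?"
print(hops_ok(q, ["David Gregory castle inheritance", "Kinnairdy Castle storeys"]))  # True
print(hops_ok(q, ["David Gregory castle inheritance", "David Gregory castle"]))      # False
```

In the second call, "David Gregory castle" shares 3 tokens with the 4-token first query, giving F1 = 2(1.0)(0.75)/1.75, about 0.857, which crosses the 0.8 threshold.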

In [25]:
def validate_context_and_answer_and_hops(example, pred, trace=None):
    # Reject if the predicted answer doesn't exactly match the gold answer
    if not dspy.evaluate.answer_exact_match(example, pred): return False
    # Reject if the retrieved context doesn't contain the gold answer
    if not dspy.evaluate.answer_passage_match(example, pred): return False

    # Collect the question and every generated query from the trace (empty if no trace is given)
    hops = [example.question] + [outputs.query for *_, outputs in (trace or []) if 'query' in outputs]

    # Reject if any hop (the question or a query) is longer than 100 characters
    if max([len(h) for h in hops]) > 100: return False
    # Reject if any query matches an earlier hop with an F1 score of 0.8 or higher
    if any(dspy.evaluate.answer_exact_match_str(hops[idx], hops[:idx], frac=0.8) for idx in range(2, len(hops))): return False

    return True

Like we did for RAG, we'll use one of the most basic teleprompters in **DSPy**, namely, `BootstrapFewShot`.

In [26]:
# Set up a BootstrapFewShot teleprompter that uses validate_context_and_answer_and_hops as its metric
teleprompter = BootstrapFewShot(metric=validate_context_and_answer_and_hops)

# Compile a SimplifiedBaleen student, using a SimplifiedBaleen teacher that retrieves 2 passages per hop
compiled_baleen = teleprompter.compile(SimplifiedBaleen(), teacher=SimplifiedBaleen(passages_per_hop=2), trainset=trainset)

100%|██████████| 20/20 [00:00<00:00, 64.11it/s]


##### Evaluating Retrieval

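Earlier, it appeared that our simple RAG program was not very effective at finding all the evidence required to answer each question. Is this resolved by adding some extra steps in the `forward` function of `SimplifiedBaleen`? What about compiling: does it help with that?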

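The answers to these questions are not always going to be obvious. However, **DSPy** makes it extremely easy to try many diverse approaches with minimal effort.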

Let's evaluate the retrieval quality of our compiled and uncompiled Baleen pipelines!

In [27]:
# Evaluate the uncompiled Baleen program with the gold_passages_retrieved metric
uncompiled_baleen_retrieval_score = evaluate_on_hotpotqa(uncompiled_baleen, metric=gold_passages_retrieved)

In [28]:
# Evaluate the retrieval score of the compiled Baleen program
compiled_baleen_retrieval_score = evaluate_on_hotpotqa(compiled_baleen, metric=gold_passages_retrieved)

Average Metric: 30 / 50  (60.0): 100%|██████████| 50/50 [00:00<00:00, 54.98it/s]


Average Metric: 30 / 50  (60.0%)


| # | question | example_answer | gold_titles | context | pred_answer | gold_passages_retrieved |
|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {'Cangzhou', 'Qionghai'} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up ("or metro") area... | No | ✔️ [True] |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? | National Hockey League | {'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'} | ["2017 NHL Expansion Draft \| The 2017 NHL Expansion Draft was an expansion draft conducted by the National Hockey League on June 18–20, 2017 to... | National Hockey League (NHL) | ❌ [False] |
| 2 | The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... | Steve Yzerman | {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} | ['List of Tampa Bay Lightning general managers \| The Tampa Bay Lightning are an American professional ice hockey team based in Tampa, Florida. They play... | Steve Yzerman | ❌ [False] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {'Crichton Castle', 'Crichton Collegiate Church'} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... | River Tyne | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? | King Alfred the Great | {'Æthelweard (son of Alfred)', 'Ealhswith'} | ['Æthelweard (son of Alfred) \| Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.', 'Æthelred the Unready \|... | King Alfred the Great | ❌ [False] |


In [29]:
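print(f"## Retrieval Score for RAG: {compiled_rag_retrieval_score}")  # note that for RAG, compilation has no effect on the retrieval step
print(f"## Retrieval Score for uncompiled Baleen: {uncompiled_baleen_retrieval_score}")
print(f"## Retrieval Score for compiled Baleen: {compiled_baleen_retrieval_score}")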

## Retrieval Score for RAG: 26.0
## Retrieval Score for uncompiled Baleen: 36.0
## Retrieval Score for compiled Baleen: 60.0


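Awesome! There might be something to this compiled, multi-hop program. But this is far from all you can do: **DSPy** gives you a clean space of composable operators to deal with any shortcomings you see.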

We can inspect a few concrete examples. If we see failure causes, we can:

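1. Expand our pipeline by using additional sub-modules (e.g., maybe summarization after retrieval?)
1. Modify our pipeline by using more complex logic (e.g., maybe we need to break out of the multi-hop loop once we've found all the information we need?)
1. Refine our validation logic (e.g., maybe use a metric that relies on a second **DSPy** program to evaluate answers, instead of strict string matching)
1. Use a different teleprompter to optimize your pipeline more aggressively.
1. Add more or better training examples!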

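Or, if you really want to, we can tweak the descriptions in the signatures we use in our program to make them more precisely suited to their sub-tasks. This is akin to prompt engineering and should be a last resort, given the other powerful options that **DSPy** gives us!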

In [30]:
# Ask the compiled Baleen program a question
compiled_baleen("How many storeys are in the castle that David Gregory inherited?")

# Inspect the last 3 calls to the LM
turbo.inspect_history(n=3)





Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context: N/A

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield

Reasoning: Let's think step by step in order to produce the query. We know that the FA Charity Shield is an annual football match played in England between the winners of the previous season's Premier League and FA Cup. In this case, we are looking for the year when Manchester City played against a specific club in the 1972 FA Charity Shield. To find this information, we can search for the history of the FA Charity Shield and the teams that participated in the 1972 edition.

Query: "History of FA Charity Shield 1972"

---

Context: N/A

Question: Which is taller, the Empire State Building or the Bank of Am