<img src="../../docs/images/DSPy8.png" alt="DSPy7 Image" height="120"/>

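### Multi-Agent DSPy Programs: Bootstrapping and Aggregating Multiple `ReAct` Agents

This is a quick (somewhat advanced) example of DSPy. Given a hard question answering task and an agent architecture (`dspy.ReAct`), how do you get high scores without tuning prompts by hand?

There are many ways, but this notebook shows one sophisticated strategy that DSPy makes almost trivial: we automatically bootstrap five different effective prompts for ReAct, then optimize an aggregator to combine their strengths.

As is often the case, the DSPy code for this task is probably shorter than a description of it in English, so let's jump right into the code.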

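### 0) TL;DR.

We'll build a ReAct agent in DSPy that scores 30% on a retrieval-based question answering task.

Then we'll optimize it with `BootstrapFewShotWithRandomSearch` to reach 46% accuracy.

Next, we'll build a multi-agent aggregator on top of five different optimized versions of the agent.

Our unoptimized aggregator will score 26%. It doesn't understand the task, so we'll optimize the aggregator too.

In the end, we'll have an optimized multi-agent system that scores 60% accuracy on the same task.

The core code for all of this fits in about 10 lines of DSPy, but we'll add some brief explanations below.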
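
For reference, the four headline numbers can be reproduced from the raw correct-answer counts that the evaluation cells in this notebook report on the 300-example `devset`. The snippet is only an arithmetic sketch; the dictionary keys and variable names are ours and are not part of DSPy.

```python
# Correct-answer counts out of 300 dev examples, as reported by the
# evaluation cells in this notebook.
DEV_SIZE = 300
correct = {
    "unoptimized ReAct": 91,
    "optimized ReAct": 138,
    "zero-shot aggregator": 78,
    "optimized aggregator": 180,
}

# Accuracy in percent, rounded to one decimal place.
accuracy = {name: round(100 * n / DEV_SIZE, 1) for name, n in correct.items()}

# Gain of the final system over the unoptimized agent, in percentage points.
gain_points = round(accuracy["optimized aggregator"] - accuracy["unoptimized ReAct"], 1)

for name, acc in accuracy.items():
    print(f"{name}: {acc}%")
print(f"gain over baseline: {gain_points} points")
```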

### 1) Setting Up.

We'll configure the language model (GPT-3.5) and the retrieval model (ColBERTv2 over Wikipedia).

In [1]:
# Import the libraries we need
import dspy
from dspy.evaluate import Evaluate
from dspy.datasets.hotpotqa import HotPotQA
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Create an OpenAI LM client for 'gpt-3.5-turbo-0125' with max_tokens set to 1000
gpt3 = dspy.OpenAI('gpt-3.5-turbo-0125', max_tokens=1000)

# Create a ColBERTv2 retriever client pointing at the hosted wiki17_abstracts index
colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# Make these the default LM and retriever for DSPy
dspy.configure(lm=gpt3, rm=colbert)

### 2) Loading some data.

We'll load 150 examples for training (`trainset`), 50 examples for validation and optimization (`valset`), and 300 examples for evaluation (`devset`).

In [2]:
# Load HotPotQA with 200 training examples and 300 dev examples (no test split)
dataset = HotPotQA(train_seed=1, train_size=200, eval_seed=2023, dev_size=300, test_size=0)

# First 150 training examples, with 'question' marked as the input field
trainset = [x.with_inputs('question') for x in dataset.train[0:150]]

# Remaining 50 training examples (indices 150 to 199), used for validation
valset = [x.with_inputs('question') for x in dataset.train[150:200]]

# All 300 dev examples, used for evaluation
devset = [x.with_inputs('question') for x in dataset.dev]

# Show one data point; it is just a question-answer pair
trainset[0]

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
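
The slicing above splits the 200 loaded training examples into two non-overlapping parts. Here is a minimal stand-alone sketch of the same index arithmetic, using plain integers in place of HotPotQA examples (the `pool` list is a stand-in, not real data):

```python
# Stand-in for dataset.train: 200 placeholder items instead of real examples.
pool = list(range(200))

train_part = pool[0:150]    # first 150 items, used to bootstrap demonstrations
val_part = pool[150:200]    # remaining 50 items, used to score candidate programs

# The two slices cover the pool exactly once and share no items.
print(len(train_part), len(val_part))
print(set(train_part).isdisjoint(val_part))
print(train_part + val_part == pool)
```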

### 3) ReAct Agent.

Our agent will be a plain DSPy ReAct agent that takes a `question` and outputs an `answer`, using a ColBERTv2 retrieval tool.

In [3]:
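# dspy was already imported during setup; repeated here so the cell stands alone
import dspy

# Build a ReAct agent that maps a question to an answer, with a top-1 retriever as its only tool
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])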

Let's evaluate this **unoptimized** ReAct agent on the `devset`.

In [4]:
# Set up an evaluator on the 300 examples of the devset.
config = dict(num_threads=8, display_progress=True, display_table=5)
# Build the Evaluate object from the devset, the answer_exact_match metric, and the config above
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match, **config)

# Evaluate the agent
evaluate(agent)

Average Metric: 91 / 300  (30.3): 100%|██████████| 300/300 [00:01<00:00, 161.84it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","[['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...","No, Cangzhou is in the Hebei province, while Qionghai is in the Hainan province of China.",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}","[[""2017 NHL Expansion Draft | The 2017 NHL Expansion Draft was an expansion draft conducted by the National Hockey League on June 18–20, 2017 to...",National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...",Tweed River,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}",[['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.']],King Alfred the Great,✔️ [True]


30.33
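
The `False` rows in the table above are instructive: the first prediction is factually right but phrased as a full sentence, so it does not match the gold answer `no`. The sketch below illustrates that behavior with a simplified exact-match check. It is a stand-in written for this note (the `normalize` and `exact_match` helpers are hypothetical), not the implementation behind `dspy.evaluate.answer_exact_match`, whose normalization rules may differ.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    """True only when the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)

# Predictions and gold answers taken from the evaluation table above.
print(exact_match("National Hockey League", "National Hockey League"))
print(exact_match(
    "No, Cangzhou is in the Hebei province, while Qionghai is in the Hainan province of China.",
    "no",
))
print(exact_match("Tweed River", "the River Tyne"))
```

Under a strict metric like this, a short answer in the expected format matters as much as retrieving the right passage, which is part of what the optimization steps in this notebook improve.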

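### 4) Optimizing ReAct.

Let's use DSPy's simple `BootstrapFewShotWithRandomSearch` optimizer to bootstrap examples of successful ReAct runs, and then try to use these self-generated examples to optimize the prompts. In the future, we could also try more sophisticated DSPy optimizers, such as `MIPRO`.

We'll bootstrap 20 programs this way. Examples are bootstrapped from `trainset`, and the candidate programs are scored on our small `valset`. We'll run the final evaluation on `devset`.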

In [5]:
# Optimizer settings: up to 2 bootstrapped demos, no labeled demos, 20 candidate programs
config = dict(max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=20, num_threads=8)
# Create the optimizer (teleprompter) with the exact-match metric and the settings above
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **config)
# Compile the agent: bootstrap from trainset, score candidates on valset
optimized_react = tp.compile(agent, trainset=trainset, valset=valset)

Average Metric: 14 / 50  (28.0): 100%|██████████| 50/50 [00:00<00:00, 151.32it/s]
Average Metric: 14 / 50  (28.0): 100%|██████████| 50/50 [00:00<00:00, 1191.35it/s]
  4%|▍         | 6/150 [00:00<00:00, 216.36it/s]
Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [00:00<00:00, 158.43it/s]
  3%|▎         | 4/150 [00:00<00:00, 258.73it/s]
Average Metric: 21 / 50  (42.0): 100%|██████████| 50/50 [00:00<00:00, 184.63it/s]
  3%|▎         | 4/150 [00:00<00:01, 125.61it/s]
Average Metric: 24 / 50  (48.0): 100%|██████████| 50/50 [00:00<00:00, 130.39it/s]
  1%|▏         | 2/150 [00:00<00:00, 213.13it/s]
Average Metric: 20 / 50  (40.0): 100%|██████████| 50/50 [00:00<00:00, 158.40it/s]
  3%|▎         | 4/150 [00:00<00:00, 387.38it/s]
Average Metric: 18 / 50  (36.0): 100%|██████████| 50/50 [00:00<00:00, 168.50it/s]
  4%|▍         | 6/150 [00:00<00:00, 201.99it/s]
Average Metric: 12 / 50  (24.0): 100%|██████████| 50/50 [00:00<00:00, 152.09it/s]
  6%|▌         | 9/150 [00:00<00:00, 203.21it/s]
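
Each `Average Metric` line in the log above is one candidate program scored on the 50-example `valset`. Conceptually, the optimizer keeps the candidates ranked by that score. The sketch below replays that selection step on the eight scores visible in the log; the candidate labels are hypothetical placeholders, and the ranking code is an illustration rather than DSPy's internal implementation.

```python
# Validation scores (correct answers out of 50) in the order they appear in the log above.
VAL_SIZE = 50
visible_scores = [14, 14, 19, 21, 24, 20, 18, 12]

# Hypothetical labels: candidate_0, candidate_1, ... stand in for the compiled programs.
candidates = [(f"candidate_{i}", score) for i, score in enumerate(visible_scores)]

# Rank by validation score, best first; sorted() is stable, so ties keep log order.
ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)

best_name, best_score = ranked[0]
top_five = [name for name, _ in ranked[:5]]

print(f"best: {best_name} with {100 * best_score / VAL_SIZE:.1f}% on valset")
print("top five:", top_five)
```

Selecting on `valset` rather than `devset` keeps the 300 dev examples untouched for the final evaluation.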


In [13]:
# Evaluate the optimized ReAct agent on the devset
evaluate(optimized_react)

Average Metric: 138 / 300  (46.0): 100%|██████████| 300/300 [00:00<00:00, 512.74it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","[['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...",no,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}",[['2017–18 Pittsburgh Penguins season | The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was...,National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...","Crichton Collegiate Church is located in Midlothian, Scotland, near the hamlet of Crichton, about 7.5 miles south of Edinburgh.",False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}","[['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.'], ['Æthelstan of Kent |...",Alfred the Great,False


46.0

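### 5) Zero-shot Aggregator.

Now let's extract the five best bootstrapped ReAct programs. We'll build a simple DSPy aggregator that runs all of them and then produces a final answer.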

In [7]:
from dsp.utils import flatten, deduplicate

# Take the five best-performing ReAct programs found during optimization
AGENTS = [x[-1] for x in optimized_react.candidate_programs[:5]]

class Aggregator(dspy.Module):
    def __init__(self, temperature=0.0):
        self.aggregate = dspy.ChainOfThought('context, question -> answer')
        self.temperature = temperature

    def forward(self, question):
        # Run all five agents at self.temperature (0.0 by default), then extract and deduplicate the contexts they observed
        with dspy.context(lm=gpt3.copy(temperature=self.temperature)):
            preds = [agent(question=question) for agent in AGENTS]
            context = deduplicate(flatten([flatten(p.observations) for p in preds]))

        # Run the aggregation step to produce the final answer
        return self.aggregate(context=context, question=question)
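
The context-building line is the heart of the aggregator: each agent returns nested lists of retrieved passages, and these are merged into one duplicate-free list. The following stand-alone sketch mimics that step with two small helpers. They are simplified stand-ins written for illustration (`flatten_once` and `dedupe_keep_order` are hypothetical names); the actual `flatten` and `deduplicate` in `dsp.utils` may differ in detail. The passage strings are shortened placeholders.

```python
def flatten_once(nested):
    """Remove one level of list nesting."""
    return [item for sublist in nested for item in sublist]

def dedupe_keep_order(items):
    """Drop repeated items while keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# Placeholder observations from three agents; each observation is a list of passages.
agent_observations = [
    [["Cangzhou | ..."], ["Qionghai | ..."]],
    [["Cangzhou | ..."]],
    [["Qionghai | ..."], ["Hebei | ..."]],
]

# Same shape as the notebook: flatten per agent, flatten across agents, then deduplicate.
context = dedupe_keep_order(flatten_once([flatten_once(obs) for obs in agent_observations]))
print(context)
```

Deduplicating matters here because the five agents often retrieve the same passage for a given question, and repeated passages would only lengthen the aggregator's prompt.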

Let's quickly evaluate the aggregator before optimizing it.

In [8]:
# Create an aggregator instance
aggregator = Aggregator()
# Evaluate the aggregator on the devset
evaluate(aggregator)

Average Metric: 78 / 300  (26.0): 100%|██████████| 300/300 [00:06<00:00, 45.38it/s]


Unnamed: 0,question,example_answer,gold_titles,rationale,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}",determine if both Cangzhou and Qionghai are in the Hebei province of China. We need to carefully analyze the information provided in the context to...,"No, only Cangzhou is in the Hebei province of China. Qionghai is located in Hainan province.",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}","produce the answer. We know that Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season. Looking at the context provided, we...","The 2017 NHL Expansion Draft conducted by the National Hockey League filled the roster of the Vegas Golden Knights, including selecting Marc-Andre Fleury for the...",False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}",identify the retired Canadian professional ice hockey player and current general manager of the Tampa Bay Lightning of the National Hockey League (NHL) whose retirement...,Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","identify the river near the Crichton Collegiate Church. We know that the church is situated in Midlothian, Scotland, and the River Esk flows through Midlothian...",The River Esk,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}","produce the answer. We know from the context that Ealhswith had a son named Æthelweard in the 10th century A.D. Now, looking at the information...",King Alfred the Great,✔️ [True]


26.0

### 6) Optimized Aggregator.

In [9]:
# Optimizer settings: up to 2 bootstrapped demos, up to 6 labeled demos, 10 candidate programs
kwargs = dict(max_bootstrapped_demos=2, max_labeled_demos=6, num_candidate_programs=10, num_threads=8)
# Create a new optimizer with these settings
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **kwargs)
# Compile the aggregator: bootstrap from trainset, score candidates on valset
optimized_aggregator = tp.compile(aggregator, trainset=trainset, valset=valset)

Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:00<00:00, 153.98it/s]
Average Metric: 27 / 50  (54.0): 100%|██████████| 50/50 [00:00<00:00, 82.75it/s]
  3%|▎         | 4/150 [00:00<00:03, 45.32it/s]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:00<00:00, 156.28it/s]
  1%|▏         | 2/150 [00:00<00:03, 39.99it/s]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:00<00:00, 162.26it/s]
  1%|          | 1/150 [00:00<00:02, 51.23it/s]
Average Metric: 26 / 50  (52.0): 100%|██████████| 50/50 [00:00<00:00, 158.64it/s]
  1%|          | 1/150 [00:00<00:00, 155.47it/s]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:00<00:00, 159.96it/s]
  1%|          | 1/150 [00:00<00:04, 31.56it/s]
Average Metric: 27 / 50  (54.0): 100%|██████████| 50/50 [00:00<00:00, 143.11it/s]
  1%|          | 1/150 [00:00<00:03, 43.19it/s]
Average Metric: 29 / 50  (58.0): 100%|██████████| 50/50 [00:00<00:00, 163.95it/s]
  1%|▏         | 2/150 [00:00<00:04, 31.94it/s]
Average 

In [10]:
# Copy the optimized aggregator so the original stays unchanged
optimized_aggregator2 = optimized_aggregator.deepcopy()
# Run the five agents in the copy at temperature 0.7
optimized_aggregator2.temperature = 0.7

# Evaluate the copy on the devset
evaluate(optimized_aggregator2)

Average Metric: 180 / 300  (60.0): 100%|██████████| 300/300 [00:07<00:00, 42.10it/s]


Unnamed: 0,question,example_answer,gold_titles,rationale,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","produce the answer. From the context, we know that Cangzhou is a prefecture-level city in eastern Hebei province, while Qionghai is one of the seven...",no,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}","produce the answer. From the context, we know that Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season. The draft that...",National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}",produce the answer. We know from the context that Steve Yzerman is a Canadian retired professional ice hockey player and the current general manager of...,Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","produce the answer. We know that Crichton Collegiate Church is located in Midlothian, Scotland, near the hamlet of Crichton. Since it is close to Edinburgh,...",River Esk,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}","produce the answer. From the context, we know that Ealhswith was the wife of King Alfred the Great. Therefore, in the 10th Century A.D., Ealhswith...",King Alfred the Great,✔️ [True]


60.0
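
The evaluation above copies the optimized aggregator before raising its temperature, so the compiled program itself stays at the default of 0.0. The sketch below shows the same copy-then-modify pattern on a toy class using the standard library's `copy.deepcopy`; `ToyAggregator` is a hypothetical stand-in, and DSPy's own `deepcopy()` method may do more than this.

```python
import copy

class ToyAggregator:
    """Minimal stand-in that holds only the sampling temperature."""
    def __init__(self, temperature=0.0):
        self.temperature = temperature

original = ToyAggregator()
variant = copy.deepcopy(original)   # independent copy of the object
variant.temperature = 0.7           # only the copy changes

print(original.temperature, variant.temperature)
```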

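### 7) Conclusion.

Normally, we like to release notebooks with pre-computed caches and to use `gpt3.inspect_history` to inspect the prompts and explore the optimized behavior. See the intro notebook (or any of the Colab notebooks in the README) for annotated examples of this kind!

To keep this release quick, that is left out here; if there's enough interest, Omar will expand this notebook into an annotated version.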

### 8) Post-script.

With a bit of syntactic sugar, the main code in this notebook can be as short as 10 lines, excluding whitespace:

```python
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

optimizer = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match)
optimized_react = optimizer.compile(agent, trainset=trainset, valset=valset)

class Aggregator(dspy.Module):
    def __init__(self):
        self.aggregate = dspy.ChainOfThought('context, question -> answer')

    def forward(self, question):
        # `best_programs` is the syntactic sugar; the notebook above uses
        # [x[-1] for x in optimized_react.candidate_programs[:5]] instead
        preds = [agent(question=question) for agent in optimized_react.best_programs[:5]]
        return self.aggregate(context=deduplicate(flatten([p.observations for p in preds])), question=question)

optimized_aggregator = optimizer.compile(Aggregator(), trainset=trainset, valset=valset)

# Use it!
optimized_aggregator(question="How many storeys are in the castle that David Gregory inherited?")
```