# Generating Synthetic Test Data for Your Prompt Template

Imagine you have a prompt like this:

"""Here is some content I want you to analyze:

<thing>
{{thing1}}
</thing>
<thing>
{{thing2}}
</thing>

These things are [description of the things]. Please read them carefully and [perform some task]."""

Here, we call thing1 and thing2 "variables": you want your prompt to perform well across many different possible values of thing1 and thing2.

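How might you test this prompt template? Perhaps you have some real-life values to plug in. But perhaps you don't, or perhaps privacy concerns prevent you from testing with the values you do have. No worries: Claude can make them up! This tutorial demonstrates how to use Claude and the Claude API to generate synthetic test data for your prompt. It includes functions for extracting variables from a template, constructing example blocks, generating test cases, and iteratively refining the results. This has two main benefits: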

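1. Prompt evaluation
You can use these test cases to see how Claude performs on realistic examples.

2. Improving your prompt with multishot examples
Giving Claude examples is perhaps the best way to improve its performance. This notebook can help you generate realistic inputs, which is half the work of getting ideal input/output pairs.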

In [None]:
%pip install anthropic IPython

In [1]:
import re
import anthropic

# Enter your API key here
api_key = ""
CLIENT = anthropic.Anthropic(api_key=api_key)
MODEL_NAME = "claude-sonnet-4-5"

Let's start by defining some helper functions that will be used throughout the notebook.

In [2]:
# First, we have the `extract_variables` function.
# It takes in a prompt template and extracts the "variables" it contains, written in double curly braces.
def extract_variables(prompt_template):
    """Extract variables from a prompt template."""
    pattern = r"{{([^}]+)}}"
    variables = re.findall(pattern, prompt_template)
    return set(variables)


# Next, we have `construct_variables_names`, which just joins them together connected by newlines.
def construct_variables_names(prompt_template):
    """Construct a string of variable names from a prompt template."""
    variables = extract_variables(prompt_template)
    return "\n".join(variables)


# The `construct_variables_block` function takes in a prompt template and constructs a "variables block" from its variables.
# The variables block might look like this, if the variables were 'animal' and 'topic':
"""
<animal>
[a full, complete, value for the variable "animal"]
</animal>
<topic>
[a full, complete, value for the variable "topic"]
</topic>
"""


def construct_variables_block(prompt_template):
    """Construct a variables block for the synthetic test data prompt."""
    variables = extract_variables(prompt_template)
    output = ""
    for v in variables:
        output += f"<{v}>\n"
        output += f'[a full, complete, value for the variable "{v}". (You do not need to repeat the variable name inside the tags.)]\n'
        output += f"</{v}>\n"
    return output.strip()


# `construct_example_block` takes a dictionary of {variable: value} and constructs an XML-formatted example.
# E.g. if the dict is
# {'animal': 'cat', 'topic': 'movement patterns'}, then the example would be
"""
<example>
<variables>
<animal>
cat
</animal>
<topic>
movement patterns
</topic>
</variables>
</example>
"""


def construct_example_block(variable_dict):
    """Construct an example block from a dictionary of variables."""
    output = "<example>\n<variables>\n"
    for k, v in variable_dict.items():
        output += f"<{k}>\n{v}\n</{k}>\n"
    output = output.strip()
    output += "\n</variables>\n</example>"
    return output
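
As a quick check of how the extraction behaves, the sketch below re-creates the regular expression in a self-contained form. The `ordered_variables` helper and the sample templates are hypothetical and not part of the notebook. Because `extract_variables` returns a set, its iteration order can differ between runs; this variant keeps first-appearance order instead. The second call illustrates a caveat: whitespace inside the braces becomes part of the variable name.

```python
import re

# Same pattern as `extract_variables` above: anything between {{ and }}.
PLACEHOLDER_PATTERN = r"{{([^}]+)}}"


def ordered_variables(prompt_template):
    """Hypothetical helper: unique placeholder names in order of first appearance."""
    names = re.findall(PLACEHOLDER_PATTERN, prompt_template)
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(names))


sample_template = "Compare {{animal}} with {{topic}}, then describe {{animal}} again."
print(ordered_variables(sample_template))

# Whitespace inside the braces is kept, so "{{ animal }}" is a different name.
print(ordered_variables("{{ animal }} vs {{animal}}"))
```

If your templates are written with spaces inside the braces, one option is to strip each name before using it, at the cost of also having to match the padded form when substituting values.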

## Prompt templates for generating data

The general idea of these prompt templates is to take in a user-submitted prompt template with variables and construct values to fill it.

There are actually two prompt templates below: one assumes the user has already supplied example variable values, and the other does not.

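What they have in common is that both first give Claude context on the situation, then instruct Claude to think carefully about the specification of each variable, and about the user's prompt template as a whole, before outputting a test case.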

In [3]:
# Formatting Prompt Templates for Synthetic Evaluations

# This function prepares the prompt template for generating synthetic test data.


def format_prompt_template_for_synth_evals(prompt_template, examples=None):
    """Format a prompt template for synthetic evaluations."""
    synth_test_data_prompt_template_with_example = """<Prompt Template>
{{PROMPT_TEMPLATE}}
</Prompt Template>

Your job is to construct a test case for the prompt template above. This template contains "variables", which are placeholders to be filled in later. In this case, the variables are:

<variables>
{{CONSTRUCT_VARIABLES_NAMES}}
</variables>

Here are the example test cases provided by the user.
<examples>
{{EXAMPLES}}
</examples>

First, in <planning> tags, do the following:

1. Summarize the prompt template. What is the goal of the user who created it?
2. For each variable in <variables>, carefully consider what a paradigmatic, realistic example of that variable would look like. You'll want to note who will be responsible "in prod" for supplying values. Written by a human "end user"? Downloaded from a website? Extracted from a database? Think about things like length, format, and tone in addition to semantic content. Use the examples provided by the user to guide this exercise. The goal is to acquire a sense of the statistical distribution the examples are being drawn from. The example you write should be drawn from that same distribution, but sufficiently different from the examples that it provides additional signal. A tricky balancing act, but I have faith in you.

Once you're done, output a test case for this prompt template with a full, complete, value for each variable. The output format should consist of a tagged block for each variable, with the value inside the block, like the below:

<variables>
{{CONSTRUCT_VARIABLES_BLOCK}}
</variables>"""

    synth_test_data_prompt_template_without_example = """<Prompt Template>
{{PROMPT_TEMPLATE}}
</Prompt Template>

Your job is to construct a test case for the prompt template above. This template contains "variables", which are placeholders to be filled in later. In this case, the variables are:

<variables>
{{CONSTRUCT_VARIABLES_NAMES}}
</variables>

First, in <planning> tags, do the following:

1. Summarize the prompt template. What is the goal of the user who created it?
2. For each variable in <variables>, carefully consider what a paradigmatic, realistic example of that variable would look like. You'll want to note who will be responsible "in prod" for supplying values. Written by a human "end user"? Downloaded from a website? Extracted from a database? Think about things like length, format, and tone in addition to semantic content.

Then, output a test case for this prompt template with a full, complete, value for each variable. The output format should consist of a tagged block for each variable, with the value inside the block, like the below:
<variables>
{{CONSTRUCT_VARIABLES_BLOCK}}
</variables>"""

    if examples:
        examples_block = "\n".join([construct_example_block(example) for example in examples])
        return (
            synth_test_data_prompt_template_with_example.replace(
                "{{PROMPT_TEMPLATE}}", prompt_template
            )
            .replace("{{CONSTRUCT_VARIABLES_NAMES}}", construct_variables_names(prompt_template))
            .replace("{{CONSTRUCT_VARIABLES_BLOCK}}", construct_variables_block(prompt_template))
            .replace("{{EXAMPLES}}", examples_block)
        )
    else:
        return (
            synth_test_data_prompt_template_without_example.replace(
                "{{PROMPT_TEMPLATE}}", prompt_template
            )
            .replace("{{CONSTRUCT_VARIABLES_NAMES}}", construct_variables_names(prompt_template))
            .replace("{{CONSTRUCT_VARIABLES_BLOCK}}", construct_variables_block(prompt_template))
        )

Next, another quick helper function that fills in the appropriate prompt template and calls Claude.

In [4]:
def get_test_data(prompt_template, examples, custom_planning=None):
    """Generate test data using the Claude API."""
    synth_eval_prompt_ready = format_prompt_template_for_synth_evals(prompt_template, examples)

    messages = [
        {
            "role": "user",
            "content": synth_eval_prompt_ready,
        }
    ]
    if custom_planning:
        messages.append(
            {
                "role": "assistant",
                "content": custom_planning,
            }
        )

    message = (
        CLIENT.messages.create(
            max_tokens=4000,
            messages=messages,
            model=MODEL_NAME,
            temperature=1,
        )
        .content[0]
        .text
    )

    return message
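
The `custom_planning` branch above relies on prefilling: when the final message in the list has the `assistant` role, Claude continues from that text rather than starting a fresh reply, so the returned text picks up after the supplied planning. The sketch below isolates just the message-building step so it can be inspected without an API call. `build_messages` is a hypothetical name, and stripping trailing whitespace is a precaution based on the assumption that the API rejects a final assistant turn ending in whitespace; check the current API documentation, including whether your chosen model supports prefill.

```python
def build_messages(user_prompt, custom_planning=None):
    """Hypothetical helper: build the message list, optionally with a prefilled assistant turn."""
    messages = [{"role": "user", "content": user_prompt}]
    if custom_planning:
        # Assumption: a final assistant turn should not end in whitespace,
        # so strip it before sending.
        messages.append({"role": "assistant", "content": custom_planning.rstrip()})
    return messages


messages = build_messages(
    "Write a test case.",
    "<planning>\nUse numbered FAQ entries.\n</planning>\n",
)
for message in messages:
    print(message["role"], repr(message["content"]))
```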

In [5]:
# We'll use this function to sample Claude's response to the filled-in template,
# once we have our example values/test case.


def call_claude_with_template(prompt_template, variables):
    """Call Claude with a filled prompt template."""
    filled_template = prompt_template
    for var, value in variables.items():
        filled_template = filled_template.replace(f"{{{{{var}}}}}", value)

    message = (
        CLIENT.messages.create(
            max_tokens=4000,
            messages=[
                {
                    "role": "user",
                    "content": filled_template,
                }
            ],
            model=MODEL_NAME,
            temperature=0.7,
        )
        .content[0]
        .text
    )

    return message
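
The substitution step in `call_claude_with_template` assumes every value is a string and does not report placeholders that were never filled. A minimal sketch of a stricter variant follows; `fill_template` and the sample template are hypothetical, and the leftover check would give a false positive if a value itself contained text in double curly braces.

```python
import re


def fill_template(prompt_template, variables):
    """Hypothetical helper: substitute {{name}} placeholders and fail loudly on gaps."""
    filled = prompt_template
    for name, value in variables.items():
        # str() lets non-string values (numbers, for instance) be substituted too.
        filled = filled.replace("{{" + name + "}}", str(value))
    # Anything still wrapped in double braces was never given a value.
    leftover = re.findall(r"{{([^}]+)}}", filled)
    if leftover:
        raise ValueError(f"Unfilled variables: {sorted(set(leftover))}")
    return filled


template = "Answer <question>{{QUESTION}}</question> using <documents>{{DOCUMENTS}}</documents>."
print(
    fill_template(
        template,
        {"QUESTION": "Can I return it?", "DOCUMENTS": "Returns within 30 days."},
    )
)
```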

Now we can start putting the pieces together. First, enter your prompt template here.

In [6]:
# Replace this with your prompt template!
# Use double curly braces to indicate variables
# Here's an example:
prompt_template = """You are a customer support bot for Acme Corporation. 
Here is an FAQ with Acme's relevant policies:

<documents>
{{DOCUMENTS}}
</documents>

Please respond to this customer support question using details from the policies:

<question>
{{QUESTION}}
</question>"""

variables = extract_variables(prompt_template)
print("\nIdentified variables:")
for var in variables:
    print(f"- {var}")


Identified variables:
- DOCUMENTS
- QUESTION


Next, if you have any "golden examples" of inputs and ideal outputs, you can enter them here. The code is currently commented out.

In [7]:
planning_text = None
USER_EXAMPLES = []

# if input("\nDo you want to provide an example value for your variables? (y/n): ").lower() == 'y':
#     example = {}
#     for var in variables:
#         example[var] = input(f"Enter an example value for {var}: ")
#     USER_EXAMPLES.append(example)
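
If you would rather supply examples non-interactively, each entry is a dictionary mapping variable names to values, which is the shape `construct_example_block` expects. The sketch below shows that shape and adds a small consistency check. `validate_examples` and the sample values are hypothetical illustrations, not part of the notebook.

```python
def validate_examples(examples, variable_names):
    """Hypothetical helper: check each example supplies exactly the template's variables."""
    expected = set(variable_names)
    for index, example in enumerate(examples):
        missing = expected - set(example)
        extra = set(example) - expected
        if missing or extra:
            raise ValueError(
                f"Example {index}: missing={sorted(missing)}, unexpected={sorted(extra)}"
            )
    return examples


# Illustrative values only; replace them with your own golden inputs.
example_inputs = [
    {
        "DOCUMENTS": "1. Returns are accepted within 30 days of purchase.",
        "QUESTION": "Can I return a jacket I bought two weeks ago?",
    }
]
validated = validate_examples(example_inputs, ["DOCUMENTS", "QUESTION"])
print(len(validated), "example(s) ready")
```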

Next, we can fill in the test-case generation prompt template with this information and get a test case!

In [8]:
result = get_test_data(prompt_template, USER_EXAMPLES, planning_text)

Now, let's look at the test case and the planning Claude used to generate it.

In [10]:
planning_match = re.search(r"<planning>(.*?)</planning>", result, re.DOTALL)
if planning_match and not planning_text:
    planning_text = "<planning>\n" + planning_match.group(1).strip() + "\n</planning>"

extracted_variables = {}
for var in variables:
    var_match = re.search(f"<{var}>(.*?)</{var}>", result[result.index("<variables>") :], re.DOTALL)
    if var_match:
        extracted_variables[var] = var_match.group(1).strip()

USER_EXAMPLES.append(extracted_variables)

print("~~~~~~~~~~~\nGenerated test case:\n~~~~~~~~~~~")
for var, value in extracted_variables.items():
    print(f"{var}:\n{value}\n")

print("~~~~~~~~~~~\nPlanning:\n~~~~~~~~~~~")
print(planning_text)

~~~~~~~~~~~
Generated test case:
~~~~~~~~~~~
DOCUMENTS:
Return Policy
- Items may be returned within 30 days of purchase with original receipt
- Items must be unused and in original packaging
- Shipping costs are non-refundable
- Gift cards are non-returnable

Shipping Information
- Standard shipping (5-7 business days): Free on orders over $50
- Express shipping (2-3 business days): $12.99
- Overnight shipping (next business day): $24.99
- We ship to continental US only
- Alaska and Hawaii orders incur additional $15 fee

Payment Methods
- We accept Visa, Mastercard, American Express, and PayPal
- Payment is processed at time of order
- Gift cards cannot be used for partial payment

QUESTION:
Hi, I ordered a sweater last week but it doesn't fit right. Can I return it? And will I get refunded for the shipping I paid? Thanks!

~~~~~~~~~~~
Planning:
~~~~~~~~~~~
<planning>
1. Prompt Template Summary:
This template creates a customer service chatbot for Acme Corporation that answers custom
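
The parsing cell above is repeated several times in this notebook, and `result.index("<variables>")` raises a `ValueError` if the tag is missing from the response. One way to package the same logic is sketched below. `parse_test_case` and the sample response are hypothetical; the function falls back to searching the whole response when no `<variables>` tag is found, and escapes variable names before building the regular expression.

```python
import re


def parse_test_case(result, variable_names):
    """Hypothetical helper: pull each variable's value out of Claude's response."""
    # Search from the first <variables> tag onward, as the cell above does,
    # but fall back to the whole response instead of raising if it is absent.
    start = result.find("<variables>")
    body = result[start:] if start != -1 else result
    values = {}
    for name in variable_names:
        tag = re.escape(name)
        match = re.search(rf"<{tag}>(.*?)</{tag}>", body, re.DOTALL)
        if match:
            values[name] = match.group(1).strip()
    return values


sample_response = """<planning>
The question should be short and informal.
</planning>

<variables>
<DOCUMENTS>
1. Returns are accepted within 30 days.
</DOCUMENTS>
<QUESTION>
Can I return a sweater?
</QUESTION>
</variables>"""

parsed = parse_test_case(sample_response, ["DOCUMENTS", "QUESTION"])
print(parsed)
```

Variables that cannot be found are simply omitted from the returned dictionary, so a caller can compare its keys against the expected names to detect an incomplete response.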

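From here, there are a few ways we could proceed. We could generate more test cases, or we could edit Claude's planning logic. Let's edit the planning logic a bit. Maybe we know that ACME's documents use numbered lines. Some other realistic changes might be: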

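- Having Claude tell itself to make the documents longer and more detailed.
- Having Claude tell itself to make the customer support queries more formal or less formal.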

In [11]:
planning_text = planning_text.replace(
    "each with a question and answer format",
    "each with a question and answer format and associated number.",
)
# You might have slightly different planning text and therefore need to rewrite the replace.
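
Because Claude's planning text varies from run to run, `str.replace` may find nothing to replace, in which case it returns the text unchanged without any warning. A minimal sketch of a stricter edit is shown below; `replace_in_planning` and the sample planning text are hypothetical.

```python
def replace_in_planning(planning_text, old, new):
    """Hypothetical helper: like str.replace, but raise if `old` is absent."""
    if old not in planning_text:
        raise ValueError(f"Phrase not found in planning text: {old!r}")
    return planning_text.replace(old, new)


sample_planning = (
    "<planning>\n"
    "The FAQ has several entries, each with a question and answer format\n"
    "</planning>"
)
edited = replace_in_planning(
    sample_planning,
    "each with a question and answer format",
    "each with a question and answer format and associated number.",
)
print(edited)
```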

Let's reset the examples, but use this planning text as a prefill. (This saves some sampling time.)

In [12]:
USER_EXAMPLES = []
result = get_test_data(prompt_template, USER_EXAMPLES, planning_text)

Now let's look at the new result.

In [13]:
# Copied and pasted from a cell above.
planning_match = re.search(r"<planning>(.*?)</planning>", result, re.DOTALL)
if planning_match and not planning_text:
    planning_text = "<planning>\n" + planning_match.group(1).strip() + "\n</planning>"

extracted_variables = {}
for var in variables:
    var_match = re.search(f"<{var}>(.*?)</{var}>", result[result.index("<variables>") :], re.DOTALL)
    if var_match:
        extracted_variables[var] = var_match.group(1).strip()

USER_EXAMPLES.append(extracted_variables)

print("~~~~~~~~~~~\nGenerated test case:\n~~~~~~~~~~~")
for var, value in extracted_variables.items():
    print(f"{var}:\n{value}\n")

print("~~~~~~~~~~~\nPlanning:\n~~~~~~~~~~~")
print(planning_text)

~~~~~~~~~~~
Generated test case:
~~~~~~~~~~~
DOCUMENTS:
Return Policy
- Items may be returned within 30 days of purchase with original receipt
- Items must be unused and in original packaging
- Shipping costs are non-refundable
- Store credit will be issued for items returned without receipt

Shipping Information
- Standard shipping (5-7 business days): $5.99
- Express shipping (2-3 business days): $12.99
- Free standard shipping on orders over $50
- We currently ship only within the continental United States
- Alaska and Hawaii orders subject to additional fees

Payment Methods
- We accept Visa, Mastercard, American Express, and PayPal
- Gift cards cannot be used for online purchases
- Payment is processed at time of order
- All prices are in USD

QUESTION:
Hi, I ordered a sweater last week but it doesn't fit right. Can I return it? I still have the tags on it but I threw away the receipt. Thanks!

~~~~~~~~~~~
Planning:
~~~~~~~~~~~
<planning>
1. Prompt Template Summary:
This template 

Great, it did produce numbered Q&As!

Let's generate one more. This one will use the example we already have, so hopefully it will be interestingly different.

In [14]:
result = get_test_data(prompt_template, USER_EXAMPLES, planning_text)

In [15]:
# Copied and pasted from a cell above.
planning_match = re.search(r"<planning>(.*?)</planning>", result, re.DOTALL)
if planning_match and not planning_text:
    planning_text = "<planning>\n" + planning_match.group(1).strip() + "\n</planning>"

extracted_variables = {}
for var in variables:
    var_match = re.search(f"<{var}>(.*?)</{var}>", result[result.index("<variables>") :], re.DOTALL)
    if var_match:
        extracted_variables[var] = var_match.group(1).strip()

USER_EXAMPLES.append(extracted_variables)

print("~~~~~~~~~~~\nGenerated test case:\n~~~~~~~~~~~")
for var, value in extracted_variables.items():
    print(f"{var}:\n{value}\n")

print("~~~~~~~~~~~\nPlanning:\n~~~~~~~~~~~")
print(planning_text)

~~~~~~~~~~~
Generated test case:
~~~~~~~~~~~
DOCUMENTS:
Product Warranty
- All electronics come with a 1-year limited manufacturer warranty
- Warranty covers defects in materials and workmanship
- Warranty does not cover accidental damage or misuse
- Extended warranty available for purchase within 30 days

Price Match Policy
- We match prices from authorized retailers
- Item must be identical model/color/specification
- Must be in stock at competitor's store
- Online retailers excluded from price matching
- Price match requests must be made at time of purchase

Order Cancellation
- Orders can be cancelled within 2 hours of placement
- Once order is shipped, cancellation not possible
- Cancelled orders refunded to original payment method
- Processing time for refunds: 3-5 business days
- Contact customer service for cancellation requests

QUESTION:
Hello, I bought a laptop from your store 3 weeks ago and it keeps shutting down randomly. It's still under warranty, right? What do I need t

Still about the ACME company, but with a different question and a different knowledge base.

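From here, the world is your oyster: you can generate more test cases by running the code in a loop, edit the planning further, evaluate Claude on these test cases, and put the test cases you've made, along with golden answers, into your prompt as multishot examples.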
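
Running the generation step in a loop could look roughly like the sketch below. To keep it runnable without an API key, the generator is passed in as a function; `collect_test_cases` and `fake_generate_case` are hypothetical names. In the notebook, the generator would wrap `get_test_data` together with the parsing step, and the list of earlier cases would play the role of `USER_EXAMPLES`.

```python
def collect_test_cases(generate_case, n_cases):
    """Hypothetical helper: call `generate_case` repeatedly, feeding back earlier cases."""
    cases = []
    for _ in range(n_cases):
        # Pass a copy of the cases so far so the generator can steer away from them.
        new_case = generate_case(list(cases))
        if new_case:  # skip empty results, e.g. when parsing found nothing
            cases.append(new_case)
    return cases


# Stand-in generator so the sketch runs offline; it does not call Claude.
def fake_generate_case(previous_cases):
    return {"QUESTION": f"Sample question #{len(previous_cases) + 1}"}


cases = collect_test_cases(fake_generate_case, 3)
print(cases)
```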

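To get golden answers, you can write them from scratch yourself, or have Claude write an answer and then edit it as needed. With the advent of prompt caching, there has never been a better time to add lots of examples to your prompt to improve performance.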
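
As one possible way to assemble multishot examples, the sketch below wraps input/golden-answer pairs in XML tags and places them in a system block marked for prompt caching. The helper names, the tag names, and the sample pairs are all hypothetical. The `cache_control` field reflects my understanding of the Messages API request shape and should be checked against the current prompt caching documentation. Nothing here calls the API; the resulting list would be passed as the `system` argument of `CLIENT.messages.create`.

```python
def format_multishot_examples(pairs):
    """Hypothetical helper: wrap (input, ideal_output) pairs in XML example tags."""
    blocks = []
    for example_input, ideal_output in pairs:
        blocks.append(
            "<example>\n"
            f"<input>\n{example_input}\n</input>\n"
            f"<ideal_output>\n{ideal_output}\n</ideal_output>\n"
            "</example>"
        )
    return "<examples>\n" + "\n".join(blocks) + "\n</examples>"


def build_cached_system(examples_block):
    """Assumed request shape: a system text block marked for prompt caching."""
    return [
        {
            "type": "text",
            "text": "Here are examples of ideal responses:\n" + examples_block,
            "cache_control": {"type": "ephemeral"},
        }
    ]


# Illustrative pairs only; use your generated test cases and golden answers.
pairs = [
    ("Can I return a sweater?", "Yes, within 30 days with the original receipt."),
    ("Do you ship to Canada?", "Not at this time; we ship within the continental US."),
]
examples_block = format_multishot_examples(pairs)
system = build_cached_system(examples_block)
print(system[0]["text"])
```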