## Optimizing a classification few-shot prompt with DSPy 🏆


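This notebook uses DSPy (https://dspy-docs.vercel.app/) to optimize a prompt.

* Enter your task description and the classification options
* It then produces a strong few-shot prompt
* Intended for single-choice classification scenarios that have a ground-truth answer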

Author and talk slides: ihower https://ihower.tw/blog/archives/12444

### Workflow

1. Use o1-preview to synthesize training questions
2. Use gpt-4o to run the DSPy optimization
3. Produce a few-shot prompt suited to gpt-4o-mini


## 0. Set the OpenAI API key

Click the key icon in the left sidebar of Google Colab and add a secret named openai_api_key whose value is your API key.

In [None]:
from google.colab import userdata
import json
import os

os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')

## 1. Set the parameters

In [None]:
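synthetic_model = "o1-preview" # Model that synthesizes the training questions; if you don't have o1 access, use gpt-4o instead
generation_model = "gpt-4o" # Model that generates the prompt
prediction_model = "gpt-4o-mini" # Model that runs the prompt

task_description = "情感分析，從一段留言中分析出文字的語氣" # Task description (here: sentiment analysis, identifying the tone of a comment); change it to your task
categories = ['矛盾', '感慨', '忐忑', '釋然', '敬畏', '憐憫', '懷舊', '失落', '心酸', '困惑'] # Classification labels; change these to your own

questions_num = len(categories) * 10  # How many training examples to synthesize; this drives the API cost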

## 2. Synthesize the dataset needed for optimization

In [None]:
synthetic_prompt = f"""You are tasked with generating question-answer pairs for a classification task.
The questions should be based on a given task description, and the answers should be one of the provided categories. Here's what you need to do:

First, review the task description:
<task_description>
{task_description}
</task_description>

Next, familiarize yourself with the categories for classification:
<categories>
{categories}
</categories>

Your goal is to create {questions_num} question-answer pairs that are relevant to the task description and can be classified into one of the given categories.

Follow these guidelines when creating the QA pairs:

1. Start with simple, straightforward questions and gradually increase the complexity.
2. Ensure that the more difficult questions require multi-step reasoning or in-depth knowledge.
3. Include a variety of question types (e.g., factual, analytical, hypothetical) relevant to the task description.
4. Ensure that all questions are directly related to the provided task description.
5. Make sure each question can be clearly classified into one of the given categories.

Generate the QA pairs in the following JSON array format:

[
  {{ "question": "Your question here", "answer": "Corresponding category" }},
  {{ "question": "Another question", "answer": "Another category" }},
  ...
]


The returned content must be a valid JSON array containing all generated QA pairs.
Do not add any extra text or explanations outside the JSON array. The returned content should be directly parseable as a JSON array.
All content must be in Traditional Chinese as used in Taiwan.
"""

In [None]:
!pip install litellm

from litellm import completion



In [None]:
messages = [
    { "content": synthetic_prompt, "role": "user"}
]

if not synthetic_model.startswith('o1'):
  response = completion(model=synthetic_model, messages=messages, response_format={ "type": "json_object" })
else:
  # o1 does not support JSON mode yet
  response = completion(model=synthetic_model, messages=messages)

response = response.choices[0].message.content
dataset = json.loads(response)
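
Because the o1 branch runs without JSON mode, the reply may occasionally arrive wrapped in a Markdown code fence or with stray text around the array, in which case `json.loads` would fail. The following is a minimal sketch of a more tolerant parser. `extract_json_array` is a hypothetical helper name (not part of litellm), and it assumes the reply contains exactly one top-level JSON array.

```python
import json


def extract_json_array(text: str) -> list:
    """Parse the JSON array found in a model reply.

    Hypothetical helper: assumes the reply contains one JSON array,
    possibly surrounded by a Markdown fence or extra prose.
    """
    start = text.find("[")   # first opening bracket
    end = text.rfind("]")    # last closing bracket
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON array found in the model reply")
    return json.loads(text[start:end + 1])


# Simulated reply wrapped in a Markdown fence (built from single backticks).
fence = "`" * 3
reply = fence + 'json\n[{"question": "q1", "answer": "a1"}]\n' + fence
parsed = extract_json_array(reply)
print(parsed)
```

In the cell above, `dataset = extract_json_array(response)` could stand in for the plain `json.loads(response)` call.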

In [None]:
len(dataset)

50
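
The model returned fewer items than the `questions_num` requested in the prompt, so it can be worth checking how the labels are distributed before optimizing. A minimal sketch, assuming each record has the `question` and `answer` keys requested in the prompt. `summarize_labels` is a hypothetical helper, and the sample records below are shortened stand-ins for the real `dataset`.

```python
from collections import Counter


def summarize_labels(dataset, categories):
    """Return (per-category counts, records whose answer is not a known category)."""
    counts = Counter(item["answer"] for item in dataset)
    per_category = {label: counts.get(label, 0) for label in categories}
    unknown = [item for item in dataset if item["answer"] not in categories]
    return per_category, unknown


# Shortened stand-in data; in the notebook, pass the real `dataset` and `categories`.
sample_categories = ["矛盾", "感慨", "忐忑"]
sample_dataset = [
    {"question": "明天就要考試了，我好緊張啊！", "answer": "忐忑"},
    {"question": "看著舊照片，不禁感慨時間過得真快。", "answer": "感慨"},
    {"question": "聽到這個消息，我的心裡酸酸的。", "answer": "心酸"},
]
per_category, unknown = summarize_labels(sample_dataset, sample_categories)
print(per_category)
print(len(unknown))
```

Categories with a zero count, or any records in `unknown`, suggest regenerating or hand-editing the data before running the optimizer.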

In [None]:
dataset

[{'question': '我不知道該選擇哪個，好像每個選項都有吸引我的地方。', 'answer': '矛盾'},
 {'question': '回想起小時候的日子，真的讓人感慨萬千。', 'answer': '感慨'},
 {'question': '明天就要考試了，我好緊張啊！', 'answer': '忐忑'},
 {'question': '事情終於解決了，我感到很釋然。', 'answer': '釋然'},
 {'question': '看到壯麗的山河，我對大自然充滿了敬畏。', 'answer': '敬畏'},
 {'question': '看到流浪的小狗，我感到很憐憫。', 'answer': '憐憫'},
 {'question': '每當聽到這首歌，我就想起過去的美好時光。', 'answer': '懷舊'},
 {'question': '我的申請被拒絕了，感覺很失落。', 'answer': '失落'},
 {'question': '聽到這個消息，我的心裡酸酸的。', 'answer': '心酸'},
 {'question': '為什麼事情會變成這樣，我感到很困惑。', 'answer': '困惑'},
 {'question': '雖然努力了這麼久，但結果卻不盡如人意，我該怎麼辦？', 'answer': '矛盾'},
 {'question': '看著舊照片，不禁感慨時間過得真快。', 'answer': '感慨'},
 {'question': '面對未知的前景，我心裡七上八下的。', 'answer': '忐忑'},
 {'question': '經過深思熟慮後，我終於放下了心中的執念。', 'answer': '釋然'},
 {'question': '站在高山之巔，我感受到大自然的偉大力量。', 'answer': '敬畏'},
 {'question': '看到街頭流浪的人們，我心生憐憫。', 'answer': '憐憫'},
 {'question': '那段時光已逝，但回憶卻永遠留在心中。', 'answer': '懷舊'},
 {'question': '原本計劃好的行程被取消了，真是失落。', 'answer': '失落'},
 {'question': '聽到他的故事，我感到深深的心酸。', 'answer': 

## 3. Optimize the few-shot prompt with DSPy

In [None]:
!pip install dspy-ai

Collecting dspy-ai
  Downloading dspy_ai-2.5.0-py3-none-any.whl.metadata (40 kB)
Collecting backoff (from dspy-ai)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Collecting datasets (from dspy-ai)
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting optuna (from dspy-ai)
  Downloading optuna-4.0.0-py3-none-any.whl.metadata (16 kB)
Collecting structlog (from dspy-ai)
  Downloading structlog-24.4.0-py3-none-any.whl.metadata (7.3 kB)
Collecting ujson (from dspy-ai)
  Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.3 kB)
Collecting magicattr~=0.1.6 (from dspy-ai)
  Downloading magicattr-0.1.6-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting diskcache (from dspy-ai)
  Dow

In [None]:
import dspy
from dspy.teleprompt import MIPROv2
from dspy.evaluate import answer_exact_match

prompt_llm = dspy.OpenAI(
    model=generation_model,
    api_key=os.environ['OPENAI_API_KEY']
)

task_llm = dspy.OpenAI(
    model=prediction_model,
    api_key=os.environ['OPENAI_API_KEY']
)

dspy.settings.configure(lm=task_llm)

If you run this with gpt-4o, the dataset format may differ from what I got with o1-preview, so you will need to adjust the following cell yourself:

In [None]:
few_shot_examples = [
    dspy.Example({'question': q["question"], 'answer': q["answer"]}) for q in dataset
]

trainset = [x.with_inputs('question') for x in few_shot_examples]
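
The adjustment mentioned above could look like the following, applied to `dataset` before the examples are built. With JSON mode, gpt-4o typically returns a JSON object rather than a bare array, so the list may come back nested under some key. This is a minimal sketch that assumes the wrapper is a dict with exactly one list-valued entry. `normalize_dataset` is a hypothetical helper, and the key name `qa_pairs` in the example is made up.

```python
def normalize_dataset(raw):
    """Return a list of {"question", "answer"} dicts from a parsed model reply.

    Assumption: `raw` is either the list itself or a dict wrapping exactly one list.
    """
    if isinstance(raw, dict):
        lists = [value for value in raw.values() if isinstance(value, list)]
        if len(lists) != 1:
            raise ValueError("expected exactly one list inside the JSON object")
        raw = lists[0]
    records = []
    for item in raw:
        # Keep only well-formed records and strip stray whitespace.
        if isinstance(item, dict) and "question" in item and "answer" in item:
            records.append({
                "question": str(item["question"]).strip(),
                "answer": str(item["answer"]).strip(),
            })
    return records


wrapped = {"qa_pairs": [{"question": " q1 ", "answer": "a1"}, {"note": "skip me"}]}
print(normalize_dataset(wrapped))
```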

DSPy code: define the structure of the prompt

In [None]:
class QuestionLabel(dspy.Signature):
    question = dspy.InputField(desc="The input question to be categorized")
    answer = dspy.OutputField(desc="The assigned category or label for the question")

class QuestionClassification(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classifier = dspy.Predict(QuestionLabel)

    def forward(self, question: str):
        return self.classifier(question=question)

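Documentation for the MIPROv2 optimizer: https://dspy-docs.vercel.app/docs/deep-dive/optimizers/miprov2

Evaluation uses the built-in answer_exact_match metric: since every question has a ground-truth answer, it only needs to check whether the prediction matches it exactly.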
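
To make the metric's contract concrete, here is a simplified stand-in for an exact-match check. It is not DSPy's implementation (the built-in version may normalize text before comparing). `strict_label_match` is a hypothetical name, and the `(example, pred, trace=None)` signature follows the convention DSPy metrics use. `SimpleNamespace` objects stand in for DSPy examples and predictions so the snippet runs without an API key.

```python
from types import SimpleNamespace


def strict_label_match(example, pred, trace=None):
    """Return True when the predicted label equals the gold label after trimming whitespace."""
    return example.answer.strip() == pred.answer.strip()


# Stand-ins for a DSPy example (gold label) and two predictions.
gold = SimpleNamespace(question="事情終於解決了，我感到很釋然。", answer="釋然")
match = strict_label_match(gold, SimpleNamespace(answer=" 釋然 "))
mismatch = strict_label_match(gold, SimpleNamespace(answer="失落"))
print(match, mismatch)
```

A function with this shape could be passed as `metric=` to the optimizer in place of the built-in one, for instance to also reject labels that are not in `categories`.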

In [None]:
teleprompter = MIPROv2(prompt_model=prompt_llm, task_model=task_llm, metric=answer_exact_match, num_candidates=10, init_temperature=1, verbose=True)

# Start the optimization iterations
compiled_program = teleprompter.compile(QuestionClassification(), trainset=trainset, requires_permission_to_run=False)

Beginning MIPROv2 optimization process...

==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
These will be used for as few-shot examples candidates for our program and for creating instructions.

Bootstrapping N=10 sets of demonstrations...
Bootstrapping set 1/10
Bootstrapping set 2/10
Bootstrapping set 3/10


 70%|███████   | 7/10 [00:03<00:01,  2.18it/s]


Bootstrapped 4 full traces after 8 examples in round 0.
Bootstrapping set 4/10


 70%|███████   | 7/10 [00:03<00:01,  2.02it/s]


Bootstrapped 4 full traces after 8 examples in round 0.
Bootstrapping set 5/10


 20%|██        | 2/10 [00:01<00:04,  1.88it/s]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 6/10


 10%|█         | 1/10 [00:00<00:04,  1.86it/s]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 7/10


 30%|███       | 3/10 [00:01<00:03,  2.18it/s]


Bootstrapped 2 full traces after 4 examples in round 0.
Bootstrapping set 8/10


 20%|██        | 2/10 [00:00<00:03,  2.06it/s]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 9/10


 50%|█████     | 5/10 [00:02<00:02,  2.02it/s]


Bootstrapped 3 full traces after 6 examples in round 0.
Bootstrapping set 10/10


 10%|█         | 1/10 [00:00<00:05,  1.72it/s]


Bootstrapped 1 full traces after 2 examples in round 0.

==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
In this step, by default we will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
SOURCE CODE: QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'The assigned category or label for the question', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
)



class QuestionClassification(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classifier = dspy.Predict(QuestionLabel)

    def forward(self, question: str):
        return self.class

Average Metric: 0 / 40  (0.0): 100%|██████████| 40/40 [00:03<00:00, 10.10it/s]


Default program score: 0.0

==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
In this step, we will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination. Bayesian Optimization will be used for this search process.

== Minibatch Trial 1 / 30 ==
Evaluating the following candidate program...

Predictor 0
i: The dataset consists of Mandarin Chinese sentences, each describing a scenario or context that evokes a specific emotional response. The aim is to associate each scenario with a corresponding emotional state. Each scenario is concise, declarative, and straightforward, while the responses are single-word labels representing the emotion.

PROGRAM CODE:
```python
QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_

Average Metric: 12 / 25  (48.0): 100%|██████████| 25/25 [00:04<00:00,  5.46it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 636.85it/s]





The dataset consists of Mandarin Chinese sentences, each describing a scenario or context that evokes a specific emotional response. The aim is to associate each scenario with a corresponding emotional state. Each scenario is concise, declarative, and straightforward, while the responses are single-word labels representing the emotion.

PROGRAM CODE:
```python
QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'The assigned category or label for the question',

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:02<00:00, 10.90it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.15it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answe

Average Metric: 19 / 25  (76.0): 100%|██████████| 25/25 [00:02<00:00,  9.88it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 174.60it/s]





Given the input `question` which is a sentence in Mandarin Chinese describing a situation, produce the corresponding `answer` which is a single word in Mandarin Chinese that represents the emotional response elicited by the situation. For example, if the input sentence describes a scenario that makes someone feel relieved, the output should be the word for "relief" in Mandarin. Make sure to accurately interpret the emotional context of the input sentence and provide the most appropriate emotional label.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 明天就要考試了，我好緊張啊！
Answer: 忐忑

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 我不知道該選擇哪個，好像每個選項都有吸引

Average Metric: 17 / 25  (68.0): 100%|██████████| 25/25 [00:02<00:00,  8.38it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.47it/s]





The dataset consists of Mandarin Chinese sentences that describe situations or contexts leading to specific emotional responses. Each input sentence is simple and declarative, and the goal is to map these sentences to corresponding emotional states, reported as single-word responses.

PROGRAM CODE:
```python
QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'The assigned category or label for the question', '__dspy_field_type': 'output', '

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 看

Average Metric: 18 / 25  (72.0): 100%|██████████| 25/25 [00:02<00:00, 11.51it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 513.50it/s]





1. **Topic and Content**: - The dataset consists of **emotional responses**. Each input sentence provides a scenario or context leading to a specific emotion. - It is designed to map situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Input sentences are in **Mandarin Chinese**. - Sentences are simple and declarative, offering a natural context for emotion elicitation. 3. **Conciseness**: - Responses are **single words**, representing the emotion elicited by the scenario. - Sentences are concise and direct.

PROGRAM CODE:
```python
QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 明天就要考試了，我好

Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:03<00:00,  8.28it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.39it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 事情終於解決了，我

Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:02<00:00, 10.55it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 685.46it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point, often

PROGRAM

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 聽到這個消息，我的心裡酸

Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:02<00:00, 12.35it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 362.86it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 聽到這個消息，我的心裡酸酸的。
Answer: 心酸



Average Metric: 18 / 25  (72.0): 100%|██████████| 25/25 [00:03<00:00,  7.36it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 653.01it/s]





Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 明天就要考試了，我好緊張啊！
Answer: 忐忑

---

Question: 我不知道該選擇哪個，好像每個選項都有吸引我的地方。
Answer: 矛盾

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 聽到這個消息，我的心裡酸酸的。
Answer: 心酸

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 雖然努力了這麼久，但結果卻不盡如人意，我該怎麼辦？
Answer: 失望






Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 事情終於解決了，我感到很釋然。
Answe

Average Metric: 19 / 25  (76.0): 100%|██████████| 25/25 [00:02<00:00, 12.10it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 418.38it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point, often

PROGRAM

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 明天就要考試了，我好緊張啊！
Answer: 忐忑

---

Question: 我不知道該選擇哪個，好像每個選項都有

Average Metric: 32 / 40  (80.0): 100%|██████████| 40/40 [00:01<00:00, 28.70it/s]


Best full eval score so far! Score: 80.0


== Minibatch Trial 11 / 30 ==
Evaluating the following candidate program...

Predictor 0
i: 1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point
p: Answer:




Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:00<00:00, 660.97it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 542.74it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answe

Average Metric: 19 / 25  (76.0): 100%|██████████| 25/25 [00:00<00:00, 32.64it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 358.06it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 事情終於解決了，我

Average Metric: 15 / 25  (60.0): 100%|██████████| 25/25 [00:02<00:00, 11.64it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 665.87it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 雖然努力了這麼久，但結果卻不盡如人意，我該怎麼辦？
Answer: 失落






1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains 

Average Metric: 16 / 25  (64.0): 100%|██████████| 25/25 [00:02<00:00, 10.97it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 704.57it/s]





1. **Topic and Content**: All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: Each input example is a **sentence** written in **Mandarin Chinese**. The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. The sentences themselves are concise and to the point.

PROGRAM CODE:
```python
QuestionLabel

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Questi

Average Metric: 19 / 25  (76.0): 100%|██████████| 25/25 [00:03<00:00,  7.85it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 759.84it/s]





1. **Topic and Content**: All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: Each input example is a **sentence** written in **Mandarin Chinese**. The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. The sentences themselves are concise and to the point.

PROGRAM CODE:
```python
QuestionLabel

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Questi

Average Metric: 19 / 25  (76.0): 100%|██████████| 25/25 [00:02<00:00, 11.09it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 399.00it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 聽到這個消息，我的心裡酸酸的

Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:02<00:00, 10.77it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.16it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 回想起小時候的日子，

Average Metric: 17 / 25  (68.0): 100%|██████████| 25/25 [00:02<00:00, 10.98it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 705.99it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 事情終於解決了，我感到很釋然。
Answer: 

Average Metric: 18 / 25  (72.0): 100%|██████████| 25/25 [00:02<00:00,  9.31it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 606.99it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 聽到這個消息，我的心裡酸酸的。
Answer: 

Average Metric: 22 / 25  (88.0): 100%|██████████| 25/25 [00:02<00:00, 11.59it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.17it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到

Average Metric: 33 / 40  (82.5): 100%|██████████| 40/40 [00:01<00:00, 31.63it/s]


Best full eval score so far! Score: 82.5


== Minibatch Trial 21 / 30 ==
Evaluating the following candidate program...

Predictor 0
i: 1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE
p: Answer:




Average Metric: 21 / 25  (84.0): 100%|██████████| 25/25 [00:00<00:00, 611.69it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 585.88it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到

Average Metric: 20 / 25  (80.0): 100%|██████████| 25/25 [00:00<00:00, 818.57it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 229.15it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到

Average Metric: 10 / 25  (40.0): 100%|██████████| 25/25 [00:07<00:00,  3.55it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.10it/s]





The dataset consists of Mandarin Chinese sentences that describe situations or contexts leading to specific emotional responses. Each input sentence is simple and declarative, and the goal is to map these sentences to corresponding emotional states, reported as single-word responses.

PROGRAM CODE:
```python
QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'The assigned category or label for the question', '__dspy_field_type': 'output', '

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什

Average Metric: 22 / 25  (88.0): 100%|██████████| 25/25 [00:00<00:00, 564.91it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 729.19it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到

Average Metric: 10 / 25  (40.0): 100%|██████████| 25/25 [00:06<00:00,  3.68it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 289.06it/s]





The dataset consists of Mandarin Chinese sentences, each describing a scenario or context that evokes a specific emotional response. The aim is to associate each scenario with a corresponding emotional state. Each scenario is concise, declarative, and straightforward, while the responses are single-word labels representing the emotion.

PROGRAM CODE:
```python
QuestionLabel(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'The assigned category or label for the question',

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失

Average Metric: 21 / 25  (84.0): 100%|██████████| 25/25 [00:02<00:00,  9.80it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.10it/s]





Given the input `question` which is a sentence in Mandarin Chinese describing a situation, produce the corresponding `answer` which is a single word in Mandarin Chinese that represents the emotional response elicited by the situation. For example, if the input sentence describes a scenario that makes someone feel relieved, the output should be the word for "relief" in Mandarin. Make sure to accurately interpret the emotional context of the input sentence and provide the most appropriate emotional label.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 我不知道該選擇哪個，好像每個選項都有吸引我的地方。
Answer: 矛盾

---

Question: 明天就要考試了，我好緊張啊！
Answer: 忐忑

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 聽到這個消息，我的心裡

Average Metric: 22 / 25  (88.0): 100%|██████████| 25/25 [00:02<00:00, 11.55it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 542.74it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋

Average Metric: 19 / 25  (76.0): 100%|██████████| 25/25 [00:02<00:00, 12.01it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 507.60it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 明天就要考試了，我好緊張啊！
Answer: 忐忑

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 

Average Metric: 21 / 25  (84.0): 100%|██████████| 25/25 [00:01<00:00, 23.65it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00, 624.25it/s]





1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**. 2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion. 3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋

Average Metric: 23 / 25  (92.0): 100%|██████████| 25/25 [00:02<00:00, 11.72it/s]


Full trace of prompts in use on an example...


Average Metric: 0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.65it/s]





Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到很釋然。
Answer: 釋然

---

Question: 我不知道該選擇哪個，好像每個選項都有吸引我的地方。
Answer: 矛盾

---

Question: 明天就要考試了，我好緊張啊！
Answer: 忐忑

---

Question: 每當聽到這首歌，我就想起過去的美好時光。
Answer: 懷舊

---

Question: 聽到這個消息，我的心裡酸酸的。
Answer: 心酸

---

Question: 看到流浪的小狗，我感到很憐憫。
Answer: 憐憫

---

Question: 看到壯麗的山河，我對大自然充滿了敬畏。
Answer: 敬畏

---

Question: 雖然努力了這麼久，但結果卻不盡如人意，我該怎麼辦？
Answer: 失望






Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
A

Average Metric: 31 / 40  (77.5): 100%|██████████| 40/40 [00:01<00:00, 29.66it/s]

Full eval score: 77.5
Best full eval score so far: 82.5
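
The progress lines in the log above follow the pattern `Average Metric: <correct> / <total>  (<percent>)`. The following is a minimal sketch, using only the Python standard library, for pulling those scores out of captured output and rechecking the percentages. The function name `parse_metric_lines` is hypothetical, and the regular expression is an assumption inferred from the lines shown here, not a format that DSPy documents.

```python
import re

# Pattern inferred from the log lines above; an assumption, not a DSPy API.
METRIC_RE = re.compile(
    r"Average Metric:\s*(\d+(?:\.\d+)?)\s*/\s*(\d+)\s*\((\d+(?:\.\d+)?)\)"
)

def parse_metric_lines(log_text):
    """Return a (correct, total, percent) tuple for each 'Average Metric' line."""
    results = []
    for match in METRIC_RE.finditer(log_text):
        correct = float(match.group(1))
        total = int(match.group(2))
        percent = float(match.group(3))
        results.append((correct, total, percent))
    return results

# Two full-evaluation lines copied from the log above.
sample_log = """
Average Metric: 33 / 40  (82.5): 100%|██████████| 40/40 [00:01<00:00, 31.63it/s]
Average Metric: 31 / 40  (77.5): 100%|██████████| 40/40 [00:01<00:00, 29.66it/s]
"""

scores = parse_metric_lines(sample_log)
for correct, total, percent in scores:
    # Recompute the percentage to confirm it matches the logged value.
    recomputed = 100 * correct / total
    print(f"{correct:.0f}/{total} -> {recomputed:.1f} (logged {percent})")
```

Filtering on `total == 40` would separate the full evaluations from the 25-example minibatch trials in this particular run.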







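## 4. Final results

DSPy is not only an optimization tool; it is also an execution framework, so the compiled program can be called directly.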

In [None]:
compiled_program('班長有什麼了不起，我小學也當過班長啊!')

Prediction(
    answer='嫉妒'
)
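
Note that the answer returned above, '嫉妒', is not one of the ten labels in `categories` from section 1, so the optimized prompt does not by itself restrict the model to the allowed set. The following is a minimal sketch of a guard for this case. The helper name `check_label` is hypothetical, and returning `None` for an unknown label is an assumption about how you might want to handle it.

```python
# Label list copied from the `categories` variable in section 1.
categories = ['矛盾', '感慨', '忐忑', '釋然', '敬畏', '憐憫', '懷舊', '失落', '心酸', '困惑']

def check_label(answer, allowed):
    """Return the stripped answer if it is an allowed label, otherwise None."""
    label = answer.strip()
    return label if label in allowed else None

# '嫉妒' is the answer shown above; it is not in the list.
print(check_label('嫉妒', categories))
# '釋然' is in the list; surrounding whitespace is ignored.
print(check_label(' 釋然', categories))
```

In the notebook, this could be applied as `check_label(compiled_program(text).answer, categories)`, with a fallback such as a retry or a default label whenever the result is `None`.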

The optimized parameters can be saved and loaded again in a later session.

In [None]:
compiled_program.save("compiled_program.json")

[('classifier', Predict(StringSignature(question -> answer
    instructions='1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.\n2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.\n3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.\n\nPROGRAM CODE'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'The input question to be categorized', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    answer = Field(annotation=str required=True json

Inspect the final optimized prompt:

In [None]:
print(task_llm.inspect_history(1))




1. **Topic and Content**: - All examples revolve around **emotional responses**. Each input sentence contains a scenario or context that leads to a specific emotion. - The dataset seems aimed at mapping situations or descriptions to corresponding **emotional states**.
2. **Syntax**: - Each input example is a **sentence** written in **Mandarin Chinese**. - The structure of the sentences is fairly simple and primarily declarative, providing a natural context for the emotion.
3. **Conciseness**: - The responses (answers) are **single words**, representing the emotion elicited by the situation described in the sentence. - The sentences themselves are concise and to the point.

PROGRAM CODE

---

Follow the following format.

Question: The input question to be categorized
Answer: The assigned category or label for the question

---

Question: 回想起小時候的日子，真的讓人感慨萬千。
Answer: 感慨

---

Question: 我的申請被拒絕了，感覺很失落。
Answer: 失落

---

Question: 為什麼事情會變成這樣，我感到很困惑。
Answer: 困惑

---

Question: 事情終於解決了，我感到