## Optimizing a system prompt with TextGrad 🏆



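This notebook uses TextGrad (https://textgrad.com/) to optimize a system prompt.

* You provide a task description
* It produces a strong zero-shot prompt
* It targets scenarios without a ground-truth answer, using an LLM for automatic evaluation

Author and talk slides: ihower https://ihower.tw/blog/archives/12444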

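### Workflow

1. Synthesize training questions with o1-preview
2. Run TextGrad optimization with gpt-4o, using LLM-as-a-judge for automatic evaluation
3. Produce a system prompt suited to gpt-4o-mini

Cost: the optimization loop takes about 5 minutes and costs roughly USD 0.8 (with 10 training examples)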
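As a rough planning aid, the quoted figure works out to about USD 0.08 per training example. The sketch below extrapolates that linearly to other dataset sizes. Linear scaling is an assumption, not something this notebook measured, and `estimate_cost_usd` is a hypothetical helper introduced only for illustration.

```python
# Figures quoted in this notebook: about USD 0.8 for 10 training examples.
OBSERVED_COST_USD = 0.8
OBSERVED_EXAMPLES = 10

def estimate_cost_usd(num_examples: int) -> float:
    """Linearly extrapolate the optimization cost.

    Assumption: cost grows in proportion to the number of training examples.
    """
    per_example = OBSERVED_COST_USD / OBSERVED_EXAMPLES
    return round(per_example * num_examples, 2)

print(estimate_cost_usd(25))
```

Actual spend depends on model pricing and on how long the questions, answers, and feedback turn out to be, so treat the result as an order-of-magnitude guide.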

## 0. Set the OpenAI API key

In Google Colab, click the key icon in the left sidebar and add a secret named openai_api_key whose value is your API key.
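If you run this notebook outside Colab, `google.colab` is not importable. The following is a minimal sketch of a fallback, assuming you are happy to read the key from an existing `OPENAI_API_KEY` environment variable or from an interactive prompt. `resolve_api_key` is a hypothetical helper, not part of Colab or OpenAI tooling.

```python
import os
from getpass import getpass

def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the API key from Colab secrets, the environment, or a prompt."""
    try:
        # google.colab only exists inside Colab; elsewhere this import fails.
        from google.colab import userdata
        key = userdata.get("openai_api_key")
        if key:
            return key
    except Exception:
        # Not in Colab, or the secret is missing or not accessible.
        pass
    # Fallbacks (assumptions): an existing environment variable, then a prompt.
    return env.get("OPENAI_API_KEY") or prompt("OpenAI API key: ")
```

You would then set `os.environ["OPENAI_API_KEY"] = resolve_api_key()` in place of the Colab-only lookup.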

In [None]:
from google.colab import userdata
import os
import json

os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')

## 1. Configure parameters

In [None]:
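synthetic_model = "o1-preview"  # Model that synthesizes the training questions; if you lack o1 access, use gpt-4o instead
generation_model = "gpt-4o"  # Model that generates the improved prompt (used as the TextGrad backward engine)
prediction_model = "gpt-4o-mini"  # Model that runs the prompt

# Task description; change it to your own task.
# The example means: "Given a professional field entered by the user, list its key knowledge points."
task_description = "根據用戶輸入的專業領域，條列其中的關鍵知識重點"

questions_num = 10  # Number of training examples to synthesize; this drives the API cost. Going lower is not recommended because the prompt may overfit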

# Prompt used to judge answer quality. You may edit it, but keep the [question_string] placeholder
eval_prompt_template = """Here's a question: [question_string].
Evaluate any given answer to this question, be smart, logical, and very critical.
Just provide concise feedback."""
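To make the placeholder mechanics concrete, here is a small sketch of how the template becomes a per-question evaluation instruction. `build_eval_instruction` is a hypothetical helper, and the sample question is made up for illustration. The check for a missing placeholder is an addition that the notebook itself does not perform.

```python
eval_prompt_template = """Here's a question: [question_string].
Evaluate any given answer to this question, be smart, logical, and very critical.
Just provide concise feedback."""

def build_eval_instruction(template: str, question: str) -> str:
    """Fill the [question_string] placeholder, failing loudly if it is missing."""
    if "[question_string]" not in template:
        raise ValueError("template must contain the [question_string] placeholder")
    return template.replace("[question_string]", question)

# Hypothetical sample question, for illustration only.
print(build_eval_instruction(eval_prompt_template, "What is a hash table?"))
```

Failing early matters because a template without the placeholder would silently ask the judge to evaluate answers with no question attached.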

## 2. Synthesize the dataset needed for optimization


In [None]:
synthetic_prompt = f"""You are tasked with creating a test dataset for an AI question-answering system. Your goal is to generate {questions_num} example questions based on a given task description. These questions should range from simple to complex, with the more difficult questions requiring reasoning and presenting a significant challenge.
Here are the guidelines for generating the questions:

Start with simple, straightforward questions and gradually increase the complexity.
Ensure that the more difficult questions require multi-step reasoning or in-depth knowledge.
Include a variety of question types (e.g., factual, analytical, hypothetical) relevant to the task description.
Ensure that all questions are directly related to the provided task description.

The task description you should base your questions on is as follows:
<task_description>
{task_description}
</task_description>

Please generate {questions_num} example questions based on this task description. Format your output as a JSON array of objects, where each object contains a 'question' key with the question text as its value, and an 'answer' key with the answer text as its value. The output should look like this:
[
{{"question": "Question 1 text here", "answer": "Answer 1 text here"}},
{{"question": "Question 2 text here", "answer": "Answer 2 text here"}},
...
]

Remember to increase the difficulty and complexity of the questions as you progress through the examples. The final few questions should be particularly challenging, requiring complex reasoning and demonstrating a high level of difficulty."""

In [None]:
!pip install litellm

from litellm import completion


In [None]:
messages = [
    { "content": synthetic_prompt, "role": "user"}
]

if not synthetic_model.startswith('o1'):
    response = completion(model=synthetic_model, messages=messages, response_format={ "type": "json_object" })
else:
    # At the time of writing, o1 does not support JSON mode
    response = completion(model=synthetic_model, messages=messages)

response = response.choices[0].message.content
dataset = json.loads(response)

  Expected `CompletionTokensDetails` but got `dict` with value `{'reasoning_tokens': 4480}` - serialized value may not be as expected
  return self.__pydantic_serializer__.to_python(
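Calling `json.loads` directly works only when the model returns a bare JSON array. Without JSON mode, a model may wrap its output in a Markdown code fence, and with JSON mode the result is typically a JSON object rather than an array. The sketch below is one defensive way to handle both cases. `parse_dataset` is a hypothetical helper, and taking the first list value from an object is an assumption about how the model would wrap the array.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid a literal fence in this example

def parse_dataset(text: str) -> list:
    """Parse model output into a list of dicts that each contain a 'question' key."""
    # Strip an optional Markdown code fence (with or without a "json" tag).
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if match:
        text = match.group(1)
    data = json.loads(text)
    # If the model returned an object wrapper, take its first list value (assumption).
    if isinstance(data, dict):
        lists = [value for value in data.values() if isinstance(value, list)]
        if not lists:
            raise ValueError("no list of questions found in the response")
        data = lists[0]
    # Keep only well-formed items so later code can rely on item["question"].
    return [item for item in data if isinstance(item, dict) and "question" in item]
```

With this helper, the last line of the cell would become `dataset = parse_dataset(response)`.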


In [None]:
dataset

[{'question': '1. 这个任务的主要目标是什么？', 'answer': '根据用户输入的专业领域，列出其中的关键知识重点。'},
 {'question': '2. 如果用户输入“计算机科学”作为专业领域，应该列出哪些关键知识点？',
  'answer': '计算机科学的关键知识点包括算法、数据结构、计算机组成原理、操作系统、编程语言、数据库、网络、软件工程原理和人工智能等。'},
 {'question': '3. 如何判断某个知识点在一个专业领域内是关键的？',
  'answer': '可以根据其基础性、对多个子领域的相关性、在专业中使用的频率、作为高级主题的前提条件的作用，以及其在标准课程和认证中的包含情况来判断。'},
 {'question': '4. 如何确保列出的关键知识点清单是全面且准确的？',
  'answer': '可以参考权威来源，如学术课程、行业标准、专业指南、教科书、专家意见，并通过交叉检验多个可信资源来验证每个知识点的重要性。'},
 {'question': '5. 描述一种将专业领域的关键知识点进行结构化组织的方法。',
  'answer': '一种方法是将知识点分类到该专业的主要领域或子领域，按照从一般到具体的层次结构排列，或根据主题、概念或能力进行分组，以提供逻辑性的结构。'},
 {'question': '6. 为什么根据用户的特定需求或背景定制关键知识点列表很重要？',
  'answer': '因为不同的用户可能具有不同的专业水平、具体兴趣或特定应用，定制列表可以提高其相关性和实用性，增强其有效性和适用性。'},
 {'question': '7. 如果要为一个资源有限的专业小众领域列出关键知识点，你将如何着手？',
  'answer': '我会联系该领域的专家，审阅任何可用的文献或案例研究，分析相关的更广泛的领域以寻找重叠的知识点，并利用专业人士讨论相关主题的在线社区或论坛。'},
 {'question': '8. 讨论在识别跨学科专业领域的关键知识点时可能遇到的挑战，以及如何克服它们。',
  'answer': '挑战包括整合多个学科的概念的复杂性、术语或方法论的潜在冲突，以及所需知识的广度。克服这些挑战需要深入研究，与各学科专家合作，仔细综合信息，突出最关键的交叉点。'},
 {'que

## 3. Optimize the system prompt with TextGrad

In [None]:
!pip install textgrad


In [None]:
import textgrad as tg

# Forward engine: the model that answers questions using the system prompt
llm_engine = tg.get_engine(prediction_model, override=True)
# Backward engine: the model that writes the textual feedback ("gradients")
tg.set_backward_engine(generation_model, override=True)

# The system prompt is the trainable variable; it starts from a minimal seed prompt
system_prompt = tg.Variable("You are a concise LLM.",
                            requires_grad=True,
                            role_description="system prompt to guide the LLM's reasoning strategy for accurate responses")

model = tg.BlackboxLLM(llm_engine, system_prompt=system_prompt)
optimizer = tg.TGD(parameters=list(model.parameters()))
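Conceptually, this setup mirrors a PyTorch training step: a forward pass, a loss, a backward pass, and an optimizer step, except that the "gradient" is textual feedback. The toy sketch below imitates that cycle with plain Python functions in place of LLM calls. It is not TextGrad's implementation: `toy_answer`, `toy_judge`, and `toy_update` are hypothetical stand-ins invented for illustration.

```python
def toy_answer(system_prompt: str, question: str) -> str:
    """Stand-in for the forward LLM call: uses bullets only if the prompt asks for them."""
    style = "- " if "bullet" in system_prompt else ""
    return f"{style}answer to: {question}"

def toy_judge(answer: str) -> str:
    """Stand-in for the LLM judge: returns feedback text, or '' if satisfied."""
    return "" if answer.startswith("- ") else "Use bullet points."

def toy_update(system_prompt: str, feedback: str) -> str:
    """Stand-in for the optimizer step: folds the feedback into the prompt."""
    if feedback == "Use bullet points.":
        return system_prompt + " Answer with bullet points."
    return system_prompt

prompt = "You are a concise LLM."
for question in ["What is TCP?", "What is UDP?"]:
    answer = toy_answer(prompt, question)   # forward pass
    feedback = toy_judge(answer)            # loss expressed as text
    prompt = toy_update(prompt, feedback)   # backward pass + optimizer step
print(prompt)
```

The first question triggers feedback and changes the prompt, while the second already satisfies the judge and leaves the prompt unchanged. In the real loop, each of these three roles is played by an LLM call.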

In [None]:
# Run the optimization loop: one TextGrad update per training question
for data in dataset:
    question_string = data["question"]
    question = tg.Variable(question_string, role_description="question to the LLM", requires_grad=False)

    optimizer.zero_grad()
    prediction = model(question)
    prediction.set_role_description("concise and accurate answer to the question")

    evaluation_instruction = eval_prompt_template.replace( '[question_string]', question_string)
    loss_fn = tg.TextLoss(evaluation_instruction)
    loss = loss_fn(prediction)

    loss.backward()
    optimizer.step()

INFO:textgrad:LLMCall function forward
INFO:textgrad:LLMCall function forward
INFO:textgrad:_backward_through_llm prompt
INFO:textgrad:_backward_through_llm gradient
INFO:textgrad:_backward_through_llm prompt
INFO:textgrad:_backward_through_llm gradient
INFO:textgrad:TextualGradientDescent prompt for update
INFO:textgrad:TextualGradientDescent optimizer response
INFO:textgrad:TextualGradientDescent updated text

In [None]:
# Print the final optimized system prompt
print(system_prompt.value)

You are a concise LLM that provides clear, specific, direct, and accurate information to help users solve problems or complete tasks. Follow these guidelines:

1. **Foundational and Industry-Relevant Knowledge**:
   - Prioritize critical information first.
   - Use consistent terminology and detail for each category.
   - Reference authoritative bodies and industry standards (e.g., ISO, ASTM).
   - Include high-impact journals, open-access journals, and preprint servers like arXiv and bioRxiv to capture the latest research.
   - Ensure data comprehensiveness and representativeness by selecting diverse publication types and geographic diversity.

2. **Domain Definition**:
   - Provide criteria for determining the domain's boundaries, such as scope of research topics, geographical limitations, or specific industry applications.

3. **Audience Understanding**:
   - Suggest methods like preliminary surveys, interviews, or focus groups to gather detailed information about the audience's bac

A Chinese translation of this system prompt is available for comparison: https://chatgpt.com/share/66ea86da-4040-8008-a2f9-cc5806fa5f05
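Once optimized, the prompt is meant to be used as the system message for the prediction model. The sketch below shows one way to do that with the same `litellm` `completion` call used earlier in this notebook. `build_messages` and `run_prompt` are hypothetical helpers, and the default model name simply repeats the `prediction_model` setting.

```python
def build_messages(system_prompt: str, user_input: str) -> list:
    """Assemble chat messages with the optimized prompt in the system role."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

def run_prompt(system_prompt: str, user_input: str, model: str = "gpt-4o-mini") -> str:
    """Call the prediction model; needs OPENAI_API_KEY and network access."""
    # Imported lazily so build_messages can be used without litellm installed.
    from litellm import completion
    response = completion(model=model, messages=build_messages(system_prompt, user_input))
    return response.choices[0].message.content
```

For example, `run_prompt(system_prompt.value, "計算機科學")` would send the optimized prompt together with a professional field as the user input.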