<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>

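# Improving Instruction Data via Reflection-Tuning Using GPT-4

- This notebook uses OpenAI's GPT-4 API to implement the dataset refinement process from the [Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning](https://arxiv.org/abs/2310.11716) paper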

![](https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/reflection-tuning/reflection-tuning.webp)

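- In the original paper, the researchers refined the [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) and [WizardLM](https://huggingface.co/datasets/WizardLMTeam/WizardLM_evol_instruct_70k) instruction-finetuning datasets; in this notebook, we refine the [instruction dataset](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/01_main-chapter-code/instruction-data.json) used in chapter 7 (since it has the same format as Alpaca, the same code also works with the Alpaca dataset)

- The expected dataset format is as follows: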

```python
    {
        "instruction": "Edit the following sentence for grammar.",
        "input": "He go to the park every day.",
        "output": "He goes to the park every day."
    },
    {
        "instruction": "Convert 45 kilometers to meters.",
        "input": "",
        "output": "45 kilometers is 45000 meters."
    },
```
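
- As a minimal sketch (not part of the original notebook), a dataset could be checked against this format before any API calls are made; the helper name `is_valid_entry` and the sample list are hypothetical, and extra keys are tolerated on the assumption that some Alpaca-style files carry additional fields:

```python
REQUIRED_KEYS = {"instruction", "input", "output"}


def is_valid_entry(entry):
    # An entry passes if it is a dict that contains the three expected
    # fields as strings; "input" is allowed to be an empty string
    return (
        isinstance(entry, dict)
        and REQUIRED_KEYS <= set(entry)
        and all(isinstance(entry[k], str) for k in REQUIRED_KEYS)
    )


sample = [
    {
        "instruction": "Convert 45 kilometers to meters.",
        "input": "",
        "output": "45 kilometers is 45000 meters.",
    },
    {"instruction": "Entry with missing fields"},
]

print([is_valid_entry(e) for e in sample])
```

- Here, the first entry passes the check and the second one does not, because it lacks the `"input"` and `"output"` fields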

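> Please note that this notebook reproduces the approach from the paper in which the authors used the GPT API to enhance existing datasets. However, it is important to be aware that data generated by the GPT API may not be used to develop models that compete with OpenAI, as specified in the [OpenAI Terms of Use](https://openai.com/policies/row-terms-of-use/): "What you cannot do... Use Output to develop models that compete with OpenAI."
You can find a relevant discussion [here](https://www.reddit.com/r/LocalLLaMA/comments/17vbg1f/does_openai_tos_prohibit_generating_datasets_for/).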

In [1]:
# pip install -r requirements-extra.txt

In [2]:
from importlib.metadata import version

pkgs = [
    "openai",  # OpenAI API
    "tqdm",    # Progress bar
]

for p in pkgs:
    print(f"{p} version: {version(p)}")

openai version: 1.30.3
tqdm version: 4.66.4


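## Test OpenAI API

- First, let's test whether the OpenAI API is set up correctly
- If you don't have an account yet, you need to create one at https://platform.openai.com/
- Note that you also have to transfer some funds to your account, because the GPT-4 API is not free (see https://platform.openai.com/settings/organization/billing/overview)
- As of this writing, running the code shown in this notebook with GPT-4o-mini costs about \$0.03 (3 cents)
- Applying the two methods above to all 1100 entries in the chapter 7 instruction dataset costs about \$0.60 (60 cents)

- First, we need to provide our OpenAI API key, which can be found at https://platform.openai.com/api-keys
- Make sure not to share this key with anyone
- Add this key (`"sk-..."`) to the `config.json` file in this folder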
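
- The loading code in the next cell reads the key from the `"OPENAI_API_KEY"` field of `config.json`; as a hedged alternative sketch, the hypothetical helper `load_api_key` below does the same and, as an assumption that is not part of the original notebook, falls back to an environment variable if the file or key is missing:

```python
import json
import os
from pathlib import Path


def load_api_key(config_path="config.json", env_var="OPENAI_API_KEY"):
    # Prefer the key stored in the JSON config file if the file exists
    path = Path(config_path)
    if path.exists():
        with open(path, "r") as config_file:
            config = json.load(config_file)
        key = config.get("OPENAI_API_KEY", "")
        if key:
            return key

    # Otherwise fall back to an environment variable (assumed name)
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError("No API key found in config file or environment")
    return key
```

- Either way, the key should stay out of version control, for example by not committing `config.json`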

In [3]:
import json
from openai import OpenAI

# Load API key from a JSON file.
# Make sure to replace "sk-..." with your actual API key from https://platform.openai.com/api-keys
with open("config.json", "r") as config_file:
    config = json.load(config_file)
    api_key = config["OPENAI_API_KEY"]

client = OpenAI(api_key=api_key)

- First, let's try the API with a simple example to make sure it works as intended:

In [4]:
def run_chatgpt(prompt, client, model="gpt-4o-mini", system_prompt=None):
    # Define the system message if a system_prompt is provided
    messages = []
    
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    
    # Add the user prompt to the messages
    messages.append({"role": "user", "content": prompt})

    # Call the API
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.0,
        seed=123,
    )
    
    # Return the model's response
    return response.choices[0].message.content


prompt = "Respond with 'hello world' if you got this message."
run_chatgpt(prompt, client)

'hello world'
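
- To make the message structure in `run_chatgpt` easier to inspect, here is a small sketch that builds the same `messages` list without calling the API; the helper name `build_messages` is hypothetical:

```python
def build_messages(prompt, system_prompt=None):
    messages = []

    # The optional system message goes first ...
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    # ... followed by the user prompt
    messages.append({"role": "user", "content": prompt})
    return messages


print(build_messages("Hello"))
print(build_messages("Hello", system_prompt="You are a picky reviewer."))
```

- The `temperature=0.0` and `seed=123` settings in `run_chatgpt` are meant to make the outputs more reproducible, although identical outputs across runs are not guaranteed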

## Load JSON Entries

- Next, let's load and process the instruction dataset
- Here, we assume that the instruction dataset is saved as a JSON file, which we can load as follows:

In [5]:
from pathlib import Path


json_file = Path("..") / "01_main-chapter-code" / "instruction-data.json"

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))

Number of entries: 1100


- Let's print one of the dataset entries to see its structure:

In [6]:
from pprint import pp as pprint

pprint(json_data[0])

{'instruction': 'Evaluate the following phrase by transforming it into the '
                'spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the '
           'correct spelling is "friend".'}


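## Improve Instructions

- The Reflection-Tuning authors shared two approaches: (1) improving the instructions and (2) improving the responses
- Let's begin by improving the instructions in a given dataset
- Below is a small utility function from the [Reflection-Tuning repository](https://github.com/tianyi-lab/Reflection_Tuning/blob/main/reflection_code/reflect_response.py) that formats the inputs to the GPT-4 model for this dataset refinement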

In [7]:
def instr_prompt_no_input(ins, outp):

    sys_prompt = "You are a helpful, precise but picky assistant for checking the quality of a given instruction."
    prompt_template = "[Instruction]\n{ins}\n\n[The Start of Answer]\n{outp}\n\n[The End of Answer]\n\n[System]\n{criteria}\n\n"
    criteria = "We would like you to answer several questions related to the quality of a given instruction. \n" + \
                "1. Why this instruction is not good? First analyse the instruction based on Complexity of the Topic, Level of Detail Required, Knowledge Required, Ambiguity of the Instruction and Logical Reasoning or Problem-Solving Involved. \n" + \
                "Then analyse why this answer is not good for the given instruction? Analyse based on the Helpfulness, Relevance, Accuracy and Level of Details. \n" + \
                "Finally analyse why this bad instruction lead to a bad answer. " +\
                "2. Based on the reason you provided, generate a new and complete instruction which is complex and difficult to answer directly. " + \
                "Make sure the new instruction is relevent but independent to the original instruction, which can be answered without knowing the original instruction, put the new instruction in the format of [New Instruction] your instruction [End]" +\
                "3. Answer the newly generated instruction as detailed as possible, in the format of [New Answer] your answer [End] \n"
    prompt = prompt_template.format(
        ins=ins, outp=outp, criteria=criteria
    )
    return sys_prompt, prompt
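
- To show the overall layout of the resulting prompt, the sketch below fills the same template string with a short placeholder instead of the long criteria text; the placeholder `short_criteria` is hypothetical and only stands in for the real criteria:

```python
# Same template string as in `instr_prompt_no_input`
prompt_template = (
    "[Instruction]\n{ins}\n\n[The Start of Answer]\n{outp}\n\n"
    "[The End of Answer]\n\n[System]\n{criteria}\n\n"
)

# Placeholder standing in for the long criteria text (hypothetical)
short_criteria = "<analysis questions and formatting rules go here>"

formatted = prompt_template.format(
    ins="Convert 45 kilometers to meters.",
    outp="45 kilometers is 45000 meters.",
    criteria=short_criteria,
)
print(formatted)
```

- The original instruction and answer come first, and the criteria that ask for the analysis, the `[New Instruction]`, and the `[New Answer]` follow in the `[System]` part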

- To see how it works, consider the dataset entry `json_data[2]`

In [8]:
pprint(json_data[2])

{'instruction': 'Convert 45 kilometers to meters.',
 'input': '',
 'output': '45 kilometers is 45000 meters.'}


- We can refine the instruction using the `instr_prompt_no_input` function defined above as follows:

In [None]:
entry = json_data[2]

system_prompt, prompt = instr_prompt_no_input(ins=entry["instruction"], outp=entry["output"])
output = run_chatgpt(prompt=prompt, client=client, system_prompt=system_prompt)

print(output)

1. **Analysis of the Instruction:**

   - **Complexity of the Topic:** The topic of converting kilometers to meters is relatively simple and straightforward, as it involves basic unit conversion.
   - **Level of Detail Required:** The instruction does not require much detail; it simply asks for a conversion without any additional context or explanation.
   - **Knowledge Required:** Basic knowledge of metric units and their conversions is required, which is common knowledge.
   - **Ambiguity of the Instruction:** The instruction is clear and unambiguous; it specifies exactly what needs to be converted.
   - **Logical Reasoning or Problem-Solving Involved:** There is minimal logical reasoning involved, as the conversion factor (1 kilometer = 1000 meters) is a standard fact.

   **Analysis of the Answer:**

   - **Helpfulness:** The answer is helpful in that it provides the correct conversion.
   - **Relevance:** The answer is relevant to the instruction, as it directly addresses the conv

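- The response is very verbose, which is useful for analysis purposes; it also helps the GPT-4 model make improvements via the chain-of-thought prompting approach
- However, to construct the improved dataset, we are only interested in the new instruction and output, not the analysis
- We can use the following utility code from the [Reflection-Tuning repository](https://github.com/tianyi-lab/Reflection_Tuning/blob/main/reflection_code/reflect_response.py) to extract the model's improved instruction and output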

In [None]:
import re

def extract_ins(text, no_input=True):
    if '[New Instruction]' in text:
        pattern = r'(\[New Instruction\])(.*?)(\[End\]|\[New Answer\]|New Answer:)'
    else:
        pattern = r'(New Instruction:)(.*?)(\[End\]|\[New Answer\]|New Answer:)'
    segments = re.findall(pattern, text, re.DOTALL)
    if len(segments) == 0:
        seg_ins = ''
    else:
        seg_ins = segments[0][1].strip()
    if seg_ins.endswith("\n\n3."):
        seg_ins = seg_ins[:-4]
    return seg_ins


def extract_oup(text, no_input=True):
    if '[New Answer]' in text:
        pattern = r'(\[New Answer\])(.*?)(\[End\]|$)'
    else:
        pattern = r'(New Answer:)(.*?)(\[End\]|$)'
        # pattern = r'(\[New Answer\]|New Answer:)(.*?)(\[End\]|$)'
    segments = re.findall(pattern, text, re.DOTALL)
    if len(segments) == 0:
        seg_oup = ''
    else:
        seg_oup = segments[0][1].strip()
    return seg_oup


def extract_instruction(text):
    if text == '':
        return []
    seg_ins = extract_ins(text, no_input=True)
    seg_oup = extract_oup(text, no_input=True)
    return [seg_ins, seg_oup]
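
- The core idea behind these functions is a non-greedy regular expression match between a start tag and `[End]`; the following simplified sketch illustrates this on a made-up response string (the helper `extract_between` and `mock_output` are hypothetical, and the repository functions additionally handle variants such as `New Instruction:` or a missing `[End]`):

```python
import re

# Made-up model response that follows the requested tag format
mock_output = (
    "1. The instruction is too simple ...\n"
    "2. [New Instruction] Explain why unit conversions matter. [End]\n"
    "3. [New Answer] Unit conversions keep measurements comparable. [End]\n"
)


def extract_between(text, start_tag, end_tag="[End]"):
    # Non-greedy match of everything between the two tags;
    # re.DOTALL lets "." also match newlines
    pattern = re.escape(start_tag) + r"(.*?)" + re.escape(end_tag)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else ""


print(extract_between(mock_output, "[New Instruction]"))
print(extract_between(mock_output, "[New Answer]"))
```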

- Let's use these utility functions to extract the improved instruction and response from the lengthy GPT-4 output generated earlier:

In [None]:
new_instr, new_outp = extract_instruction(output)

In [None]:
print(new_instr)

Explain the significance of the metric system in global trade and provide examples of how unit conversions can impact international business transactions.


In [None]:
print(new_outp)

The metric system, also known as the International System of Units (SI), is a decimal-based system of measurement that is used globally. Its significance in global trade lies in its standardization, which facilitates international communication and commerce. 

   One of the primary advantages of the metric system is that it is universally recognized, which reduces confusion and errors in measurement. For example, when a company in the United States imports goods from Europe, the specifications for those goods are often provided in metric units. If the U.S. company is accustomed to using imperial units (like inches or pounds), they must convert these measurements to ensure compatibility. 

   Unit conversions can significantly impact international business transactions. For instance, if a manufacturer orders 100 kilograms of a product but mistakenly interprets it as 100 pounds, they will receive a much smaller quantity than intended, leading to production delays and financial losses. 



- Note that instruction refinement is currently only implemented for dataset entries that don't have an `"input"` field
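
- To get a feel for how many entries this restriction affects, the sketch below splits a small made-up sample by whether the `"input"` field is empty; `sample_data` is hypothetical, and the same check could be run on the full dataset:

```python
sample_data = [
    {
        "instruction": "Edit the following sentence for grammar.",
        "input": "He go to the park every day.",
        "output": "He goes to the park every day.",
    },
    {
        "instruction": "Convert 45 kilometers to meters.",
        "input": "",
        "output": "45 kilometers is 45000 meters.",
    },
]

# Entries with an empty "input" field are the ones that can be refined
refinable = [e for e in sample_data if not e["input"]]
unchanged = [e for e in sample_data if e["input"]]

print("Refinable:", len(refinable))
print("Left unchanged:", len(unchanged))
```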

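## Improve Responses

- In a similar fashion, we can also apply the Reflection-Tuning refinement process specifically to the dataset responses (i.e., the `"output"` fields)
- Below are two small utility functions from the [Reflection-Tuning repository](https://github.com/tianyi-lab/Reflection_Tuning/blob/main/reflection_code/reflect_response.py) that format the inputs to the GPT-4 model for dataset refinement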

In [None]:
def res_gen_prompt_no_input(ins, outp):

    sys_prompt = "You are a helpful, precise but picky assistant for checking the quality of the answer to a given instruction."
    prompt_template = "[Instruction]\n{ins}\n\n[The Start of Answer]\n{outp}\n\n[The End of Answer]\n\n[System]\n{criteria}\n\n"
    criteria = "We would like you to answer several questions related to the quality of the answer to the given instruction. \n" + \
                "1. Why this answer is not good for the given instruction? Analyse based on the Helpfulness, Relevance, Accuracy and Level of Details. \n" + \
                "2. Based on the reason you provided, generate a better answer, new and complete, as detailed as possible, in the format of [Better Answer] your answer [End] \n" 
    prompt = prompt_template.format(
        ins=ins, outp=outp, criteria=criteria
    )
    return sys_prompt, prompt


def res_gen_prompt_input(ins, inp, outp):

    sys_prompt = "You are a helpful and precise assistant for checking the quality of the answer to a given instruction and its input."
    prompt_template = "[Instruction]\n{ins}\n\n[The Start of Input]\n{inp}\n\n[The End of Input]\n\n[The Start of Answer]\n{outp}\n\n[The End of Answer]\n\n[System]\n{criteria}\n\n"
    criteria = "We would like you to answer several questions related to the quality of the answer to the given instruction and corresponding input. \n" + \
                "1. Why this answer is not good for the given instruction and corresponding input? Analyse based on the Helpfulness, Relevance, Accuracy and Level of Details. \n" + \
                "2. Based on the reason you provided, generate a better answer, new and complete, as detailed as possible, in the format of [Better Answer] your answer [End] \n" 
    prompt = prompt_template.format(
        ins=ins, inp=inp, outp=outp, criteria=criteria
    )
    return sys_prompt, prompt

- Again, let's apply it to one of the dataset entries to see how it works and generate an improved response:

In [15]:
entry = json_data[2]

system_prompt, prompt = res_gen_prompt_no_input(ins=entry["instruction"], outp=entry["output"])
output = run_chatgpt(prompt=prompt, client=client, system_prompt=system_prompt)

print(output)

1. The answer provided is not good for the given instruction for several reasons:

- **Helpfulness**: While the answer does provide the correct conversion, it lacks any explanation or context. A more helpful answer would include a brief explanation of the conversion process, which would aid understanding.

- **Relevance**: The answer is relevant in that it addresses the instruction to convert kilometers to meters, but it could be more relevant by including the conversion factor used (1 kilometer = 1000 meters).

- **Accuracy**: The answer is accurate in terms of the numerical conversion (45 kilometers = 45000 meters). However, it could be misleading if the reader does not understand how the conversion was derived.

- **Level of Details**: The answer is very brief and lacks detail. A more detailed response would include the conversion factor and a step-by-step explanation of how the conversion is performed.

2. [Better Answer] To convert kilometers to meters, you can use the conversion 

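- As shown above, the response includes an analysis of the original response; we can extract the new response using the following utility function from the [Reflection-Tuning repository](https://github.com/tianyi-lab/Reflection_Tuning/blob/main/reflection_code/reflect_response.py)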

In [16]:
def extract_response(text):
    if text.count('[Better Answer]') >= 2:
        pattern = r'\[(Better Answer)\](.*?)(\[End\]|\[Better Answer\]|$)'
        segments = re.findall(pattern, text, re.DOTALL)
    else:
        # pattern = r'\[(Better Answer)\](.*?)\[End\]'
        pattern = r'\[(Better Answer)\](.*?)(\[End\]|End|$)'
        segments = re.findall(pattern, text, re.DOTALL)
    return [segment[1].strip() for segment in segments]

In [17]:
response = extract_response(output)[0]
print(response)

To convert kilometers to meters, you can use the conversion factor that 1 kilometer is equal to 1000 meters. Therefore, to convert 45 kilometers to meters, you multiply 45 by 1000. 

So, 45 kilometers × 1000 meters/kilometer = 45000 meters. 

Thus, 45 kilometers is equal to 45000 meters.


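## Improving the Dataset

- Now, let's apply the instruction-reflection and response-reflection techniques to an actual dataset
- Note: for demonstration purposes, we only apply them to a small subset of the data; to apply them to the whole dataset, change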

```python
data_to_process = json_data[:3]
```

to

```python
data_to_process = json_data
```

### Reflect instructions

- The following code applies the Reflection-Tuning methodology for dataset refinement to the instructions in the original dataset

In [18]:
data_to_process = json_data[:3]

In [19]:
from tqdm import tqdm


def reflect_instructions(json_data, client):
    new_json_data = [] 
    
    for entry in tqdm(json_data):
        
        if not entry["input"]:
            system_prompt, prompt = instr_prompt_no_input(ins=entry["instruction"], outp=entry["output"])
            output = run_chatgpt(prompt=prompt, client=client, system_prompt=system_prompt)
            new_instr, new_outp = extract_instruction(output)
            new_entry = {"instruction": new_instr, "input": "", "output": new_outp}
            new_json_data.append(new_entry)
        else:
            new_json_data.append(entry)

    return new_json_data
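
- The loop above assumes that every API response can be parsed into a new instruction and a new output; as one possible safeguard (an assumption, not part of the original notebook), a hypothetical helper like `keep_or_replace` could fall back to the original entry when the extraction comes back empty:

```python
def keep_or_replace(entry, extracted):
    # `extracted` is expected to be [new_instruction, new_output];
    # keep the original entry if either part is missing or empty
    if len(extracted) != 2 or not all(extracted):
        return entry

    new_instr, new_outp = extracted
    return {"instruction": new_instr, "input": "", "output": new_outp}


original = {
    "instruction": "Convert 45 kilometers to meters.",
    "input": "",
    "output": "45 kilometers is 45000 meters.",
}

print(keep_or_replace(original, []))            # empty API response
print(keep_or_replace(original, ["", ""]))      # tags not found
print(keep_or_replace(original, ["New instruction", "New answer"]))
```

- In the first two calls the original entry is returned unchanged; only the last call produces a new entry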

In [20]:
data_to_process = json_data[:3]

new_json_data = reflect_instructions(data_to_process, client)

100%|█████████████████████████████████████████████| 3/3 [00:06<00:00,  2.17s/it]


In [21]:
for i in new_json_data[:3]:
    pprint(i)
    print("\n\n")

{'instruction': 'Evaluate the following phrase by transforming it into the '
                'spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the '
           'correct spelling is "friend".'}



{'instruction': 'Edit the following sentence for grammar.',
 'input': 'He go to the park every day.',
 'output': 'He goes to the park every day.'}



{'instruction': 'Explain the significance of understanding metric conversions '
                'in scientific research, and provide an example of how a '
                'miscalculation in unit conversion could impact experimental '
                'results.',
 'input': '',
 'output': 'Understanding metric conversions is crucial in scientific research '
           'because accurate measurements are fundamental to the validity of '
           'experimental results. The metric system is widely used in '
           'scientific disciplines due to its ease of use and universal '
    

- Let's save the new dataset:

In [22]:
with open("instruction-reflected.json", "w") as file:
    json.dump(new_json_data, file, indent=4)

### Reflect responses

- Let's now do the same for response reflection:

In [23]:
data_to_process = json_data[:3]

In [24]:
def reflect_responses(json_data, client):
    new_json_data = [] 
    
    for entry in tqdm(json_data):
        
        if not entry["input"]:
            system_prompt, prompt = res_gen_prompt_no_input(ins=entry["instruction"], outp=entry["output"])
            output = run_chatgpt(prompt=prompt, client=client, system_prompt=system_prompt)
            new_response = extract_response(output)

            if not len(new_response):
                # Fall back to the original output; wrap it in a list so that
                # indexing with [0] below returns the full string, not its first character
                new_response = [entry["output"]]

            new_entry = {"instruction": entry["instruction"], "input": "", "output": new_response[0]}
            new_json_data.append(new_entry)

        else:
            system_prompt, prompt = res_gen_prompt_input(ins=entry["instruction"], inp=entry["input"], outp=entry["output"])
            output = run_chatgpt(prompt=prompt, client=client, system_prompt=system_prompt)
            new_response = extract_response(output)

            if not len(new_response):
                new_response = [entry["output"]]

            new_entry = {"instruction": entry["instruction"], "input": entry["input"], "output": new_response[0]}
            new_json_data.append(new_entry)

    return new_json_data

In [25]:
new_json_data = reflect_responses(data_to_process, client)

100%|█████████████████████████████████████████████| 3/3 [00:07<00:00,  2.40s/it]


In [26]:
for i in new_json_data[:3]:
    pprint(i)
    print("\n\n")

{'instruction': 'Evaluate the following phrase by transforming it into the '
                'spelling given.',
 'input': 'freind --> friend',
 'output': 'The input phrase "freind" contains a spelling error. The correct '
           'transformation of the word is as follows: "freind" should be '
           'corrected to "friend." Therefore, the correct spelling is '
           '"friend."'}



{'instruction': 'Edit the following sentence for grammar.',
 'input': 'He go to the park every day.',
 'output': 'The original sentence "He go to the park every day" contains a '
           'grammatical error in the verb form. The correct form should be "He '
           'goes to the park every day." This is because the subject "He" is '
           'third person singular, and in English, the verb "to go" changes to '
           '"goes" when used with third person singular subjects. Therefore, '
           'the corrected sentence is grammatically accurate and maintains the '
           'original mea

- Let's save the new dataset:

In [27]:
with open("response-reflected.json", "w") as file:
    json.dump(new_json_data, file, indent=4)

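## Creating Improved Instruction Data

- Applying the two methods above to all 1100 entries in the chapter 7 instruction dataset costs about \$0.60 (60 cents)
- To avoid bloating the GitHub repository with dataset files, the resulting dataset files are available from Google Drive: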
  - [instruction-reflected.json](https://drive.google.com/file/d/1c1QnuTdt9nP1u51vBn4_b05mWR_ZNGBv/view?usp=sharing)
  - [response-reflected.json](https://drive.google.com/file/d/1RNckTZ2ELcdUoJtaylao6NvyZPMtNv1v/view?usp=sharing)
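
- After downloading these files, one simple way to see what the refinement changed is to compare the average length of the `"output"` fields before and after; the sketch below uses two made-up single-entry lists, and the helper name `average_output_length` is hypothetical:

```python
def average_output_length(entries):
    # Mean number of characters in the "output" field
    if not entries:
        return 0.0
    return sum(len(e["output"]) for e in entries) / len(entries)


# Made-up stand-ins for the original and the refined dataset
original = [
    {
        "instruction": "Convert 45 kilometers to meters.",
        "input": "",
        "output": "45 kilometers is 45000 meters.",
    },
]
refined = [
    {
        "instruction": "Convert 45 kilometers to meters.",
        "input": "",
        "output": "1 kilometer is 1000 meters, so 45 kilometers is 45000 meters.",
    },
]

print("Original:", average_output_length(original))
print("Refined:", average_output_length(refined))
```

- With the real files, `original` and `refined` would instead be loaded via `json.load`, as done earlier in this notebook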