<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>

# Generating A Preference Dataset With Llama 3.1 70B And Ollama

- Preference finetuning is a process to align an instruction-finetuned LLM with human preferences

- There are multiple ways to create a dataset for preference finetuning an LLM

  1. We use the instruction-finetuned LLM to generate multiple responses and have humans rank them based on their preference and/or given preference criteria
  2. We use the instruction-finetuned LLM to generate multiple responses and have LLMs rank them based on given preference criteria
  3. We use an LLM to generate preferred and dispreferred responses given certain preference criteria

- In this notebook, we consider approach 3

- This notebook uses a 70-billion-parameter Llama 3.1-Instruct model through ollama to generate preference labels for an instruction dataset

- The expected format of the instruction dataset is as follows:

### Input

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens."
    },
...
]
```
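
- Before generating anything, it can help to confirm that a loaded dataset really has this shape. The following is a minimal sketch: `validate_instruction_entries` is a hypothetical helper (not part of the original notebook) and it assumes only the three string fields shown above.

```python
REQUIRED_KEYS = ("instruction", "input", "output")


def validate_instruction_entries(entries):
    """Return a list of (index, problem) tuples; an empty list means no problems were found."""
    problems = []
    for i, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append((i, f"missing key '{key}'"))
            elif not isinstance(entry[key], str):
                problems.append((i, f"'{key}' is not a string"))
    return problems


# Two illustrative entries; the second one deliberately lacks the 'input' field
sample = [
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
    },
    {
        "instruction": "What is the capital of Greece?",
        "output": "The capital of Greece is Athens.",
    },
]

print(validate_instruction_entries(sample))  # [(1, "missing key 'input'")]
```

- Note that an empty `"input"` string is valid here; only a missing or non-string field is reported.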

### Output

The output dataset will look as follows, where more polite responses are preferred (`'chosen'`), and more impolite responses are dispreferred (`'rejected'`):

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento.",
        "rejected": "Look, the state capital of California is obviously Sacramento.",
        "chosen": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
        "chosen": "A suitable alternative to 'fast' would be 'quick'.",
        "rejected": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens."
    },
...
]
```

- The code doesn't require a GPU and runs on a laptop given enough RAM

In [1]:
from importlib.metadata import version

pkgs = ["tqdm",    # Progress bar
        ]

for p in pkgs:
    print(f"{p} version: {version(p)}")

tqdm version: 4.66.4


## Installing Ollama and Downloading Llama 3.1

- Ollama is an application to run LLMs efficiently
- It is a wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp), which implements LLMs in pure C/C++ to maximize efficiency
- Note that it is a tool for using LLMs to generate text (inference), not for training or finetuning LLMs
- Prior to running the code below, install ollama by visiting [https://ollama.com](https://ollama.com) and following the instructions (for instance, clicking on the "Download" button and downloading the ollama application for your operating system)

- For macOS and Windows users, click on the ollama application you downloaded; if it prompts you to install the command line usage, say "yes"
- Linux users can use the installation command provided on the ollama website

- In general, before we can use ollama from the command line, we have to either start the ollama application or run `ollama serve` in a separate terminal

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1">
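
- To confirm from Python that the server is up, one option is a quick reachability check. This is a minimal sketch: `is_ollama_reachable` is a hypothetical helper, and it assumes the default local address `http://localhost:11434` that the API calls later in this notebook use. It only tells us that something answers on that port, not that a particular model has been downloaded.

```python
import urllib.error
import urllib.request


def is_ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


print("Ollama reachable:", is_ollama_reachable())
```

- If this prints `False`, start the ollama application or run `ollama serve` before continuing.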


- With the ollama application or `ollama serve` running, execute the following command in a different terminal to try out the 70-billion-parameter Llama 3.1 model

```bash
# 70B model
ollama run llama3.1:70b
```


The output looks as follows:

```
$ ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB
pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB
pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

- Note that `llama3.1:70b` refers to the instruction-finetuned 70-billion-parameter Llama 3.1 model

- Alternatively, you can use the smaller, more resource-efficient 8-billion-parameter Llama 3.1 model by replacing `llama3.1:70b` with `llama3.1`

- After the download has been completed, you will see a command line prompt that allows you to chat with the model

- Try a prompt like "What do llamas eat?", which should return an output similar to the following:

```
>>> What do llamas eat?
Llamas are ruminant animals, which means they have a four-chambered 
stomach and eat plants that are high in fiber. In the wild, llamas 
typically feed on:
1. Grasses: They love to graze on various types of grasses, including tall 
grasses, wheat, oats, and barley.
```

- You can end this session using the input `/bye`

## Using Ollama's REST API

- An alternative way to interact with the model is through its REST API, which we can call from Python with the function below
- Before you run the next cells in this notebook, make sure that ollama is still running, as described above, via either
  - `ollama serve` in a terminal
  - the ollama application
- Next, run the following code cell to query the model

- First, let's try the API with a simple example to make sure it works as intended:

In [2]:
# urllib.request is used to send the HTTP request
import urllib.request
# json is used to encode the payload and decode the response
import json


def query_model(prompt, model="llama3.1:70b", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        # The model to use
        "model": model,
        # The message list, containing the user input
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        # Model options
        "options": {
            "seed": 123,  # Random seed
            "temperature": 0,  # Temperature setting
        }
    }

    # Convert the dictionary to a JSON string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding the necessary header
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            # Read the response line by line
            line = response.readline().decode("utf-8")
            # Stop when there is no more data
            if not line:
                break
            # Parse the JSON object on this line
            response_json = json.loads(line)
            # Append the content of this chunk
            response_data += response_json["message"]["content"]

    # Return the complete response text
    return response_data


# Test the function with a simple query
result = query_model("What do Llamas eat?")
# Print the result
print(result)

Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:

1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.
2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.
3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.
4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.
5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.

It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.

A typical llama diet might consist of:

* 1-2% of their body weight in hay per day
* 0.5-1% of their body w
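
- `query_model` reads the reply line by line because the chat endpoint streams its answer by default as one JSON object per line, each carrying a piece of the text under `["message"]["content"]`. The sketch below shows that accumulation step in isolation, without a running server. The helper name `collect_streamed_content` and the chunks in `fake_stream` are made up for illustration and simplified; real replies carry additional fields.

```python
import json


def collect_streamed_content(lines):
    """Concatenate the message content of newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        # Chunks without a message contribute an empty string
        parts.append(chunk.get("message", {}).get("content", ""))
    return "".join(parts)


# Invented chunks, only to illustrate the shape of a streamed reply
fake_stream = [
    '{"message": {"role": "assistant", "content": "Llamas "}, "done": false}\n',
    '{"message": {"role": "assistant", "content": "eat grass."}, "done": false}\n',
    '{"message": {"role": "assistant", "content": ""}, "done": true}\n',
]

print(collect_streamed_content(fake_stream))  # Llamas eat grass.
```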

## Load JSON Entries

- Now, let's get to the data generation part
- Here, for a hands-on example, we use the `instruction-data.json` file that we originally used to instruction-finetune the model in chapter 7:

In [3]:
# Path is used to build the file path
from pathlib import Path

# Path to instruction-data.json in the neighboring 01_main-chapter-code directory
json_file = Path("..", "01_main-chapter-code", "instruction-data.json")

# Open and read the JSON file
with open(json_file, "r") as file:
    json_data = json.load(file)

# Print the number of entries
print("Number of entries:", len(json_data))

Number of entries: 1100


- Each entry in this file has the following structure: the `'output'` field holds the given dataset response that we trained the model to generate, via instruction finetuning, from the `'instruction'` and `'input'` fields

In [4]:
# Show the first entry to inspect the data structure
json_data[0]

{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".'}

- Below is a small utility function that formats the instruction and input:

In [5]:
def format_input(entry):
    """
    Format an entry as prompt text.

    Args:
        entry (dict): Dictionary with the fields 'instruction' and 'input'

    Returns:
        str: Formatted text containing the instruction and, if present, the input
    """
    instruction_text = (
        f"Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
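
- To see what the formatted text looks like, the sketch below repeats the function so that it runs on its own and applies it to two entries taken from the examples in this notebook, one with and one without an `'input'` field:

```python
def format_input(entry):
    """Build the prompt text from the 'instruction' and optional 'input' fields."""
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # The input section is only added when the entry has a non-empty input
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text


with_input = {
    "instruction": "Evaluate the following phrase by transforming it into the spelling given.",
    "input": "freind --> friend",
}
without_input = {
    "instruction": "What is the state capital of California?",
    "input": "",
}

print(format_input(with_input))
print("-" * 40)
print(format_input(without_input))
```

- The second entry produces no `### Input:` section, since its `'input'` field is an empty string.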

- Now, let's try the ollama API to generate a `'chosen'` and `'rejected'` response for preference tuning a model
- Here, for illustration purposes, we create answers that are more or less polite


In [6]:
# random is used to pick the politeness level
import random


# Iterate over the first 5 entries
for entry in json_data[:5]:

    # Randomly decide whether to generate a polite or an impolite response
    politeness = random.choice(["polite", "impolite"])
    # Build the prompt, asking the model to rewrite the given output to be more polite or impolite
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )
    # Print the response from the original dataset
    print("\nDataset response:")
    print(">>", entry['output'])
    # Print the newly generated response (more polite or impolite)
    print(f"\n{politeness} response:")
    print(">>", query_model(prompt))


Dataset response:
>> The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".

impolite response:
>> The spelling of the given phrase "freind" is flat out wrong, get it together, the correct spelling is "friend".

Dataset response:
>> He goes to the park every day.

polite response:
>> He goes to the park daily, if I'm not mistaken.

Dataset response:
>> 45 kilometers is 45000 meters.

polite response:
>> 45 kilometers is equivalent to 45000 meters.

Dataset response:
>> Although it was raining, they went for a walk.

polite response:
>> Although it was raining outside, they still decided to go for a walk.

Dataset response:
>> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.

impolite response:
>> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.


- If we find that the generated responses above look reasonable, we can go to the next step and apply the prompt to the whole dataset

- Here, we add a `'chosen'` key for the preferred response and a `'rejected'` key for the dispreferred response

In [7]:
# random is used to pick the politeness level
import random
# tqdm displays a progress bar
from tqdm import tqdm

# Generate a 'chosen' and 'rejected' response for every entry (modifies json_data in place)
def generate_model_responses(json_data):

    # Iterate over each entry in the dataset, showing a progress bar
    for i, entry in enumerate(tqdm(json_data, desc="Writing entries")):
        # Randomly decide whether to generate a polite or an impolite response
        politeness = random.choice(["polite", "impolite"])
        # Build the prompt, asking the model to rewrite the given output to be more polite or impolite
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"slightly rewrite the output to be more {politeness}. "
            "Keep the modification minimal. "
            "Only return the generated response and nothing else."
        )
        # Query the model
        response = query_model(prompt)

        # A polite rewrite becomes 'chosen'; the original response becomes 'rejected'
        if politeness == "polite":
            json_data[i]["chosen"] = response
            json_data[i]["rejected"] = entry["output"]
        # An impolite rewrite becomes 'rejected'; the original response becomes 'chosen'
        else:
            json_data[i]["rejected"] = response
            json_data[i]["chosen"] = entry["output"]
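
- Since a full run calls the model once per entry, it can be useful to exercise the `'chosen'`/`'rejected'` bookkeeping on its own first. The sketch below isolates that step in a hypothetical helper, `assign_preference`, which returns a copy instead of modifying the entry in place. The impolite rewrite in the example is invented for illustration; no model is called.

```python
def assign_preference(entry, response, politeness):
    """Return a copy of entry with 'chosen' and 'rejected' filled in."""
    updated = dict(entry)
    if politeness == "polite":
        # The polite rewrite is preferred over the original response
        updated["chosen"] = response
        updated["rejected"] = entry["output"]
    else:
        # The original response is preferred over the impolite rewrite
        updated["chosen"] = entry["output"]
        updated["rejected"] = response
    return updated


entry = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
}

polite = assign_preference(
    entry, "I'd be happy to help! The capital of Greece is indeed Athens.", "polite"
)
impolite = assign_preference(
    entry, "Obviously, the capital of Greece is Athens.", "impolite"
)

print("polite   -> chosen:  ", polite["chosen"])
print("impolite -> rejected:", impolite["rejected"])
```

- In both cases, the original dataset response ends up on the opposite side from the rewrite, so every entry contributes exactly one generated and one original response.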

- Let's now apply this generation step to the whole dataset (the run recorded below took about 17 minutes for the 1,100 entries; the runtime depends on your hardware)
- Note that ollama is not fully deterministic across operating systems (as of this writing), so the responses you get might differ slightly from the ones shown in this notebook

In [8]:
# Generate model responses for the whole dataset
# This can take a while, depending on the size of the dataset
generate_model_responses(json_data)

Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]


In [10]:
# Save the JSON data, now including the preference fields, to a file
with open("instruction-data-with-preference.json", "w") as file:
    json.dump(json_data, file, indent=4)
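
- After saving, a quick sanity check can catch entries where generation went wrong, for example a missing field or a rewrite that is identical to the original response and therefore carries no preference signal. This is a minimal sketch: `check_preference_entries` is a hypothetical helper, and the sample entries are trimmed to the fields the check needs.

```python
def check_preference_entries(entries):
    """Return the indices of entries that break the expected chosen/rejected layout."""
    bad = []
    for i, entry in enumerate(entries):
        chosen, rejected = entry.get("chosen"), entry.get("rejected")
        has_both = isinstance(chosen, str) and isinstance(rejected, str)
        # Exactly one of the two should be the untouched dataset response
        original = entry.get("output")
        one_is_original = has_both and ((chosen == original) != (rejected == original))
        if not one_is_original:
            bad.append(i)
    return bad


entries = [
    {   # valid: the rewrite is chosen, the original is rejected
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens.",
    },
    {   # invalid: the 'rejected' field is missing
        "output": "He goes to the park every day.",
        "chosen": "He goes to the park every day.",
    },
    {   # invalid: the rewrite is identical to the original
        "output": "45 kilometers is 45000 meters.",
        "chosen": "45 kilometers is 45000 meters.",
        "rejected": "45 kilometers is 45000 meters.",
    },
]

print(check_preference_entries(entries))  # [1, 2]
```

- To apply the check to the real data, pass in the list loaded from `instruction-data-with-preference.json` with `json.load`.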