<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp?1" width="100px"></a>
</td>
</tr>
</table>

# Generating a Preference Dataset with Llama 3.1 70B and Ollama

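- Preference finetuning is a process that aligns an instruction-finetuned LLM with human preferences
- There are multiple ways to create a dataset for preference finetuning an LLM
  1. Use an instruction-finetuned LLM to generate multiple responses and have humans rank them based on their preference and/or given preference criteria
  2. Use an instruction-finetuned LLM to generate multiple responses and have an LLM rank them based on given preference criteria
  3. Use an LLM to generate preferred and dispreferred responses given certain preference criteria
- This notebook uses approach 3
- This notebook uses a 70-billion-parameter Llama 3.1-Instruct model through ollama to generate preference labels for an instruction dataset
- The expected format of the instruction dataset is as follows:

### Input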

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens."
    },
...
]
```

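### Output

The output dataset will look as follows, where the more polite response is preferred (`'chosen'`) and the less polite response is dispreferred (`'rejected'`):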

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento.",
        "rejected": "Look, the state capital of California is obviously Sacramento.",
        "chosen": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
        "chosen": "A suitable alternative to 'fast' would be 'quick'.",
        "rejected": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens."
    },
...
]
```
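The two formats above can be checked programmatically before and after generation. The following is a minimal sketch; the helper name `missing_keys` and the two key sets are illustrative assumptions derived from the example entries, not part of the notebook's code.

```python
# Key sets taken from the example entries above; the helper itself is hypothetical
INSTRUCTION_KEYS = {"instruction", "input", "output"}
PREFERENCE_KEYS = INSTRUCTION_KEYS | {"chosen", "rejected"}


def missing_keys(entry, required=INSTRUCTION_KEYS):
    """Return the sorted list of required keys that are absent from `entry`."""
    return sorted(required - set(entry))


sample = {
    "instruction": "What is the state capital of California?",
    "input": "",
    "output": "The state capital of California is Sacramento.",
}

print(missing_keys(sample))                   # [] because the input format is complete
print(missing_keys(sample, PREFERENCE_KEYS))  # ['chosen', 'rejected'] are not generated yet
```

Running the same check with `PREFERENCE_KEYS` over the finished dataset is one way to confirm that every entry received both labels.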

- This code does not require a GPU and runs on a laptop with enough RAM

```python
from importlib.metadata import version

pkgs = ["tqdm",    # Progress bar
        ]

for p in pkgs:
    print(f"{p} version: {version(p)}")
```

```
tqdm version: 4.66.4
```


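## Installing Ollama and Downloading Llama 3.1

- Ollama is an application for running LLMs efficiently
- It is a wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp), which implements LLMs in pure C/C++ to maximize efficiency
- Note that it is a tool for generating text with LLMs (inference), not for training or finetuning LLMs
- Before running the code below, install ollama by visiting [https://ollama.com](https://ollama.com) and following the instructions (for instance, click the "Download" button and download the ollama application for your operating system)

- For macOS and Windows users, click on the downloaded ollama application; if it prompts you to install the command line usage, say "yes"
- Linux users can use the installation command provided on the ollama website

- In general, before using ollama from the command line, you have to either start the ollama application or run `ollama serve` in a separate terminal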

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1">
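Whether the server is up can also be checked from Python before sending any prompts. The sketch below is an assumption-laden illustration: the helper name `ollama_is_reachable` is hypothetical, the port 11434 is the one used in the request URL later in this notebook, and it assumes that a running ollama server answers a plain GET on its base URL with status 200.

```python
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at `base_url`."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers refused connections, timeouts, and urllib's URLError
        return False


print("Ollama reachable:", ollama_is_reachable())
```

If this prints `False`, start the ollama application or run `ollama serve` first.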


- With the ollama application or `ollama serve` running, execute the following command in a different terminal to try out the 70-billion-parameter Llama 3.1 model

```bash
# 70B model
ollama run llama3.1:70b
```


The output looks as follows:

```
$ ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB
pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB
pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

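- Note that `llama3.1:70b` refers to the instruction-finetuned 70-billion-parameter Llama 3.1 model

- Alternatively, you can use the smaller, more resource-efficient 8-billion-parameter Llama 3.1 model by replacing `llama3.1:70b` with `llama3.1`

- After the download has completed, you will see a command line prompt that allows you to chat with the model

- Try a prompt like "What do llamas eat?", which should return output similar to the following: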

```
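>>> What do llamas eat?
Llamas are ruminant animals, which means they have a four-chambered
stomach and eat plants that are high in fiber. In the wild, llamas
typically feed on:
1. Grasses: They love to graze on various types of grasses, including tall
grasses, wheat, oats, and barley.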
```

- You can end this session by entering `/bye`

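## Using Ollama's REST API

- An alternative way to interact with the model is through its REST API from Python, using the function below
- Before running the next cells in this notebook, make sure that ollama is still running, as described above, via either
  - `ollama serve` in a terminal
  - the ollama application
- Next, run the following code cell to query the model

- First, try the API with a simple example to make sure it works as intended: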

```python
import json
import requests


def query_model(prompt, model="llama3.1:70b", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {
            "seed": 123,
            "temperature": 0,
        }
    }

    # Send the POST request
    with requests.post(url, json=data, stream=True, timeout=30) as r:
        r.raise_for_status()
        response_data = ""
        for line in r.iter_lines(decode_unicode=True):
            if not line:
                continue
            response_json = json.loads(line)
            if "message" in response_json:
                response_data += response_json["message"]["content"]

    return response_data


result = query_model("What do Llamas eat?")
print(result)
```

```
Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:

1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.
2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.
3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.
4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.
5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.

It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.

A typical llama diet might consist of:

* 1-2% of their body weight in hay per day
* 0.5-1% of their body w
```
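The loop inside `query_model` assembles the reply from a stream of newline-delimited JSON objects. To make that step easier to inspect without a running server, here is a small offline sketch; `collect_stream_content` is a hypothetical helper, and the lines in `fake_stream` are hand-written to imitate the shape the loop expects rather than captured server output.

```python
import json


def collect_stream_content(lines):
    """Concatenate the message content fragments from newline-delimited JSON lines."""
    parts = []
    for line in lines:
        if not line:  # skip blank lines, as the loop in query_model does
            continue
        chunk = json.loads(line)
        if "message" in chunk:
            parts.append(chunk["message"]["content"])
    return "".join(parts)


# Hand-written lines that imitate a streamed reply (not real output)
fake_stream = [
    '{"message": {"role": "assistant", "content": "Llamas eat "}, "done": false}',
    "",
    '{"message": {"role": "assistant", "content": "grass."}, "done": false}',
    '{"done": true}',
]

print(collect_stream_content(fake_stream))  # Llamas eat grass.
```

Collecting the fragments in a list and joining once avoids repeated string concatenation, which can matter for long replies.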

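## Load JSON Entries

- Now, let's move on to the data generation part
- Here, as a hands-on example, we use the `instruction-data.json` file that was originally used to instruction-finetune the model in chapter 7: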

```python
from pathlib import Path

json_file = Path("..", "01_main-chapter-code", "instruction-data.json")

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))
```

```
Number of entries: 1100
```


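- The structure of this file is as follows: each entry contains a given response (`'output'`) that the model is trained to generate, via instruction finetuning, from the `'instruction'` and `'input'`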

```python
json_data[0]
```

```
{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".'}
```

- Below is a small utility function that formats the instruction and input:

```python
def format_input(entry):
    # Alpaca-style prompt header followed by the instruction text
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    # The input section is only added when the entry has a non-empty input
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
```
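To see what the formatted prompt looks like with and without an `'input'` field, here is a short usage sketch. The function is repeated so the example runs on its own; the first entry reuses the dataset entry shown earlier, and the second reuses an instruction from the format example at the top of this notebook.

```python
def format_input(entry):
    # Same prompt template as above, repeated so this example is self-contained
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text


with_input = {
    "instruction": "Evaluate the following phrase by transforming it into the spelling given.",
    "input": "freind --> friend",
}
without_input = {"instruction": "Provide a synonym for 'fast'.", "input": ""}

print(format_input(with_input))
print("-" * 20)
print(format_input(without_input))  # no "### Input:" section for an empty input
```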

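- Now, let's try the ollama API to generate the `'chosen'` and `'rejected'` responses used for preference tuning a model
- Here, for demonstration purposes, we create answers that are more or less polite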

```python
import random


for entry in json_data[:5]:

    # Randomly decide whether the rewritten response should be more or less polite
    politeness = random.choice(["polite", "impolite"])
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )
    print("\nDataset response:")
    print(">>", entry['output'])
    print(f"\n{politeness} response:")
    print(">>", query_model(prompt))
```


```
Dataset response:
>> The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".

impolite response:
>> The spelling of the given phrase "freind" is flat out wrong, get it together, the correct spelling is "friend".

Dataset response:
>> He goes to the park every day.

polite response:
>> He goes to the park daily, if I'm not mistaken.

Dataset response:
>> 45 kilometers is 45000 meters.

polite response:
>> 45 kilometers is equivalent to 45000 meters.

Dataset response:
>> Although it was raining, they went for a walk.

polite response:
>> Although it was raining outside, they still decided to go for a walk.

Dataset response:
>> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.

impolite response:
>> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.
```
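Because `random.choice` draws from the unseeded global generator, which entries receive a polite or an impolite rewrite changes from run to run. If a reproducible split is wanted, one option is a dedicated seeded generator. This is a sketch of that idea, not what the notebook does; the helper name `assign_politeness` and the seed value 123 are arbitrary assumptions.

```python
import random


def assign_politeness(num_entries, seed=123):
    """Draw a reproducible "polite"/"impolite" label for each entry index."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    return [rng.choice(["polite", "impolite"]) for _ in range(num_entries)]


labels = assign_politeness(5)
print(labels)
print(labels == assign_politeness(5))  # True: same seed, same sequence
```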


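- If the generated responses above look reasonable, we can move on to the next step and apply the prompt to the whole dataset
- Here, we add a `'chosen'` key for the preferred response and a `'rejected'` key for the dispreferred response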

```python
import random
from tqdm import tqdm

def generate_model_responses(json_data):
    # Adds 'chosen' and 'rejected' keys to each entry of json_data in place

    for i, entry in enumerate(tqdm(json_data, desc="Writing entries")):
        politeness = random.choice(["polite", "impolite"])
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"slightly rewrite the output to be more {politeness}. "
            "Keep the modification minimal. "
            "Only return the generated response and nothing else."
        )
        response = query_model(prompt)

        # A polite rewrite is preferred over the original output;
        # an impolite rewrite is dispreferred relative to it
        if politeness == "polite":
            json_data[i]["chosen"] = response
            json_data[i]["rejected"] = entry["output"]
        else:
            json_data[i]["rejected"] = response
            json_data[i]["chosen"] = entry["output"]
```
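The labeling rule in the function above can be exercised without a model by swapping the LLM call for a stand-in. The sketch below is illustrative only: `label_entry` and `fake_rewrite` are hypothetical helpers, and `fake_rewrite` merely tags the text instead of rewriting it.

```python
def label_entry(entry, rewritten, politeness):
    """Return a copy of `entry` with 'chosen' and 'rejected' filled in."""
    labeled = dict(entry)
    if politeness == "polite":
        labeled["chosen"], labeled["rejected"] = rewritten, entry["output"]
    else:
        labeled["chosen"], labeled["rejected"] = entry["output"], rewritten
    return labeled


def fake_rewrite(text, politeness):
    # Stand-in for the LLM call; it only marks the text so the result is visible
    return f"[{politeness}] {text}"


entry = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
}

for politeness in ("polite", "impolite"):
    labeled = label_entry(entry, fake_rewrite(entry["output"], politeness), politeness)
    print(f"{politeness}: chosen={labeled['chosen']!r} | rejected={labeled['rejected']!r}")
```

Returning a copy instead of mutating the entry keeps the original data intact, which can help when a generation run has to be repeated.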

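- Now apply this generation step to the whole dataset (in the run shown below, processing all 1,100 entries took about 17 minutes)
- Note that ollama is not fully deterministic across operating systems (as of this writing), so the responses you get may differ slightly from those shown in this notebook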

```python
generate_model_responses(json_data)
```

```
Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]
```


```python
with open("instruction-data-with-preference.json", "w") as file:
    json.dump(json_data, file, indent=4)
```
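After saving, it can be worth reloading the file and running a quick sanity check on the labels. The following sketch uses two hand-written entries and a temporary file (the file name `preference-sample.json` is hypothetical) so that it runs without the generated dataset; the same counts could be computed on the real file.

```python
import json
import tempfile
from pathlib import Path

# Two hand-written entries in the output format (illustrative, not generated data)
entries = [
    {"instruction": "a", "input": "", "output": "x", "chosen": "x", "rejected": "Look, x"},
    {"instruction": "b", "input": "", "output": "y", "chosen": "Kindly, y", "rejected": "y"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "preference-sample.json"  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        json.dump(entries, file, indent=4)
    with open(path, "r", encoding="utf-8") as file:
        reloaded = json.load(file)

# The original output is kept as 'chosen' for impolite rewrites
# and as 'rejected' for polite rewrites
kept_as_chosen = sum(e["chosen"] == e["output"] for e in reloaded)
kept_as_rejected = sum(e["rejected"] == e["output"] for e in reloaded)
identical_pairs = sum(e["chosen"] == e["rejected"] for e in reloaded)

print(kept_as_chosen, kept_as_rejected, identical_pairs)  # 1 1 0
```

A nonzero `identical_pairs` count would point to entries where the model returned the original text unchanged, which carry no preference signal.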