<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>

# Generating a Preference Dataset with Llama 3.1 70B and Ollama

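- Preference finetuning is a process that aligns an instruction-finetuned LLM with human preferences
- There are multiple ways to create a dataset for preference finetuning an LLM:
  1. We use the instruction-finetuned LLM to generate multiple responses and have humans rank them based on their preferences and/or given preference criteria
  2. We use the instruction-finetuned LLM to generate multiple responses and have LLMs rank them based on given preference criteria
  3. We use an LLM to generate preferred and dispreferred responses given certain preference criteria
- In this notebook, we consider approach 3
- This notebook uses a 70-billion-parameter Llama 3.1-Instruct model through Ollama to generate preference labels for an instruction dataset
- The expected format of the instruction dataset is as follows: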

### Input

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens."
    },
...
]
```

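### Output

The output dataset will look as follows, where more polite responses are preferred (`'chosen'`) and more impolite responses are dispreferred (`'rejected'`):

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento.",
        "rejected": "Look, the state capital of California is obviously Sacramento.",
        "chosen": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
        "chosen": "A suitable alternative to 'fast' would be 'quick'.",
        "rejected": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens."
    },
...
]
```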

- The code does not require a GPU and runs on a laptop, provided it has enough RAM
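
The output format described above can be checked programmatically. The following is a minimal sketch, not part of the original notebook: the helper name `check_preference_entry` is hypothetical, and the rule that one of the two responses equals the original `'output'` is inferred from the examples above.

```python
def check_preference_entry(entry):
    """Return a list of problems found in one preference-dataset entry."""
    problems = []
    required = ("instruction", "input", "output", "chosen", "rejected")
    for key in required:
        if key not in entry:
            problems.append(f"missing key: {key}")
    if problems:
        return problems

    # Assumption based on the examples: one response is the original output
    if entry["output"] not in (entry["chosen"], entry["rejected"]):
        problems.append("neither 'chosen' nor 'rejected' equals 'output'")
    if entry["chosen"] == entry["rejected"]:
        problems.append("'chosen' and 'rejected' are identical")
    return problems


example = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
    "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
    "rejected": "The capital of Greece is Athens.",
}
print(check_preference_entry(example))  # prints: []
```

An empty list means the entry has all five keys and a usable preference pair.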

In [1]:
from importlib.metadata import version

pkgs = ["tqdm",    # Progress bar
        ]

for p in pkgs:
    print(f"{p} version: {version(p)}")

tqdm version: 4.67.1


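## Installing Ollama and Downloading Llama 3.1

- Ollama is an application for running LLMs efficiently
- It is a wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp), which implements LLMs in pure C/C++ to maximize efficiency
- Note that it is a tool for generating text with LLMs (inference), not for training or finetuning LLMs
- Before running the code below, install Ollama by visiting [https://ollama.com](https://ollama.com) and following the instructions (for instance, clicking the "Download" button and downloading the Ollama application for your operating system)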

- For macOS and Windows users, click on the Ollama application you downloaded; if it prompts you to install the command line tool, say "yes"
- Linux users can use the installation command provided on the Ollama website

- In general, before we can use Ollama from the command line, we have to either start the Ollama application or run `ollama serve` in a separate terminal

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1">
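
Before sending requests from Python, it can help to confirm that the server is reachable. The following is a minimal sketch, not part of the original notebook: the helper name `is_ollama_running` is hypothetical, the port `11434` is taken from the API URL used later in this notebook, and the sketch assumes the server answers a plain GET request on its root URL with status 200.

```python
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib.error.URLError
        return False


print("Ollama reachable:", is_ollama_running())
```

If this prints `False`, start the Ollama application or run `ollama serve` in a separate terminal and try again.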


- With the Ollama application or `ollama serve` running, execute the following command in a different terminal to try out the 70-billion-parameter Llama 3.1 model:

```bash
# 70B model
ollama run llama3.1:70b
```


The output looks as follows:

```
$ ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB
pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB
pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

- Note that `llama3.1:70b` refers to the instruction-finetuned 70-billion-parameter Llama 3.1 model

- Alternatively, you can use the smaller, more resource-efficient 8-billion-parameter Llama 3.1 model by replacing `llama3.1:70b` with `llama3.1`

- After the download has been completed, you will see a command line prompt that allows you to chat with the model

- Try a prompt like "What do llamas eat?", which should return an output similar to the following:

```
>>> What do llamas eat?
Llamas are ruminant animals, which means they have a four-chambered 
stomach and eat plants that are high in fiber. In the wild, llamas 
typically feed on:
1. Grasses: They love to graze on various types of grasses, including tall 
grasses, wheat, oats, and barley.
```

- You can end this session using the input `/bye`

## Using Ollama's REST API

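- An alternative way to interact with the model is through its REST API in Python, using the function below
- Before running the next cells in this notebook, make sure that Ollama is still running, as described above, via either
  - `ollama serve` in a terminal
  - the Ollama application
- Next, run the following code cell to query the model

- First, let's try the API with a simple example to make sure it works as intended: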

In [2]:
import urllib.request
import json


def query_model(prompt, model="llama3.1:70b", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {
            "seed": 123,
            "temperature": 0,
        }
    }

    # Convert the dictionary to a JSON formatted string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding necessary headers
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data


result = query_model("What do Llamas eat?")
print(result)

Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:

1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.
2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.
3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.
4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.
5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.

It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.

A typical llama diet might consist of:

* 1-2% of their body weight in hay per day
* 0.5-1% of their body w
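
The `readline` loop in `query_model` works because the chat endpoint streams its answer as newline-delimited JSON, one chunk per line. The following sketch, not part of the original notebook, replays that parsing step offline. The helper name `join_streamed_content` is hypothetical, and the sample lines are made up to show the assumed shape of the chunks (a `message.content` field per chunk and a final chunk flagged with `"done": true`); they are not real model output.

```python
import json


def join_streamed_content(lines):
    """Concatenate message contents from newline-delimited JSON chunks."""
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        text += chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break  # assumed: the last chunk is flagged with "done": true
    return text


# Made-up chunks for illustration only (not real model output)
sample_lines = [
    '{"message": {"role": "assistant", "content": "Llamas eat "}, "done": false}',
    '{"message": {"role": "assistant", "content": "grass."}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(join_streamed_content(sample_lines))  # prints: Llamas eat grass.
```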

## Load JSON Entries

- Now, let's get to the data generation part
- Here, as a hands-on example, we use the `instruction-data.json` file that we originally used to instruction-finetune the model in chapter 7:

In [3]:
import json
from pathlib import Path

json_file = Path("..", "01_main-chapter-code", "instruction-data.json")

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))

Number of entries: 1100


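- The structure of this file is as follows: each entry contains the given response (`'output'`), which the model is trained to generate during instruction finetuning based on the `'instruction'` and `'input'`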

In [4]:
json_data[0]

{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".'}

- Below is a small utility function that formats the instruction and input:

In [5]:
def format_input(entry):
    instruction_text = (
        f"Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
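
To see what the formatter produces, the sketch below, which is not part of the original notebook, repeats `format_input` so that it runs on its own and applies it to two made-up entries, one with and one without an `'input'` field.

```python
def format_input(entry):
    # Same template as above, repeated so this example is self-contained
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text


# Hypothetical entries for illustration
with_input = {"instruction": "Correct the spelling.", "input": "freind"}
without_input = {"instruction": "Name a synonym for 'fast'.", "input": ""}

print(format_input(with_input))
print("-" * 20)
print(format_input(without_input))
```

The `### Input:` section appears only when the entry's `'input'` string is non-empty.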

- Now, let's try the Ollama API to generate a `'chosen'` and a `'rejected'` response for preference finetuning a model
- Here, for illustration purposes, we create responses that differ in their degree of politeness

In [6]:
import random


for entry in json_data[:5]:
    
    politeness = random.choice(["polite", "impolite"])    
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )
    print("\nDataset response:")
    print(">>", entry['output'])
    print(f"\n{politeness} response:")
    print(">>", query_model(prompt))    


Dataset response:
>> The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".

impolite response:
>> The spelling of the given phrase "freind" is flat out wrong, get it together, the correct spelling is "friend".

Dataset response:
>> He goes to the park every day.

polite response:
>> He goes to the park daily, if I'm not mistaken.

Dataset response:
>> 45 kilometers is 45000 meters.

polite response:
>> 45 kilometers is equivalent to 45000 meters.

Dataset response:
>> Although it was raining, they went for a walk.

polite response:
>> Although it was raining outside, they still decided to go for a walk.

Dataset response:
>> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.

impolite response:
>> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.
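
Because the loop above picks `"polite"` or `"impolite"` with `random.choice`, the assignment changes from run to run. One way to make it repeatable is a seeded generator. The sketch below is not part of the original notebook, and the seed value `123` is an arbitrary assumption that mirrors the seed passed to the model in `query_model`.

```python
import random

rng = random.Random(123)  # hypothetical seed; any fixed integer works
labels = [rng.choice(["polite", "impolite"]) for _ in range(5)]
print(labels)

# Re-creating the generator with the same seed reproduces the same labels
rng_again = random.Random(123)
labels_again = [rng_again.choice(["polite", "impolite"]) for _ in range(5)]
print(labels == labels_again)  # prints: True
```

This fixes only which entries get a polite or impolite rewrite; the model's responses themselves can still vary.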


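- If the generated responses above look reasonable, we can move on to the next step and apply the prompt to the whole dataset
- Here, we add a `'chosen'` key for the preferred response and a `'rejected'` key for the dispreferred response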

In [7]:
import random
from tqdm import tqdm

def generate_model_responses(json_data):

    for i, entry in enumerate(tqdm(json_data, desc="Writing entries")):
        politeness = random.choice(["polite", "impolite"])    
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"slightly rewrite the output to be more {politeness}. "
            "Keep the modification minimal. "
            "Only return the generated response and nothing else."
        )
        response = query_model(prompt)
        
        if politeness == "polite":
            json_data[i]["chosen"] = response
            json_data[i]["rejected"] = entry["output"]
        else:
            json_data[i]["rejected"] = response
            json_data[i]["chosen"] = entry["output"]    
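
The pairing logic in `generate_model_responses` can be tried without a running model. The sketch below is not part of the original notebook: `add_preference_pair` and `fake_query_model` are hypothetical names, and the fake query function only stands in for `query_model` so the example runs offline.

```python
def add_preference_pair(entry, response, politeness):
    """Store the rewritten response and the original output as a preference pair."""
    if politeness == "polite":
        # The polite rewrite is preferred over the original output
        entry["chosen"] = response
        entry["rejected"] = entry["output"]
    else:
        # The impolite rewrite is dispreferred; the original output is preferred
        entry["rejected"] = response
        entry["chosen"] = entry["output"]
    return entry


def fake_query_model(prompt):
    # Stand-in for query_model so the sketch runs without Ollama
    return "REWRITTEN: " + prompt


entry = {"instruction": "Say hi.", "input": "", "output": "Hi."}
add_preference_pair(entry, fake_query_model(entry["output"]), "impolite")
print(entry["chosen"])    # prints: Hi.
print(entry["rejected"])  # prints: REWRITTEN: Hi.
```

In both branches the original `'output'` ends up on one side of the pair and the model's rewrite on the other.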

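- Now, let's apply this generation step to the whole dataset; the run recorded below took about 17 minutes for the 1,100 entries, and the runtime will vary with hardware and model size
- Note that, as of this writing, Ollama is not fully deterministic across operating systems, so the responses you get may differ slightly from those shown in this notebook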

In [8]:
generate_model_responses(json_data)

Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]


In [10]:
with open("instruction-data-with-preference.json", "w") as file:
    json.dump(json_data, file, indent=4)
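
After saving, a quick sanity check is to reload the file and count how often the original output ended up as the preferred or the dispreferred response. The sketch below is not part of the original notebook: the helper name `summarize_preferences` is hypothetical, and it runs on a tiny made-up dataset written to a temporary file so that it does not depend on the generated data.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_preferences(path):
    """Count how often the original output ended up as chosen vs. rejected."""
    with open(path, "r", encoding="utf-8") as file:
        data = json.load(file)
    counts = Counter()
    for entry in data:
        if entry["chosen"] == entry["output"]:
            counts["output_is_chosen"] += 1
        elif entry["rejected"] == entry["output"]:
            counts["output_is_rejected"] += 1
        else:
            counts["other"] += 1
    return counts


# Tiny made-up dataset written to a temporary file for demonstration
demo = [
    {"instruction": "a", "input": "", "output": "x", "chosen": "x", "rejected": "y"},
    {"instruction": "b", "input": "", "output": "x", "chosen": "z", "rejected": "x"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo-preference.json"
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    summary = summarize_preferences(demo_path)

print(dict(summary))  # prints: {'output_is_chosen': 1, 'output_is_rejected': 1}
```

To check the real data, pass `"instruction-data-with-preference.json"` to the function. Since the polite and impolite labels are drawn at random, the two counts should come out roughly balanced.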