<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>

# Chapter 7 Exercise solutions

## Exercise 7.1: Changing prompt styles

Suppose we have the following data entry:

```json
{
  "instruction": "Identify the correct spelling of the following word.",
  "input": "Ocassion",
  "output": "The correct spelling is 'Occasion.'"
}
```

In the main chapter, we formatted it according to the Alpaca-style prompt template:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Identify the correct spelling of the following word.

### Input:
Ocassion

### Response:
The correct spelling is 'Occasion.'
```

In this exercise, we instead use the Phi-3 prompt template, which formats the data entry as follows:

```
<|user|>
Identify the correct spelling of the following word: 'Ocassion'

<|assistant|>
The correct spelling is 'Occasion'.
```

Note that this prompt template is substantially shorter. Because the input prompts contain fewer tokens, finetuning the LLM and generating text require less runtime and less hardware.
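To get a rough feel for the size difference, the following sketch formats the same entry with both templates and compares the lengths in characters. Character counts are only a proxy for token counts, and the helper names `format_alpaca` and `format_phi3` are made up for this illustration; they mirror the prompt and response templates used in the solution code of this exercise.

```python
entry = {
    "instruction": "Identify the correct spelling of the following word.",
    "input": "Ocassion",
    "output": "The correct spelling is 'Occasion.'",
}


def format_alpaca(entry):
    # Alpaca-style prompt plus response, as in the main chapter
    text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    if entry["input"]:
        text += f"\n\n### Input:\n{entry['input']}"
    return text + f"\n\n### Response:\n{entry['output']}"


def format_phi3(entry):
    # Phi-3-style prompt plus response (hypothetical helper for illustration)
    text = f"<|user|>\n{entry['instruction']}"
    if entry["input"]:
        text += f"\n{entry['input']}"
    return text + f"\n<|assistant|>:\n{entry['output']}"


alpaca_len = len(format_alpaca(entry))
phi3_len = len(format_phi3(entry))
print(f"Alpaca-style: {alpaca_len} characters")
print(f"Phi-3-style:  {phi3_len} characters")
```

Most of the saving comes from dropping the fixed preamble and the `###` section headers, which the Alpaca-style template repeats in every training example.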

To make this change, we update the `format_input` function as follows:

```python
def format_input(entry):
    # Build the instruction text, starting with the <|user|> tag
    instruction_text = (
        f"<|user|>\n{entry['instruction']}"
    )

    # Append the input text if present; otherwise use an empty string
    input_text = f"\n{entry['input']}" if entry["input"] else ""

    # Return the fully formatted prompt
    return instruction_text + input_text
```

Let's make sure that it works as intended by applying it to two input samples, one with and one without content in the `'input'` field:

```python
sample_data = [
    # First sample: spelling task with an input text
    {'instruction': 'Identify the correct spelling of the following word.',
     'input': 'Ocassion',
     'output': "The correct spelling is 'Occasion.'"},

    # Second sample: antonym task without an input text
    {'instruction': "What is an antonym of 'complicated'?",
     'input': '',
     'output': "An antonym of 'complicated' is 'simple'."}
]

print(format_input(sample_data[0]))
print()
print(format_input(sample_data[1]))
```

Output:

```
<|user|>
Identify the correct spelling of the following word.
Ocassion

<|user|>
What is an antonym of 'complicated'?
```


Next, we also update the `InstructionDataset` class to use the `<|assistant|>` tag in the response template:

```python
import tiktoken
from torch.utils.data import Dataset


class InstructionDataset(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data

        # Pre-tokenize the texts
        self.encoded_texts = []
        for entry in data:

            ###################################################################
            # NEW: Use the Phi-3-style `format_input` and adjust the response text template
            instruction_plus_input = format_input(entry)
            response_text = f"\n<|assistant|>:\n{entry['output']}"
            ###################################################################
            # Concatenate the prompt and the response, then tokenize
            full_text = instruction_plus_input + response_text
            self.encoded_texts.append(
                tokenizer.encode(full_text)
            )

    def __getitem__(self, index):
        return self.encoded_texts[index]

    def __len__(self):
        return len(self.data)


# Initialize the GPT-2 tokenizer
tokenizer = tiktoken.get_encoding("gpt2")
```

Lastly, we also have to update the way we extract the generated response when we collect the test set responses:

```python
from tqdm import tqdm

# `model`, `device`, `test_data`, `BASE_CONFIG`, `generate`, `text_to_token_ids`,
# and `token_ids_to_text` are defined in the main chapter code
for i, entry in tqdm(enumerate(test_data), total=len(test_data)):

    input_text = format_input(entry)

    token_ids = generate(
        model=model,
        idx=text_to_token_ids(input_text, tokenizer).to(device),
        max_new_tokens=256,
        context_size=BASE_CONFIG["context_length"],
        eos_id=50256
    )
    generated_text = token_ids_to_text(token_ids, tokenizer)

    # New: Strip the prompt and the "<|assistant|>:" tag to keep only the response
    response_text = generated_text[len(input_text):].replace("<|assistant|>:", "").strip()

    test_data[i]["model_response"] = response_text
```
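The extraction step can be checked in isolation without running the model. The sketch below uses a made-up `generated_text` string (the prompt followed by a plausible continuation) to show what the slicing and `replace` call leave behind.

```python
input_text = "<|user|>\nWhat is an antonym of 'complicated'?"

# Hypothetical model output: the prompt followed by the generated continuation
generated_text = (
    input_text + "\n<|assistant|>:\nAn antonym of 'complicated' is 'simple'."
)

response_text = (
    generated_text[len(input_text):]   # drop the prompt
    .replace("<|assistant|>:", "")     # drop the response tag
    .strip()                           # drop surrounding whitespace
)
print(response_text)
```

Slicing by `len(input_text)` relies on the generated text starting with the exact prompt string, which holds here because the prompt tokens are decoded together with the newly generated ones.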

For your convenience, the exercise solution is implemented in the [exercise_experiments.py](exercise_experiments.py) script, which you can run as follows:

```bash
python exercise_experiments.py --exercise_solution phi3_prompt
```

Output:

```
matplotlib version: 3.7.1
tiktoken version: 0.7.0
torch version: 2.3.0+cu121
tqdm version: 4.66.4
tensorflow version: 2.15.0
--------------------------------------------------
Training set length: 935
Validation set length: 55
Test set length: 110
--------------------------------------------------
Device: cuda
--------------------------------------------------
...
Loaded model: gpt2-medium (355M)
--------------------------------------------------
Initial losses
   Training loss: 3.71630220413208
   Validation loss: 3.6440994262695314
Ep 1 (Step 000000): Train loss 2.633, Val loss 2.622
...
Ep 2 (Step 000230): Train loss 0.424, Val loss 0.928
<|user|> Convert the active sentence to passive: 'The chef cooks the meal every day.' <|assistant|>: The meal is prepared every day by the chef....
Training completed in 1.50 minutes.
Plot saved as loss-plot-phi3-prompt.pdf
--------------------------------------------------
Generating responses
100% 110/110 [00:11<00:00,  9.27it/s]
Responses saved as instruction-data-with-response-phi3-prompt.json
Model saved as gpt2-medium355M-sft-phi3-prompt.pth
```


For comparison, you can run the original chapter 7 finetuning code via `python exercise_experiments.py --exercise_solution baseline`.

Note that on an Nvidia L4 GPU, the code above, using the Phi-3 prompt template, takes 1.50 minutes to run. In comparison, the Alpaca-style template takes 1.80 minutes to run. So, the Phi-3 template reduces the training time by approximately 17% since it results in shorter model inputs.
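The 17% figure follows directly from the two training times reported above. As a quick check, a minimal sketch (the variable names are illustrative):

```python
alpaca_minutes = 1.80  # Alpaca-style template, as reported above
phi3_minutes = 1.50    # Phi-3 template, as reported above

# Relative reduction in training time
reduction = (alpaca_minutes - phi3_minutes) / alpaca_minutes
print(f"Time reduction: {reduction:.1%}")

# The same comparison expressed as a throughput ratio
speedup = alpaca_minutes / phi3_minutes
print(f"Speedup factor: {speedup:.2f}x")
```

Note that the two ways of expressing the gain give different numbers: a 16.7% shorter runtime corresponds to a 1.20x throughput ratio.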

Let's take a look at some of the responses to make sure they have been formatted correctly:

```json
    {
        "instruction": "Rewrite the sentence using a simile.",
        "input": "The car is very fast.",
        "output": "The car is as fast as lightning.",
        "model_response": "The car is as fast as a cheetah."
    },
    {
        "instruction": "What type of cloud is typically associated with thunderstorms?",
        "input": "",
        "output": "The type of cloud typically associated with thunderstorms is cumulonimbus.",
        "model_response": "The type of cloud associated with thunderstorms is a cumulus cloud."
    },
    {
        "instruction": "Name the author of 'Pride and Prejudice'.",
        "input": "",
        "output": "Jane Austen.",
        "model_response": "The author of 'Pride and Prejudice' is Jane Austen."
    },
```


We can evaluate the performance using the Ollama Llama 3 method, which, for your convenience, is also implemented in the `ollama_evaluate.py` script. We can run it as follows:

```bash
python ollama_evaluate.py --file_path instruction-data-with-response-phi3-prompt.json
```

Output:

```
Ollama running: True
Scoring entries: 100%|████████████████████████| 110/110 [01:08<00:00,  1.60it/s]
Number of scores: 110 of 110
Average score: 48.87
```

The score is close to 50, which is in the same ballpark as the score we previously achieved with the Alpaca-style prompts.

&nbsp;
## Exercise 7.2: Instruction and input masking

To mask out the instructions as shown in the following figure, we need to make slight modifications to the `InstructionDataset` class and `custom_collate_fn`.

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/ch07_compressed/mask-instructions.webp" width=600px>

```python
# This `format_input` function is copied from the original chapter 7 code

def format_input(entry):
    # Build the instruction text: fixed preamble plus the specific instruction
    instruction_text = (
        f"Below is an instruction that describes a task. "
        f"Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    # Append the input section if present; otherwise use an empty string
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
```

We can modify the `InstructionDataset` class to also collect the token lengths of the instructions. The collate function will use these lengths to locate the instruction tokens in the targets:

```python
import torch
from torch.utils.data import Dataset


class InstructionDataset(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data

        ##########################################################################################
        # New: Separate list for instruction lengths
        self.instruction_lengths = []
        ##########################################################################################

        self.encoded_texts = []

        for entry in data:
            instruction_plus_input = format_input(entry)
            response_text = f"\n\n### Response:\n{entry['output']}"
            full_text = instruction_plus_input + response_text

            self.encoded_texts.append(
                tokenizer.encode(full_text)
            )

            ##########################################################################################
            # New: Compute and collect the instruction length (in tokens)
            instruction_length = len(tokenizer.encode(instruction_plus_input))
            self.instruction_lengths.append(instruction_length)
            ##########################################################################################

    def __getitem__(self, index):
        # New: Return both the instruction length and the encoded text
        return self.instruction_lengths[index], self.encoded_texts[index]

    def __len__(self):
        return len(self.data)
```

```python
import tiktoken

# Initialize the GPT-2 tokenizer
tokenizer = tiktoken.get_encoding("gpt2")
```

Next, we update `custom_collate_fn`. Because of the changes to `InstructionDataset`, each element of `batch` is now a tuple `(instruction_length, item)` instead of just `item`. In addition, we now mask the corresponding instruction tokens in the target ID list.

```python
def custom_collate_fn(
    batch,
    pad_token_id=50256,
    ignore_index=-100,
    allowed_max_length=None,
    device="cpu"
):
    # Find the longest sequence in the batch (+1 for the appended end-of-text token)
    batch_max_length = max(len(item)+1 for instruction_length, item in batch)  # New: batch holds tuples

    inputs_lst, targets_lst = [], []

    for instruction_length, item in batch:  # New: batch holds tuples
        new_item = item.copy()
        # Add an <|endoftext|> token
        new_item += [pad_token_id]
        # Pad the sequence to batch_max_length
        padded = new_item + [pad_token_id] * (batch_max_length - len(new_item))
        inputs = torch.tensor(padded[:-1])  # Truncate the last token for inputs
        targets = torch.tensor(padded[1:])  # Shift by one position for targets

        # Replace all but the first padding token in targets by ignore_index
        mask = targets == pad_token_id
        indices = torch.nonzero(mask).squeeze()
        if indices.numel() > 1:
            targets[indices[1:]] = ignore_index

        # New: Mask all instruction and input tokens in the targets
        targets[:instruction_length-1] = ignore_index

        # Optionally truncate to the maximum sequence length
        if allowed_max_length is not None:
            inputs = inputs[:allowed_max_length]
            targets = targets[:allowed_max_length]

        inputs_lst.append(inputs)
        targets_lst.append(targets)

    # Stack the lists into tensors and move them to the target device
    inputs_tensor = torch.stack(inputs_lst).to(device)
    targets_tensor = torch.stack(targets_lst).to(device)

    return inputs_tensor, targets_tensor
```
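The `instruction_length-1` offset in the collate function can look like an off-by-one error at first glance. It compensates for the targets being shifted one position to the left relative to the inputs. The following pure-Python sketch illustrates this with made-up token IDs (three "instruction" tokens and two "response" tokens); it is a toy illustration, not the code used for training.

```python
IGNORE_INDEX = -100

# Toy token IDs: 11-13 stand in for the instruction, 21-22 for the response
instruction_ids = [11, 12, 13]
response_ids = [21, 22]
eos_id = 0

sequence = instruction_ids + response_ids + [eos_id]
inputs = sequence[:-1]   # [11, 12, 13, 21, 22]
targets = sequence[1:]   # [12, 13, 21, 22, 0]

# Because of the shift, only the first (instruction_length - 1) targets
# are instruction tokens; the next target is the first response token
instruction_length = len(instruction_ids)
masked_targets = targets.copy()
masked_targets[:instruction_length - 1] = [IGNORE_INDEX] * (instruction_length - 1)

print(inputs)
print(masked_targets)
```

With this offset, the last instruction token (13) is still used as an input whose target is the first response token (21), so the model is trained to start the response right after the prompt.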

Let's try it out on some sample data below:

```python
sample_data = [
    # Sample 1: find an antonym (no input text)
    {'instruction': "What is an antonym of 'complicated'?", 'input': '', 'output': "An antonym of 'complicated' is 'simple'."},

    # Sample 2: sort alphabetically
    {'instruction': 'Sort the following list in alphabetical order.', 'input': 'Zebra, Elephant, Crocodile', 'output': 'Crocodile, Elephant, Zebra'},

    # Sample 3: sort numbers in descending order
    {'instruction': 'Arrange the given numbers in descending order.', 'input': '5, 12, 8, 3, 15', 'output': '15, 12, 8, 5, 3.'}
]
```

```python
from torch.utils.data import DataLoader

train_dataset = InstructionDataset(sample_data, tokenizer)

train_loader = DataLoader(
    train_dataset,
    batch_size=len(sample_data),   # Put all samples into a single batch
    collate_fn=custom_collate_fn,
    num_workers=0
)
```

```python
print("Train loader:")
for inputs, targets in train_loader:
    print(inputs.shape, targets.shape)
```

Output:

```
Train loader:
torch.Size([3, 64]) torch.Size([3, 64])
```


```python
# Print the inputs and targets of the second sample
print("Inputs:\n", inputs[1])
print("\n\nTargets:\n", targets[1])
```

Output:

```
Inputs:
 tensor([21106,   318,   281, 12064,   326,  8477,   257,  4876,    13, 19430,
          257,  2882,   326, 20431, 32543,   262,  2581,    13,   198,   198,
        21017, 46486,    25,   198, 42758,   262,  1708,  1351,   287, 24830,
          605,  1502,    13,   198,   198, 21017, 23412,    25,   198,    57,
        37052,    11, 42651,    11,  9325, 19815,   576,   198,   198, 21017,
        18261,    25,   198,    34, 12204,   375,   576,    11, 42651,    11,
         1168, 37052, 50256, 50256])


Targets:
 tensor([ -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,   198,   198, 21017, 18261,
           25,   198,    34, 12204,   375,   576,    11, 42651,    11,  1168,
        37052, 50256,  -100,  -100])
```


As we can see based on the `targets` tensor, both the instruction and padding tokens are now masked using the -100 placeholder value.
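The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions are excluded from the loss, and the mean is taken over the remaining positions only. The pure-Python sketch below mimics that behavior on toy numbers; `masked_mean_nll` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
import math

IGNORE_INDEX = -100


def masked_mean_nll(probs_per_position, targets, ignore_index=IGNORE_INDEX):
    """Average negative log-likelihood over the non-ignored positions."""
    losses = [
        -math.log(probs[target])
        for probs, target in zip(probs_per_position, targets)
        if target != ignore_index
    ]
    return sum(losses) / len(losses)


# Toy predicted distributions over a 3-token vocabulary at 4 positions
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.3, 0.4],
]
# The first two positions are masked (think: instruction tokens)
targets = [IGNORE_INDEX, IGNORE_INDEX, 2, 0]

loss = masked_mean_nll(probs, targets)
print(f"Loss over {sum(t != IGNORE_INDEX for t in targets)} positions: {loss:.4f}")
```

Whatever the model predicts at the two masked positions has no effect on the result, which is exactly what instruction masking is meant to achieve.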

Let's decode the inputs just to make sure that they look correct:

```python
# Decode the input token IDs of the second sample
print(tokenizer.decode(inputs[1].tolist()))
```

Output:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Sort the following list in alphabetical order.

### Input:
Zebra, Elephant, Crocodile

### Response:
Crocodile, Elephant, Zebra<|endoftext|><|endoftext|>
```


Next, let's decode the non-masked target token IDs:

```python
# Keep only the target tokens that are not masked with -100
non_masked_targets = targets[1][targets[1] != -100]

print(tokenizer.decode(non_masked_targets.tolist()))
```

Output:

```


### Response:
Crocodile, Elephant, Zebra<|endoftext|>
```


As shown above, the non-masked target tokens exclude the `"Instruction"` and `"Input"` fields, as intended. Now, we can run the modified code to see how well the LLM performs when finetuned using this masking strategy.

For your convenience, you can use the `exercise_experiments.py` code to run a comparison as follows:

```bash
python exercise_experiments.py --exercise_solution mask_instructions
```

Output:

```
matplotlib version: 3.7.1
tiktoken version: 0.7.0
torch version: 2.3.0+cu121
tqdm version: 4.66.4
tensorflow version: 2.15.0
--------------------------------------------------
Training set length: 935
Validation set length: 55
Test set length: 110
--------------------------------------------------
Device: cuda
--------------------------------------------------
...
Loaded model: gpt2-medium (355M)
--------------------------------------------------
Initial losses
   Training loss: 2.280539035797119
   Validation loss: 2.262560224533081
Ep 1 (Step 000000): Train loss 1.636, Val loss 1.620
...
Ep 2 (Step 000230): Train loss 0.143, Val loss 0.727
...
Training completed in 1.77 minutes.
Plot saved as loss-plot-mask-instructions.pdf
--------------------------------------------------
Generating responses
100% 110/110 [02:10<00:00,  1.19s/it]
Responses saved as instruction-data-with-response-mask-instructions.json
Model saved as gpt2-medium355M-sft-mask-instructions.pth
```

Next, let's evaluate the performance of the resulting LLM:

```bash
python ollama_evaluate.py --file_path instruction-data-with-response-mask-instructions.json
```

```
Ollama running: True
Scoring entries: 100%|██████████████████████████████████████████████████████████████████████████████████████| 110/110 [01:23<00:00,  1.31it/s]
Number of scores: 110 of 110
Average score: 47.73
```

As we can see based on the scores, instruction masking performs slightly worse, which is consistent with the observation in the "Instruction Tuning With Loss Over Instructions" paper (https://arxiv.org/abs/2405.14394).

&nbsp;
## Exercise 7.3: Finetuning on the original Alpaca dataset

To finetune the model on the original Stanford Alpaca dataset ([https://github.com/tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca)), you just need to change the file URL from

```python
url = "https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch07/01_main-chapter-code/instruction-data.json"
```

to

```python
url = "https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json"
```

Note that the dataset contains 52k entries (about 50x more than in chapter 7), and the entries are longer than the ones we worked with in chapter 7.

Thus, it's highly recommended that the training be run on a GPU.

If you encounter out-of-memory errors, consider reducing the batch size from 8 to 4, 2, or 1. In addition to lowering the batch size, you may also want to consider lowering the `allowed_max_length` from 1024 to 512 or 256.

For your convenience, you can use the `exercise_experiments.py` code to finetune the model on the 52k Alpaca dataset with a batch size of 4 and an `allowed_max_length` of 512 as follows:

```bash
python exercise_experiments.py --exercise_solution alpaca_52k
```

```
matplotlib version: 3.7.1
tiktoken version: 0.7.0
torch version: 2.3.0+cu121
tqdm version: 4.66.4
tensorflow version: 2.15.0
--------------------------------------------------
Training set length: 44201
Validation set length: 2601
Test set length: 5200
--------------------------------------------------
Device: cuda
--------------------------------------------------
...
Loaded model: gpt2-medium (355M)
--------------------------------------------------
Initial losses
   Training loss: 3.3681655883789063
   Validation loss: 3.4122894287109373
Ep 1 (Step 000000): Train loss 2.477, Val loss 2.750
...
Ep 2 (Step 022095): Train loss 0.761, Val loss 1.557
...
Training completed in 196.38 minutes.
Plot saved as loss-plot-alpaca52k.pdf
--------------------------------------------------
Generating responses
100% 5200/5200 [2:56:33<00:00,  2.04s/it]
Responses saved as instruction-data-with-response-alpaca52k.json
Model saved as gpt2-medium355M-sft-alpaca52k.pth
```
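The final step number in the log above (022095) is consistent with a batch size of 4. The sketch below reproduces it under a few assumptions carried over from the main chapter rather than stated in this log: training runs for 2 epochs, incomplete final batches are dropped (`drop_last=True`), steps are counted from 0, and losses are logged every 5 steps. The function name `last_logged_step` is made up for this illustration.

```python
def last_logged_step(train_size, batch_size, num_epochs, eval_freq):
    # Assumes incomplete final batches are dropped (drop_last=True)
    steps_per_epoch = train_size // batch_size
    total_steps = steps_per_epoch * num_epochs
    # Steps are counted from 0; a loss line is logged every `eval_freq` steps
    last_step = total_steps - 1
    return last_step - last_step % eval_freq


# 52k Alpaca run: 44,201 training examples, batch size 4
alpaca_last = last_logged_step(44201, batch_size=4, num_epochs=2, eval_freq=5)
print(f"{alpaca_last:06d}")

# Chapter 7 dataset: 935 training examples, batch size 8
small_last = last_logged_step(935, batch_size=8, num_epochs=2, eval_freq=5)
print(f"{small_last:06d}")
```

Under the same assumptions, the chapter 7 dataset gives 000230, which matches the last logged step in the earlier exercise outputs. Halving the batch size again would roughly double the number of steps per epoch.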

Below are a few examples from the Alpaca dataset, including the generated model responses:

```json
[
    {
        "instruction": "Edit the following sentence to increase readability: \"He made a huge effort and was so successful.\"",
        "input": "",
        "output": "He exerted a tremendous effort, and thus enjoyed great success.",
        "model_response": "He put in an immense effort and was rewarded with success."
    },
    {
        "instruction": "Rewrite the following sentence to make it more concise: \"I was displeased with the result of the experiment that I conducted.\"",
        "input": "",
        "output": "I was unhappy with my experiment's outcome.",
        "model_response": "I was displeased with the results of the experiment."
    },
    {
        "instruction": "How can we build a more efficient GPT model?",
        "input": "",
        "output": "We can build a more efficient GPT model by optimizing the architecture of the model, using smaller model sizes and training with fewer parameters. We can also leverage techniques such as knowledge distillation, transfer learning, dynamic sparsity and hybrid computing to further improve the efficiency of the model.",
        "model_response": "Building a more efficient GPT model requires careful planning and optimization. First, it is important to identify the target language and the context in which the model is used. Then, it is important to select the appropriate model architecture, such as backpropagation, hyperparameters, and hyperparameters. Finally, it is important to select the appropriate model weights and optimizers, such as backpropagation, hyperparameters, and hyperparameters."
    }
]
```

Finally, we can evaluate the finetuned LLM using the [ollama_evaluate.py](ollama_evaluate.py) utility script:

```bash
python ollama_evaluate.py --file_path instruction-data-with-response-alpaca52k.json
```

```
Scoring entries: 100%|████████████████████| 5200/5200 [1:07:52<00:00, 1.28it/s]
Number of scores: 5188 of 5200
Average score: 48.16
```
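The output reports 5188 of 5200 scores, so a few entries did not yield a usable score. One plausible reason, assumed here rather than confirmed by the log, is that the judge model's reply could not be parsed as an integer for those entries. The sketch below shows how such an average over only the parseable replies could be computed; `average_valid_scores` and the sample replies are hypothetical.

```python
def average_valid_scores(raw_scores):
    """Average only the replies that parse as plain integers."""
    valid = []
    for raw in raw_scores:
        try:
            valid.append(int(raw))
        except ValueError:
            # Skip replies that are not a plain integer (assumed failure mode)
            continue
    return sum(valid) / len(valid), len(valid)


# Made-up judge replies; the third one cannot be converted with int()
raw = ["85", "40", "Score: 70", "60"]
avg, n = average_valid_scores(raw)
print(f"Number of scores: {n} of {len(raw)}")
print(f"Average score: {avg:.2f}")
```

Because skipped entries are excluded from the denominator, the reported average describes only the scored subset, which is worth keeping in mind when comparing runs with different numbers of valid scores.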

The score is slightly lower than the score we obtained on the dataset we used in this chapter. However, note that the Alpaca test set contains more diverse and partly more challenging instructions than the dataset we used in the main chapter.

## Exercise 7.4: Parameter-efficient finetuning with LoRA

To instruction-finetune the model using LoRA, use the relevant classes and functions from appendix E:

```python
from appendix_E import LoRALayer, LinearWithLoRA, replace_linear_with_lora
```

Next, add the following lines of code below the model loading code in section 7.5:


```python
# Count the trainable parameters before freezing
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters before: {total_params:,}")

# Freeze all model parameters
for param in model.parameters():
    param.requires_grad = False

# Count the trainable parameters again (should be 0)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters after: {total_params:,}")

# Replace the linear layers with LoRA layers, using rank 16 and alpha 16
replace_linear_with_lora(model, rank=16, alpha=16)

# Count the trainable LoRA parameters
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable LoRA parameters: {total_params:,}")

# Move the model to the target device
model.to(device)
```

For your convenience, you can use the `exercise_experiments.py` code to finetune the model, using LoRA with rank 16 and alpha 16, as follows:

```bash
python exercise_experiments.py --exercise_solution lora
```

Output:

```
matplotlib version: 3.7.1
tiktoken version: 0.7.0
torch version: 2.3.0+cu121
tqdm version: 4.66.4
tensorflow version: 2.15.0
--------------------------------------------------
Training set length: 935
Validation set length: 55
Test set length: 110
--------------------------------------------------
Device: cuda
--------------------------------------------------
File already exists and is up-to-date: gpt2/355M/checkpoint
File already exists and is up-to-date: gpt2/355M/encoder.json
File already exists and is up-to-date: gpt2/355M/hparams.json
File already exists and is up-to-date: gpt2/355M/model.ckpt.data-00000-of-00001
File already exists and is up-to-date: gpt2/355M/model.ckpt.index
File already exists and is up-to-date: gpt2/355M/model.ckpt.meta
File already exists and is up-to-date: gpt2/355M/vocab.bpe
Loaded model: gpt2-medium (355M)
--------------------------------------------------
Total trainable parameters before: 406,286,336
Total trainable parameters after: 0
Total trainable LoRA parameters: 7,898,384
Initial losses
   Training loss: 3.7684114456176756
   Validation loss: 3.7619335651397705
Ep 1 (Step 000000): Train loss 2.509, Val loss 2.519
...
Ep 2 (Step 000230): Train loss 0.308, Val loss 0.652
...
--------------------------------------------------
Generating responses
100% 110/110 [01:52<00:00,  1.03s/it]
Responses saved as instruction-data-with-response-lora.json
Model saved as gpt2-medium355M-sft-lora.pth
```
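The LoRA parameter count in the log above can be reproduced by hand. The sketch below assumes the GPT-2 medium configuration (embedding size 1024, 24 transformer blocks, vocabulary size 50257) and assumes that every linear layer of the book's `GPTModel` is wrapped: four attention projections and two feed-forward layers per block, plus the output head. The helper `lora_params` is made up for this illustration.

```python
def lora_params(in_dim, out_dim, rank):
    # LoRA adds two matrices per linear layer: A (in_dim x rank) and B (rank x out_dim)
    return rank * (in_dim + out_dim)


emb_dim, n_layers, vocab_size, rank = 1024, 24, 50257, 16

per_block = (
    4 * lora_params(emb_dim, emb_dim, rank)      # query, key, value, output projection
    + lora_params(emb_dim, 4 * emb_dim, rank)    # feed-forward expansion layer
    + lora_params(4 * emb_dim, emb_dim, rank)    # feed-forward contraction layer
)
out_head = lora_params(emb_dim, vocab_size, rank)

total = n_layers * per_block + out_head
print(f"Total trainable LoRA parameters: {total:,}")

# Share relative to the trainable parameter count before freezing (from the log)
share = total / 406_286_336
print(f"Share of original trainable parameters: {share:.2%}")
```

Under these assumptions the total comes out to 7,898,384, matching the log, which is roughly 2% of the parameters that full finetuning would update.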

For comparison, you can run the original chapter 7 finetuning code via `python exercise_experiments.py --exercise_solution baseline`.

Note that on an Nvidia L4 GPU, the code above, using LoRA, takes 1.30 minutes to run. In comparison, the baseline takes 1.80 minutes to run. So, LoRA reduces the training time by approximately 28%.

We can evaluate the performance using the Ollama Llama 3 method, which, for your convenience, is also implemented in the `ollama_evaluate.py` script. We can run it as follows:

```bash
python ollama_evaluate.py --file_path instruction-data-with-response-lora.json
```

Output:

```
Ollama running: True
Scoring entries: 100%|████████████████████████| 110/110 [01:13<00:00,  1.50it/s]
Number of scores: 110 of 110
Average score: 50.23
```

The score is around 50, which is in the same ballpark as the original model.