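# LoRA in Practice with the PEFT Library - OPT-6.7B

In this tutorial, we show how to use the `peft` library together with `bitsandbytes` to load a large language model in 8-bit precision and fine-tune it efficiently.

Fine-tuning relies on a method called Low-Rank Adaptation (LoRA): instead of fine-tuning the whole model, you train only a small set of adapter weights and load them into the model alongside the frozen base weights.

### Loading the model

The `facebook/opt-6.7b` model needs roughly 13GB of GPU memory for its weights in half precision (float16).

Below we load it in 8-bit, which needs only about 7GB of GPU memory.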
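
The two figures above follow from simple arithmetic on the parameter count. Here is a rough sketch, assuming about 6.66 billion base parameters. The helper name `weight_memory_gib` is ours, and the estimate covers the weights only, ignoring activations, buffers and any layers kept in higher precision.

```python
def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Estimate the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)

# Assumed (approximate) parameter count for OPT-6.7B.
NUM_PARAMS = 6_658_473_984

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per weight
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # int8: 1 byte per weight

print(f"float16: {fp16_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
```

In practice the 8-bit footprint tends to be somewhat higher than this estimate, since layers such as the embeddings and LayerNorm are not stored in int8.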

In [1]:
import os

import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import GPT2Tokenizer, AutoConfig, OPTForCausalLM

model_id = "facebook/opt-6.7b"

model = OPTForCausalLM.from_pretrained(model_id, load_in_8bit=True)

tokenizer = GPT2Tokenizer.from_pretrained(model_id)


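### Preparing the model for PEFT fine-tuning

Before training an int8 model with `peft`, some preprocessing is needed:
- Cast all non-`int8` modules to full precision (`fp32`) for numerical stability
- Add a `forward_hook` to the input embedding layer so that gradients are computed for the input hidden states
- Enable gradient checkpointing for more memory-efficient training

The `peft` utility function `prepare_model_for_int8_training` performs all of these steps automatically (newer `peft` releases provide `prepare_model_for_kbit_training` for the same purpose).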

In [2]:
from peft import prepare_model_for_int8_training

model = prepare_model_for_int8_training(model)



In [3]:
# GPU memory currently occupied by the model (any gap versus what the GPU reports is memory reserved by PyTorch)
memory_footprint_bytes = model.get_memory_footprint()
memory_footprint_gib = memory_footprint_bytes / (1024 ** 3)  # bytes -> GiB

print(f"{memory_footprint_gib:.2f}GB")

6.80GB


In [4]:
model

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 4096, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 4096)
      (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-31): 32 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
            (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
            (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
            (out_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
          )
          (activation_fn): ReLU()
          (self_attn_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear8bitLt(in_features=4096, out_features=16384, bias=True)
          (fc2): Linear8bitLt(in_features=16384, out_features=4096, bias=True

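### LoRA adapter configuration

Using `LoRA` in `peft` is straightforward: through the `PeftModel` abstraction, low-rank adapters can be applied to any model quickly.

This is done with the `get_peft_model` utility function from `peft`.

#### A note on the LoRA hyperparameters: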
```
MatMul(B,A) * Scaling
Scaling = LoRA_Alpha / Rank
```
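
To make the formula concrete, here is a minimal pure-Python sketch of a LoRA-augmented linear layer on a toy 2x2 weight. The helper `matvec` and all the numbers are illustrative assumptions, not part of `peft`. The point is only that the adapter adds `B (A x)`, scaled by `lora_alpha / r`, to the output of the frozen layer.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

r, lora_alpha = 1, 4
scaling = lora_alpha / r  # Scaling = LoRA_Alpha / Rank

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy values)
A = [[0.5, -0.5]]             # LoRA down-projection, shape (r, in_features)
B = [[1.0], [2.0]]            # LoRA up-projection, shape (out_features, r)
x = [2.0, 1.0]

base_out = matvec(W, x)             # W x, the frozen path
lora_out = matvec(B, matvec(A, x))  # B (A x), the low-rank path
h = [b + scaling * l for b, l in zip(base_out, lora_out)]
print(h)
```

With `r=8` and `lora_alpha=32`, the values used in this tutorial's configuration, the scaling factor is likewise 4.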

In [8]:
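# Import LoraConfig and get_peft_model from the peft library
from peft import LoraConfig, get_peft_model

# Create a LoraConfig holding the LoRA (Low-Rank Adaptation) settings
config = LoraConfig(
    r=8,  # LoRA rank; sets the size of the low-rank matrices
    lora_alpha=32,  # LoRA scaling numerator (scaling = lora_alpha / r)
    # Modules to wrap with LoRA, matched by name. OPT names its feed-forward layers
    # "fc1"/"fc2", so "fc_in" and "fc_out" match nothing here and only the four
    # attention projections receive adapters (consistent with the count printed below).
    target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "fc_in", "fc_out"],
    lora_dropout=0.05,  # dropout rate used inside the LoRA modules
    bias="none",  # do not train any bias parameters
    task_type="CAUSAL_LM"  # task type: causal (autoregressive) language modeling
)

# Wrap the model with the given configuration to obtain a PEFT model
model = get_peft_model(model, config)

# Print the number of trainable parameters
model.print_trainable_parameters()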

trainable params: 8,388,608 || all params: 6,666,862,592 || trainable%: 0.12582542214183376


Reference: how printing the trainable parameters is implemented
```python
def print_trainable_parameters(self):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in self.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
```
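
The count printed by `print_trainable_parameters` can be checked by hand. The sketch below assumes that adapters were attached only to the four attention projections in each of the 32 decoder layers, and that each adapter adds one `r x in_features` matrix and one `out_features x r` matrix.

```python
r = 8
in_features = out_features = 4096  # size of q_proj, k_proj, v_proj, out_proj
num_layers = 32
adapted_modules_per_layer = 4

# Each adapter holds A (r x in_features) and B (out_features x r).
params_per_adapter = r * in_features + out_features * r
trainable_params = num_layers * adapted_modules_per_layer * params_per_adapter

all_params = 6_666_862_592  # total reported by print_trainable_parameters
print(f"trainable params: {trainable_params:,}")
print(f"trainable%: {100 * trainable_params / all_params}")
```

The result, 8,388,608, matches the reported number of trainable parameters.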

### Data processing

In [9]:
from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes")

In [10]:
dataset["train"]

Dataset({
    features: ['quote', 'author', 'tags'],
    num_rows: 2508
})

In [11]:
from datasets import ClassLabel, Sequence
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(lambda x: [typ.feature.names[i] for i in x])
    display(HTML(df.to_html()))

In [12]:
show_random_elements(dataset["train"])

Unnamed: 0,quote,author,tags
0,"“As usual, there is a great woman behind every idiot.”",John Lennon,"[beatles, men, women]"
1,"“Promise YourselfTo be so strong that nothingcan disturb your peace of mind.To talk health, happiness, and prosperityto every person you meet.To make all your friends feelthat there is something in themTo look at the sunny side of everythingand make your optimism come true.To think only the best, to work only for the best,and to expect only the best.To be just as enthusiastic about the success of othersas you are about your own.To forget the mistakes of the pastand press on to the greater achievements of the future.To wear a cheerful countenance at all timesand give every living creature you meet a smile.To give so much time to the improvement of yourselfthat you have no time to criticize others.To be too large for worry, too noble for anger, too strong for fear,and too happy to permit the presence of trouble.To think well of yourself and to proclaim this fact to the world,not in loud words but great deeds.To live in faith that the whole world is on your sideso long as you are true to the best that is in you.”","Christian D. Larson,","[happiness, inspirational, love, optimism, strength]"
2,"“Forgive, O Lord, my little jokes on TheeAnd I'll forgive Thy great big one on me.”",Robert Frost,"[humor, poetry, religion]"
3,“Never put off till tomorrow what may be done day after tomorrow just as well.”,Mark Twain,"[humor, procrastination]"
4,"“Dreams are always crushing when they don't come true. But it's the simple dreams that are often the most painful because they seem so personal, so reasonable, so attainable. You're always close enough to touch, but never quite close enough to hold and it's enough to break your heart.”","Nicholas Sparks,",[life-dreams]
5,"“No matter how careful you are, there's going to be the sense you missed something, the collapsed feeling under your skin that you didn't experience it all. There's that fallen heart feeling that you rushed right through the moments where you should've been paying attention.Well, get used to that feeling. That's how your whole life will feel some day.This is all practice.”","Chuck Palahniuk,",[life]
6,"“Life is not a matter of holding good cards, but of playing a poor hand well.”",Robert Louis Stevenson,[life]
7,“Never miss a good chance to shut up.”,Will Rogers,"[advice, funny, reticence, silence]"
8,“Ask not what you can do for your country. Ask what’s for lunch.”,Orson Welles,"[food, humor]"
9,“The measure of intelligence is the ability to change.”,Albert Einstein,"[adaptation, flexibility, intelligence, open-mindedness, wisdom]"


In [13]:

tokenized_dataset = dataset.map(lambda samples: tokenizer(samples["quote"]), batched=True)



In [14]:
from transformers import DataCollatorForLanguageModeling

# Data collator for language modeling; mlm=False disables masked language modeling, so batches are built for causal LM
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
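
With `mlm=False`, the collator builds batches for causal language modeling: it pads the sequences to a common length and copies `input_ids` into `labels`, replacing padded positions with `-100` so that the loss ignores them. Below is a simplified pure-Python sketch of that behavior. The function `collate_causal_lm` is a hypothetical stand-in, not the library code, and it assumes right padding with pad token id 1.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def collate_causal_lm(sequences, pad_token_id=1):
    """Pad token-id lists to equal length and derive causal-LM labels."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        pad = max_len - len(seq)
        batch["input_ids"].append(seq + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(seq) + [0] * pad)
        # Labels mirror the inputs; padded positions are ignored by the loss.
        batch["labels"].append(seq + [IGNORE_INDEX] * pad)
    return batch

batch = collate_causal_lm([[2, 9058, 383], [2, 100]])
print(batch["labels"])
```

The one-position shift between inputs and targets is applied inside the model's loss computation, so the collator does not need to do it.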

### Fine-tuning the model

In [15]:
from transformers import TrainingArguments, Trainer

model_dir = "models"

training_args = TrainingArguments(
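        output_dir=f"{model_dir}/{model_id}-lora",  # directory for checkpoints and outputs
        per_device_train_batch_size=4,  # training batch size per device
        learning_rate=2e-4,  # learning rate
        fp16=True,  # mixed-precision training: faster and lower memory use
        logging_steps=20,  # log every 20 steps to track progress
        max_steps=100, # maximum number of training steps
        # num_train_epochs=1  # total number of training epochs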
    )

In [16]:
# The model after adding the LoRA modules
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): PeftModelForCausalLM(
      (base_model): LoraModel(
        (model): PeftModelForCausalLM(
          (base_model): LoraModel(
            (model): PeftModelForCausalLM(
              (base_model): LoraModel(
                (model): OPTForCausalLM(
                  (model): OPTModel(
                    (decoder): OPTDecoder(
                      (embed_tokens): Embedding(50272, 4096, padding_idx=1)
                      (embed_positions): OPTLearnedPositionalEmbedding(2050, 4096)
                      (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
                      (layers): ModuleList(
                        (0-31): 32 x OPTDecoderLayer(
                          (self_attn): OPTAttention(
                            (k_proj): lora.Linear8bitLt(
                              (base_layer): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
                              (lora_dro

In [17]:
trainer = Trainer(
    model=model,  # model to train
    train_dataset=tokenized_dataset["train"],  # training dataset
    args=training_args,
    data_collator=data_collator,
)

Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [None]:
# Disable the KV cache during training; it is incompatible with gradient checkpointing
model.config.use_cache = False

In [18]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss
20,1.9623
40,2.0051
60,1.8505
80,1.8696
100,2.0219


TrainOutput(global_step=100, training_loss=1.9418774795532228, metrics={'train_runtime': 251.8563, 'train_samples_per_second': 1.588, 'train_steps_per_second': 0.397, 'total_flos': 1364172665978880.0, 'train_loss': 1.9418774795532228, 'epoch': 0.16})
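
The `epoch` and throughput values in the `TrainOutput` can be reproduced from the training arguments. Here is a small sketch, assuming a single GPU and no gradient accumulation, so that one step consumes exactly one batch of 4 samples.

```python
max_steps = 100
per_device_train_batch_size = 4
num_train_samples = 2508   # rows in dataset["train"]
train_runtime = 251.8563   # seconds, from the TrainOutput

samples_seen = max_steps * per_device_train_batch_size
epochs = samples_seen / num_train_samples

print(f"samples seen: {samples_seen}")
print(f"epochs: {epochs:.2f}")
print(f"samples/s: {samples_seen / train_runtime:.3f}")
print(f"steps/s: {max_steps / train_runtime:.3f}")
```

The run therefore covers about 400 of the 2,508 quotes, roughly 16% of one pass over the data.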

### Saving the LoRA model

In [20]:
model_path = f"{model_dir}/{model_id}-lora-int8"

#trainer.save_model(model_path)
model.save_pretrained(model_path)

### Using the LoRA model

In [22]:
lora_model = trainer.model

In [24]:
text = "Two things are infinite: "
inputs = tokenizer(text, return_tensors="pt").to(0)

out = lora_model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Two things are infinite:  The universe and human stupidity.  And I'm not sure about the universe.  -Albert Einstein
I'm not sure about the universe either.


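After a small amount of fine-tuning on the [english_quotes dataset](https://huggingface.co/datasets/Abirate/english_quotes) (100 steps, less than one epoch), the model with the LoRA adapter reproduces the well-known quote attributed to Albert Einstein.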

In [25]:
out[0]

tensor([    2,  9058,   383,    32, 32952,    35,  1437,    20,  9468,     8,
         1050, 38821,     4,  1437,   178,    38,   437,    45,   686,    59,
            5,  9468,     4,  1437,   111, 36977, 27648, 50118,   100,   437,
           45,   686,    59,     5,  9468,  1169,     4,     2],
       device='cuda:0')