使用 peft 与 trl 示例，对使用低秩适应（LoRA）微调 8 位模型的示例。

原始文本：huggingface.co/docs/trl/lora_tuning_peft

这些示例中的笔记本和脚本展示了如何使用低秩适应（LoRA）以内存高效的方式微调模型。peft 库支持大多数 PEFT 方法，但请注意，某些 PEFT 方法，如提示调整，不受支持。有关 LoRA 的更多信息，请参阅原始论文。

以下是trl 存储库中启用了peft的笔记本和脚本的概述：

文件	任务	描述
`stack_llama/rl_training.py`	RLHF	使用学习奖励模型和`peft`对 7b 参数 LLaMA 模型进行分布式微调。
`stack_llama/reward_modeling.py`	奖励建模	使用`peft`对 7b 参数 LLaMA 奖励模型进行分布式训练。
`stack_llama/supervised_finetuning.py`	SFT	使用`peft`对 7b 参数 LLaMA 模型进行分布式指导/监督微调。

安装

注意：peft 正在积极开发中，因此我们直接从他们的 Github 页面安装。Peft 还依赖于 transformers 的最新版本。

pip install trl[peft]
pip install bitsandbytes loralib
pip install git+https://github.com/huggingface/transformers.git@main
#optional: wandb
pip install wandb

注意：如果您不想使用wandb记录日志，请在脚本/笔记本中删除log_with="wandb"。您也可以用您喜欢的实验追踪器替换它，该追踪器由accelerate支持。

如何使用？

只需在脚本中声明一个PeftConfig对象，并通过.from_pretrained传递它以加载 TRL+PEFT 模型。

from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

model_id = "edbeeching/gpt-neo-125M-imdb"
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_id, 
    peft_config=lora_config,
)

如果您想以 8 位精度加载模型：

pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name, 
    load_in_8bit=True,
    peft_config=lora_config,
)

… 或者使用 4 位精度：

pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name, 
    peft_config=lora_config,
    load_in_4bit=True,
)

启动脚本

trl库由accelerate提供支持。因此，最好使用以下命令配置和启动训练：

accelerate config # will prompt you to define the training configuration
accelerate launch scripts/gpt2-sentiment_peft.py # launches training

使用 trl + peft 和数据并行性

只要您能够将训练过程适配到单个设备中，您可以扩展到尽可能多的 GPU。您需要应用的唯一调整是按以下方式加载模型：

from peft import LoraConfig
...

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name, 
    peft_config=lora_config,
)

如果您想以 8 位精度加载模型：

pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name, 
    peft_config=lora_config,
    load_in_8bit=True,
)

… 或者使用 4 位精度：

pretrained_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name, 
    peft_config=lora_config,
    load_in_4bit=True,
)

最后，确保奖励也在正确的设备上计算，为此，您可以使用ppo_trainer.model.current_device。

大型模型（>60B 模型）的天真管道并行性（NPP）

trl库还支持大型模型（>60B 模型）的天真管道并行性（NPP）。这是一种简单的方式，可以在多个 GPU 上并行化模型。这种范式被称为“天真管道并行性”（NPP），是一种简单的方式，可以在多个 GPU 上并行化模型。我们将模型和适配器加载到多个 GPU 上，激活和梯度将在 GPU 之间天真地通信。这支持int8模型以及其他dtype模型。

如何使用 NPP？

只需在from_pretrained上使用自定义的device_map参数加载您的模型，以将模型分割到多个设备上。查看这篇不错的教程，了解如何为您的模型正确创建device_map。

还要确保将lm_head模块放在第一个 GPU 设备上，否则可能会出错。在撰写本文时，您需要安装accelerate的main分支：pip install git+https://github.com/huggingface/accelerate.git@main和peft：pip install git+https://github.com/huggingface/peft.git@main。

启动脚本

尽管trl库由accelerate提供支持，但您应该在单个进程中运行训练脚本。请注意，我们目前不支持数据并行处理与 NPP 一起使用。

python PATH_TO_SCRIPT

微调 Llama-2 模型

您可以使用SFTTrainer和官方脚本轻松地对 Llama2 模型进行微调！例如，要在 Guanaco 数据集上对 llama2-7b 进行微调，请运行（在单个 NVIDIA T4-16GB 上测试过）：

python examples/scripts/sft.py --model_name meta-llama/Llama-2-7b-hf --dataset_name timdettmers/openassistant-guanaco --load_in_4bit --use_peft --batch_size 4 --gradient_accumulation_steps 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trl07_23.md

trl07_23.md

使用 peft 与 trl 示例，对使用低秩适应（LoRA）微调 8 位模型的示例。

安装

如何使用？

启动脚本

使用 trl + peft 和数据并行性

大型模型（>60B 模型）的天真管道并行性（NPP）

如何使用 NPP？

启动脚本

微调 Llama-2 模型

Files

trl07_23.md

Latest commit

History

trl07_23.md

File metadata and controls

使用 peft 与 trl 示例，对使用低秩适应（LoRA）微调 8 位模型的示例。

安装

如何使用？

启动脚本

使用 trl + peft 和数据并行性

大型模型（>60B 模型）的天真管道并行性（NPP）

如何使用 NPP？

启动脚本

微调 Llama-2 模型