# 🎯 Qwen2.5系列模型**LoRA/QLoRA** **微调案例**

> 说明：先用一个兼容的小模型（例如 `Qwen/Qwen2.5-1.5B-Instruct`）跑通流程，后续将 `MODEL_ID` 替换为你找到的 DeepSeek 模型仓库名即可，代码无需改动。
> 

**目标**：在单卡 A10（24GB）上，以 *小参数量* 的模型为例（本案例采用ModelScope来替换HuggingFace），用 **LoRA/QLoRA** 跑通一次完整的 *指令微调*（Instruction Tuning）流程。  
**硬件建议**：A10 24GB；  
**软件建议**：Python 3.10+、CUDA 12.x、PyTorch 2.3+。

---

## ✅ 本教程包括
1. LoRA/QLoRA 简介
2. 硬件检测与配置环境
3. 模型与数据集下载  
4. 数据预处理 
5. LoRA微调
6. 模型测试评估

> 注：全流程都在 **Jupyter Lab** 中逐格运行即可。

## 一、LoRA / QLoRA 简介

### LoRA（Low-Rank Adaptation）
LoRA 是一种 **轻量化模型微调方法**，它的核心思想是：  
- 在保持原始预训练模型参数 **冻结不变** 的前提下，只在部分权重矩阵（通常是 Transformer 的注意力层）上引入 **低秩矩阵分解**。  
- 用一个低秩的参数矩阵（A、B）来近似原始大矩阵的更新，从而 **大幅减少训练参数量**。  
- 优点：  
  - **参数高效**：只需训练极少量的新增参数（可低至 0.1%）。  
  - **存储友好**：多个下游任务可以共享同一个基础模型，仅保存不同任务的 LoRA 权重。  
  - **部署灵活**：推理时直接将 LoRA 权重合并到原模型，无需额外计算开销。

> 简单理解：LoRA 就像是在大模型的“固定主干”上，插入一些 **小而聪明的适配器**，让它快速学会新任务。

### QLoRA（Quantized LoRA）本文不涉及
QLoRA 是对 LoRA 的进一步优化，它结合了 **量化技术**，使得大模型的微调在 **单卡消费级显卡** 上也可行。  
- 核心思路：  
  1. 先将大模型的参数进行 **4-bit 量化（NF4 方案）**，降低显存占用。  
  2. 在量化后的权重上，应用 **LoRA 适配器** 进行微调。  
  3. 训练时仅更新 LoRA 层，而量化权重保持冻结。  

- 优点：  
  - **极致显存节省**：可在一张 24GB 显存的 GPU 上微调百亿参数模型。  
  - **保持性能**：量化后的 QLoRA 与全精度微调效果接近甚至相当。  
  - **实用性强**：特别适合个人开发者和中小团队。

---

### 对比总结
| 方法   | 主要手段                   | 显存消耗 | 训练参数量 | 适用场景 |
|--------|---------------------------|----------|------------|----------|
| LoRA   | 低秩矩阵分解               | 较低     | 千万级别   | 中等规模模型的高效微调 |
| QLoRA  | 量化（4-bit） + LoRA 适配 | 极低     | 千万级别   | 超大模型在消费级 GPU 上的微调 |

---

## 二、硬件检测与配置环境

### 硬件检测（Linux）

In [1]:
import platform
import sys

print("操作系统:", platform.system())          # Windows / Linux / Darwin (macOS)
print("系统版本:", platform.version())         # 内核或版本号
print("发行版本:", platform.release())         # 例如 10 / 11 / 22.6.0
print("详细信息:", platform.platform())        # 汇总
print("Python版本:", sys.version)             # Python 解释器版本
print("处理器:", platform.processor())         # CPU 信息
print("机器类型:", platform.machine())         # x86_64 / arm64

# 安装检测显卡的依赖
%pip install gputil

import GPUtil

gpus = GPUtil.getGPUs()
for gpu in gpus:
    print(f"显卡型号: {gpu.name}")
    print(f"显存总量: {gpu.memoryTotal} MB")
    print("-" * 30)

操作系统: Linux
系统版本: #1 SMP Mon Mar 10 11:07:41 CST 2025
发行版本: 5.10.134-18.0.2.lifsea8.x86_64
详细信息: Linux-5.10.134-18.0.2.lifsea8.x86_64-x86_64-with-glibc2.35
Python版本: 3.11.11 (main, Mar 11 2025, 18:25:39) [GCC 11.4.0]
处理器: x86_64
机器类型: x86_64
[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
    torch (>=1.9.*)
           ~~~~~~^[0m[33m
[0mNote: you may need to restart the kernel to use updated packages.
显卡型号: NVIDIA A10
显存总量: 23028.0 MB
------------------------------


### 配置环境

In [2]:
%pip install torch==2.3.1+cu121
%pip install transformers==4.55.4
%pip install modelscope==1.29.0
%pip install peft==0.17.1
%pip install datasets==3.2.0
%pip install accelerate==1.10.0

[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
[31mERROR: Could not find a version that satisfies the requirement torch==2.3.1+cu121 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0, 2.7.1, 2.8.0)[0m[31m
[0m[31mERROR: No matching distribution found for torch==2.3.1+cu121[0m[31m
[0mNote: you may need to restart the kernel to use updated packages.
[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
    torch (>=1.9.*)
           ~~~~~~^[0m[33m
[0mNote: you may need to restart the kernel to use updated packages.
[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
    torch (>=1.9.*)
           ~~~~~~^[0m[33m
[0mNote: you may need to restart the kernel to use updated packages.
[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
    torch (>=1.9.*)
           ~~~~~~^[0m[33m
[0mNote: you may need to restart the kernel to use updat

### 检查环境版本

In [3]:
# 📌 打印脚本相关库的版本信息
import torch, transformers, modelscope, peft

print("torch:", torch.__version__)

# transformers 是 peft 和 modelscope 依赖的核心库
try:
    import transformers
    print("transformers:", transformers.__version__)
except ImportError:
    print("transformers: 未安装")

try:
    import modelscope
    print("modelscope:", modelscope.__version__)
except ImportError:
    print("modelscope: 未安装")

try:
    import peft
    print("peft:", peft.__version__)
except ImportError:
    print("peft: 未安装")

try:
    import datasets
    print("datasets:", datasets.__version__)
except ImportError:
    print("datasets: 未安装")

try:
    import accelerate
    print("accelerate:", accelerate.__version__)
except ImportError:
    print("accelerate: 未安装")

2025-08-27 01:59:07.036101: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-08-27 01:59:07.623431: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


torch: 2.3.1+cu121
transformers: 4.55.4
modelscope: 1.29.0
peft: 0.17.1
datasets: 3.2.0
accelerate: 1.10.0


## 三、模型与数据集下载

### Qwen2.5-1.5B模型下载

In [4]:
from modelscope import AutoTokenizer, AutoModelForCausalLM

model_id = "qwen/Qwen2.5-1.5B-Instruct"  # 可替换
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

Downloading Model from https://www.modelscope.cn to directory: /mnt/workspace/.cache/modelscope/hub/models/qwen/Qwen2.5-1.5B-Instruct


2025-08-27 01:59:11,412 - modelscope - INFO - Target directory already exists, skipping creation.


Downloading Model from https://www.modelscope.cn to directory: /mnt/workspace/.cache/modelscope/hub/models/qwen/Qwen2.5-1.5B-Instruct


2025-08-27 01:59:12,225 - modelscope - INFO - Target directory already exists, skipping creation.


### alpaca数据集下载

In [5]:
from modelscope.msdatasets import MsDataset

ds =  MsDataset.load('AI-ModelScope/alpaca-gpt4-data-zh', subset_name='default', split='train')



## 四、数据预处理

### 计算max_length
根据数据集的token长度来计算最合适的长度，有利于数据预处理的速度

In [8]:
%pip install tqdm

from tqdm import tqdm
import numpy as np

def recommend_max_length(dataset, tokenizer, sample_size=5000, quantile=95):
    """
    自动统计 token 长度分布，并推荐 max_length
    Args:
        dataset: MsDataset 对象
        tokenizer: HF AutoTokenizer
        sample_size: 抽样数量（避免全量太慢）
        quantile: 分位数（默认95）
    """
    total = min(sample_size, len(dataset))
    lengths = []

    print(f"开始统计，抽样 {total} 条数据 ...")

    for i in tqdm(range(total)):
        ex = dataset[i]
        instruction = ex.get("instruction", "")
        input_text = ex.get("input", "") or ""
        output_text = ex.get("output", "")

        if input_text.strip():
            prompt = f"指令: {instruction}\n输入: {input_text}\n回答:"
        else:
            prompt = f"指令: {instruction}\n回答:"

        full_text = prompt + output_text
        tokenized = tokenizer(full_text, truncation=False)
        lengths.append(len(tokenized["input_ids"]))

    max_len = max(lengths)
    avg_len = np.mean(lengths)
    q_len = np.percentile(lengths, quantile)

    print("\n=== Token 长度统计结果 ===")
    print(f"最大长度: {max_len}")
    print(f"平均长度: {avg_len:.2f}")
    print(f"{quantile}% 分位数长度: {q_len:.0f}")
    print("=========================")
    print(f"👉 推荐 max_length = {int(min(q_len, tokenizer.model_max_length))}")
    print(f"(模型支持的最大长度 = {tokenizer.model_max_length})")

    return int(min(q_len, tokenizer.model_max_length))


# 计算数据集应该定义的最大长度
max_length = recommend_max_length(ds, tokenizer)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
    torch (>=1.9.*)
           ~~~~~~^[0m[33m
[0mNote: you may need to restart the kernel to use updated packages.
开始统计，抽样 5000 条数据 ...


100%|██████████| 5000/5000 [00:01<00:00, 2857.29it/s]


=== Token 长度统计结果 ===
最大长度: 532
平均长度: 143.24
95% 分位数长度: 293
👉 推荐 max_length = 293
(模型支持的最大长度 = 131072)





### 预处理（构造Dataset）
需要针对自己的数据集范式来编写，这里只针对alpaca

In [12]:
# 自定义数据处理函数(需要针对自己的数据集范式来编写，这里只针对alpaca)
def preprocess(example):
    # 丢掉 instruction 或 output 缺失的样本
    if not example['instruction'] or not example['output']:
        return None

    # alpaca 数据有指令、输入、输出三个标签
    instruction = example['instruction']
    input_text = example.get('input') or ""  # input 可能为 None
    output_text = example['output']

    if input_text.strip():
        prompt = f"指令: {instruction}\n输入: {input_text}\n回答:"
    else:
        prompt = f"指令: {instruction}\n回答:"

    full_text = prompt + output_text

    enc = tokenizer(
        full_text,  # 需要进行token化的文本
        truncation=True,  # 文本过大的时候是否截断
        max_length=max_length,  # 根据模型和数据集决定最合适的
        padding="max_length",  # 🔹 保证长度一致，DataLoader 堆叠安全
        return_tensors="pt"  # 返回的数据类型，pt:pytorch.tensor; tf:tensorflow; np:numpy
    )
    # 单个样本是字典格式
    return {
        "input_ids": enc["input_ids"][0],
        "labels": enc["input_ids"][0]
    }

train_dataset = ds.map(preprocess)
train_dataset = train_dataset.filter(lambda x: x is not None)

print('数据处理完成')

数据处理完成


### 构造DataLoader
将预处理好的Dataset进行Padding后方便拼接成多个batch的向量，为训练作准备

In [13]:
from torch.utils.data import DataLoader, Subset

# 🛠️ 自定义批处理函数 (collate_fn)
def collate_fn(batch):
    """
    作用：
    - DataLoader 会把一个 batch 的样本（list[dict]）传进来
    - 这里需要手动拼接成 tensor，并且对齐长度（pad）
    """

    # 取出每个样本的 input_ids 和 labels，转成 tensor
    input_ids = [torch.tensor(item["input_ids"]) for item in batch]
    labels = [torch.tensor(item["labels"]) for item in batch]

    # 🔹 对 input_ids 做 padding
    #   - batch_first=True: 结果形状 (batch_size, seq_len)
    #   - padding_value=tokenizer.pad_token_id: 使用 tokenizer 的 pad_token_id 填充
    input_ids = torch.nn.utils.rnn.pad_sequence(
        input_ids, batch_first=True, padding_value=tokenizer.pad_token_id
    )

    # 🔹 对 labels 做 padding
    #   - 注意这里 padding_value = -100
    #   - 在 PyTorch 的 CrossEntropyLoss 里，-100 会被忽略，不参与 loss 计算
    labels = torch.nn.utils.rnn.pad_sequence(
        labels, batch_first=True, padding_value=-100
    )

    # 返回字典，方便直接喂给模型
    return {
        "input_ids": input_ids,
        "labels": labels
    }


# 在数据预处理后，分割训练集和测试集
from sklearn.model_selection import train_test_split
import numpy as np

# 获取数据集的总长度
total_samples = len(train_dataset)
print(f"总样本数: {total_samples}")

# 设置随机种子确保可重复性
np.random.seed(42)

# 分割训练集和测试集 (80% 训练, 20% 测试)
train_indices, test_indices = train_test_split(
    range(total_samples), 
    test_size=0.2, 
    random_state=42
)

# 创建训练集和测试集
train_subset = Subset(train_dataset, train_indices)
test_subset = Subset(train_dataset, test_indices)

print(f"训练集样本数: {len(train_subset)}")
print(f"测试集样本数: {len(test_subset)}")

# 为了快速验证，可以只取部分数据
small_train_dataset = Subset(train_subset, range(min(2000, len(train_subset))))
small_test_dataset = Subset(test_subset, range(min(500, len(test_subset))))

print(f"小训练集样本数: {len(small_train_dataset)}")
print(f"小测试集样本数: {len(small_test_dataset)}")

# 构建训练和测试的 DataLoader
train_loader = DataLoader(
    small_train_dataset,
    batch_size=4,
    shuffle=True,
    collate_fn=collate_fn
)

test_loader = DataLoader(
    small_test_dataset,
    batch_size=4,
    shuffle=False,  # 测试时不需要打乱
    collate_fn=collate_fn
)

总样本数: 48818
训练集样本数: 39054
测试集样本数: 9764
小训练集样本数: 2000
小测试集样本数: 500


## 五、LoRA微调

### LoRA参数说明

```python
lora_config = LoraConfig(
    r=4,                          
    lora_alpha=16,               
    target_modules=["q_proj", "v_proj"],  
    lora_dropout=0.05,             
    bias="none",               
    task_type="CAUSAL_LM"         
)
```

r=4
- 表示低秩矩阵的秩值（rank），值越大 → 适配能力更强 → 参数量也随之增加。  
- 这里选择 `4`，意味着 **轻量级训练**，适合小规模任务或快速实验。  

lora_alpha=16
- 缩放因子，用于调整 LoRA 的输出幅度。  
- 一般经验是 **lora_alpha ≈ 2 × r**，所以这里 `16` 配合 `r=4` 是合理的。  

target_modules=["q_proj", "v_proj"]
- LoRA 只在注意力机制的 **Query** 和 **Value** 投影层中生效。  
- 这是最常见的设置，既保证效果，又控制参数量。
- ["q_proj", "v_proj"] → 默认推荐，90% 的场景适用（对话、问答、指令跟随）。
- ["q_proj", "k_proj", "v_proj", "o_proj"] → 全量 Attention，参数量大一些，适合任务更复杂（如多模态对齐）。
- ["gate_proj", "up_proj", "down_proj"] → 改 MLP 层，适合需要调整“知识表达”的场景（例如数学推理）。
- ["q_proj", "v_proj", "down_proj"] → 混合方式，有时能进一步提升性能。

lora_dropout=0.05
- 在 LoRA 层中添加 **5% 的 dropout**，提升泛化能力。  
- 数据量很大时可以调低到 `0`；数据少时可以适当调高（如 `0.1`）。  

bias="none"
- 不训练 bias 参数，保证模型轻量化。  
- 大多数场景下用 `"none"` 即可。  

task_type="CAUSAL_LM"
- 表示任务是 **自回归语言建模**（比如 Qwen、GPT 类模型）。  
- 必须和任务类型一致，否则 forward 过程会报错。  

### 配置 LoRA 训练参数

In [14]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=4,            
    lora_alpha=16,                
    target_modules=["q_proj", "v_proj"], 
    lora_dropout=0.05,             
    bias="none",                    
    task_type="CAUSAL_LM"           
)

# 🚀 将基础模型包装为 PEFT 模型
model = get_peft_model(model, lora_config)  # 这里默认会冻结非LoRA的参数

# 打印当前可训练参数量（仅 LoRA 部分），其余参数被冻结
model.print_trainable_parameters()

# 训练超参数
num_train_epochs = 2  # 对完整的数据集训练多少个批次

I have left this message as the final dev message to help you transition.

Important Notice:
- AutoAWQ is officially deprecated and will no longer be maintained.
- The last tested configuration used Torch 2.6.0 and Transformers 4.51.3.
- If future versions of Transformers break AutoAWQ compatibility, please report the issue to the Transformers project.

Alternative:
- AutoAWQ has been adopted by the vLLM Project: https://github.com/vllm-project/llm-compressor

For further inquiries, feel free to reach out:
- X: https://x.com/casper_hansen_
- LinkedIn: https://www.linkedin.com/in/casper-hansen-804005170/



trainable params: 544,768 || all params: 1,544,259,072 || trainable%: 0.0353


### 确认参数是否冻结
这里只需要训练LoRA新增的参数层
输出应该只包含 lora_A、lora_B 之类的增量参数名

In [15]:
for name, param in model.named_parameters():
    if param.requires_grad:  # 这里只打印有梯度信息的
        print(name)

base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight
base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight
base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight
base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight
base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight
base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight
base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight
base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight
base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight
base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight
base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight
base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight
base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight
base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight
base_m

### 开始微调

In [16]:
from torch.optim import AdamW
import os

# 创建保存目录
model_save_path = "./qwen2.5-finetuned-lora"
os.makedirs(model_save_path, exist_ok=True)

# 优化器设置
optimizer = AdamW(model.parameters(), lr=2e-4)

# 训练循环
model.train()
total_steps = len(train_loader) * num_train_epochs
current_step = 0

for epoch in range(num_train_epochs):
    epoch_loss = 0.0
    print(f"\n开始第 {epoch + 1}/{num_train_epochs} 轮训练...")
    
    for step, batch in enumerate(train_loader):
        input_ids = batch["input_ids"].to(model.device)
        labels = batch["labels"].to(model.device)

        optimizer.zero_grad()
        outputs = model(input_ids=input_ids, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        current_step += 1

        # 每 100 个 step 打印一次进度
        if step % 100 == 0:
            avg_loss = epoch_loss / (step + 1)
            print(f"Epoch {epoch + 1}/{num_train_epochs} | Step {step}/{len(train_loader)} | "
                  f"Loss {loss.item():.4f} | Avg Loss {avg_loss:.4f} | "
                  f"Progress {current_step}/{total_steps}")
    
    # 每轮结束后打印平均损失
    avg_epoch_loss = epoch_loss / len(train_loader)
    print(f"第 {epoch + 1} 轮训练完成，平均损失: {avg_epoch_loss:.4f}")

print("\n训练完成！")

# 保存微调后的模型
print(f"正在保存模型到: {model_save_path}")
model.save_pretrained(model_save_path)
tokenizer.save_pretrained(model_save_path)

# 保存训练配置信息
import json
config_info = {
    "model_id": model_id,
    "lora_config": {
        "r": lora_config.r,
        "lora_alpha": lora_config.lora_alpha,
        "target_modules": list(lora_config.target_modules),  # 将set转换为list
        "lora_dropout": lora_config.lora_dropout,
        "bias": lora_config.bias,
        "task_type": lora_config.task_type
    },
    "training_config": {
        "num_epochs": num_train_epochs,
        "learning_rate": 2e-4,
        "batch_size": 4,
        "total_steps": total_steps
    }
}

with open(os.path.join(model_save_path, "training_config.json"), "w", encoding="utf-8") as f:
    json.dump(config_info, f, ensure_ascii=False, indent=2)

print(f"模型和配置已保存到: {model_save_path}")
print(f"可训练参数数量: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")


开始第 1/2 轮训练...
Epoch 1/2 | Step 0/500 | Loss 4.3384 | Avg Loss 4.3384 | Progress 1/1000
Epoch 1/2 | Step 100/500 | Loss 1.0286 | Avg Loss 1.2016 | Progress 101/1000
Epoch 1/2 | Step 200/500 | Loss 0.7692 | Avg Loss 1.0262 | Progress 201/1000
Epoch 1/2 | Step 300/500 | Loss 0.3082 | Avg Loss 0.9605 | Progress 301/1000
Epoch 1/2 | Step 400/500 | Loss 0.4948 | Avg Loss 0.9209 | Progress 401/1000
第 1 轮训练完成，平均损失: 0.8999

开始第 2/2 轮训练...
Epoch 2/2 | Step 0/500 | Loss 0.6446 | Avg Loss 0.6446 | Progress 501/1000
Epoch 2/2 | Step 100/500 | Loss 0.4342 | Avg Loss 0.7611 | Progress 601/1000
Epoch 2/2 | Step 200/500 | Loss 0.9296 | Avg Loss 0.7682 | Progress 701/1000
Epoch 2/2 | Step 300/500 | Loss 1.0355 | Avg Loss 0.7783 | Progress 801/1000
Epoch 2/2 | Step 400/500 | Loss 0.5083 | Avg Loss 0.7844 | Progress 901/1000
第 2 轮训练完成，平均损失: 0.7941

训练完成！
正在保存模型到: ./qwen2.5-finetuned-lora
模型和配置已保存到: ./qwen2.5-finetuned-lora
可训练参数数量: 544,768


## 六、模型测试评估

### 模型加载
加载微调后的模型

In [17]:
from peft import PeftModel

# 重新加载基础模型
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

# 加载 LoRA 权重
finetuned_model = PeftModel.from_pretrained(base_model, model_save_path)
finetuned_model.eval()  # 设置为评估模式

print("微调后的模型加载完成")

Downloading Model from https://www.modelscope.cn to directory: /mnt/workspace/.cache/modelscope/hub/models/qwen/Qwen2.5-1.5B-Instruct


2025-08-27 02:16:48,187 - modelscope - INFO - Target directory already exists, skipping creation.


微调后的模型加载完成


### 测试
用部份未经过训练的数据集进行测试，这里采用的是交叉熵损失（CrossEntropyLoss）  

In [18]:
from sklearn.metrics import accuracy_score
import re

def evaluate_model(model, test_loader, tokenizer):
    """
    评估模型在测试集上的表现
    """
    model.eval()
    total_loss = 0
    all_predictions = []
    all_targets = []
    
    with torch.no_grad():
        for batch in tqdm(test_loader, desc="测试中"):
            input_ids = batch["input_ids"].to(model.device)
            labels = batch["labels"].to(model.device)
            
            # 计算损失
            outputs = model(input_ids=input_ids, labels=labels)
            total_loss += outputs.loss.item()
            
            # 生成预测
            generated_ids = model.generate(
                input_ids=input_ids,
                max_new_tokens=max_length,  # 最大生成100个新token
                do_sample=True,
                temperature=0.7,
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id
            )
            
            # 解码预测结果和目标
            for i in range(len(generated_ids)):
                # 获取输入部分（用于提取指令）
                input_text = tokenizer.decode(input_ids[i], skip_special_tokens=True)
                
                # 获取生成的部分（去掉输入部分）
                generated_text = tokenizer.decode(generated_ids[i][len(input_ids[i]):], skip_special_tokens=True)
                
                # 获取目标答案
                target_text = tokenizer.decode(labels[i][labels[i] != -100], skip_special_tokens=True)
                
                all_predictions.append(generated_text.strip())
                all_targets.append(target_text.strip())
    
    # 计算平均损失
    avg_loss = total_loss / len(test_loader)
    return avg_loss, all_predictions, all_targets

# 执行测试
print("开始测试微调后的模型...")
test_loss, predictions, targets = evaluate_model(finetuned_model, test_loader, tokenizer)

print(f"测试集平均损失: {test_loss:.4f}")
print(f"测试样本数: {len(predictions)}")

开始测试微调后的模型...


测试中:   0%|          | 0/125 [00:00<?, ?it/s]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
测试中:   1%|          | 1/125 [00:09<19:12,  9.30s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
测试中:   2%|▏         | 2/125 [00:18<18:43,  9.14s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
测试中:   2%|▏         | 3/125 [00:27<18:22,  9.03s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
测试中:   3%|▎         | 4/125 [00:36<18:06,  8.98s/it]A decoder-only architecture is being used, but right-pad

测试集平均损失: 0.8200
测试样本数: 500





### LLM模型其他常见指标

| 指标                   | 说明                  | 适用场景        |
| -------------------- | ------------------- | ----------- |
| **BLEU**             | n-gram 精确匹配         | 翻译、摘要       |
| **ROUGE**            | 关注 recall（覆盖率）      | 摘要生成、问答     |
| **METEOR**           | 考虑同义词匹配             | 文本生成        |
| **BERTScore**        | 基于语义嵌入比较文本相似度       | 生成任务，语义评价   |
| **Exact Match (EM)** | 完全匹配                | 问答、选择题、逻辑输出 |
| **Perplexity (困惑度)** | 模型预测 token 的概率反映流畅度 | 文本生成、语言建模   |


### 计算BERTScore

| 指标                | 含义               | 值域     | 越大越好？  |
| ----------------- | ---------------- | ------ | ------ |
| **Precision (P)** | 生成文本和目标文本语义重叠的比例 | 0 \~ 1 | ✅ 越大越好 |
| **Recall (R)**    | 目标文本中被生成文本覆盖的比例  | 0 \~ 1 | ✅ 越大越好 |
| **F1-score (F1)** | P 和 R 的调和平均      | 0 \~ 1 | ✅ 越大越好 |

In [20]:
%pip install bert-score

from bert_score import score

# 假设 predictions 和 targets 已经是你的生成结果和真实答案列表
P, R, F1 = score(predictions, targets, lang="zh", verbose=True)  # lang 可以根据任务选择 'en' 或 'zh'

# P, R, F1 都是 tensor，形状为 [样本数]
avg_precision = P.mean().item()
avg_recall = R.mean().item()
avg_f1 = F1.mean().item()

print(f"BERTScore 平均 Precision: {avg_precision:.4f}")
print(f"BERTScore 平均 Recall:    {avg_recall:.4f}")
print(f"BERTScore 平均 F1:        {avg_f1:.4f}")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[0mLooking in indexes: https://mirrors.aliyun.com/pypi/simple/
    torch (>=1.9.*)
           ~~~~~~^[0m[33m
[0mNote: you may need to restart the kernel to use updated packages.


OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Check your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.