## Introduction
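In the [previous article](https://golfxiao.blog.csdn.net/article/details/141440847), training required writing a fair amount of code. Some frameworks, however, let us fine-tune with zero code, and [LLama-Factory](https://llamafactory.readthedocs.io/zh-cn/latest/) is one of them. It is a fine-tuning and training platform built specifically for large language models, with features such as: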
- Support for common model families: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, and more.
- Single-GPU and multi-GPU training.
- Full-parameter, LoRA, and QLoRA fine-tuning.
…
It has many other useful features; for details, see [https://llamafactory.readthedocs.io/zh-cn/latest/](https://llamafactory.readthedocs.io/zh-cn/latest/).

In this article, we try a multi-GPU training run with LLamaFactory.



## Dataset Preparation

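For SFT, LLamaFactory supports several data formats. We use the alpaca format here because it is simple and clear. Each record needs only three fields:
- `instruction`: the human instruction;
- `input`: the human input;
- `output`: the model's answer.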

```json
{
  "instruction": "Calculate the total cost of these items.",
  "input": "Input: car - $3000, clothes - $100, books - $20.",
  "output": "The total cost of the car, clothes, and books is $3000 + $100 + $20 = $3120."
}
```
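To match this format, we wrap the conversion in a function called `to_alpaca`. It reads the source JSONL file line by line and packs the `label`, `fraud_speaker`, and `reason` fields into a JSON string that becomes `output`. It then writes all records to the output file as a single JSON array.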

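```python
import json
import os

def to_alpaca(input_path, output_path):
    """Convert a JSONL file of labelled dialogues into alpaca-format records."""
    dataset = []
    with open(input_path, 'r', encoding='utf-8') as infile:
        for line in infile:
            if not line.strip():  # skip blank lines
                continue
            # Parse one JSON object per line
            data = json.loads(line)
            # The model's target answer: the label fields serialized as a JSON string
            response = {"is_fraud": data["label"], "fraud_speaker": data["fraud_speaker"], "reason": data["reason"]}
            item = {
                'instruction': data['instruction'],
                'input': data['input'],
                'output': json.dumps(response, ensure_ascii=False),
            }
            dataset.append(item)
    # Write all records as one JSON array (even though the output files below use a .jsonl suffix)
    os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
    with open(output_path, 'w', encoding='utf-8') as outfile:
        outfile.write(json.dumps(dataset, indent=4, ensure_ascii=False))
    print(f"convert over，{input_path} to {output_path}")
```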
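It helps to see what `to_alpaca` produces for a single record, so the mapping can be pulled out into a pure function. This is a minimal sketch. `convert_record` is a hypothetical helper that mirrors the loop body above. The sample record is made up for illustration, and treating `label` as a boolean is an assumption.

```python
import json

def convert_record(data: dict) -> dict:
    """Map one source record to an alpaca record (same mapping as the loop in to_alpaca)."""
    # Pack the three label fields into a JSON string used as the target output
    response = {
        "is_fraud": data["label"],
        "fraud_speaker": data["fraud_speaker"],
        "reason": data["reason"],
    }
    return {
        "instruction": data["instruction"],
        "input": data["input"],
        "output": json.dumps(response, ensure_ascii=False),
    }

# Hypothetical sample record; all values are invented for illustration
sample = {
    "instruction": "Determine whether the dialogue below involves fraud.",
    "input": "A: Your account is frozen, please transfer the money to a safe account.",
    "label": True,
    "fraud_speaker": "A",
    "reason": "Asks the victim to transfer money to a so-called safe account.",
}

record = convert_record(sample)
print(record["output"])
```

Because `output` is a JSON string, the fine-tuned model learns to answer with a machine-parseable verdict. A Python boolean `label` becomes `true`/`false` inside that string.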


Batch-convert the data built in the previous section.

```python
# Train/test/eval splits (JSONL) built in the previous section
input_files = [
    '../dataset/train_test/train0902.jsonl',
    '../dataset/train_test/test0902.jsonl',
    '../dataset/train_test/eval0902.jsonl',
]

def filename(path):
    """Return the file name without its directory and extension."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem

for input_path in input_files:
    output_path = f'../dataset/fraud/train_test/{filename(input_path)}_alpaca.jsonl'
    to_alpaca(input_path, output_path)
```

```
convert over，../dataset/train_test/train0902.jsonl to ../dataset/fraud/train_test/train0902_alpaca.jsonl
convert over，../dataset/train_test/test0902.jsonl to ../dataset/fraud/train_test/test0902_alpaca.jsonl
convert over，../dataset/train_test/eval0902.jsonl to ../dataset/fraud/train_test/eval0902_alpaca.jsonl
```


```
convert over，../dataset/fraud/train_test/train0819.jsonl to ../dataset/fraud/train_test/train0819_alpaca.json
convert over，../dataset/fraud/train_test/test0819.jsonl to ../dataset/fraud/train_test/test0819_alpaca.json
convert over，../dataset/fraud/train_test/eval0819.jsonl to ../dataset/fraud/train_test/eval0819_alpaca.json
```
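Before registering the files, you may want to sanity-check them. The sketch below uses a hypothetical helper, `check_alpaca_file`, which is not part of LLamaFactory. It loads a file written by `to_alpaca` as one JSON array. It then reports records that lack one of the three string fields or whose `output` is not valid JSON.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("instruction", "input", "output")

def check_alpaca_file(path):
    """Return a list of problems found in an alpaca-format JSON array file."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)  # to_alpaca writes one JSON array, not JSON Lines
    if not isinstance(records, list):
        return ["top-level value is not a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(rec.get(key), str):
                problems.append(f"record {i}: field '{key}' missing or not a string")
        # output should itself be parseable JSON, since that is what the model learns to emit
        if isinstance(rec.get("output"), str):
            try:
                json.loads(rec["output"])
            except json.JSONDecodeError:
                problems.append(f"record {i}: 'output' is not valid JSON")
    return problems

# Demo on a throwaway file: the second record has a malformed output
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_alpaca.jsonl")
    demo = [
        {"instruction": "i", "input": "x", "output": '{"is_fraud": false}'},
        {"instruction": "i", "input": "x", "output": "not json"},
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(demo, f, ensure_ascii=False, indent=4)
    issues = check_alpaca_file(path)

print(issues)
```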

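After the conversion, register the dataset in `data/dataset_info.json` under the LLamaFactory installation directory by appending a new entry at the end of the file. The `columns` field maps LLamaFactory's `prompt`/`query`/`response` to the alpaca fields `instruction`/`input`/`output`. `file_name` is looked up relative to the dataset directory (`dataset_dir` in the training config), so the converted file needs to be placed there.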

```json
{
  "identity": {
    "file_name": "identity.json"
  },
  ……
  "anti_fraud": {
    "file_name": "train0819_alpaca.jsonl",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```
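If you prefer not to edit `dataset_info.json` by hand, you can append the entry with a few lines of Python. `register_dataset` below is a hypothetical helper and a minimal sketch. It reads the JSON object, adds or overwrites one alpaca-style entry using the same column mapping as `anti_fraud` above, and writes the file back. The demo runs on a temporary stand-in rather than the real file.

```python
import json
import os
import tempfile

def register_dataset(dataset_info_path, name, file_name):
    """Add or overwrite an alpaca-style entry in dataset_info.json and return the new mapping."""
    with open(dataset_info_path, "r", encoding="utf-8") as f:
        info = json.load(f)
    # Same column mapping as the anti_fraud entry: LLamaFactory column -> alpaca field
    info[name] = {
        "file_name": file_name,
        "columns": {"prompt": "instruction", "query": "input", "response": "output"},
    }
    with open(dataset_info_path, "w", encoding="utf-8") as f:
        json.dump(info, f, indent=2, ensure_ascii=False)
    return info

# Demo against a temporary stand-in for data/dataset_info.json
with tempfile.TemporaryDirectory() as tmp:
    info_path = os.path.join(tmp, "dataset_info.json")
    with open(info_path, "w", encoding="utf-8") as f:
        json.dump({"identity": {"file_name": "identity.json"}}, f)
    info = register_dataset(info_path, "anti_fraud", "train0819_alpaca.jsonl")
    with open(info_path, "r", encoding="utf-8") as f:
        on_disk = json.load(f)

print(sorted(info))  # ['anti_fraud', 'identity']
```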

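## Parameter Configuration
LLamaFactory stores training parameters in YAML files. The `examples` subdirectory of the installation directory contains sample configs for each fine-tuning method, so you can copy one and modify it.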

![LLaMA-Factory examples directory](https://i-blog.csdnimg.cn/direct/95c191a523e840fc969c0d014c82047e.png)

View the configuration file:

```
!cat /data2/downloads/LLaMA-Factory/qwen2_lora_sft.yaml
```

```yaml
### model
model_name_or_path: /data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct
resume_from_checkpoint: /data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826/checkpoint-1200

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.2


### dataset
dataset_dir: /data2/downloads/LLaMA-Factory/data
dataset: anti_fraud
template: qwen
cutoff_len: 1024
max_samples: 200000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0826
logging_steps: 10
save_steps: 100
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 16
gradient_accumulation_steps: 1
gradient_checkpointing: true
learning_rate: 1.0e-4
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 8
eval_str
```

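The config sets `lr_scheduler_type: cosine` with `warmup_ratio: 0.05`. As a rough sketch of what that means, the function below reproduces the linear-warmup-then-cosine-decay formula of Hugging Face transformers' `get_cosine_schedule_with_warmup`. To my understanding, this is what the Trainer uses for this scheduler type. Warmup steps are taken as `ceil(total_steps * warmup_ratio)`. `TOTAL_STEPS` is a hypothetical value, since the real one depends on dataset size and GPU count.

```python
import math

def warmup_steps(total_steps, warmup_ratio):
    # Warmup length derived from a ratio of the total optimizer steps
    return math.ceil(total_steps * warmup_ratio)

def cosine_lr(step, base_lr, total_steps, warmup):
    """Linear warmup from 0 to base_lr, then half-cosine decay back to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# learning_rate and warmup_ratio come from the yaml above
BASE_LR = 1.0e-4
WARMUP_RATIO = 0.05
TOTAL_STEPS = 3000  # hypothetical; depends on dataset size and number of GPUs

warm = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
for step in (0, warm // 2, warm, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, BASE_LR, TOTAL_STEPS, warm):.2e}")
```

Near the end of training the learning rate approaches 0. This fits the very small learning rate (about 8.9e-07) in the training log at epoch 9.43, shown at the end of this article.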


## Start Training

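Set the environment variable `CUDA_VISIBLE_DEVICES` to allow training on 3 GPUs, with device IDs 0, 1, and 2. `NCCL_P2P_DISABLE=1` and `NCCL_IB_DISABLE=1` turn off NCCL's peer-to-peer and InfiniBand transports. This is a common workaround when the GPUs do not support P2P communication, as with some consumer cards.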

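Launch training with the `llamafactory-cli` command. Because more than one GPU is visible, it starts one distributed process per GPU, as `Initializing 3 distributed tasks` and `world size: 3` in the log show.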

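```python
import os

# Variables set here are inherited by the !llamafactory-cli subprocess in the next cell
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"  # expose GPUs 0, 1 and 2
os.environ["NCCL_P2P_DISABLE"] = "1"          # disable NCCL peer-to-peer transport
os.environ["NCCL_IB_DISABLE"] = "1"           # disable NCCL InfiniBand transport
```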
```
!llamafactory-cli train /data2/downloads/LLaMA-Factory/qwen2_lora_sft.yaml
```

```
[INFO|2025-03-17 01:19:52] llamafactory.cli:143 >> Initializing 3 distributed tasks at: 127.0.0.1:29193
[INFO|2025-03-17 01:20:01] llamafactory.hparams.parser:383 >> Process rank: 0, world size: 3, device: cuda:0, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2048] 2025-03-17 01:20:02,027 >> loading file chat_template.jinja
[INFO|2025-03-17 01:20:02] llamafactory.hparams.parser
```
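The log confirms `world size: 3`, so each optimizer step consumes samples from all three processes. A quick calculation of the effective batch size follows. `effective_batch_size` and `steps_per_epoch` are hypothetical helpers. The per-epoch step count assumes the distributed sampler splits the data evenly, padding the last shard, so treat it as an approximation.

```python
import math

def effective_batch_size(per_device, world_size, grad_accum):
    # Samples contributing to one optimizer update across all data-parallel processes
    return per_device * world_size * grad_accum

def steps_per_epoch(num_samples, per_device, world_size, grad_accum):
    # Each process sees about ceil(num_samples / (per_device * world_size)) micro-batches
    # per epoch; one optimizer step is taken every grad_accum micro-batches
    micro_batches = math.ceil(num_samples / (per_device * world_size))
    return max(micro_batches // grad_accum, 1)

# per_device_train_batch_size=16 and gradient_accumulation_steps=1 from the yaml; world size 3 from the log
print(effective_batch_size(16, 3, 1))  # 48
NUM_SAMPLES = 10_000  # hypothetical training-set size
print(steps_per_epoch(NUM_SAMPLES, 16, 3, 1))  # 209
print(steps_per_epoch(NUM_SAMPLES, 16, 1, 1))  # 625 with the same settings on one GPU
```

With the same per-device settings, a single GPU would take roughly 3 times as many optimizer steps per epoch, each on a batch of 16 instead of 48.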

## Evaluation on the Validation Set

```python
# evaluate.py defines the evaluate() function used below
%run evaluate.py
testdata_path = '/data2/anti_fraud/dataset/eval0819.jsonl'
model_path = '/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct'
device = 'cuda'
```

Evaluate each checkpoint's performance on the validation set.

```python
%%time
## eval_loss=0.0152
checkpoint_path_900 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-900'
evaluate(model_path, checkpoint_path_900, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2348/2348 [02:39<00:00, 14.76it/s]

tn：1157, fp:8, fn:117, tp:1066
precision: 0.9925512104283054, recall: 0.9010989010989011, accuracy: 0.946763202725724
CPU times: user 3min 4s, sys: 3.78 s, total: 3min 8s
Wall time: 2min 41s
```
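The metrics printed by `evaluate` can be recomputed from the confusion matrix, which also makes it easy to derive others such as F1. This is a minimal sketch that assumes fraud is the positive class, so `tp` counts correctly detected fraud dialogues. `binary_metrics` is a hypothetical helper, not the function inside `evaluate.py`.

```python
def binary_metrics(tn, fp, fn, tp):
    """Precision, recall, accuracy and F1 from a binary confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    # F1 written directly in counts: 2TP / (2TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, accuracy, f1

# Confusion matrix printed above for checkpoint-900 on the validation set
p, r, acc, f1 = binary_metrics(tn=1157, fp=8, fn=117, tp=1066)
print(f"precision={p:.4f} recall={r:.4f} accuracy={acc:.4f} f1={f1:.4f}")
```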





```python
%%time
## eval_loss=0.0137
checkpoint_path_1400 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-1400'
evaluate(model_path, checkpoint_path_1400, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2348/2348 [02:39<00:00, 14.77it/s]

tn：1145, fp:20, fn:22, tp:1161
precision: 0.983065198983912, recall: 0.981403212172443, accuracy: 0.9821124361158433
CPU times: user 3min 3s, sys: 2.89 s, total: 3min 6s
Wall time: 2min 41s
```





```python
%%time
## eval_loss=0.020
checkpoint_path_1800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-1800'
evaluate(model_path, checkpoint_path_1800, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2348/2348 [02:39<00:00, 14.71it/s]

tn：1162, fp:3, fn:34, tp:1149
precision: 0.9973958333333334, recall: 0.9712595097210481, accuracy: 0.9842419080068143
CPU times: user 3min 6s, sys: 2.92 s, total: 3min 8s
Wall time: 2min 41s
```





```python
%%time
## eval_loss=0.035
checkpoint_path_2800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-2800'
evaluate(model_path, checkpoint_path_2800, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2348/2348 [02:39<00:00, 14.71it/s]

tn：1159, fp:6, fn:10, tp:1173
precision: 0.9949109414758269, recall: 0.9915469146238377, accuracy: 0.9931856899488927
CPU times: user 3min 5s, sys: 3.26 s, total: 3min 8s
Wall time: 2min 41s
```





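On the validation set, both precision and recall improved substantially. Recall rises from 0.901 at checkpoint-900 to 0.992 at checkpoint-2800, while precision stays above 0.98 throughout. Multi-GPU training appears clearly better than single-GPU training, most likely because the larger effective batch size (16 per GPU × 3 GPUs = 48) makes training more stable.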

## Evaluation on the Test Set


```python
%run evaluate.py
testdata_path = '/data2/anti_fraud/dataset/test0819.jsonl'
model_path = '/data2/anti_fraud/models/modelscope/hub/Qwen/Qwen2-1___5B-Instruct'
device = 'cuda:1'
```

Evaluate each checkpoint's performance on the test set.

```python
%%time
## eval_loss=0.0152
checkpoint_path_900 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-900'
evaluate(model_path, checkpoint_path_900, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2349/2349 [02:44<00:00, 14.24it/s]

tn：1141, fp:26, fn:174, tp:1008
precision: 0.9748549323017408, recall: 0.8527918781725888, accuracy: 0.9148573861217539
CPU times: user 3min 10s, sys: 2.87 s, total: 3min 13s
Wall time: 2min 46s
```





```python
%%time
## eval_loss=0.0137
checkpoint_path_1400 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-1400'
evaluate(model_path, checkpoint_path_1400, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2349/2349 [02:45<00:00, 14.18it/s]

tn：1104, fp:63, fn:66, tp:1116
precision: 0.9465648854961832, recall: 0.9441624365482234, accuracy: 0.9450830140485313
CPU times: user 3min 11s, sys: 2.8 s, total: 3min 13s
Wall time: 2min 47s
```





```python
%%time
## eval_loss=0.020
checkpoint_path_1800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-1800'
evaluate(model_path, checkpoint_path_1800, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2349/2349 [02:45<00:00, 14.21it/s]

tn：1134, fp:33, fn:109, tp:1073
precision: 0.9701627486437613, recall: 0.9077834179357022, accuracy: 0.9395487441464453
CPU times: user 3min 11s, sys: 2.93 s, total: 3min 14s
Wall time: 2min 47s
```





```python
%%time
## eval_loss=0.035
checkpoint_path_2800 = '/data2/anti_fraud/models/Qwen2-1___5B-Instruct_ft_0830_2/checkpoint-2800'
evaluate(model_path, checkpoint_path_2800, testdata_path, device, batch=True, debug=True)
```

```
progress: 100%|██████████| 2349/2349 [02:45<00:00, 14.22it/s]

tn：1118, fp:49, fn:72, tp:1110
precision: 0.9577221742881795, recall: 0.9390862944162437, accuracy: 0.9484887186036611
CPU times: user 3min 10s, sys: 3.05 s, total: 3min 13s
Wall time: 2min 46s
```





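Performance on the test set is noticeably worse than on the validation set. At checkpoint-2800, for example, accuracy falls from 0.993 to 0.948, which suggests the model overfit during training.
The losses and gradients point the same way. The eval_loss noted for each checkpoint bottoms out at checkpoint-1400 (0.0137) and then rises to 0.035 at checkpoint-2800. By step 2885, the training loss had already reached 0 and the gradient norm had become very small (about 0.00016).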
```
{'loss': 0.0, 'grad_norm': 0.0001625923760002479, 'learning_rate': 8.870936304049726e-07, 'epoch': 9.43}
```
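To put the validation/test gap side by side, we can tabulate the confusion matrices above together with the eval_loss values noted in the cell comments. This is a small sketch over the numbers reported in this article, not a new experiment.

```python
# Confusion matrices (tn, fp, fn, tp) copied from the evaluation outputs above
VALIDATION = {
    900: (1157, 8, 117, 1066),
    1400: (1145, 20, 22, 1161),
    1800: (1162, 3, 34, 1149),
    2800: (1159, 6, 10, 1173),
}
TEST = {
    900: (1141, 26, 174, 1008),
    1400: (1104, 63, 66, 1116),
    1800: (1134, 33, 109, 1073),
    2800: (1118, 49, 72, 1110),
}
# eval_loss values from the "## eval_loss=..." comments in each cell
EVAL_LOSS = {900: 0.0152, 1400: 0.0137, 1800: 0.020, 2800: 0.035}

def accuracy(tn, fp, fn, tp):
    return (tn + tp) / (tn + fp + fn + tp)

def f1(tn, fp, fn, tp):
    return 2 * tp / (2 * tp + fp + fn)

gaps = {}
for step in sorted(VALIDATION):
    val_acc = accuracy(*VALIDATION[step])
    test_acc = accuracy(*TEST[step])
    gaps[step] = val_acc - test_acc  # positive gap = worse on the test set
    print(f"checkpoint-{step}: eval_loss={EVAL_LOSS[step]:.4f} "
          f"val_acc={val_acc:.4f} test_acc={test_acc:.4f} "
          f"gap={gaps[step]:.4f} test_f1={f1(*TEST[step]):.4f}")
```

The accuracy gap widens from about 3.2 points at checkpoint-900 to about 4.5 points at checkpoints 1800 and 2800. Test accuracy itself is still highest at checkpoint-2800 (0.948). With a single run this is only indicative, but it is consistent with the overfitting reading above.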