
LoRA fine-tuning of the baichuan-7B model implemented #23

Open

hiyouga opened this issue Jun 15, 2023 · 99 comments

@hiyouga

hiyouga commented Jun 15, 2023

SFT and RLHF pipelines for Alpaca-style instruction datasets are supported: https://github.com/hiyouga/LLaMA-Efficient-Tuning

LoRA fine-tuning runs on a single 3090 GPU, and the QLoRA method is also supported (12 GB of VRAM minimum).

LoRA weights of the fine-tuned model: https://huggingface.co/hiyouga/baichuan-7b-sft

Run the following command to perform instruction-tuning on the Alpaca dataset:

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path <path to the baichuan-7B folder or Hugging Face repo id> \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --output_dir alpaca_baichuan \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 3.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16

Example screenshot of the training run: (screenshot omitted)

Example conversation after LoRA instruction fine-tuning: (screenshot omitted)
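
For readers who just want to try the published adapter, here is a minimal inference sketch (not part of the original post; it assumes a recent transformers/peft install, and the ziya-style prompt shown is an assumption based on the author's --prompt_template ziya note later in this thread):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "baichuan-inc/baichuan-7B"   # or a local baichuan-7B folder
adapter_path = "hiyouga/baichuan-7b-sft"       # LoRA weights linked above

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the LoRA adapter
model.eval()

# ziya-style prompt (assumed); see the template discussion below
prompt = "<human>:Hello, please introduce yourself.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))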

@Chenzongchao

Awesome, that was fast!

@SMR-S

SMR-S commented Jun 15, 2023

Awesome!

@70557dzqc

You're amazing, great work!

@GalSang17

(quoting the original post and fine-tuning command above)

Is there an example of the fine-tuning dataset format?

@hiyouga

hiyouga commented Jun 15, 2023

@GalSang17 It ships with the project; open the data folder to see the example format.

@GalSang17

@GalSang17 It ships with the project; open the data folder to see the example format.

Thanks!

@suncheng-s

Nice 👍🏻

@bytes-lost

bytes-lost commented Jun 15, 2023

@hiyouga Didn't you run into this error?

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [52,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [53,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [54,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [55,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [56,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [57,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [58,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [59,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

@hiyouga

hiyouga commented Jun 15, 2023

@bytes-lost What is the full error message? Which line of code triggers it?

@bytes-lost

bytes-lost commented Jun 15, 2023

@hiyouga

[INFO|trainer.py:622] 2023-06-15 17:12:03,926 >> Using cuda_amp half precision backend
[INFO|trainer.py:1779] 2023-06-15 17:12:03,933 >> ***** Running training *****
[INFO|trainer.py:1780] 2023-06-15 17:12:03,934 >>   Num examples = 48,329
[INFO|trainer.py:1781] 2023-06-15 17:12:03,934 >>   Num Epochs = 3
[INFO|trainer.py:1782] 2023-06-15 17:12:03,934 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1783] 2023-06-15 17:12:03,934 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1784] 2023-06-15 17:12:03,934 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:1785] 2023-06-15 17:12:03,934 >>   Total optimization steps = 4,530
[INFO|trainer.py:1786] 2023-06-15 17:12:03,935 >>   Number of trainable parameters = 4,194,304

0%|          | 0/4530 [00:00<?, ?it/s]
  0%|          | 1/4530 [00:04<5:45:55,  4.58s/it]
  0%|          | 2/4530 [00:07<4:42:43,  3.75s/it]Traceback (most recent call last):
  File "/mnt/data/user/LLaMA-Efficient-Tuning/src/train_sft.py", line 97, in <module>
    main()
  File "/mnt/data/user/LLaMA-Efficient-Tuning/src/train_sft.py", line 69, in main
    train_result = trainer.train()
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7b/modeling_baichuan.py", line 617, in forward
    outputs = self.model(
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7b/modeling_baichuan.py", line 501, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 89, in forward
    ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 50, in get_device_states
    fwd_gpu_states.append(torch.cuda.get_rng_state())
  File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/cuda/random.py", line 31, in get_rng_state
    return default_generator.get_state()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@hiyouga

hiyouga commented Jun 15, 2023

@bytes-lost It looks like an index out of bounds. When loading the tokenizer I manually set pad_token_id to 0; check whether that is set on your side. The input sequences must not contain any value greater than or equal to 64000.
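
A minimal sketch of the two checks described above (a hypothetical snippet, not code from the repo): give the tokenizer a pad token id and verify that no input id reaches 64000, the size of baichuan-7B's vocabulary, since any id at or beyond it triggers exactly this CUDA index-out-of-bounds assert.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = 0  # the value the author sets manually

batch = tokenizer(["an example input"], padding=True, return_tensors="pt")
max_id = batch["input_ids"].max().item()
assert max_id < 64000, f"token id {max_id} is out of range for the 64000-entry embedding table"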

@bytes-lost

@hiyouga
I added this line in train_sft.py, but I still get the same error:

model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
tokenizer.pad_token_id = 0  # set pad_token_id explicitly
dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="sft")

@hiyouga

hiyouga commented Jun 15, 2023

@bytes-lost It looks like something went wrong in torch's checkpointing, which may be related to your local torch and CUDA setup; I've tested several times on my side without any problem.

@bytes-lost

@bytes-lost It looks like something went wrong in torch's checkpointing, which may be related to your local torch and CUDA setup; I've tested several times on my side without any problem.

OK, I'll recreate the environment and try again. Is torch 2.0.1 okay?

@gebilaoman

gebilaoman commented Jun 15, 2023

On my side the model keeps carrying on a conversation with itself, and "Who are you?" doesn't get the expected answer either. My fine-tuning command is exactly the same as the one provided above.
(screenshot omitted)

@hiyouga

hiyouga commented Jun 15, 2023

@gebilaoman When launching with the project's built-in cli_demo, please add the --prompt_template ziya argument.

@Xin-20

Xin-20 commented Jun 15, 2023

That was fast, impressive!

@shibing624

shibing624 commented Jun 15, 2023

I have also implemented LoRA fine-tuning for baichuan-7b. The baichuan architecture matches llama, and its SFT fine-tuning procedure is basically the same as for bloom/llama.

Project supporting baichuan-7b fine-tuning: https://github.com/shibing624/MedicalGPT

The project also implements the full GPT training pipeline: continued pre-training, supervised fine-tuning, reward modeling, and reinforcement learning.

Run the following command to perform instruction-tuning on the belle dataset:

python3 supervised_finetuning.py \
    --model_type auto \
    --model_name_or_path baichuan-inc/baichuan-7B \
    --train_file_dir ./data/finetune \
    --validation_file_dir ./data/finetune \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.05 \
    --weight_decay 0.05 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --max_source_length 256 \
    --max_target_length 256 \
    --output_dir outputs-sft-baichuan-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --fp16 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True

Screenshots of the run (loss decreases steadily): (screenshots omitted)

Everyone is welcome to test it and verify the results.

@XiaofengZHOU

@bytes-lost It looks like something went wrong in torch's checkpointing, which may be related to your local torch and CUDA setup; I've tested several times on my side without any problem.

OK, I'll recreate the environment and try again. Is torch 2.0.1 okay?

I had the same problem; it works after setting tokenizer.pad_token_id = 0.

@weicheng59

(error screenshot omitted)
I get this error when running. Do I need to modify config.json inside the model folder?

@suncheng-s

(error screenshot omitted) I get this error when running. Do I need to modify config.json inside the model folder?

That is not the ChatGLM code; use the LLaMA one: https://github.com/hiyouga/LLaMA-Efficient-Tuning

@usun1997

Can fine-tuning on multi-turn conversations be done? Could you show the data format for multi-turn dialogue? Thanks.

@hiyouga

hiyouga commented Jun 16, 2023

@cristianohello

@hiyouga
Hi, when launching with the project's built-in cli_demo, why do we need to add the --prompt_template ziya argument?
Why ziya? Shouldn't it be baichuan?

@hiyouga

hiyouga commented Jun 16, 2023

@cristianohello Because I used the ziya template when fine-tuning 😁
@usun1997 Correct.

@cristianohello

@hiyouga
Hi, thanks for the reply.
I'm again seeing the model continuously asking and answering its own questions. How can I fix that?

@hiyouga

hiyouga commented Jun 16, 2023

@cristianohello The current SFT model was not trained on multi-turn conversations, so problems occasionally show up in multi-turn use.

@grantchenhuarong

OK, after comparing with some other guides, adding the --lora_target W_pack argument makes it start up normally...
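
For reference, a minimal peft sketch (not the project's own code) of what --lora_target W_pack corresponds to: baichuan-7B fuses the Q/K/V projections into a single W_pack linear layer, so that module, rather than llama-style q_proj/v_proj, is what LoRA has to target. The alpha and dropout values below are illustrative assumptions.

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # --lora_rank 8, as in the commands above
    lora_alpha=16,              # illustrative value
    lora_dropout=0.05,          # illustrative value
    target_modules=["W_pack"],  # baichuan's fused QKV projection
)
# peft_model = get_peft_model(base_model, lora_config)  # base_model: the loaded baichuan-7B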

@xiaoningli92

I am LoRA fine-tuning baichuan-7B on my own domain dataset plus the Alpaca dataset. In every attempt the loss blows up, and the predictions on my own test set come back entirely empty. I have tried several learning rates and the loss always explodes in the end; with the same parameters fine-tuning llama, the loss converges and the results are normal.
(loss curve screenshot omitted)

python finetune.py \
    --base_model '/data1/models/baichuan-7B' \
    --train_data_path '/train_data/alpaca_plus_data_rewrite.json' \
    --eval_data_path '/eval_data/test2json_case.json' \
    --output_dir '/data1/models/baichuan_lora_0627' \
    --batch_size 128 \
    --micro_batch_size 16 \
    --num_epochs 2 \
    --learning_rate 5e-5 \
    --cutoff_len 512 \
    --val_set_size 0 \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj,o_proj,k_proj]' \
    --train_on_inputs \
    --group_by_length

@hiyouga

hiyouga commented Jun 28, 2023

@xiaoningli92 Whose code are you using?

@xiaoningli92

@xiaoningli92 Whose code are you using?

The Alpaca-LoRA code: https://github.com/tloen/alpaca-lora

@hiyouga

hiyouga commented Jun 28, 2023

@xiaoningli92 Try this one instead: https://github.com/hiyouga/LLaMA-Efficient-Tuning

@xiaoningli92

@xiaoningli92 Try this one instead: https://github.com/hiyouga/LLaMA-Efficient-Tuning

It ran successfully, but I couldn't diff out where the original problem was. May I ask what special configuration Baichuan LoRA fine-tuning needs compared with Llama?

@heshuguo

heshuguo commented Jul 7, 2023

@hiyouga After fine-tuning, inference produces repeated answers. Any idea what the cause might be?
Question: How many letters make up "numbers"?
Answer:
"numbers" is made up of 3 letters.
"numbers" is the English name for digits; it is made up of the 3 letters n, o, u.
n is the English word for the digit 1, u for the digit 2, o for the digit 3.
So "numbers" is made up of the 3 letters n, o, u.
"numbers" is the English name for digits; it is made up of the 3 letters n, o, u.
n is the English word for the digit 1, u for the digit 2, o for the digit 3.
So "numbers" is made up of the 3 letters n, o, u.

@PageIV

PageIV commented Jul 7, 2023

Could you share the loss curve from your fine-tuning run?

@hiyouga

hiyouga commented Jul 7, 2023

@mabin

mabin commented Jul 12, 2023

@bytes-lost It looks like something went wrong in torch's checkpointing, which may be related to your local torch and CUDA setup; I've tested several times on my side without any problem.

OK, I'll recreate the environment and try again. Is torch 2.0.1 okay?

@bytes-lost Hi, was this problem ever solved? I'm hitting the same issue: an index-out-of-bounds error at inference time.

@22zhangqian

I'd like to ask this question too: after fine-tuning, can the newly obtained model weights be used for a second round of fine-tuning? Could anyone advise?

@22zhangqian

OK, after comparing with some other guides, adding the --lora_target W_pack argument makes it start up normally...

Hi, may I ask why multi-GPU fine-tuning fails for me? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)
It runs fine on a single GPU, but fails with two:
CUDA_VISIBLE_DEVICES=2,3 python src/train_bash.py \
    --stage sft \
    --model_name_or_path ../Baichuan-7B \
    --do_train \
    --dataset data-700 \
    --lora_target W_pack \
    --finetuning_type lora \
    --output_dir output/baichuan_7B_700_5000 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --max_steps 5000 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

The author said multi-GPU runs require deepspeed, but I don't quite understand the specifics. Could someone explain?

@warkcod

warkcod commented Aug 19, 2023

(quoting the original post and fine-tuning command above)

lib/python3.10/site-packages/transformers/hf_argparser.py", line 347, in parse_args_into_dataclasses
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--dev_ratio', '0.01']

I'm getting this --dev_ratio error. What's going on? @hiyouga

@hiyouga

hiyouga commented Aug 19, 2023

@warkcod Change it to --val_size

@warkcod

warkcod commented Aug 19, 2023

@warkcod Change it to --val_size

That works, thanks.

@Elllllllvin

(quoting the original post and fine-tuning command above)

Hello, I'm running into the following:
[INFO|tokenization_utils_base.py:2041] 2023-10-12 12:26:39,469 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2041] 2023-10-12 12:26:39,470 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2041] 2023-10-12 12:26:39,470 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2041] 2023-10-12 12:26:39,470 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2041] 2023-10-12 12:26:39,470 >> loading file tokenizer.json
Traceback (most recent call last):
  File "/home/jovyan/LLaMA-Efficient-Tuning/src/train_bash.py", line 14, in <module>
    main()
  File "/home/jovyan/LLaMA-Efficient-Tuning/src/train_bash.py", line 5, in main
    run_exp()
  .......
  File "/opt/conda/envs/llama_etuning/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 366, in __init__
    self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
  File "/opt/conda/envs/llama_etuning/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 462, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/home/jovyan/.cache/huggingface/modules/transformers_modules/baichuan-7b/tokenization_baichuan.py", line 108, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/home/jovyan/.cache/huggingface/modules/transformers_modules/baichuan-7b/tokenization_baichuan.py", line 104, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'BaiChuanTokenizer' object has no attribute 'sp_model'
How should I fix this error? @hiyouga

@hiyouga

hiyouga commented Oct 12, 2023

@Elllllllvin use transformers==4.33.2

@Elllllllvin

@Elllllllvin use transformers==4.33.2

Thanks, that issue is solved, but now the GPU runs out of memory: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 820.00 MiB. GPU 0 has a total capacty of 31.74 GiB of which 33.38 MiB is free. Process 10596 has 31.70 GiB memory in use. Of the allocated memory 30.19 GiB is allocated by PyTorch, and 638.01 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%|▏ | 8/4530 [01:12<11:24:57, 9.09s/it]
Wasn't 12 GB supposed to be enough? I'm a bit confused and would appreciate clarification.

The command I used:
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path /home/jovyan/models/baichuan-7b \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --val_size 0.01 \
    --output_dir alpaca_baichuan \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 3.0 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16

@grantchenhuarong

grantchenhuarong commented Oct 18, 2023 via email

@hiyouga

hiyouga commented Oct 19, 2023

(quoting the out-of-memory report above)

12 GB is the memory footprint with 4-bit quantization.
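
To make the memory figure concrete, here is a minimal sketch (assuming bitsandbytes is installed; not the project's own code) of loading the base model in 4-bit, which is what brings the QLoRA footprint down to roughly 12 GB; in the framework this corresponds to the --quantization_bit 4 flag mentioned later in the thread.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the base weights to 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, matching --fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "/home/jovyan/models/baichuan-7b",     # path from the command above
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
# LoRA adapters are then trained on top of the frozen 4-bit base (QLoRA).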

@promisecc

(quoting the repeated-answer report above)

Did you use the matching prompt template at inference time?

@xyu85

xyu85 commented Dec 13, 2023

(quoting the out-of-memory report above)

Hi, has this problem been fixed? I'm running into the same one.

@Elllllllvin

(quoting the out-of-memory question above)

If you are using a single GPU, you can try adding:
--quantization_bit 4 \
to run 4-bit QLoRA fine-tuning. If you have multiple GPUs, you can try deepspeed for distributed training.

@zhangyun-w

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --do_train \
    --model_name_or_path /home/Baichuan-13B/Baichuan-13B-chat \
    --template baichuan \
    --dataset alpaca_gpt4_zh \
    --output_dir baichuan_lora_checkpoint \
    --max_source_length 24 \
    --max_target_length 48 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16 \
    --lora_target W_pack \
    --lora_rank 8 \
    --padding_side right \
    --quantization_bit 4
How do I fix this error:
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--max_source_length', '24', '--max_target_length', '48', '--padding_side', 'right']

@hiyouga

hiyouga commented Dec 26, 2023

@zhangyun-w Remove those three arguments and use --cutoff_len 512 instead.

@zhangyun-w

@zhangyun-w Remove those three arguments and use --cutoff_len 512 instead.

Thanks, I'll try it tomorrow.

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --do_train \
    --model_name_or_path /home/Baichuan-13B/Baichuan-13B-chat \
    --template baichuan \
    --dataset alpaca_gpt4_zh \
    --output_dir baichuan_lora_checkpoint \
    --cutoff_len 512 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 5000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16 \
    --lora_target W_pack \
    --lora_rank 8

@zhangyun-w

zhangyun-w commented Dec 27, 2023

@hiyouga How do I fix this one?
(baichuan) [root@test LLaMA-Efficient-Tuning]# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py --do_train --model_name_or_path /home/Baichuan-13B/Baichuan-13B-chat --template baichuan --dataset alpaca_gpt4_zh --output_dir baichuan_lora_checkpoint --cutoff_len 512 --per_device_train_batch_size 2 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --logging_steps 10 --save_steps 5000 --learning_rate 5e-5 --num_train_epochs 1.0 --plot_loss --fp16 --lora_target W_pack --lora_rank 8
Traceback (most recent call last):
  File "/home/LLaMA-Efficient-Tuning/src/train_bash.py", line 1, in <module>
    from llmtuner import run_exp
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/__init__.py", line 3, in <module>
    from llmtuner.api import create_app
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/api/__init__.py", line 1, in <module>
    from llmtuner.api.app import create_app
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/api/app.py", line 22, in <module>
    from llmtuner.chat import ChatModel
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/chat/__init__.py", line 1, in <module>
    from llmtuner.chat.chat_model import ChatModel
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/chat/chat_model.py", line 8, in <module>
    from llmtuner.data.template import get_template_and_fix_tokenizer
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/data/__init__.py", line 1, in <module>
    from llmtuner.data.loader import get_dataset
  File "/home/LLaMA-Efficient-Tuning/src/llmtuner/data/loader.py", line 4, in <module>
    from datasets import concatenate_datasets, interleave_datasets, load_dataset, load_from_disk
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/__init__.py", line 22, in <module>
    from .arrow_dataset import Dataset
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 66, in <module>
    from .arrow_reader import ArrowReader
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/arrow_reader.py", line 30, in <module>
    from .download.download_config import DownloadConfig
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/download/__init__.py", line 9, in <module>
    from .download_manager import DownloadManager, DownloadMode
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/download/download_manager.py", line 31, in <module>
    from ..utils import tqdm as hf_tqdm
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/utils/__init__.py", line 19, in <module>
    from .info_utils import VerificationMode
  File "/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/datasets/utils/info_utils.py", line 5, in <module>
    from huggingface_hub.utils import insecure_hashlib
ImportError: cannot import name 'insecure_hashlib' from 'huggingface_hub.utils' (/usr/local/bin/miniconda3/envs/baichuan/lib/python3.10/site-packages/huggingface_hub/utils/__init__.py)

@hiyouga

hiyouga commented Dec 27, 2023

@zhangyun-w pip install -U huggingface_hub

@YQCW

YQCW commented Feb 6, 2024

@hiyouga Hi, with transformers==4.33.2 I get the error below, which seems to mean transformers needs to be upgraded:
(error screenshot omitted)
After bumping transformers to 4.37.2, Baichuan-7B then throws the error below:
(error screenshot omitted)

I tried 4.34, 4.35, and 4.36, and they all error out. How should this be resolved?
