Reminder
Reproduction
dataset=Bilingual-code-v10-en
output_dir=outputs/Bilingual-code-v10-en-LLaMA3-8B-5epoch
ds_config=configs/deepspeed/ds_config_zero2.json  # DeepSpeed ZeRO-2
model_name_or_path=/home/LLM/LLaMA3-8B
template=custom_template  # a custom template (placeholder name)
date +"%Y-%m-%d %H:%M:%S"
torchrun --nnodes ${NODES} \
    --nproc_per_node ${NUM_GPUS} \
    --node_rank=${NODE_RANK} \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    src/train_bash.py \
    --deepspeed ${ds_config} \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --lora_target all \
    --lora_rank 64 \
    --model_name_or_path ${model_name_or_path} \
    --template ${template} \
    --dataset ${dataset} \
    --output_dir ${output_dir} \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --lr_scheduler_type cosine \
    --logging_steps 2 \
    --save_strategy epoch \
    --learning_rate 3e-4 \
    --num_train_epochs 5.0 \
    --warmup_ratio 0.1 \
    --plot_loss \
    --fp16 \
    --flash_attn \
    --seed 42 \
    --ddp_timeout 1800000 \
    --dataloader_num_workers 1 \
    --cutoff_len 2048 \
    --quantization_bit 4 >> $OUTPUT_LOG 2>&1
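For reference, with these settings the effective batch size per optimizer step is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs, i.e. 1 × 64 × ${NUM_GPUS} (standard Hugging Face Trainer semantics).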
Expected behavior
The dataset should contain 94,822 samples in total, and I checked that the token length of every sample is below the cutoff value, yet only 60,646 of them were loaded during training.
The dataset format is shown in the first image, and the number of samples actually loaded for training is shown in the second image. Moreover, after training on the 60,646 samples, the model's outputs are all empty.
The fine-tuned LoRA weight files are shown in image three, and the full model weights after merging the LoRA are shown in image four.
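To double-check the token-length claim, here is a minimal verification sketch. The assumptions are not taken from the issue: an alpaca-style JSON list with instruction/input/output fields, and data.json is a hypothetical path.

import json
from transformers import AutoTokenizer

# load the same tokenizer used for training
tokenizer = AutoTokenizer.from_pretrained("/home/LLM/LLaMA3-8B")

# data.json is a hypothetical path; adjust to the real dataset file
with open("data.json", encoding="utf-8") as f:
    samples = json.load(f)

cutoff_len = 2048
too_long = 0
for sample in samples:
    # field names assume the alpaca-style format; adjust if different
    text = sample.get("instruction", "") + sample.get("input", "") + sample.get("output", "")
    # note: the real training sample also includes template tokens,
    # so this slightly undercounts the actual tokenized length
    if len(tokenizer.encode(text)) > cutoff_len:
        too_long += 1

print(f"{len(samples)} samples total, {too_long} exceed cutoff_len={cutoff_len}")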
System Info
No response
Others
No response
Comments

When fewer samples are loaded than the dataset actually contains, it is usually because the dataset includes malformed samples.

Got it, thanks. I understand the cause now: the loader only keeps samples whose prompt and output are both non-empty. Some of my output fields contained the empty string "", which is why the sample count dropped. Thanks.
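Following the explanation above, a minimal sketch for locating the samples the loader would drop. The field names and data.json path are hypothetical, assuming the same alpaca-style format as before.

import json

# data.json is a hypothetical path; adjust to the real dataset file
with open("data.json", encoding="utf-8") as f:
    samples = json.load(f)

# per the reply above, a sample is kept only if both its prompt side
# and its output are non-empty; field names assume the alpaca format
dropped = [
    i for i, s in enumerate(samples)
    if not s.get("instruction", "").strip() or not s.get("output", "").strip()
]
print(f"{len(dropped)} of {len(samples)} samples have an empty prompt or output")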